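tooluniverse-multi-omics-integration by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-multi-omics-integration
Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. Orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation.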
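Phase 1: Data Loading & QC
Load each omics type, apply format-specific QC, and normalize
Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics
Phase 2: Sample Matching
Harmonize sample IDs, find common samples, handle missing omics
Phase 3: Feature Mapping
Map features to common gene-level identifiers
CpG -> gene (promoter region), CNV -> gene, metabolite -> enzyme
Phase 4: Cross-Omics Correlation
RNA vs protein (translation efficiency)
Methylation vs expression (epigenetic regulation)
CNV vs expression (dosage effect)
eQTL variants vs expression (genetic regulation)
Phase 5: Multi-Omics Clustering
Patient subtyping with MOFA+, NMF, or SNF
Phase 6: Pathway-Level Integration
Aggregate omics evidence at the pathway level
Score pathway dysregulation using the combined evidence
Phase 7: Biomarker Discovery
Feature selection across omics, multi-omics classification
Phase 8: Integrated Report
Summary, correlations, clusters, pathways, biomarkers
See phase_details.md for complete code and implementation details.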
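The CpG -> gene step of Phase 3 can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function name `cpg_to_gene_promoter` and the annotation columns `probe`, `gene`, and `region` are hypothetical, and averaging promoter probes per gene is only one of several reasonable summaries.

```python
import pandas as pd

def cpg_to_gene_promoter(beta, annotation):
    """Average promoter-probe beta values per gene.

    beta: DataFrame, rows = CpG probe IDs, columns = samples.
    annotation: DataFrame with (hypothetical) columns 'probe', 'gene', 'region'.
    """
    # Keep promoter probes that are actually present in the beta matrix
    promoter = annotation[annotation["region"] == "promoter"]
    promoter = promoter[promoter["probe"].isin(beta.index)]
    # One row per (probe, gene) pair, in annotation order
    probe_betas = beta.loc[promoter["probe"].tolist()]
    # Group the probe rows by gene symbol and average within each gene
    return probe_betas.groupby(promoter["gene"].to_numpy()).mean()

# Toy example: two promoter probes for gene A, one gene-body probe for gene B
beta = pd.DataFrame(
    {"s1": [0.2, 0.4, 0.9], "s2": [0.1, 0.3, 0.8]},
    index=["cg1", "cg2", "cg3"],
)
annotation = pd.DataFrame({
    "probe": ["cg1", "cg2", "cg3"],
    "gene": ["A", "A", "B"],
    "region": ["promoter", "promoter", "body"],
})
gene_level_methylation = cpg_to_gene_promoter(beta, annotation)
```

The result is a gene x sample matrix, which puts methylation on the same gene-level index as expression and protein data for the later correlation phase.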
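| Omics | Formats | QC Focus |
|---|---|---|
| Transcriptomics | CSV/TSV, HDF5, h5ad | Low-count filtering, normalization (TPM/DESeq2), log transformation |
| Proteomics | MaxQuant, Spectronaut, DIA-NN | Missing-value imputation, median/quantile normalization |
| Methylation | IDAT, beta matrices | Failed-probe removal, batch correction, cross-reactive probe filtering |
| Genomics | VCF, SEG (CNV) | Variant QC, CNV segmentation |
| Metabolomics | Peak tables | Missing-value handling, normalization |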
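As one illustration of the transcriptomics row, the sketch below applies a low-count filter followed by a log transform. It is an assumption-laden example: it uses counts-per-million scaling as a simple stand-in for the TPM or DESeq2 normalization named in the table, and the function name and thresholds (`min_count`, `min_samples`) are hypothetical defaults, not values taken from the skill.

```python
import numpy as np
import pandas as pd

def filter_and_log_cpm(counts, min_count=10, min_samples=2):
    """Low-count filter, CPM scaling, then log2(x + 1).

    counts: DataFrame of raw counts, rows = genes, columns = samples.
    """
    # Keep genes with at least `min_count` reads in at least `min_samples` samples
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    filtered = counts.loc[keep]
    # Scale each sample (column) to counts per million of its remaining library size
    cpm = filtered / filtered.sum(axis=0) * 1e6
    return np.log2(cpm + 1)

counts = pd.DataFrame(
    {"s1": [100, 0, 900], "s2": [50, 1, 950]},
    index=["g1", "g2", "g3"],
)
log_expr = filter_and_log_cpm(counts)  # g2 is dropped by the low-count filter
```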
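```python
def match_samples_across_omics(omics_data_dict):
    """Match samples across multiple omics datasets.

    Each value is a DataFrame with sample IDs as its columns.
    """
    sample_ids = {k: set(df.columns) for k, df in omics_data_dict.items()}
    common_samples = sorted(set.intersection(*sample_ids.values()))
    # Reorder every dataset to the same sorted list of shared samples
    matched_data = {k: df[common_samples] for k, df in omics_data_dict.items()}
    return common_samples, matched_data
```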
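Sample matching only works once IDs agree across datasets, so a harmonization step usually comes first. The sketch below assumes TCGA-style barcodes, where the first 12 characters identify the patient; the helper names `harmonize_sample_ids` and `summarize_sample_overlap` are hypothetical, and other cohorts will need their own ID rules.

```python
import pandas as pd

def harmonize_sample_ids(df, id_length=12):
    """Strip, upper-case, and truncate column names to a patient-level ID."""
    out = df.copy()
    out.columns = [str(c).strip().upper()[:id_length] for c in out.columns]
    # If several aliquots collapse to the same ID, keep the first one
    return out.loc[:, ~out.columns.duplicated()]

def summarize_sample_overlap(omics_data_dict):
    """Boolean table: rows = samples, columns = omics, True where profiled."""
    all_samples = sorted(set().union(*(df.columns for df in omics_data_dict.values())))
    index = pd.Index(all_samples)
    return pd.DataFrame(
        {name: index.isin(df.columns) for name, df in omics_data_dict.items()},
        index=index,
    )

rna = pd.DataFrame(
    [[1.0, 2.0, 3.0]], index=["TP53"],
    columns=["tcga-aa-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0003-01A"],
)
protein = pd.DataFrame(
    [[0.5, 0.7]], index=["TP53"],
    columns=["TCGA-AA-0001-01B ", "TCGA-AA-0003-11A"],
)
data = {"rna": harmonize_sample_ids(rna), "protein": harmonize_sample_ids(protein)}
presence = summarize_sample_overlap(data)
complete_cases = presence.index[presence.all(axis=1)]  # samples with every omics layer
```

The presence table makes the "handle missing omics" decision explicit: you can restrict to complete cases, or keep partially profiled samples for methods that tolerate missing views.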
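```python
from scipy.stats import spearmanr, pearsonr

# RNA vs protein: expect positive r ~ 0.4-0.6
# Methylation vs expression: expect negative r (promoter repression)
# CNV vs expression: expect positive r (dosage effect)
# Assumes rna and protein are gene x sample DataFrames with matched sample columns,
# the layout produced by match_samples_across_omics; use rna[gene] if genes are columns.
common_genes = rna.index.intersection(protein.index)
for gene in common_genes:
    r, p = spearmanr(rna.loc[gene], protein.loc[gene])
```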
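Testing thousands of genes calls for multiple-testing correction, which the loop above leaves out. The sketch below collects per-gene Spearman correlations and adds Benjamini-Hochberg q-values. It assumes gene x sample DataFrames with unique gene IDs; the function names and the `min_samples` threshold are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

def rna_protein_correlation(rna, protein, min_samples=10):
    """Per-gene Spearman correlation between two gene x sample matrices."""
    genes = rna.index.intersection(protein.index)
    samples = rna.columns.intersection(protein.columns)
    rows = []
    for gene in genes:
        x = rna.loc[gene, samples]
        y = protein.loc[gene, samples]
        ok = x.notna() & y.notna()  # pairwise-complete observations only
        if ok.sum() < min_samples:
            continue
        r, p = spearmanr(x[ok], y[ok])
        rows.append((gene, r, p))
    result = pd.DataFrame(rows, columns=["gene", "r", "p"]).set_index("gene")
    result["q"] = benjamini_hochberg(result["p"].to_numpy())
    return result
```

The same function applies to methylation vs expression or CNV vs expression once those layers have been mapped to gene-level rows; only the expected sign of `r` changes.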
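```python
import numpy as np

# Score pathway dysregulation using combined evidence from all omics.
# Aggregate evidence per gene first, then average over the genes in each pathway.
# rna_fc, protein_fc, meth_diff, cnv: per-gene effect sizes for one pathway's member genes
pathway_score = np.mean(np.abs(rna_fc) + np.abs(protein_fc) + np.abs(meth_diff) + np.abs(cnv))
```
See phase_details.md for full implementations of each operation.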
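The two-level aggregation (per gene, then per pathway) can be sketched as below. This is an illustrative reading of the scoring formula, not the skill's implementation: `score_pathways` and the shape of its inputs are assumptions. Because the omics layers sit on different scales, you may want to standardize each column before summing.

```python
import pandas as pd

def score_pathways(gene_evidence, pathways):
    """Mean per-gene combined evidence for each pathway.

    gene_evidence: DataFrame, rows = genes, columns = per-omics effect sizes
                   (e.g. rna_fc, protein_fc, meth_diff, cnv).
    pathways: dict mapping pathway name -> list of member genes.
    """
    # Per-gene evidence: sum of absolute effect sizes across omics layers
    gene_score = gene_evidence.abs().sum(axis=1)
    scores = {}
    for name, genes in pathways.items():
        members = gene_score.index.intersection(genes)
        if len(members) > 0:  # skip pathways with no measured genes
            scores[name] = gene_score.loc[members].mean()
    return pd.Series(scores, name="pathway_score").sort_values(ascending=False)

evidence = pd.DataFrame(
    {
        "rna_fc": [1.0, -2.0, 0.5],
        "protein_fc": [0.5, -1.0, 0.0],
        "meth_diff": [-0.2, 0.3, 0.0],
        "cnv": [0.0, 1.0, 0.0],
    },
    index=["G1", "G2", "G3"],
)
pathway_scores = score_pathways(evidence, {"P1": ["G1", "G2"], "P2": ["G3", "GX"]})
```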
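| Method | Description | Best For |
|---|---|---|
| MOFA+ | Latent factors explaining cross-omics variation | Identifying shared and omics-specific drivers |
| Joint NMF | Shared decomposition across omics | Patient subtype discovery |
| SNF | Similarity network fusion | Integrating heterogeneous data types |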
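To make the idea behind similarity-based integration concrete, the sketch below builds one sample-by-sample affinity matrix per omics layer, averages them, and clusters the result. Note that this is a deliberately simplified stand-in: real SNF fuses the networks through iterative cross-diffusion rather than a plain average, and MOFA+ and joint NMF are different models altogether. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

def sample_affinity(matrix):
    """RBF affinity between samples for one features x samples matrix."""
    x = np.asarray(matrix, dtype=float)
    # z-score each feature so that no single feature dominates the distance
    x = (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-12)
    d = squareform(pdist(x.T, metric="euclidean"))
    sigma = np.median(d[d > 0])  # bandwidth heuristic: median pairwise distance
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def fused_clusters(omics_matrices, n_clusters):
    """Average per-omics affinities, then cut an average-linkage tree."""
    fused = np.mean([sample_affinity(m) for m in omics_matrices], axis=0)
    distance = 1.0 - fused
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy data: 10 samples in two groups, visible in both omics layers
rng = np.random.default_rng(0)
rna = np.hstack([rng.normal(0, 0.1, (20, 5)), rng.normal(5, 0.1, (20, 5))])
protein = np.hstack([rng.normal(0, 0.1, (15, 5)), rng.normal(-5, 0.1, (15, 5))])
labels = fused_clusters([rna, protein], n_clusters=2)
```

All matrices must share the same sample order in their columns, which is what the sample-matching phase guarantees.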
| Skill | Used For | Phase |
|---|---|---|
| tooluniverse-rnaseq-deseq2 | RNA-seq analysis | 1, 4 |
| tooluniverse-epigenomics | Methylation, ChIP-seq | 1, 4 |
| tooluniverse-variant-analysis | CNV/SNV processing | 1, 3, 4 |
| tooluniverse-protein-interactions | Protein network context | 6 |
| tooluniverse-gene-enrichment | Pathway enrichment | 6 |
| tooluniverse-expression-data-retrieval | Public data retrieval | 1 |
| tooluniverse-target-research | Gene/protein annotation | 3, 8 |
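Integrate TCGA RNA-seq, proteomics, methylation, and CNV data to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
Predict drug response from baseline multi-omics profiles; identify resistance and sensitivity pathways.
See the "Use Cases" section of phase_details.md for detailed step-by-step workflows.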
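The SNP -> methylation -> expression chain can be examined with a product-of-coefficients mediation estimate, sketched below with ordinary least squares. This is a minimal sketch under linear-model assumptions, with no covariates and no significance test (a bootstrap over samples is a common addition). The function names are hypothetical and the data are simulated.

```python
import numpy as np

def _ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation_effects(snp, methylation, expression):
    """Decompose the SNP effect on expression into direct and mediated parts."""
    snp, methylation, expression = (
        np.asarray(v, dtype=float) for v in (snp, methylation, expression)
    )
    a = _ols(methylation, snp)[1]                      # SNP -> methylation
    _, direct, b = _ols(expression, snp, methylation)  # direct effect; methylation -> expression
    total = _ols(expression, snp)[1]                   # SNP -> expression, ignoring the mediator
    return {"indirect": a * b, "direct": direct, "total": total}

# Simulated example: the SNP lowers methylation, and methylation represses expression
rng = np.random.default_rng(1)
snp = rng.integers(0, 3, 200).astype(float)  # genotype coded 0/1/2
methylation = 0.5 - 0.2 * snp + rng.normal(0, 0.05, 200)
expression = 2.0 - 3.0 * methylation + rng.normal(0, 0.1, 200)
effects = mediation_effects(snp, methylation, expression)
```

With OLS fitted on the same samples, the total effect equals the direct effect plus the indirect effect, so a large `indirect` share suggests the SNP acts on expression mainly through methylation.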
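| Component | Requirement |
|---|---|
| Omics types | At least 2 datasets |
| Common samples | At least 10 across omics |
| Cross-omics correlation | Pearson/Spearman correlations computed |
| Clustering | At least one method (MOFA+, NMF, or SNF) |
| Pathway integration | Enrichment with multi-omics evidence scores |
| Report | Summary, correlations, clusters, pathways, biomarkers |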
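The first two rows of the checklist can be verified programmatically before any analysis starts. The sketch below is a hypothetical helper (`check_minimum_requirements` is not part of the skill) that assumes each dataset is a DataFrame with sample IDs as columns.

```python
import pandas as pd

def check_minimum_requirements(omics_data_dict, min_omics=2, min_common_samples=10):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if len(omics_data_dict) < min_omics:
        problems.append(
            f"need at least {min_omics} omics datasets, got {len(omics_data_dict)}"
        )
    if omics_data_dict:
        # Samples present in every dataset
        common = set.intersection(*(set(df.columns) for df in omics_data_dict.values()))
        if len(common) < min_common_samples:
            problems.append(
                f"need at least {min_common_samples} common samples, got {len(common)}"
            )
    return problems

samples = [f"S{i}" for i in range(12)]
rna = pd.DataFrame(0.0, index=["g1"], columns=samples)
protein = pd.DataFrame(0.0, index=["g1"], columns=samples[:11])
issues = check_minimum_requirements({"rna": rna, "protein": protein})  # 11 shared samples
```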
Weekly Installs: 124
Repository: mims-harvard/tooluniverse
GitHub Stars: 1.2K
First Seen: Feb 19, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (121), gemini-cli (120), opencode (120), github-copilot (119), cursor (117), kimi-cli (116)