Microbiome Research by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill 'Microbiome Research'
Comprehensive microbiome analysis using MGnify (EBI metagenomics), GTDB (genome taxonomy), ENA (sequencing data), OLS (ontology lookup for ENVO biomes), and EuropePMC (literature).
| Tool | Purpose | Auth |
|---|---|---|
| MGnify_search_studies | Find metagenomics studies by biome/keyword | None |
| MGnify_search_studies_detail | Study metadata, abstract, sample counts | None |
| MGnify_list_analyses | List taxonomic/functional analysis outputs for a study | None |
| MGnify_get_taxonomy | Taxonomic composition from an analysis | None |
| MGnify_get_go_terms | GO functional annotations from an analysis | None |
| MGnify_get_interpro | InterPro protein domain annotations | None |
| MGnify_list_biomes | Browse MGnify biome hierarchy | None |
| MGnify_search_genomes | Search metagenome-assembled genomes (MAGs) | None |
| MGnify_get_genome | Genome quality metrics (completeness, contamination) | None |
| GTDB_search_genomes | Search bacterial/archaeal genomes by taxonomy | None |
| GTDB_get_species | Species cluster details from GTDB | None |
| GTDB_get_taxon_info | Taxonomic rank info in GTDB hierarchy | None |
| GTDB_search_taxon | Search taxa by partial name across all ranks | None |
| ENAPortal_search_studies | Find sequencing studies in ENA. Query format: description="keyword" | None |
| ENAPortal_search_samples | Find samples with environmental metadata | None |
| ols_search_terms | Search ENVO ontology for biome/environment terms | None |
| EuropePMC_search_articles | Find microbiome publications | None |
| PubMed_search_articles | Literature search (different coverage than EuropePMC) | None |
For drug-microbiome studies, also use:

- PubChem_get_CID_by_compound_name / PubChem_get_compound_properties_by_CID: drug identity
- CTD_get_chemical_gene_interactions: drug-gene interactions (e.g., metformin affects 1,175+ genes)
- kegg_search_pathway / kegg_get_pathway_info: microbial metabolic pathways (butanoate, propanoate)
- ReactomeAnalysis_pathway_enrichment: host pathway enrichment for drug-affected genes
- drugbank_vocab_search: drug mechanism and targets

MGnify tip: Use concise single-keyword searches (e.g., "metformin"), because multi-word queries may time out. The MGnify API can be slow for broad searches.
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# 1. Search for gut microbiome studies
studies = tu.run_one_function({
'name': 'MGnify_search_studies',
'arguments': {'search': 'gut microbiome', 'size': 5}
})
# 2. Get study details
detail = tu.run_one_function({
'name': 'MGnify_search_studies_detail',
'arguments': {'study_accession': 'MGYS00006860'}
})
# 3. List analyses for a study
analyses = tu.run_one_function({
'name': 'MGnify_list_analyses',
'arguments': {'study_accession': 'MGYS00006860', 'size': 5}
})
# 4. Get taxonomic profile from an analysis
taxonomy = tu.run_one_function({
'name': 'MGnify_get_taxonomy',
'arguments': {'analysis_accession': 'MGYA00612683'}
})
# 5. Get functional annotations
go_terms = tu.run_one_function({
'name': 'MGnify_get_go_terms',
'arguments': {'analysis_accession': 'MGYA00612683'}
})
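Every call above passes the same `{'name': ..., 'arguments': ...}` payload to `run_one_function`. The sketch below shows one way to cut that repetition. `build_call` is a hypothetical helper, not part of ToolUniverse, and it only builds the dictionary; the actual call is left commented out because it needs network access.

```python
def build_call(name, **arguments):
    """Build the payload dict that the examples pass to tu.run_one_function.

    Hypothetical convenience helper; it does not call any API itself.
    """
    return {'name': name, 'arguments': dict(arguments)}


payload = build_call('MGnify_search_studies', search='gut microbiome', size=5)
# studies = tu.run_one_function(payload)
```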
Find studies for a specific biome using MGnify's biome hierarchy:
# Browse biome hierarchy
biomes = tu.run_one_function({
'name': 'MGnify_list_biomes',
'arguments': {'lineage': 'root:Host-associated:Human', 'depth': 3}
})
# Search studies in a specific biome
studies = tu.run_one_function({
'name': 'MGnify_search_studies',
'arguments': {'biome': 'root:Host-associated:Human:Digestive system', 'size': 10}
})
# Look up ENVO ontology terms for environment metadata
envo = tu.run_one_function({
'name': 'ols_search_terms',
'arguments': {'query': 'human gut', 'ontology': 'envo', 'rows': 5}
})
Get the microbial composition of a metagenomics sample:
# Get analyses for a study
analyses = tu.run_one_function({
'name': 'MGnify_list_analyses',
'arguments': {'study_accession': 'MGYS00006860', 'size': 3}
})
# Get taxonomy for a specific analysis
taxonomy = tu.run_one_function({
'name': 'MGnify_get_taxonomy',
'arguments': {'analysis_accession': 'MGYA00612683'}
})
# Returns organisms with lineage, abundance counts, and taxonomy rank
Evaluate metagenome-assembled genomes (MAGs):
# Search for genomes from a specific taxon
genomes = tu.run_one_function({
'name': 'MGnify_search_genomes',
'arguments': {'search': 'Faecalibacterium prausnitzii', 'size': 5}
})
# Get quality metrics for a genome
genome = tu.run_one_function({
'name': 'MGnify_get_genome',
'arguments': {'genome_accession': 'MGYG000000001'}
})
# Returns completeness, contamination, N50, genome length, taxonomy
# Cross-reference with GTDB taxonomy
gtdb = tu.run_one_function({
'name': 'GTDB_search_genomes',
'arguments': {'operation': 'search_genomes', 'query': 'Faecalibacterium', 'items_per_page': 5}
})
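Once completeness and contamination are in hand, a MAG can be binned by quality. The sketch below uses the commonly cited MIMAG cutoffs as an assumption (high: >90% complete and <5% contaminated; medium: >=50% complete and <10% contaminated). It ignores the rRNA and tRNA criteria that MIMAG also requires for high-quality drafts. The function takes plain numbers because the field names in the `MGnify_get_genome` response are not specified here.

```python
def mag_quality_tier(completeness, contamination):
    """Bin a MAG by completeness/contamination percentages (0-100).

    Thresholds are assumed MIMAG-style cutoffs; rRNA/tRNA checks are omitted,
    so 'high' means a high-quality candidate only.
    """
    if completeness > 90 and contamination < 5:
        return 'high'
    if completeness >= 50 and contamination < 10:
        return 'medium'
    return 'low'


tier = mag_quality_tier(completeness=95.2, contamination=1.3)
```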
Discover functional potential of a metagenome:
# GO terms from an analysis
go_terms = tu.run_one_function({
'name': 'MGnify_get_go_terms',
'arguments': {'analysis_accession': 'MGYA00612683'}
})
# InterPro domains
interpro = tu.run_one_function({
'name': 'MGnify_get_interpro',
'arguments': {'analysis_accession': 'MGYA00612683'}
})
Combine metagenomics data with published research:
# Find relevant publications
papers = tu.run_one_function({
'name': 'EuropePMC_search_articles',
'arguments': {'query': 'gut microbiome AND Faecalibacterium AND (IBD OR "Crohn")', 'limit': 10}
})
# Find sequencing data in ENA
ena_studies = tu.run_one_function({
'name': 'ENAPortal_search_studies',
'arguments': {'query': 'description="gut microbiome 16S"', 'limit': 5}
})
Key biome lineages (use MGnify_list_biomes to discover others):
- root:Host-associated:Human:Digestive system
- root:Host-associated:Human:Oral / root:Host-associated:Human:Skin
- root:Environmental:Terrestrial:Soil
- root:Environmental:Aquatic:Marine / root:Environmental:Aquatic:Freshwater
- root:Engineered:Wastewater

MGnify: studies=MGYS*, analyses=MGYA*, genomes=MGYG*. ENA studies=PRJEB*. GTDB genomes=GCA_*. ENVO terms=ENVO:* (e.g. ENVO:00002041).
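The accession prefixes listed here can be used to route an identifier to the right tool before any call is made. The sketch below is illustrative: the labels and the assumption that each prefix is followed only by digits (with an optional version suffix for `GCA_`) are mine, not documented rules.

```python
import re

# Prefixes come from the ID formats above; the digit-only suffix is an assumption.
ACCESSION_PATTERNS = [
    ('MGnify study', re.compile(r'^MGYS\d+$')),
    ('MGnify analysis', re.compile(r'^MGYA\d+$')),
    ('MGnify genome', re.compile(r'^MGYG\d+$')),
    ('ENA study', re.compile(r'^PRJEB\d+$')),
    ('GTDB genome', re.compile(r'^GCA_\d+(\.\d+)?$')),
    ('ENVO term', re.compile(r'^ENVO:\d+$')),
]


def classify_accession(accession):
    """Return a label for a recognized accession, or None if nothing matches."""
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(accession):
            return label
    return None


kind = classify_accession('MGYA00612683')
```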
Microbiome analysis starts with the question being asked. LOOK UP, DON'T GUESS: always check the study type and sequencing method before interpreting results.
Decision tree for data type:
Before calling any analysis tool, use MGnify_search_studies_detail to determine which data type the study contains. The pipeline type (amplicon vs. shotgun) determines which analyses are valid. Do not apply 16S diversity metrics to shotgun metagenomic data, or vice versa.
Dysbiosis (microbial imbalance) is context-dependent: there is no universal "healthy" microbiome. LOOK UP, DON'T GUESS: compare to study-matched controls, not general population references.
- MGnify_get_taxonomy to get community profiles, then assess richness and evenness.
- GTDB_get_species and literature via EuropePMC_search_articles.
- MGnify_get_go_terms and MGnify_get_interpro for the affected samples.
- MGnify_get_taxonomy + GTDB_search_genomes.
- MGnify_get_go_terms + MGnify_get_interpro + kegg_search_pathway.

| Tier | Description | Example |
|---|---|---|
| T1 | Replicated finding across multiple cohorts with consistent effect | Reduced Faecalibacterium in IBD (>10 independent studies) |
| T2 | Single well-powered study (n > 100) with appropriate controls | Metformin-associated Akkermansia enrichment in a controlled trial |
| T3 | Pilot study or observational association, small sample size | Taxonomic shift in n=15 case-control, no validation cohort |
| T4 | Computational prediction or single-sample observation | Novel MAG with predicted function, no culture confirmation |
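The tier table can be read as a short decision procedure. The sketch below is one possible encoding, not an official grading rule. The argument names are hypothetical, and it assumes that "multiple cohorts" means more than one independent cohort reporting a consistent effect.

```python
def evidence_tier(independent_cohorts, sample_size, has_controls,
                  computational_only=False):
    """Map study characteristics to the T1-T4 tiers in the table above."""
    if computational_only or sample_size <= 1:
        return 'T4'  # prediction or single-sample observation
    if independent_cohorts > 1:
        return 'T1'  # replicated across cohorts (assumed consistent effect)
    if sample_size > 100 and has_controls:
        return 'T2'  # single well-powered, controlled study
    return 'T3'      # pilot or small observational study


tier = evidence_tier(independent_cohorts=1, sample_size=15, has_controls=True)
```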
Alpha diversity (within-sample): The Shannon index captures both richness and evenness. A higher Shannon value (>3.0 for gut) suggests a stable community, and reduced alpha diversity is associated with dysbiosis (IBD, antibiotics). Always compare to study-matched controls, because diversity varies by body site, sequencing depth, and geography.
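For reference, the Shannon index is H = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon i. The sketch below computes it from raw counts, assuming the natural logarithm (values differ if a tool reports log base 2). The count list is illustrative, since the exact shape of the `MGnify_get_taxonomy` response is not specified here.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) from a list of per-taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError('counts must sum to a positive number')
    h = 0.0
    for c in counts:
        if c > 0:  # zero-count taxa contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h


even_community = shannon_index([10, 10, 10, 10])  # four equally abundant taxa
```

A perfectly even community of k taxa gives ln(k), so the four-taxon example equals ln(4).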
Beta diversity (between-sample): Bray-Curtis (abundance-based) or UniFrac (phylogenetic). PERMANOVA p < 0.05 with R-squared > 0.05 indicates condition-driven clustering. A low R-squared (<0.02), even with a significant p-value, suggests the effect is small relative to inter-individual variation. Choose weighted UniFrac when abundant taxa matter most; choose unweighted UniFrac when rare taxa are important.
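As a concrete illustration, Bray-Curtis dissimilarity between two samples is the sum of absolute count differences divided by the sum of all counts. The sketch below assumes both vectors list the same taxa in the same order; it does not cover UniFrac or PERMANOVA, which need a phylogeny and a permutation test respectively.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two aligned abundance vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError('at least one sample must have non-zero counts')
    return sum(abs(x - y) for x, y in zip(a, b)) / total


identical = bray_curtis([1, 2, 3], [1, 2, 3])  # same composition
disjoint = bray_curtis([5, 0], [0, 5])         # no shared taxa
```

The value ranges from 0 (identical composition) to 1 (no taxa in common).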
Taxonomic composition: Relative abundance at phylum level (Firmicutes/Bacteroidetes ratio) is a coarse indicator; genus- or species-level resolution is preferred. A taxon present at >1% relative abundance in multiple samples is reliably detected. Taxa at <0.1% may be noise or sequencing artifacts. GTDB taxonomy may reclassify NCBI names (e.g., Firmicutes split into multiple phyla).
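The abundance thresholds above can be applied mechanically. The sketch below converts counts to relative abundances and labels each taxon. The input shape (one dict of taxon to count per sample), the "at least two samples" reading of "multiple", and the `intermediate` label are assumptions for illustration.

```python
def classify_taxa(samples, reliable=0.01, noise=0.001, min_samples=2):
    """Label taxa using the >1% (reliable) and <0.1% (possible noise) cutoffs.

    samples: list of dicts mapping taxon name -> read count.
    """
    rel = []
    for counts in samples:
        total = sum(counts.values())
        rel.append({t: c / total for t, c in counts.items()} if total else {})
    taxa = set().union(*rel)
    labels = {}
    for taxon in taxa:
        values = [r.get(taxon, 0.0) for r in rel]
        if sum(v > reliable for v in values) >= min_samples:
            labels[taxon] = 'reliable'
        elif max(values) < noise:
            labels[taxon] = 'possible noise'
        else:
            labels[taxon] = 'intermediate'
    return labels


labels = classify_taxa([
    {'A': 9000, 'B': 945, 'C': 5, 'D': 50},
    {'A': 8000, 'B': 1945, 'C': 5, 'D': 50},
])
```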
Functional profiling: GO terms and InterPro domains from MGnify reflect the metabolic potential (not necessarily activity) of the community. Enrichment of specific pathways (e.g., butyrate production, LPS biosynthesis) should be interpreted alongside taxonomic data to identify which organisms contribute the functions.
A complete microbiome report should answer:

- MGnify accessions: studies start with MGYS, analyses with MGYA, genomes with MGYG
- Use MGnify_list_biomes first to find the correct biome lineage string
- MGnify_get_taxonomy returns phylum-level to species-level composition
- The size parameter in MGnify tools controls results per page (max 100)