Microbiome Research Toolkit | One-stop solution for metagenomic analysis, taxonomy lookup, and literature search

Microbiome Research by mims-harvard/tooluniverse

1,200 GitHub Stars

Install command

npx skills add https://github.com/mims-harvard/tooluniverse --skill 'Microbiome Research'

Tags: data analysis, research tools, bioinformatics

Microbiome Research with ToolUniverse

Comprehensive microbiome analysis using MGnify (EBI metagenomics), GTDB (genome taxonomy), ENA (sequencing data), OLS (ontology lookup for ENVO biomes), and EuropePMC (literature).

Core Tools

Tool	Purpose	Auth
MGnify_search_studies	Find metagenomics studies by biome/keyword	None
MGnify_search_studies_detail	Study metadata, abstract, sample counts	None
MGnify_list_analyses	List taxonomic/functional analysis outputs for a study	None
MGnify_get_taxonomy	Taxonomic composition from an analysis	None
MGnify_get_go_terms	GO functional annotations from an analysis	None
MGnify_get_interpro	InterPro protein domain annotations	None
MGnify_list_biomes	Browse MGnify biome hierarchy	None
MGnify_search_genomes	Search metagenome-assembled genomes (MAGs)	None
MGnify_get_genome	Genome quality metrics (completeness, contamination)	None
GTDB_search_genomes	Search bacterial/archaeal genomes by taxonomy	None
GTDB_get_species	Species cluster details from GTDB	None
GTDB_get_taxon_info	Taxonomic rank info in GTDB hierarchy	None
GTDB_search_taxon	Search taxa by partial name across all ranks	None
ENAPortal_search_studies	Find sequencing studies in ENA. Query format: `description="keyword"`	None
ENAPortal_search_samples	Find samples with environmental metadata	None
ols_search_terms	Search ENVO ontology for biome/environment terms	None
EuropePMC_search_articles	Find microbiome publications	None
PubMed_search_articles	Literature search (different coverage than EuropePMC)	None

For drug-microbiome studies, also use:

PubChem_get_CID_by_compound_name / PubChem_get_compound_properties_by_CID — drug identity
CTD_get_chemical_gene_interactions — drug-gene interactions (e.g., metformin affects 1,175+ genes)
kegg_search_pathway / kegg_get_pathway_info — microbial metabolic pathways (butanoate, propanoate)
ReactomeAnalysis_pathway_enrichment — host pathway enrichment for drug-affected genes
drugbank_vocab_search — drug mechanism and targets

MGnify tip: Use concise single-keyword searches (e.g., "metformin"); multi-word queries may time out. The MGnify API can be slow for broad searches.
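One way to follow the tip above is to split a multi-word query into one request per keyword before calling the tool. The sketch below is only an illustration: `build_mgnify_requests` is a hypothetical helper, not part of ToolUniverse, and it only builds the argument dictionaries (using the `search` and `size` arguments shown in this document) without contacting MGnify. Clamping `size` to 100 reflects the page-size limit mentioned in the Tips section.

```python
def build_mgnify_requests(query, size=5):
    """Build one MGnify_search_studies request per keyword in `query`.

    Hypothetical helper: it only prepares request dictionaries and
    performs no network calls.
    """
    # The Tips section states a maximum of 100 results per page.
    page_size = max(1, min(size, 100))
    keywords = query.split()
    return [
        {
            "name": "MGnify_search_studies",
            "arguments": {"search": keyword, "size": page_size},
        }
        for keyword in keywords
    ]


requests = build_mgnify_requests("metformin gut", size=250)
for request in requests:
    print(request)
```

Each dictionary can then be passed to `tu.run_one_function` as in the Quick Start. Deciding how to combine the per-keyword results (for example, keeping only studies returned for every keyword) is left to the caller.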

Quick Start

from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# 1. Search for gut microbiome studies
studies = tu.run_one_function({
    'name': 'MGnify_search_studies',
    'arguments': {'search': 'gut microbiome', 'size': 5}
})

# 2. Get study details
detail = tu.run_one_function({
    'name': 'MGnify_search_studies_detail',
    'arguments': {'study_accession': 'MGYS00006860'}
})

# 3. List analyses for a study
analyses = tu.run_one_function({
    'name': 'MGnify_list_analyses',
    'arguments': {'study_accession': 'MGYS00006860', 'size': 5}
})

# 4. Get taxonomic profile from an analysis
taxonomy = tu.run_one_function({
    'name': 'MGnify_get_taxonomy',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})

# 5. Get functional annotations
go_terms = tu.run_one_function({
    'name': 'MGnify_get_go_terms',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})

Common Workflows

Workflow 1: Study Discovery by Environment

Find studies for a specific biome using MGnify's biome hierarchy:

# Browse biome hierarchy
biomes = tu.run_one_function({
    'name': 'MGnify_list_biomes',
    'arguments': {'lineage': 'root:Host-associated:Human', 'depth': 3}
})

# Search studies in a specific biome
studies = tu.run_one_function({
    'name': 'MGnify_search_studies',
    'arguments': {'biome': 'root:Host-associated:Human:Digestive system', 'size': 10}
})

# Look up ENVO ontology terms for environment metadata
envo = tu.run_one_function({
    'name': 'ols_search_terms',
    'arguments': {'query': 'human gut', 'ontology': 'envo', 'rows': 5}
})

Workflow 2: Taxonomic Profiling

Get the microbial composition of a metagenomics sample:

# Get analyses for a study
analyses = tu.run_one_function({
    'name': 'MGnify_list_analyses',
    'arguments': {'study_accession': 'MGYS00006860', 'size': 3}
})

# Get taxonomy for a specific analysis
taxonomy = tu.run_one_function({
    'name': 'MGnify_get_taxonomy',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})
# Returns organisms with lineage, abundance counts, and taxonomy rank

Workflow 3: Genome Quality Assessment

Evaluate metagenome-assembled genomes (MAGs):

# Search for genomes from a specific taxon
genomes = tu.run_one_function({
    'name': 'MGnify_search_genomes',
    'arguments': {'search': 'Faecalibacterium prausnitzii', 'size': 5}
})

# Get quality metrics for a genome
genome = tu.run_one_function({
    'name': 'MGnify_get_genome',
    'arguments': {'genome_accession': 'MGYG000000001'}
})
# Returns completeness, contamination, N50, genome length, taxonomy

# Cross-reference with GTDB taxonomy
gtdb = tu.run_one_function({
    'name': 'GTDB_search_genomes',
    'arguments': {'operation': 'search_genomes', 'query': 'Faecalibacterium', 'items_per_page': 5}
})

Workflow 4: Functional Annotation

Discover functional potential of a metagenome:

# GO terms from an analysis
go_terms = tu.run_one_function({
    'name': 'MGnify_get_go_terms',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})

# InterPro domains
interpro = tu.run_one_function({
    'name': 'MGnify_get_interpro',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})

Workflow 5: Literature Integration

Combine metagenomics data with published research:

# Find relevant publications
papers = tu.run_one_function({
    'name': 'EuropePMC_search_articles',
    'arguments': {'query': 'gut microbiome AND Faecalibacterium AND (IBD OR "Crohn")', 'limit': 10}
})

# Find sequencing data in ENA
ena_studies = tu.run_one_function({
    'name': 'ENAPortal_search_studies',
    'arguments': {'query': 'description="gut microbiome 16S"', 'limit': 5}
})

MGnify Biome Hierarchy

Key biome lineages (use MGnify_list_biomes to discover others):

Human gut: root:Host-associated:Human:Digestive system
Human oral/skin: root:Host-associated:Human:Oral / root:Host-associated:Human:Skin
Soil: root:Environmental:Terrestrial:Soil
Ocean/Freshwater: root:Environmental:Aquatic:Marine / root:Environmental:Aquatic:Freshwater
Wastewater: root:Engineered:Wastewater
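Because lineage strings are colon-separated paths, a narrow biome can be broadened by walking up to its parents. The following is a minimal sketch, assuming lineages always use `:` as the separator as in the examples above; `lineage_ancestors` is a hypothetical helper name.

```python
def lineage_ancestors(lineage):
    """Return every prefix of a biome lineage, from the root to the full lineage."""
    parts = lineage.split(":")
    return [":".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


gut = "root:Host-associated:Human:Digestive system"
ancestors = lineage_ancestors(gut)
for ancestor in ancestors:
    print(ancestor)
```

If a search in the most specific biome returns too few studies, the next entry up the list (here `root:Host-associated:Human`) is a natural fallback for the `biome` argument.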

Key Identifiers

MGnify: studies=MGYS*, analyses=MGYA*, genomes=MGYG*. ENA studies=PRJEB*. GTDB genomes=GCA_*. ENVO terms=ENVO:* (e.g. ENVO:00002041).
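The prefixes above can be used to route an accession to the right tool before making a call. This is a rough sketch: the prefixes come from the list above, but the assumption that each prefix is followed only by digits (plus an optional version suffix for `GCA_`) is mine, and `classify_accession` is a hypothetical helper.

```python
import re

# Prefixes are taken from the identifier list; the digit-only suffixes are an assumption.
ACCESSION_PATTERNS = [
    ("MGnify study", re.compile(r"^MGYS\d+$")),
    ("MGnify analysis", re.compile(r"^MGYA\d+$")),
    ("MGnify genome", re.compile(r"^MGYG\d+$")),
    ("ENA study", re.compile(r"^PRJEB\d+$")),
    ("GTDB genome", re.compile(r"^GCA_\d+(\.\d+)?$")),
    ("ENVO term", re.compile(r"^ENVO:\d+$")),
]


def classify_accession(accession):
    """Return a label for a recognized accession, or None if no pattern matches."""
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(accession):
            return label
    return None


for example in ["MGYS00006860", "MGYA00612683", "MGYG000000001", "ENVO:00002041"]:
    print(example, "->", classify_accession(example))
```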

Reasoning Framework

Starting Point: Define the Question First

Microbiome analysis starts with: what is the question? LOOK UP DON'T GUESS — always check the study type and sequencing method before interpreting results.

Decision tree for data type:

Community composition (who is there?) → 16S/ITS amplicon → alpha/beta diversity, differential abundance
Functional potential (what can they do?) → Shotgun metagenomics → MGnify GO terms, InterPro, KEGG pathways
Active function (what are they doing now?) → Metatranscriptomics → specialized pipelines (not MGnify/GTDB alone)

Before calling any tool, determine which data type the user has via MGnify_search_studies_detail — the pipeline type (amplicon vs shotgun) determines which analyses are valid. Do not apply 16S diversity metrics to metagenomic data or vice versa.
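The decision tree above can be written down as a small lookup table so that the choice of analyses is explicit rather than implied. This is a sketch only: the keys (`composition`, `functional_potential`, `active_function`) and the function name `plan_analysis` are hypothetical labels introduced for illustration.

```python
# Each entry mirrors one branch of the decision tree: question -> data type -> analyses.
DECISION_TREE = {
    "composition": (
        "16S/ITS amplicon",
        ["alpha diversity", "beta diversity", "differential abundance"],
    ),
    "functional_potential": (
        "shotgun metagenomics",
        ["MGnify GO terms", "InterPro", "KEGG pathways"],
    ),
    "active_function": (
        "metatranscriptomics",
        ["specialized pipelines"],
    ),
}


def plan_analysis(question_kind):
    """Return the data type and analyses suggested for a kind of question."""
    if question_kind not in DECISION_TREE:
        raise ValueError(f"unknown question kind: {question_kind!r}")
    data_type, analyses = DECISION_TREE[question_kind]
    return {"data_type": data_type, "analyses": list(analyses)}


plan = plan_analysis("functional_potential")
print(plan)
```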

Dysbiosis Assessment Strategy

Dysbiosis (microbial imbalance) is context-dependent — there is no universal "healthy" microbiome. LOOK UP DON'T GUESS — compare to study-matched controls, not general population references.

Check alpha diversity: Reduced Shannon index relative to controls suggests dysbiosis. Use MGnify_get_taxonomy to get community profiles, then assess richness and evenness.
Identify keystone taxa shifts: Loss of known beneficial taxa (e.g., Faecalibacterium, Roseburia in gut) or bloom of pathobionts (e.g., Enterobacteriaceae). LOOK UP taxa roles with GTDB_get_species and literature via EuropePMC_search_articles.
Functional consequences: Does taxonomic shift correlate with loss/gain of metabolic pathways? Check MGnify_get_go_terms and MGnify_get_interpro for the affected samples.
Confounders: Antibiotics, diet, age, and geography all affect microbiome composition. A dysbiosis claim requires controlling for these factors or acknowledging them as limitations.

Taxonomic vs Functional Analysis: When to Use Each

Taxonomic analysis alone is sufficient when the question is "which organisms are present?" or "does community composition differ between groups?" Use MGnify_get_taxonomy + GTDB_search_genomes.
Functional analysis is needed when the question is "what metabolic capabilities differ?" or "why does a taxonomic shift matter?" Use MGnify_get_go_terms + MGnify_get_interpro + kegg_search_pathway.
Both together when linking organisms to functions (e.g., "which taxa drive butyrate production in healthy vs IBD gut?"). Cross-reference taxonomic profiles with functional annotations from the same MGnify analysis.

Evidence Grading

Tier	Description	Example
T1	Replicated finding across multiple cohorts with consistent effect	Reduced Faecalibacterium in IBD (>10 independent studies)
T2	Single well-powered study (n > 100) with appropriate controls	Metformin-associated Akkermansia enrichment in a controlled trial
T3	Pilot study or observational association, small sample size	Taxonomic shift in n=15 case-control, no validation cohort
T4	Computational prediction or single-sample observation	Novel MAG with predicted function, no culture confirmation
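The tiers in the table can be approximated with a few explicit rules. The sketch below is one possible reading, not an official rubric: the function `grade_evidence`, its parameters, and the choice of "two or more cohorts" as the cutoff for "multiple cohorts" are all assumptions; only the n > 100 threshold comes from the table.

```python
def grade_evidence(independent_cohorts, sample_size, has_controls,
                   consistent_effect=True, computational_only=False):
    """Assign an evidence tier (T1 to T4) following the table's descriptions."""
    # T4: computational prediction or single-sample observation.
    if computational_only or sample_size <= 1:
        return "T4"
    # T1: replicated across multiple cohorts with a consistent effect
    # (">= 2 cohorts" is an assumed cutoff).
    if independent_cohorts >= 2 and consistent_effect:
        return "T1"
    # T2: single well-powered study (n > 100) with appropriate controls.
    if sample_size > 100 and has_controls:
        return "T2"
    # T3: pilot or observational association with a small sample.
    return "T3"


print(grade_evidence(independent_cohorts=12, sample_size=2000, has_controls=True))
print(grade_evidence(independent_cohorts=1, sample_size=15, has_controls=True))
```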

Interpretation Guidance

Alpha diversity (within-sample): Shannon index measures richness and evenness. Higher Shannon (>3.0 for gut) suggests a stable community. Reduced alpha diversity is associated with dysbiosis (IBD, antibiotics). Always compare to study-matched controls, because diversity varies by body site, sequencing depth, and geography.
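For reference, the Shannon index is H = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i. The sketch below computes it from a list of per-taxon counts. It assumes the natural logarithm, which this document does not specify; with a different log base the numeric value, and therefore any threshold such as 3.0, changes. The input counts are made-up illustrative numbers, not MGnify output.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) from per-taxon abundance counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Taxa with zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


even_community = [10, 10, 10, 10]   # four equally abundant taxa
skewed_community = [37, 1, 1, 1]    # one dominant taxon

print(round(shannon_index(even_community), 4))
print(round(shannon_index(skewed_community), 4))
```

The even community scores higher than the skewed one even though both contain four taxa, which is the evenness component of the index.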

Beta diversity (between-sample): Bray-Curtis (abundance-based) or UniFrac (phylogenetic). PERMANOVA p < 0.05 with R-squared > 0.05 indicates condition-driven clustering. Low R-squared (<0.02) even with significant p suggests the effect is small relative to inter-individual variation. Choose weighted UniFrac when abundant taxa matter most; unweighted when rare taxa are important.
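As a concrete reference for the abundance-based metric, Bray-Curtis dissimilarity between two samples is 1 - 2·C / (S_a + S_b), where C is the sum of the smaller count for each taxon and S_a, S_b are the sample totals. The sketch below is a minimal pure-Python version; the taxon counts are invented for illustration, and UniFrac and PERMANOVA are not covered because they need a phylogeny and a permutation test respectively.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: count} dictionaries."""
    taxa = set(sample_a) | set(sample_b)
    # Sum of the lesser count for every taxon (absent taxa count as 0).
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - (2 * shared) / total


control = {"Faecalibacterium": 6, "Roseburia": 4}
case = {"Faecalibacterium": 2, "Roseburia": 8}

distance = bray_curtis(control, case)
print(round(distance, 3))
```

A value of 0 means identical profiles and 1 means no shared taxa. Counts should be rarefied or converted to comparable totals first, since the metric is sensitive to sequencing depth.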

Taxonomic composition: Relative abundance at phylum level (Firmicutes/Bacteroidetes ratio) is a coarse indicator; genus- or species-level resolution is preferred. A taxon present at >1% relative abundance in multiple samples is reliably detected. Taxa at <0.1% may be noise or sequencing artifacts. GTDB taxonomy may reclassify NCBI names (e.g., Firmicutes split into multiple phyla).
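The two thresholds above (>1% and <0.1%) can be applied to a single sample as a first-pass filter. The sketch below is a simplification: it looks at one sample only, whereas the guidance asks for >1% in multiple samples, and the labels and the function name `classify_by_abundance` are hypothetical. The counts are invented for illustration.

```python
def classify_by_abundance(counts):
    """Label each taxon in a {taxon: count} sample by its relative abundance."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    labels = {}
    for taxon, count in counts.items():
        fraction = count / total
        if fraction > 0.01:        # above 1%
            labels[taxon] = "reliable"
        elif fraction < 0.001:     # below 0.1%
            labels[taxon] = "possible noise"
        else:
            labels[taxon] = "intermediate"
    return labels


sample = {"Faecalibacterium": 6000, "Bacteroides": 3950, "Roseburia": 45, "TraceGenus": 5}
labels = classify_by_abundance(sample)
for taxon, label in labels.items():
    print(taxon, "->", label)
```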

Functional profiling: GO terms and InterPro domains from MGnify reflect the metabolic potential (not necessarily activity) of the community. Enrichment of specific pathways (e.g., butyrate production, LPS biosynthesis) should be interpreted alongside taxonomic data to identify which organisms contribute the functions.

Synthesis Questions

A complete microbiome report should answer:

How does alpha diversity compare between conditions, and is the difference significant?
Does beta diversity analysis show condition-driven clustering (PERMANOVA)?
Which taxa are differentially abundant, and are they known commensals or pathobionts?
What functional pathways are enriched, and which taxa likely drive them?
How do findings compare to published studies for the same biome/condition (literature context)?

Tips

MGnify study accessions start with MGYS, analyses with MGYA, genomes with MGYG
Use MGnify_list_biomes first to find the correct biome lineage string
MGnify_get_taxonomy returns phylum-level to species-level composition
GTDB provides standardized bacterial/archaeal taxonomy (differs from NCBI in some lineages)
For 16S amplicon studies, taxonomy is the primary output; for shotgun metagenomics, both taxonomy and functional annotations are available
The size parameter in MGnify tools controls results per page (max 100)
