tooluniverse-gene-enrichment by mims-harvard/tooluniverse
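npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-gene-enrichment

Perform comprehensive gene enrichment analysis, including Gene Ontology (GO), KEGG, Reactome, WikiPathways, and MSigDB enrichment, using both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). The skill integrates local computation via gseapy with ToolUniverse pathway databases to produce cross-validated, publication-ready results.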
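IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Try original-language terms only as a fallback if the English terms return no results. Respond in the user's language.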
Apply when users:
NOT for (use other skills instead):
- tooluniverse-network-pharmacology
- tooluniverse-multiomic-disease-characterization
- tooluniverse-disease-research
- tooluniverse-spatial-omics-analysis
- tooluniverse-protein-interactions

| Parameter | Required | Description | Example |
|---|---|---|---|
| gene_list | Yes | List of gene symbols, Ensembl IDs, or Entrez IDs | ["TP53", "BRCA1", "EGFR"] |
| organism | No | Organism (default: human). Supported: human, mouse, rat, fly, worm, yeast, zebrafish | human |
| analysis_type | No | ORA (default) or GSEA | ORA |
| enrichment_databases | No | Which databases to query. Default: all applicable | ["GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome"] |
| gene_id_type | No | Input ID type: symbol, ensembl, entrez, uniprot (auto-detected if omitted) | symbol |
| p_value_cutoff | No | Significance threshold (default: 0.05) | 0.05 |
| correction_method | No | Multiple-testing correction: BH (Benjamini-Hochberg, default), bonferroni, fdr | BH |
| background_genes | No | Custom background gene set (default: genome-wide) | ["GENE1", "GENE2", ...] |
| ranked_gene_list | No | For GSEA: gene-to-score mapping (e.g., log2FC) | {"TP53": 2.5, "BRCA1": -1.3, ...} |
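
To make the default `correction_method` concrete, here is a minimal plain-Python sketch of Benjamini-Hochberg adjustment. The function name `bh_adjust` is hypothetical and for illustration only; gseapy and the ToolUniverse tools apply their own correction internally, and this sketch is not taken from either.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_vals = [0.001, 0.01, 0.03, 0.2]
adj = bh_adjust(p_vals)
# Apply the default p_value_cutoff of 0.05 to the adjusted values
significant = [p < 0.05 for p in adj]
```

With these four illustrative p-values, the first three terms stay below the 0.05 cutoff after adjustment and the last does not.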
Q: Do you have a ranked gene list (with scores/fold-changes)?
YES → Use GSEA (gseapy.prerank)
- Input: Gene-to-score mapping (e.g., log2FC)
- Statistics: Running enrichment score, permutation test
- Cutoff: FDR q-val < 0.25 (standard for GSEA)
- Output: NES (Normalized Enrichment Score), lead genes
See: references/gsea_workflow.md
NO → Use ORA (gseapy.enrichr)
- Input: Gene list only
- Statistics: Fisher's exact test, hypergeometric
- Cutoff: Adjusted P-value < 0.05 (or user specified)
- Output: P-value, adjusted P-value, overlap, odds ratio
See: references/ora_workflow.md
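
The ORA-versus-GSEA decision above can be expressed as a small helper. This is only a sketch: `choose_analysis` is a hypothetical name, and the returned dictionary layout is an assumption made for illustration; the cutoffs and column names are the ones stated in the decision tree.

```python
def choose_analysis(gene_list=None, ranked_gene_list=None):
    """Pick ORA or GSEA following the decision tree: ranked input wins."""
    if ranked_gene_list:
        # Scores or fold-changes are available, so use GSEA
        return {
            "method": "GSEA",
            "function": "gseapy.prerank",
            "cutoff_column": "FDR q-val",
            "cutoff": 0.25,
        }
    if gene_list:
        # Only an unranked gene list is available, so use ORA
        return {
            "method": "ORA",
            "function": "gseapy.enrichr",
            "cutoff_column": "Adjusted P-value",
            "cutoff": 0.05,
        }
    raise ValueError("Provide a gene list or a ranked gene list")


ora_plan = choose_analysis(gene_list=["TP53", "BRCA1", "EGFR"])
gsea_plan = choose_analysis(ranked_gene_list={"TP53": 2.5, "BRCA1": -1.3})
```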
Q: Which enrichment method should I use?
Primary Analysis (ALWAYS):
├─ gseapy.enrichr (ORA) OR gseapy.prerank (GSEA)
│ - Most comprehensive (225+ Enrichr libraries)
│ - GO (BP, MF, CC), KEGG, Reactome, WikiPathways, MSigDB
│ - All organisms supported
│ - Returns: P-value, Adjusted P-value, Overlap, Genes
│ See: references/enrichr_guide.md
Cross-Validation (REQUIRED for publication):
├─ PANTHER_enrichment [T1 - curated]
│ - Curated GO enrichment
│ - Multiple organisms (taxonomy ID)
│ - GO BP, MF, CC, PANTHER pathways, Reactome
│
├─ STRING_functional_enrichment [T2 - validated]
│ - Returns ALL categories in one call
│ - Filter by category: Process, Function, Component, KEGG, Reactome
│ - Network-based enrichment
│
└─ ReactomeAnalysis_pathway_enrichment [T1 - curated]
- Reactome curated pathways
- Cross-species projection
- Detailed pathway hierarchy
Additional Context (Optional):
├─ GO_get_term_by_id, QuickGO_get_term_detail (GO term details)
├─ Reactome_get_pathway, Reactome_get_pathway_hierarchy (pathway context)
├─ WikiPathways_search, WikiPathways_get_pathway (community pathways)
└─ STRING_ppi_enrichment (network topology analysis)
report_path = f"{analysis_name}_enrichment_report.md"
# Write header with placeholder sections
# Update progressively as analysis proceeds
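
One way to realize the "placeholder first, update progressively" idea is sketched below. The section names, the `_pending_` marker, and the helper names `init_report` and `update_section` are all assumptions made for this example, not part of the skill.

```python
import tempfile
from pathlib import Path

# Hypothetical section list; adjust to the databases actually queried
SECTIONS = ["ID Conversion", "GO Biological Process", "Cross-Validation"]


def init_report(analysis_name, out_dir="."):
    """Create the report file with a placeholder under every section."""
    report_path = Path(out_dir) / f"{analysis_name}_enrichment_report.md"
    lines = [f"# {analysis_name} enrichment report", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "_pending_", ""]
    report_path.write_text("\n".join(lines), encoding="utf-8")
    return report_path


def update_section(report_path, section, body):
    """Replace one section's placeholder with real content."""
    path = Path(report_path)
    text = path.read_text(encoding="utf-8")
    placeholder = f"## {section}\n\n_pending_"
    text = text.replace(placeholder, f"## {section}\n\n{body}", 1)
    path.write_text(text, encoding="utf-8")


# Demo in a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    demo_path = init_report("demo", tmp)
    update_section(demo_path, "GO Biological Process", "24 significant terms")
    demo_text = demo_path.read_text(encoding="utf-8")
```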
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Detect ID type
gene_list = ["TP53", "BRCA1", "EGFR"]
# Auto-detect: ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol

# Convert if needed (Ensembl/Entrez → Symbol)
result = tu.tools.MyGene_batch_query(
    gene_ids=gene_list,
    fields="symbol,entrezgene,ensembl.gene"
)
# Extract symbols from results; this input is already symbols, so it is reused as-is
gene_symbols = gene_list

# Validate with STRING
mapped = tu.tools.STRING_map_identifiers(
    protein_ids=gene_symbols,
    species=9606  # human
)
# Use preferredName for canonical symbols
See: references/id_conversion.md for complete examples
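
The auto-detection rule mentioned in the comments (Ensembl prefix, numeric Entrez, UniProt pattern, otherwise symbol) could be sketched as follows. The helper names are hypothetical, the Ensembl regex is extended to species prefixes such as ENSMUSG as an assumption, and the UniProt pattern is the accession format published by UniProt. Pattern-based detection is a heuristic: a few gene symbols (for example P2RY12) happen to look like UniProt accessions, so an explicit `gene_id_type` is safer when the type is known.

```python
import re
from collections import Counter

# Ensembl gene IDs: ENSG... for human, ENSMUSG... for mouse, and so on
ENSEMBL_GENE = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")
# UniProt accession format (6 or 10 characters)
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def detect_id_type(identifier):
    """Classify one identifier as ensembl, entrez, uniprot, or symbol."""
    ident = identifier.strip()
    if ENSEMBL_GENE.fullmatch(ident):
        return "ensembl"
    if ident.isdigit():
        return "entrez"
    if UNIPROT_ACC.fullmatch(ident):
        return "uniprot"
    return "symbol"


def detect_list_type(identifiers):
    """Use the most common type in the list as the list-level guess."""
    counts = Counter(detect_id_type(i) for i in identifiers)
    return counts.most_common(1)[0][0]


list_type = detect_list_type(["TP53", "BRCA1", "EGFR"])
```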
For ORA (gene list only):
import gseapy
# GO Biological Process
go_bp = gseapy.enrichr(
    gene_list=gene_symbols,
    gene_sets='GO_Biological_Process_2021',
    organism='human',
    outdir=None,
    no_plot=True,
    background=background_genes  # None = genome-wide
)
go_bp_sig = go_bp.results[go_bp.results['Adjusted P-value'] < 0.05]
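
For intuition about the ORA statistic, the one-sided Fisher's exact test p-value equals the upper tail of a hypergeometric distribution. The sketch below computes that tail with the standard library only; `hypergeom_sf` is a hypothetical helper, and the numbers are a toy example rather than real enrichment output. It is not how gseapy or Enrichr compute their reported values internally.

```python
from math import comb


def hypergeom_sf(overlap, set_size, list_size, background_size):
    """P(X >= overlap) when list_size genes are drawn from the background
    and set_size of the background genes belong to the gene set."""
    total = comb(background_size, list_size)
    upper = min(set_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the set, 3 in the input list,
# 2 of them in the set: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p_toy = hypergeom_sf(overlap=2, set_size=4, list_size=3, background_size=10)
```

A smaller background (for example a custom `background_genes` set) changes `background_size` and therefore the p-value, which is why the background choice matters for ORA.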
For GSEA (ranked gene list):
import pandas as pd
# Ranked by log2FC
ranked_series = pd.Series(gene_to_score).sort_values(ascending=False)
gsea_result = gseapy.prerank(
    rnk=ranked_series,
    gene_sets='GO_Biological_Process_2021',
    outdir=None,
    no_plot=True,
    seed=42,
    min_size=5,
    max_size=500,
    permutation_num=1000
)
gsea_sig = gsea_result.res2d[gsea_result.res2d['FDR q-val'] < 0.25]
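
The running enrichment score behind GSEA can be illustrated with a short weighted running-sum sketch. It covers only the raw enrichment score for one gene set; the permutation test, NES normalization, and FDR estimation that `gseapy.prerank` performs are deliberately left out. The function name and the toy ranking are assumptions for illustration.

```python
def running_enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, score) pairs sorted by descending score.
    Returns the running-sum value with the largest absolute deviation."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(
        abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit
    )
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("Gene set must overlap the ranking only partially")
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        if hit:
            # Step up, weighted by the magnitude of the ranking score
            running += abs(score) ** weight / hit_norm
        else:
            # Step down by a constant amount for genes outside the set
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


toy_ranking = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
es_top = running_enrichment_score(toy_ranking, {"A", "B"})
es_bottom = running_enrichment_score(toy_ranking, {"D", "E"})
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, which is the sign convention carried over to NES.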
See:
# PANTHER [T1 - curated]
panther_bp = tu.tools.PANTHER_enrichment(
    gene_list=','.join(gene_symbols),  # comma-separated string
    organism=9606,
    annotation_dataset='GO:0008150'  # biological_process
)

# STRING [T2 - validated]
string_result = tu.tools.STRING_functional_enrichment(
    protein_ids=gene_symbols,
    species=9606
)
# Filter by category: Process, Function, Component, KEGG, Reactome

# Reactome [T1 - curated]
reactome_result = tu.tools.ReactomeAnalysis_pathway_enrichment(
    identifiers=' '.join(gene_symbols),  # space-separated
    page_size=50,
    include_disease=True
)
See: references/cross_validation.md for comparison strategies
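
Once each source has returned results, a consensus count of the kind reported as "2+ sources" can be derived as below. This is a sketch under the assumption that every source's output has already been reduced to a term-to-FDR mapping; the actual return shapes of the ToolUniverse tools are not shown here, and `consensus_terms` is a hypothetical helper. The `GO:EXAMPLE` entry is a made-up placeholder.

```python
def consensus_terms(results_by_source, cutoff=0.05, min_sources=2):
    """Return terms significant in at least min_sources sources,
    mapped to the sorted list of supporting sources."""
    support = {}
    for source, term_fdr in results_by_source.items():
        for term, fdr in term_fdr.items():
            if fdr < cutoff:
                support.setdefault(term, []).append(source)
    return {
        term: sorted(sources)
        for term, sources in support.items()
        if len(sources) >= min_sources
    }


results_by_source = {
    "gseapy": {"GO:0051726": 3.4e-06, "GO:EXAMPLE": 0.01},
    "PANTHER": {"GO:0051726": 2.1e-05, "GO:EXAMPLE": 0.40},
    "STRING": {"GO:0051726": 1.8e-05},
}
consensus = consensus_terms(results_by_source)
```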
## Results
### GO Biological Process (Top 10)
| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |
|------|---------|-------------|---------|-------|----------|
| regulation of cell cycle (GO:0051726) | 1.2e-08 | 3.4e-06 | 12/45 | TP53;BRCA1;... | [T2] gseapy |
### Cross-Validation
| GO Term | gseapy FDR | PANTHER FDR | STRING FDR | Consensus |
|---------|-----------|-------------|-----------|-----------|
| GO:0051726 | 3.4e-06 | 2.1e-05 | 1.8e-05 | 3/3 ✓ |
### Completeness Checklist
- [x] ID Conversion (MyGene, STRING) - 95% mapped
- [x] GO BP (gseapy, PANTHER, STRING) - 24 significant terms
- [x] GO MF (gseapy, PANTHER, STRING) - 18 significant terms
- [x] GO CC (gseapy, PANTHER, STRING) - 12 significant terms
- [x] KEGG (gseapy, STRING) - 8 significant pathways
- [x] Reactome (gseapy, ReactomeAPI) - 15 significant pathways
- [x] Cross-validation - 12 consensus terms (2+ sources)
See: scripts/format_enrichment_output.py for automated formatting
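
As a rough idea of what such formatting involves, the sketch below renders rows into the results table layout shown above. It is not the contents of scripts/format_enrichment_output.py; `format_results_table` is a hypothetical helper, and the row keys are assumed to follow the gseapy column names listed earlier (Term, P-value, Adjusted P-value, Overlap, Genes).

```python
def format_results_table(rows, evidence="[T2] gseapy"):
    """Render enrichment rows as a markdown table with an evidence column."""
    lines = [
        "| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |",
        "|------|---------|-------------|---------|-------|----------|",
    ]
    for row in rows:
        lines.append(
            f"| {row['Term']} | {row['P-value']:.1e} "
            f"| {row['Adjusted P-value']:.1e} | {row['Overlap']} "
            f"| {row['Genes']} | {evidence} |"
        )
    return "\n".join(lines)


rows = [
    {
        "Term": "regulation of cell cycle (GO:0051726)",
        "P-value": 1.2e-08,
        "Adjusted P-value": 3.4e-06,
        "Overlap": "12/45",
        "Genes": "TP53;BRCA1",
    }
]
table = format_results_table(rows)
```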
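| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Curated/experimental enrichment | PANTHER, Reactome Analysis Service |
| T2 | [T2] | Computational enrichment, well-validated | gseapy ORA/GSEA, STRING functional enrichment |
| T3 | [T3] | Text-mining/predicted enrichment | Enrichr non-curated libraries |
| T4 | [T4] | Single-source annotation | Individual gene GO annotations from QuickGO |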
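
The grading table can be applied mechanically when assembling a report. The mapping below covers only the tools the table assigns to T1 and T2; the dictionary and helper names are hypothetical and the tag format is an assumption for illustration.

```python
# Tier per tool, taken from the evidence grading table
EVIDENCE_TIERS = {
    "PANTHER_enrichment": "T1",
    "ReactomeAnalysis_pathway_enrichment": "T1",
    "gseapy.enrichr": "T2",
    "gseapy.prerank": "T2",
    "STRING_functional_enrichment": "T2",
}


def evidence_tag(source):
    """Return a tag such as '[T1] PANTHER_enrichment'."""
    return f"[{EVIDENCE_TIERS[source]}] {source}"


def sort_by_evidence(findings):
    """Order (term, source) pairs from the strongest tier (T1) downward."""
    return sorted(findings, key=lambda finding: EVIDENCE_TIERS[finding[1]])


findings = [
    ("GO:0051726", "gseapy.enrichr"),
    ("GO:0051726", "PANTHER_enrichment"),
]
ordered = sort_by_evidence(findings)
tags = [evidence_tag(source) for _, source in ordered]
```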
| Organism | Taxonomy ID | gseapy | PANTHER | STRING | Reactome |
|---|---|---|---|---|---|
| Human | 9606 | Yes | Yes | Yes | Yes |
| Mouse | 10090 | Yes (*_Mouse) | Yes | Yes | Yes (projection) |
| Rat | 10116 | Limited | Yes | Yes | Yes (projection) |
| Fly | 7227 | Limited | Yes | Yes | Yes (projection) |
| Worm | 6239 | Limited | Yes | Yes | Yes (projection) |
| Yeast | 4932 | Limited | Yes | Yes | Yes |
See: references/organism_support.md for organism-specific libraries
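
A lookup like the following keeps the taxonomy IDs from the table in one place for the tools that expect them (PANTHER and STRING). It is a sketch with hypothetical names; zebrafish is omitted because the table above gives no taxonomy ID for it.

```python
# Taxonomy IDs copied from the organism table
TAXONOMY_IDS = {
    "human": 9606,
    "mouse": 10090,
    "rat": 10116,
    "fly": 7227,
    "worm": 6239,
    "yeast": 4932,
}


def taxonomy_id(organism="human"):
    """Map an organism name to its taxonomy ID, case-insensitively."""
    key = organism.strip().lower()
    if key not in TAXONOMY_IDS:
        supported = ", ".join(sorted(TAXONOMY_IDS))
        raise ValueError(f"Unsupported organism '{organism}'; use one of: {supported}")
    return TAXONOMY_IDS[key]


mouse_id = taxonomy_id("Mouse")
```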
Input: List of differentially expressed gene symbols
Flow: ID validation → gseapy ORA (GO + KEGG + Reactome) →
PANTHER + STRING cross-validation → Report top enriched terms
Use: When you have an unranked gene list from DESeq2/edgeR
Input: Gene-to-log2FC mapping from differential expression
Flow: Convert to ranked Series → gseapy GSEA (GO + KEGG + MSigDB) →
Filter by FDR < 0.25 → Report NES and lead genes
Use: When you have fold-changes or another ranking metric
Input: Specific question about enrichment (e.g., "What is the adjusted p-val for neutrophil activation?")
Flow: Parse question for gene list and library → Run gseapy with exact library →
Find specific term → Report exact p-value and adjusted p-value
Use: When answering targeted questions about specific terms
Input: Gene list from mouse experiment
Flow: Use organism='mouse' for gseapy → organism=10090 for PANTHER/STRING →
projection=True for Reactome human pathway mapping
Use: When working with non-human organisms
See: references/common_patterns.md for more examples
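"No significant enrichment found":
"Gene not found" errors:
"STRING returns all categories":
Filter with d['category'] == 'Process' after receiving results.

See: references/troubleshooting.md for the complete guide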
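
The category filter mentioned above might look like this in practice. The record layout (a list of dictionaries with `category`, `term`, and `fdr` keys) is an assumption about the STRING result shape, and the second record is a made-up placeholder.

```python
def filter_by_category(records, category="Process"):
    """Keep only the records whose 'category' field matches."""
    return [d for d in records if d.get("category") == category]


# Hypothetical records standing in for a STRING_functional_enrichment result
string_records = [
    {"category": "Process", "term": "GO:0051726", "fdr": 1.8e-05},
    {"category": "KEGG", "term": "example-kegg-term", "fdr": 0.02},
]
process_only = filter_by_category(string_records)
kegg_only = filter_by_category(string_records, "KEGG")
```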
| Tool | Input | Output | Use For |
|---|---|---|---|
| gseapy.enrichr() | gene_list, gene_sets, organism | .results DataFrame | ORA with 225+ libraries |
| gseapy.prerank() | rnk (ranked Series), gene_sets | .res2d DataFrame | GSEA analysis |

| Tool | Key Parameters | Evidence Grade |
|---|---|---|
| PANTHER_enrichment | gene_list (comma-sep), organism, annotation_dataset | [T1] |
| STRING_functional_enrichment | protein_ids, species | [T2] |
| ReactomeAnalysis_pathway_enrichment | identifiers (space-sep), page_size | [T1] |

| Tool | Input | Output |
|---|---|---|
| MyGene_batch_query | gene_ids, fields | Symbol, Entrez, Ensembl mappings |
| STRING_map_identifiers | protein_ids, species | Preferred names, STRING IDs |

See: references/tool_parameters.md for complete parameter documentation
All detailed examples, code blocks, and advanced topics have been moved to references/:
Helper scripts:

- For network-level analysis: tooluniverse-network-pharmacology
- For disease characterization: tooluniverse-multiomic-disease-characterization
- For spatial omics: tooluniverse-spatial-omics-analysis
- For protein interactions: tooluniverse-protein-interactions

- gseapy documentation: https://gseapy.readthedocs.io/
- PANTHER API: http://pantherdb.org/services/oai/pantherdb/
- STRING API: https://string-db.org/cgi/help?sessionId=&subpage=api
- Reactome Analysis: https://reactome.org/AnalysisService/
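Weekly installs: 124
GitHub stars: 1.2K
First seen: Feb 19, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (121), gemini-cli (120), opencode (120), github-copilot (119), cursor (117), kimi-cli (116)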