Gene Enrichment and Pathway Analysis Tool | ORA/GSEA Analysis | GO/KEGG/Reactome Enrichment

tooluniverse-gene-enrichment by mims-harvard/tooluniverse

124 weekly installs

1,200 GitHub Stars

Install command

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-gene-enrichment

Data analysis, research tools, bioinformatics


Gene Enrichment and Pathway Analysis

Perform comprehensive gene enrichment analysis including Gene Ontology (GO), KEGG, Reactome, WikiPathways, and MSigDB enrichment using both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). Integrates local computation via gseapy with ToolUniverse pathway databases for cross-validated, publication-ready results.

IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

When to Use This Skill

Apply when users:

Ask about gene enrichment analysis (GO, KEGG, Reactome, etc.)
Have a gene list from differential expression, clustering, or any experiment
Want to know which biological processes, molecular functions, or cellular components are enriched
Need KEGG or Reactome pathway enrichment analysis
Ask about GSEA (Gene Set Enrichment Analysis) with ranked gene lists
Want over-representation analysis (ORA) with Fisher's exact test
Need multiple testing correction (Benjamini-Hochberg, Bonferroni)
Ask about enrichGO, gseapy, clusterProfiler-style analyses

NOT for (use other skills instead):

Network pharmacology / drug repurposing → Use tooluniverse-network-pharmacology
Disease characterization → Use tooluniverse-multiomic-disease-characterization
Single gene function lookup → Use tooluniverse-disease-research
Spatial omics analysis → Use tooluniverse-spatial-omics-analysis
Protein-protein interaction analysis only → Use tooluniverse-protein-interactions

Input Parameters

Parameter	Required	Description	Example
gene_list	Yes	List of gene symbols, Ensembl IDs, or Entrez IDs	`["TP53", "BRCA1", "EGFR"]`
organism	No	Organism (default: human). Supported: human, mouse, rat, fly, worm, yeast, zebrafish	`human`
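analysis_type	No	`ORA` (default) or `GSEA`	`ORA`
enrichment_databases	No	Databases to query. Default: all applicable databases	`["GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome"]`
gene_id_type	No	Input ID type: `symbol`, `ensembl`, `entrez`, `uniprot` (auto-detected if omitted)	`symbol`
p_value_cutoff	No	Significance threshold (default: 0.05)	`0.05`
correction_method	No	Multiple testing correction: `BH` (Benjamini-Hochberg, default), `bonferroni`, `fdr`	`BH`
background_genes	No	Custom background gene set (default: genome-wide)	`["GENE1", "GENE2", ...]`
ranked_gene_list	No	For GSEA: gene-to-score mapping (e.g., log2FC)	`{"TP53": 2.5, "BRCA1": -1.3, ...}`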

Core Principles

Report-first approach - Create report file FIRST, then populate progressively
ID disambiguation FIRST - Detect and convert gene IDs before ANY enrichment
Multi-source validation - Run enrichment on at least 2 independent tools, cross-validate
Exact p-values - Report raw p-values AND adjusted p-values with correction method
Multiple testing correction - ALWAYS apply Benjamini-Hochberg unless user specifies otherwise
Gene set size filtering - Filter by min/max gene set size to avoid trivial/overly broad terms
Evidence grading - Grade enrichment sources T1-T4
Negative results documented - "No significant enrichment" is a valid finding
Source references - Every enrichment result must cite the tool/database/library used
Completeness checklist - Mandatory section at end showing analysis coverage
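
To make the "Multiple testing correction" principle concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. It is only an illustration of the procedure; gseapy and the ToolUniverse tools apply their own corrections, and the function name `benjamini_hochberg` is a hypothetical helper, not part of any of those libraries.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input list."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw={p:.3f}  adjusted={q:.5f}")
```

Reporting both columns side by side, as the "Exact p-values" principle asks, makes it easy to see which terms only pass the cutoff before correction.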

Decision Tree: ORA vs GSEA

Q: Do you have a ranked gene list (with scores/fold-changes)?
  YES → Use GSEA (gseapy.prerank)
        - Input: Gene-to-score mapping (e.g., log2FC)
        - Statistics: Running enrichment score, permutation test
        - Cutoff: FDR q-val < 0.25 (standard for GSEA)
        - Output: NES (Normalized Enrichment Score), lead genes
        See: references/gsea_workflow.md

  NO  → Use ORA (gseapy.enrichr)
        - Input: Gene list only
        - Statistics: Fisher's exact test, hypergeometric
        - Cutoff: Adjusted P-value < 0.05 (or user specified)
        - Output: P-value, adjusted P-value, overlap, odds ratio
        See: references/ora_workflow.md
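
As a rough illustration of the statistic behind the ORA branch, the sketch below computes the upper-tail hypergeometric probability, which is equivalent to a one-sided Fisher's exact test on the 2x2 overlap table. The function name and the toy numbers are assumptions made for this example; gseapy performs this calculation internally.

```python
from math import comb


def ora_p_value(overlap, list_size, set_size, background_size):
    """P(X >= overlap) when drawing list_size genes from the background."""
    max_k = min(list_size, set_size)
    total = comb(background_size, list_size)
    # Sum the probabilities of every overlap at least as large as observed
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, a gene set of 5, an input list of 4,
# and 2 of the input genes fall in the gene set.
p = ora_p_value(overlap=2, list_size=4, set_size=5, background_size=20)
print(f"ORA p-value: {p:.4f}")
```

The choice of `background_size` matters: a genome-wide background and a background of expressed genes can give noticeably different p-values for the same overlap.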

Decision Tree: gseapy vs ToolUniverse Tools

Q: Which enrichment method should I use?

Primary Analysis (ALWAYS):
  ├─ gseapy.enrichr (ORA) OR gseapy.prerank (GSEA)
  │  - Most comprehensive (225+ Enrichr libraries)
  │  - GO (BP, MF, CC), KEGG, Reactome, WikiPathways, MSigDB
  │  - Human and mouse well supported; other organisms limited (see Supported Organisms)
  │  - Returns: P-value, Adjusted P-value, Overlap, Genes
  │  See: references/enrichr_guide.md

Cross-Validation (REQUIRED for publication):
  ├─ PANTHER_enrichment [T1 - curated]
  │  - Curated GO enrichment
  │  - Multiple organisms (taxonomy ID)
  │  - GO BP, MF, CC, PANTHER pathways, Reactome
  │
  ├─ STRING_functional_enrichment [T2 - validated]
  │  - Returns ALL categories in one call
  │  - Filter by category: Process, Function, Component, KEGG, Reactome
  │  - Network-based enrichment
  │
  └─ ReactomeAnalysis_pathway_enrichment [T1 - curated]
     - Reactome curated pathways
     - Cross-species projection
     - Detailed pathway hierarchy

Additional Context (Optional):
  ├─ GO_get_term_by_id, QuickGO_get_term_detail (GO term details)
  ├─ Reactome_get_pathway, Reactome_get_pathway_hierarchy (pathway context)
  ├─ WikiPathways_search, WikiPathways_get_pathway (community pathways)
  └─ STRING_ppi_enrichment (network topology analysis)

Quick Start Workflow

Step 1: Create Report File (IMMEDIATE)

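from pathlib import Path

analysis_name = "my_analysis"  # placeholder: choose a descriptive name
report_path = Path(f"{analysis_name}_enrichment_report.md")
# Write the header with placeholder sections; update progressively as the analysis proceeds
report_path.write_text(f"# {analysis_name} enrichment report\n\n## Results\n\n(pending)\n")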

Step 2: ID Conversion and Validation

from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()

# Detect ID type
gene_list = ["TP53", "BRCA1", "EGFR"]
# Auto-detect: ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol

# Convert if needed (Ensembl/Entrez → Symbol)
result = tu.tools.MyGene_batch_query(
    gene_ids=gene_list,
    fields="symbol,entrezgene,ensembl.gene"
)
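# Extract symbols from the result (its structure depends on the tool; see references/id_conversion.md)
gene_symbols = gene_list  # the example inputs are already gene symbols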

# Validate with STRING
mapped = tu.tools.STRING_map_identifiers(
    protein_ids=gene_symbols,
    species=9606  # human
)
# Use preferredName for canonical symbols

See: references/id_conversion.md for complete examples
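
The auto-detection rule in the comment above (ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol) could be sketched as follows. This is a hypothetical helper, not the skill's actual implementation: the function names are made up for illustration, and the UniProt pattern is the accession format published by UniProt. A few gene symbols can happen to match that pattern, so the list-level call takes a majority vote rather than trusting a single ID.

```python
import re
from collections import Counter

# Ensembl gene IDs: ENSG... (human) or ENSMUSG... (mouse), optional ".version"
ENSEMBL_RE = re.compile(r"^(ENS[A-Z]*G\d{11})(\.\d+)?$")
# UniProt accession format (6 or 10 characters)
UNIPROT_RE = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def strip_ensembl_version(gene_id):
    """ENSG00000141510.16 -> ENSG00000141510; other IDs are returned unchanged."""
    match = ENSEMBL_RE.match(gene_id)
    return match.group(1) if match else gene_id


def detect_id_type(gene_id):
    """Classify a single identifier as ensembl, entrez, uniprot, or symbol."""
    gene_id = gene_id.strip()
    if ENSEMBL_RE.match(gene_id):
        return "ensembl"
    if gene_id.isdigit():
        return "entrez"
    if UNIPROT_RE.match(gene_id):
        return "uniprot"
    return "symbol"


def detect_list_id_type(gene_ids):
    """Majority vote over the list, so one odd-looking ID does not decide."""
    counts = Counter(detect_id_type(g) for g in gene_ids)
    return counts.most_common(1)[0][0]


print(detect_list_id_type(["TP53", "BRCA1", "EGFR"]))
print(strip_ensembl_version("ENSG00000141510.16"))
```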

Step 3: Primary Enrichment with gseapy

For ORA (gene list only):

import gseapy

# GO Biological Process
go_bp = gseapy.enrichr(
    gene_list=gene_symbols,
    gene_sets='GO_Biological_Process_2021',
    organism='human',
    outdir=None,
    no_plot=True,
    background=None  # None = genome-wide; pass a list of genes for a custom background
)
go_bp_sig = go_bp.results[go_bp.results['Adjusted P-value'] < 0.05]

For GSEA (ranked gene list):

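import gseapy
import pandas as pd

# gene_to_score: dict of gene symbol → score (e.g., log2FC) from your differential expression results
# Rank genes by score, highest first
ranked_series = pd.Series(gene_to_score).sort_values(ascending=False)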

gsea_result = gseapy.prerank(
    rnk=ranked_series,
    gene_sets='GO_Biological_Process_2021',
    outdir=None,
    no_plot=True,
    seed=42,
    min_size=5,
    max_size=500,
    permutation_num=1000
)
gsea_sig = gsea_result.res2d[gsea_result.res2d['FDR q-val'] < 0.25]

See:

references/ora_workflow.md for complete ORA examples
references/gsea_workflow.md for complete GSEA examples
references/enrichr_guide.md for all 225+ libraries
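
For intuition about what the running enrichment score measures, the sketch below walks down a ranked list and tracks the weighted hit/miss running sum used in classic GSEA. It is a simplified illustration under stated assumptions (hit scores are not all zero, no permutation test, no normalization to NES); `gseapy.prerank` handles those steps, and the function name here is hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, score) pairs sorted by score, highest first."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    # Normalizer: total weighted score of the genes that are in the set
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es_top = enrichment_score(ranked, {"A", "B"})
es_bottom = enrichment_score(ranked, {"E", "F"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the list gives a positive score, and one concentrated at the bottom gives a negative score, which is why the sign of NES is reported alongside the FDR.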

Step 4: Cross-Validation with ToolUniverse

# PANTHER [T1 - curated]
panther_bp = tu.tools.PANTHER_enrichment(
    gene_list=','.join(gene_symbols),  # comma-separated string
    organism=9606,
    annotation_dataset='GO:0008150'  # biological_process
)

# STRING [T2 - validated]
string_result = tu.tools.STRING_functional_enrichment(
    protein_ids=gene_symbols,
    species=9606
)
# Filter by category: Process, Function, Component, KEGG, Reactome

# Reactome [T1 - curated]
reactome_result = tu.tools.ReactomeAnalysis_pathway_enrichment(
    identifiers=' '.join(gene_symbols),  # space-separated
    page_size=50,
    include_disease=True
)

See: references/cross_validation.md for comparison strategies
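
One simple way to combine the sources is to count, per term, how many tools report it below the significance cutoff. The sketch below assumes each tool's output has already been reduced to a `{term_id: FDR}` mapping; that reduction, the function name, and the second example term with its values are illustrative assumptions, not output from the tools.

```python
def consensus_terms(fdr_by_source, cutoff=0.05, min_sources=2):
    """Return {term: [sources]} for terms significant in >= min_sources tools."""
    support = {}
    for source, term_fdr in fdr_by_source.items():
        for term, fdr in term_fdr.items():
            if fdr < cutoff:
                support.setdefault(term, []).append(source)
    return {t: s for t, s in support.items() if len(s) >= min_sources}


# Illustrative values only
example = {
    "gseapy": {"GO:0051726": 3.4e-06, "GO:0006915": 0.20},
    "PANTHER": {"GO:0051726": 2.1e-05},
    "STRING": {"GO:0051726": 1.8e-05, "GO:0006915": 0.01},
}
consensus = consensus_terms(example)
for term, sources in consensus.items():
    print(f"{term}: {len(sources)}/{len(example)} sources ({', '.join(sources)})")
```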

Step 5: Report Compilation

## Results

### GO Biological Process (Top 10)
| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |
|------|---------|-------------|---------|-------|----------|
| regulation of cell cycle (GO:0051726) | 1.2e-08 | 3.4e-06 | 12/45 | TP53;BRCA1;... | [T2] gseapy |

### Cross-Validation
| GO Term | gseapy FDR | PANTHER FDR | STRING FDR | Consensus |
|---------|-----------|-------------|-----------|-----------|
| GO:0051726 | 3.4e-06 | 2.1e-05 | 1.8e-05 | 3/3 ✓ |

### Completeness Checklist
- [x] ID Conversion (MyGene, STRING) - 95% mapped
- [x] GO BP (gseapy, PANTHER, STRING) - 24 significant terms
- [x] GO MF (gseapy, PANTHER, STRING) - 18 significant terms
- [x] GO CC (gseapy, PANTHER, STRING) - 12 significant terms
- [x] KEGG (gseapy, STRING) - 8 significant pathways
- [x] Reactome (gseapy, ReactomeAPI) - 15 significant pathways
- [x] Cross-validation - 12 consensus terms (2+ sources)

See: scripts/format_enrichment_output.py for automated formatting
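
The referenced formatting script is not reproduced here. As a hedged sketch of the idea, the helper below turns a list of result rows into the Markdown table used in the report. The row keys follow the Enrichr-style column names shown above (`Term`, `P-value`, `Adjusted P-value`, `Overlap`, `Genes`); the `Evidence` key and the function name are assumptions added for this example.

```python
def format_markdown_table(rows, top_n=10):
    """Render the top_n rows (lowest adjusted p-value first) as a Markdown table."""
    lines = [
        "| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |",
        "|------|---------|-------------|---------|-------|----------|",
    ]
    ranked = sorted(rows, key=lambda r: r["Adjusted P-value"])[:top_n]
    for r in ranked:
        lines.append(
            f"| {r['Term']} | {r['P-value']:.1e} | {r['Adjusted P-value']:.1e} "
            f"| {r['Overlap']} | {r['Genes']} | {r['Evidence']} |"
        )
    return "\n".join(lines)


example_rows = [
    {
        "Term": "regulation of cell cycle (GO:0051726)",
        "P-value": 1.2e-08,
        "Adjusted P-value": 3.4e-06,
        "Overlap": "12/45",
        "Genes": "TP53;BRCA1",
        "Evidence": "[T2] gseapy",
    }
]
table = format_markdown_table(example_rows)
print(table)
```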

Evidence Grading

Tier	Symbol	Criteria	Examples
T1	[T1]	Curated/experimental enrichment	PANTHER, Reactome Analysis Service
T2	[T2]	Computational enrichment, well-validated	gseapy ORA/GSEA, STRING functional enrichment
T3	[T3]	Text-mining/predicted enrichment	Enrichr non-curated libraries
T4	[T4]	Single-source annotation	Individual gene GO annotations from QuickGO

Supported Organisms

Organism	Taxonomy ID	gseapy	PANTHER	STRING	Reactome
Human	9606	Yes	Yes	Yes	Yes
Mouse	10090	Yes (`*_Mouse`)	Yes	Yes	Yes (projection)
Rat	10116	Limited	Yes	Yes	Yes (projection)
Fly	7227	Limited	Yes	Yes	Yes (projection)
Worm	6239	Limited	Yes	Yes	Yes (projection)
Yeast	4932	Limited	Yes	Yes	Yes

See: references/organism_support.md for organism-specific libraries

Common Patterns

Pattern 1: Standard DEG Enrichment (ORA)

Input: List of differentially expressed gene symbols
Flow: ID validation → gseapy ORA (GO + KEGG + Reactome) →
      PANTHER + STRING cross-validation → Report top enriched terms
Use: When you have unranked gene list from DESeq2/edgeR

Pattern 2: Ranked Gene List (GSEA)

Input: Gene-to-log2FC mapping from differential expression
Flow: Convert to ranked Series → gseapy GSEA (GO + KEGG + MSigDB) →
      Filter by FDR < 0.25 → Report NES and lead genes
Use: When you have fold-changes or other ranking metric

Pattern 3: BixBench Enrichment Question

Input: Specific question about enrichment (e.g., "What is the adjusted p-val for neutrophil activation?")
Flow: Parse question for gene list and library → Run gseapy with exact library →
      Find specific term → Report exact p-value and adjusted p-value
Use: When answering targeted questions about specific terms

Pattern 4: Multi-Organism Enrichment

Input: Gene list from mouse experiment
Flow: Use organism='mouse' for gseapy → organism=10090 for PANTHER/STRING →
      projection=True for Reactome human pathway mapping
Use: When working with non-human organisms
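
Pattern 4 needs the same organism expressed three ways: a name for gseapy, a taxonomy ID for PANTHER/STRING, and a projection flag for Reactome. A small lookup like the one below keeps these consistent. The taxonomy IDs and projection flags follow the Supported Organisms table; the dictionary keys in the returned value and the function name are hypothetical.

```python
# organism name -> (NCBI taxonomy ID, Reactome uses projection to human pathways)
ORGANISMS = {
    "human": (9606, False),
    "mouse": (10090, True),
    "rat": (10116, True),
    "fly": (7227, True),
    "worm": (6239, True),
    "yeast": (4932, False),
}


def organism_params(name):
    """Return the per-tool organism settings for a common organism name."""
    key = name.strip().lower()
    if key not in ORGANISMS:
        raise ValueError(f"unsupported organism: {name!r}")
    taxonomy_id, projection = ORGANISMS[key]
    return {
        "gseapy_organism": key,
        "taxonomy_id": taxonomy_id,
        "reactome_projection": projection,
    }


print(organism_params("Mouse"))
```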

See: references/common_patterns.md for more examples

Troubleshooting

"No significant enrichment found":

Verify gene symbols are valid (STRING_map_identifiers)
Try different library versions (2021 vs 2023 vs 2025)
Try relaxing the significance cutoff, or use GSEA instead

"Gene not found" errors:

Check ID type and convert using MyGene_batch_query
Remove version suffixes from Ensembl IDs (ENSG00000141510.16 → ENSG00000141510)

"STRING returns all categories":

This is expected; filter by d['category'] == 'Process' after receiving results
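
A minimal sketch of that filter, assuming the STRING result is a list of dictionaries with a `category` key as described above. The `term`, `description`, and `fdr` keys and the sample records are assumptions for illustration; check the actual tool output before relying on them.

```python
def filter_by_category(records, category, fdr_cutoff=0.05):
    """Keep records of one category whose FDR is below the cutoff."""
    return [
        d for d in records
        if d["category"] == category and d["fdr"] < fdr_cutoff
    ]


# Hypothetical records shaped like a functional enrichment response
sample = [
    {"category": "Process", "term": "GO:0051726", "description": "regulation of cell cycle", "fdr": 1.8e-05},
    {"category": "Process", "term": "GO:0006915", "description": "apoptotic process", "fdr": 0.30},
    {"category": "KEGG", "term": "hsa04115", "description": "p53 signaling pathway", "fdr": 2.0e-04},
]
process_hits = filter_by_category(sample, "Process")
print([d["term"] for d in process_hits])
```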

See: references/troubleshooting.md for complete guide

Tool Reference

Primary Enrichment Tools

Tool	Input	Output	Use For
`gseapy.enrichr()`	gene_list, gene_sets, organism	`.results` DataFrame	ORA with 225+ libraries
`gseapy.prerank()`	rnk (ranked Series), gene_sets	`.res2d` DataFrame	GSEA analysis

Cross-Validation Tools

Tool	Key Parameters	Evidence Grade
`PANTHER_enrichment`	gene_list (comma-sep), organism, annotation_dataset	[T1]
`STRING_functional_enrichment`	protein_ids, species	[T2]
`ReactomeAnalysis_pathway_enrichment`	identifiers (space-sep), page_size	[T1]

ID Conversion Tools

Tool	Input	Output
`MyGene_batch_query`	gene_ids, fields	Symbol, Entrez, Ensembl mappings
`STRING_map_identifiers`	protein_ids, species	Preferred names, STRING IDs

See: references/tool_parameters.md for complete parameter documentation

Detailed Documentation

All detailed examples, code blocks, and advanced topics have been moved to references/:

references/ora_workflow.md - Complete ORA examples with all databases
references/gsea_workflow.md - Complete GSEA workflow with ranked lists
references/enrichr_guide.md - All 225+ Enrichr libraries and usage
references/cross_validation.md - Multi-source validation strategies
references/id_conversion.md - Gene ID disambiguation and conversion
references/tool_parameters.md - Complete tool parameter reference
references/organism_support.md - Organism-specific configurations
references/common_patterns.md - Detailed use case examples
references/troubleshooting.md - Complete troubleshooting guide
references/multiple_testing.md - Correction methods (BH, Bonferroni, BY)
references/report_template.md - Standard report format

Helper scripts:

scripts/format_enrichment_output.py - Format results for reports
scripts/compare_enrichment_sources.py - Cross-validation analysis
scripts/filter_by_gene_set_size.py - Filter terms by size
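
The helper scripts themselves are not shown on this page. As a rough idea of what gene set size filtering involves, the sketch below reads the set size from an Enrichr-style `Overlap` value such as `12/45` (hits / gene set size) and drops terms outside a size window. The function name is hypothetical, and the default bounds simply mirror the `min_size=5` and `max_size=500` used in the GSEA example.

```python
def filter_by_set_size(rows, min_size=5, max_size=500):
    """Keep rows whose gene set size (from 'Overlap' = hits/size) is in range."""
    kept = []
    for row in rows:
        set_size = int(row["Overlap"].split("/")[1])
        if min_size <= set_size <= max_size:
            kept.append(row)
    return kept


rows = [
    {"Term": "tiny set", "Overlap": "2/3"},
    {"Term": "regulation of cell cycle (GO:0051726)", "Overlap": "12/45"},
    {"Term": "very broad set", "Overlap": "40/2500"},
]
kept_terms = [r["Term"] for r in filter_by_set_size(rows)]
print(kept_terms)
```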

Resources

For network-level analysis: tooluniverse-network-pharmacology
For disease characterization: tooluniverse-multiomic-disease-characterization
For spatial omics: tooluniverse-spatial-omics-analysis
For protein interactions: tooluniverse-protein-interactions

gseapy documentation: https://gseapy.readthedocs.io/
PANTHER API: http://pantherdb.org/services/oai/pantherdb/
STRING API: https://string-db.org/cgi/help?sessionId=&subpage=api
Reactome Analysis: https://reactome.org/AnalysisService/


First Seen: Feb 19, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: codex (121), gemini-cli (120), opencode (120), github-copilot (119), cursor (117), kimi-cli (116)
