Protein Interaction Network Analysis by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill 'Protein Interaction Network Analysis'
Comprehensive protein interaction network analysis using ToolUniverse tools. Analyzes protein networks through a 4-phase workflow: identifier mapping, network retrieval, enrichment analysis, and optional structural data.
- ✅ Identifier Mapping - Convert protein names to database IDs (STRING, UniProt, Ensembl)
- ✅ Network Retrieval - Get interaction networks with confidence scores (0-1)
- ✅ Functional Enrichment - GO terms, KEGG pathways, Reactome pathways
- ✅ PPI Enrichment - Test if proteins form functional modules
- ✅ Structural Data - Optional SAXS/SANS solution structures (SASBDB)
- ✅ Fallback Strategy - STRING primary (no API key) → BioGRID secondary (if key available)

| Database | Coverage | API Key | Purpose |
|---|---|---|---|
| STRING | 14M+ proteins, 5,000+ organisms | ❌ Not required | Primary interaction source |
| BioGRID | 2.3M+ interactions, 80+ organisms | ✅ Required | Fallback, curated data |
| SASBDB | 2,000+ SAXS/SANS entries | ❌ Not required | Solution structures |

```python
from tooluniverse import ToolUniverse
from python_implementation import analyze_protein_network

# Initialize ToolUniverse
tu = ToolUniverse()

# Analyze protein network
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53", "MDM2", "ATM", "CHEK2"],
    species=9606,          # Human
    confidence_score=0.7,  # High confidence
)

# Access results
print(f"Mapped: {len(result.mapped_proteins)} proteins")
print(f"Network: {result.total_interactions} interactions")
print(f"Enrichment: {len(result.enriched_terms)} GO terms")
print(f"PPI p-value: {result.ppi_enrichment.get('p_value', 1.0):.2e}")
```

Example output:

```text
🔍 Phase 1: Mapping 4 protein identifiers...
✅ Mapped 4/4 proteins (100.0%)
🕸️ Phase 2: Retrieving interaction network...
✅ STRING: Retrieved 6 interactions
🧬 Phase 3: Performing enrichment analysis...
✅ Found 245 enriched GO terms (FDR < 0.05)
✅ PPI enrichment significant (p=3.45e-05)
✅ Analysis complete!
```

Discover interaction partners for a protein of interest:

```python
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53"],  # Single protein
    species=9606,
    confidence_score=0.7,
)

# Print the first 5 edges returned for the network
for edge in result.network_edges[:5]:
    print(f"{edge['preferredName_A']} ↔ {edge['preferredName_B']} "
          f"(score: {edge['score']})")
```

Test whether a set of proteins forms a functional complex:

```python
# DNA damage response proteins
proteins = ["TP53", "ATM", "CHEK2", "BRCA1", "BRCA2"]
result = analyze_protein_network(tu=tu, proteins=proteins)

# Check PPI enrichment
if result.ppi_enrichment.get("p_value", 1.0) < 0.05:
    print("✅ Proteins form functional module!")
    print(f"  Expected edges: {result.ppi_enrichment['expected_number_of_edges']:.1f}")
    print(f"  Observed edges: {result.ppi_enrichment['number_of_edges']}")
else:
    print("⚠️ Proteins may be unrelated")
```
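
The observed and expected edge counts from the check above can be condensed into a single ratio, which is often easier to compare across protein sets than the raw p-value. The following is a minimal sketch, assuming the `ppi_enrichment` dict carries the three keys used above; `summarize_ppi` is a hypothetical helper that is not part of the skill, and the numbers are made up for illustration.

```python
from typing import Any, Dict


def summarize_ppi(ppi: Dict[str, Any], alpha: float = 0.05) -> Dict[str, Any]:
    """Condense a PPI enrichment dict into an observed/expected ratio and a verdict.

    Hypothetical helper; the key names mirror the snippet above.
    """
    observed = ppi.get("number_of_edges", 0)
    expected = ppi.get("expected_number_of_edges", 0.0)
    p_value = ppi.get("p_value", 1.0)

    # Avoid dividing by zero when no edges are expected by chance.
    if expected > 0:
        ratio = observed / expected
    else:
        ratio = float("inf") if observed else 0.0

    return {
        "observed": observed,
        "expected": expected,
        "ratio": ratio,
        "significant": p_value < alpha,
    }


# Made-up numbers for illustration only.
example = {"number_of_edges": 9, "expected_number_of_edges": 1.5, "p_value": 3.45e-05}
summary = summarize_ppi(example)
print(f"{summary['observed']} observed vs {summary['expected']} expected "
      f"({summary['ratio']:.1f}x), significant={summary['significant']}")
```

A ratio well above 1 together with a small p-value suggests the set is more connected than a random set of the same size; an empty dict falls back to "not significant".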

Find enriched pathways for a protein set:

```python
result = analyze_protein_network(
    tu=tu,
    proteins=["MAPK1", "MAPK3", "RAF1", "MAP2K1"],  # MAPK pathway
    confidence_score=0.7,
)

# Show top enriched processes
print("\nTop Enriched Pathways:")
for term in result.enriched_terms[:10]:
    print(f"  {term['term']}: p={term['p_value']:.2e}, FDR={term['fdr']:.2e}")
```
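
Slicing the first ten entries relies on the list already being ordered. A small filter makes the selection explicit. This sketch assumes each term is a dict with `category`, `term`, `description` and `fdr` keys; `top_terms` is a hypothetical helper, the FDR values are invented, and the category label `"KEGG"` is an assumption about how non-GO sources are tagged.

```python
from typing import Any, Dict, List, Optional


def top_terms(
    terms: List[Dict[str, Any]],
    category: Optional[str] = None,
    max_fdr: float = 0.05,
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Keep terms at or below an FDR cutoff, optionally from one category, sorted by FDR."""
    kept = [
        t for t in terms
        if t.get("fdr", 1.0) <= max_fdr
        and (category is None or t.get("category") == category)
    ]
    return sorted(kept, key=lambda t: t.get("fdr", 1.0))[:limit]


# Illustrative records only; the FDR values are made up.
terms = [
    {"category": "Process", "term": "GO:0000165", "description": "MAPK cascade", "fdr": 1.0e-06},
    {"category": "KEGG", "term": "hsa04010", "description": "MAPK signaling pathway", "fdr": 2.0e-08},
    {"category": "Process", "term": "GO:0006915", "description": "apoptotic process", "fdr": 0.2},
]

for t in top_terms(terms):
    print(f"{t['term']}  {t['description']}  FDR={t['fdr']:.1e}")
```

Passing `category="Process"` restricts the output to GO biological process terms.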

Build a complete interaction network for multiple proteins:

```python
import pandas as pd

# Apoptosis regulators
proteins = ["TP53", "BCL2", "BAX", "CASP3", "CASP9"]
result = analyze_protein_network(
    tu=tu,
    proteins=proteins,
    confidence_score=0.7,
)

# Export network for Cytoscape
df = pd.DataFrame(result.network_edges)
df.to_csv("apoptosis_network.tsv", sep="\t", index=False)
```
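
Before loading the exported file into Cytoscape, it can help to check which proteins are hubs. The sketch below counts node degree from an edge list using only the standard library. It assumes each edge dict has `preferredName_A` and `preferredName_B` keys and that each pair appears once; `node_degrees` is a hypothetical helper, and the edges are invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def node_degrees(edges: List[Dict[str, Any]]) -> Counter:
    """Count how many edges touch each protein (assumes no duplicate pairs)."""
    degrees: Counter = Counter()
    for edge in edges:
        degrees[edge["preferredName_A"]] += 1
        degrees[edge["preferredName_B"]] += 1
    return degrees


# Illustrative edge list, not real STRING output.
edges = [
    {"preferredName_A": "TP53", "preferredName_B": "BAX"},
    {"preferredName_A": "TP53", "preferredName_B": "BCL2"},
    {"preferredName_A": "BCL2", "preferredName_B": "BAX"},
    {"preferredName_A": "CASP3", "preferredName_B": "CASP9"},
    {"preferredName_A": "TP53", "preferredName_B": "CASP3"},
]

degrees = node_degrees(edges)
for protein, degree in degrees.most_common():
    print(f"{protein}: {degree}")
```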

Use BioGRID for experimentally validated interactions:

```python
# Requires BIOGRID_API_KEY in environment
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53", "MDM2"],
    include_biogrid=True,  # Enable BioGRID fallback
)
print(f"Primary source: {result.primary_source}")  # "STRING" or "BioGRID"
```

Add SAXS/SANS solution structures:

```python
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53"],
    include_structure=True,  # Query SASBDB
)

if result.structural_data:
    print(f"\nFound {len(result.structural_data)} SAXS/SANS entries:")
    for entry in result.structural_data:
        print(f"  {entry.get('sasbdb_id')}: {entry.get('title')}")
```

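analyze_protein_network() Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `tu` | ToolUniverse | Required | ToolUniverse instance |
| `proteins` | list[str] | Required | Protein identifiers (gene symbols, UniProt IDs) |
| `species` | int | 9606 | NCBI taxonomy ID (9606=human, 10090=mouse) |
| `confidence_score` | float | 0.7 | Minimum interaction confidence (0-1). 0.4=low, 0.7=high, 0.9=very high |
| `include_biogrid` | bool | False | Use BioGRID if STRING fails (requires API key) |
| `include_structure` | bool | False | Include SASBDB structural data (slower) |
| `suppress_warnings` | bool | True | Suppress ToolUniverse loading warnings |
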
Common NCBI taxonomy IDs for `species`:

- 9606 - Homo sapiens (human)
- 10090 - Mus musculus (mouse)
- 10116 - Rattus norvegicus (rat)
- 7227 - Drosophila melanogaster (fruit fly)
- 6239 - Caenorhabditis elegans (worm)
- 7955 - Danio rerio (zebrafish)
- 559292 - Saccharomyces cerevisiae (yeast)

| Score | Level | Description | Use Case |
|---|---|---|---|
| 0.15 | Very low | All evidence | Exploratory, hypothesis generation |
| 0.4 | Low | Medium evidence | Default STRING threshold |
| 0.7 | High | Strong evidence | Recommended - reliable interactions |
| 0.9 | Very high | Strongest evidence | Core interactions only |
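
The thresholds in the table can be turned into a small lookup, which is handy when labelling edges or validating user input. This is a sketch that treats each table row as a lower bound, which is an assumption about how the levels are meant to be read. `confidence_level` and `to_required_score` are hypothetical helpers. The second one reflects my understanding that STRING's REST API expects its `required_score` parameter on a 0-1000 scale; whether the skill performs that conversion internally is not stated here.

```python
def _check_range(score: float) -> None:
    """Reject scores outside the 0-1 range used by this skill."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score must be within 0-1, got {score}")


def confidence_level(score: float) -> str:
    """Map a 0-1 score to the level names from the table (rows read as lower bounds)."""
    _check_range(score)
    if score >= 0.9:
        return "very high"
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "low"
    if score >= 0.15:
        return "very low"
    return "below lowest cutoff"


def to_required_score(score: float) -> int:
    """Convert a 0-1 score to a 0-1000 integer (assumed STRING `required_score` scale)."""
    _check_range(score)
    return round(score * 1000)


for s in (0.15, 0.4, 0.7, 0.9):
    print(s, confidence_level(s), to_required_score(s))
```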

ProteinNetworkResult Object

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ProteinNetworkResult:
    # Phase 1: Identifier mapping
    mapped_proteins: List[Dict[str, Any]]
    mapping_success_rate: float

    # Phase 2: Network retrieval
    network_edges: List[Dict[str, Any]]
    total_interactions: int

    # Phase 3: Enrichment analysis
    enriched_terms: List[Dict[str, Any]]
    ppi_enrichment: Dict[str, Any]

    # Phase 4: Structural data (optional)
    structural_data: Optional[List[Dict[str, Any]]]

    # Metadata
    primary_source: str  # "STRING" or "BioGRID"
    warnings: List[str]
```

Each entry in `network_edges` has the following shape:

```python
{
    "stringId_A": "9606.ENSP00000269305",  # Protein A STRING ID
    "stringId_B": "9606.ENSP00000258149",  # Protein B STRING ID
    "preferredName_A": "TP53",             # Protein A name
    "preferredName_B": "MDM2",             # Protein B name
    "ncbiTaxonId": 9606,                   # Species
    "score": 0.999,                        # Combined confidence (0-1)
    "nscore": 0.0,                         # Neighborhood score
    "fscore": 0.0,                         # Gene fusion score
    "pscore": 0.0,                         # Phylogenetic profile score
    "ascore": 0.947,                       # Coexpression score
    "escore": 0.951,                       # Experimental score
    "dscore": 0.9,                         # Database score
    "tscore": 0.994,                       # Text mining score
}
```
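
The per-channel scores make it possible to see what kind of evidence drives an edge, which matters when text mining dominates over experimental support. A minimal sketch, assuming the channel keys shown in the edge example; `strongest_channel` and the `CHANNELS` mapping are hypothetical names introduced here.

```python
from typing import Any, Dict, Tuple

# Channel keys and labels taken from the comments in the edge example.
CHANNELS = {
    "nscore": "neighborhood",
    "fscore": "gene fusion",
    "pscore": "phylogenetic profile",
    "ascore": "coexpression",
    "escore": "experimental",
    "dscore": "database",
    "tscore": "text mining",
}


def strongest_channel(edge: Dict[str, Any]) -> Tuple[str, float]:
    """Return the label and score of the highest-scoring evidence channel."""
    key = max(CHANNELS, key=lambda k: edge.get(k, 0.0))
    return CHANNELS[key], edge.get(key, 0.0)


edge = {
    "preferredName_A": "TP53", "preferredName_B": "MDM2", "score": 0.999,
    "nscore": 0.0, "fscore": 0.0, "pscore": 0.0,
    "ascore": 0.947, "escore": 0.951, "dscore": 0.9, "tscore": 0.994,
}
label, value = strongest_channel(edge)
print(f"{edge['preferredName_A']}-{edge['preferredName_B']}: strongest evidence is {label} ({value})")
```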

Each entry in `enriched_terms` has the following shape:

```python
{
    "category": "Process",                  # GO category
    "term": "GO:0006915",                   # GO term ID
    "description": "apoptotic process",     # Term description
    "number_of_genes": 4,                   # Genes in your set
    "number_of_genes_in_background": 1234,  # Genes in genome
    "p_value": 1.23e-05,                    # Enrichment p-value
    "fdr": 0.0012,                          # FDR correction
    "inputGenes": "TP53,MDM2,BAX,CASP3",    # Matching genes
}
```
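
The `inputGenes` field can be inverted to answer the opposite question: which enriched terms does a given protein contribute to? The sketch below assumes `inputGenes` is a comma-separated string, as in the example record; if the underlying tool returns a list instead, the split step would need adjusting. `terms_by_gene` is a hypothetical helper and the second record is invented.

```python
from collections import defaultdict
from typing import Any, Dict, List


def terms_by_gene(terms: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Build a gene -> [term descriptions] index from enrichment records."""
    index: Dict[str, List[str]] = defaultdict(list)
    for term in terms:
        # Assumes a comma-separated string, as in the example record.
        for gene in term.get("inputGenes", "").split(","):
            gene = gene.strip()
            if gene:
                index[gene].append(term["description"])
    return dict(index)


terms = [
    {"term": "GO:0006915", "description": "apoptotic process",
     "inputGenes": "TP53,MDM2,BAX,CASP3"},
    # Invented record for illustration.
    {"term": "GO:0006974", "description": "DNA damage response",
     "inputGenes": "TP53,MDM2"},
]

index = terms_by_gene(terms)
for gene, descriptions in sorted(index.items()):
    print(f"{gene}: {', '.join(descriptions)}")
```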

```text
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Identifier Mapping                                 │
│ ─────────────────────────────────────────────────────────── │
│ STRING_map_identifiers()                                    │
│   • Validates protein names exist in database               │
│   • Converts to STRING IDs for consistency                  │
│   • Returns mapping success rate                            │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Network Retrieval                                  │
│ ─────────────────────────────────────────────────────────── │
│ PRIMARY: STRING_get_network() (no API key needed)           │
│   • Retrieves all pairwise interactions                     │
│   • Returns confidence scores by evidence type              │
│                                                             │
│ FALLBACK: BioGRID_get_interactions() (if enabled)           │
│   • Used if STRING fails or for validation                  │
│   • Requires BIOGRID_API_KEY                                │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: Enrichment Analysis                                │
│ ─────────────────────────────────────────────────────────── │
│ STRING_functional_enrichment()                              │
│   • GO terms (Process, Component, Function)                 │
│   • KEGG pathways                                           │
│   • Reactome pathways                                       │
│   • FDR-corrected p-values                                  │
│                                                             │
│ STRING_ppi_enrichment()                                     │
│   • Tests if proteins interact more than random             │
│   • Returns p-value for functional coherence                │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 4: Structural Data (Optional)                         │
│ ─────────────────────────────────────────────────────────── │
│ SASBDB_search_entries()                                     │
│   • SAXS/SANS solution structures                           │
│   • Protein flexibility and conformations                   │
│   • Complements crystal/cryo-EM data                        │
└─────────────────────────────────────────────────────────────┘
```
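
The primary/fallback arrangement in Phase 2 can be sketched as a small function that tries one source and only consults the second when the first raises. This is not the skill's actual implementation. `retrieve_network`, `failing_string` and `stub_biogrid` are hypothetical names, the fetchers are stubs that make no network calls, and falling back only on an exception (rather than on an empty result) is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

Edge = Dict[str, object]
Fetcher = Callable[[List[str]], List[Edge]]


def retrieve_network(
    proteins: List[str],
    primary: Fetcher,
    fallback: Optional[Fetcher] = None,
) -> Tuple[List[Edge], str, List[str]]:
    """Try the primary source; use the fallback only if the primary raises."""
    warnings: List[str] = []
    try:
        return primary(proteins), "STRING", warnings
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        warnings.append(f"STRING failed: {exc}")
    if fallback is None:
        return [], "STRING", warnings
    return fallback(proteins), "BioGRID", warnings


def failing_string(proteins: List[str]) -> List[Edge]:
    """Stub that simulates a STRING outage."""
    raise RuntimeError("timeout")


def stub_biogrid(proteins: List[str]) -> List[Edge]:
    """Stub that returns one made-up edge between the first two proteins."""
    return [{"preferredName_A": proteins[0], "preferredName_B": proteins[1]}]


edges, source, warnings = retrieve_network(["TP53", "MDM2"], failing_string, stub_biogrid)
print(source, len(edges), warnings)
```

Returning the source name and the collected warnings alongside the edges mirrors the `primary_source` and `warnings` fields of the result object, so callers can tell which database actually answered.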

```bash
# Install ToolUniverse (if not already installed)
pip install tooluniverse

# Or with extras (quoted so shells such as zsh do not expand the brackets)
pip install "tooluniverse[all]"
```

For BioGRID fallback functionality:

1. Register for a free API key: https://webservice.thebiogrid.org/
2. Add it to your .env file:

```
BIOGRID_API_KEY=your_key_here
```
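
A quick check that the key is actually visible to the Python process avoids a silent fallback failure later. This sketch assumes the variable has already reached the process environment (exported in the shell, or loaded from `.env` by whatever mechanism your setup uses); `biogrid_key` is a hypothetical helper, not part of the skill.

```python
import os
from typing import Mapping, Optional


def biogrid_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the BioGRID API key if set and non-empty, else None."""
    env = os.environ if env is None else env
    key = env.get("BIOGRID_API_KEY", "").strip()
    return key or None


# Only request the BioGRID fallback when a key is available.
include_biogrid = biogrid_key() is not None
print(f"BioGRID fallback available: {include_biogrid}")
```

The resulting flag could then be passed as `include_biogrid=include_biogrid` to `analyze_protein_network()`.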

```text
tooluniverse-protein-interactions/
├── SKILL.md                    # This file
├── python_implementation.py    # Main implementation
├── QUICK_START.md              # Quick reference
├── DOMAIN_ANALYSIS.md          # Design rationale
└── KNOWN_ISSUES.md             # ToolUniverse limitations
```

Issue: ToolUniverse prints 40+ warning messages during analysis.

Workaround: Filter the output when running your script:

```bash
python your_script.py 2>&1 | grep -v "Error loading tools"
```

See KNOWN_ISSUES.md for details.

The BioGRID fallback requires a free API key. STRING works without any API key.
SASBDB endpoints occasionally return errors. Structural data is optional.
| Operation | Time | Notes |
|---|---|---|
| Identifier mapping | 1-2 sec | For 5 proteins |
| Network retrieval | 2-3 sec | Depends on network size |
| Enrichment analysis | 3-5 sec | For 374 terms |
| Full 4-phase analysis | 6-10 sec | Excluding ToolUniverse overhead |
Note: Add 4-8 seconds per tool call for ToolUniverse loading (framework limitation).
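
Combining the table with the per-call overhead gives a rough wall-clock range. The arithmetic below assumes the overhead is simply additive and that a run without structural data makes four tool calls, one per function named in the workflow (`STRING_map_identifiers`, `STRING_get_network`, `STRING_functional_enrichment`, `STRING_ppi_enrichment`). Both are assumptions, and `estimate_runtime` is a hypothetical helper.

```python
from typing import Tuple


def estimate_runtime(
    n_tool_calls: int,
    analysis: Tuple[float, float] = (6.0, 10.0),          # full 4-phase analysis, seconds
    overhead_per_call: Tuple[float, float] = (4.0, 8.0),  # ToolUniverse loading, seconds
) -> Tuple[float, float]:
    """Return a (low, high) wall-clock estimate in seconds, assuming additive overhead."""
    low = analysis[0] + n_tool_calls * overhead_per_call[0]
    high = analysis[1] + n_tool_calls * overhead_per_call[1]
    return low, high


low, high = estimate_runtime(4)
print(f"Estimated total: {low:.0f}-{high:.0f} seconds")
```

Under these assumptions a four-call run lands at roughly 22-42 seconds, so the framework overhead, not the analysis itself, dominates the total.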
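
- `include_structure=False`: skip the optional SASBDB lookup, which the parameter table marks as slower
- `confidence_score=0.9`: restrict the network to core interactions

✅ Fixed in this skill - All parameter names verified in Phase 2 testing.

- `confidence_score=0.4`: lower the threshold to include more interactions
- `BIOGRID_API_KEY` is set in environment

See python_implementation.py for:

- `example_tp53_analysis()` - Complete TP53 network analysis
- `analyze_protein_network()` - Main function with all options
- `ProteinNetworkResult` - Result data structure

For issues with:
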
Same as ToolUniverse framework license.