string-database by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill string-database
STRING is a comprehensive database of known and predicted protein-protein interactions covering 59M proteins and 20B+ interactions across 5000+ organisms. Query interaction networks, perform functional enrichment, discover partners via REST API for systems biology and pathway analysis.
This skill should be used when:
The skill provides:
- scripts/string_api.py for all STRING REST API operations
- references/string_reference.md with detailed API specifications
When users request STRING data, determine which operation is needed and use the appropriate function from scripts/string_api.py.
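To make the REST layer less opaque, here is a minimal sketch of how a STRING request URL can be assembled. It assumes the public URL pattern `https://string-db.org/api/<format>/<method>` and carriage-return-separated identifiers. `build_string_url` is a hypothetical helper for illustration and is not claimed to match what scripts/string_api.py does internally.

```python
from urllib.parse import urlencode

# Assumed base URL of the public STRING REST API
STRING_API_BASE = "https://string-db.org/api"


def build_string_url(method, identifiers, output_format="tsv", **params):
    """Build a STRING REST URL (hypothetical helper, for illustration only)."""
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    # Multiple identifiers are joined with carriage returns, encoded as %0D
    query = {"identifiers": "\r".join(identifiers)}
    query.update(params)
    return f"{STRING_API_BASE}/{output_format}/{method}?{urlencode(query)}"


url = build_string_url("network", ["TP53", "MDM2"], species=9606, required_score=700)
print(url)
```

Building the URL separately from sending it makes it easy to log or inspect the exact query when results look unexpected.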
string_map_ids()
Convert gene names, protein names, and external IDs to STRING identifiers.
When to use: Starting any STRING analysis, validating protein names, finding canonical identifiers.
Usage:
from scripts.string_api import string_map_ids
# Map single protein
result = string_map_ids('TP53', species=9606)
# Map multiple proteins
result = string_map_ids(['TP53', 'BRCA1', 'EGFR', 'MDM2'], species=9606)
# Map with multiple matches per query
result = string_map_ids('p53', species=9606, limit=5)
Parameters:
- species: NCBI taxon ID (9606 = human, 10090 = mouse, 7227 = fly)
- limit: Number of matches per identifier (default: 1)
- echo_query: Include query term in output (default: 1)
Best practice: Always map identifiers first for faster subsequent queries.
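The mapping result is most useful once it is turned into a lookup table. The sketch below parses a tab-separated mapping response into a dictionary. The sample text is hand-written for illustration, and the column names `queryItem` and `stringId` are assumptions about the output format that should be checked against references/string_reference.md. `mapping_to_dict` is a hypothetical helper.

```python
import csv
import io

# Hand-written sample for illustration; real text would come from string_map_ids(...)
sample_mapping = (
    "queryItem\tqueryIndex\tstringId\tncbiTaxonId\tpreferredName\n"
    "TP53\t0\t9606.ENSP00000269305\t9606\tTP53\n"
    "EGFR\t1\t9606.ENSP00000275493\t9606\tEGFR\n"
)


def mapping_to_dict(tsv_text):
    """Return {query term: STRING identifier}, keeping the first match per query."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = {}
    for row in reader:
        # setdefault keeps the first (best-ranked) match when limit > 1
        mapping.setdefault(row["queryItem"], row["stringId"])
    return mapping


string_ids = mapping_to_dict(sample_mapping)
print(string_ids)
```

The resulting identifiers can then be passed to the other functions in place of gene names.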
string_network()
Get protein-protein interaction network data in tabular format.
When to use: Building interaction networks, analyzing connectivity, retrieving interaction evidence.
Usage:
from scripts.string_api import string_network
# Get network for single protein
network = string_network('9606.ENSP00000269305', species=9606)
# Get network with multiple proteins
proteins = ['9606.ENSP00000269305', '9606.ENSP00000275493']
network = string_network(proteins, required_score=700)
# Expand network with additional interactors
network = string_network('TP53', species=9606, add_nodes=10, required_score=400)
# Physical interactions only
network = string_network('TP53', species=9606, network_type='physical')
Parameters:
- required_score: Confidence threshold (0-1000)
- network_type: 'functional' (all evidence, default) or 'physical' (direct binding only)
- add_nodes: Add N most connected proteins (0-10)
Output columns: Interaction pairs, confidence scores, and individual evidence scores (neighborhood, fusion, coexpression, experimental, database, text-mining).
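As a sketch of post-filtering a network table locally, the example below keeps only edges at or above a chosen threshold. It assumes that the tabular output reports `score` on a 0-1 scale while `required_score` uses 0-1000, and that the columns are named `preferredName_A`, `preferredName_B`, and `score`. The rows are hand-written (XYZ1 is a fictional protein), and `edges_above` is a hypothetical helper.

```python
import csv
import io

# Hand-written sample rows; column names and the 0-1 score scale are assumptions
sample_network = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "TP53\tMDM2\t0.999\n"
    "TP53\tATM\t0.72\n"
    "TP53\tXYZ1\t0.45\n"
)


def edges_above(tsv_text, required_score):
    """Return (protein A, protein B, score) tuples at or above a 0-1000 threshold."""
    cutoff = required_score / 1000  # convert to the assumed 0-1 output scale
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["preferredName_A"], row["preferredName_B"], float(row["score"]))
        for row in reader
        if float(row["score"]) >= cutoff
    ]


high_confidence = edges_above(sample_network, 700)
print(high_confidence)
```

Filtering locally lets one network download serve several thresholds without repeating the query.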
string_network_image()
Generate a network visualization as a PNG image.
When to use: Creating figures, visual exploration, presentations.
Usage:
from scripts.string_api import string_network_image
# Get network image
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
img_data = string_network_image(proteins, species=9606, required_score=700)
# Save image
with open('network.png', 'wb') as f:
    f.write(img_data)
# Evidence-colored network
img = string_network_image(proteins, species=9606, network_flavor='evidence')
# Confidence-based visualization
img = string_network_image(proteins, species=9606, network_flavor='confidence')
# Actions network (activation/inhibition)
img = string_network_image(proteins, species=9606, network_flavor='actions')
Network flavors:
- 'evidence': Colored lines show evidence types (default)
- 'confidence': Line thickness represents confidence
- 'actions': Shows activating/inhibiting relationships
string_interaction_partners()
Find all proteins that interact with the given protein(s).
When to use: Discovering novel interactions, finding hub proteins, expanding networks.
Usage:
from scripts.string_api import string_interaction_partners
# Get top 10 interactors of TP53
partners = string_interaction_partners('TP53', species=9606, limit=10)
# Get high-confidence interactors
partners = string_interaction_partners('TP53', species=9606,
                                       limit=20, required_score=700)
# Find interactors for multiple proteins
partners = string_interaction_partners(['TP53', 'MDM2'],
                                       species=9606, limit=15)
Parameters:
- limit: Maximum number of partners to return (default: 10)
- required_score: Confidence threshold (0-1000)
Use cases:
string_enrichment()
Perform enrichment analysis across Gene Ontology, KEGG pathways, Pfam domains, and more.
When to use: Interpreting protein lists, pathway analysis, functional characterization, understanding biological processes.
Usage:
import io
import pandas as pd
from scripts.string_api import string_enrichment
# Enrichment for a protein list
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1', 'ATR', 'TP73']
enrichment = string_enrichment(proteins, species=9606)
# Parse the tab-separated results to find significant terms
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
significant = df[df['fdr'] < 0.05]
Enrichment categories:
Output columns:
- category: Annotation database (e.g., "KEGG Pathways", "GO Biological Process")
- term: Term identifier
- description: Human-readable term description
- number_of_genes: Input proteins with this annotation
- p_value: Uncorrected enrichment p-value
- fdr: False discovery rate (corrected p-value)
Statistical method: Fisher's exact test with Benjamini-Hochberg FDR correction.
Interpretation: FDR < 0.05 indicates statistically significant enrichment.
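To show what the Benjamini-Hochberg correction does to raw p-values, here is a small standalone sketch of the textbook procedure. It is not STRING's own code, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005]  # made-up example values
fdr = benjamini_hochberg(p_values)
significant = [p for p, q in zip(p_values, fdr) if q < 0.05]
print(fdr, significant)
```

Each adjusted value is at least as large as the raw p-value, which is why a term can have a small p_value yet miss the FDR < 0.05 cutoff when many terms are tested.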
string_ppi_enrichment()
Test whether a protein network has significantly more interactions than expected by chance.
When to use: Validating whether proteins form a functional module, testing network connectivity.
Usage:
from scripts.string_api import string_ppi_enrichment
import json
# Test network connectivity
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
result = string_ppi_enrichment(proteins, species=9606, required_score=400)
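# Parse JSON result; the payload may be a one-element list, so unwrap it if needed
data = json.loads(result)
if isinstance(data, list):
    data = data[0]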
print(f"Observed edges: {data['number_of_edges']}")
print(f"Expected edges: {data['expected_number_of_edges']}")
print(f"P-value: {data['p_value']}")
Output fields:
- number_of_nodes: Proteins in network
- number_of_edges: Observed interactions
- expected_number_of_edges: Expected in random network
- p_value: Statistical significance
Interpretation:
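One way to read these fields together is to compare observed and expected edge counts alongside the p-value. The sketch below does that under two assumptions: the payload may be either a dict or a one-element list of dicts, and 0.05 is used as a conventional significance cutoff. `summarize_ppi` is a hypothetical helper and the sample numbers are invented.

```python
import json


def summarize_ppi(result_text, alpha=0.05):
    """Summarize a PPI enrichment JSON payload (hypothetical helper)."""
    data = json.loads(result_text)
    # Assumption: the payload is a dict or a one-element list of dicts
    if isinstance(data, list):
        data = data[0]
    observed = data["number_of_edges"]
    expected = data["expected_number_of_edges"]
    return {
        "observed": observed,
        "expected": expected,
        # Ratio above 1 means more interactions than the random expectation
        "ratio": observed / expected if expected else float("inf"),
        "enriched": data["p_value"] < alpha,
    }


# Invented numbers for illustration only
sample = (
    '[{"number_of_nodes": 5, "number_of_edges": 10, '
    '"expected_number_of_edges": 2, "p_value": 0.000001}]'
)
summary = summarize_ppi(sample)
print(summary)
```

A small p-value with an observed-to-expected ratio well above 1 suggests the proteins are more interconnected than a random set of the same size.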
string_homology()
Retrieve protein similarity and homology information.
When to use: Identifying protein families, paralog analysis, cross-species comparisons.
Usage:
from scripts.string_api import string_homology
# Get homology between proteins
proteins = ['TP53', 'TP63', 'TP73'] # p53 family
homology = string_homology(proteins, species=9606)
Use cases:
string_version()
Get the current STRING database version.
When to use: Ensuring reproducibility, documenting methods.
Usage:
from scripts.string_api import string_version
version = string_version()
print(f"STRING version: {version}")
Use case: Analyze a list of proteins from an experiment (e.g., differential expression, proteomics).
from scripts.string_api import (string_map_ids, string_network,
                                string_enrichment, string_ppi_enrichment,
                                string_network_image)
# Step 1: Map gene names to STRING IDs
gene_list = ['TP53', 'BRCA1', 'ATM', 'CHEK2', 'MDM2', 'ATR', 'BRCA2']
mapping = string_map_ids(gene_list, species=9606)
# Step 2: Get interaction network
network = string_network(gene_list, species=9606, required_score=400)
# Step 3: Test if network is enriched
ppi_result = string_ppi_enrichment(gene_list, species=9606)
# Step 4: Perform functional enrichment
enrichment = string_enrichment(gene_list, species=9606)
# Step 5: Generate network visualization
img = string_network_image(gene_list, species=9606,
                           network_flavor='evidence', required_score=400)
with open('protein_network.png', 'wb') as f:
    f.write(img)
# Step 6: Parse and interpret results
Use case: Deep dive into one protein's interactions and partners.
from scripts.string_api import (string_map_ids, string_interaction_partners,
                                string_network_image)
# Step 1: Map protein name
protein = 'TP53'
mapping = string_map_ids(protein, species=9606)
# Step 2: Get all interaction partners
partners = string_interaction_partners(protein, species=9606,
                                       limit=20, required_score=700)
# Step 3: Visualize expanded network
img = string_network_image(protein, species=9606, add_nodes=15,
                           network_flavor='confidence', required_score=700)
with open('tp53_network.png', 'wb') as f:
    f.write(img)
Use case: Identify and visualize proteins in a specific biological pathway.
from scripts.string_api import string_enrichment, string_network
# Step 1: Start with known pathway proteins
dna_repair_proteins = ['TP53', 'ATM', 'ATR', 'CHEK1', 'CHEK2',
                       'BRCA1', 'BRCA2', 'RAD51', 'XRCC1']
# Step 2: Get network
network = string_network(dna_repair_proteins, species=9606,
                         required_score=700, add_nodes=5)
# Step 3: Enrichment to confirm pathway annotation
enrichment = string_enrichment(dna_repair_proteins, species=9606)
# Step 4: Parse enrichment for DNA repair pathways
import pandas as pd
import io
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
dna_repair = df[df['description'].str.contains('DNA repair', case=False)]
Use case: Compare protein interactions across different organisms.
from scripts.string_api import string_network
# Human network
human_network = string_network('TP53', species=9606, required_score=700)
# Mouse network
mouse_network = string_network('Trp53', species=10090, required_score=700)
# Yeast network (if ortholog exists)
yeast_network = string_network('gene_name', species=4932, required_score=700)
Use case: Start with seed proteins and discover connected functional modules.
from scripts.string_api import (string_interaction_partners, string_network,
                                string_enrichment)
# Step 1: Start with seed protein(s)
seed_proteins = ['TP53']
# Step 2: Get first-degree interactors
partners = string_interaction_partners(seed_proteins, species=9606,
                                       limit=30, required_score=700)
# Step 3: Parse partners to get protein list
import pandas as pd
import io
df = pd.read_csv(io.StringIO(partners), sep='\t')
all_proteins = list(set(df['preferredName_A'].tolist() +
                        df['preferredName_B'].tolist()))
# Step 4: Perform enrichment on expanded network
enrichment = string_enrichment(all_proteins[:50], species=9606)
# Step 5: Filter for interesting functional modules
enrichment_df = pd.read_csv(io.StringIO(enrichment), sep='\t')
modules = enrichment_df[enrichment_df['fdr'] < 0.001]
When specifying species, use NCBI taxon IDs:
| Organism | Common Name | Taxon ID |
|---|---|---|
| Homo sapiens | Human | 9606 |
| Mus musculus | Mouse | 10090 |
| Rattus norvegicus | Rat | 10116 |
| Drosophila melanogaster | Fruit fly | 7227 |
| Caenorhabditis elegans | C. elegans | 6239 |
| Saccharomyces cerevisiae | Yeast | 4932 |
| Arabidopsis thaliana | Thale cress | 3702 |
| Escherichia coli | E. coli | 511145 |
| Danio rerio | Zebrafish | 7955 |
Full list available at: https://string-db.org/cgi/input?input_page_active_form=organisms
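STRING identifiers such as 9606.ENSP00000269305 carry the taxon ID as a prefix before the first dot, so the species can be recovered from the identifier itself. A minimal sketch, with `split_string_id` as a hypothetical helper:

```python
def split_string_id(string_id):
    """Split a STRING identifier into (taxon ID, protein ID)."""
    taxon, _, protein = string_id.partition(".")
    # Reject names that lack the numeric taxon prefix, such as plain gene symbols
    if not taxon.isdigit() or not protein:
        raise ValueError(f"Not a STRING identifier: {string_id!r}")
    return int(taxon), protein


taxon_id, protein_id = split_string_id("9606.ENSP00000269305")
print(taxon_id, protein_id)
```

A check like this is a cheap way to catch gene names that were passed where mapped STRING IDs were expected.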
STRING provides combined confidence scores (0-1000) integrating multiple evidence types:
Choose threshold based on analysis goals:
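As a rough guide to picking a value, the sketch below buckets a combined score into named bands. The cutoffs 150, 400, 700, and 900 are the low, medium, high, and highest confidence levels conventionally offered in the STRING web interface. They are stated here as an assumption, and `confidence_band` is a hypothetical helper.

```python
def confidence_band(score):
    """Map a 0-1000 combined score to a confidence band (cutoffs are assumptions)."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    if score >= 900:
        return "highest"
    if score >= 700:
        return "high"
    if score >= 400:
        return "medium"
    if score >= 150:
        return "low"
    return "below low-confidence cutoff"


for value in (950, 700, 400, 100):
    print(value, confidence_band(value))
```

The examples in this document use 400 for exploratory networks and 700 where fewer false positives matter more than coverage.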
Trade-offs:
Functional networks (network_type='functional', the default) include all evidence types (experimental, computational, text-mining). They represent proteins that are functionally associated, even without direct physical binding.
When to use:
Physical networks (network_type='physical') include only evidence for direct physical binding (experimental data and database annotations for physical interactions).
When to use:
- Use string_map_ids() before other operations for faster queries
- Use STRING IDs such as 9606.ENSP00000269305 instead of gene names
For comprehensive API documentation, complete parameter lists, output formats, and advanced usage, refer to references/string_reference.md. This includes:
No proteins found:
- Map identifiers with string_map_ids()
Empty network results:
- Lower the confidence threshold (required_score)
Timeout or slow queries:
"Species required" error:
- Specify the species parameter for networks with >10 proteins
Results look unexpected:
- Check the STRING version with string_version()
For proteome-scale analysis or complete species network upload:
For bulk downloads of complete datasets:
STRING data is freely available under Creative Commons BY 4.0 license:
When using STRING in publications, cite the most recent publication from: https://string-db.org/cgi/about