STRING Database API Skill: Protein Interaction Network Queries and Functional Enrichment Analysis

string-database by davila7/claude-code-templates

137 weekly installs

23,400 GitHub stars

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill string-database


STRING Database

Overview

STRING is a comprehensive database of known and predicted protein-protein interactions, covering 59M proteins and more than 20B interactions across 5000+ organisms. Use its REST API to query interaction networks, run functional enrichment analysis, and discover interaction partners for systems biology and pathway analysis.

When to Use This Skill

This skill should be used when:

Retrieving protein-protein interaction networks for single or multiple proteins
Performing functional enrichment analysis (GO, KEGG, Pfam) on protein lists
Discovering interaction partners and expanding protein networks
Testing if proteins form significantly enriched functional modules
Generating network visualizations with evidence-based coloring
Analyzing homology and protein family relationships
Conducting cross-species protein interaction comparisons
Identifying hub proteins and network connectivity patterns

Quick Start

The skill provides:

Python helper functions (scripts/string_api.py) for all STRING REST API operations
Comprehensive reference documentation (references/string_reference.md) with detailed API specifications

When users request STRING data, determine which operation is needed and use the appropriate function from scripts/string_api.py.

Core Operations

1. Identifier Mapping (`string_map_ids`)

Convert gene names, protein names, and external IDs to STRING identifiers.

When to use: Starting any STRING analysis, validating protein names, finding canonical identifiers.

Usage:

from scripts.string_api import string_map_ids

# Map single protein
result = string_map_ids('TP53', species=9606)

# Map multiple proteins
result = string_map_ids(['TP53', 'BRCA1', 'EGFR', 'MDM2'], species=9606)

# Map with multiple matches per query
result = string_map_ids('p53', species=9606, limit=5)

Parameters:

species: NCBI taxon ID (9606 = human, 10090 = mouse, 7227 = fly)
limit: Number of matches per identifier (default: 1)
echo_query: Include query term in output (default: 1)

Best practice: Always map identifiers first for faster subsequent queries.
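To illustrate how a mapping result might be reused in later calls, the sketch below turns mapping output into a lookup from query term to STRING ID. It assumes that `string_map_ids` returns tab-separated text with `queryItem`, `stringId`, and `preferredName` columns; `build_id_lookup` is a hypothetical helper, and the sample text is hand-written for illustration rather than real API output.

```python
import csv
import io


def build_id_lookup(mapping_tsv):
    """Return {query term: STRING ID} from tab-separated mapping text.

    Only the first match for each query term is kept.
    """
    lookup = {}
    reader = csv.DictReader(io.StringIO(mapping_tsv), delimiter="\t")
    for row in reader:
        lookup.setdefault(row["queryItem"], row["stringId"])
    return lookup


# Hand-written sample in the assumed column layout (not real API output).
sample = (
    "queryItem\tstringId\tpreferredName\n"
    "TP53\t9606.ENSP00000269305\tTP53\n"
    "p53\t9606.ENSP00000269305\tTP53\n"
)

lookup = build_id_lookup(sample)
print(lookup["TP53"])
```

The resulting IDs can then be passed to the other helpers in place of gene names.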

2. Network Retrieval (`string_network`)

Get protein-protein interaction network data in tabular format.

When to use: Building interaction networks, analyzing connectivity, retrieving interaction evidence.

Usage:

from scripts.string_api import string_network

# Get network for single protein
network = string_network('9606.ENSP00000269305', species=9606)

# Get network with multiple proteins
proteins = ['9606.ENSP00000269305', '9606.ENSP00000275493']
network = string_network(proteins, required_score=700)

# Expand network with additional interactors
network = string_network('TP53', species=9606, add_nodes=10, required_score=400)

# Physical interactions only
network = string_network('TP53', species=9606, network_type='physical')

Parameters:

required_score: Confidence threshold (0-1000)
- 150: low confidence (exploratory)
- 400: medium confidence (default, standard analysis)
- 700: high confidence (conservative)
- 900: highest confidence (very stringent)
network_type: 'functional' (all evidence, default) or 'physical' (direct binding only)
add_nodes: Add N most connected proteins (0-10)

Output columns: Interaction pairs, combined confidence score, and individual evidence scores (neighborhood, fusion, phylogenetic profile, coexpression, experimental, database, text-mining).
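The tabular output can also be filtered locally after retrieval. The sketch below assumes the result is tab-separated text with `preferredName_A`, `preferredName_B`, and a combined `score` column on a 0-1 scale, and converts that score to the 0-1000 scale used by `required_score`. `filter_edges` is a hypothetical helper, and the sample rows (including `GENE_X`) are invented for illustration.

```python
import csv
import io


def filter_edges(network_tsv, min_score=700):
    """Return (name_a, name_b, score) tuples at or above min_score (0-1000 scale)."""
    edges = []
    reader = csv.DictReader(io.StringIO(network_tsv), delimiter="\t")
    for row in reader:
        # Convert the assumed 0-1 score to the 0-1000 scale of required_score.
        score = round(float(row["score"]) * 1000)
        if score >= min_score:
            edges.append((row["preferredName_A"], row["preferredName_B"], score))
    return edges


# Invented sample rows in the assumed column layout (not real API output).
sample = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "TP53\tMDM2\t0.999\n"
    "TP53\tGENE_X\t0.450\n"
)

high_confidence = filter_edges(sample, min_score=700)
print(high_confidence)
```

Filtering locally lets one medium-confidence download be reused at several stricter thresholds without extra API calls.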

3. Network Visualization (`string_network_image`)

Generate a network visualization as a PNG image.

When to use: Creating figures, visual exploration, presentations.

Usage:

from scripts.string_api import string_network_image

# Get network image
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
img_data = string_network_image(proteins, species=9606, required_score=700)

# Save image
with open('network.png', 'wb') as f:
    f.write(img_data)

# Evidence-colored network
img = string_network_image(proteins, species=9606, network_flavor='evidence')

# Confidence-based visualization
img = string_network_image(proteins, species=9606, network_flavor='confidence')

# Actions network (activation/inhibition)
img = string_network_image(proteins, species=9606, network_flavor='actions')

Network flavors:

'evidence': Colored lines show evidence types (default)
'confidence': Line thickness represents confidence
'actions': Shows activating/inhibiting relationships

4. Interaction Partners (`string_interaction_partners`)

Find all proteins that interact with given protein(s).

When to use: Discovering novel interactions, finding hub proteins, expanding networks.

Usage:

from scripts.string_api import string_interaction_partners

# Get top 10 interactors of TP53
partners = string_interaction_partners('TP53', species=9606, limit=10)

# Get high-confidence interactors
partners = string_interaction_partners('TP53', species=9606,
                                      limit=20, required_score=700)

# Find interactors for multiple proteins
partners = string_interaction_partners(['TP53', 'MDM2'],
                                      species=9606, limit=15)

Parameters:

limit: Maximum number of partners to return (default: 10)
required_score: Confidence threshold (0-1000)

Use cases:

Hub protein identification
Network expansion from seed proteins
Discovering indirect connections
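Hub identification can be sketched as a simple degree count over an edge list. The example below assumes the partner results have already been reduced to pairs of protein names; `rank_hubs` is a hypothetical helper and the edge list is invented for illustration.

```python
from collections import Counter


def rank_hubs(edges, top_n=3):
    """Rank proteins by degree (number of distinct partners) in an edge list."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = Counter({protein: len(partners) for protein, partners in neighbors.items()})
    return degree.most_common(top_n)


# Invented edge list for illustration only; the last pair repeats the first
# in reverse order and is counted once.
edges = [("TP53", "MDM2"), ("TP53", "ATM"), ("TP53", "CHEK2"),
         ("ATM", "CHEK2"), ("MDM2", "TP53")]

hubs = rank_hubs(edges)
print(hubs)
```

Degree is only one possible hub criterion; weighting edges by confidence score would be a natural refinement.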

5. Functional Enrichment (`string_enrichment`)

Perform enrichment analysis across Gene Ontology, KEGG pathways, Pfam domains, and more.

When to use: Interpreting protein lists, pathway analysis, functional characterization, understanding biological processes.

Usage:

import io

import pandas as pd

from scripts.string_api import string_enrichment

# Enrichment for a protein list
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1', 'ATR', 'TP73']
enrichment = string_enrichment(proteins, species=9606)

# Parse the tab-separated result and keep terms with FDR below 0.05
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
significant = df[df['fdr'] < 0.05]

Enrichment categories:

Gene Ontology: Biological Process, Molecular Function, Cellular Component
KEGG Pathways: Metabolic and signaling pathways
Pfam: Protein domains
InterPro: Protein families and domains
SMART: Domain architecture
UniProt Keywords: Curated functional keywords

Output columns:

category: Annotation database (e.g., "KEGG Pathways", "GO Biological Process")
term: Term identifier
description: Human-readable term description
number_of_genes: Input proteins with this annotation
p_value: Uncorrected enrichment p-value
fdr: False discovery rate (corrected p-value)

Statistical method: Fisher's exact test with Benjamini-Hochberg FDR correction.

Interpretation: FDR < 0.05 indicates statistically significant enrichment.
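To make the `fdr` column more concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment applied to a list of raw p-values. STRING computes the FDR on its side, so this illustrates what the corrected values mean and is not STRING's own implementation. The p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical terms.
p_values = [0.001, 0.02, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
significant = [p for p, q in zip(p_values, fdr) if q < 0.05]
print(fdr)
```

Each p-value is scaled by the number of tests divided by its rank, which is why a raw p-value of 0.03 can still pass a 0.05 FDR cutoff when several smaller p-values precede it.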

6. PPI Enrichment (`string_ppi_enrichment`)

Test if a protein network has significantly more interactions than expected by chance.

When to use: Validating whether proteins form a functional module, testing network connectivity.

Usage:

from scripts.string_api import string_ppi_enrichment
import json

# Test network connectivity
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
result = string_ppi_enrichment(proteins, species=9606, required_score=400)

# Parse JSON result (STRING's JSON output may wrap the record in a list)
data = json.loads(result)
if isinstance(data, list):
    data = data[0]
print(f"Observed edges: {data['number_of_edges']}")
print(f"Expected edges: {data['expected_number_of_edges']}")
print(f"P-value: {data['p_value']}")

Output fields:

number_of_nodes: Proteins in network
number_of_edges: Observed interactions
expected_number_of_edges: Expected in random network
p_value: Statistical significance

Interpretation:

p-value < 0.05: Network is significantly enriched (proteins likely form a functional module)
p-value ≥ 0.05: No significant enrichment (proteins may be unrelated)
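The interpretation rule can be wrapped in a small function that also reports the observed-to-expected edge ratio. This sketch assumes the result has already been parsed into a dictionary with the output fields listed above; `summarize_ppi_enrichment` is a hypothetical helper and the example numbers are invented.

```python
def summarize_ppi_enrichment(result, alpha=0.05):
    """Summarize a parsed PPI enrichment record using the fields listed above."""
    observed = result["number_of_edges"]
    expected = result["expected_number_of_edges"]
    # Guard against division by zero when no edges are expected.
    ratio = observed / expected if expected else float("inf")
    return {
        "observed_to_expected": ratio,
        "significant": result["p_value"] < alpha,
    }


# Invented example record (not real API output).
example = {
    "number_of_nodes": 5,
    "number_of_edges": 10,
    "expected_number_of_edges": 2,
    "p_value": 1e-6,
}

summary = summarize_ppi_enrichment(example)
print(summary)
```

Reporting the ratio alongside the p-value helps separate a large effect from one that is statistically significant but small.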

7. Homology Scores (`string_homology`)

Retrieve protein similarity and homology information.

When to use: Identifying protein families, paralog analysis, cross-species comparisons.

Usage:

from scripts.string_api import string_homology

# Get homology between proteins
proteins = ['TP53', 'TP63', 'TP73']  # p53 family
homology = string_homology(proteins, species=9606)

Use cases:

Protein family identification
Paralog discovery
Evolutionary analysis

8. Version Information (`string_version`)

Get current STRING database version.

When to use: Ensuring reproducibility, documenting methods.

Usage:

from scripts.string_api import string_version

version = string_version()
print(f"STRING version: {version}")

Common Analysis Workflows

Workflow 1: Protein List Analysis (Standard Workflow)

Use case: Analyze a list of proteins from an experiment (e.g., differential expression, proteomics).

from scripts.string_api import (string_map_ids, string_network,
                                string_enrichment, string_ppi_enrichment,
                                string_network_image)

# Step 1: Map gene names to STRING IDs
gene_list = ['TP53', 'BRCA1', 'ATM', 'CHEK2', 'MDM2', 'ATR', 'BRCA2']
mapping = string_map_ids(gene_list, species=9606)

# Step 2: Get interaction network
network = string_network(gene_list, species=9606, required_score=400)

# Step 3: Test if network is enriched
ppi_result = string_ppi_enrichment(gene_list, species=9606)

# Step 4: Perform functional enrichment
enrichment = string_enrichment(gene_list, species=9606)

# Step 5: Generate network visualization
img = string_network_image(gene_list, species=9606,
                          network_flavor='evidence', required_score=400)
with open('protein_network.png', 'wb') as f:
    f.write(img)

# Step 6: Parse and interpret results

Workflow 2: Single Protein Investigation

Use case: Deep dive into one protein's interactions and partners.

from scripts.string_api import (string_map_ids, string_interaction_partners,
                                string_network_image)

# Step 1: Map protein name
protein = 'TP53'
mapping = string_map_ids(protein, species=9606)

# Step 2: Get all interaction partners
partners = string_interaction_partners(protein, species=9606,
                                      limit=20, required_score=700)

# Step 3: Visualize expanded network
img = string_network_image(protein, species=9606, add_nodes=10,
                          network_flavor='confidence', required_score=700)
with open('tp53_network.png', 'wb') as f:
    f.write(img)

Workflow 3: Pathway-Centric Analysis

Use case: Identify and visualize proteins in a specific biological pathway.

from scripts.string_api import string_enrichment, string_network

# Step 1: Start with known pathway proteins
dna_repair_proteins = ['TP53', 'ATM', 'ATR', 'CHEK1', 'CHEK2',
                       'BRCA1', 'BRCA2', 'RAD51', 'XRCC1']

# Step 2: Get network
network = string_network(dna_repair_proteins, species=9606,
                        required_score=700, add_nodes=5)

# Step 3: Enrichment to confirm pathway annotation
enrichment = string_enrichment(dna_repair_proteins, species=9606)

# Step 4: Parse enrichment for DNA repair pathways
import pandas as pd
import io
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
dna_repair = df[df['description'].str.contains('DNA repair', case=False)]

Workflow 4: Cross-Species Analysis

Use case: Compare protein interactions across different organisms.

from scripts.string_api import string_network

# Human network
human_network = string_network('TP53', species=9606, required_score=700)

# Mouse network
mouse_network = string_network('Trp53', species=10090, required_score=700)

# Yeast network (replace 'gene_name' with the yeast ortholog's name, if one exists)
yeast_network = string_network('gene_name', species=4932, required_score=700)

Workflow 5: Network Expansion and Discovery

Use case: Start with seed proteins and discover connected functional modules.

from scripts.string_api import (string_interaction_partners, string_network,
                                string_enrichment)

# Step 1: Start with seed protein(s)
seed_proteins = ['TP53']

# Step 2: Get first-degree interactors
partners = string_interaction_partners(seed_proteins, species=9606,
                                      limit=30, required_score=700)

# Step 3: Parse partners to get protein list
import pandas as pd
import io
df = pd.read_csv(io.StringIO(partners), sep='\t')
all_proteins = list(set(df['preferredName_A'].tolist() +
                       df['preferredName_B'].tolist()))

# Step 4: Perform enrichment on expanded network
enrichment = string_enrichment(all_proteins[:50], species=9606)

# Step 5: Filter for interesting functional modules
enrichment_df = pd.read_csv(io.StringIO(enrichment), sep='\t')
modules = enrichment_df[enrichment_df['fdr'] < 0.001]

Common Species

When specifying species, use NCBI taxon IDs:

Organism	Common Name	Taxon ID
Homo sapiens	Human	9606
Mus musculus	Mouse	10090
Rattus norvegicus	Rat	10116
Drosophila melanogaster	Fruit fly	7227
Caenorhabditis elegans	C. elegans	6239
Saccharomyces cerevisiae	Yeast	4932
Arabidopsis thaliana	Thale cress	3702
Escherichia coli	E. coli	511145
Danio rerio	Zebrafish	7955

Full list available at: https://string-db.org/cgi/input?input_page_active_form=organisms

Understanding Confidence Scores

STRING provides combined confidence scores (0-1000) integrating multiple evidence types:

Evidence Channels

Neighborhood (nscore): Conserved genomic neighborhood across species
Fusion (fscore): Gene fusion events
Phylogenetic Profile (pscore): Co-occurrence patterns across species
Coexpression (ascore): Correlated RNA expression
Experimental (escore): Biochemical and genetic experiments
Database (dscore): Curated pathway and complex databases
Text-mining (tscore): Literature co-occurrence and NLP extraction
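As a rough illustration of how several channel scores could be integrated into one combined score, the sketch below multiplies the probabilities that each channel is wrong, after removing a shared prior from every channel and adding it back once at the end. This is a simplified sketch, not STRING's exact procedure: the prior of 0.041 is an assumption, the homology correction that STRING applies to some channels is omitted, and `combine_scores` is a hypothetical helper, so results will not exactly reproduce STRING's values.

```python
def combine_scores(channel_scores, prior=0.041):
    """Combine per-channel scores (0-1000 scale) into one score (0-1000 scale).

    The prior value is an assumed default; homology correction is not applied.
    """
    remaining = 1.0  # probability that every channel is wrong
    for score in channel_scores:
        s = max(score / 1000.0, prior)
        # Remove the prior so it is not counted once per channel.
        s_no_prior = (s - prior) / (1.0 - prior)
        remaining *= 1.0 - s_no_prior
    # Add the prior back exactly once.
    combined = (1.0 - remaining) * (1.0 - prior) + prior
    return round(combined * 1000)


single = combine_scores([900])
double = combine_scores([900, 800])
print(single, double)
```

Under this scheme, agreement between independent channels pushes the combined score above any single channel's score.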

Recommended Thresholds

Choose a threshold based on analysis goals:

150 (low confidence): Exploratory analysis, hypothesis generation
400 (medium confidence): Standard analysis, balanced sensitivity/specificity
700 (high confidence): Conservative analysis, high-confidence interactions
900 (highest confidence): Very stringent, experimental evidence preferred

Trade-offs:

Lower thresholds: More interactions (higher recall, more false positives)
Higher thresholds: Fewer interactions (higher precision, more false negatives)

Network Types

Functional Networks (Default)

Includes all evidence types (experimental, computational, text-mining). Represents proteins that are functionally associated, even without direct physical binding.

When to use:

Pathway analysis
Functional enrichment studies
Systems biology
Most general analyses

Physical Networks

Only includes evidence for direct physical binding (experimental data and database annotations for physical interactions).

When to use:

Structural biology studies
Protein complex analysis
Direct binding validation
When physical contact is required

API Best Practices

Always map identifiers first: Use string_map_ids() before other operations for faster queries
Use STRING IDs when possible: Use the format 9606.ENSP00000269305 instead of gene names
Specify species for networks with more than 10 proteins: Required for accurate results
Respect rate limits: Wait 1 second between API calls
Use versioned URLs for reproducibility: Available in the reference documentation
Handle errors gracefully: Check for an "Error:" prefix in returned strings
Choose appropriate confidence thresholds: Match the threshold to the analysis goals
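Two of these practices, pausing between calls and checking for the "Error:" prefix, can be combined in one wrapper. The sketch below is one possible approach; `call_politely` and `StringApiError` are hypothetical names, and `fake_helper` stands in for a real function from scripts/string_api.py so the example runs without network access.

```python
import time


class StringApiError(RuntimeError):
    """Raised when a helper returns text that starts with 'Error:'."""


def call_politely(func, *args, delay=1.0, **kwargs):
    """Call a helper, raise on an 'Error:' result, then pause before returning."""
    result = func(*args, **kwargs)
    if isinstance(result, str) and result.startswith("Error:"):
        raise StringApiError(result)
    time.sleep(delay)  # leave a gap before the caller's next request
    return result


def fake_helper(name):
    """Stand-in for a real helper; returns an error string for the name 'bad'."""
    return "Error: not found" if name == "bad" else f"data for {name}"


print(call_politely(fake_helper, "TP53", delay=0))
```

In real use the default one-second delay would be kept. It is set to zero here only so the example finishes immediately.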

Detailed Reference

For comprehensive API documentation, complete parameter lists, output formats, and advanced usage, refer to references/string_reference.md. This includes:

Complete API endpoint specifications
All supported output formats (TSV, JSON, XML, PSI-MI)
Advanced features (bulk upload, values/ranks enrichment)
Error handling and troubleshooting
Integration with other tools (Cytoscape, R, Python libraries)
Data license and citation information

Troubleshooting

No proteins found:

Verify species parameter matches identifiers
Try mapping identifiers first with string_map_ids()
Check for typos in protein names

Empty network results:

Lower confidence threshold (required_score)
Check if proteins actually interact
Verify species is correct

Timeout or slow queries:

Reduce number of input proteins
Use STRING IDs instead of gene names
Split large queries into batches
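Splitting a large query into batches can be sketched with a small chunking function. `batched` is a hypothetical helper, and the batch size used here is arbitrary rather than a documented STRING limit.

```python
def batched(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


proteins = ["TP53", "MDM2", "ATM", "CHEK2", "BRCA1"]
batches = batched(proteins, 2)
print(batches)
```

Each batch can then be passed to a helper in turn, with a pause between calls.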

"Species required" error:

Add species parameter for networks with >10 proteins
Always include species for consistency

Results look unexpected:

Check STRING version with string_version()
Verify network_type is appropriate (functional vs physical)
Review confidence threshold selection

Additional Resources

For proteome-scale analysis or complete species network upload:

Visit https://string-db.org
Use the "Upload proteome" feature
STRING will generate a complete interaction network and predict functions

For bulk downloads of complete datasets:

Download page: https://string-db.org/cgi/download
Includes complete interaction files, protein annotations, and pathway mappings

Data License

STRING data is freely available under the Creative Commons BY 4.0 license:

Free for academic and commercial use
Attribution required when publishing
Cite the latest STRING publication

Citation

When using STRING in publications, cite the most recent publication from: https://string-db.org/cgi/about

First Seen

Jan 21, 2026
