BioServices Python Package: A Bioinformatics Web Service Integration Tool Supporting 40+ Databases Including UniProt, KEGG, and BLAST

bioservices by davila7/claude-code-templates

193 weekly installs

24,100 GitHub Stars

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill bioservices

Python · Web Frameworks · Bioinformatics · Data Processing

BioServices

Overview

BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.

When to Use This Skill

This skill should be used when:

Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
Analyzing metabolic pathways and gene functions via KEGG or Reactome
Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
Running sequence similarity searches (BLAST, MUSCLE alignment)
Querying gene ontology terms (QuickGO, GO annotations)
Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
Mining genomic data (BioMart, ArrayExpress, ENA)
Integrating data from multiple bioinformatics resources in a single workflow

Core Capabilities

1. Protein Analysis

Retrieve protein information, sequences, and functional annotations:

from bioservices import UniProt

u = UniProt(verbose=False)

# Search for protein by name
# (format and column names follow the current UniProt REST API)
results = u.search("ZAP70_HUMAN", frmt="tsv", columns="accession,gene_names,organism_name")

# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")

# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")

Key methods:

search(): Query UniProt with flexible search terms
retrieve(): Get protein entries in various formats (FASTA, XML, tab)
mapping(): Convert identifiers between databases

Reference: references/services_reference.md for complete UniProt API details.
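A tab-separated search result is easier to work with once it is split into records. The following is a minimal sketch using only the standard library; the helper name `parse_uniprot_tsv` and the sample string are illustrative assumptions, not part of BioServices, and the column headers you get back depend on the `columns` you request.

```python
import csv
import io


def parse_uniprot_tsv(text):
    """Turn a tab-separated result (header row plus data rows) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Illustrative stand-in for the string returned by u.search(...); not real output.
sample = "Entry\tGene Names\tOrganism\nP43403\tZAP70\tHomo sapiens (Human)\n"

rows = parse_uniprot_tsv(sample)
for row in rows:
    print(row["Entry"], row["Gene Names"], row["Organism"])
```

Keeping the parser separate from the network call makes it easy to test offline against saved responses.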

2. Pathway Discovery and Analysis

Access KEGG pathway information for genes and organisms:

from bioservices import KEGG

k = KEGG()
k.organism = "hsa"  # Set to human

# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species

# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs

# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa")  # ZAP70 gene

# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)

# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions['relations']  # Protein-protein interactions

# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")

Key methods:

lookfor_organism(), lookfor_pathway(): Search by name
get_pathway_by_gene(): Find pathways containing genes
parse_kgml_pathway(): Extract structured pathway data
pathway2sif(): Get protein interaction networks

Reference: references/workflow_patterns.md for complete pathway analysis workflows.
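To make the link between parsed relations and Simple Interaction Format concrete, here is a small sketch that writes one `source type target` line per relation. It assumes each relation is a dict with `entry1`, `entry2`, and `name` keys, which is how `parse_kgml_pathway()` results are commonly shaped; verify the keys against your installed version. The function name `relations_to_sif` and the sample records are hypothetical.

```python
def relations_to_sif(relations):
    """Build SIF lines ("source interaction target") from relation dicts."""
    lines = []
    for rel in relations:
        source, target = rel.get("entry1"), rel.get("entry2")
        if source is None or target is None:
            continue  # skip incomplete records rather than emit a broken edge
        kind = rel.get("name") or "unknown"
        lines.append(f"{source} {kind} {target}")
    return lines


# Hypothetical relations, shaped like interactions['relations'] is assumed to be.
sample_relations = [
    {"entry1": "10", "entry2": "20", "name": "activation"},
    {"entry1": "20", "entry2": "30", "name": "inhibition"},
    {"entry1": "30", "name": "binding"},  # missing entry2, so it is skipped
]

sif_lines = relations_to_sif(sample_relations)
print("\n".join(sif_lines))
```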

3. Compound Database Searches

Search and cross-reference compounds across multiple databases:

from bioservices import KEGG, UniChem

k = KEGG()

# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222

# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links

# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315

Common workflow:

Search compound by name in KEGG
Extract KEGG compound ID
Use UniChem for KEGG → ChEMBL mapping
ChEBI IDs are often provided in KEGG entries

Reference: references/identifier_mapping.md for complete cross-database mapping guide.
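The first two workflow steps come down to text parsing. The sketch below assumes `k.find()` returns tab-delimited lines (`cpd:ID<TAB>names`) and that `k.get()` returns a KEGG flat file with a `DBLINKS` block. The helper names and the database IDs in the sample entry are placeholders, not real cross-references.

```python
import re


def parse_kegg_find(text):
    """Split 'cpd:C11222<TAB>name1; name2' lines into (id, [names]) tuples."""
    hits = []
    for line in text.strip().splitlines():
        entry_id, _, names = line.partition("\t")
        compound_id = entry_id.split(":", 1)[-1]  # drop the 'cpd:' prefix
        hits.append((compound_id, [n.strip() for n in names.split(";")]))
    return hits


def extract_dblink(flat_file, database):
    """Return the IDs listed for one database in the DBLINKS block, or []."""
    pattern = re.compile(
        rf"^\s*(?:DBLINKS\s+)?{re.escape(database)}:\s*(.+)$", re.MULTILINE
    )
    match = pattern.search(flat_file)
    return match.group(1).split() if match else []


find_output = "cpd:C11222\tGeldanamycin\n"
# Placeholder flat file: the numeric IDs below are made up for illustration.
entry_text = (
    "ENTRY       C11222    Compound\n"
    "NAME        Geldanamycin\n"
    "DBLINKS     PubChem: 11111\n"
    "            ChEBI: 22222\n"
)

hits = parse_kegg_find(find_output)
chebi_ids = extract_dblink(entry_text, "ChEBI")
print(hits, chebi_ids)
```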

4. Sequence Analysis

Run BLAST searches and sequence alignments:

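from bioservices import NCBIblast, UniProt

# Fetch the query sequence (FASTA) for ZAP70 from UniProt
protein_sequence = UniProt(verbose=False).retrieve("P43403", "fasta")

s = NCBIblast(verbose=False)

# Run BLASTP against UniProtKB
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com"  # Required by the EMBL-EBI service that hosts this BLAST
)

# Check job status, then retrieve results once the job has finished
status = s.getStatus(jobid)
results = s.getResult(jobid, "out")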

Note: BLAST jobs are asynchronous. Check status before retrieving results.
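Because jobs are asynchronous, a polling loop is the usual way to wait for completion. The sketch below is deliberately independent of BioServices: it takes any status function (for example `s.getStatus`) and assumes that in-progress jobs report `RUNNING`, `QUEUED`, or `PENDING`. Those status strings and the helper name `wait_for_job` are assumptions to check against the service you use.

```python
import time

PENDING_STATES = {"RUNNING", "QUEUED", "PENDING"}  # assumed in-progress statuses


def wait_for_job(get_status, jobid, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_status(jobid) until it leaves the pending states or polls run out."""
    for _ in range(max_polls):
        status = get_status(jobid)
        if status not in PENDING_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {jobid} still pending after {max_polls} polls")


# Offline demonstration with a fake status source instead of a real BLAST job.
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
final = wait_for_job(lambda _jobid: next(fake_statuses), "demo-job", sleep=lambda _s: None)
print(final)
```

With a real job this would be called as `wait_for_job(s.getStatus, jobid)` before `s.getResult(jobid, "out")`, and the returned status should be checked for failure values.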

5. Identifier Mapping

Convert identifiers between different biological databases:

from bioservices import UniProt, UniChem

# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",              # Target database
    query="P43403"          # Identifier(s) to convert
)

# KEGG gene ID → UniProt
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535")

# For compounds, use UniChem (separate variable so the UniProt client stays available)
uni_chem = UniChem()
chembl_from_kegg = uni_chem.get_compound_id_from_kegg("C11222")

Supported mappings (UniProt):

UniProtKB ↔ KEGG
UniProtKB ↔ Ensembl
UniProtKB ↔ PDB
UniProtKB ↔ RefSeq
And many more (see references/identifier_mapping.md)
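Mapping results are easier to consume once indexed by source identifier. This sketch assumes the response follows the UniProt ID-mapping JSON layout, a `results` list of `{"from": ..., "to": ...}` pairs plus an optional `failedIds` list. The helper name `index_mapping` and the sample response are illustrative; when the target is UniProtKB, `to` may be a full entry record rather than a string.

```python
from collections import defaultdict


def index_mapping(response):
    """Group mapped targets by source ID and collect IDs that failed to map."""
    mapped = defaultdict(list)
    for pair in response.get("results", []):
        mapped[pair["from"]].append(pair["to"])
    return dict(mapped), list(response.get("failedIds", []))


# Illustrative response shaped like the assumed layout; not captured from a live call.
sample_response = {
    "results": [{"from": "P43403", "to": "hsa:7535"}],
    "failedIds": ["NOT_AN_ID"],
}

mapped, failed = index_mapping(sample_response)
print(mapped, failed)
```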

6. Gene Ontology Queries

Access GO terms and annotations:

from bioservices import QuickGO

g = QuickGO(verbose=False)

# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")

# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")

7. Protein-Protein Interactions

Query interaction databases via PSICQUIC:

from bioservices import PSICQUIC

s = PSICQUIC(verbose=False)

# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")

# List available interaction databases
databases = s.activeDBs

Available databases: MINT, IntAct, BioGRID, DIP, and 30+ others.
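PSICQUIC services return interactions in the PSI-MITAB format, where the first two tab-separated columns hold the identifiers of interactors A and B. The following sketch extracts interactor pairs from such rows; it accepts either raw lines or pre-split lists, since the exact return type of `s.query()` may vary by version. The helper names and the sample row are hypothetical.

```python
def _first_id(field):
    """Take the first 'db:identifier' alternative and drop the database prefix."""
    first = field.split("|")[0]
    return first.split(":", 1)[-1]


def mitab_pairs(rows):
    """Return (interactor_a, interactor_b) tuples from MITAB rows."""
    pairs = []
    for row in rows:
        cols = list(row) if isinstance(row, (list, tuple)) else row.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # not a valid MITAB row
        pairs.append((_first_id(cols[0]), _first_id(cols[1])))
    return pairs


# Made-up row for illustration: the second accession is a placeholder.
sample_rows = ["uniprotkb:P43403\tuniprotkb:X00001|intact:EBI-0\t-\t-"]

pairs = mitab_pairs(sample_rows)
print(pairs)
```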

Multi-Service Integration Workflows

BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:

Complete Protein Analysis Pipeline

Execute a full protein characterization workflow:

python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com

This script demonstrates:

UniProt search for protein entry
FASTA sequence retrieval
BLAST similarity search
KEGG pathway discovery
PSICQUIC interaction mapping

Pathway Network Analysis

Analyze all pathways for an organism:

python scripts/pathway_analysis.py hsa output_directory/

Extracts and analyzes:

All pathway IDs for organism
Protein-protein interactions per pathway
Interaction type distributions
Exports to CSV/SIF formats
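The interaction type distribution is a simple frequency count. The sketch below shows one way to compute it, assuming relations are dicts with a `name` key holding the interaction type. It is not taken from `pathway_analysis.py`, and the function name and sample data are hypothetical.

```python
from collections import Counter


def interaction_type_distribution(relations):
    """Map each interaction type to (count, fraction of all relations)."""
    counts = Counter(rel.get("name", "unknown") for rel in relations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: (n, n / total) for kind, n in counts.most_common()}


# Hypothetical relations for demonstration.
sample_relations = [
    {"name": "activation"},
    {"name": "activation"},
    {"name": "inhibition"},
    {"name": "activation"},
]

distribution = interaction_type_distribution(sample_relations)
for kind, (count, fraction) in distribution.items():
    print(f"{kind}: {count} ({fraction:.0%})")
```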

Cross-Database Compound Search

Map compound identifiers across databases:

python scripts/compound_cross_reference.py Geldanamycin

Retrieves:

KEGG compound ID
ChEBI identifier
ChEMBL identifier
Basic compound properties

Batch Identifier Conversion

Convert multiple identifiers at once:

python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
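Bulk conversion is usually done in batches so that no single request carries too many identifiers. The sketch below shows the batching step only. How `batch_id_converter.py` actually implements it is not shown in this document, so the helper names, the batch size, and the comment-line convention are assumptions.

```python
def read_ids(text):
    """Extract identifiers from file content, skipping blank lines and '#' comments."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Illustrative file content; the identifiers after P43403 are placeholders.
file_text = "# proteins of interest\nP43403\nX00001\n\nX00002\n"

ids = read_ids(file_text)
batches = list(chunked(ids, 2))
print(batches)
```

Each batch could then be passed to a mapping call, for example as a comma-joined `query` string, if the installed version accepts that form.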

Best Practices

Output Format Handling

Different services return data in various formats:

XML : Parse using BeautifulSoup (most SOAP services)
Tab-separated (TSV) : Pandas DataFrames for tabular data
Dictionary/JSON : Direct Python manipulation
FASTA : BioPython integration for sequence analysis
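When BioPython is not available, FASTA text can still be split into records with a few lines of standard-library code. This is a minimal sketch, not a replacement for a full parser; the function name `parse_fasta` is hypothetical and the sample sequence is a placeholder, not the real ZAP70 sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated per record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record: the residues below are arbitrary, for illustration only.
sample_fasta = ">sp|P43403|ZAP70_HUMAN example header\nACDEFGHIKL\nMNPQRSTVWY\n"

records = parse_fasta(sample_fasta)
print(records[0][0], len(records[0][1]))
```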

Rate Limiting and Verbosity

Control API request behavior:

from bioservices import KEGG

k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections

Error Handling

Wrap service calls in try-except blocks:

from bioservices import UniProt

u = UniProt(verbose=False)

try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
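Remote services fail transiently, so a retry wrapper with exponential backoff often complements a plain try-except. The sketch below is generic Python rather than a BioServices feature; `call_with_retry` and its parameters are hypothetical names, and the demonstration uses a fake function in place of a real service call.

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with delays of base_delay * 2**n seconds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))


# Offline demonstration: a fake call that fails twice, then succeeds.
calls = {"n": 0}
delays = []


def flaky(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return f"results for {query}"


result = call_with_retry(flaky, "ZAP70_HUMAN", sleep=delays.append)
print(result, delays)
```

In practice the call would look like `call_with_retry(u.search, "ZAP70_HUMAN")`. Catching a narrower exception type is preferable once you know which errors the service raises.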

Organism Codes

Use standard organism abbreviations:

hsa: Homo sapiens (human)
mmu: Mus musculus (mouse)
dme: Drosophila melanogaster
sce: Saccharomyces cerevisiae (yeast)

List all organisms: k.list("organism") or k.organismIds
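The organism listing is plain tab-separated text, so a lookup table from code to name takes only a few lines. This sketch assumes each line has the layout `T-number<TAB>code<TAB>name<TAB>lineage`, as in the KEGG REST output. The helper name is hypothetical and the sample text is an illustrative excerpt.

```python
def parse_organism_list(text):
    """Map KEGG organism codes (e.g. 'hsa') to their full names."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            table[fields[1]] = fields[2]
    return table


# Illustrative excerpt shaped like the assumed output of k.list("organism").
sample_listing = (
    "T01001\thsa\tHomo sapiens (human)\tEukaryotes;Animals;Vertebrates;Mammals\n"
    "T01002\tmmu\tMus musculus (house mouse)\tEukaryotes;Animals;Vertebrates;Mammals\n"
)

organisms = parse_organism_list(sample_listing)
print(organisms["hsa"])
```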

Integration with Other Tools

BioServices works well with:

BioPython : Sequence analysis on retrieved FASTA data
Pandas : Tabular data manipulation
PyMOL : 3D structure visualization (retrieve PDB IDs)
NetworkX : Network analysis of pathway interactions
Galaxy : Custom tool wrappers for workflow platforms

Resources

scripts/

Executable Python scripts demonstrating complete workflows:

protein_analysis_workflow.py: End-to-end protein characterization
pathway_analysis.py: KEGG pathway discovery and network extraction
compound_cross_reference.py: Multi-database compound searching
batch_id_converter.py: Bulk identifier mapping utility

Scripts can be executed directly or adapted for specific use cases.

references/

Detailed documentation loaded as needed:

services_reference.md: Comprehensive list of all 40+ services with methods
workflow_patterns.md: Detailed multi-step analysis workflows
identifier_mapping.md: Complete guide to cross-database ID conversion

Load references when working with specific services or complex integration tasks.

Installation

uv pip install bioservices

Dependencies are installed automatically. The package is tested on Python 3.9-3.12.

Additional Information

For detailed API documentation and advanced features, refer to:

Official documentation: https://bioservices.readthedocs.io/
Source code: https://github.com/cokelaer/bioservices
Service-specific references in references/services_reference.md

Weekly Installs: 154

Repository: davila7/claude-…emplates

GitHub Stars: 23.4K

First Seen: Jan 21, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)

Installed on: claude-code (129), opencode (127), gemini-cli (123), cursor (123), antigravity (114), codex (112)
