tooluniverse-protein-structure-retrieval by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-protein-structure-retrieval
Retrieve protein structures with proper disambiguation, quality assessment, and comprehensive metadata.
IMPORTANT: Always use English terms in tool calls (protein names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if the English terms return no results. Respond in the user's language.
Phase 0: Clarify (if needed)
↓
Phase 1: Disambiguate Protein Identity
↓
Phase 2: Retrieve Structures (Internal)
↓
Phase 3: Report Structure Profile
Ask the user ONLY if:
Skip clarification for:
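```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Placeholders: fill in whichever identifier the user actually supplied.
user_provided_pdb_id = None
user_provided_uniprot = None
user_provided_protein_name = "insulin"

# Strategy depends on input type
if user_provided_pdb_id:
    # Direct structure retrieval (PDB IDs are conventionally upper case)
    pdb_id = user_provided_pdb_id.upper()
elif user_provided_uniprot:
    # Use the UniProt accession to look up structures
    uniprot_id = user_provided_uniprot
    # An AlphaFold prediction can also be fetched for the same accession
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )
elif user_provided_protein_name:
    # Search by name
    protein_name = user_provided_protein_name
    result = tu.tools.search_structures_by_protein_name(
        protein_name=protein_name
    )
```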
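The branching above needs to know which kind of identifier the user supplied. A minimal sketch of one way to decide, assuming the classic 4-character PDB ID format and the accession pattern published in the UniProt help pages; `classify_query` is a hypothetical helper, not part of ToolUniverse:

```python
import re

# Classic 4-character PDB ID: one digit followed by three alphanumerics.
PDB_ID_RE = re.compile(r"[0-9][A-Za-z0-9]{3}")
# UniProt accession pattern as documented by UniProt.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_query(query: str) -> str:
    """Return 'pdb_id', 'uniprot' or 'protein_name' for a raw user query."""
    text = query.strip()
    if PDB_ID_RE.fullmatch(text):
        return "pdb_id"
    if UNIPROT_RE.fullmatch(text.upper()):
        return "uniprot"
    # Anything else is treated as a free-text protein name.
    return "protein_name"


print(classify_query("4ins"))
print(classify_query("P01308"))
print(classify_query("insulin"))
```

A pattern match only shows that the string is well formed; whether the entry exists still has to be confirmed by the retrieval tools.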
Common ambiguous terms:
| Term | Ambiguity | Resolution |
|---|---|---|
| "kinase" | Hundreds of kinases | Ask which kinase (EGFR, CDK2, etc.) |
| "receptor" | Many receptor types | Specify receptor family |
| "protease" | Multiple families | Ask serine/cysteine/metallo/etc. |
| "hemoglobin" | Clear | Proceed (specify the α/β chain if needed) |
| "insulin" | Clear | Proceed |
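The table above can be encoded as a small lookup so that a bare family name triggers a clarifying question while a specific protein passes straight through. This is only a sketch: the dictionary holds the three ambiguous rows from the table, and `clarification_needed` is a hypothetical helper name.

```python
from typing import Optional

# Ambiguous family-level terms and the question to ask, taken from the table.
AMBIGUOUS_TERMS = {
    "kinase": "Which kinase (EGFR, CDK2, etc.)?",
    "receptor": "Which receptor family?",
    "protease": "Which protease class (serine, cysteine, metallo, etc.)?",
}


def clarification_needed(query: str) -> Optional[str]:
    """Return a clarifying question if the query is a bare ambiguous term."""
    return AMBIGUOUS_TERMS.get(query.strip().lower())


print(clarification_needed("Kinase"))
print(clarification_needed("insulin"))
```

Only exact matches on the bare term are flagged, so a query such as "EGFR kinase" is treated as already specific.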
Retrieve all data silently. Do NOT narrate the search process.
```python
# Search by protein name (tu and protein_name come from Phase 1)
result = tu.tools.search_structures_by_protein_name(
    protein_name=protein_name
)

# Keep entries that report a resolution better (lower) than 2.5 Å
high_res = [
    entry for entry in result["data"]
    if entry.get("resolution") and entry["resolution"] < 2.5
]
```
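After filtering, the surviving entries usually need to be ordered so the best structure comes first. A minimal sketch, assuming each entry is a dict with `pdb_id` and `resolution` keys; that response shape is an assumption carried over from the filter above, not a documented schema:

```python
def rank_by_resolution(entries, max_resolution=2.5):
    """Drop entries without a numeric resolution, then sort best (lowest) first."""
    usable = [
        entry for entry in entries
        if isinstance(entry.get("resolution"), (int, float))
        and entry["resolution"] < max_resolution
    ]
    return sorted(usable, key=lambda entry: entry["resolution"])


# Hypothetical search results used only to exercise the helper.
sample = [
    {"pdb_id": "AAAA", "resolution": 2.3},
    {"pdb_id": "BBBB", "resolution": None},   # e.g. an NMR entry
    {"pdb_id": "CCCC", "resolution": 1.6},
    {"pdb_id": "DDDD", "resolution": 3.1},
]
ranked = rank_by_resolution(sample)
print([entry["pdb_id"] for entry in ranked])
```

Entries without a resolution are dropped here rather than ranked last, so NMR ensembles and predicted models need separate handling.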
For each relevant structure:
```python
pdb_id = "4INS"

# RCSB PDB
# Basic metadata
metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
# Experimental details
exp_details = tu.tools.get_protein_experimental_details_by_pdb_id(
    pdb_id=pdb_id
)
# Resolution (if X-ray)
resolution = tu.tools.get_protein_resolution_by_pdb_id(pdb_id=pdb_id)
# Bound ligands
ligands = tu.tools.get_protein_ligands_by_pdb_id(pdb_id=pdb_id)
# Similar structures
similar = tu.tools.get_similar_structures_by_pdb_id(
    pdb_id=pdb_id,
    cutoff=2.0
)

# PDBe
# Entry summary
summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)
# Molecular entities
molecules = tu.tools.pdbe_get_molecules(pdb_id=pdb_id)
# Binding sites
binding_sites = tu.tools.pdbe_get_binding_sites(pdb_id=pdb_id)

# AlphaFold
# When no experimental structure exists, or for comparison
# (uniprot_id comes from Phase 1)
if uniprot_id:
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )
```
| Primary | Fallback | Notes |
|---|---|---|
| RCSB search | PDBe search | Regional availability |
| get_protein_metadata_by_pdb_id | pdbe_get_entry_summary | Alternative source |
| Experimental structure | AlphaFold prediction | No experimental structure |
| get_protein_ligands_by_pdb_id | pdbe_get_binding_sites | Ligand info unavailable |
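The primary/fallback pairs above follow one pattern: try the primary source and move on when it fails or returns nothing. A minimal sketch of that pattern follows; `first_successful` is a hypothetical helper, and the commented usage assumes the `tu` and `pdb_id` names from the earlier snippets.

```python
def first_successful(*calls):
    """Run zero-argument callables in order and return the first non-empty result.

    If every call fails or returns nothing, re-raise the last error seen,
    or return None when no call raised.
    """
    last_error = None
    for call in calls:
        try:
            result = call()
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            last_error = exc
            continue
        if result:
            return result
    if last_error is not None:
        raise last_error
    return None


# Intended usage with the tools named in the table (not executed here):
# ligands = first_successful(
#     lambda: tu.tools.get_protein_ligands_by_pdb_id(pdb_id=pdb_id),
#     lambda: tu.tools.pdbe_get_binding_sites(pdb_id=pdb_id),
# )
```

Wrapping each call in a lambda defers execution, so the fallback is contacted only when the primary source has actually failed.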
Present as a Structure Profile Report. Hide search process.
# Protein Structure Profile: [Protein Name]
**Search Summary**
- Query: [protein name/PDB ID]
- Organism: [species]
- Structures Found: [N] experimental, [M] AlphaFold
---
## Best Available Structure
### [PDB ID]: [Title]
| Attribute | Value |
|-----------|-------|
| **PDB ID** | [pdb_id] |
| **UniProt** | [uniprot_id] |
| **Organism** | [species] |
| **Method** | X-ray / Cryo-EM / NMR |
| **Resolution** | [X.XX] Å |
| **Release Date** | [date] |
**Quality Assessment**: ●●●● Excellent / ●●●○ High / ●●○○ Good / ●○○○ Moderate / ○○○○ Low
### Experimental Details
| Parameter | Value |
|-----------|-------|
| **Method** | [X-ray crystallography] |
| **Resolution** | [1.9 Å] |
| **R-factor** | [0.18] |
| **R-free** | [0.21] |
| **Space Group** | [P 21 21 21] |
### Structure Composition
| Component | Count | Details |
|-----------|-------|---------|
| **Chains** | [N] | [A (enzyme), B (inhibitor)] |
| **Residues** | [N] | [coverage %] |
| **Ligands** | [N] | [list ligand names] |
| **Waters** | [N] | |
| **Metals** | [N] | [Zn, Mg, etc.] |
### Bound Ligands
| Ligand ID | Name | Type | Binding Site |
|-----------|------|------|--------------|
| [ATP] | Adenosine triphosphate | Substrate | Active site |
| [MG] | Magnesium ion | Cofactor | Catalytic |
### Binding Site Details
For drug discovery applications:
**Site 1: Active Site**
- Location: Chain A, residues 45-89
- Key residues: Asp45, Glu67, His89
- Pocket volume: [X] ų
- Druggability: High/Medium/Low
---
## Alternative Structures
Ranked by quality and relevance:
| Rank | PDB ID | Resolution | Method | Ligands | Notes |
|------|--------|------------|--------|---------|-------|
| 1 | [4INS] | 1.9 Å | X-ray | Zn | Best resolution |
| 2 | [3I40] | 2.1 Å | X-ray | Zn, phenol | With inhibitor |
| 3 | [1TRZ] | 2.3 Å | X-ray | None | Porcine |
---
## AlphaFold Prediction
### AF-[UniProt]-F1
| Attribute | Value |
|-----------|-------|
| **UniProt** | [uniprot_id] |
| **Model Version** | [v4] |
| **Confidence (pLDDT)** | [average score] |
**Confidence Distribution**:
- Very High (>90): [X]% of residues
- High (70-90): [X]% of residues
- Low (50-70): [X]% of residues
- Very Low (<50): [X]% of residues
**Use Cases**:
- ✓ Overall fold reliable
- ✓ Core domain structure
- ⚠ Loop regions uncertain
- ✗ Not suitable for binding site analysis
---
## Structure Comparison
| Property | [PDB_1] | [PDB_2] | AlphaFold |
|----------|---------|---------|-----------|
| Resolution | 1.9 Å | 2.5 Å | N/A (predicted) |
| Completeness | 98% | 85% | 100% |
| Ligands | Yes | No | No |
| Confidence | Experimental | Experimental | High (85 avg) |
---
## Download Links
### Coordinate Files
| Format | PDB ID | Link |
|--------|--------|------|
| PDB | [4INS] | [link] |
| mmCIF | [4INS] | [link] |
| AlphaFold | [UniProt] | [link] |
### Database Links
- RCSB PDB: https://www.rcsb.org/structure/[pdb_id]
- PDBe: https://www.ebi.ac.uk/pdbe/entry/pdb/[pdb_id]
- AlphaFold: https://alphafold.ebi.ac.uk/entry/[uniprot_id]
Retrieved: [date]
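The link section of the report can be generated from the identifiers rather than typed by hand. The sketch below uses the three entry-page URL patterns listed in the template; the `files.rcsb.org/download/` coordinate-file pattern is an addition not stated in the source, and `database_links` is a hypothetical helper name.

```python
from typing import Optional


def database_links(pdb_id: str, uniprot_id: Optional[str] = None) -> dict:
    """Build entry-page and coordinate-file links for a structure report."""
    pdb_upper = pdb_id.strip().upper()
    links = {
        "rcsb": f"https://www.rcsb.org/structure/{pdb_upper}",
        "pdbe": f"https://www.ebi.ac.uk/pdbe/entry/pdb/{pdb_upper.lower()}",
        # Assumed RCSB file-download pattern (not given in the template above).
        "pdb_file": f"https://files.rcsb.org/download/{pdb_upper}.pdb",
        "mmcif_file": f"https://files.rcsb.org/download/{pdb_upper}.cif",
    }
    if uniprot_id:
        accession = uniprot_id.strip().upper()
        links["alphafold"] = f"https://alphafold.ebi.ac.uk/entry/{accession}"
    return links


for name, url in database_links("4ins", "P01308").items():
    print(f"{name}: {url}")
```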
| Tier | Symbol | Criteria |
|---|---|---|
| Excellent | ●●●● | X-ray <1.5Å, complete, R-free <0.22 |
| High | ●●●○ | X-ray <2.0Å OR Cryo-EM <3.0Å |
| Good | ●●○○ | X-ray 2.0-3.0Å OR Cryo-EM 3.0-4.0Å |
| Moderate | ●○○○ | X-ray >3.0Å OR NMR ensemble |
| Low | ○○○○ | >4.0Å, incomplete, or problematic |
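The tier criteria above overlap in places (for example, an X-ray structure worse than 4.0 Å meets both the Moderate and the Low rows), so any implementation has to pick an order of precedence. The sketch below is one possible reading, not the skill's own logic: `problematic` and a resolution above 4.0 Å take priority, and completeness only gates the Excellent tier. The function name and parameters are hypothetical.

```python
def quality_tier(method, resolution=None, r_free=None,
                 complete=True, problematic=False):
    """Map method and resolution (in Å) to a tier name from the table."""
    method = method.strip().lower()
    if problematic:
        return "Low"
    if method == "nmr":
        return "Moderate"  # NMR ensembles carry no resolution value
    if resolution is None or resolution > 4.0:
        return "Low"
    if method == "x-ray":
        if (resolution < 1.5 and complete
                and r_free is not None and r_free < 0.22):
            return "Excellent"
        if resolution < 2.0:
            return "High"
        if resolution <= 3.0:
            return "Good"
        return "Moderate"
    if method == "cryo-em":
        return "High" if resolution < 3.0 else "Good"
    return "Low"  # unknown method: stay conservative


print(quality_tier("X-ray", 1.2, r_free=0.18))
print(quality_tier("Cryo-EM", 3.4))
print(quality_tier("NMR"))
```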
| Resolution | Use Case |
|---|---|
| <1.5 Å | Atomic detail, H-bond analysis |
| 1.5-2.0 Å | Drug design, mechanism studies |
| 2.0-2.5 Å | Structure-based design |
| 2.5-3.5 Å | Overall architecture, fold |
| >3.5 Å | Domain arrangement only |
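The resolution guidance can be applied programmatically when annotating a list of structures. A minimal sketch, treating the last row as covering everything worse than 3.5 Å and assigning shared boundary values to the coarser band; both choices are assumptions, and `resolution_use_case` is a hypothetical helper.

```python
def resolution_use_case(resolution: float) -> str:
    """Return the suggested use case for a resolution given in Å."""
    if resolution < 1.5:
        return "Atomic detail, H-bond analysis"
    if resolution < 2.0:
        return "Drug design, mechanism studies"
    if resolution < 2.5:
        return "Structure-based design"
    if resolution <= 3.5:
        return "Overall architecture, fold"
    return "Domain arrangement only"


for value in (1.2, 1.9, 2.3, 3.0, 4.2):
    print(f"{value} Å: {resolution_use_case(value)}")
```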
| pLDDT Score | Interpretation |
|---|---|
| >90 | Very high confidence, experimental-like |
| 70-90 | Good backbone confidence |
| 50-70 | Uncertain, flexible regions |
| <50 | Low confidence, likely disordered |
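The confidence distribution reported for an AlphaFold model is the share of residues falling into each pLDDT band. A minimal sketch, assuming the per-residue scores have already been extracted into a plain list of numbers (how to obtain them from the tool response is not specified here); `plddt_distribution` is a hypothetical helper.

```python
def plddt_distribution(scores):
    """Return the percentage of residues in each pLDDT confidence band."""
    counts = {"very_high": 0, "high": 0, "low": 0, "very_low": 0}
    for score in scores:
        if score > 90:
            counts["very_high"] += 1
        elif score >= 70:
            counts["high"] += 1
        elif score >= 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    if not scores:
        return {band: 0.0 for band in counts}
    total = len(scores)
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


# Hypothetical per-residue scores, for illustration only.
print(plddt_distribution([95, 92, 80, 60, 40]))
```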
Every structure report MUST include:
User: "Get structure for EGFR kinase with inhibitor" → Filter for ligand-bound structures, emphasize binding site
User: "Find best template for homology modeling of protein X" → High-resolution structures, note sequence coverage
User: "Compare available SARS-CoV-2 main protease structures" → All structures with systematic comparison table
User: "Structure of protein with UniProt P12345" → Check PDB first, then AlphaFold, note confidence
| Error | Response |
|---|---|
| "PDB ID not found" | Verify 4-character format, check if obsoleted |
| "No structures for protein" | Offer AlphaFold prediction, suggest similar proteins |
| "Download failed" | Retry once, provide alternative link |
| "Resolution unavailable" | Likely NMR/model, note in assessment |
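The "retry once" response for failed downloads can be wrapped in a small helper so the same policy applies to every call. This is a sketch under the assumption that failures surface as Python exceptions; `retry_once` is a hypothetical name.

```python
import time


def retry_once(call, delay_seconds=1.0):
    """Run a zero-argument callable; on failure, wait briefly and try once more."""
    try:
        return call()
    except Exception:
        time.sleep(delay_seconds)
        # A second failure propagates so the caller can offer an alternative link.
        return call()


# Demonstration with a stand-in function that fails on its first call only.
attempts = []

def flaky_download():
    attempts.append(1)
    if len(attempts) == 1:
        raise IOError("temporary failure")
    return "ok"

outcome = retry_once(flaky_download, delay_seconds=0)
print(outcome, len(attempts))
```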
RCSB PDB (Experimental Structures)
| Tool | Purpose |
|---|---|
| search_structures_by_protein_name | Name-based search |
| get_protein_metadata_by_pdb_id | Basic info |
| get_protein_experimental_details_by_pdb_id | Method details |
| get_protein_resolution_by_pdb_id | Quality metric |
| get_protein_ligands_by_pdb_id | Bound molecules |
| download_pdb_structure_file | Coordinate files |
| get_similar_structures_by_pdb_id | Homologs |
PDBe (European PDB)
| Tool | Purpose |
|---|---|
| pdbe_get_entry_summary | Overview |
| pdbe_get_molecules | Molecular entities |
| pdbe_get_experiment_info | Experimental data |
| pdbe_get_binding_sites | Ligand pockets |
AlphaFold (Predictions)
| Tool | Purpose |
|---|---|
| alphafold_get_structure_by_uniprot | Get prediction |
| alphafold_search_structures | Search predictions |
- Weekly Installs: 163
- Repository: mims-harvard/tooluniverse
- GitHub Stars: 1.2K
- First Seen: Feb 4, 2026
- Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
- Installed on: codex (155), opencode (153), gemini-cli (149), github-copilot (146), amp (141), kimi-cli (140)