alphafold-database by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill alphafold-database
AlphaFold DB is a public repository of AI-predicted 3D protein structures for over 200 million proteins, maintained by DeepMind and EMBL-EBI. Access structure predictions with confidence metrics, download coordinate files, retrieve bulk datasets, and integrate predictions into computational workflows.
Use this skill when working with AI-predicted protein structures: retrieving structure predictions, assessing prediction confidence, downloading coordinate files, or running bulk structural analyses.
Using Biopython (Recommended):
The Biopython library provides the simplest interface for retrieving AlphaFold structures:
from Bio.PDB import alphafold_db

# Get all predictions for a UniProt accession
predictions = list(alphafold_db.get_predictions("P00520"))

# Download structure files (mmCIF format)
for prediction in predictions:
    cif_file = alphafold_db.download_cif_for(prediction, directory="./structures")
    print(f"Downloaded: {cif_file}")

# Get Structure objects directly
structures = list(alphafold_db.get_structural_models_for("P00520"))
Direct API Access:
Query predictions using REST endpoints:
import requests
# Get prediction metadata for a UniProt accession
uniprot_id = "P00520"
api_url = f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}"
response = requests.get(api_url)
prediction_data = response.json()
# Extract AlphaFold ID
alphafold_id = prediction_data[0]['entryId']
print(f"AlphaFold ID: {alphafold_id}")
Using UniProt to Find Accessions:
Search UniProt to find protein accessions first. Note that the uploadlists endpoint below is UniProt's legacy ID-mapping service, which was retired in 2022; it is kept here for reference, and the current REST ID-mapping API at rest.uniprot.org should be preferred:

import urllib.parse, urllib.request

def get_uniprot_ids(query, query_type='PDB_ID'):
    """Query UniProt to get accession IDs (legacy uploadlists endpoint)."""
    url = 'https://www.uniprot.org/uploadlists/'
    params = {
        'from': query_type,
        'to': 'ACC',
        'format': 'txt',
        'query': query
    }
    data = urllib.parse.urlencode(params).encode('ascii')
    with urllib.request.urlopen(urllib.request.Request(url, data)) as response:
        return response.read().decode('utf-8').splitlines()

# Example: find UniProt IDs for a gene name
protein_ids = get_uniprot_ids("hemoglobin", query_type="GENE_NAME")
AlphaFold provides multiple file formats for each prediction:
File Types Available:
- Model (model_v4.cif): atomic coordinates in mmCIF/PDBx format
- Confidence (confidence_v4.json): per-residue pLDDT scores (0-100)
- PAE (predicted_aligned_error_v4.json): PAE matrix for residue-pair confidence

Download URLs:
import requests
alphafold_id = "AF-P00520-F1"
version = "v4"
# Model coordinates (mmCIF)
model_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-model_{version}.cif"
response = requests.get(model_url)
with open(f"{alphafold_id}.cif", "w") as f:
    f.write(response.text)
# Confidence scores (JSON)
confidence_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_{version}.json"
response = requests.get(confidence_url)
confidence_data = response.json()
# Predicted Aligned Error (JSON)
pae_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-predicted_aligned_error_{version}.json"
response = requests.get(pae_url)
pae_data = response.json()
PDB Format (Alternative):
# Download as PDB format instead of mmCIF
pdb_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-model_{version}.pdb"
response = requests.get(pdb_url)
with open(f"{alphafold_id}.pdb", "wb") as f:
    f.write(response.content)
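The per-entry file URLs above all follow one pattern, which can be captured in a small helper. A sketch, using the file-type suffixes listed earlier; the function and dictionary names are ours:

```python
AF_FILES_BASE = "https://alphafold.ebi.ac.uk/files"

# Suffix and extension for each per-entry file type.
FILE_TYPES = {
    "model_cif": ("model", "cif"),
    "model_pdb": ("model", "pdb"),
    "confidence": ("confidence", "json"),
    "pae": ("predicted_aligned_error", "json"),
}

def alphafold_file_url(alphafold_id, file_type="model_cif", version="v4"):
    """Build the download URL for one AlphaFold DB file."""
    suffix, ext = FILE_TYPES[file_type]
    return f"{AF_FILES_BASE}/{alphafold_id}-{suffix}_{version}.{ext}"

print(alphafold_file_url("AF-P00520-F1", "pae"))
# → https://alphafold.ebi.ac.uk/files/AF-P00520-F1-predicted_aligned_error_v4.json
```

Centralizing the URL pattern keeps the version suffix in one place when the database moves beyond v4.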
AlphaFold predictions include confidence estimates critical for interpretation:
pLDDT (per-residue confidence):
import requests
# Load confidence scores
alphafold_id = "AF-P00520-F1"
confidence_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_v4.json"
confidence = requests.get(confidence_url).json()
# Extract pLDDT scores
plddt_scores = confidence['confidenceScore']
# Interpret confidence levels
# pLDDT > 90: Very high confidence
# pLDDT 70-90: High confidence
# pLDDT 50-70: Low confidence
# pLDDT < 50: Very low confidence
high_confidence_residues = [i for i, score in enumerate(plddt_scores) if score > 90]
print(f"High confidence residues: {len(high_confidence_residues)}/{len(plddt_scores)}")
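The threshold comments above can be turned into a small classifier; the band names follow the AlphaFold DB conventions quoted in the comments, and `plddt_category` is a hypothetical helper name:

```python
def plddt_category(score):
    """Map a pLDDT score (0-100) to its AlphaFold DB confidence band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "high"
    if score > 50:
        return "low"
    return "very low"

scores = [95.2, 82.1, 61.4, 33.0]
print([plddt_category(s) for s in scores])
# → ['very high', 'high', 'low', 'very low']
```

Applying this over a full `plddt_scores` list gives a quick per-residue confidence profile without plotting.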
PAE (Predicted Aligned Error):
PAE indicates confidence in relative domain positions:
import numpy as np
import matplotlib.pyplot as plt
import requests

# Load the PAE matrix (the file contains a list with a single record)
pae_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-predicted_aligned_error_v4.json"
pae = requests.get(pae_url).json()

# Visualize the PAE matrix (v4 stores it under 'predicted_aligned_error')
pae_matrix = np.array(pae[0]['predicted_aligned_error'])
plt.figure(figsize=(10, 8))
plt.imshow(pae_matrix, cmap='viridis_r', vmin=0, vmax=30)
plt.colorbar(label='PAE (Å)')
plt.title(f'Predicted Aligned Error: {alphafold_id}')
plt.xlabel('Residue')
plt.ylabel('Residue')
plt.savefig(f'{alphafold_id}_pae.png', dpi=300, bbox_inches='tight')
# Low PAE values (<5 Å) indicate confident relative positioning
# High PAE values (>15 Å) suggest uncertain domain arrangements
For large-scale analyses, use Google Cloud datasets:
Google Cloud Storage:
# Install gsutil
uv pip install gsutil
# List available data
gsutil ls gs://public-datasets-deepmind-alphafold-v4/
# Download entire proteomes (by taxonomy ID)
gsutil -m cp gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-9606-*.tar .
# Download specific files
gsutil cp gs://public-datasets-deepmind-alphafold-v4/accession_ids.csv .
BigQuery Metadata Access:
from google.cloud import bigquery
# Initialize client
client = bigquery.Client()
# Query metadata
query = """
SELECT
entryId,
uniprotAccession,
organismScientificName,
globalMetricValue,
fractionPlddtVeryHigh
FROM `bigquery-public-data.deepmind_alphafold.metadata`
WHERE organismScientificName = 'Homo sapiens'
AND fractionPlddtVeryHigh > 0.8
LIMIT 100
"""
results = client.query(query).to_dataframe()
print(f"Found {len(results)} high-confidence human proteins")
Download by Species:
import subprocess

def download_proteome(taxonomy_id, output_dir="./proteomes"):
    """Download all AlphaFold predictions for a species"""
    pattern = f"gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-{taxonomy_id}-*_v4.tar"
    cmd = f"gsutil -m cp {pattern} {output_dir}/"
    subprocess.run(cmd, shell=True, check=True)

# Download the E. coli proteome (tax ID: 83333)
download_proteome(83333)

# Download the human proteome (tax ID: 9606)
download_proteome(9606)
Work with downloaded AlphaFold structures using Biopython:

from Bio.PDB import MMCIFParser
import numpy as np

# Parse mmCIF file
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("protein", "AF-P00520-F1-model_v4.cif")

# Extract coordinates
coords = []
for model in structure:
    for chain in model:
        for residue in chain:
            if 'CA' in residue:  # alpha carbons only
                coords.append(residue['CA'].get_coord())
coords = np.array(coords)
print(f"Structure has {len(coords)} residues")

# Calculate distances
from scipy.spatial.distance import pdist, squareform
distance_matrix = squareform(pdist(coords))

# Identify contacts (< 8 Å)
contacts = np.where((distance_matrix > 0) & (distance_matrix < 8))
print(f"Number of contacts: {len(contacts[0]) // 2}")
Extract B-factors (pLDDT values):
AlphaFold stores pLDDT scores in the B-factor column:
from Bio.PDB import MMCIFParser

parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("protein", "AF-P00520-F1-model_v4.cif")

# Extract pLDDT from B-factors
plddt_scores = []
for model in structure:
    for chain in model:
        for residue in chain:
            if 'CA' in residue:
                plddt_scores.append(residue['CA'].get_bfactor())

# Identify high-confidence regions
high_conf_regions = [(i, score) for i, score in enumerate(plddt_scores, 1) if score > 90]
print(f"High confidence residues: {len(high_conf_regions)}")
Process multiple predictions efficiently:
from Bio.PDB import alphafold_db
import numpy as np
import pandas as pd
import requests

uniprot_ids = ["P00520", "P12931", "P04637"]  # multiple proteins
results = []
for uniprot_id in uniprot_ids:
    try:
        # Get predictions
        predictions = list(alphafold_db.get_predictions(uniprot_id))
        if predictions:
            pred = predictions[0]
            # Download structure
            cif_file = alphafold_db.download_cif_for(pred, directory="./batch_structures")
            # Get confidence data
            alphafold_id = pred['entryId']
            conf_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_v4.json"
            conf_data = requests.get(conf_url).json()
            # Calculate statistics
            plddt_scores = conf_data['confidenceScore']
            avg_plddt = np.mean(plddt_scores)
            high_conf_fraction = sum(1 for s in plddt_scores if s > 90) / len(plddt_scores)
            results.append({
                'uniprot_id': uniprot_id,
                'alphafold_id': alphafold_id,
                'avg_plddt': avg_plddt,
                'high_conf_fraction': high_conf_fraction,
                'length': len(plddt_scores)
            })
    except Exception as e:
        print(f"Error processing {uniprot_id}: {e}")

# Create summary DataFrame
df = pd.DataFrame(results)
print(df)
# Install Biopython for structure access
uv pip install biopython
# Install requests for API access
uv pip install requests
# For visualization and analysis
uv pip install numpy matplotlib pandas scipy
# For Google Cloud access (optional)
uv pip install google-cloud-bigquery gsutil
AlphaFold can also be accessed via the 3D-Beacons federated API:
import requests
# Query via 3D-Beacons
uniprot_id = "P00520"
url = f"https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/{uniprot_id}.json"
response = requests.get(url)
data = response.json()
# Filter for AlphaFold structures (each entry nests its metadata under 'summary')
af_structures = [s for s in data['structures'] if s['summary']['provider'] == 'AlphaFold DB']
UniProt Accession: Primary identifier for proteins (e.g., "P00520"). Required for querying AlphaFold DB.
AlphaFold ID: Internal identifier format: AF-[UniProt accession]-F[fragment number] (e.g., "AF-P00520-F1").
pLDDT (predicted Local Distance Difference Test): Per-residue confidence metric (0-100). Higher values indicate more confident predictions.
PAE (Predicted Aligned Error): Matrix indicating confidence in relative positions between residue pairs. Low values (<5 Å) suggest confident relative positioning.
Database Version: Current version is v4. File URLs include version suffix (e.g., model_v4.cif).
Fragment Number: Large proteins may be split into fragments. Fragment number appears in AlphaFold ID (e.g., F1, F2).
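The identifier format above is regular enough to parse programmatically. A small sketch; the helper name and return shape are ours, not part of any official API:

```python
import re

AF_ID_PATTERN = re.compile(r"^AF-(?P<accession>[A-Z0-9]+)-F(?P<fragment>\d+)$")

def parse_alphafold_id(alphafold_id):
    """Split an AlphaFold ID into its UniProt accession and fragment number."""
    match = AF_ID_PATTERN.match(alphafold_id)
    if match is None:
        raise ValueError(f"Not a valid AlphaFold ID: {alphafold_id!r}")
    return match["accession"], int(match["fragment"])

print(parse_alphafold_id("AF-P00520-F1"))  # → ('P00520', 1)
```

This is handy when iterating a proteome tarball, where fragment files for one accession (F1, F2, ...) need to be grouped back together.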
pLDDT Thresholds: >90 very high confidence (suitable for detailed interpretation); 70-90 high (backbone generally reliable); 50-70 low (treat with caution); <50 very low (often indicates disorder).
PAE Guidelines: <5 Å indicates confident relative positioning between residue pairs; >15 Å suggests the relative domain arrangement is uncertain.
A comprehensive API reference accompanies this skill. Consult it for detailed endpoint documentation, bulk download strategies, or when working with large-scale datasets.