diffdock by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill diffdock
DiffDock is a diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. It represents the state-of-the-art in computational docking, crucial for structure-based drug discovery and chemical biology.
Core Capabilities:
Key Distinction: DiffDock predicts binding poses (3D structure) and confidence (prediction certainty), NOT binding affinity (ΔG, Kd). Always combine with scoring functions (GNINA, MM/GBSA) for affinity assessment.
This skill should be used when:
Before proceeding with DiffDock tasks, verify the environment setup:
# Use the provided setup checker
python scripts/setup_check.py
This script validates Python version, PyTorch with CUDA, PyTorch Geometric, RDKit, ESM, and other dependencies.
Option 1: Conda (Recommended)
git clone https://github.com/gcorso/DiffDock.git
cd DiffDock
conda env create --file environment.yml
conda activate diffdock
Option 2: Docker
docker pull rbgcsail/diffdock
docker run -it --gpus all --entrypoint /bin/bash rbgcsail/diffdock
micromamba activate diffdock
Important Notes:
Use Case: Dock one ligand to one protein target
Input Requirements:
Command:
python -m inference \
--config default_inference_args.yaml \
--protein_path protein.pdb \
--ligand "CC(=O)Oc1ccccc1C(=O)O" \
--out_dir results/single_docking/
Alternative (protein sequence):
python -m inference \
--config default_inference_args.yaml \
--protein_sequence "MSKGEELFTGVVPILVELDGDVNGHKF..." \
--ligand ligand.sdf \
--out_dir results/sequence_docking/
Output Structure:
results/single_docking/
├── rank_1.sdf # Top-ranked pose
├── rank_2.sdf # Second-ranked pose
├── ...
├── rank_10.sdf # 10th pose (default: 10 samples)
└── confidence_scores.txt
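Downstream scripts often need these pose files in rank order; note that a plain lexicographic sort would put rank_10.sdf before rank_2.sdf. A small helper (a sketch assuming the output layout shown above) extracts the rank numerically:

```python
import re
from pathlib import Path

def ranked_poses(out_dir):
    """Return rank_N.sdf files from a DiffDock output directory, sorted by rank."""
    pattern = re.compile(r"rank_(\d+)\.sdf$")
    files = [p for p in Path(out_dir).glob("rank_*.sdf") if pattern.search(p.name)]
    # Sort by the integer rank, not the filename string
    return sorted(files, key=lambda p: int(pattern.search(p.name).group(1)))
```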
Use Case: Dock multiple ligands to proteins, virtual screening campaigns
Step 1: Prepare Batch CSV
Use the provided script to create or validate batch input:
# Create template
python scripts/prepare_batch_csv.py --create --output batch_input.csv
# Validate existing CSV
python scripts/prepare_batch_csv.py my_input.csv --validate
CSV Format:
complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,,COc1ccc(C#N)cc1,MSKGEELFT...
complex3,protein3.pdb,ligand3.sdf,
Required Columns:
- complex_name: Unique identifier
- protein_path: PDB file path (leave empty if using sequence)
- ligand_description: SMILES string or ligand file path
- protein_sequence: Amino acid sequence (leave empty if using PDB)

Step 2: Run Batch Docking
python -m inference \
--config default_inference_args.yaml \
--protein_ligand_csv batch_input.csv \
--out_dir results/batch/ \
--batch_size 10
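Before committing to a long batch run, the CSV can be sanity-checked in a few lines of Python. This is a hypothetical sketch of the kinds of checks prepare_batch_csv.py performs (required columns present, exactly one of protein_path / protein_sequence per row, unique complex names); the bundled script remains the authoritative validator:

```python
import csv

REQUIRED = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]

def validate_batch_csv(path):
    """Return a list of human-readable problems found in a DiffDock batch CSV."""
    problems = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing columns: {missing}"]
        seen = set()
        for i, row in enumerate(reader, start=2):  # line 1 is the header
            if not row["complex_name"]:
                problems.append(f"line {i}: empty complex_name")
            elif row["complex_name"] in seen:
                problems.append(f"line {i}: duplicate complex_name {row['complex_name']!r}")
            seen.add(row["complex_name"])
            # Each row must supply a structure OR a sequence, never both or neither
            if bool(row["protein_path"]) == bool(row["protein_sequence"]):
                problems.append(f"line {i}: provide exactly one of protein_path / protein_sequence")
            if not row["ligand_description"]:
                problems.append(f"line {i}: empty ligand_description")
    return problems
```

An empty returned list means the file passed these basic checks.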
For Large Virtual Screening (>100 compounds):
Pre-compute protein embeddings for faster processing:
# Pre-compute embeddings
python datasets/esm_embedding_preparation.py \
--protein_ligand_csv screening_input.csv \
--out_file protein_embeddings.pt
# Run with pre-computed embeddings
python -m inference \
--config default_inference_args.yaml \
--protein_ligand_csv screening_input.csv \
--esm_embeddings_path protein_embeddings.pt \
--out_dir results/screening/
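For very large libraries it can also be worth splitting the input CSV into chunks and running them sequentially, so an interrupted job loses at most one chunk. A stdlib-only sketch (the chunk size and output naming here are arbitrary choices, not DiffDock conventions):

```python
import csv

def split_csv(path, chunk_size=100, prefix="chunk"):
    """Split a batch CSV into header-preserving chunks; returns written filenames."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    header, body = rows[0], rows[1:]
    out = []
    for n, start in enumerate(range(0, len(body), chunk_size)):
        name = f"{prefix}_{n:03d}.csv"
        with open(name, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)  # repeat the header in every chunk
            writer.writerows(body[start:start + chunk_size])
        out.append(name)
    return out
```

Each chunk can then be passed to --protein_ligand_csv in its own inference run.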
After docking completes, analyze confidence scores and rank predictions:
# Analyze all results
python scripts/analyze_results.py results/batch/
# Show top 5 per complex
python scripts/analyze_results.py results/batch/ --top 5
# Filter by confidence threshold
python scripts/analyze_results.py results/batch/ --threshold 0.0
# Export to CSV
python scripts/analyze_results.py results/batch/ --export summary.csv
# Show top 20 predictions across all complexes
python scripts/analyze_results.py results/batch/ --best 20
The analysis script:
Understanding Scores:
| Score Range | Confidence Level | Interpretation |
|---|---|---|
| > 0 | High | Strong prediction, likely accurate |
| -1.5 to 0 | Moderate | Reasonable prediction, validate carefully |
| < -1.5 | Low | Uncertain prediction, requires validation |
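The cutoffs in the table translate directly into code when post-processing confidence_scores.txt, for example:

```python
def confidence_level(score):
    """Map a DiffDock confidence score to the qualitative levels in the table above."""
    if score > 0:
        return "high"       # strong prediction, likely accurate
    if score >= -1.5:
        return "moderate"   # reasonable prediction, validate carefully
    return "low"            # uncertain prediction, requires validation
```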
Critical Notes:
For detailed guidance: Read references/confidence_and_limitations.md using the Read tool
Create custom configuration for specific use cases:
# Copy template
cp assets/custom_inference_config.yaml my_config.yaml
# Edit parameters (see template for presets)
# Then run with custom config
python -m inference \
--config my_config.yaml \
--protein_ligand_csv input.csv \
--out_dir results/
Sampling Density:
- samples_per_complex: 10 → Increase to 20-40 for difficult cases

Inference Steps:
- inference_steps: 20 → Increase to 25-30 for higher accuracy

Temperature Parameters (control diversity):
- temp_sampling_tor: 7.04 → Increase for flexible ligands (8-10)
- temp_sampling_tor: 7.04 → Decrease for rigid ligands (5-6)

Presets Available in Template:
For complete parameter reference: Read references/parameters_reference.md using the Read tool
For proteins with known flexibility, dock to multiple conformations:
# Create ensemble CSV
import pandas as pd
conformations = ["conf1.pdb", "conf2.pdb", "conf3.pdb"]
ligand = "CC(=O)Oc1ccccc1C(=O)O"
data = {
"complex_name": [f"ensemble_{i}" for i in range(len(conformations))],
"protein_path": conformations,
"ligand_description": [ligand] * len(conformations),
"protein_sequence": [""] * len(conformations)
}
pd.DataFrame(data).to_csv("ensemble_input.csv", index=False)
Run docking with increased sampling:
python -m inference \
--config default_inference_args.yaml \
--protein_ligand_csv ensemble_input.csv \
--samples_per_complex 20 \
--out_dir results/ensemble/
DiffDock generates poses; combine with other tools for affinity:
GNINA (Fast neural network scoring):
for pose in results/*.sdf; do
gnina -r protein.pdb -l "$pose" --score_only
done
MM/GBSA (More accurate, slower): Use AmberTools MMPBSA.py or gmx_MMPBSA after energy minimization
Free Energy Calculations (Most accurate): Use OpenMM + OpenFE or GROMACS for FEP/TI calculations
Recommended Workflow:
DiffDock IS Designed For:
DiffDock IS NOT Designed For:
For complete limitations: Read references/confidence_and_limitations.md using the Read tool
Issue: Low confidence scores across all predictions
- Increase samples_per_complex (20-40), try ensemble docking, validate protein structure

Issue: Out of memory errors
- Use --batch_size 2 or process fewer complexes at once

Issue: Slow performance
- Verify CUDA with python -c "import torch; print(torch.cuda.is_available())", use GPU

Issue: Unrealistic binding poses

Issue: "Module not found" errors
- Run python scripts/setup_check.py to diagnose

For Best Results:
For interactive use, launch the web interface:
python app/main.py
# Navigate to http://localhost:7860
Or use the online demo without installation:
Scripts (scripts/):
- prepare_batch_csv.py: Create and validate batch input CSV files
- analyze_results.py: Analyze confidence scores and rank predictions
- setup_check.py: Verify DiffDock environment setup
References (references/): read these with the Read tool when users need deeper detail:
- parameters_reference.md: Complete parameter documentation
- confidence_and_limitations.md: Confidence score interpretation and tool limitations
- workflows_examples.md: Comprehensive workflow examples

Assets (assets/):
- batch_template.csv: Template for batch processing
- custom_inference_config.yaml: Configuration template
Usage tips:
- Run setup_check.py before starting large jobs
- Validate batch CSVs with prepare_batch_csv.py to catch errors early

When using DiffDock, cite the appropriate papers:
DiffDock-L (current default model):
Stärk et al. (2024) "DiffDock-L: Improving Molecular Docking with Diffusion Models"
arXiv:2402.18396
Original DiffDock:
Corso et al. (2023) "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking"
ICLR 2023, arXiv:2210.01776
Weekly Installs
143
Repository
GitHub Stars
23.4K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
claude-code: 121
opencode: 119
cursor: 114
gemini-cli: 113
codex: 102
antigravity: 101