diffdock by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill diffdock
DiffDock is a diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. It represents the state-of-the-art in computational docking, crucial for structure-based drug discovery and chemical biology.
Core Capabilities:
Key Distinction: DiffDock predicts binding poses (3D structure) and confidence (prediction certainty), NOT binding affinity (ΔG, Kd). Always combine with scoring functions (GNINA, MM/GBSA) for affinity assessment.
This skill should be used when:
Before proceeding with DiffDock tasks, verify the environment setup:
# Use the provided setup checker
python scripts/setup_check.py
This script validates Python version, PyTorch with CUDA, PyTorch Geometric, RDKit, ESM, and other dependencies.
Option 1: Conda (Recommended)
git clone https://github.com/gcorso/DiffDock.git
cd DiffDock
conda env create --file environment.yml
conda activate diffdock
Option 2: Docker
docker pull rbgcsail/diffdock
docker run -it --gpus all --entrypoint /bin/bash rbgcsail/diffdock
micromamba activate diffdock
Important Notes:
Use Case: Dock one ligand to one protein target
Input Requirements:
Command:
python -m inference \
--config default_inference_args.yaml \
--protein_path protein.pdb \
--ligand "CC(=O)Oc1ccccc1C(=O)O" \
--out_dir results/single_docking/
Alternative (protein sequence):
python -m inference \
--config default_inference_args.yaml \
--protein_sequence "MSKGEELFTGVVPILVELDGDVNGHKF..." \
--ligand ligand.sdf \
--out_dir results/sequence_docking/
Output Structure:
results/single_docking/
├── rank_1.sdf # Top-ranked pose
├── rank_2.sdf # Second-ranked pose
├── ...
├── rank_10.sdf # 10th pose (default: 10 samples)
└── confidence_scores.txt
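Downstream scripts often need these pose files in rank order; note that a plain lexicographic sort would put rank_10.sdf before rank_2.sdf. A small helper (a sketch assuming the output layout shown above) extracts the rank numerically:

```python
import re
from pathlib import Path

def ranked_poses(out_dir):
    """Return rank_N.sdf files from a DiffDock output directory, sorted by rank."""
    pattern = re.compile(r"rank_(\d+)\.sdf$")
    files = [p for p in Path(out_dir).glob("rank_*.sdf") if pattern.search(p.name)]
    # Sort by the integer rank, not the filename string
    return sorted(files, key=lambda p: int(pattern.search(p.name).group(1)))
```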
Use Case: Dock multiple ligands to proteins, virtual screening campaigns
Step 1: Prepare Batch CSV
Use the provided script to create or validate batch input:
# Create template
python scripts/prepare_batch_csv.py --create --output batch_input.csv
# Validate existing CSV
python scripts/prepare_batch_csv.py my_input.csv --validate
CSV Format:
complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,,COc1ccc(C#N)cc1,MSKGEELFT...
complex3,protein3.pdb,ligand3.sdf,
Required Columns:
- complex_name: Unique identifier
- protein_path: PDB file path (leave empty if using sequence)
- ligand_description: SMILES string or ligand file path
- protein_sequence: Amino acid sequence (leave empty if using PDB)

Step 2: Run Batch Docking
python -m inference \
--config default_inference_args.yaml \
--protein_ligand_csv batch_input.csv \
--out_dir results/batch/ \
--batch_size 10
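Before committing to a long batch run, the CSV can be sanity-checked in a few lines of Python. This is a hypothetical sketch of the kinds of checks prepare_batch_csv.py performs (required columns present, exactly one of protein_path / protein_sequence per row, unique complex names); the bundled script remains the authoritative validator:

```python
import csv

REQUIRED = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]

def validate_batch_csv(path):
    """Return a list of human-readable problems found in a DiffDock batch CSV."""
    problems = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing columns: {missing}"]
        seen = set()
        for i, row in enumerate(reader, start=2):  # line 1 is the header
            if not row["complex_name"]:
                problems.append(f"line {i}: empty complex_name")
            elif row["complex_name"] in seen:
                problems.append(f"line {i}: duplicate complex_name {row['complex_name']!r}")
            seen.add(row["complex_name"])
            # Each row must supply a structure OR a sequence, never both or neither
            if bool(row["protein_path"]) == bool(row["protein_sequence"]):
                problems.append(f"line {i}: provide exactly one of protein_path / protein_sequence")
            if not row["ligand_description"]:
                problems.append(f"line {i}: empty ligand_description")
    return problems
```

An empty returned list means the file passed these basic checks.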
For Large Virtual Screening (>100 compounds):
Pre-compute protein embeddings for faster processing:
# Pre-compute embeddings
python datasets/esm_embedding_preparation.py \
--protein_ligand_csv screening_input.csv \
--out_file protein_embeddings.pt
# Run with pre-computed embeddings
python -m inference \
--config default_inference_args.yaml \
--protein_ligand_csv screening_input.csv \
--esm_embeddings_path protein_embeddings.pt \
--out_dir results/screening/
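For very large libraries it can also be worth splitting the input CSV into chunks and running them sequentially, so an interrupted job loses at most one chunk. A stdlib-only sketch (the chunk size and output naming here are arbitrary choices, not DiffDock conventions):

```python
import csv

def split_csv(path, chunk_size=100, prefix="chunk"):
    """Split a batch CSV into header-preserving chunks; returns written filenames."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    header, body = rows[0], rows[1:]
    out = []
    for n, start in enumerate(range(0, len(body), chunk_size)):
        name = f"{prefix}_{n:03d}.csv"
        with open(name, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)  # repeat the header in every chunk
            writer.writerows(body[start:start + chunk_size])
        out.append(name)
    return out
```

Each chunk can then be passed to --protein_ligand_csv in its own inference run.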
After docking completes, analyze confidence scores and rank predictions:
# Analyze all results
python scripts/analyze_results.py results/batch/
# Show top 5 per complex
python scripts/analyze_results.py results/batch/ --top 5
# Filter by confidence threshold
python scripts/analyze_results.py results/batch/ --threshold 0.0
# Export to CSV
python scripts/analyze_results.py results/batch/ --export summary.csv
# Show top 20 predictions across all complexes
python scripts/analyze_results.py results/batch/ --best 20
The analysis script:
Understanding Scores:
| Score Range | Confidence Level | Interpretation |
|---|---|---|
| > 0 | High | Strong prediction, likely accurate |
| -1.5 to 0 | Moderate | Reasonable prediction, validate carefully |
| < -1.5 | Low | Uncertain prediction, requires validation |
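The cutoffs in the table translate directly into code when post-processing confidence_scores.txt, for example:

```python
def confidence_level(score):
    """Map a DiffDock confidence score to the qualitative levels in the table above."""
    if score > 0:
        return "high"       # strong prediction, likely accurate
    if score >= -1.5:
        return "moderate"   # reasonable prediction, validate carefully
    return "low"            # uncertain prediction, requires validation
```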
Critical Notes:
For detailed guidance: Read references/confidence_and_limitations.md using the Read tool
Create custom configuration for specific use cases:
# Copy template
cp assets/custom_inference_config.yaml my_config.yaml
# Edit parameters (see template for presets)
# Then run with custom config
python -m inference \
--config my_config.yaml \
--protein_ligand_csv input.csv \
--out_dir results/
Sampling Density:
- samples_per_complex: 10 → Increase to 20-40 for difficult cases

Inference Steps:
- inference_steps: 20 → Increase to 25-30 for higher accuracy

Temperature Parameters (control diversity):
- temp_sampling_tor: 7.04 → Increase for flexible ligands (8-10)
- temp_sampling_tor: 7.04 → Decrease for rigid ligands (5-6)

Presets Available in Template:
For complete parameter reference: Read references/parameters_reference.md using the Read tool
For proteins with known flexibility, dock to multiple conformations:
# Create ensemble CSV
import pandas as pd
conformations = ["conf1.pdb", "conf2.pdb", "conf3.pdb"]
ligand = "CC(=O)Oc1ccccc1C(=O)O"
data = {
"complex_name": [f"ensemble_{i}" for i in range(len(conformations))],
"protein_path": conformations,
"ligand_description": [ligand] * len(conformations),
"protein_sequence": [""] * len(conformations)
}
pd.DataFrame(data).to_csv("ensemble_input.csv", index=False)
Run docking with increased sampling:
python -m inference \
--config default_inference_args.yaml \
--protein_ligand_csv ensemble_input.csv \
--samples_per_complex 20 \
--out_dir results/ensemble/
DiffDock generates poses; combine with other tools for affinity:
GNINA (Fast neural network scoring):
for pose in results/*.sdf; do
gnina -r protein.pdb -l "$pose" --score_only
done
MM/GBSA (More accurate, slower): Use AmberTools MMPBSA.py or gmx_MMPBSA after energy minimization
Free Energy Calculations (Most accurate): Use OpenMM + OpenFE or GROMACS for FEP/TI calculations
Recommended Workflow:
DiffDock IS Designed For:
DiffDock IS NOT Designed For:
For complete limitations: Read references/confidence_and_limitations.md using the Read tool
Issue: Low confidence scores across all predictions
- Increase samples_per_complex (20-40), try ensemble docking, validate protein structure

Issue: Out of memory errors
- Use --batch_size 2 or process fewer complexes at once

Issue: Slow performance
- Verify CUDA with python -c "import torch; print(torch.cuda.is_available())", use GPU

Issue: Unrealistic binding poses

Issue: "Module not found" errors
- Run python scripts/setup_check.py to diagnose

For Best Results:
For interactive use, launch the web interface:
python app/main.py
# Navigate to http://localhost:7860
Or use the online demo without installation:
Scripts (scripts/):
- prepare_batch_csv.py: Create and validate batch input CSV files
- analyze_results.py: Analyze confidence scores and rank predictions
- setup_check.py: Verify DiffDock environment setup
References (references/): read these with the Read tool when users need deeper detail:
- parameters_reference.md: Complete parameter documentation
- confidence_and_limitations.md: Confidence score interpretation and tool limitations
- workflows_examples.md: Comprehensive workflow examples

Assets (assets/):
- batch_template.csv: Template for batch processing
- custom_inference_config.yaml: Configuration template
Usage tips:
- Run setup_check.py before starting large jobs
- Validate batch CSVs with prepare_batch_csv.py to catch errors early

When using DiffDock, cite the appropriate papers:
DiffDock-L (current default model):
Stärk et al. (2024) "DiffDock-L: Improving Molecular Docking with Diffusion Models"
arXiv:2402.18396
Original DiffDock:
Corso et al. (2023) "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking"
ICLR 2023, arXiv:2210.01776
Weekly Installs
143
Repository
GitHub Stars
23.4K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
claude-code: 121
opencode: 119
cursor: 114
gemini-cli: 113
codex: 102
antigravity: 101