torchdrug by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill torchdrug
TorchDrug is a comprehensive PyTorch-based machine learning toolbox for drug discovery and molecular science. It applies graph neural networks, pre-trained models, and task definitions to molecules, proteins, and biological knowledge graphs. Its coverage includes molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, and retrosynthesis planning, and it ships with 40+ curated datasets and 20+ model architectures.
This skill should be used when working with:
Data Types:
Tasks:
Libraries and Integration:
```bash
uv pip install torchdrug
# Or with optional dependencies
uv pip install torchdrug[full]
```
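```python
import torch
from torchdrug import data, datasets, models, tasks

# Load molecular dataset
dataset = datasets.BBBP("~/molecule-datasets/")

# Random 80/10/10 split (scaffold splits are usually more realistic for molecules)
lengths = [int(0.8 * len(dataset)), int(0.1 * len(dataset))]
lengths += [len(dataset) - sum(lengths)]
train_set, valid_set, test_set = torch.utils.data.random_split(dataset, lengths)

# Define GNN model
model = models.GIN(
    input_dim=dataset.node_feature_dim,
    hidden_dims=[256, 256, 256],
    edge_input_dim=dataset.edge_feature_dim,
    batch_norm=True,
    readout="mean",
)

# Create property prediction task
task = tasks.PropertyPrediction(
    model,
    task=dataset.tasks,
    criterion="bce",
    metric=["auroc", "auprc"],
)
# Build the prediction head from the training data
# (core.Engine does this automatically; a manual loop must call it)
task.preprocess(train_set, valid_set, test_set)

# Train with PyTorch
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
# TorchDrug's DataLoader knows how to batch graphs
train_loader = data.DataLoader(train_set, batch_size=32, shuffle=True)
task.train()
for epoch in range(100):
    for batch in train_loader:
        loss, metric = task(batch)  # tasks return (loss, metric dict)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```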
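The quick start tracks `auroc` as a metric. AUROC equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. The dependency-free sketch below illustrates that definition; it is not TorchDrug's implementation:

```python
from itertools import product


def auroc(scores, labels):
    """AUROC via pairwise comparison of positive vs. negative scores (O(P*N))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# One mis-ordered pair (0.4 vs 0.6) out of four positive/negative pairs
print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # → 0.75
```

The pairwise form is quadratic and meant for intuition. Library implementations typically sort the scores once instead.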
Predict chemical, physical, and biological properties of molecules from structure.
Use Cases:
Key Components:
Reference: See references/molecular_property_prediction.md for:
Work with protein sequences, structures, and properties.
Use Cases:
Key Components:
Reference: See references/protein_modeling.md for:
Predict missing links and relationships in biological knowledge graphs.
Use Cases:
Key Components:
Reference: See references/knowledge_graphs.md for:
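Knowledge graph completion models score candidate (head, relation, tail) triples and rank the true tail against the alternatives. The toy sketch below shows RotatE-style scoring, in which each relation rotates the head embedding in the complex plane. The embeddings and entity names are hand-picked for illustration rather than learned, and the code is not TorchDrug's implementation:

```python
import cmath
import math


def rotate_score(head, phases, tail):
    """RotatE-style score: negative L1 distance between the rotated head and the tail.

    head, tail: complex embeddings; phases: one rotation angle (radians) per dimension.
    A higher score means a more plausible (head, relation, tail) triple.
    """
    return -sum(abs(h * cmath.exp(1j * theta) - t) for h, theta, t in zip(head, phases, tail))


def rank_of_true_tail(head, phases, candidates, true_tail):
    """1-based rank of true_tail among candidate tails ordered by score (1 is best)."""
    scores = {name: rotate_score(head, phases, emb) for name, emb in candidates.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_tail) + 1


# Toy 2-d embeddings; the hypothetical "treats" relation rotates each dimension by 90 degrees
drug = [1 + 0j, 0 + 1j]
treats = [math.pi / 2, math.pi / 2]
entities = {
    "disease_a": [0 + 1j, -1 + 0j],  # exactly the rotated drug embedding
    "disease_b": [1 + 0j, 0 + 1j],   # unrotated copy of the drug embedding
    "disease_c": [-1 + 0j, 0 - 1j],  # rotated by 180 degrees instead
}
print(rank_of_true_tail(drug, treats, entities, "disease_a"))  # → 1
```

These ranks feed the usual evaluation metrics. Mean reciprocal rank averages 1/rank over test triples, and Hits@k counts the fraction of ranks that are at most k.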
Generate novel molecular structures with desired properties.
Use Cases:
Key Components:
Reference: See references/molecular_generation.md for:
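Generated molecules are commonly summarized by validity, uniqueness, and novelty. The minimal sketch below makes two assumptions:

- `is_valid` is a caller-supplied check. In practice this could be whether RDKit's `Chem.MolFromSmiles` returns a molecule rather than `None`; the toy checker here is hypothetical.
- SMILES are canonicalized beforehand, so that string equality means "same molecule".

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness and novelty for a list of generated SMILES.

    validity   = valid / generated
    uniqueness = unique valid / valid
    novelty    = unique valid not in training_set / unique valid
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles):
    """Hypothetical stand-in for a real SMILES parser: only checks balanced parentheses."""
    return smiles.count("(") == smiles.count(")")


print(generation_metrics(["CCO", "CCO", "CC(C", "c1ccccc1"], ["CCO"], toy_is_valid))
# → {'validity': 0.75, 'uniqueness': 0.6666666666666666, 'novelty': 0.5}
```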
Predict synthetic routes from target molecules to starting materials.
Use Cases:
Key Components:
Reference: See references/retrosynthesis.md for:
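Multi-step planning applies a single-step retrosynthesis model recursively until every branch ends in a purchasable starting material. The toy depth-first sketch below uses `single_step` as a hypothetical stand-in for a trained model: a lookup table of made-up disconnections with placeholder molecule names. Practical planners add route scoring and smarter search on top of this idea:

```python
def plan_route(target, single_step, purchasable, max_depth=5):
    """Return a list of (product, reactants) steps ending in purchasable materials, or None.

    single_step(mol) -> list of candidate reactant tuples, best first (hypothetical model).
    """
    if target in purchasable:
        return []  # nothing left to synthesize
    if max_depth == 0:
        return None
    for reactants in single_step(target):
        route = [(target, reactants)]
        for reactant in reactants:
            sub_route = plan_route(reactant, single_step, purchasable, max_depth - 1)
            if sub_route is None:
                break  # this disconnection dead-ends; try the next candidate
            route.extend(sub_route)
        else:
            return route  # every reactant was resolved
    return None


# Made-up reaction table standing in for a single-step model
TOY_MODEL = {
    "T": [("X", "Y"), ("A", "B")],
    "A": [("C",)],
    "B": [("D", "E")],
}
route = plan_route("T", lambda m: TOY_MODEL.get(m, []), purchasable={"C", "D", "E"})
print(route)  # → [('T', ('A', 'B')), ('A', ('C',)), ('B', ('D', 'E'))]
```

The first disconnection of "T" fails because "X" can neither be bought nor broken down, so the search backtracks to the second candidate.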
Comprehensive catalog of GNN architectures for different data types and tasks.
Available Models:
Reference: See references/models_architectures.md for:
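Many of the molecular architectures are message-passing GNNs. The dependency-free sketch below illustrates the aggregation in one GIN layer: each node sums its neighbors' features and adds its own features scaled by (1 + eps). A real GIN layer then passes the result through an MLP, which is omitted here, and the code is not TorchDrug's implementation:

```python
def gin_aggregate(features, edges, eps=0.0):
    """One GIN aggregation step (before the MLP): (1 + eps) * h_v + sum of neighbor h_u.

    features: list of per-node feature lists; edges: undirected (u, v) pairs.
    """
    neighbors = {i: [] for i in range(len(features))}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            agg = [a + x for a, x in zip(agg, features[u])]
        out.append(agg)
    return out


# Ethanol heavy atoms C-C-O with toy 2-d features [is_carbon, is_oxygen]
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
print(gin_aggregate(atoms, bonds))  # → [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
```

Summing, rather than averaging, keeps neighborhood multiplicities distinguishable. That is one common argument for trying GIN before GCN on molecular graphs.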
40+ curated datasets spanning chemistry, biology, and knowledge graphs.
Categories:
Reference: See references/datasets.md for:
Scenario: Predict blood-brain barrier penetration for drug candidates.
Steps:
- Dataset: datasets.BBBP()
- Task: PropertyPrediction with binary classification

Navigation: references/molecular_property_prediction.md → Dataset selection → Model selection → Training
Scenario: Predict enzyme function from sequence.
Steps:
- Dataset: datasets.EnzymeCommission()
- Task: MultipleBinaryClassification (multi-label, since one enzyme can carry several EC numbers)

Navigation: references/protein_modeling.md → Model selection (sequence vs structure) → Pre-training strategies
Scenario: Find candidate treatments for diseases in Hetionet (drug repurposing).
Steps:
- Dataset: datasets.Hetionet()
- Task: KnowledgeGraphCompletion

Navigation: references/knowledge_graphs.md → Hetionet dataset → Model selection → Biomedical applications
Scenario: Generate drug-like molecules optimized for target binding.
Steps:
Navigation: references/molecular_generation.md → Conditional generation → Multi-objective optimization
Scenario: Plan a synthesis route for a target molecule.
Steps:
- Dataset: datasets.USPTO50k()

Navigation: references/retrosynthesis.md → Task types → Multi-step planning
Convert between TorchDrug molecules and RDKit:
```python
from torchdrug import data
from rdkit import Chem

# SMILES → TorchDrug molecule
smiles = "CCO"
mol = data.Molecule.from_smiles(smiles)

# TorchDrug → RDKit
rdkit_mol = mol.to_molecule()

# RDKit → TorchDrug
rdkit_mol = Chem.MolFromSmiles(smiles)
mol = data.Molecule.from_molecule(rdkit_mol)
```
Use predicted structures:
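```python
from torchdrug import data, layers
from torchdrug.layers import geometry

# Load an AlphaFold-predicted structure
protein = data.Protein.from_pdb("AF-P12345-F1-model_v4.pdb", atom_feature="position",
                                bond_feature="length", residue_feature="symbol")

# Alpha-carbon nodes with sequential and radius (spatial) edges
graph_construction_model = layers.GraphConstruction(
    node_layers=[geometry.AlphaCarbonNode()],
    edge_layers=[geometry.SequentialEdge(max_distance=2),
                 geometry.SpatialEdge(radius=10.0, min_distance=5)],
)
# Graph construction runs on packed (batched) proteins
graph = graph_construction_model(data.Protein.pack([protein]))
```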
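To make the two edge types concrete, the dependency-free sketch below builds them for a residue graph. Nodes are alpha-carbon coordinates. Sequential edges join residues within a small offset along the chain, and radius edges join residues whose CA atoms lie within a cutoff distance. The coordinates, the function name, and its parameters are made up for illustration; this is not TorchDrug's graph-construction code:

```python
import math


def residue_edges(ca_coords, max_seq_offset=1, radius=10.0):
    """Build undirected residue-graph edges from alpha-carbon coordinates.

    Returns (sequential, spatial) sets of (i, j) pairs with i < j.
    sequential: residues at most max_seq_offset apart along the chain.
    spatial: other residue pairs whose CA-CA distance is <= radius.
    """
    sequential, spatial = set(), set()
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if j - i <= max_seq_offset:
                sequential.add((i, j))
            elif math.dist(ca_coords[i], ca_coords[j]) <= radius:
                spatial.add((i, j))
    return sequential, spatial


# Four residues (angstroms, made up); residue 3 folds back toward residues 0 and 1
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.0, 4.0, 0.0)]
seq_edges, radius_edges = residue_edges(coords, radius=6.0)
print(sorted(seq_edges))     # → [(0, 1), (1, 2), (2, 3)]
print(sorted(radius_edges))  # → [(0, 3), (1, 3)]
```

Radius edges capture contacts that are far apart in sequence but close in 3D. A sequence-only model cannot see those contacts.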
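Wrap a TorchDrug task for PyTorch Lightning training:
```python
import pytorch_lightning as pl
import torch


class LightningTask(pl.LightningModule):
    """Wraps a TorchDrug task.

    Call task.preprocess(train_set, valid_set, test_set) before wrapping, and feed
    batches from torchdrug.data.DataLoader so graphs are collated correctly.
    """

    def __init__(self, torchdrug_task):
        super().__init__()
        self.task = torchdrug_task

    def training_step(self, batch, batch_idx):
        # TorchDrug tasks return (loss, metric dict); Lightning expects the loss
        loss, metric = self.task(batch)
        return loss

    def validation_step(self, batch, batch_idx):
        pred = self.task.predict(batch)
        target = self.task.target(batch)
        return {"pred": pred, "target": target}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```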
For deep dives into TorchDrug's architecture:
Core Concepts: See references/core_concepts.md for:
Choose Dataset:
- references/datasets.md → Molecular section
- references/datasets.md → Protein section
- references/datasets.md → Knowledge graph section

Choose Model:
- references/models_architectures.md → GNN section → GIN/GAT/SchNet
- references/models_architectures.md → Protein section → ESM
- references/models_architectures.md → Protein section → GearNet
- references/models_architectures.md → KG section → RotatE/ComplEx

Common Tasks:
- references/molecular_property_prediction.md or references/protein_modeling.md
- references/molecular_generation.md
- references/retrosynthesis.md
- references/knowledge_graphs.md

Understand Architecture:
- references/core_concepts.md → Data Structures
- references/core_concepts.md → Model Interface
- references/core_concepts.md → Task Interface

Issue: Dimension mismatch errors → Check model.input_dim matches dataset.node_feature_dim → See references/core_concepts.md → Essential Attributes
Issue: Poor performance on molecular tasks → Use scaffold splitting, not random → Try GIN instead of GCN → See references/molecular_property_prediction.md → Best Practices
Issue: Protein model not learning → Use pre-trained ESM for sequence tasks → Check edge construction for structure models → See references/protein_modeling.md → Training Workflows
Issue: Memory errors with large graphs → Reduce batch size → Use gradient accumulation → See references/core_concepts.md → Memory Efficiency
Issue: Generated molecules are invalid → Add validity constraints → Post-process with RDKit validation → See references/molecular_generation.md → Validation and Filtering
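The advice to prefer scaffold splitting rests on keeping every molecule that shares a core scaffold in the same subset. The test set then probes generalization to unseen chemotypes. The sketch below shows only the grouping logic and makes these assumptions:

- Scaffold strings are precomputed, for example Bemis-Murcko scaffolds from RDKit.
- The letters in the example are placeholders.

This is not TorchDrug's own split helper:

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that molecules sharing a scaffold never cross subsets.

    scaffolds: one precomputed scaffold string per molecule.
    Larger scaffold groups fill the training set first, so rarer scaffolds
    end up in validation and test.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; ties broken by first index for a deterministic split
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cut = frac_train * n
    valid_cut = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cut:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cut:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


train, valid, test = scaffold_split(["A", "A", "A", "A", "B", "B", "B", "C", "D", "E"])
print(train, valid, test)  # → [0, 1, 2, 3, 4, 5, 6, 7] [8] [9]
```

Because whole groups are assigned, the final subset sizes only approximate the requested fractions.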
Official Documentation: https://torchdrug.ai/docs/
GitHub: https://github.com/DeepGraphLearning/torchdrug
Paper: TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery
Navigate to the appropriate reference file based on your task:
- molecular_property_prediction.md
- protein_modeling.md
- knowledge_graphs.md
- molecular_generation.md
- retrosynthesis.md
- models_architectures.md
- datasets.md
- core_concepts.md

Each reference provides comprehensive coverage of its domain with examples, best practices, and common use cases.
Weekly Installs: 123
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: claude-code (104), opencode (97), gemini-cli (91), cursor (91), antigravity (87), codex (81)