chembl-database by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill chembl-database
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.
This skill should be used when:
The ChEMBL Python client is required for programmatic access:
uv pip install chembl_webresource_client
from chembl_webresource_client.new_client import new_client
# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
Retrieve by ChEMBL ID:
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
Search by name:
results = molecule.filter(pref_name__icontains='aspirin')
Filter by properties:
# Find small molecules (MW <= 500) with favorable LogP
results = molecule.filter(
molecule_properties__mw_freebase__lte=500,
molecule_properties__alogp__lte=5
)
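Each keyword passed to filter() joins a field path and a lookup with double underscores. For example, molecule_properties__alogp__lte means "alogp inside molecule_properties, less than or equal to". The sketch below builds such keywords programmatically. lookup(), combine() and SUPPORTED_LOOKUPS are hypothetical helpers, and the lookup set is limited to the ones listed in this document.

```python
# Lookups listed in this document; illustrative, not an exhaustive list
SUPPORTED_LOOKUPS = {
    "exact", "iexact", "contains", "icontains", "startswith", "endswith",
    "gt", "gte", "lt", "lte", "range", "in", "isnull",
}


def lookup(path: str, op: str, value) -> dict:
    """Build one filter keyword such as 'molecule_properties__alogp__lte'."""
    if op not in SUPPORTED_LOOKUPS:
        raise ValueError(f"unsupported lookup: {op}")
    # Field path and lookup name are joined with a double underscore
    return {f"{path}__{op}": value}


def combine(*parts: dict) -> dict:
    """Merge several single-keyword dicts into one kwargs dict."""
    merged = {}
    for part in parts:
        merged.update(part)
    return merged


# Same conditions as the property filter above
kwargs = combine(
    lookup("molecule_properties__mw_freebase", "lte", 500),
    lookup("molecule_properties__alogp", "lte", 5),
)
```

Passing the result as molecule.filter(**kwargs) should be equivalent to writing the two keywords out by hand. This is convenient when thresholds come from user input or a configuration file.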
Retrieve target information:
target = new_client.target
egfr = target.get('CHEMBL203')
Search for specific target types:
# Find single-protein targets whose preferred name contains 'kinase'
kinases = target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
Query activities for a target:
activity = new_client.activity
# Find potent EGFR inhibitors
results = activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
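Activity queries like the one above can return records that curators have flagged. The following is a minimal client-side screen. It assumes the data_validity_comment and potential_duplicate fields found in ChEMBL activity records and treats missing keys as unflagged. clean_activities() is a hypothetical helper that accepts any iterable of dicts, including the result of new_client.activity.filter(...).

```python
from typing import Iterable, Iterator


def clean_activities(records: Iterable[dict]) -> Iterator[dict]:
    """Yield activity records that pass basic data-quality screens."""
    for rec in records:
        # A non-empty comment means curators flagged the value (e.g. as unusual)
        if rec.get("data_validity_comment"):
            continue
        # Skip records marked as likely repeats of a previously reported value
        if rec.get("potential_duplicate"):
            continue
        # Keep only records that actually carry a value
        if rec.get("standard_value") is None:
            continue
        yield rec
```

For example, clean_activities(results) keeps only the unflagged records from the EGFR query.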
Get a compound's activities that have a pChEMBL value:
compound_activities = activity.filter(
molecule_chembl_id='CHEMBL25',
pchembl_value__isnull=False
)
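pchembl_value puts different activity types on one scale. It is -log10 of the molar activity, so an IC50 of 100 nM corresponds to 7.0. ChEMBL only fills this field for certain standardized records, so for other records the same arithmetic can be applied locally. The sketch below assumes values in nM. p_activity_from_nm() and to_float() are hypothetical helpers; to_float() exists because API values may arrive as strings.

```python
import math


def p_activity_from_nm(value_nm: float) -> float:
    """Return -log10 of an activity expressed in nM (the pChEMBL scale)."""
    if value_nm <= 0:
        raise ValueError("activity must be positive")
    # -log10(value_nm * 1e-9) rewritten as 9 - log10(value_nm)
    return 9 - math.log10(value_nm)


def to_float(raw):
    """API values may be strings or None; return a float or None."""
    return None if raw is None else float(raw)
```

Usage: p_activity_from_nm(to_float(rec['standard_value'])) for a record whose standard_units is 'nM'.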
Similarity search:
similarity = new_client.similarity
# Find compounds similar to aspirin
similar = similarity.filter(
smiles='CC(=O)Oc1ccccc1C(=O)O',
similarity=85 # 85% similarity threshold
)
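The similarity argument is a percentage threshold. Searches of this kind usually score molecules with the Tanimoto coefficient on binary fingerprints: the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below shows that arithmetic on toy fingerprints written as sets of bit indices. It assumes Tanimoto scoring and does not reproduce ChEMBL's server-side fingerprints. tanimoto() and passes_threshold() are hypothetical names.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def passes_threshold(bits_a: Set[int], bits_b: Set[int], threshold_percent: float = 85) -> bool:
    """Compare on the 0-100 scale used by the similarity argument."""
    return tanimoto(bits_a, bits_b) * 100 >= threshold_percent
```

Take the toy fingerprints {1, 2, 3, 4} and {1, 2, 3, 5}. Three bits are shared out of five set overall, giving a similarity of 0.6, well below an 85% threshold.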
Substructure search:
substructure = new_client.substructure
# Find compounds containing a benzene ring (expect a very large result set)
results = substructure.filter(smiles='c1ccccc1')
Retrieve drug data:
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
Get mechanisms of action:
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
Query drug indications:
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
Identify the target with a name search:
targets = new_client.target.filter(pref_name__icontains='EGFR')
target_id = targets[0]['target_chembl_id']
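targets[0] raises an IndexError when the name search returns nothing, and the first hit is not necessarily the human single-protein target. The sketch below makes a safer pick over the returned target records. It assumes the organism and target_type fields present in ChEMBL target records. pick_target() is a hypothetical helper that works on any iterable of dicts.

```python
from typing import Iterable, Optional


def pick_target(candidates: Iterable[dict],
                organism: str = "Homo sapiens",
                target_type: str = "SINGLE PROTEIN") -> Optional[str]:
    """Return the first target_chembl_id matching organism and type, or None."""
    for rec in candidates:
        if rec.get("organism") == organism and rec.get("target_type") == target_type:
            return rec["target_chembl_id"]
    # No match: the caller can report it instead of hitting an IndexError
    return None
```

Usage: target_id = pick_target(targets), followed by a check for None before querying activities.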
Query bioactivity data for that target:
activities = new_client.activity.filter(
target_chembl_id=target_id,
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
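Extract unique compound IDs and retrieve details:
# A compound often has several activity records; keep each ID once, in order
compound_ids = list(dict.fromkeys(act['molecule_chembl_id'] for act in activities))
# One request per compound; acceptable for small result sets
compounds = [new_client.molecule.get(cid) for cid in compound_ids]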
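Calling get() once per ID sends one request per compound. Because the filters listed in this document include __in, IDs can instead be sent in batches. The sketch below covers the batching step. chunked() is a hypothetical helper, and the batch size of 50 in the usage line is an arbitrary assumption, not a documented limit.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Usage: for each batch in chunked(compound_ids, 50), call new_client.molecule.filter(molecule_chembl_id__in=batch) and extend a results list with the records it returns.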
Get drug information:
drug_info = new_client.drug.get('CHEMBL1234')
Retrieve mechanisms:
mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
Find all bioactivities:
activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
Find similar compounds:
# 'query_smiles' is a placeholder for a real SMILES string
similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
Get activities for each compound:
activities_by_compound = {}
for compound in similar:
    activities_by_compound[compound['molecule_chembl_id']] = new_client.activity.filter(
        molecule_chembl_id=compound['molecule_chembl_id']
    )
Analyze property-activity relationships using molecular properties from results.
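One way to start such an analysis is to reduce each compound to its best measured potency and pair that value with a property from its molecule record. The sketch below assumes that activity records carry pchembl_value and that molecule records carry a nested molecule_properties dict, which can be None. best_pchembl() and property_activity_pairs() are hypothetical helpers. Values pass through float() because they may arrive as strings.

```python
from typing import Dict, Iterable, List, Tuple


def best_pchembl(activities: Iterable[dict]) -> Dict[str, float]:
    """Highest pchembl_value per molecule, skipping records without one."""
    best: Dict[str, float] = {}
    for act in activities:
        raw = act.get("pchembl_value")
        if raw is None:
            continue
        value = float(raw)
        mol_id = act["molecule_chembl_id"]
        if mol_id not in best or value > best[mol_id]:
            best[mol_id] = value
    return best


def property_activity_pairs(molecules: Iterable[dict],
                            potency: Dict[str, float],
                            prop: str = "alogp") -> List[Tuple[str, float, float]]:
    """Pair one molecule_properties field with potency for molecules that have both."""
    pairs = []
    for mol in molecules:
        mol_id = mol["molecule_chembl_id"]
        props = mol.get("molecule_properties") or {}  # None for some records
        raw = props.get(prop)
        if raw is None or mol_id not in potency:
            continue
        pairs.append((mol_id, float(raw), potency[mol_id]))
    return pairs
```

The resulting (molecule ID, property, pChEMBL) tuples can go straight into a scatter plot or a correlation calculation.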
ChEMBL supports Django-style query filters:
__exact - Exact match
__iexact - Case-insensitive exact match
__contains / __icontains - Substring matching
__startswith / __endswith - Prefix/suffix matching
__gt, __gte, __lt, __lte - Numeric comparisons
__range - Value in range
__in - Value in list
__isnull - Null/not null check
Convert results to a pandas DataFrame for analysis:
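import pandas as pd
from chembl_webresource_client.new_client import new_client
# Downloads every activity record for EGFR; this can take a while
activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))
# standard_value may be returned as text; convert it before computing statistics
df['standard_value'] = pd.to_numeric(df['standard_value'], errors='coerce')
# Records per measurement type
print(df.groupby('standard_type').size())
# Summarize per type and unit; pooling IC50s with Kis, or nM with %, is misleading
print(df.groupby(['standard_type', 'standard_units'])['standard_value'].describe())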
The client automatically caches results for 24 hours. Configure caching:
from chembl_webresource_client.settings import Settings
# Disable caching
Settings.Instance().CACHING = False
# Adjust cache expiration (seconds)
Settings.Instance().CACHE_EXPIRE = 86400
Queries execute only when data is accessed. Convert to list to force execution:
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')
# Force execution
results_list = list(results)
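The laziness can be pictured with a toy model in which filter() only records conditions and the request runs on the first iteration. LazyQuery is a hypothetical class that illustrates the idea; it is not the client's implementation. fetch stands in for the HTTP request.

```python
from typing import Callable, Iterator, List, Optional


class LazyQuery:
    """Toy model of a lazily evaluated query: filters are recorded, not run."""

    def __init__(self, fetch: Callable[[dict], List[dict]], **filters):
        self._fetch = fetch      # performs the actual request when needed
        self._filters = filters
        self._cache: Optional[List[dict]] = None  # filled on first access

    def filter(self, **more) -> "LazyQuery":
        # Return a new query with extra conditions; still no request is made
        return LazyQuery(self._fetch, **{**self._filters, **more})

    def __iter__(self) -> Iterator[dict]:
        if self._cache is None:
            self._cache = self._fetch(self._filters)  # the request happens here
        return iter(self._cache)
```

Building q = LazyQuery(fetch).filter(pref_name__icontains='aspirin') makes no request. list(q) triggers exactly one request, and iterating again reuses the cached rows.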
Results are paginated automatically. Iterate through all results:
for record in new_client.activity.filter(target_chembl_id='CHEMBL203'):
    # Process each activity record
    print(record['molecule_chembl_id'])
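Behind that loop, the ChEMBL web services return results in pages addressed by limit and offset parameters, and the client requests the next page as iteration proceeds. The sketch below writes out the same loop. fetch_page is a hypothetical stand-in for one HTTP request, and page_size is arbitrary. Stopping on a short page is a simplification, since real responses also report paging metadata.

```python
from typing import Callable, Iterator, List


def iterate_pages(fetch_page: Callable[[int, int], List[dict]],
                  page_size: int = 20) -> Iterator[dict]:
    """Yield records page by page until a page comes back short."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # one request: (offset, limit)
        yield from page
        if len(page) < page_size:
            return  # a short or empty page means nothing is left
        offset += page_size
```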
# Identify kinase targets
kinases = new_client.target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
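# Get potent inhibitors (IC50 <= 50 nM) for the first 5 kinase targets
potent_by_kinase = {}
for kinase in kinases[:5]:
    potent_by_kinase[kinase['target_chembl_id']] = new_client.activity.filter(
        target_chembl_id=kinase['target_chembl_id'],
        standard_type='IC50',
        standard_value__lte=50,
        standard_units='nM'
    )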
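# Get approved drugs
drugs = new_client.drug.filter()
# For each of the first 10 drugs, collect the targets named in its mechanisms
targets_by_drug = {}
for drug_record in drugs[:10]:
    mechanisms = new_client.mechanism.filter(
        molecule_chembl_id=drug_record['molecule_chembl_id']
    )
    targets_by_drug[drug_record['molecule_chembl_id']] = {
        m['target_chembl_id'] for m in mechanisms if m['target_chembl_id']
    }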
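The loop above looks up targets drug by drug. The reverse view, each target with the drugs acting on it, comes from regrouping the same mechanism records. The sketch below assumes each mechanism record has molecule_chembl_id and target_chembl_id keys, where target_chembl_id may be empty for some mechanisms. drugs_by_target() is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def drugs_by_target(mechanisms: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map target_chembl_id -> set of molecule_chembl_ids acting on it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for mec in mechanisms:
        target_id = mec.get("target_chembl_id")
        if not target_id:  # skip mechanisms without a mapped target
            continue
        index[target_id].add(mec["molecule_chembl_id"])
    return dict(index)
```

Collecting the mechanism records from every loop iteration into one list and passing it in gives targets shared by several drugs at a glance.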
# Find compounds with desired properties
candidates = new_client.molecule.filter(
molecule_properties__mw_freebase__range=[300, 500],
molecule_properties__alogp__lte=5,
molecule_properties__hba__lte=10,
molecule_properties__hbd__lte=5
)
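The filter above requires all four limits at once. That amounts to zero rule-of-five violations plus a 300 Da lower bound on molecular weight, whereas Lipinski's rule tolerates one violation. Counting violations locally on the returned molecule_properties allows that kind of relaxed cut. The sketch below uses the same mw_freebase, alogp, hba and hbd fields as the query. ro5_violations() is a hypothetical helper, and missing fields are simply not counted.

```python
from typing import Optional


def ro5_violations(props: Optional[dict]) -> Optional[int]:
    """Count rule-of-five violations; None when properties are missing."""
    if not props:
        return None
    limits = {"mw_freebase": 500, "alogp": 5, "hba": 10, "hbd": 5}
    count = 0
    for field, limit in limits.items():
        raw = props.get(field)
        # Values may be strings; a missing field is not counted as a violation
        if raw is not None and float(raw) > limit:
            count += 1
    return count
```

A relaxed screen would keep molecules whose ro5_violations(mol['molecule_properties']) is 0 or 1, and handle None separately.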
Ready-to-use Python functions demonstrating common ChEMBL query patterns:
get_molecule_info() - Retrieve molecule details by ID
search_molecules_by_name() - Name-based molecule search
find_molecules_by_properties() - Property-based filtering
get_bioactivity_data() - Query bioactivities for targets
find_similar_compounds() - Similarity searching
substructure_search() - Substructure matching
get_drug_info() - Retrieve drug information
find_kinase_inhibitors() - Specialized kinase inhibitor search
export_to_dataframe() - Convert results to pandas DataFrame
Consult this script for implementation details and usage examples.
Comprehensive API documentation including:
Refer to this document when detailed API information is needed or when troubleshooting queries.
Check the data_validity_comment field in activity records
Watch for potential_duplicate flags
pchembl_value provides a normalized activity value (-log10 of the molar IC50, EC50, Ki, Kd, etc.)
Check standard_type to understand the measurement type (IC50, Ki, EC50, etc.)
Weekly Installs
121
Repository
GitHub Stars
22.6K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
claude-code 102
opencode 94
gemini-cli 89
cursor 88
antigravity 84
codex 79