clinvar-database by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill clinvar-database
ClinVar is NCBI's freely accessible archive of reports on relationships between human genetic variants and phenotypes, with supporting evidence. The database aggregates information about genomic variation and its relationship to human health, providing standardized variant classifications used in clinical genetics and research.
This skill should be used when:
Search ClinVar using the web interface at https://www.ncbi.nlm.nih.gov/clinvar/
Common search patterns:
BRCA1[gene]
pathogenic[CLNSIG]
breast cancer[disorder]
NM_000059.3:c.1310_1313del[variant name]
13[chr]
BRCA1[gene] AND pathogenic[CLNSIG]
Access ClinVar programmatically using NCBI's E-utilities API. Refer to references/api_reference.md for comprehensive API documentation.
Quick example using curl:
# Search for pathogenic BRCA1 variants
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=BRCA1[gene]+AND+pathogenic[CLNSIG]&retmode=json"
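The same query can be prepared from Python. The following is a minimal sketch using only the standard library; the helper names `build_esearch_url` and `parse_id_list` and the `retmax` default are illustrative assumptions, not part of the skill or of any NCBI client. The network call itself is left as a comment so the block runs offline.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for db=clinvar with the term URL-encoded.

    Hypothetical helper for illustration; retmax=20 is an arbitrary choice.
    """
    params = {"db": "clinvar", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_id_list(response_text):
    """Extract the record IDs from an esearch JSON response body."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


url = build_esearch_url("BRCA1[gene] AND pathogenic[CLNSIG]")
print(url)

# To run the query (requires network access):
#   from urllib.request import urlopen
#   with urlopen(url) as resp:
#       ids = parse_id_list(resp.read().decode("utf-8"))
```

Encoding the term with `urlencode` avoids hand-escaping the square brackets and spaces that appear in ClinVar field tags.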
Best practices:
Set Entrez.email when using Biopython
ClinVar uses standardized terminology for variant classifications. Refer to references/clinical_significance.md for detailed interpretation guidelines.
Key germline classification terms (ACMG/AMP):
Review status (star ratings):
Critical considerations:
Download complete datasets from ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/
Refer to references/data_formats.md for comprehensive documentation on file formats and processing.
Update schedule:
XML files (most comprehensive):
xml/clinvar_variation/ - Variant-centric aggregation
xml/RCV/ - Variant-condition pairs
VCF files (for genomic pipelines):
vcf_GRCh37/clinvar.vcf.gz
vcf_GRCh38/clinvar.vcf.gz
Tab-delimited files (for quick analysis):
tab_delimited/variant_summary.txt.gz - Summary of all variants
tab_delimited/var_citations.txt.gz - PubMed citations
tab_delimited/cross_references.txt.gz - Database cross-references
Example download:
# Download latest monthly XML release
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz
# Download VCF for GRCh38
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
Process XML files to extract variant details, classifications, and evidence.
Python example with xml.etree:
import gzip
import xml.etree.ElementTree as ET
# Open in binary mode so the XML parser handles the encoding declaration itself
with gzip.open('ClinVarVariationRelease.xml.gz', 'rb') as f:
    for event, elem in ET.iterparse(f, events=('end',)):
        if elem.tag == 'VariationArchive':
            variation_id = elem.attrib.get('VariationID')
            # Extract clinical significance, review status, etc.
            elem.clear()  # Free memory once the record has been processed
Annotate variant calls or filter by clinical significance using bcftools or Python.
Using bcftools:
# Filter pathogenic variants
bcftools view -i 'INFO/CLNSIG~"Pathogenic"' clinvar.vcf.gz
# Extract specific genes
bcftools view -i 'INFO/GENEINFO~"BRCA"' clinvar.vcf.gz
# Annotate your VCF with ClinVar
bcftools annotate -a clinvar.vcf.gz -c INFO your_variants.vcf
Using PyVCF in Python:
import vcf
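vcf_reader = vcf.Reader(filename='clinvar.vcf.gz')
for record in vcf_reader:
    clnsig = record.INFO.get('CLNSIG', [])
    if 'Pathogenic' in clnsig:
        # GENEINFO is single-valued (Number=1), so it is read as a plain string;
        # indexing it with [0] would return only its first character
        gene = record.INFO.get('GENEINFO', '')
        print(f"{record.CHROM}:{record.POS} {gene} - {clnsig}")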
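Where PyVCF is not available, the same filter can be approximated with the standard library alone. This is a minimal sketch, not a full VCF parser: `parse_info` and `pathogenic_records` are hypothetical helper names, and the sample lines are made-up data shaped like ClinVar VCF records.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'A=1;B=x;FLAG' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags carry no '=value'
    return info


def pathogenic_records(lines):
    """Yield (chrom, pos, gene_symbols, clnsig) where CLNSIG contains 'Pathogenic'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        info = parse_info(fields[7])
        clnsig = str(info.get("CLNSIG", ""))
        if "Pathogenic" in clnsig:
            # GENEINFO looks like 'SYMBOL:GeneID|SYMBOL:GeneID'
            geneinfo = str(info.get("GENEINFO", ""))
            genes = [g.split(":")[0] for g in geneinfo.split("|") if g]
            yield fields[0], int(fields[1]), genes, clnsig


# Made-up sample lines for illustration only
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "17\t100\t1\tA\tG\t.\t.\tCLNSIG=Pathogenic;GENEINFO=GENE1:111\n",
    "17\t200\t2\tC\tT\t.\t.\tCLNSIG=Benign;GENEINFO=GENE1:111\n",
]
for rec in pathogenic_records(sample):
    print(rec)
```

For a real file, pass a handle from `gzip.open('clinvar.vcf.gz', 'rt')` instead of the sample list. The substring test is case-sensitive, so it keeps combined values such as `Pathogenic/Likely_pathogenic` but not `Likely_pathogenic` on its own.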
Use pandas or command-line tools for rapid filtering and analysis.
Using pandas:
import pandas as pd
# Load variant summary
df = pd.read_csv('variant_summary.txt.gz', sep='\t', compression='gzip')
# Filter pathogenic variants in specific gene
pathogenic_brca = df[
    (df['GeneSymbol'] == 'BRCA1') &
    (df['ClinicalSignificance'].str.contains('Pathogenic', na=False))
]
# Count variants by clinical significance
sig_counts = df['ClinicalSignificance'].value_counts()
Using command-line tools:
# Extract pathogenic variants for a specific gene
# (in variant_summary.txt, column 5 is GeneSymbol and column 7 is ClinicalSignificance)
zcat variant_summary.txt.gz | \
awk -F'\t' '$5=="TP53" && $7~"Pathogenic"' | \
cut -f1,5,7,13,14
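Positional column numbers break silently if the file layout changes. As a hedged alternative, the sketch below selects columns by header name using the standard `csv` module; `filter_summary` is a hypothetical helper, and the sample table is a made-up, cut-down stand-in for variant_summary.txt that keeps only three of its columns.

```python
import csv
import io


def filter_summary(handle, gene, term="Pathogenic"):
    """Yield rows whose GeneSymbol equals `gene` and whose
    ClinicalSignificance contains `term` (case-sensitive)."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        significance = row.get("ClinicalSignificance") or ""
        if row.get("GeneSymbol") == gene and term in significance:
            yield row


# Made-up sample data for illustration only
sample_tsv = (
    "#AlleleID\tGeneSymbol\tClinicalSignificance\n"
    "1\tTP53\tPathogenic\n"
    "2\tTP53\tBenign\n"
    "3\tBRCA1\tPathogenic\n"
)
matches = list(filter_summary(io.StringIO(sample_tsv), "TP53"))
for row in matches:
    print(row["#AlleleID"], row["GeneSymbol"], row["ClinicalSignificance"])
```

For the real file, replace the `io.StringIO` object with a handle from `gzip.open('variant_summary.txt.gz', 'rt', newline='')`.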
When multiple submitters provide different classifications for the same variant, ClinVar reports "Conflicting interpretations of pathogenicity."
Resolution strategy:
Search query to exclude conflicts:
TP53[gene] AND pathogenic[CLNSIG] NOT conflicting[RVSTAT]
Variant classifications may change over time as new evidence emerges.
Why classifications change:
Best practices:
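Because classifications can change between releases, one way to notice reclassifications is to compare two snapshots keyed by variation ID. The sketch below assumes each snapshot has already been reduced to a plain dict; `classification_changes` and the example IDs are hypothetical.

```python
def classification_changes(old, new):
    """Return {variation_id: (old_classification, new_classification)}
    for IDs present in both snapshots whose classification differs."""
    changes = {}
    for variation_id, new_cls in new.items():
        old_cls = old.get(variation_id)
        if old_cls is not None and old_cls != new_cls:
            changes[variation_id] = (old_cls, new_cls)
    return changes


# Made-up snapshots for illustration only
previous_release = {"1001": "Uncertain significance", "1002": "Benign"}
current_release = {
    "1001": "Likely pathogenic",
    "1002": "Benign",
    "1003": "Pathogenic",  # newly added record, not reported as a change
}
changed = classification_changes(previous_release, current_release)
print(changed)
```

Records that appear only in the newer snapshot are deliberately ignored here; reporting additions and removals would need a separate pass.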
Organizations can submit variant interpretations to ClinVar.
Submission methods:
references/api_reference.md
Requirements:
Contact: clinvar@ncbi.nlm.nih.gov for submission account setup.
Objective: Find pathogenic variants in the CFTR gene with expert panel review.
Steps:
Search using web interface or E-utilities:
CFTR[gene] AND pathogenic[CLNSIG] AND (reviewed by expert panel[RVSTAT] OR practice guideline[RVSTAT])
Review results, noting review status (should be ★★★ or ★★★★)
Export variant list or retrieve full records via efetch
Cross-reference with clinical presentation if applicable
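The review-status check in the steps above can be sketched as a lookup from ClinVar's review status text to its star rating. The mapping follows ClinVar's published star scheme; the names `REVIEW_STATUS_STARS`, `stars`, and `is_expert_reviewed` are illustrative, and treating unrecognized strings as zero stars is an assumption of this sketch.

```python
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, conflicting interpretations": 1,  # older wording
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
    "no classification provided": 0,
}


def stars(review_status):
    """Return the star rating for a review status string (0 if unrecognized)."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def is_expert_reviewed(review_status):
    """True for three- and four-star records (expert panel or practice guideline)."""
    return stars(review_status) >= 3


for status in ("reviewed by expert panel", "criteria provided, single submitter"):
    print(status, stars(status), is_expert_reviewed(status))
```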
Objective: Add clinical significance annotations to variant calls.
Steps:
Download appropriate ClinVar VCF (match genome build: GRCh37 or GRCh38):
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi
Annotate using bcftools:
bcftools annotate -a clinvar.vcf.gz \
-c INFO/CLNSIG,INFO/CLNDN,INFO/CLNREVSTAT \
-o annotated_variants.vcf \
your_variants.vcf
Filter annotated VCF for pathogenic variants:
bcftools view -i 'INFO/CLNSIG~"Pathogenic"' annotated_variants.vcf
Objective: Study all variants associated with hereditary breast cancer.
Steps:
Search by condition:
hereditary breast cancer[disorder] OR "Breast-ovarian cancer, familial"[disorder]
Download results as CSV or retrieve via E-utilities
Filter by review status to prioritize high-confidence variants
Analyze distribution across genes (BRCA1, BRCA2, PALB2, etc.)
Examine variants with conflicting interpretations separately
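The last two steps can be sketched as a single pass that counts variants per gene and sets conflicting records aside. This assumes rows are dicts with `GeneSymbol` and `ReviewStatus` keys (as produced by reading variant_summary.txt by header name); `summarize_by_gene` and the sample rows are hypothetical.

```python
from collections import Counter


def summarize_by_gene(rows):
    """Return (per_gene_counts, conflicting_rows).

    Rows whose ReviewStatus mentions 'conflicting' are collected
    separately and excluded from the per-gene counts.
    """
    per_gene = Counter()
    conflicting = []
    for row in rows:
        if "conflicting" in row["ReviewStatus"].lower():
            conflicting.append(row)
        else:
            per_gene[row["GeneSymbol"]] += 1
    return per_gene, conflicting


# Made-up rows for illustration only
rows = [
    {"GeneSymbol": "BRCA1", "ReviewStatus": "reviewed by expert panel"},
    {"GeneSymbol": "BRCA1", "ReviewStatus": "criteria provided, single submitter"},
    {"GeneSymbol": "BRCA2", "ReviewStatus": "criteria provided, conflicting classifications"},
    {"GeneSymbol": "PALB2", "ReviewStatus": "criteria provided, single submitter"},
]
counts, conflicts = summarize_by_gene(rows)
print(counts.most_common())
print(len(conflicts), "conflicting record(s)")
```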
Objective: Build a local ClinVar database for an analysis pipeline.
Steps:
Download monthly release for reproducibility:
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_YYYY-MM.xml.gz
Parse XML and load into database (PostgreSQL, MySQL, MongoDB)
Index by gene, position, clinical significance, review status
Implement version tracking for updates
Schedule monthly updates from FTP site
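The load, index, and version-tracking steps above can be sketched with SQLite from the standard library, used here as a lightweight stand-in for PostgreSQL or MySQL. The table layout, column names, release labels, and helper functions are assumptions made for illustration, not a schema defined by ClinVar or by this skill.

```python
import sqlite3


def init_db(conn):
    """Create a variants table keyed by (release, variation_id) plus lookup indexes."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS variants (
               release TEXT NOT NULL,
               variation_id TEXT NOT NULL,
               gene TEXT,
               clinical_significance TEXT,
               review_status TEXT,
               PRIMARY KEY (release, variation_id)
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_gene ON variants (gene)")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sig ON variants (clinical_significance)"
    )


def load_release(conn, release, records):
    """Insert parsed records under a release label so older releases stay queryable."""
    conn.executemany(
        "INSERT OR REPLACE INTO variants VALUES (?, ?, ?, ?, ?)",
        [
            (release, r["variation_id"], r["gene"],
             r["clinical_significance"], r["review_status"])
            for r in records
        ],
    )
    conn.commit()


# Made-up records for illustration only
conn = sqlite3.connect(":memory:")
init_db(conn)
load_release(conn, "2024-01", [
    {"variation_id": "1001", "gene": "BRCA1",
     "clinical_significance": "Uncertain significance",
     "review_status": "criteria provided, single submitter"},
])
load_release(conn, "2024-02", [
    {"variation_id": "1001", "gene": "BRCA1",
     "clinical_significance": "Likely pathogenic",
     "review_status": "criteria provided, single submitter"},
])
history = conn.execute(
    "SELECT release, clinical_significance FROM variants "
    "WHERE variation_id = ? ORDER BY release", ("1001",)
).fetchall()
print(history)
```

Keeping the release label in the primary key lets each monthly load sit alongside earlier ones, so a variant's classification history can be read with a single query.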
This skill includes comprehensive reference documentation:
references/api_reference.md - Complete E-utilities API documentation with examples for esearch, esummary, efetch, and elink; includes rate limits, authentication, and Python/Biopython code samples
references/clinical_significance.md - Detailed guide to interpreting clinical significance classifications, review status star ratings, conflict resolution, and best practices for variant interpretation
references/data_formats.md - Documentation for XML, VCF, and tab-delimited file formats; FTP directory structure, processing examples, and format selection guidance
For questions about ClinVar or data submission: clinvar@ncbi.nlm.nih.gov
Weekly Installs: 124
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Warn)
Installed on: claude-code (104), opencode (96), cursor (92), gemini-cli (91), antigravity (87), codex (81)