tooluniverse-variant-analysis by mims-harvard/tooluniverse
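npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-variant-analysis
Production-ready VCF processing and variant annotation skill combining local bioinformatics computation with ToolUniverse database integration. Designed to answer bioinformatics analysis questions about VCF data, mutation classification, variant filtering, and clinical annotation.
Triggers:
Example Questions: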
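| Capability | Description |
|---|---|
| VCF Parsing | Pure Python + cyvcf2 parsers. VCF 4.x, gzipped, multi-sample, SNV/indel/SV |
| Mutation Classification | Maps SO terms, SnpEff ANN, VEP CSQ, GATK Funcotator to standard types |
| VAF Extraction | Handles AF, AD, AO/RO, NR/NV, INFO AF formats |
| Filtering | VAF, depth, quality, PASS, variant type, mutation type, consequence, chromosome, SV size |
| Statistics | Ti/Tv ratio, per-sample VAF/depth stats, mutation type distribution, SV size distribution |
| Annotation | MyVariant.info (aggregates ClinVar, dbSNP, gnomAD, CADD, SIFT, PolyPhen) |
| SV/CNV Analysis | gnomAD SV population frequencies, DGVa/dbVar known SVs, ClinGen dosage sensitivity |
| Clinical Interpretation | ACMG/ClinGen CNV pathogenicity classification using haploinsufficiency/triplosensitivity scores |
| DataFrame | Convert to pandas for advanced analytics |
| Reporting | Markdown reports with tables and statistics, SV clinical reports |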
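The VAF Extraction row names several caller-specific conventions. The sketch below shows one way to fall back across them, assuming a sample's FORMAT fields are already available as a dict of raw strings. `extract_vaf` is a hypothetical helper written for illustration, not the skill's own function, and the fallback order is an assumption.

```python
from typing import Optional

def extract_vaf(fmt: dict, alt_index: int = 0) -> Optional[float]:
    """Return the variant allele fraction for one sample, or None if unavailable.

    `fmt` maps FORMAT keys to raw string values (hypothetical input shape).
    """
    def nums(key):
        # Parse a comma-separated FORMAT value; treat any missing entry as "no data".
        parts = fmt.get(key, "").split(",")
        if any(p in ("", ".") for p in parts):
            return []
        return [float(p) for p in parts]

    af = nums("AF")
    if len(af) > alt_index:                       # caller already reports a per-sample AF
        return af[alt_index]
    ad = nums("AD")
    if len(ad) > alt_index + 1 and sum(ad) > 0:   # AD = ref,alt1,alt2,...
        return ad[alt_index + 1] / sum(ad)
    ao, ro = nums("AO"), nums("RO")
    if len(ao) > alt_index and ro:                # AO/RO = alt/ref observation counts
        total = ro[0] + sum(ao)
        return ao[alt_index] / total if total else None
    nr, nv = nums("NR"), nums("NV")
    if len(nr) > alt_index and len(nv) > alt_index and nr[alt_index] > 0:
        return nv[alt_index] / nr[alt_index]      # NV variant reads / NR total reads
    return None
```

Returning `None` rather than 0.0 keeps "no data" distinct from "no variant reads", which matters once VAF filters are applied.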
Input VCF File (SNVs/indels or SVs)
|
v
Phase 1: Parse VCF
|-- Pure Python parser (any VCF 4.x)
|-- cyvcf2 parser (faster, C-based)
|-- Extract: CHROM, POS, REF, ALT, QUAL, FILTER, INFO, FORMAT, samples
|-- Extract per-sample: GT, VAF, depth
|-- Extract annotations from INFO (ANN, CSQ, FUNCOTATION)
|-- Detect variant class: SNV/indel vs SV/CNV
|
v
Phase 2: Classify Variants
|-- Variant type: SNV, INS, DEL, MNV, COMPLEX, SV
|-- Mutation type: missense, nonsense, synonymous, frameshift, splice, etc.
|-- Impact: HIGH, MODERATE, LOW, MODIFIER
|-- SV type: DEL, DUP, INV, BND, CNV (if structural variant)
|
v
Phase 3: Apply Filters
|-- VAF range (min/max)
|-- Read depth minimum
|-- Quality threshold
|-- PASS only
|-- Variant/mutation type inclusion/exclusion
|-- Consequence exclusion (intronic, intergenic)
|-- Population frequency range
|-- Chromosome selection
|-- SV size range (for structural variants)
|
v
Phase 4: Compute Statistics
|-- Variant type distribution
|-- Mutation type distribution
|-- Impact distribution
|-- Chromosome distribution
|-- Ti/Tv ratio (for SNVs)
|-- Per-sample VAF/depth stats
|-- Gene mutation counts
|-- SV size distribution (for structural variants)
|
v
Phase 5: Annotate with ToolUniverse (optional)
|-- MyVariant.info: ClinVar, dbSNP, gnomAD, CADD, SIFT, PolyPhen
|-- dbSNP: Population frequencies, gene associations
|-- gnomAD: Population allele frequencies
|-- Ensembl VEP: Consequence prediction
|
v
Phase 6: Generate Report / Answer Question
|-- Markdown report with tables
|-- Direct answer to specific question
|-- DataFrame for downstream analysis
|
v
Phase 7: Structural Variant & CNV Analysis (if SV/CNV detected)
|-- Annotate with gnomAD SV population frequencies
|-- Query DGVa/dbVar for known SVs (Ensembl)
|-- Identify affected genes
|-- Query ClinGen dosage sensitivity (HI/TS scores)
|-- Classify pathogenicity (Pathogenic/Likely Pathogenic/VUS/Benign)
|-- Generate SV clinical report with ACMG/ClinGen guidelines
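As an illustration of the Phase 4 Ti/Tv statistic, here is a minimal sketch that counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions over (REF, ALT) pairs. `titv_ratio` is a hypothetical name, and skipping non-SNV records is an assumption about how the statistic is restricted to SNVs.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def titv_ratio(ref_alt_pairs):
    """Return transitions / transversions over SNVs, or None if there are no transversions."""
    ti = tv = 0
    for ref, alt in ref_alt_pairs:
        ref, alt = ref.upper(), alt.upper()
        # Only single-base substitutions between standard bases count.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        pair = {ref, alt}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1   # A<->G or C<->T
        else:
            tv += 1   # purine <-> pyrimidine
    return ti / tv if tv else None

ratio = titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A")])
```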
Use pandas for:
Use python_implementation tools for:
Key functions:
vcf_data = parse_vcf("input.vcf") # Pure Python (always works)
vcf_data = parse_vcf_cyvcf2("input.vcf") # Fast C-based (if installed)
df = variants_to_dataframe(vcf_data.variants, sample="TUMOR") # For pandas
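For orientation, the following sketch shows roughly what a pure-Python parser has to do with a single VCF data line. It is not the skill's `parse_vcf`; `parse_vcf_line` and the returned dict layout are assumptions made for illustration.

```python
def parse_vcf_line(line, sample_names):
    """Split one tab-delimited VCF data line into a dict (hypothetical layout)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True   # flag entries have no "="

    samples = {}
    if len(fields) > 9:
        keys = fields[8].split(":")               # FORMAT column, e.g. GT:AD
        for name, raw in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(keys, raw.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                    # multi-allelic sites list several ALTs
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
        "samples": samples,
    }

record = parse_vcf_line(
    "7\t140453136\t.\tA\tT\t50\tPASS\tDP=100;SOMATIC\tGT:AD\t0/1:70,30",
    sample_names=["TUMOR"],
)
```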
Automatic classification from annotations:
Mutation types supported: missense, nonsense, synonymous, frameshift, splice_site, splice_region, inframe_insertion, inframe_deletion, intronic, intergenic, UTR_5, UTR_3, upstream, downstream, stop_lost, start_lost
See references/mutation_classification_guide.md for full details.
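One plausible way to map Sequence Ontology consequence terms (as emitted by VEP or SnpEff) onto the mutation types listed above is a priority-ordered lookup. The table and the `classify_consequence` helper below are an illustrative sketch, and the priority order is an assumption; the skill's actual mapping is documented in the referenced guide.

```python
# Ordered from most to least severe (assumed order); the first match wins.
SO_TO_MUTATION_TYPE = {
    "frameshift_variant": "frameshift",
    "stop_gained": "nonsense",
    "splice_acceptor_variant": "splice_site",
    "splice_donor_variant": "splice_site",
    "start_lost": "start_lost",
    "stop_lost": "stop_lost",
    "inframe_insertion": "inframe_insertion",
    "inframe_deletion": "inframe_deletion",
    "missense_variant": "missense",
    "splice_region_variant": "splice_region",
    "synonymous_variant": "synonymous",
    "5_prime_UTR_variant": "UTR_5",
    "3_prime_UTR_variant": "UTR_3",
    "intron_variant": "intronic",
    "upstream_gene_variant": "upstream",
    "downstream_gene_variant": "downstream",
    "intergenic_variant": "intergenic",
}

def classify_consequence(consequence: str) -> str:
    """Map an '&'-joined SO consequence string to a single mutation type."""
    terms = set(consequence.split("&"))
    for so_term, mutation_type in SO_TO_MUTATION_TYPE.items():
        if so_term in terms:
            return mutation_type
    return "other"
```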
Common filtering patterns:
# Somatic-like variants
criteria = FilterCriteria(
min_vaf=0.05, max_vaf=0.95,
min_depth=20, pass_only=True,
exclude_consequences=["intronic", "intergenic", "upstream", "downstream"]
)
# High-confidence germline
criteria = FilterCriteria(
min_vaf=0.25, min_depth=30, pass_only=True,
chromosomes=[str(i) for i in range(1, 23)] + ["X", "Y"]  # autosomes 1-22 plus X and Y
)
# Rare pathogenic candidates
criteria = FilterCriteria(
min_depth=20, pass_only=True,
mutation_types=["missense", "nonsense", "frameshift"]
)
See references/vcf_filtering.md for all filter options.
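To make the filtering semantics concrete, here is a simplified stand-in for criteria-based filtering. `SimpleCriteria`, `passes`, and `split_variants` are hypothetical names, and variants are assumed to be plain dicts; the skill's real `FilterCriteria` supports more options than shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class SimpleCriteria:
    min_vaf: Optional[float] = None
    max_vaf: Optional[float] = None
    min_depth: Optional[int] = None
    pass_only: bool = False
    exclude_consequences: Sequence[str] = ()

def passes(variant: dict, c: SimpleCriteria) -> bool:
    """Return True if the variant satisfies every active criterion."""
    vaf, depth = variant.get("vaf"), variant.get("depth")
    # A variant with a missing value fails any filter that needs that value.
    if c.min_vaf is not None and (vaf is None or vaf < c.min_vaf):
        return False
    if c.max_vaf is not None and (vaf is None or vaf > c.max_vaf):
        return False
    if c.min_depth is not None and (depth is None or depth < c.min_depth):
        return False
    if c.pass_only and variant.get("filter") != "PASS":
        return False
    if variant.get("mutation_type") in c.exclude_consequences:
        return False
    return True

def split_variants(variants, c: SimpleCriteria):
    """Partition variants into (passing, failing) lists."""
    passing = [v for v in variants if passes(v, c)]
    failing = [v for v in variants if not passes(v, c)]
    return passing, failing

example = [
    {"vaf": 0.30, "depth": 50, "filter": "PASS", "mutation_type": "missense"},
    {"vaf": 0.02, "depth": 80, "filter": "PASS", "mutation_type": "missense"},
    {"vaf": 0.40, "depth": 60, "filter": "PASS", "mutation_type": "intronic"},
    {"vaf": 0.40, "depth": 10, "filter": "LowQual", "mutation_type": "nonsense"},
]
somatic_like = SimpleCriteria(min_vaf=0.05, max_vaf=0.95, min_depth=20,
                              pass_only=True, exclude_consequences=("intronic", "intergenic"))
kept, dropped = split_variants(example, somatic_like)
```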
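Use pandas for:
Use python_implementation for:
When to use ToolUniverse annotation tools:
Best practices:
Key tools:
- MyVariant_query_variants: Batch annotation (ClinVar, dbSNP, gnomAD, CADD)
- dbsnp_get_variant_by_rsid: Population frequencies
- gnomad_get_variant: Basic variant metadata
- EnsemblVEP_annotate_rsid: Consequence prediction

See references/annotation_guide.md for detailed examples.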
Report includes:
When VCF contains SV calls (SVTYPE=DEL/DUP/INV/BND):
Identify affected genes (from VCF annotation or coordinate overlap)
Query ClinGen dosage sensitivity:
clingen = ClinGen_dosage_by_gene(gene_symbol="BRCA1") # Returns: haploinsufficiency_score, triplosensitivity_score
Check population frequency:
gnomad_sv = gnomad_get_sv_by_gene(gene_symbol="BRCA1") # Returns: SVs with AF, AC, AN
Classify pathogenicity:
ClinGen dosage score interpretation:
See references/sv_cnv_analysis.md for full SV workflow.
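As a rough aid for reading HI/TS scores, the sketch below encodes the standard ClinGen dosage score categories and a deliberately simplified triage rule. `dosage_concern` is a hypothetical helper; it is a first-pass flag under the assumption that only score 3 counts as established dosage sensitivity, and it does not replace the full ACMG/ClinGen CNV evidence scoring.

```python
# ClinGen dosage sensitivity score categories (apply to both HI and TS scores).
CLINGEN_DOSAGE_SCORE = {
    0: "No evidence available",
    1: "Little evidence for dosage pathogenicity",
    2: "Some evidence for dosage pathogenicity",
    3: "Sufficient evidence for dosage pathogenicity",
    30: "Gene associated with autosomal recessive phenotype",
    40: "Dosage sensitivity unlikely",
}

def dosage_concern(sv_type: str, hi_score: int, ts_score: int) -> bool:
    """Flag an SV for review: a deletion over an established haploinsufficient
    gene, or a duplication over an established triplosensitive gene."""
    if sv_type == "DEL":
        return hi_score == 3
    if sv_type == "DUP":
        return ts_score == 3
    return False   # INV/BND need breakpoint-level review, not dosage scores alone
```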
Question: "What fraction of variants with VAF < X are annotated as Y mutations?"
result = answer_vaf_mutation_fraction(
vcf_path="input.vcf",
max_vaf=0.3,
mutation_type="missense",
sample="TUMOR"
)
# Returns: fraction, total_below_vaf, matching_mutation_type
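The arithmetic behind this question is small enough to sketch directly. The helper below assumes variants are dicts with `vaf` and `mutation_type` keys and mirrors the returned field names from the comment above; `vaf_mutation_fraction` itself is a hypothetical name, not the skill's implementation.

```python
def vaf_mutation_fraction(variants, max_vaf, mutation_type):
    """Among variants with VAF strictly below max_vaf, report the share of a mutation type."""
    below = [v for v in variants if v.get("vaf") is not None and v["vaf"] < max_vaf]
    matching = [v for v in below if v.get("mutation_type") == mutation_type]
    fraction = len(matching) / len(below) if below else 0.0
    return {
        "fraction": fraction,
        "total_below_vaf": len(below),
        "matching_mutation_type": len(matching),
    }

summary = vaf_mutation_fraction(
    [
        {"vaf": 0.10, "mutation_type": "missense"},
        {"vaf": 0.20, "mutation_type": "synonymous"},
        {"vaf": 0.50, "mutation_type": "missense"},   # excluded: VAF not below 0.3
        {"vaf": None, "mutation_type": "missense"},   # excluded: no VAF available
    ],
    max_vaf=0.3,
    mutation_type="missense",
)
```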
Question: "What is the difference in mutation frequency between cohorts?"
result = answer_cohort_comparison(
vcf_paths=["cohort1.vcf", "cohort2.vcf"],
mutation_type="missense",
cohort_names=["Treatment", "Control"]
)
# Returns: cohorts, frequency_difference
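A cohort comparison of this kind reduces to a per-cohort frequency and a subtraction. The sketch below assumes "frequency" means the share of a cohort's variants with the given mutation type, and that the difference is first cohort minus second; both are assumptions, and `compare_cohorts` is a hypothetical helper.

```python
def mutation_frequency(variants, mutation_type):
    """Share of variants annotated with the given mutation type."""
    if not variants:
        return 0.0
    hits = sum(1 for v in variants if v.get("mutation_type") == mutation_type)
    return hits / len(variants)

def compare_cohorts(cohorts, mutation_type):
    """`cohorts` maps cohort name -> list of variant dicts (exactly two cohorts)."""
    freqs = {name: mutation_frequency(vs, mutation_type) for name, vs in cohorts.items()}
    first, second = list(freqs)
    return {"cohorts": freqs, "frequency_difference": freqs[first] - freqs[second]}

comparison = compare_cohorts(
    {
        "Treatment": [{"mutation_type": t} for t in ("missense", "missense", "nonsense", "synonymous")],
        "Control": [{"mutation_type": t} for t in ("missense", "synonymous", "synonymous", "intronic")],
    },
    mutation_type="missense",
)
```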
Question: "After filtering X, how many Y remain?"
result = answer_non_reference_after_filter(
vcf_path="input.vcf",
exclude_intronic_intergenic=True
)
# Returns: total_input, non_reference, remaining
| Tool | When to Use | Parameters | Response |
|---|---|---|---|
| MyVariant_query_variants | Batch annotation | query (rsID/HGVS) | ClinVar, dbSNP, gnomAD, CADD |
| dbsnp_get_variant_by_rsid | Population frequencies | rsid | Frequencies, clinical significance |
| gnomad_get_variant | gnomAD metadata | variant_id (CHR-POS-REF-ALT) | Basic variant info |
| EnsemblVEP_annotate_rsid | Consequence prediction | variant_id (rsID) | Transcript impact |

| Tool | When to Use | Parameters | Response |
|---|---|---|---|
| gnomad_get_sv_by_gene | SV population frequency | gene_symbol | SVs with AF, AC, AN |
| gnomad_get_sv_by_region | Regional SV search | chrom, start, end | SVs in region |
| ClinGen_dosage_by_gene | Dosage sensitivity | gene_symbol | HI/TS scores, disease |
| ClinGen_dosage_region_search | Dosage-sensitive genes in region | chromosome, start, end | All genes with HI/TS scores |
| ensembl_get_structural_variants | Known SVs from DGVa/dbVar | chrom, start, end, species | Clinical significance |

See references/annotation_guide.md for detailed tool usage examples.
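The tables call for identifiers in two shapes: a CHR-POS-REF-ALT string for gnomAD and an HGVS-style string for MyVariant queries. A small sketch of building both from VCF fields follows; the helper names are hypothetical, only simple SNVs are handled, and the coordinates must match the genome assembly the target service expects.

```python
def _bare_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so '7' and 'chr7' are treated alike."""
    return chrom[3:] if chrom.startswith("chr") else chrom

def gnomad_variant_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a CHR-POS-REF-ALT identifier."""
    return f"{_bare_chrom(chrom)}-{pos}-{ref}-{alt}"

def hgvs_genomic_snv(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a genomic HGVS string for a single-nucleotide substitution."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single-base substitutions are handled in this sketch")
    return f"chr{_bare_chrom(chrom)}:g.{pos}{ref}>{alt}"
```

Indels need different HGVS syntax (`del`, `ins`, `delins`) and are left out here on purpose.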
Parse VCF, compute statistics, generate report.
report = variant_analysis_pipeline("input.vcf", output_file="report.md")
Parse VCF, apply multi-criteria filter, compute statistics on filtered set.
report = variant_analysis_pipeline(
vcf_path="input.vcf",
filters=FilterCriteria(min_vaf=0.1, min_depth=20, pass_only=True),
output_file="filtered_report.md"
)
Parse VCF, annotate top variants with ClinVar/gnomAD/CADD, generate clinical report.
report = variant_analysis_pipeline(
vcf_path="input.vcf",
annotate=True,
max_annotate=50,
output_file="annotated_report.md"
)
Parse VCF, apply specific filters, compute targeted statistics to answer precise questions.
result = answer_vaf_mutation_fraction(
vcf_path="input.vcf",
max_vaf=0.3,
mutation_type="missense"
)
Parse multiple VCFs, compare mutation frequencies across cohorts.
result = answer_cohort_comparison(
vcf_paths=["cohort1.vcf", "cohort2.vcf"],
mutation_type="missense"
)
Use pandas when:
Use python_implementation when:
Best approach: Use python_implementation for parsing/classification, then convert to DataFrame for custom analysis:
# Parse and classify
vcf_data = parse_vcf("input.vcf")
passing, failing = filter_variants(vcf_data.variants, criteria)
# Convert to DataFrame for custom analysis
df = variants_to_dataframe(passing, sample="TUMOR")
# Now use pandas
missense_high_vaf = df[(df['mutation_type'] == 'missense') & (df['vaf'] >= 0.3)]
See QUICK_START.md for:
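Weekly Installs: 119
Repository: mims-harvard/tooluniverse
GitHub Stars: 1.2K
First Seen: Feb 19, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: gemini-cli (116), codex (116), github-copilot (115), opencode (115), cursor (113), kimi-cli (112)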