tooluniverse-gwas-finemapping by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-gwas-finemapping
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
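This skill provides tools to prioritize candidate causal variants for a gene or rsID, retrieve the fine-mapped credible sets for a GWAS study, search GWAS studies by disease, and suggest experiments for validating top variants.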
A credible set is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed by a fine-mapping method such as SuSiE or FINEMAP.
The posterior probability is the probability that a specific variant is causal, given the GWAS data and LD structure. A higher posterior probability means the variant is more likely to be causal.
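To make the two definitions above concrete, here is a minimal sketch of how posterior probabilities and a 95% credible set can be derived from summary statistics. It is not the skill's implementation and not what SuSiE or FINEMAP do: it uses Wakefield approximate Bayes factors under the simplifying assumption of exactly one causal variant per locus with equal priors, which is why no LD matrix appears. The function names, the prior standard deviation of 0.2, and the toy z-scores are all illustrative assumptions.

```python
import math


def log_approx_bayes_factors(z_scores, std_errors, prior_sd=0.2):
    """Log Wakefield approximate Bayes factor (association vs. null) per variant."""
    w = prior_sd ** 2  # assumed prior variance of the true effect size
    log_abfs = []
    for z, se in zip(z_scores, std_errors):
        shrink = w / (se ** 2 + w)
        log_abfs.append(0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink)
    return log_abfs


def posterior_probabilities(log_abfs):
    """Normalize Bayes factors into posteriors, assuming one causal variant."""
    top = max(log_abfs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in log_abfs]
    total = sum(weights)
    return [weight / total for weight in weights]


def credible_set(variant_ids, pips, coverage=0.95):
    """Smallest set of top-ranked variants whose posteriors sum to >= coverage."""
    ranked = sorted(zip(variant_ids, pips), key=lambda pair: pair[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant_id, pip in ranked:
        chosen.append((variant_id, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Toy, made-up summary statistics for four variants at one locus.
variant_ids = ["var_a", "var_b", "var_c", "var_d"]
z_scores = [6.0, 5.5, 2.0, 0.5]
std_errors = [0.05, 0.05, 0.05, 0.05]

pips = posterior_probabilities(log_approx_bayes_factors(z_scores, std_errors))
cs95 = credible_set(variant_ids, pips, coverage=0.95)
for variant_id, pip in cs95:
    print(f"{variant_id}: posterior probability {pip:.4f}")
```

Methods that model several causal variants at once additionally need the LD structure, so their posteriors can differ substantially from this single-variant approximation.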
Locus-to-gene (L2G) scores integrate multiple data types to predict which gene is affected by a variant. They range from 0 to 1, with higher scores indicating stronger gene-variant links.
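As a small illustration of how scores on this 0 to 1 scale might be used, the sketch below ranks candidate genes and keeps those above a cutoff. The `GeneScore` class is a hypothetical stand-in for the skill's L2G prediction objects, the gene names and scores are made up, and the 0.5 cutoff is an arbitrary assumption rather than a recommended threshold.

```python
from dataclasses import dataclass


@dataclass
class GeneScore:
    """Hypothetical stand-in for an L2G prediction (not the skill's own class)."""
    gene_symbol: str
    l2g_score: float


def rank_genes(predictions, min_score=0.5):
    """Return predictions at or above min_score, highest score first."""
    for prediction in predictions:
        if not 0.0 <= prediction.l2g_score <= 1.0:
            raise ValueError(f"L2G score out of range for {prediction.gene_symbol}")
    kept = [p for p in predictions if p.l2g_score >= min_score]
    return sorted(kept, key=lambda p: p.l2g_score, reverse=True)


# Made-up predictions for one locus.
predictions = [
    GeneScore("GENE_A", 0.12),
    GeneScore("GENE_B", 0.81),
    GeneScore("GENE_C", 0.55),
]

ranked = rank_genes(predictions)
for prediction in ranked:
    print(f"{prediction.gene_symbol}: {prediction.l2g_score:.2f}")
```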
Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"

```python
from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())

# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```
Question: "What do we know about rs429358 (APOE4) from fine-mapping?"

```python
from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    # Show the symbol of the top-ranked L2G gene, if any
    print(f"Top gene: {cs.l2g_genes[0].gene_symbol if cs.l2g_genes else 'N/A'}")
    print(f"Confidence: {cs.confidence}")
```
Question: "What are all the causal loci from the recent T2D meta-analysis?"

```python
from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    # Guard against a missing lead variant or an empty rsID list
    has_rsid = cs.lead_variant and cs.lead_variant.rs_ids
    print(f"Lead variant: {cs.lead_variant.rs_ids[0] if has_rsid else 'N/A'}")
    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```
Question: "What GWAS studies exist for Alzheimer's disease?"

```python
from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"  Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249",  # EFO ID for Alzheimer's
)
```
Question: "How should we validate the top causal variant?"

```python
from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```
```python
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study,
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
if not studies:
    # max() below would fail on an empty list
    raise SystemExit("No GWAS studies found")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "=" * 60)
print("FINE-MAPPING SUMMARY")
print("=" * 60)
print(result.get_summary())

print("\n" + "=" * 60)
print("VALIDATION STRATEGY")
print("=" * 60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
```
FineMappingResult

Main result object containing:

- query_variant: Variant annotation
- query_gene: Gene symbol (if queried by gene)
- credible_sets: List of fine-mapped loci
- associated_traits: All associated traits
- top_causal_genes: L2G genes ranked by score

Methods:

- get_summary(): Human-readable summary
- get_validation_suggestions(): Experimental validation strategies

CredibleSet

Represents a fine-mapped locus:

- study_locus_id: Unique identifier
- region: Genomic region (e.g., "10:112861809-113404438")
- lead_variant: Top variant by posterior probability
- finemapping_method: Statistical method used (SuSiE, FINEMAP, etc.)
- l2g_genes: Locus-to-gene predictions
- confidence: Credible set confidence (95%, 99%)

L2GGene

Locus-to-gene prediction:

- gene_symbol: Gene name (e.g., "TCF7L2")
- gene_id: Ensembl gene ID
- l2g_score: Probability score (0-1)

VariantAnnotation

Functional annotation for a variant:

- variant_id: Open Targets format (chr_pos_ref_alt)
- rs_ids: dbSNP identifiers
- chromosome, position: Genomic coordinates
- most_severe_consequence: Functional impact
- allele_frequencies: Population-specific MAFs

Tools:

- OpenTargets_get_variant_info: Variant details and allele frequencies
- OpenTargets_get_variant_credible_sets: Credible sets containing a variant
- OpenTargets_get_credible_set_detail: Detailed credible set information
- OpenTargets_get_study_credible_sets: All loci from a GWAS study
- OpenTargets_search_gwas_studies_by_disease: Find studies by disease
- gwas_search_snps: Find SNPs by gene or rsID
- gwas_get_snp_by_id: Detailed SNP information
- gwas_get_associations_for_snp: All trait associations for a variant
- gwas_search_studies: Find studies by disease/trait

| Method | Approach | Strengths | Use Case |
|---|---|---|---|
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations | Integrates epigenomics | Regulatory variants |
| CAVIAR | Colocalization (via the eCAVIAR extension) | Finds shared causal variants | eQTL overlap |
Q: Why don't all variants have credible sets?
A: Fine-mapping requires:
Q: Can a variant be in multiple credible sets?
A: Yes. A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
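One way to tell these two cases apart is to group the credible sets that contain a variant by trait. The sketch below assumes each credible set exposes a trait label and a study locus identifier; `CredibleSetRecord` is a hypothetical stand-in for the skill's objects, and the records are made-up placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class CredibleSetRecord(NamedTuple):
    """Hypothetical stand-in for a credible set returned for one variant."""
    study_locus_id: str
    study_id: str
    trait: str


def group_by_trait(credible_sets):
    """Map each trait to the study-locus IDs of the credible sets reporting it."""
    by_trait = defaultdict(list)
    for cs in credible_sets:
        by_trait[cs.trait].append(cs.study_locus_id)
    return dict(by_trait)


# Made-up credible sets that all contain the same variant.
records = [
    CredibleSetRecord("locus_1", "study_A", "trait X"),
    CredibleSetRecord("locus_2", "study_B", "trait X"),
    CredibleSetRecord("locus_3", "study_C", "trait Y"),
]

grouped = group_by_trait(records)
spans_multiple_traits = len(grouped) > 1  # a hint of possible pleiotropy
for trait, locus_ids in grouped.items():
    print(f"{trait}: {len(locus_ids)} credible set(s)")
```

Several credible sets under one trait point to replication across studies, while more than one trait suggests possible pleiotropy that is worth checking further.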
Q: What if the top L2G gene is far from the variant?
A: This suggests regulatory effects (enhancers, promoters). Check:

Q: How do I choose between variants in a credible set?
A: Prioritize by:
First Seen
Feb 20, 2026