gwas-database by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill gwas-database
The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.
This skill should be used when queries involve:
The GWAS Catalog is organized around four core entities: studies, associations, variants, and traits.
Key Identifiers:
- Studies: GCST IDs (e.g., GCST001234)
- Variants: rs numbers (e.g., rs7903146) or variant_id format
The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:
By Variant (rs ID):
rs7903146
Returns all trait associations for this SNP.
By Disease/Trait:
type 2 diabetes
Parkinson disease
body mass index
Returns all associated genetic variants.
By Gene:
APOE
TCF7L2
Returns variants in or near the gene region.
By Chromosomal Region:
10:114000000-115000000
Returns variants in the specified genomic interval.
By Publication:
PMID:20581827
Author: McCarthy MI
GCST001234
Returns study details and all reported associations.
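The search modes above differ mainly in the shape of the query string. As a rough sketch, a script could guess which mode a term belongs to before deciding how to query. The patterns and the function name `classify_search_term` are assumptions inferred from the examples in this section, not rules published by the GWAS Catalog.

```python
import re

# Patterns are assumptions based on the example terms shown above.
PATTERNS = [
    ("variant", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("study", re.compile(r"^GCST\d+$", re.IGNORECASE)),
    ("publication", re.compile(r"^PMID:\s*\d+$", re.IGNORECASE)),
    ("region", re.compile(r"^(?:chr)?(?:\d{1,2}|X|Y|MT?):\d+-\d+$", re.IGNORECASE)),
]


def classify_search_term(term: str) -> str:
    """Return a best-guess search mode for a query string (hypothetical helper)."""
    text = term.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(text):
            return kind
    # Gene symbols, trait names, and author names cannot be told apart
    # by shape alone, so they all fall through to free text.
    return "free_text"


for example in ["rs7903146", "GCST001234", "PMID:20581827",
                "10:114000000-115000000", "type 2 diabetes"]:
    print(example, "->", classify_search_term(example))
```

Gene symbols such as APOE are deliberately left as free text, since nothing in their shape distinguishes them from a trait or author name.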
The GWAS Catalog provides two REST APIs for programmatic access:
Base URLs:
- https://www.ebi.ac.uk/gwas/rest/api
- https://www.ebi.ac.uk/gwas/summary-statistics/api
API Documentation:
Core Endpoints:
Studies endpoint - /studies/{accessionID}
import requests
# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Content-Type": "application/json"})
study = response.json()
Associations endpoint - /associations
# Find associations for a variant
variant = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()
Variants endpoint - /singleNucleotidePolymorphisms/{rsID}
# Get variant details
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_info = response.json()
Example 1: Find all associations for a disease
import requests
trait = "EFO_0001360" # Type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Content-Type": "application/json"})
associations = response.json()
# Process results
for assoc in associations.get('_embedded', {}).get('associations', []):
    variant = assoc.get('rsId')
    pvalue = assoc.get('pvalue')
    risk_allele = assoc.get('strongestAllele')
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
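The processing step in Example 1 can be separated from the HTTP call so it can be checked without network access. The sketch below assumes the response layout and field names used in the example above (`_embedded.associations`, `rsId`, `pvalue`, `strongestAllele`). The sample payload is hypothetical and is not real catalog data.

```python
from typing import Any


def association_rows(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten an associations response into plain rows (hypothetical helper)."""
    records = payload.get("_embedded", {}).get("associations", [])
    rows = []
    for record in records:
        raw_p = record.get("pvalue")
        rows.append({
            "variant": record.get("rsId"),
            # Convert to float so rows can be sorted or filtered numerically.
            "pvalue": float(raw_p) if raw_p is not None else None,
            "risk_allele": record.get("strongestAllele"),
        })
    return rows


# Hypothetical payload for illustration only.
sample = {"_embedded": {"associations": [
    {"rsId": "rs0000001", "pvalue": "3e-12", "strongestAllele": "rs0000001-T"},
    {"rsId": "rs0000002", "pvalue": None, "strongestAllele": None},
]}}
rows = association_rows(sample)
print(rows)
```

Keeping the parser pure means a missing `_embedded` key simply yields an empty list rather than an exception.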
Example 2: Get variant information and all trait associations
import requests
variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_data = response.json()
# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()
# Extract trait names and p-values
for assoc in associations.get('_embedded', {}).get('associations', []):
    trait = assoc.get('efoTrait')
    pvalue = assoc.get('pvalue')
    print(f"Trait: {trait}, p-value: {pvalue}")
Example 3: Access summary statistics
import requests
# Query summary statistics API
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
# Find associations by trait with p-value threshold
trait = "EFO_0001360" # Type 2 diabetes
p_upper = "0.000000001" # p < 1e-9
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": p_upper,
    "size": 100  # Number of results
}
response = requests.get(url, params=params)
results = response.json()
# Process genome-wide significant hits
for hit in results.get('_embedded', {}).get('associations', []):
    variant_id = hit.get('variant_id')
    chromosome = hit.get('chromosome')
    position = hit.get('base_pair_location')
    pvalue = hit.get('p_value')
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
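Example 3 passes `p_upper` as a plain decimal string rather than in scientific notation. Whether the API also accepts values such as `1e-09` is not stated here, so a cautious script might convert thresholds to the same plain form. The helper name `decimal_string` is hypothetical.

```python
from decimal import Decimal


def decimal_string(threshold: float) -> str:
    """Render a float threshold as a plain decimal string, without an exponent."""
    # str() gives the shortest repr (e.g. '1e-09'); Decimal parses it exactly,
    # and the 'f' format spec expands it to fixed-point notation.
    return format(Decimal(str(threshold)), "f")


print(decimal_string(1e-9))
print(decimal_string(5e-8))
```

The result can be passed directly as the `p_upper` value in the `params` dictionary shown above.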
Example 4: Query by chromosomal region
import requests
# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos
}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
variants_in_region = response.json()
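The web interface takes regions as a single string such as `10:114000000-115000000`, while Example 4 needs three separate parameters. A minimal sketch of the conversion follows. The parameter names `chrom`, `bpStart`, and `bpEnd` come from Example 4; the helper name, the optional `chr` prefix, and the tolerance for thousands separators are assumptions.

```python
import re

REGION_RE = re.compile(r"^(?:chr)?(?P<chrom>\w+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def region_to_params(region: str) -> dict:
    """Convert 'chrom:start-end' into the query parameters used in Example 4."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"Unrecognized region: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("Region start must not exceed region end")
    return {"chrom": match["chrom"], "bpStart": start, "bpEnd": end}


print(region_to_params("10:114000000-115000000"))
```

Validating the interval before sending the request avoids spending a call on a query that cannot match anything.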
The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).
Access Methods:
Summary Statistics API Features:
Example: Download summary statistics for a study
import requests
import gzip
# Get available summary statistics
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/studies/GCST001234"
response = requests.get(url)
study_info = response.json()
# Download link is provided in the response
# Alternatively, use FTP:
# ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/
The GWAS Catalog provides links to external resources:
Genomic Databases:
Functional Resources:
Phenotype Resources:
Following Links in API Responses:
import requests
# API responses include _links for related resources
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234")
study = response.json()
# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)
Identify the trait using EFO terms or free text:
Query associations via API:
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"
Filter by significance and population:
Extract variant details:
Cross-reference with other databases:
Query the variant:
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
Retrieve all trait associations:
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"
Analyze pleiotropy:
Check genomic context:
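For the pleiotropy step in this workflow, one simple approach is to count the distinct traits reported for each variant. The sketch below assumes association records carry the `rsId` and `efoTrait` fields used elsewhere in this document. The function names and the sample records are hypothetical.

```python
from collections import defaultdict


def traits_per_variant(records):
    """Map each variant to the sorted list of distinct traits it is associated with."""
    traits = defaultdict(set)
    for record in records:
        variant = record.get("rsId")
        trait = record.get("efoTrait")
        if variant and trait:
            traits[variant].add(trait)
    return {variant: sorted(names) for variant, names in traits.items()}


def pleiotropic_variants(records, min_traits=2):
    """Keep only variants linked to at least `min_traits` distinct traits."""
    return {v: t for v, t in traits_per_variant(records).items() if len(t) >= min_traits}


# Hypothetical records for illustration only.
sample = [
    {"rsId": "rs0000001", "efoTrait": "trait A"},
    {"rsId": "rs0000001", "efoTrait": "trait B"},
    {"rsId": "rs0000001", "efoTrait": "trait A"},
    {"rsId": "rs0000002", "efoTrait": "trait A"},
]
print(pleiotropic_variants(sample))
```

Counting distinct trait labels is only a first pass, since closely related traits are counted separately.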
Search by gene symbol in web interface or:
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
params = {"geneName": gene_symbol}
Retrieve variants in gene region:
Analyze association patterns:
Functional interpretation:
Define research question:
Comprehensive variant extraction:
Quality assessment:
Data synthesis:
Export and documentation:
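During data synthesis the same variant often appears in several studies. One possible reduction, sketched here, keeps the most significant record per variant. It assumes rows shaped like those built elsewhere in this document, with `variant` and a numeric `pvalue`. The function name and sample rows are hypothetical.

```python
def best_association_per_variant(rows):
    """Keep the lowest-p-value row for each variant, sorted by significance."""
    best = {}
    for row in rows:
        key = row["variant"]
        if key not in best or row["pvalue"] < best[key]["pvalue"]:
            best[key] = row
    return sorted(best.values(), key=lambda r: r["pvalue"])


# Hypothetical rows for illustration only.
sample_rows = [
    {"variant": "rs0000001", "pvalue": 1e-8, "pubmed_id": "A"},
    {"variant": "rs0000001", "pvalue": 1e-12, "pubmed_id": "B"},
    {"variant": "rs0000002", "pvalue": 1e-10, "pubmed_id": "C"},
]
for row in best_association_per_variant(sample_rows):
    print(row)
```

Whether the lowest p-value is the right representative depends on the review question; the largest study or the most recent one are alternative choices.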
Identify studies with summary statistics:
Download summary statistics:
# Via FTP
wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz
Query via API for specific variants:
url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
params = {"start": start_pos, "end": end_pos}
Process and analyze:
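Harmonised files can be large, so streaming the downloaded `.tsv.gz` is often preferable to loading it whole. The sketch below uses only the standard library. It assumes a tab-separated file with a header row and a `p_value` column, matching the field name in the API example above; the actual columns of a given file should be checked first. The function name is hypothetical.

```python
import csv
import gzip


def significant_rows(path, p_threshold=5e-8, p_column="p_value"):
    """Yield rows of a gzipped TSV whose p-value is at or below the threshold."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if p_column not in (reader.fieldnames or []):
            raise KeyError(f"Column {p_column!r} not found in {path}")
        for row in reader:
            try:
                p_value = float(row[p_column])
            except (TypeError, ValueError):
                # Skip rows with missing or non-numeric p-values (e.g. 'NA').
                continue
            if p_value <= p_threshold:
                yield row
```

Because the function is a generator, only one row is held in memory at a time, and a caller can stop early once enough hits are collected.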
Key Fields in Association Records:
- rsId: Variant identifier (rs number)
- strongestAllele: Risk allele for the association
- pvalue: Association p-value
- pvalueText: P-value as text (may include inequality)
- orPerCopyNum: Odds ratio or beta coefficient
- betaNum: Effect size (for quantitative traits)
- betaUnit: Unit of measurement for beta
- range: Confidence interval
- efoTrait: Associated trait name
- mappedLabel: EFO-mapped trait term
Study Metadata Fields:
- accessionId: GCST study identifier
- pubmedId: PubMed ID
- author: First author
- publicationDate: Publication date
- ancestryInitial: Discovery population ancestry
- ancestryReplication: Replication population ancestry
- sampleSize: Total sample size
Pagination: Results are paginated (default 20 items per page). Navigate using:
- size parameter: Number of results per page
- page parameter: Page number (0-indexed)
- _links in response: URLs for next/previous pages
Complete workflow for querying and analyzing GWAS data:
import requests
import pandas as pd
from time import sleep
def query_gwas_catalog(trait_id, p_threshold=5e-8):
    """
    Query GWAS Catalog for trait associations
    Args:
        trait_id: EFO trait identifier (e.g., 'EFO_0001360')
        p_threshold: P-value threshold for filtering
    Returns:
        pandas DataFrame with association results
    """
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    headers = {"Content-Type": "application/json"}
    results = []
    page = 0
    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers=headers)
        if response.status_code != 200:
            break
        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])
        if not associations:
            break
        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue and float(pvalue) <= p_threshold:
                results.append({
                    'variant': assoc.get('rsId'),
                    'pvalue': pvalue,
                    'risk_allele': assoc.get('strongestAllele'),
                    'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                    'trait': assoc.get('efoTrait'),
                    'pubmed_id': assoc.get('pubmedId')
                })
        page += 1
        sleep(0.1)  # Rate limiting
    return pd.DataFrame(results)
# Example usage
df = query_gwas_catalog('EFO_0001360') # Type 2 diabetes
print(df.head())
print(f"\nTotal associations: {len(df)}")
print(f"Unique variants: {df['variant'].nunique()}")
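The workflow above pages by incrementing `page`. An alternative is to follow the next-page URL that responses expose under `_links`. The sketch below assumes that URL lives at `_links.next.href`, which is the usual HAL convention but is not confirmed by this document. The fetch function is injected so the loop can be exercised without network access; with `requests` it could be `lambda u: requests.get(u, timeout=30).json()`.

```python
from typing import Any, Callable, Iterator, Optional


def iter_pages(first_url: str,
               fetch: Callable[[str], dict],
               max_pages: int = 1000) -> Iterator[dict]:
    """Yield successive pages by following an assumed `_links.next.href` field."""
    url: Optional[str] = first_url
    for _ in range(max_pages):  # Hard cap guards against link cycles.
        if url is None:
            return
        page = fetch(url)
        yield page
        url = page.get("_links", {}).get("next", {}).get("href")


# Hypothetical in-memory pages standing in for API responses.
fake_pages: dict[str, dict[str, Any]] = {
    "page0": {"items": [1, 2], "_links": {"next": {"href": "page1"}}},
    "page1": {"items": [3], "_links": {}},
}
collected = [item for page in iter_pages("page0", fake_pages.__getitem__)
             for item in page["items"]]
print(collected)
```

Following links avoids hard-coding the page size and stops cleanly on the last page, at the cost of relying on the link field being present.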
Comprehensive API documentation including:
Consult this reference when:
The GWAS Catalog team provides workshop materials:
When using GWAS Catalog data, cite:
Weekly Installs: 130
Repository: https://github.com/davila7/claude-code-templates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: claude-code (110), opencode (105), cursor (99), gemini-cli (98), codex (90), antigravity (88)
Traits endpoint - /efoTraits/{efoID}
# Get trait information
url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
response = requests.get(url, headers={"Content-Type": "application/json"})
trait_info = response.json()