GWAS Catalog Database Guide: SNP Lookups, Trait Association Analysis, and API Access

gwas-database by davila7/claude-code-templates

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill gwas-database

January 21, 2026

GWAS Catalog Database

Overview

The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.

When to Use This Skill

This skill should be used when queries involve:

Genetic variant associations: Finding SNPs associated with diseases or traits
SNP lookups: Retrieving information about specific genetic variants (rs IDs)
Trait/disease searches: Discovering genetic associations for phenotypes
Gene associations: Finding variants in or near specific genes
GWAS summary statistics: Accessing complete genome-wide association data
Study metadata: Retrieving publication and cohort information
Population genetics: Exploring ancestry-specific associations
Polygenic risk scores: Identifying variants for risk prediction models
Functional genomics: Understanding variant effects and genomic context
Systematic reviews: Comprehensive literature synthesis of genetic associations

Core Capabilities

1. Understanding GWAS Catalog Data Structure

The GWAS Catalog is organized around four core entities:

Studies: GWAS publications with metadata (PMID, author, cohort details)
Associations: SNP-trait associations with statistical evidence (the Catalog includes associations with p < 1×10⁻⁵; genome-wide significance is conventionally p ≤ 5×10⁻⁸)
Variants: Genetic markers (SNPs) with genomic coordinates and alleles
Traits: Phenotypes and diseases (mapped to EFO ontology terms)

Key Identifiers:

Study accessions: GCST IDs (e.g., GCST001234)
Variant IDs: rs numbers (e.g., rs7903146) or variant_id format
Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
Gene symbols: HGNC approved names (e.g., TCF7L2)
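
The identifier formats above are regular enough to tell apart before choosing an endpoint. The following is a minimal sketch; the patterns are inferred only from the examples listed here (the function name and the accepted digit counts are assumptions), and gene symbols are deliberately left unclassified because they have no fixed shape.

```python
import re

# Patterns inferred from the examples above (assumption: any number of digits is allowed).
ID_PATTERNS = {
    "study": re.compile(r"GCST\d+"),
    "variant": re.compile(r"rs\d+"),
    "trait": re.compile(r"EFO_\d+"),
}


def classify_identifier(value):
    """Return 'study', 'variant', 'trait', or None when no pattern matches."""
    text = value.strip()
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(text):
            return kind
    return None


for example in ["GCST001234", "rs7903146", "EFO_0001360", "TCF7L2"]:
    print(example, "->", classify_identifier(example))
```

A `None` result does not mean the input is invalid; it only means the caller should fall back to a free-text or gene-symbol search.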

2. Web Interface Searches

The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:

By Variant (rs ID):

rs7903146

Returns all trait associations for this SNP.

By Disease/Trait:

type 2 diabetes
Parkinson disease
body mass index

Returns all associated genetic variants.

By Gene:

APOE
TCF7L2

Returns variants in or near the gene region.

By Chromosomal Region:

10:114000000-115000000

Returns variants in the specified genomic interval.

By Publication:

PMID:20581827
Author: McCarthy MI
GCST001234

Returns study details and all reported associations.
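
When moving from the web interface to the API, a region string such as `10:114000000-115000000` has to be split into its parts. Below is a small sketch of that step; the function name, the optional `chr` prefix, and the accepted chromosome labels are assumptions, while the output keys follow the `chrom`, `bpStart`, and `bpEnd` parameters used in Example 4.

```python
import re

# Accepts "10:114000000-115000000" and, as an assumption, an optional "chr" prefix.
REGION_RE = re.compile(r"(?:chr)?([0-9]{1,2}|X|Y|MT):(\d+)-(\d+)", re.IGNORECASE)


def parse_region(text):
    """Split 'chrom:start-end' into a dict of query parameters."""
    match = REGION_RE.fullmatch(text.strip().replace(",", ""))
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return {"chrom": match.group(1).upper(), "bpStart": start, "bpEnd": end}


print(parse_region("10:114000000-115000000"))
```

Validating the interval locally avoids sending a request that can only return an empty or error response.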

3. REST API Access

The GWAS Catalog provides two REST APIs for programmatic access:

Base URLs:

GWAS Catalog API: https://www.ebi.ac.uk/gwas/rest/api
Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api

API Documentation:

Main API docs: https://www.ebi.ac.uk/gwas/rest/docs/api
Summary stats docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/

Core Endpoints:

Studies endpoint - /studies/{accessionID}

import requests

# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Accept": "application/json"})
study = response.json()

Associations endpoint - /associations

# Find associations for a variant
variant = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Accept": "application/json"})
associations = response.json()

Variants endpoint - /singleNucleotidePolymorphisms/{rsID}

# Get variant details
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
response = requests.get(url, headers={"Accept": "application/json"})
variant_info = response.json()

4. Query Examples and Patterns

Example 1: Find all associations for a disease

import requests

trait = "EFO_0001360"  # Type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Accept": "application/json"})
associations = response.json()

# Process results
for assoc in associations.get('_embedded', {}).get('associations', []):
    variant = assoc.get('rsId')
    pvalue = assoc.get('pvalue')
    risk_allele = assoc.get('strongestAllele')
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")

Example 2: Get variant information and all trait associations

import requests

variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers={"Accept": "application/json"})
variant_data = response.json()

# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Accept": "application/json"})
associations = response.json()

# Extract trait names and p-values
for assoc in associations.get('_embedded', {}).get('associations', []):
    trait = assoc.get('efoTrait')
    pvalue = assoc.get('pvalue')
    print(f"Trait: {trait}, p-value: {pvalue}")

Example 3: Access summary statistics

import requests

# Query summary statistics API
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"

# Find associations by trait with p-value threshold
trait = "EFO_0001360"  # Type 2 diabetes
p_upper = "0.000000001"  # p < 1e-9
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": p_upper,
    "size": 100  # Number of results
}
response = requests.get(url, params=params)
results = response.json()

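# Process genome-wide significant hits.
# '_embedded.associations' may arrive as a list or as an object keyed by index,
# so normalise it to an iterable of records before reading fields.
hits = results.get('_embedded', {}).get('associations', [])
if isinstance(hits, dict):
    hits = hits.values()
for hit in hits:
    variant_id = hit.get('variant_id')
    chromosome = hit.get('chromosome')
    position = hit.get('base_pair_location')
    pvalue = hit.get('p_value')
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")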

Example 4: Query by chromosomal region

import requests

# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos
}
response = requests.get(url, params=params, headers={"Accept": "application/json"})
variants_in_region = response.json()

5. Working with Summary Statistics

The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).

Access Methods:

FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
REST API: Query-based access to summary statistics
Web interface: Browse and download via the website

Summary Statistics API Features:

Filter by chromosome, position, p-value
Query specific variants across studies
Retrieve effect sizes and allele frequencies
Access harmonized and standardized data

Example: Download summary statistics for a study

import requests

# Get available summary statistics
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/studies/GCST001234"
response = requests.get(url)
study_info = response.json()

# Download link is provided in the response
# Alternatively, use FTP:
# ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/

6. Data Integration and Cross-referencing

The GWAS Catalog provides links to external resources:

Genomic Databases:

Ensembl: Gene annotations and variant consequences
dbSNP: Variant identifiers and population frequencies
gnomAD: Population allele frequencies

Functional Resources:

Open Targets: Target-disease associations
PGS Catalog: Polygenic risk scores
UCSC Genome Browser: Genomic context

Phenotype Resources:

EFO (Experimental Factor Ontology): Standardized trait terms
OMIM: Disease gene relationships
Disease Ontology: Disease hierarchies

Following Links in API Responses:

import requests

# API responses include _links for related resources
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234")
study = response.json()

# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)

Query Workflows

Workflow 1: Exploring Genetic Associations for a Disease

Identify the trait using EFO terms or free text:
- Search web interface for disease name
- Note the EFO ID (e.g., EFO_0001360 for type 2 diabetes)

Query associations via API:

url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"

Filter by significance and population:
- Check p-values (genome-wide significant: p ≤ 5×10⁻⁸)
- Review ancestry information in study metadata
- Filter by sample size or discovery/replication status

Extract variant details:
- rs IDs for each association
- Effect alleles and directions
- Effect sizes (odds ratios, beta coefficients)
- Population allele frequencies

Cross-reference with other databases:
- Look up variant consequences in Ensembl
- Check population frequencies in gnomAD
- Explore gene function and pathways
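
The significance filter in this workflow can be sketched as a pure function over already-downloaded records. This is an illustration under assumptions: the record layout (`rsId`, `pvalue`) follows the field list in this guide, the function name is hypothetical, and the sample records are invented placeholders rather than Catalog data.

```python
GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def significant_hits(associations, threshold=GENOME_WIDE):
    """Keep records with a usable p-value at or below the threshold, smallest first."""
    hits = []
    for assoc in associations:
        raw = assoc.get("pvalue")
        if raw is None:
            continue
        try:
            p_value = float(raw)
        except (TypeError, ValueError):
            continue  # skip values that are not numeric
        if p_value <= threshold:
            hits.append({**assoc, "pvalue": p_value})
    return sorted(hits, key=lambda record: record["pvalue"])


# Placeholder records for illustration only.
sample = [
    {"rsId": "rsA", "pvalue": "3e-12"},
    {"rsId": "rsB", "pvalue": 2e-6},
    {"rsId": "rsC", "pvalue": None},
    {"rsId": "rsD", "pvalue": 4e-9},
]
print([hit["rsId"] for hit in significant_hits(sample)])
```

Converting the p-value to `float` once, and skipping missing values explicitly, keeps later sorting and thresholding free of type surprises.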

Workflow 2: Investigating a Specific Genetic Variant

Query the variant:

url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"

Retrieve all trait associations:

url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"

Analyze pleiotropy:
- Identify all traits associated with this variant
- Review effect directions across traits
- Look for shared biological pathways

Check genomic context:
- Determine nearby genes
- Identify if variant is in coding/regulatory regions
- Review linkage disequilibrium with other variants
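
The pleiotropy step amounts to grouping one variant's associations by trait and tallying the direction of effect. A rough sketch follows, assuming records shaped like the field list in this guide (`efoTrait`, `orPerCopyNum`); it reads direction from the odds ratio only and reports everything else as unknown, which is a simplification. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def direction(assoc):
    """Classify effect direction from the odds ratio; 'unknown' if none is reported."""
    odds_ratio = assoc.get("orPerCopyNum")
    if odds_ratio is None:
        return "unknown"
    odds_ratio = float(odds_ratio)
    if odds_ratio > 1:
        return "increase"
    if odds_ratio < 1:
        return "decrease"
    return "none"


def pleiotropy_summary(associations):
    """Count effect directions per trait for a single variant's associations."""
    summary = defaultdict(lambda: defaultdict(int))
    for assoc in associations:
        trait = assoc.get("efoTrait") or "unlabelled"
        summary[trait][direction(assoc)] += 1
    return {trait: dict(counts) for trait, counts in summary.items()}


# Placeholder records for illustration only.
records = [
    {"efoTrait": "trait A", "orPerCopyNum": 1.4},
    {"efoTrait": "trait A", "orPerCopyNum": 1.2},
    {"efoTrait": "trait B", "orPerCopyNum": 0.8},
    {"efoTrait": "trait B"},
]
print(pleiotropy_summary(records))
```

Directions are only comparable across studies when the reported allele is the same, so the tally is a starting point for review rather than a conclusion.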

Workflow 3: Gene-Centric Association Analysis

Search by gene symbol in web interface or:

url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
params = {"geneName": gene_symbol}

Retrieve variants in gene region:
- Get chromosomal coordinates for gene
- Query variants in region
- Include promoter and regulatory regions (extend boundaries)

Analyze association patterns:
- Identify traits associated with variants in this gene
- Look for consistent associations across studies
- Review effect sizes and directions

Functional interpretation:
- Determine variant consequences (missense, regulatory, etc.)
- Check expression QTL (eQTL) data
- Review pathway and network context
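
Extending the gene boundaries to take in promoter and regulatory sequence is simple interval arithmetic. The sketch below assumes a symmetric flank of 50 kb, which is an arbitrary illustrative choice and not a Catalog default; the function names are hypothetical, the coordinates in the demo are made up, and the parameter names mirror the region query in Example 4.

```python
def extend_region(start, end, flank=50_000):
    """Widen [start, end] by `flank` bases on each side, clamped to position 1."""
    if start > end:
        raise ValueError("start must not exceed end")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return max(1, start - flank), end + flank


def region_params(chrom, start, end, flank=50_000):
    """Build query parameters for a region search around a gene."""
    lower, upper = extend_region(start, end, flank)
    return {"chrom": str(chrom), "bpStart": lower, "bpEnd": upper}


# Hypothetical gene coordinates, for illustration only.
print(region_params(10, 114_700_000, 114_900_000))
```

Gene coordinates depend on the genome build, so the build used to look them up should match the build the API reports positions in.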

Workflow 4: Systematic Review of Genetic Evidence

Define research question:
- Specific trait or disease of interest
- Population considerations
- Study design requirements

Comprehensive variant extraction:
- Query all associations for trait
- Set significance threshold
- Note discovery and replication studies

Quality assessment:
- Review study sample sizes
- Check for population diversity
- Assess heterogeneity across studies
- Identify potential biases

Data synthesis:
- Aggregate associations across studies
- Perform meta-analysis if applicable
- Create summary tables
- Generate Manhattan or forest plots

Export and documentation:
- Download full association data
- Export summary statistics if needed
- Document search strategy and date
- Create reproducible analysis scripts
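
For the meta-analysis step, one common choice is a fixed-effect inverse-variance model: each study's estimate is weighted by 1/SE², and the pooled standard error is the square root of the reciprocal of the summed weights. The sketch below is a generic illustration of that formula, not a method prescribed by the GWAS Catalog. It assumes all estimates are betas (or log odds ratios) for the same effect allele, and the input numbers are invented.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (beta, standard_error) pairs with inverse-variance weights.

    Returns (pooled_beta, pooled_se, two_sided_p) under a normal approximation.
    """
    pairs = list(estimates)
    if not pairs:
        raise ValueError("need at least one estimate")
    total_weight = 0.0
    weighted_sum = 0.0
    for beta, se in pairs:
        if se <= 0:
            raise ValueError("standard errors must be positive")
        weight = 1.0 / (se * se)
        total_weight += weight
        weighted_sum += weight * beta
    pooled_beta = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))  # two-sided normal tail
    return pooled_beta, pooled_se, p_value


# Invented estimates from two hypothetical studies.
pooled_beta, pooled_se, p_value = fixed_effect_meta([(0.10, 0.02), (0.20, 0.02)])
print(f"beta={pooled_beta:.3f}, se={pooled_se:.4f}, p={p_value:.2e}")
```

Odds ratios should be log-transformed before pooling, and a fixed-effect model is only appropriate when heterogeneity across studies is low.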

Workflow 5: Accessing and Analyzing Summary Statistics

Identify studies with summary statistics:
- Browse summary statistics portal
- Check FTP directory listings
- Query API for available studies

Download summary statistics:

# Via FTP
wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz

Query via API for specific variants:

url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
params = {"start": start_pos, "end": end_pos}

Process and analyze:
- Filter by p-value thresholds
- Extract effect sizes and confidence intervals
- Perform downstream analyses (fine-mapping, colocalization, etc.)
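
The processing step can be sketched with the standard library alone. The column names `variant_id` and `p_value` appear in Example 3; `beta` and `standard_error` are assumed here and should be checked against the header of the file actually downloaded. The 95% interval uses the usual normal approximation (beta ± 1.96 × SE), and the demo data is an invented in-memory file.

```python
import csv
import gzip
import io


def significant_rows(handle, p_threshold=5e-8):
    """Yield rows at or below the p-value threshold, with a 95% CI for beta."""
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p_value = float(row["p_value"])
            beta = float(row["beta"])
            se = float(row["standard_error"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values such as "NA"
        if p_value <= p_threshold:
            yield {
                "variant_id": row.get("variant_id"),
                "p_value": p_value,
                "beta": beta,
                "ci_lower": beta - 1.96 * se,
                "ci_upper": beta + 1.96 * se,
            }


# Invented three-row file, gzip-compressed in memory for the demo.
demo_tsv = (
    "variant_id\tp_value\tbeta\tstandard_error\n"
    "rsA\t1e-10\t0.20\t0.03\n"
    "rsB\t0.04\t0.01\t0.02\n"
    "rsC\tNA\tNA\tNA\n"
)
demo_gz = gzip.compress(demo_tsv.encode("utf-8"))
with gzip.open(io.BytesIO(demo_gz), "rt", encoding="utf-8", newline="") as handle:
    demo_hits = list(significant_rows(handle))
print(demo_hits)
```

For a downloaded file, pass a path to `gzip.open` instead of the in-memory buffer. Streaming row by row keeps memory use flat even for files with millions of variants.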

Response Formats and Data Fields

Key Fields in Association Records:

rsId: Variant identifier (rs number)
strongestAllele: Risk allele for the association
pvalue: Association p-value
pvalueText: P-value as text (may include inequality)
orPerCopyNum: Odds ratio per copy of the risk allele
betaNum: Effect size (for quantitative traits)
betaUnit: Unit of measurement for beta
range: Confidence interval
efoTrait: Associated trait name
mappedLabel: EFO-mapped trait term

Study Metadata Fields:

accessionId: GCST study identifier
pubmedId: PubMed ID
author: First author
publicationDate: Publication date
ancestryInitial: Discovery population ancestry
ancestryReplication: Replication population ancestry
sampleSize: Total sample size

Pagination: Results are paginated (default 20 items per page). Navigate using:

size parameter: Number of results per page
page parameter: Page number (0-indexed)
_links in response: URLs for next/previous pages
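
Following the `next` link is often simpler than counting pages. The sketch below assumes each page carries `_links.next.href` while more pages exist and omits it on the last page; the function names and the `max_pages` safety cap are illustrative choices. The fetch function is injected so the logic can be exercised offline with canned pages.

```python
def iter_pages(first_url, fetch, max_pages=1000):
    """Yield successive pages, following _links.next.href until it is absent."""
    url = first_url
    for _ in range(max_pages):  # safety cap against accidental endless loops
        if url is None:
            return
        page = fetch(url)
        yield page
        url = page.get("_links", {}).get("next", {}).get("href")


def iter_associations(first_url, fetch):
    """Yield every association record across all pages."""
    for page in iter_pages(first_url, fetch):
        yield from page.get("_embedded", {}).get("associations", [])


# Canned pages standing in for API responses (hypothetical content).
fake_pages = {
    "page0": {
        "_embedded": {"associations": [{"rsId": "rsA"}, {"rsId": "rsB"}]},
        "_links": {"next": {"href": "page1"}},
    },
    "page1": {"_embedded": {"associations": [{"rsId": "rsC"}]}, "_links": {}},
}
demo_ids = [assoc["rsId"] for assoc in iter_associations("page0", fake_pages.__getitem__)]
print(demo_ids)
```

Against the live API, `fetch` could be something like `lambda url: requests.get(url, timeout=30).json()`, with the first URL including the desired `size` parameter.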

Best Practices

Query Strategy

Start with web interface to identify relevant EFO terms and study accessions
Use API for bulk data extraction and automated analyses
Implement pagination handling for large result sets
Cache API responses to minimize redundant requests

Data Interpretation

Always check p-value thresholds (genome-wide: 5×10⁻⁸)
Review ancestry information for population applicability
Consider sample size when assessing evidence strength
Check for replication across independent studies
Be aware of winner's curse in effect size estimates
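
The 5×10⁻⁸ threshold is commonly motivated as a Bonferroni correction: a family-wise error rate of 0.05 divided across roughly one million independent common-variant tests. Both numbers are conventions rather than Catalog settings, and the function name below is hypothetical; the arithmetic is:

```python
import math


def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold()
print(f"{threshold:.1e}")
```

Studies that test far more variants, or that analyse populations with different linkage disequilibrium structure, may justify a different number of effective tests.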

Rate Limiting and Ethics

Respect API usage guidelines (no excessive requests)
Use summary statistics downloads for genome-wide analyses
Implement appropriate delays between API calls
Cache results locally when performing iterative analyses
Cite the GWAS Catalog in publications
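
Caching and spacing out requests can be combined in one small wrapper. This is a sketch under assumptions: the class name, the 0.2-second minimum interval, and the in-memory dictionary cache are illustrative choices, not published API limits. The fetch, sleep, and clock functions are injected so the behaviour can be checked without network access.

```python
import time


class PoliteFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum delay between calls."""

    def __init__(self, fetch, min_interval=0.2, sleep=time.sleep, clock=time.monotonic):
        self._fetch = fetch
        self._min_interval = min_interval
        self._sleep = sleep
        self._clock = clock
        self._cache = {}
        self._last_call = None

    def get(self, url):
        if url in self._cache:
            return self._cache[url]  # repeated requests never reach the server
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        result = self._fetch(url)
        self._last_call = self._clock()
        self._cache[url] = result
        return result


# Offline demo: record which URLs would actually be requested.
calls = []
fetcher = PoliteFetcher(lambda url: calls.append(url) or {"url": url}, min_interval=0.0)
fetcher.get("a")
fetcher.get("a")
fetcher.get("b")
print(calls)
```

For iterative analyses that span sessions, the dictionary could be replaced by an on-disk store, at the cost of having to decide when cached responses are stale.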

Data Quality Considerations

GWAS Catalog curates published associations (may contain inconsistencies)
Effect sizes reported as published (may need harmonization)
Some studies report conditional or joint associations
Check for study overlap when combining results
Be aware of ascertainment and selection biases

Python Integration Example

Complete workflow for querying and analyzing GWAS data:

import requests
import pandas as pd
from time import sleep

def query_gwas_catalog(trait_id, p_threshold=5e-8):
    """
    Query GWAS Catalog for trait associations

    Args:
        trait_id: EFO trait identifier (e.g., 'EFO_0001360')
        p_threshold: P-value threshold for filtering

    Returns:
        pandas DataFrame with association results
    """
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

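    headers = {"Accept": "application/json"}
    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers=headers, timeout=30)

        if response.status_code != 200:
            # Stop on errors, but report them instead of failing silently
            print(f"Request failed on page {page}: HTTP {response.status_code}")
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            # Compare against None so that a reported p-value of 0 is not dropped
            if pvalue is not None and float(pvalue) <= p_threshold: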
                results.append({
                    'variant': assoc.get('rsId'),
                    'pvalue': pvalue,
                    'risk_allele': assoc.get('strongestAllele'),
                    'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                    'trait': assoc.get('efoTrait'),
                    'pubmed_id': assoc.get('pubmedId')
                })

        page += 1
        sleep(0.1)  # Rate limiting

    return pd.DataFrame(results)

# Example usage
df = query_gwas_catalog('EFO_0001360')  # Type 2 diabetes
print(df.head())
print(f"\nTotal associations: {len(df)}")
if not df.empty:  # an empty result has no 'variant' column
    print(f"Unique variants: {df['variant'].nunique()}")

Resources

references/api_reference.md

Comprehensive API documentation including:

Detailed endpoint specifications for both APIs
Complete list of query parameters and filters
Response format specifications and field descriptions
Advanced query examples and patterns
Error handling and troubleshooting
Integration with external databases

Consult this reference when:

Constructing complex API queries
Understanding response structures
Implementing pagination or batch operations
Troubleshooting API errors
Exploring advanced filtering options

Training Materials

The GWAS Catalog team provides workshop materials:

GitHub repository: https://github.com/EBISPOT/GWAS_Catalog-workshop
Jupyter notebooks with example queries
Google Colab integration for cloud execution

Important Notes

Data Updates

The GWAS Catalog is updated regularly with new publications
Re-run queries periodically for comprehensive coverage
Summary statistics are added as studies release data
EFO mappings may be updated over time

Citation Requirements

When using GWAS Catalog data, cite:

Sollis E, et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research. PMID: 37953337
Include access date and version when available
Cite original studies when discussing specific findings

Limitations

Not all GWAS publications are included (curation criteria apply)
Full summary statistics available for subset of studies
Effect sizes may require harmonization across studies
Population diversity is growing but historically limited
Some associations represent conditional or joint effects

Data Access

Web interface: Free, no registration required
REST APIs: Free, no API key needed
FTP downloads: Open access
Rate limiting applies to API (be respectful)

Additional Resources

GWAS Catalog website: https://www.ebi.ac.uk/gwas/
Documentation: https://www.ebi.ac.uk/gwas/docs
API documentation: https://www.ebi.ac.uk/gwas/rest/docs/api
Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
FTP site: http://ftp.ebi.ac.uk/pub/databases/gwas/
Training materials: https://github.com/EBISPOT/GWAS_Catalog-workshop
PGS Catalog (polygenic scores): https://www.pgscatalog.org/
Help and support: gwas-info@ebi.ac.uk

Core Endpoints (continued): Traits endpoint - /efoTraits/{efoID}

import requests

# Get trait information
url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
response = requests.get(url, headers={"Accept": "application/json"})
trait_info = response.json()