gget bioinformatics tool: a command-line and Python package for unified access to 20+ genomic databases

gget by davila7/claude-code-templates

Installation command

npx skills add https://github.com/davila7/claude-code-templates --skill gget

gget

Overview

gget is a command-line bioinformatics tool and Python package that provides unified access to 20+ genomic databases and analysis methods. Through one consistent interface you can query gene information, run sequence analyses, and retrieve protein structures, expression data, and disease associations. Every gget module works both as a command-line tool and as a Python function.

Important: The databases queried by gget are continuously updated, which sometimes changes their structure. gget modules are tested automatically every two weeks and updated to match new database structures when necessary.

Installation

Install gget in a clean virtual environment to avoid conflicts:

# Using uv (recommended)
uv pip install gget

# Or using pip
pip install --upgrade gget

# In Python/Jupyter
import gget

Quick Start

Basic usage pattern for all modules:

# Command-line
gget <module> [arguments] [options]

# Python
gget.module(arguments, options)

Most modules return:

Command-line: JSON (default) or CSV with the -csv flag
Python: DataFrame or dictionary

Common flags across modules:

-o/--out: Save results to file
-q/--quiet: Suppress progress information
-csv: Return CSV format (command-line only)

Module Categories

1. Reference & Gene Information

gget ref - Reference Genome Downloads

Retrieve download links and metadata for Ensembl reference genomes.

Parameters:

species: Genus_species format (e.g., 'homo_sapiens', 'mus_musculus'). Shortcuts: 'human', 'mouse'
-w/--which: Specify return types (gtf, cdna, dna, cds, cdrna, pep). Default: all
-r/--release: Ensembl release number (default: latest)
-l/--list_species: List available vertebrate species
-liv/--list_iv_species: List available invertebrate species
-ftp: Return only FTP links
-d/--download: Download files (requires curl)

Examples:

# List available species
gget ref --list_species

# Get all reference files for human
gget ref homo_sapiens

# Download only GTF annotation for mouse
gget ref -w gtf -d mouse

# Python
gget.ref("homo_sapiens")
gget.ref("mus_musculus", which="gtf", download=True)
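If you already hold the full metadata in Python rather than re-running with -ftp, the download links can be pulled out afterwards. This is a minimal sketch that assumes (not confirmed here) that gget.ref() returns nested dictionaries whose leaf entries store each link under an "ftp" key. collect_ftp_links and the example dictionary are hypothetical, and the URLs are placeholders.

```python
from typing import Any


def collect_ftp_links(ref_result: Any) -> list:
    """Recursively gather every string stored under an "ftp" key.

    Assumption: gget.ref() returns nested dicts whose leaf entries hold the
    download link under "ftp"; adjust the key if your gget version differs.
    """
    links = []
    if isinstance(ref_result, dict):
        for key, value in ref_result.items():
            if key == "ftp" and isinstance(value, str):
                links.append(value)
            else:
                links.extend(collect_ftp_links(value))
    elif isinstance(ref_result, list):
        for item in ref_result:
            links.extend(collect_ftp_links(item))
    return links


# Hypothetical metadata shaped like the assumption above (placeholder URLs)
example = {
    "homo_sapiens": {
        "gtf": {"ftp": "https://example.org/human.gtf.gz", "release": 110},
        "cdna": {"ftp": "https://example.org/human.cdna.fa.gz"},
    }
}
print(collect_ftp_links(example))

# Real use (requires network access): collect_ftp_links(gget.ref("homo_sapiens"))
```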

gget search - Gene Search

Locate genes by name or description across species.

Parameters:

searchwords: One or more search terms (case-insensitive)
-s/--species: Target species (e.g., 'homo_sapiens', 'mouse')
-r/--release: Ensembl release number
-t/--id_type: Return 'gene' (default) or 'transcript'
-ao/--andor: 'or' (default) finds ANY searchword; 'and' requires ALL
-l/--limit: Maximum results to return

Returns: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL

Examples:

# Search for GABA-related genes in human
gget search -s human gaba gamma-aminobutyric

# Find specific gene, require all terms
gget search -s mouse -ao and pax7 transcription

# Python
gget.search(["gaba", "gamma-aminobutyric"], species="homo_sapiens")
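The -ao/--andor rule can be pictured as a local filter. The sketch below illustrates only the documented semantics on made-up descriptions. gget itself runs the search against Ensembl, and matches_searchwords is a hypothetical helper.

```python
def matches_searchwords(description: str, searchwords: list, andor: str = "or") -> bool:
    """Case-insensitive substring test following the documented -ao/--andor rule."""
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "or":
        return any(hits)  # 'or' (default): any searchword is enough
    if andor == "and":
        return all(hits)  # 'and': every searchword must appear
    raise ValueError("andor must be 'or' or 'and'")


# Made-up descriptions, for illustration only
descriptions = {
    "GENE_A": "Paired box 7 transcription factor",
    "GENE_B": "Paired box protein fragment",
}
terms = ["paired", "transcription"]
print([g for g, d in descriptions.items() if matches_searchwords(d, terms, "and")])  # ['GENE_A']
print([g for g, d in descriptions.items() if matches_searchwords(d, terms, "or")])   # ['GENE_A', 'GENE_B']
```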

gget info - Gene/Transcript Information

Retrieve comprehensive gene and transcript metadata from Ensembl, UniProt, and NCBI.

Parameters:

ens_ids: One or more Ensembl IDs (also supports WormBase, Flybase IDs). Limit: ~1000 IDs
-n/--ncbi: Disable NCBI data retrieval
-u/--uniprot: Disable UniProt data retrieval
-pdb: Include PDB identifiers (increases runtime)

Returns: UniProt ID, NCBI gene ID, primary gene name, synonyms, protein names, descriptions, biotype, canonical transcript

Examples:

# Get info for multiple genes
gget info ENSG00000034713 ENSG00000104853 ENSG00000170296

# Include PDB IDs
gget info ENSG00000034713 -pdb

# Python
gget.info(["ENSG00000034713", "ENSG00000104853"], pdb=True)
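gget info handles roughly 1000 IDs per call, so longer lists can be split into batches first. This is a minimal sketch. batched_ids is a hypothetical helper, and the commented gget and pandas lines assume each gget.info() call returns a DataFrame.

```python
from typing import Iterator, Sequence


def batched_ids(ids: Sequence[str], batch_size: int = 1000) -> Iterator[list]:
    """Yield consecutive chunks of at most batch_size IDs, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


# Placeholder IDs that only mimic the Ensembl format
all_ids = [f"ENSG{i:011d}" for i in range(2500)]
print([len(batch) for batch in batched_ids(all_ids)])  # [1000, 1000, 500]

# With network access (assumes DataFrame results):
# import pandas as pd
# info = pd.concat([gget.info(batch) for batch in batched_ids(all_ids)])
```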

gget seq - Sequence Retrieval

Fetch nucleotide or amino acid sequences for genes and transcripts.

Parameters:

ens_ids: One or more Ensembl identifiers
-t/--translate: Fetch amino acid sequences instead of nucleotide
-iso/--isoforms: Return all transcript variants (gene IDs only)

Returns: Sequences in FASTA format

Examples:

# Get nucleotide sequences
gget seq ENSG00000034713 ENSG00000104853

# Get all protein isoforms
gget seq -t -iso ENSG00000034713

# Python
gget.seq(["ENSG00000034713"], translate=True, isoforms=True)
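In Python, FASTA output is often easier to work with as a dictionary. This sketch assumes that gget.seq() returns a list of FASTA lines, with headers starting with ">" followed by sequence lines. fasta_lines_to_dict and the sample records are hypothetical.

```python
def fasta_lines_to_dict(lines: list) -> dict:
    """Map each FASTA identifier (first token after '>') to its full sequence."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            records[header] = ""
        elif header is None:
            raise ValueError("sequence line found before any FASTA header")
        else:
            records[header] += line  # join multi-line sequences
    return records


# Made-up records in the assumed list-of-lines shape
example = [">ENST_EXAMPLE_1 description text", "MKWMFK", "EDHSLE", ">ENST_EXAMPLE_2", "MSTNPK"]
print(fasta_lines_to_dict(example))
# {'ENST_EXAMPLE_1': 'MKWMFKEDHSLE', 'ENST_EXAMPLE_2': 'MSTNPK'}

# Real use (requires network access):
# fasta_lines_to_dict(gget.seq(["ENSG00000034713"], translate=True))
```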

2. Sequence Analysis & Alignment

gget blast - BLAST Searches

BLAST nucleotide or amino acid sequences against standard databases.

Parameters:

sequence: Sequence string or path to FASTA/.txt file
-p/--program: blastn, blastp, blastx, tblastn, tblastx (auto-detected)
-db/--database:
- Nucleotide: nt, refseq_rna, pdbnt
- Protein: nr, swissprot, pdbaa, refseq_protein
-l/--limit: Max hits (default: 50)
-e/--expect: E-value cutoff (default: 10.0)
-lcf/--low_comp_filt: Enable low complexity filtering
-mbo/--megablast_off: Disable MegaBLAST (blastn only)

Examples:

# BLAST protein sequence
gget blast MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# BLAST from file with specific database
gget blast sequence.fasta -db swissprot -l 10

# Python
gget.blast("MKWMFK...", database="swissprot", limit=10)

gget blat - BLAT Searches

Locate genomic positions of sequences using UCSC BLAT.

Parameters:

sequence: Sequence string or path to FASTA/.txt file
-st/--seqtype: 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' (auto-detected)
-a/--assembly: Target assembly (default: 'human'/hg38; options: 'mouse'/mm39, 'zebrafinch'/taeGut2, etc.)

Returns: genome, query size, alignment positions, matches, mismatches, alignment percentage

Examples:

# Find genomic location in human
gget blat ATCGATCGATCGATCG

# Search in different assembly
gget blat -a mm39 ATCGATCGATCGATCG

# Python
gget.blat("ATCGATCGATCGATCG", assembly="mouse")

gget muscle - Multiple Sequence Alignment

Align multiple nucleotide or amino acid sequences using Muscle5.

Parameters:

fasta: Sequences or path to FASTA/.txt file
-s5/--super5: Use Super5 algorithm for faster processing (large datasets)

Returns: Aligned sequences in ClustalW format or aligned FASTA (.afa)

Examples:

# Align sequences from file
gget muscle sequences.fasta -o aligned.afa

# Use Super5 for large dataset
gget muscle large_dataset.fasta -s5

# Python
gget.muscle("sequences.fasta", save=True)
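gget muscle accepts a path to a FASTA file, so sequences held in Python can be written to a file first. This is a minimal sketch: write_fasta is a hypothetical helper and the sequences are made up. The commented call passes only the file path, as in the examples above.

```python
from pathlib import Path


def write_fasta(records: dict, path: str, width: int = 60) -> Path:
    """Write {identifier: sequence} pairs to a FASTA file, wrapping at `width`."""
    out = Path(path)
    with out.open("w") as handle:
        for identifier, sequence in records.items():
            handle.write(f">{identifier}\n")
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")
    return out


# Made-up sequences for illustration
fasta_path = write_fasta({"seq_a": "MKWMFKEDHSLEHR", "seq_b": "MKWMYKEDHSLEHR"}, "example_sequences.fasta")
print(fasta_path.read_text())

# Then, with gget installed: gget.muscle(str(fasta_path))
```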

gget diamond - Local Sequence Alignment

Perform fast local protein or translated DNA alignment using DIAMOND.

Parameters:

query: Sequences (string/list) or FASTA file path
--reference: Reference sequences (string/list) or FASTA file path (required)
--sensitivity: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive (default), ultra-sensitive
--threads: CPU threads (default: 1)
--diamond_db: Save database for reuse
--translated: Enable nucleotide-to-amino acid alignment

Returns: Identity percentage, sequence lengths, match positions, gap openings, E-values, bit scores

Examples:

# Align against reference
gget diamond GGETISAWESQME -ref reference.fasta --threads 4

# Save database for reuse
gget diamond query.fasta -ref ref.fasta --diamond_db my_db.dmnd

# Python
gget.diamond("GGETISAWESQME", reference="reference.fasta", threads=4)

3. Structural & Protein Analysis

gget pdb - Protein Structures

Query RCSB Protein Data Bank for structure and metadata.

Parameters:

pdb_id: PDB identifier (e.g., '7S7U')
-r/--resource: Data type (pdb, entry, pubmed, assembly, entity types)
-i/--identifier: Assembly, entity, or chain ID

Returns: PDB format (structures) or JSON (metadata)

Examples:

# Download PDB structure
gget pdb 7S7U -o 7S7U.pdb

# Get metadata
gget pdb 7S7U -r entry

# Python
gget.pdb("7S7U", save=True)
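A PDB file saved by gget pdb (with -o 7S7U.pdb or save=True) is plain fixed-width text, so simple summaries need no extra libraries. This sketch counts C-alpha ATOM records per chain using the standard PDB columns: the atom name is in columns 13-16 and the chain ID is in column 22. It ignores HETATM records and alternate locations. residues_per_chain and the sample lines are hypothetical.

```python
from collections import Counter


def residues_per_chain(pdb_text: str) -> dict:
    """Count C-alpha ATOM records per chain ID (a rough residue count)."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; column 22 holds the chain ID
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            counts[line[21]] += 1
    return dict(counts)


example_pdb = "\n".join([
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "ATOM      3  CA  LYS A   2",
    "ATOM      4  CA  GLY B   1",
    "HETATM    5  O   HOH A 101",
])
print(residues_per_chain(example_pdb))  # {'A': 2, 'B': 1}

# With a downloaded file: residues_per_chain(open("7S7U.pdb").read())
```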

gget alphafold - Protein Structure Prediction

Predict 3D protein structures using simplified AlphaFold2.

Setup Required:

# Install OpenMM first
uv pip install openmm

# Then set up AlphaFold
gget setup alphafold

Parameters:

sequence: Amino acid sequence (string), multiple sequences (list), or FASTA file. Multiple sequences trigger multimer modeling
-mr/--multimer_recycles: Recycling iterations (default: 3; recommend 20 for accuracy)
-mfm/--multimer_for_monomer: Apply multimer model to single proteins
-r/--relax: AMBER relaxation for top-ranked model
plot: Python-only; generate interactive 3D visualization (default: True)
show_sidechains: Python-only; include side chains (default: True)

Returns: PDB structure file, JSON alignment error data, optional 3D visualization

Examples:

# Predict single protein structure
gget alphafold MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# Predict a multimer (FASTA file with two or more sequences) with more recycles for higher accuracy
gget alphafold sequence1.fasta -mr 20 -r

# Python with visualization
gget.alphafold("MKWMFK...", plot=True, show_sidechains=True)

# Multimer prediction
gget.alphafold(["sequence1", "sequence2"], multimer_recycles=20)
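AlphaFold2 writes per-residue pLDDT confidence scores (0-100) into the B-factor column of its PDB output. If gget alphafold keeps that convention (check this against your own output files), averaging over C-alpha atoms gives a quick confidence summary. mean_plddt and the sample lines below are hypothetical.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor field (columns 61-66) over C-alpha ATOM records.

    Assumes the B-factor field holds pLDDT, as in AlphaFold2 output.
    """
    scores = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


example_pdb = "\n".join([
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00 90.00",
    "ATOM      2  CA  MET A   1      11.639   6.071  -5.147  1.00 87.50",
    "ATOM      3  CA  LYS A   2      13.200   7.300  -4.100  1.00 62.50",
])
print(mean_plddt(example_pdb))  # 75.0
```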

gget elm - Eukaryotic Linear Motifs

Predict Eukaryotic Linear Motifs in protein sequences.

Setup Required:

gget setup elm

Parameters:

sequence: Amino acid sequence or UniProt Acc
-u/--uniprot: Indicates sequence is UniProt Acc
-e/--expand: Include protein names, organisms, references
-s/--sensitivity: DIAMOND alignment sensitivity (default: "very-sensitive")
-t/--threads: Number of threads (default: 1)

Returns: Two outputs:

ortholog_df: Linear motifs from orthologous proteins
regex_df: Motifs directly matched in the input sequence

Examples:

# Predict motifs from sequence
gget elm LIAQSIGQASFV -o results

# Use UniProt accession with expanded info
gget elm --uniprot Q02410 -e

# Python
ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")
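ELM motif classes are defined as regular expressions, and regex_df reports motifs matched directly in the input sequence. The sketch below shows that idea with Python's re module. The pattern Q[AS] is a toy example, not a real ELM class, and find_motif is a hypothetical helper, not part of gget.

```python
import re


def find_motif(sequence: str, pattern: str) -> list:
    """Return (start, end, match) for each non-overlapping hit; 1-based, inclusive positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(pattern, sequence)]


# "Q[AS]" is a toy pattern (Q followed by A or S), not a real ELM class
print(find_motif("LIAQSIGQASFV", "Q[AS]"))  # [(4, 5, 'QS'), (8, 9, 'QA')]
```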

4. Expression & Disease Data

gget archs4 - Gene Correlation & Tissue Expression

Query ARCHS4 database for correlated genes or tissue expression data.

Parameters:

gene: Gene symbol or Ensembl ID (with --ensembl flag)
-w/--which: 'correlation' (default, returns 100 most correlated genes) or 'tissue' (expression atlas)
-s/--species: 'human' (default) or 'mouse' (tissue data only)
-e/--ensembl: Input is Ensembl ID

Returns:

Correlation mode: Gene symbols, Pearson correlation coefficients
Tissue mode: Tissue identifiers, min/Q1/median/Q3/max expression values

Examples:

# Get correlated genes
gget archs4 ACE2

# Get tissue expression
gget archs4 -w tissue ACE2

# Python
gget.archs4("ACE2", which="tissue")

gget cellxgene - Single-Cell RNA-seq Data

Query CZ CELLxGENE Discover Census for single-cell data.

Setup Required:

gget setup cellxgene

Parameters:

--gene (-g): Gene names or Ensembl IDs (case-sensitive! 'PAX7' for human, 'Pax7' for mouse)
--tissue: Tissue type(s)
--cell_type: Specific cell type(s)
--species (-s): 'homo_sapiens' (default) or 'mus_musculus'
--census_version (-cv): Version ("stable", "latest", or dated)
--ensembl (-e): Use Ensembl IDs
--meta_only (-mo): Return metadata only
Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

Returns: AnnData object with count matrices and metadata (or metadata-only dataframes)

Examples:

# Get single-cell data for specific genes and cell types
gget cellxgene --gene ACE2 ABCA1 --tissue lung --cell_type "mucus secreting cell" -o lung_data.h5ad

# Metadata only
gget cellxgene --gene PAX7 --tissue muscle --meta_only -o metadata.csv

# Python
adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="mucus secreting cell")
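CELLxGENE gene symbols are case-sensitive, so it can help to normalize user input before querying. This is a minimal sketch that assumes the common conventions: all upper case for human, and only the first letter capitalized for mouse. Some genes break these rules, so treat the result as a starting point. format_symbol is a hypothetical helper.

```python
def format_symbol(symbol: str, species: str = "homo_sapiens") -> str:
    """Apply the usual capitalization convention for the given species."""
    if species == "homo_sapiens":
        return symbol.upper()  # e.g. PAX7
    if species == "mus_musculus":
        return symbol[:1].upper() + symbol[1:].lower()  # e.g. Pax7
    raise ValueError(f"unsupported species: {species}")


print(format_symbol("pax7"), format_symbol("PAX7", "mus_musculus"))  # PAX7 Pax7
```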

gget enrichr - Enrichment Analysis

Perform ontology enrichment analysis on gene lists using Enrichr.

Parameters:

genes: Gene symbols or Ensembl IDs
-db/--database: Reference database (supports shortcuts: 'pathway', 'transcription', 'ontology', 'diseases_drugs', 'celltypes')
-s/--species: human (default), mouse, fly, yeast, worm, fish
-bkg_l/--background_list: Background genes for comparison
-ko/--kegg_out: Save KEGG pathway images with highlighted genes
plot: Python-only; generate graphical results

Database Shortcuts:

'pathway' → KEGG_2021_Human
'transcription' → ChEA_2016
'ontology' → GO_Biological_Process_2021
'diseases_drugs' → GWAS_Catalog_2019
'celltypes' → PanglaoDB_Augmented_2021

Examples:

# Enrichment analysis for ontology
gget enrichr -db ontology ACE2 AGT AGTR1

# Save KEGG pathways
gget enrichr -db pathway ACE2 AGT AGTR1 -ko ./kegg_images/

# Python with plot
gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)
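Recording the full library name behind a shortcut makes enrichment results easier to reproduce. The mapping below is copied from the shortcut table above and may drift between gget releases. resolve_library is a hypothetical helper; gget performs this lookup itself.

```python
# Copied from the shortcut table above; may change between gget versions
ENRICHR_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
}


def resolve_library(database: str) -> str:
    """Return the full Enrichr library name for a shortcut, or the input unchanged."""
    return ENRICHR_SHORTCUTS.get(database, database)


print(resolve_library("ontology"))         # GO_Biological_Process_2021
print(resolve_library("KEGG_2021_Human"))  # already a full name, returned as-is
```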

gget bgee - Orthology & Expression

Retrieve orthology and gene expression data from Bgee database.

Parameters:

ens_id: Ensembl gene ID or NCBI gene ID (for non-Ensembl species). Multiple IDs supported when type=expression
-t/--type: 'orthologs' (default) or 'expression'

Returns:

Orthologs mode: Matching genes across species with IDs, names, taxonomic info
Expression mode: Anatomical entities, confidence scores, expression status

Examples:

# Get orthologs
gget bgee ENSG00000169194

# Get expression data
gget bgee ENSG00000169194 -t expression

# Multiple genes
gget bgee ENSBTAG00000047356 ENSBTAG00000018317 -t expression

# Python
gget.bgee("ENSG00000169194", type="orthologs")

gget opentargets - Disease & Drug Associations

Retrieve disease and drug associations from OpenTargets.

Parameters:

Ensembl gene ID (required)
-r/--resource: diseases (default), drugs, tractability, pharmacogenetics, expression, depmap, interactions
-l/--limit: Cap results count
Filter arguments (vary by resource):
- drugs: --filter_disease
- pharmacogenetics: --filter_drug
- expression/depmap: --filter_tissue, --filter_anat_sys, --filter_organ
- interactions: --filter_protein_a, --filter_protein_b, --filter_gene_b

Examples:

# Get associated diseases
gget opentargets ENSG00000169194 -r diseases -l 5

# Get associated drugs
gget opentargets ENSG00000169194 -r drugs -l 10

# Get tissue expression
gget opentargets ENSG00000169194 -r expression --filter_tissue brain

# Python
gget.opentargets("ENSG00000169194", resource="diseases", limit=5)
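The valid filter flags depend on the chosen resource, so a pre-flight check can catch mistakes before a query is sent. This sketch encodes the resource-to-filter table from the filter list above, keyed by the CLI flag names without their leading dashes. check_filters is a hypothetical helper, not a gget function, and the table may lag behind newer gget versions.

```python
# Resource -> allowed filter flags, transcribed from the filter list above
OPENTARGETS_FILTERS = {
    "drugs": {"filter_disease"},
    "pharmacogenetics": {"filter_drug"},
    "expression": {"filter_tissue", "filter_anat_sys", "filter_organ"},
    "depmap": {"filter_tissue", "filter_anat_sys", "filter_organ"},
    "interactions": {"filter_protein_a", "filter_protein_b", "filter_gene_b"},
}


def check_filters(resource: str, **filters: str) -> None:
    """Raise ValueError if any filter is not listed for the chosen resource."""
    allowed = OPENTARGETS_FILTERS.get(resource, set())
    unknown = sorted(set(filters) - allowed)
    if unknown:
        raise ValueError(f"resource {resource!r} does not accept {unknown}")


check_filters("expression", filter_tissue="brain")  # accepted, returns None
try:
    check_filters("drugs", filter_tissue="brain")
except ValueError as err:
    print(err)  # resource 'drugs' does not accept ['filter_tissue']
```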

gget cbio - cBioPortal Cancer Genomics

Plot cancer genomics heatmaps using cBioPortal data.

Two subcommands:

search - Find study IDs:

gget cbio search breast lung

plot - Generate heatmaps:

Parameters:

-s/--study_ids: Space-separated cBioPortal study IDs (required)
-g/--genes: Space-separated gene names or Ensembl IDs (required)
-st/--stratification: Column to organize data (tissue, cancer_type, cancer_type_detailed, study_id, sample)
-vt/--variation_type: Data type (mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence)
-f/--filter: Filter by column value (e.g., 'study_id:msk_impact_2017')
-dd/--data_dir: Cache directory (default: ./gget_cbio_cache)
-fd/--figure_dir: Output directory (default: ./gget_cbio_figures)
-dpi: Resolution (default: 100)
-sh/--show: Display plot in window
-nc/--no_confirm: Skip download confirmation

Examples:

# Search for studies
gget cbio search esophag ovary

# Create heatmap
gget cbio plot -s msk_impact_2017 -g AKT1 ALK BRAF -st tissue -vt mutation_occurrences

# Python
gget.cbio_search(["esophag", "ovary"])
gget.cbio_plot(["msk_impact_2017"], ["AKT1", "ALK"], stratification="tissue")
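The -f/--filter value uses a column:value form. The sketch below parses that string and applies it to rows represented as dictionaries, purely to illustrate the syntax. parse_filter, apply_filter and the sample rows are hypothetical; gget applies its own filtering internally.

```python
def parse_filter(spec: str) -> tuple:
    """Split a 'column:value' string at the first colon."""
    column, sep, value = spec.partition(":")
    if not sep or not column or not value:
        raise ValueError(f"expected 'column:value', got {spec!r}")
    return column, value


def apply_filter(rows: list, spec: str) -> list:
    """Keep rows whose column equals the requested value (string comparison)."""
    column, value = parse_filter(spec)
    return [row for row in rows if str(row.get(column)) == value]


# Made-up rows for illustration
rows = [
    {"study_id": "msk_impact_2017", "sample": "S1"},
    {"study_id": "other_study", "sample": "S2"},
]
print(parse_filter("study_id:msk_impact_2017"))  # ('study_id', 'msk_impact_2017')
print(apply_filter(rows, "study_id:msk_impact_2017"))  # [{'study_id': 'msk_impact_2017', 'sample': 'S1'}]
```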

gget cosmic - COSMIC Database

Search COSMIC (Catalogue Of Somatic Mutations In Cancer) database.

Important: License fees apply for commercial use. Requires COSMIC account credentials.

Parameters:

searchterm: Gene name, Ensembl ID, mutation notation, or sample ID
-ctp/--cosmic_tsv_path: Path to downloaded COSMIC TSV file (required for querying)
-l/--limit: Maximum results (default: 100)

Database download flags:

-d/--download_cosmic: Activate download mode
-gm/--gget_mutate: Create version for gget mutate
-cp/--cosmic_project: Database type (cancer, census, cell_line, resistance, genome_screen, targeted_screen)
-cv/--cosmic_version: COSMIC version
-gv/--grch_version: Human reference genome (37 or 38)
--email, --password: COSMIC credentials

Examples:

# First download database
gget cosmic -d --email user@example.com --password xxx -cp cancer

# Then query
gget cosmic EGFR -ctp cosmic_data.tsv -l 10

# Python
gget.cosmic("EGFR", cosmic_tsv_path="cosmic_data.tsv", limit=10)

5. Additional Tools

gget mutate - Generate Mutated Sequences

Generate mutated nucleotide sequences from mutation annotations.

Parameters:

sequences: FASTA file path or direct sequence input (string/list)
-m/--mutations: CSV/TSV file or DataFrame with mutation data (required)
-mc/--mut_column: Mutation column name (default: 'mutation')
-sic/--seq_id_column: Sequence ID column (default: 'seq_ID')
-mic/--mut_id_column: Mutation ID column
-k/--k: Length of flanking sequences (default: 30 nucleotides)

Returns: Mutated sequences in FASTA format

Examples:

# Single mutation
gget mutate ATCGCTAAGCT -m "c.4G>T"

# Multiple sequences with mutations from file
gget mutate sequences.fasta -m mutations.csv -o mutated.fasta

# Python
import pandas as pd
mutations_df = pd.DataFrame({"seq_ID": ["seq1"], "mutation": ["c.4G>T"]})
gget.mutate(["ATCGCTAAGCT"], mutations=mutations_df)
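To show what a simple substitution annotation such as c.4G>T means, the sketch below applies one to a sequence by hand and keeps up to k flanking nucleotides on each side. This is not gget's implementation. It handles only single-base substitutions and checks that the reference base matches. How it trims the flanks is an assumption about how the -k length is used. apply_substitution is a hypothetical helper.

```python
import re


def apply_substitution(sequence: str, mutation: str, k: int = 30) -> str:
    """Apply a substitution such as 'c.4G>T' (1-based position) and return
    the mutated base with up to k flanking nucleotides on each side."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", mutation)
    if match is None:
        raise ValueError(f"only simple substitutions are handled, got {mutation!r}")
    position, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index].upper() != ref:
        raise ValueError(f"expected {ref} at position {position}, found {sequence[index]}")
    mutated = sequence[:index] + alt + sequence[index + 1:]
    return mutated[max(0, index - k): index + k + 1]


print(apply_substitution("ATCGCTAAGCT", "c.4G>T"))       # ATCTCTAAGCT
print(apply_substitution("ATCGCTAAGCT", "c.4G>T", k=2))  # TCTCT
```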

gget gpt - OpenAI Text Generation

Generate natural language text using OpenAI's API.

Setup Required:

gget setup gpt

Important: The OpenAI free tier is limited to 3 months after account creation. Set a monthly billing limit.

Parameters:

prompt: Text input for generation (required)
api_key: OpenAI API key (required)
Model configuration: temperature, top_p, max_tokens, frequency_penalty, presence_penalty
Default model: gpt-3.5-turbo (configurable)

Examples:

gget gpt "Explain CRISPR" --api_key your_key_here



# Python
gget.gpt("Explain CRISPR", api_key="your_key_here")
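Rather than hard-coding the key as in the examples above, a script can read it from the environment. This is a minimal sketch; the variable name `OPENAI_API_KEY` follows OpenAI's usual convention but is an assumption here, and `get_openai_key` and `ask_gpt` are hypothetical helpers.

```python
import os


def get_openai_key(env_var="OPENAI_API_KEY"):
    """Return the OpenAI API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before calling gget.gpt")
    return key


def ask_gpt(prompt):
    """Send a prompt through gget.gpt using the key from the environment."""
    import gget  # lazy import: only needed when a request is actually sent

    return gget.gpt(prompt, api_key=get_openai_key())
```

Usage: `ask_gpt("Explain CRISPR")`. Keeping the key out of source files also keeps it out of version control.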

gget setup - Install Dependencies

Install/download third-party dependencies for specific modules.

Parameters:

module: Name of the module whose dependencies should be installed
-o/--out: Output folder path (elm module only)

Modules requiring setup:

alphafold - Downloads ~4GB of model parameters
cellxgene - Installs cellxgene-census (may not support latest Python)
elm - Downloads local ELM database
gpt - Configures OpenAI integration

Examples:

# Setup AlphaFold
gget setup alphafold

# Setup ELM with custom directory
gget setup elm -o /path/to/elm_data



# Python
gget.setup("alphafold")
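Since only some modules need this step, a pipeline can run setup once for just the modules it uses. This is a minimal sketch; `setup_modules` is a hypothetical helper, the module set mirrors the list above, and the `runner` argument (defaulting to `gget.setup`) exists only so the logic can be exercised without triggering downloads.

```python
# Modules that, per the list above, need `gget setup` before first use
SETUP_MODULES = {"alphafold", "cellxgene", "elm", "gpt"}


def setup_modules(modules, runner=None):
    """Run setup for each listed module that needs it; return the modules set up."""
    if runner is None:
        import gget  # lazy import so the helper can be tested without gget

        runner = gget.setup
    done = []
    for module in dict.fromkeys(modules):  # de-duplicate while keeping order
        if module in SETUP_MODULES:
            runner(module)
            done.append(module)
    return done
```

Usage: `setup_modules(["blast", "alphafold", "elm"])` would call `gget.setup("alphafold")` and `gget.setup("elm")` and skip `blast`, which needs no setup.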

Common Workflows

Workflow 1: Gene Discovery to Sequence Analysis

Find and analyze genes of interest:

import gget

# 1. Search for genes
results = gget.search(["GABA", "receptor"], species="homo_sapiens")

# 2. Get detailed information
gene_ids = results["ensembl_id"].tolist()
info = gget.info(gene_ids[:5])

# 3. Retrieve sequences
sequences = gget.seq(gene_ids[:5], translate=True)

Workflow 2: Sequence Alignment and Structure

Align sequences and predict structures:

import gget

# my_sequence: a protein sequence string you have already defined (placeholder)

# 1. Align multiple sequences
alignment = gget.muscle("sequences.fasta")

# 2. Find similar sequences
blast_results = gget.blast(my_sequence, database="swissprot", limit=10)

# 3. Predict structure
structure = gget.alphafold(my_sequence, plot=True)

# 4. Find linear motifs
ortholog_df, regex_df = gget.elm(my_sequence)
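Step 1 reads a FASTA file. When the sequences already live in Python, they can be written out first. This is a minimal sketch; `write_fasta`, the record identifiers, and the file name are illustrative.

```python
from pathlib import Path


def write_fasta(records, path, width=60):
    """Write (identifier, sequence) pairs to a FASTA file, wrapping lines at `width`."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Split each sequence into fixed-width lines
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    Path(path).write_text("\n".join(lines) + "\n")
    return Path(path)
```

Usage: `write_fasta([("seq1", seq_a), ("seq2", seq_b)], "sequences.fasta")`, then `gget.muscle("sequences.fasta")`, where `seq_a` and `seq_b` stand for your own sequence strings.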

Workflow 3: Gene Expression and Enrichment

Analyze expression patterns and functional enrichment:

import gget

# 1. Get tissue expression
tissue_expr = gget.archs4("ACE2", which="tissue")

# 2. Find correlated genes
correlated = gget.archs4("ACE2", which="correlation")

# 3. Get single-cell data
adata = gget.cellxgene(gene=["ACE2"], tissue="lung", cell_type="epithelial cell")

# 4. Perform enrichment analysis
gene_list = correlated["gene_symbol"].tolist()[:50]
enrichment = gget.enrichr(gene_list, database="ontology", plot=True)

Workflow 4: Disease and Drug Analysis

Investigate disease associations and therapeutic targets:

import gget

# 1. Search for genes
genes = gget.search(["breast cancer"], species="homo_sapiens")

# 2. Get disease associations for one gene (a fixed example Ensembl ID here)
diseases = gget.opentargets("ENSG00000169194", resource="diseases")

# 3. Get drug associations
drugs = gget.opentargets("ENSG00000169194", resource="drugs")

# 4. Query cancer genomics data
study_ids = gget.cbio_search(["breast"])
gget.cbio_plot(study_ids[:2], ["BRCA1", "BRCA2"], stratification="cancer_type")

# 5. Search COSMIC for mutations
cosmic_results = gget.cosmic("BRCA1", cosmic_tsv_path="cosmic.tsv")

Workflow 5: Comparative Genomics

Compare proteins across species:

import gget

# 1. Get orthologs
orthologs = gget.bgee("ENSG00000169194", type="orthologs")

# 2. Get protein sequences for comparison
# gget.seq returns FASTA lines ([header, sequence, ...]); keep the sequence line
human_seq = gget.seq("ENSG00000169194", translate=True)[1]
mouse_seq = gget.seq("ENSMUSG00000026091", translate=True)[1]

# 3. Align sequences
alignment = gget.muscle([human_seq, mouse_seq])

# 4. Compare structures
human_structure = gget.pdb("7S7U")
mouse_structure = gget.alphafold(mouse_seq)
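In Python, `gget.seq` returns FASTA-formatted output. Assuming that output is a list of header and sequence lines, the hypothetical helper `fasta_list_to_dict` maps each header to its sequence, which makes it easier to compare several IDs side by side. It also tolerates sequences wrapped over multiple lines.

```python
def fasta_list_to_dict(fasta_lines):
    """Map each FASTA header (without '>') to its full, concatenated sequence."""
    records = {}
    header = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is None:
            raise ValueError("sequence line found before any FASTA header")
        else:
            records[header] += line  # join wrapped sequence lines
    return records
```

Usage: `human = fasta_list_to_dict(gget.seq("ENSG00000169194", translate=True))`.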

Workflow 6: Building Reference Indices

Prepare reference data for downstream analysis (e.g., kallisto|bustools):

# 1. List available species
gget ref --list_species

# 2. Download reference files
gget ref -w gtf,cdna -d homo_sapiens

# 3. Build a kallisto index from the downloaded cDNA FASTA (file names are placeholders)
kallisto index -i transcriptome.idx transcriptome.fasta

# 4. Download genome for alignment
gget ref -w dna -d homo_sapiens

Best Practices

Data Retrieval

Use --limit to control result sizes for large queries
Save results with -o/--out for reproducibility
Check database versions/releases for consistency across analyses
Use -q/--quiet in production scripts to suppress progress messages

Sequence Analysis

For BLAST/BLAT, start with default parameters, then adjust sensitivity
Use gget diamond with --threads for faster local alignment
Save DIAMOND databases with --diamond_db for repeated queries
For multiple sequence alignment, use -s5/--super5 for large datasets

Expression and Disease Data

Gene symbols are case-sensitive in cellxgene (e.g., human 'PAX7' vs mouse 'Pax7')
Run gget setup before first use of alphafold, cellxgene, elm, gpt
For enrichment analysis, use database shortcuts (e.g., 'pathway', 'transcription', 'ontology') instead of full Enrichr library names
Cache cBioPortal data with -dd to avoid repeated downloads
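Because cellxgene matches gene symbols case-sensitively, gene lists can be normalized per species before querying. This is a heuristic sketch: human symbols are conventionally all uppercase and mouse symbols capitalize only the first letter, but some genes break this pattern, so check the output. `format_symbol` is hypothetical.

```python
def format_symbol(symbol, species):
    """Apply the usual capitalization convention for human or mouse gene symbols."""
    symbol = symbol.strip()
    if species == "homo_sapiens":
        return symbol.upper()  # e.g. PAX7
    if species == "mus_musculus":
        return symbol[:1].upper() + symbol[1:].lower()  # e.g. Pax7
    raise ValueError(f"no casing rule for species: {species}")
```

Usage: `[format_symbol(g, "mus_musculus") for g in ["PAX7", "ACE2"]]` before passing the list as `gene=` to `gget.cellxgene`.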

Structure Prediction

For AlphaFold multimer predictions, increase the number of recycles with -mr 20 for higher accuracy
Use the -r flag to apply AMBER relaxation to the final structure
Visualize results in Python with plot=True
Check the PDB for an existing experimental structure before running an AlphaFold prediction

Error Handling

Database structures change; update gget regularly: uv pip install --upgrade gget
Query at most ~1000 Ensembl IDs per gget info call
For large-scale analyses, implement rate limiting for API queries
Use virtual environments to avoid dependency conflicts
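The ID limit and the rate-limiting advice can be respected together by splitting long ID lists into batches and pausing between calls. This is a minimal sketch; `batched_info`, the batch size, and the pause length are assumptions to tune, and `fetch` defaults to `gget.info` but can be swapped for testing.

```python
import time


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_info(ens_ids, batch_size=1000, pause=1.0, fetch=None):
    """Call gget.info on batches of Ensembl IDs, sleeping between requests."""
    if fetch is None:
        import gget  # lazy import so the batching logic can run without gget

        fetch = gget.info
    results = []
    for i, batch in enumerate(batched(list(ens_ids), batch_size)):
        if i > 0:
            time.sleep(pause)  # fixed delay between consecutive queries
        results.append(fetch(batch))
    return results
```

If each call returns a DataFrame, `pandas.concat(results)` merges the batches into one table.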

Output Formats

Command-line

Default: JSON
CSV: Add -csv flag
FASTA: gget seq, gget mutate
PDB: gget pdb, gget alphafold
PNG: gget cbio plot

Python

Default: DataFrame or dictionary
JSON: Add json=True parameter
Save to file: Add save=True or specify out="filename"
AnnData: gget cellxgene

Resources

This skill includes reference documentation for detailed module information:

references/

module_reference.md - Comprehensive parameter reference for all modules
database_info.md - Information about queried databases and their update frequencies
workflows.md - Extended workflow examples and use cases

For additional help:

Official documentation: https://pachterlab.github.io/gget/
GitHub issues: https://github.com/pachterlab/gget/issues
Citation: Luebbert, L. & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836



-nc/--no_confirm: Skip download confirmations
