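gget by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill gget
gget is a command-line bioinformatics tool and Python package providing unified access to 20+ genomic databases and analysis methods. Query gene information, sequence analysis, protein structures, expression data, and disease associations through a consistent interface. All gget modules work both as command-line tools and as Python functions.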
Important: The databases queried by gget are continuously updated, which sometimes changes their structure. gget modules are tested automatically every two weeks and updated to match new database structures when necessary.
Install gget in a clean virtual environment to avoid conflicts:
# Using uv (recommended)
uv pip install gget
# Or using pip
pip install --upgrade gget
# In Python/Jupyter
import gget
Basic usage pattern for all modules:
# Command-line
gget <module> [arguments] [options]
# Python
gget.module(arguments, options)
Most modules return JSON on the command line (CSV with the -csv flag) and a DataFrame in Python.
Common flags across modules:
-o/--out: Save results to file
-q/--quiet: Suppress progress information
-csv: Return CSV format (command-line only)
Retrieve download links and metadata for Ensembl reference genomes.
Parameters:
species: Genus_species format (e.g., 'homo_sapiens', 'mus_musculus'). Shortcuts: 'human', 'mouse'
-w/--which: Specify return types (gtf, cdna, dna, cds, cdrna, pep). Default: all
-r/--release: Ensembl release number (default: latest)
-l/--list_species: List available vertebrate species
-liv/--list_iv_species: List available invertebrate species
-ftp: Return only FTP links
-d/--download: Download files (requires curl)
Examples:
# List available species
gget ref --list_species
# Get all reference files for human
gget ref homo_sapiens
# Download only GTF annotation for mouse
gget ref -w gtf -d mouse
# Python
gget.ref("homo_sapiens")
gget.ref("mus_musculus", which="gtf", download=True)
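When gget ref is called from a script, the command line can be assembled from the flags listed above. The sketch below is illustrative only: build_ref_command is a hypothetical helper, not part of gget, and it assumes that several -w values are passed as one comma-separated string. It only builds the argument list; actually running it (for example with subprocess.run) requires gget and curl to be installed.

```python
from typing import List, Optional, Sequence


def build_ref_command(
    species: str,
    which: Optional[Sequence[str]] = None,
    release: Optional[int] = None,
    download: bool = False,
) -> List[str]:
    """Assemble a `gget ref` argument list (hypothetical helper, not a gget API)."""
    cmd = ["gget", "ref"]
    if which:
        # Assumption: several return types are joined with commas, e.g. gtf,cdna
        cmd += ["-w", ",".join(which)]
    if release is not None:
        cmd += ["-r", str(release)]
    if download:
        cmd.append("-d")
    cmd.append(species)
    return cmd


# Mirrors the command-line example "gget ref -w gtf -d mouse"
print(build_ref_command("mouse", which=["gtf"], download=True))
```

Keeping the command as a list rather than a single string avoids shell quoting issues when it is later handed to subprocess.run.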
Locate genes by name or description across species.
Parameters:
searchwords: One or more search terms (case-insensitive)
-s/--species: Target species (e.g., 'homo_sapiens', 'mouse')
-r/--release: Ensembl release number
-t/--id_type: Return 'gene' (default) or 'transcript'
-ao/--andor: 'or' (default) finds ANY searchword; 'and' requires ALL
-l/--limit: Maximum results to return
Returns: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL
Examples:
# Search for GABA-related genes in human
gget search -s human gaba gamma-aminobutyric
# Find specific gene, require all terms
gget search -s mouse -ao and pax7 transcription
# Python
gget.search(["gaba", "gamma-aminobutyric"], species="homo_sapiens")
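The difference between the 'or' and 'and' settings of -ao/--andor can be modelled on a single description string. This is only a sketch of the matching logic described above, not gget's actual implementation (gget search queries Ensembl); the matches function and the sample description are made up for illustration.

```python
from typing import Iterable


def matches(description: str, searchwords: Iterable[str], andor: str = "or") -> bool:
    """Case-insensitive 'or'/'and' matching on one description (illustrative only)."""
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "and":
        return all(hits)  # every search word must appear
    return any(hits)      # one matching search word is enough


desc = "paired box 7 transcription factor"  # made-up description
print(matches(desc, ["pax7", "transcription"], andor="or"))   # True
print(matches(desc, ["pax7", "transcription"], andor="and"))  # False
```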
Retrieve comprehensive gene and transcript metadata from Ensembl, UniProt, and NCBI.
Parameters:
ens_ids: One or more Ensembl IDs (also supports WormBase, Flybase IDs). Limit: ~1000 IDs
-n/--ncbi: Disable NCBI data retrieval
-u/--uniprot: Disable UniProt data retrieval
-pdb: Include PDB identifiers (increases runtime)
Returns: UniProt ID, NCBI gene ID, primary gene name, synonyms, protein names, descriptions, biotype, canonical transcript
Examples:
# Get info for multiple genes
gget info ENSG00000034713 ENSG00000104853 ENSG00000170296
# Include PDB IDs
gget info ENSG00000034713 -pdb
# Python
gget.info(["ENSG00000034713", "ENSG00000104853"], pdb=True)
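Because gget info is limited to roughly 1000 IDs per call, longer ID lists need to be split into batches first. A minimal sketch of that batching follows; chunked is a hypothetical helper and the IDs are generated placeholders, not real Ensembl identifiers.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Placeholder IDs, generated only to demonstrate the batching
ids = [f"ENSG{n:011d}" for n in range(2500)]
batches = list(chunked(ids))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch could then be passed to gget.info in turn and the per-batch results combined afterwards.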
Fetch nucleotide or amino acid sequences for genes and transcripts.
Parameters:
ens_ids: One or more Ensembl identifiers
-t/--translate: Fetch amino acid sequences instead of nucleotide
-iso/--isoforms: Return all transcript variants (gene IDs only)
Returns: FASTA format sequences
Examples:
# Get nucleotide sequences
gget seq ENSG00000034713 ENSG00000104853
# Get all protein isoforms
gget seq -t -iso ENSG00000034713
# Python
gget.seq(["ENSG00000034713"], translate=True, isoforms=True)
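FASTA-formatted results are often easier to work with as an ID-to-sequence mapping. The sketch below assumes the Python call returns a flat list in which header lines (starting with ">") are followed by their sequence lines; fasta_list_to_dict is a hypothetical helper and the records are invented.

```python
from typing import Dict, List, Optional


def fasta_list_to_dict(records: List[str]) -> Dict[str, str]:
    """Convert a FASTA-style list of headers and sequence lines into {id: sequence}."""
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    for item in records:
        if item.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID
            parts = item[1:].split()
            current_id = parts[0] if parts else ""
            sequences[current_id] = ""
        elif current_id is not None:
            # Sequence lines belonging to the same record are concatenated
            sequences[current_id] += item.strip()
    return sequences


# Invented records, for illustration only
example = [">ID_A some description", "MKWMFK", ">ID_B", "ATCG", "ATCG"]
print(fasta_list_to_dict(example))  # {'ID_A': 'MKWMFK', 'ID_B': 'ATCGATCG'}
```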
BLAST nucleotide or amino acid sequences against standard databases.
Parameters:
sequence: Sequence string or path to FASTA/.txt file
-p/--program: blastn, blastp, blastx, tblastn, tblastx (auto-detected)
-db/--database: Database to search (for example, swissprot)
-l/--limit: Max hits (default: 50)
-e/--expect: E-value cutoff (default: 10.0)
-lcf/--low_comp_filt: Enable low complexity filtering
-mbo/--megablast_off: Disable MegaBLAST (blastn only)
Examples:
# BLAST protein sequence
gget blast MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR
# BLAST from file with specific database
gget blast sequence.fasta -db swissprot -l 10
# Python
gget.blast("MKWMFK...", database="swissprot", limit=10)
Locate genomic positions of sequences using UCSC BLAT.
Parameters:
sequence: Sequence string or path to FASTA/.txt file
-st/--seqtype: 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' (auto-detected)
-a/--assembly: Target assembly (default: 'human'/hg38; options: 'mouse'/mm39, 'zebrafinch'/taeGut2, etc.)
Returns: genome, query size, alignment positions, matches, mismatches, alignment percentage
Examples:
# Find genomic location in human
gget blat ATCGATCGATCGATCG
# Search in different assembly
gget blat -a mm39 ATCGATCGATCGATCG
# Python
gget.blat("ATCGATCGATCGATCG", assembly="mouse")
Align multiple nucleotide or amino acid sequences using Muscle5.
Parameters:
fasta: Sequences or path to FASTA/.txt file
-s5/--super5: Use Super5 algorithm for faster processing (large datasets)
Returns: Aligned sequences in ClustalW format or aligned FASTA (.afa)
Examples:
# Align sequences from file
gget muscle sequences.fasta -o aligned.afa
# Use Super5 for large dataset
gget muscle large_dataset.fasta -s5
# Python
gget.muscle("sequences.fasta", save=True)
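Since gget muscle accepts a FASTA file, sequences held in memory may need to be written out first. The following is a minimal sketch of that step; format_fasta is a hypothetical helper (not part of gget) and the two sequences are made up.

```python
from typing import Mapping


def format_fasta(records: Mapping[str, str], width: int = 60) -> str:
    """Render {id: sequence} as FASTA text, wrapping sequences at `width` characters."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    for seq_id, sequence in records.items():
        lines.append(f">{seq_id}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Made-up sequences, for illustration only
fasta_text = format_fasta({"seq1": "MKWMFKEDHS", "seq2": "MKWMFREDHS"})
print(fasta_text, end="")
```

The resulting text can be written to a file such as sequences.fasta and that path passed to gget muscle.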
Perform fast local protein or translated DNA alignment using DIAMOND.
Parameters:
--reference: Reference sequences (string/list) or FASTA file path (required)
--sensitivity: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive (default), ultra-sensitive
--threads: CPU threads (default: 1)
--diamond_db: Save database for reuse
--translated: Enable nucleotide-to-amino acid alignment
Returns: Identity percentage, sequence lengths, match positions, gap openings, E-values, bit scores
Examples:
# Align against reference
gget diamond GGETISAWESQME -ref reference.fasta --threads 4
# Save database for reuse
gget diamond query.fasta -ref ref.fasta --diamond_db my_db.dmnd
# Python
gget.diamond("GGETISAWESQME", reference="reference.fasta", threads=4)
Query RCSB Protein Data Bank for structure and metadata.
Parameters:
pdb_id: PDB identifier (e.g., '7S7U')
-r/--resource: Data type (pdb, entry, pubmed, assembly, entity types)
-i/--identifier: Assembly, entity, or chain ID
Returns: PDB format (structures) or JSON (metadata)
Examples:
# Download PDB structure
gget pdb 7S7U -o 7S7U.pdb
# Get metadata
gget pdb 7S7U -r entry
# Python
gget.pdb("7S7U", save=True)
Predict 3D protein structures using simplified AlphaFold2.
Setup required:
# Install OpenMM first
uv pip install openmm
# Then setup AlphaFold
gget setup alphafold
Parameters:
sequence: Amino acid sequence (string), multiple sequences (list), or FASTA file. Multiple sequences trigger multimer modeling
-mr/--multimer_recycles: Recycling iterations (default: 3; recommend 20 for accuracy)
-mfm/--multimer_for_monomer: Apply multimer model to single proteins
-r/--relax: AMBER relaxation for top-ranked model
plot: Python-only; generate interactive 3D visualization (default: True)
show_sidechains: Python-only; include side chains (default: True)
Returns: PDB structure file, JSON alignment error data, optional 3D visualization
Examples:
# Predict single protein structure
gget alphafold MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR
# Predict multimer with higher accuracy
gget alphafold sequence1.fasta -mr 20 -r
# Python with visualization
gget.alphafold("MKWMFK...", plot=True, show_sidechains=True)
# Multimer prediction
gget.alphafold(["sequence1", "sequence2"], multimer_recycles=20)
Predict Eukaryotic Linear Motifs in protein sequences.
Setup required:
gget setup elm
Parameters:
sequence: Amino acid sequence or UniProt Acc
-u/--uniprot: Indicates sequence is UniProt Acc
-e/--expand: Include protein names, organisms, references
-s/--sensitivity: DIAMOND alignment sensitivity (default: "very-sensitive")
-t/--threads: Number of threads (default: 1)
Returns: Two outputs, unpacked as ortholog_df and regex_df in the Python example
Examples:
# Predict motifs from sequence
gget elm LIAQSIGQASFV -o results
# Use UniProt accession with expanded info
gget elm --uniprot Q02410 -e
# Python
ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")
Query ARCHS4 database for correlated genes or tissue expression data.
Parameters:
gene: Gene symbol or Ensembl ID (with --ensembl flag)
-w/--which: 'correlation' (default, returns 100 most correlated genes) or 'tissue' (expression atlas)
-s/--species: 'human' (default) or 'mouse' (tissue data only)
-e/--ensembl: Input is Ensembl ID
Examples:
# Get correlated genes
gget archs4 ACE2
# Get tissue expression
gget archs4 -w tissue ACE2
# Python
gget.archs4("ACE2", which="tissue")
Query CZ CELLxGENE Discover Census for single-cell data.
Setup required:
gget setup cellxgene
Parameters:
--gene (-g): Gene names or Ensembl IDs (case-sensitive! 'PAX7' for human, 'Pax7' for mouse)
--tissue: Tissue type(s)
--cell_type: Specific cell type(s)
--species (-s): 'homo_sapiens' (default) or 'mus_musculus'
--census_version (-cv): Version ("stable", "latest", or dated)
--ensembl (-e): Use Ensembl IDs
--meta_only (-mo): Return metadata only
Returns: AnnData object with count matrices and metadata (or metadata-only dataframes)
Examples:
# Get single-cell data for specific genes and cell types
gget cellxgene --gene ACE2 ABCA1 --tissue lung --cell_type "mucus secreting cell" -o lung_data.h5ad
# Metadata only
gget cellxgene --gene PAX7 --tissue muscle --meta_only -o metadata.csv
# Python
adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="mucus secreting cell")
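Because gene names are case-sensitive here, it can help to normalise user input before querying. The sketch below encodes only the convention stated above ('PAX7' for human, 'Pax7' for mouse); format_gene_symbol is a hypothetical helper, and the rule is a heuristic that will not hold for every symbol (some official symbols mix upper and lower case).

```python
def format_gene_symbol(symbol: str, species: str = "homo_sapiens") -> str:
    """Apply the usual casing convention for a gene symbol (heuristic, not exhaustive)."""
    if species == "homo_sapiens":
        return symbol.upper()       # e.g. pax7 -> PAX7
    if species == "mus_musculus":
        return symbol.capitalize()  # e.g. PAX7 -> Pax7
    raise ValueError(f"No casing convention defined for species: {species!r}")


print(format_gene_symbol("pax7"))                  # PAX7
print(format_gene_symbol("PAX7", "mus_musculus"))  # Pax7
```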
Perform ontology enrichment analysis on gene lists using Enrichr.
Parameters:
genes: Gene symbols or Ensembl IDs
-db/--database: Reference database (supports shortcuts: 'pathway', 'transcription', 'ontology', 'diseases_drugs', 'celltypes')
-s/--species: human (default), mouse, fly, yeast, worm, fish
-bkg_l/--background_list: Background genes for comparison
-ko/--kegg_out: Save KEGG pathway images with highlighted genes
plot: Python-only; generate graphical results
Examples:
# Enrichment analysis for ontology
gget enrichr -db ontology ACE2 AGT AGTR1
# Save KEGG pathways
gget enrichr -db pathway ACE2 AGT AGTR1 -ko ./kegg_images/
# Python with plot
gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)
Retrieve orthology and gene expression data from Bgee database.
Parameters:
ens_id: Ensembl gene ID or NCBI gene ID (for non-Ensembl species). Multiple IDs supported when type=expression
-t/--type: 'orthologs' (default) or 'expression'
Examples:
# Get orthologs
gget bgee ENSG00000169194
# Get expression data
gget bgee ENSG00000169194 -t expression
# Multiple genes
gget bgee ENSBTAG00000047356 ENSBTAG00000018317 -t expression
# Python
gget.bgee("ENSG00000169194", type="orthologs")
Retrieve disease and drug associations from OpenTargets.
Parameters:
-r/--resource: diseases (default), drugs, tractability, pharmacogenetics, expression, depmap, interactions
-l/--limit: Cap results count
Filter flags:
--filter_disease
--filter_drug
--filter_tissue, --filter_anat_sys, --filter_organ
--filter_protein_a, --filter_protein_b, --filter_gene_b
Examples:
# Get associated diseases
gget opentargets ENSG00000169194 -r diseases -l 5
# Get associated drugs
gget opentargets ENSG00000169194 -r drugs -l 10
# Get tissue expression
gget opentargets ENSG00000169194 -r expression --filter_tissue brain
# Python
gget.opentargets("ENSG00000169194", resource="diseases", limit=5)
Plot cancer genomics heatmaps using cBioPortal data.
Two subcommands:
search - Find study IDs:
gget cbio search breast lung
plot - Generate heatmaps:
Parameters:
-s/--study_ids: Space-separated cBioPortal study IDs (required)
-g/--genes: Space-separated gene names or Ensembl IDs (required)
-st/--stratification: Column to organize data (tissue, cancer_type, cancer_type_detailed, study_id, sample)
-vt/--variation_type: Data type (mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence)
-f/--filter: Filter by column value (e.g., 'study_id:msk_impact_2017')
-dd/--data_dir: Cache directory (default: ./gget_cbio_cache)
-fd/--figure_dir: Output directory (default: ./gget_cbio_figures)
-dpi: Resolution (default: 100)
-sh/--show: Display plot in window
-nc/--no_confirm: Skip download confirmations
Examples:
# Search for studies
gget cbio search esophag ovary
# Create heatmap
gget cbio plot -s msk_impact_2017 -g AKT1 ALK BRAF -st tissue -vt mutation_occurrences
# Python
gget.cbio_search(["esophag", "ovary"])
gget.cbio_plot(["msk_impact_2017"], ["AKT1", "ALK"], stratification="tissue")
Search COSMIC (Catalogue Of Somatic Mutations In Cancer) database.
Important: License fees apply for commercial use. Requires COSMIC account credentials.
Parameters:
searchterm: Gene name, Ensembl ID, mutation notation, or sample ID
-ctp/--cosmic_tsv_path: Path to downloaded COSMIC TSV file (required for querying)
-l/--limit: Maximum results (default: 100)
Database download flags:
-d/--download_cosmic: Activate download mode
-gm/--gget_mutate: Create version for gget mutate
-cp/--cosmic_project: Database type (cancer, census, cell_line, resistance, genome_screen, targeted_screen)
-cv/--cosmic_version: COSMIC version
-gv/--grch_version: Human reference genome (37 or 38)
--email, --password: COSMIC credentials
Examples:
# First download database
gget cosmic -d --email user@example.com --password xxx -cp cancer
# Then query
gget cosmic EGFR -ctp cosmic_data.tsv -l 10
# Python
gget.cosmic("EGFR", cosmic_tsv_path="cosmic_data.tsv", limit=10)
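Typing COSMIC credentials directly into a command leaves them in the shell history. One possible alternative, sketched below, is to read them from environment variables when assembling the download command. The variable names COSMIC_EMAIL and COSMIC_PASSWORD and the helper build_cosmic_download_command are assumptions made for this example; only the flags themselves come from the list above.

```python
import os
from typing import List, Mapping


def build_cosmic_download_command(
    project: str = "cancer",
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Assemble a `gget cosmic` download command from environment variables."""
    try:
        # Hypothetical variable names chosen for this sketch
        email = env["COSMIC_EMAIL"]
        password = env["COSMIC_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Set {missing.args[0]} before downloading COSMIC data") from None
    return ["gget", "cosmic", "-d", "--email", email, "--password", password, "-cp", project]


# Demo mapping with the placeholder credentials used in the example above
demo_env = {"COSMIC_EMAIL": "user@example.com", "COSMIC_PASSWORD": "xxx"}
print(build_cosmic_download_command(env=demo_env))
```

Note that the password is still passed as a process argument when the command runs, so this only keeps it out of scripts and shell history.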
Generate mutated nucleotide sequences from mutation annotations.
Parameters:
sequences: FASTA file path or direct sequence input (string/list)
-m/--mutations: CSV/TSV file or DataFrame with mutation data (required)
-mc/--mut_column: Mutation column name (default: 'mutation')
-sic/--seq_id_column: Sequence ID column (default: 'seq_ID')
-mic/--mut_id_column: Mutation ID column
-k/--k: Length of flanking sequences (default: 30 nucleotides)
Returns: Mutated sequences in FASTA format
Examples:
# Single mutation
gget mutate ATCGCTAAGCT -m "c.4G>T"
# Multiple sequences with mutations from file
gget mutate sequences.fasta -m mutations.csv -o mutated.fasta
# Python
import pandas as pd
mutations_df = pd.DataFrame({"seq_ID": ["seq1"], "mutation": ["c.4G>T"]})
gget.mutate(["ATCGCTAAGCT"], mutations=mutations_df)
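To make the mutation notation concrete, the sketch below applies a single substitution such as c.4G>T to a sequence by hand. It is not how gget mutate is implemented: apply_substitution is a hypothetical helper that handles only simple substitutions with 1-based positions, and it returns the whole mutated sequence rather than the mutation with its flanking k nucleotides.

```python
import re

# Matches simple substitutions such as "c.4G>T"
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply one substitution in c.<pos><ref>><alt> notation (1-based position)."""
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"Unsupported mutation notation: {mutation!r}")
    position = int(match.group(1))
    ref, alt = match.group(2), match.group(3)
    if position < 1 or position > len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    index = position - 1  # convert to 0-based indexing
    if sequence[index] != ref:
        raise ValueError(f"Expected {ref} at position {position}, found {sequence[index]}")
    return sequence[:index] + alt + sequence[index + 1:]


print(apply_substitution("ATCGCTAAGCT", "c.4G>T"))  # ATCTCTAAGCT
```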
Generate natural language text using OpenAI's API.
Setup required:
gget setup gpt
Important: Free tier limited to 3 months after account creation. Set monthly billing limits.
Parameters:
prompt: Text input for generation (required)
api_key: OpenAI authentication (required)
Examples:
gget gpt "Explain CRISPR" --api_key your_key_here
# Python
gget.gpt("Explain CRISPR", api_key="your_key_here")
Install/download third-party dependencies for specific modules.
Parameters:
module: Module name requiring dependency installation
-o/--out: Output folder path (elm module only)
Modules requiring setup:
alphafold - Downloads ~4GB of model parameters
cellxgene - Installs cellxgene-census (may not support latest Python)
elm - Downloads local ELM database
gpt - Configures OpenAI integration
Examples:
# Setup AlphaFold
gget setup alphafold
# Setup ELM with custom directory
gget setup elm -o /path/to/elm_data
# Python
gget.setup("alphafold")
Find and analyze genes of interest:
# 1. Search for genes
results = gget.search(["GABA", "receptor"], species="homo_sapiens")
# 2. Get detailed information
gene_ids = results["ensembl_id"].tolist()
info = gget.info(gene_ids[:5])
# 3. Retrieve sequences
sequences = gget.seq(gene_ids[:5], translate=True)
Align sequences and predict structures:
# 1. Align multiple sequences
alignment = gget.muscle("sequences.fasta")
# 2. Find similar sequences
blast_results = gget.blast(my_sequence, database="swissprot", limit=10)
# 3. Predict structure
structure = gget.alphafold(my_sequence, plot=True)
# 4. Find linear motifs
ortholog_df, regex_df = gget.elm(my_sequence)
Analyze expression patterns and functional enrichment:
# 1. Get tissue expression
tissue_expr = gget.archs4("ACE2", which="tissue")
# 2. Find correlated genes
correlated = gget.archs4("ACE2", which="correlation")
# 3. Get single-cell data
adata = gget.cellxgene(gene=["ACE2"], tissue="lung", cell_type="epithelial cell")
# 4. Perform enrichment analysis
gene_list = correlated["gene_symbol"].tolist()[:50]
enrichment = gget.enrichr(gene_list, database="ontology", plot=True)
Investigate disease associations and therapeutic targets:
# 1. Search for genes
genes = gget.search(["breast cancer"], species="homo_sapiens")
# 2. Get disease associations
diseases = gget.opentargets("ENSG00000169194", resource="diseases")
# 3. Get drug associations
drugs = gget.opentargets("ENSG00000169194", resource="drugs")
# 4. Query cancer genomics data
study_ids = gget.cbio_search(["breast"])
gget.cbio_plot(study_ids[:2], ["BRCA1", "BRCA2"], stratification="cancer_type")
# 5. Search COSMIC for mutations
cosmic_results = gget.cosmic("BRCA1", cosmic_tsv_path="cosmic.tsv")
Compare proteins across species:
# 1. Get orthologs
orthologs = gget.bgee("ENSG00000169194", type="orthologs")
# 2. Get sequences for comparison
# Assumption: gget.seq returns a FASTA-style list (header, then sequence), so index 1 is the sequence
human_seq = gget.seq("ENSG00000169194", translate=True)[1]
mouse_seq = gget.seq("ENSMUSG00000026091", translate=True)[1]
# 3. Align sequences
alignment = gget.muscle([human_seq, mouse_seq])
# 4. Compare structures
human_structure = gget.pdb("7S7U")
mouse_structure = gget.alphafold(mouse_seq)
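Once two sequences have been aligned, a simple summary of how similar they are is the percent identity over aligned columns. The sketch below uses one common definition (columns with a gap in either sequence are excluded from the count); percent_identity is a hypothetical helper and the aligned strings are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Invented aligned sequences: 7 gap-free columns, 6 of them identical
print(round(percent_identity("MKW-FKED", "MKWMFRED"), 1))  # 85.7
```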
Prepare reference data for downstream analysis (e.g., kallisto|bustools):
# 1. List available species
gget ref --list_species
# 2. Download reference files
gget ref -w gtf,cdna -d homo_sapiens
# 3. Build kallisto index
kallisto index -i transcriptome.idx transcriptome.fasta
# 4. Download genome for alignment
gget ref -w dna -d homo_sapiens
Use --limit to control result sizes for large queries
Save results with -o/--out for reproducibility
Use --quiet in production scripts to reduce output
Use gget diamond with --threads for faster local alignment
Save the DIAMOND database with --diamond_db for repeated queries
Use -s5/--super5 for large datasets
Run gget setup before first use of alphafold, cellxgene, elm, gpt
Cache cBioPortal data with -dd to avoid repeated downloads
Use -mr 20 for higher accuracy
Use the -r flag for AMBER relaxation of final structures
Visualize results with plot=True
Upgrade gget with uv pip install --upgrade gget
Command-line: CSV output with the -csv flag
Python: JSON output with the json=True parameter
Save results with save=True or specify out="filename"
This skill includes reference documentation for detailed module information:
module_reference.md - Comprehensive parameter reference for all modules
database_info.md - Information about queried databases and their update frequencies
workflows.md - Extended workflow examples and use cases
Weekly installs: 122
GitHub stars: 22.6K
First seen: Jan 21, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Fail)
Installed on: claude-code (104), opencode (98), gemini-cli (92), cursor (92), antigravity (84), codex (81)