nf-core Pipeline Deployment Guide: Bioinformatics Analysis Automation and Nextflow Development

nextflow-development by anthropics/knowledge-work-plugins

Install command

npx skills add https://github.com/anthropics/knowledge-work-plugins --skill nextflow-development

nf-core Pipeline Deployment

Run nf-core bioinformatics pipelines on local or public sequencing data.

Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.

Workflow Checklist

- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (MUST pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (MUST pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure & run (confirm genome with user)
- [ ] Step 6: Verify outputs

Step 0: Acquire Data (GEO/SRA Only)

Skip this step if the user already has local FASTQ files.

For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.

Quick start:

# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004

# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

DECISION POINT: After fetching study info, confirm with the user:

- Which sample subset to download (if the study contains multiple data types)
- The suggested genome and pipeline

Then continue to Step 1.

Step 1: Environment Check

Run this check first. Pipelines will fail if the environment does not pass.

python scripts/check_environment.py

All critical checks must pass. If any fail, provide fix instructions:

Docker issues

Problem	Fix
Not installed	Install from https://docs.docker.com/get-docker/
Permission denied	`sudo usermod -aG docker $USER` then re-login
Daemon not running	`sudo systemctl start docker`

Nextflow issues

Problem	Fix
Not installed	`curl -s https://get.nextflow.io
Version < 23.04	`nextflow self-update`

Java issues

Problem	Fix
Not installed / < 11	`sudo apt install openjdk-11-jdk`

Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
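The thresholds in the tables above can be expressed as a small version comparison. The following is a minimal sketch only, not the contents of scripts/check_environment.py; the helper names (`parse_version`, `meets_minimum`, `missing_tools`) are hypothetical, and the parsing assumes the version appears as the first dotted number in the tool's output.

```python
import re
import shutil

# Minimums taken from the tables above.
MIN_NEXTFLOW = (23, 4)  # "Version < 23.04"
MIN_JAVA = (11,)        # "Not installed / < 11"


def parse_version(text):
    """Extract the first dotted numeric version from a tool's version output."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Compare version tuples; an unparseable (None) version counts as a failure."""
    return version is not None and version[:len(minimum)] >= minimum


def missing_tools(tools=("docker", "nextflow", "java")):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("missing tools:", missing_tools())
    print(meets_minimum(parse_version("nextflow version 23.10.1.5891"), MIN_NEXTFLOW))
```

Comparing integer tuples rather than strings avoids the common mistake of treating "23.10" as older than "23.4".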

Step 2: Select Pipeline

DECISION POINT: Confirm with user before proceeding.

Data Type	Pipeline	Version	Goal
RNA-seq	`rnaseq`	3.22.2	Gene expression
WGS/WES	`sarek`	3.7.1	Variant calling
ATAC-seq	`atacseq`	2.1.2	Chromatin accessibility

Auto-detect from data:

python scripts/detect_data_type.py /path/to/data
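For illustration, the selection table above can be held as a lookup so that the pipeline name and pinned version always travel together. This is a sketch under the assumption that the data type is already known; `PIPELINES` and `select_pipeline` are hypothetical names and are not part of scripts/detect_data_type.py.

```python
# Table from Step 2, keyed by data type: (pipeline, pinned version).
PIPELINES = {
    "RNA-seq": ("rnaseq", "3.22.2"),
    "WGS/WES": ("sarek", "3.7.1"),
    "ATAC-seq": ("atacseq", "2.1.2"),
}


def select_pipeline(data_type):
    """Return (pipeline, version) for a data type listed in the table."""
    if data_type not in PIPELINES:
        known = ", ".join(PIPELINES)
        raise ValueError(
            f"unsupported data type {data_type!r}; expected one of: {known}"
        )
    return PIPELINES[data_type]


if __name__ == "__main__":
    pipeline, version = select_pipeline("RNA-seq")
    print(f"nf-core/{pipeline} -r {version}")
```

Raising on an unknown data type, rather than guessing, keeps the decision with the user as the DECISION POINT requires.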

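For pipeline-specific details, see references/pipelines/rnaseq.md, references/pipelines/sarek.md, and references/pipelines/atacseq.md.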

Step 3: Run Test Profile

Validates the environment with a small test dataset. This MUST pass before running real data.

nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output

Pipeline	Command
rnaseq	`nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq`
sarek	`nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek`
atacseq	`nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq`

Verify:

ls test_output/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log

If the test fails, see references/troubleshooting.md.
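The two verification commands can also be combined into one programmatic check. The sketch below assumes the log is written to `.nextflow.log` in the launch directory and that the MultiQC report sits at `multiqc/multiqc_report.html` under the output directory, as in the commands above; `run_succeeded` is a hypothetical helper name.

```python
from pathlib import Path

# Marker string taken from the grep command above.
SUCCESS_MARKER = "Pipeline completed successfully"


def run_succeeded(outdir, log_path=".nextflow.log"):
    """Return True only if the MultiQC report exists and the log reports success."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_path)
    if not report.is_file() or not log.is_file():
        return False
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    return SUCCESS_MARKER in log.read_text(errors="replace")


if __name__ == "__main__":
    print(run_succeeded("test_output"))
```

Requiring both conditions guards against a stale report left over from an earlier run whose latest log shows a failure.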

Step 4: Create Samplesheet

Generate automatically

python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv

The script:

- Discovers FASTQ/BAM/CRAM files
- Pairs R1/R2 reads
- Infers sample metadata
- Validates before writing

For sarek: The script prompts for tumor/normal status if it is not auto-detected.
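To make the R1/R2 pairing step concrete, here is a minimal sketch of one way to group FASTQ files by sample. It is not the logic of scripts/generate_samplesheet.py; the file-naming convention encoded in `READ_PATTERN` (a `_R1`/`_R2` or `_1`/`_2` suffix, optionally followed by a numeric chunk such as `_001`) is an assumption, and `pair_fastqs` is a hypothetical name.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming: <sample>_R1.fq.gz, <sample>_2.fastq, <sample>_R1_001.fastq.gz, ...
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastqs(paths):
    """Group FASTQ paths by sample and return {sample: (read1, read2)}.

    A missing mate is reported as None; files that do not match the
    assumed naming convention are ignored.
    """
    reads_by_sample = defaultdict(dict)
    for path in paths:
        match = READ_PATTERN.match(Path(path).name)
        if match:
            reads_by_sample[match["sample"]][match["read"]] = str(path)
    return {
        sample: (reads.get("1"), reads.get("2"))
        for sample, reads in sorted(reads_by_sample.items())
    }


if __name__ == "__main__":
    files = ["/data/ctrl_R1.fq.gz", "/data/ctrl_R2.fq.gz", "/data/treat_1.fastq.gz"]
    for sample, (r1, r2) in pair_fastqs(files).items():
        print(sample, r1, r2)
```

Returning None for a missing mate lets the caller distinguish single-end data from a paired-end sample with a lost file.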

Validate existing samplesheet

python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>

Samplesheet formats

rnaseq:

sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto

sarek:

patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0

atacseq:

sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
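A header check against the three formats above might look like the following sketch. The column lists are copied from the examples; treating them as required, and requiring `fastq_1` to be an absolute path, are assumptions for illustration and may not match what the `--validate` mode actually enforces. `samplesheet_problems` is a hypothetical name.

```python
import csv
import io

# Column sets copied from the samplesheet examples above.
EXPECTED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def samplesheet_problems(csv_text, pipeline):
    """Return a list of human-readable problems; an empty list means none found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [
        f"missing column: {column}"
        for column in EXPECTED_COLUMNS[pipeline]
        if column not in header
    ]
    if problems:
        return problems
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if not (row.get("fastq_1") or "").startswith("/"):
            problems.append(f"line {line_no}: fastq_1 is not an absolute path")
    return problems


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2,strandedness\n"
        "SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto\n"
    )
    print(samplesheet_problems(sheet, "rnaseq"))
```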

Step 5: Configure & Run

5a. Check genome availability

python scripts/manage_genomes.py check <genome>
# If not installed:
python scripts/manage_genomes.py download <genome>

Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)

5b. Decision points

DECISION POINT: Confirm with the user:

- Genome: which reference to use
- Pipeline-specific options:
  - rnaseq: aligner (star_salmon recommended, hisat2 for low memory)
  - sarek: tools (haplotypecaller for germline, mutect2 for somatic)
  - atacseq: read_length (50, 75, 100, or 150)

5c. Run pipeline

nextflow run nf-core/<pipeline> \
    -r <version> \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome <genome> \
    -resume

Key flags:

- -r: Pin the pipeline version
- -profile docker: Use Docker (or singularity for HPC)
- --genome: iGenomes key
- -resume: Continue from the last checkpoint
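When the run is launched from a script, assembling the command as an argument list avoids shell-quoting mistakes. The sketch below mirrors the flags shown in 5c and nothing more; `build_run_command` and its defaults are hypothetical and are not part of the skill's scripts.

```python
import shlex


def build_run_command(pipeline, version, genome,
                      samplesheet="samplesheet.csv", outdir="results",
                      profile="docker", resume=True):
    """Return the nextflow invocation from step 5c as an argv list."""
    argv = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,            # pin the pipeline version
        "-profile", profile,      # docker, or singularity on HPC
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,       # iGenomes key
    ]
    if resume:
        argv.append("-resume")    # continue from cached results if present
    return argv


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(build_run_command("rnaseq", "3.22.2", "GRCh38")))
```

Note that Nextflow options take a single dash (`-r`, `-profile`, `-resume`) while pipeline parameters take two (`--input`, `--outdir`, `--genome`).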

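Resource limits (if needed). The `--max_*` parameters below exist only in pipelines built on older nf-core templates; releases based on template 3.0 or later may ignore them and expect `process.resourceLimits` in a custom config passed with `-c` instead, so check the parameter documentation for the pinned version: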

--max_cpus 8 --max_memory '32.GB' --max_time '24.h'

Step 6: Verify Outputs

Check completion

ls results/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log

Key outputs by pipeline

rnaseq:

results/star_salmon/salmon.merged.gene_counts.tsv - Gene counts
results/star_salmon/salmon.merged.gene_tpm.tsv - TPM values

sarek:

results/variant_calling/*/ - VCF files
results/preprocessing/recalibrated/ - BAM files

atacseq:

results/macs2/narrowPeak/ - Peak calls
results/bwa/mergedLibrary/bigwig/ - Coverage tracks
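The output locations listed above can be checked in one pass. This is a sketch only: the relative paths are copied from the lists above (the rnaseq entries assume the star_salmon aligner was used), and `KEY_OUTPUTS` and `missing_outputs` are hypothetical names.

```python
from pathlib import Path

# Relative paths copied from the per-pipeline lists above.
KEY_OUTPUTS = {
    "rnaseq": [
        "star_salmon/salmon.merged.gene_counts.tsv",
        "star_salmon/salmon.merged.gene_tpm.tsv",
    ],
    "sarek": ["variant_calling", "preprocessing/recalibrated"],
    "atacseq": ["macs2/narrowPeak", "bwa/mergedLibrary/bigwig"],
}


def missing_outputs(results_dir, pipeline):
    """Return the expected outputs that are absent under results_dir."""
    root = Path(results_dir)
    return [rel for rel in KEY_OUTPUTS[pipeline] if not (root / rel).exists()]


if __name__ == "__main__":
    absent = missing_outputs("results", "rnaseq")
    print("all key outputs present" if not absent else f"missing: {absent}")
```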

Quick Reference

For common exit codes and fixes, see references/troubleshooting.md.

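Resume failed run

Repeat the original command with the same parameters and add -resume:

nextflow run nf-core/<pipeline> \
    -r <version> \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome <genome> \
    -resume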

References

references/geo-sra-acquisition.md - Downloading public GEO/SRA data
references/troubleshooting.md - Common issues and fixes
references/installation.md - Environment setup
references/pipelines/rnaseq.md - RNA-seq pipeline details
references/pipelines/sarek.md - Variant calling details
references/pipelines/atacseq.md - ATAC-seq details

Disclaimer

This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.

It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.

Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.

Attribution

When publishing results, cite the appropriate pipeline. Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).

Licenses

nf-core pipelines: MIT License (https://nf-co.re/about)
Nextflow: Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)
NCBI SRA Toolkit: Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)

First Seen: Jan 31, 2026

Security Audits

Gen Agent Trust HubFail SocketPass SnykWarn

Installed on

opencode115

codex115

gemini-cli109

github-copilot106

claude-code103

cursor100

通过 LiteLLM 代理让 Claude Code 对接 GitHub Copilot 运行 | 高级变通方案指南

33,600 周安装