single-cell-rna-qc by anthropics/knowledge-work-plugins
npx skills add https://github.com/anthropics/knowledge-work-plugins --skill single-cell-rna-qc
Automated QC workflow for single-cell RNA-seq data following scverse best practices.
Use when users:
Supported input formats:
.h5ad files (AnnData format from scanpy/Python workflows)
.h5 files (10X Genomics Cell Ranger output)
Default recommendation: Use Approach 1 (complete pipeline) unless the user has specific custom requirements or explicitly requests non-standard filtering logic.
For standard QC following scverse best practices, use the convenience script scripts/qc_analysis.py:
python3 scripts/qc_analysis.py input.h5ad
# or for 10X Genomics .h5 files:
python3 scripts/qc_analysis.py raw_feature_bc_matrix.h5
The script automatically detects the file format and loads it appropriately.
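The exact detection logic lives in scripts/qc_analysis.py and is not reproduced here. The following is a minimal sketch, assuming the format is inferred from the file extension. `detect_input_format` and `load_counts` are hypothetical helpers; `anndata.read_h5ad` and `scanpy.read_10x_h5` are the standard readers for the two formats.

```python
from pathlib import Path


def detect_input_format(path):
    """Classify an input file by extension (assumed behaviour, not the script's code)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".h5ad":
        return "h5ad"  # AnnData file written by scanpy/anndata
    if suffix == ".h5":
        return "10x_h5"  # Cell Ranger HDF5 feature-barcode matrix
    raise ValueError(f"Unsupported input format: {suffix or '(no extension)'}")


def load_counts(path):
    """Load the file with the matching reader; heavy imports are deferred until needed."""
    kind = detect_input_format(path)
    if kind == "h5ad":
        import anndata as ad
        return ad.read_h5ad(path)
    import scanpy as sc
    adata = sc.read_10x_h5(path)
    adata.var_names_make_unique()  # 10X gene symbols can contain duplicates
    return adata
```

Because the dispatch is kept separate from loading, the format check can be exercised without scanpy installed.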
When to use this approach:
Requirements: anndata, scanpy, scipy, matplotlib, seaborn, numpy
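Before running the script, you may want to confirm that these packages are importable. `missing_packages` below is a hypothetical helper and is not part of scripts/. It assumes each package's import name matches the listed name, which is true for these six.

```python
import importlib.util

# Packages listed under "Requirements"; import names match package names here
REQUIRED_PACKAGES = ("anndata", "scanpy", "scipy", "matplotlib", "seaborn", "numpy")


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that cannot be found in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All QC dependencies are installed.")
```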
Parameters:
Customize filtering thresholds and gene patterns using command-line parameters:
--output-dir - Output directory
--mad-counts, --mad-genes, --mad-mt - MAD thresholds for counts, genes, and MT%
--mt-threshold - Hard mitochondrial % cutoff
--min-cells - Gene filtering threshold (minimum number of cells in which a gene must be detected)
--mt-pattern, --ribo-pattern, --hb-pattern - Gene name patterns for different species
Use --help to see current default values.
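The --mad-* flags control how many median absolute deviations (MADs) a cell's metric may differ from the median before the cell is flagged. Here is a minimal numpy sketch of that rule. `mad_outliers` is a hypothetical stand-in, not the script's `detect_outliers_mad`. It uses the unscaled MAD, which is also the default of `scipy.stats.median_abs_deviation`.

```python
import numpy as np


def mad_outliers(values, n_mads):
    """Flag values more than n_mads median absolute deviations from the median."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # unscaled MAD
    # Two-sided test: too low or too high relative to the median
    return np.abs(x - median) > n_mads * mad
```

In the scverse best-practices tutorial, this rule is applied to log1p-transformed total counts and gene counts. A call there would look like `mad_outliers(np.log1p(total_counts), 5)`. Note that if more than half the values are identical, the MAD is 0, and any value that differs from the median gets flagged.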
Outputs:
All files are saved to <input_basename>_qc_results/ directory by default (or to the directory specified by --output-dir):
qc_metrics_before_filtering.png - Pre-filtering visualizations
qc_filtering_thresholds.png - MAD-based threshold overlays
qc_metrics_after_filtering.png - Post-filtering quality metrics
<input_basename>_filtered.h5ad - Clean, filtered dataset ready for downstream analysis
<input_basename>_with_qc.h5ad - Original data with QC annotations preserved
If copying outputs for user access, copy individual files (not the entire directory) so users can preview them directly.
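The naming scheme above can be expressed as a small path helper, together with a copy routine that copies files one at a time rather than the whole directory. Both functions are hypothetical. The sketch assumes `<input_basename>` is the input file name without its extension, and that the default results directory is created relative to the working directory.

```python
import shutil
from pathlib import Path


def qc_output_paths(input_path, output_dir=None):
    """Map each documented output to its path (sketch of the naming scheme)."""
    basename = Path(input_path).stem  # assumed: file name without its extension
    # Assumed default: <input_basename>_qc_results/ relative to the working directory
    out = Path(output_dir) if output_dir else Path(f"{basename}_qc_results")
    return {
        "before_plot": out / "qc_metrics_before_filtering.png",
        "thresholds_plot": out / "qc_filtering_thresholds.png",
        "after_plot": out / "qc_metrics_after_filtering.png",
        "filtered": out / f"{basename}_filtered.h5ad",
        "with_qc": out / f"{basename}_with_qc.h5ad",
    }


def copy_outputs(paths, dest_dir):
    """Copy each existing output file individually rather than the whole directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(p, dest)) for p in paths.values() if p.exists()]
```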
The script performs the following steps:
For custom analysis workflows or non-standard requirements, use the modular utility functions from scripts/qc_core.py and scripts/qc_plotting.py:
# Run from scripts/ directory, or add scripts/ to sys.path if needed
import anndata as ad
from qc_core import calculate_qc_metrics, detect_outliers_mad, apply_hard_threshold, filter_cells, print_qc_summary
from qc_plotting import plot_qc_distributions # Only if visualization needed
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# ... custom analysis logic here
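To make the per-cell metrics concrete, here is a sketch of what they measure. It assumes a dense cells x genes count matrix and a regex for mitochondrial gene names, for example `^MT-` for human and `^mt-` for mouse. The column names follow scanpy's `sc.pp.calculate_qc_metrics`, which the repo's `calculate_qc_metrics` may wrap, although this has not been verified. `basic_qc_metrics` itself is hypothetical.

```python
import re

import numpy as np


def basic_qc_metrics(counts, gene_names, mt_pattern=r"^MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    regex = re.compile(mt_pattern)
    # Boolean mask over genes whose names match the mitochondrial pattern
    is_mt = np.array([bool(regex.search(name)) for name in gene_names], dtype=bool)
    total_counts = counts.sum(axis=1)
    n_genes_by_counts = (counts > 0).sum(axis=1)
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Avoid division by zero for barcodes with no counts
    pct_counts_mt = np.divide(100.0 * mt_counts, total_counts,
                              out=np.zeros_like(total_counts), where=total_counts > 0)
    return {"total_counts": total_counts,
            "n_genes_by_counts": n_genes_by_counts,
            "pct_counts_mt": pct_counts_mt}
```

Using the wrong species pattern silently reports 0% mitochondrial reads. This is why the pipeline exposes --mt-pattern.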
When to use this approach:
Available utility functions:
From qc_core.py (core QC operations):
calculate_qc_metrics(adata, mt_pattern, ribo_pattern, hb_pattern, inplace=True) - Calculate QC metrics and annotate adata
detect_outliers_mad(adata, metric, n_mads, verbose=True) - MAD-based outlier detection, returns boolean mask
apply_hard_threshold(adata, metric, threshold, operator='>', verbose=True) - Apply hard cutoffs, returns boolean mask
filter_cells(adata, mask, inplace=False) - Apply boolean mask to filter cells
filter_genes(adata, min_cells=20, min_counts=None, inplace=True) - Filter genes by detection
print_qc_summary(adata, label='') - Print summary statistics
From qc_plotting.py (visualization):
plot_qc_distributions(adata, output_path, title) - Generate comprehensive QC plots
plot_filtering_thresholds(adata, outlier_masks, thresholds, output_path) - Visualize filtering thresholds
plot_qc_after_filtering(adata, output_path) - Generate post-filtering plots
Example custom workflows:
Example 1: Only calculate metrics and visualize, don't filter yet
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
plot_qc_distributions(adata, 'qc_before.png', title='Initial QC')
print_qc_summary(adata, label='Before filtering')
Example 2: Apply only MT% filtering, keep other metrics permissive
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# Only filter high MT% cells
high_mt = apply_hard_threshold(adata, 'pct_counts_mt', 10, operator='>')
adata_filtered = filter_cells(adata, ~high_mt)
adata_filtered.write('filtered.h5ad')
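In Example 2, the mask marks cells to remove, which is why it is inverted with `~high_mt` before filtering. The following sketch shows one plausible way an `operator` string could map to a comparison. `hard_threshold_mask` is hypothetical, and the operators accepted by the real `apply_hard_threshold` are not documented here.

```python
import operator

import numpy as np

# Hypothetical mapping from operator strings to comparison functions
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def hard_threshold_mask(values, threshold, op=">"):
    """Return True where `value <op> threshold` holds, i.e. cells that fail the cutoff."""
    if op not in _OPS:
        raise ValueError(f"Unsupported operator: {op!r}")
    return _OPS[op](np.asarray(values, dtype=float), threshold)
```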
Example 3: Different thresholds for different subsets
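import numpy as np
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# Apply type-specific QC (assumes cell_type metadata exists)
neurons = (adata.obs['cell_type'] == 'neuron').to_numpy()
other_cells = ~neurons
# Neurons tolerate higher MT%, other cells use stricter threshold
neuron_qc = apply_hard_threshold(adata[neurons], 'pct_counts_mt', 15, operator='>')
other_qc = apply_hard_threshold(adata[other_cells], 'pct_counts_mt', 8, operator='>')
# Merge the per-subset masks into one mask aligned with adata.obs, then filter
high_mt = np.zeros(adata.n_obs, dtype=bool)
high_mt[neurons] = np.asarray(neuron_qc)
high_mt[other_cells] = np.asarray(other_qc)
adata_filtered = filter_cells(adata, ~high_mt)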
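The examples above only filter cells. Gene filtering is handled by `filter_genes(adata, min_cells=20, ...)`, which the function list describes as filtering by detection. Below is a numpy sketch of a detection rule of that kind, assuming "detected" means a count above zero. `genes_to_keep` is hypothetical; scanpy's own `sc.pp.filter_genes(adata, min_cells=...)` applies a comparable cutoff.

```python
import numpy as np


def genes_to_keep(counts, min_cells=20):
    """Boolean mask over genes detected (count > 0) in at least min_cells cells."""
    counts = np.asarray(counts)
    cells_per_gene = (counts > 0).sum(axis=0)  # number of cells expressing each gene
    return cells_per_gene >= min_cells
```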
For detailed QC methodology, parameter rationale, and troubleshooting guidance, see references/scverse_qc_guidelines.md. This reference provides:
Load this reference when users need deeper understanding of the methodology or when troubleshooting QC issues.
Typical downstream analysis steps:
Weekly Installs: 149
Repository: https://github.com/anthropics/knowledge-work-plugins
GitHub Stars: 8.9K
First Seen: Jan 31, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
Installed on: opencode (133), codex (130), gemini-cli (125), github-copilot (121), claude-code (117), cursor (115)