scanpy by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill scanpy
Scanpy is a scalable Python toolkit for analyzing single-cell RNA-seq data, built on AnnData. Apply this skill for complete single-cell workflows including quality control, normalization, dimensionality reduction, clustering, marker gene identification, visualization, and trajectory analysis.
This skill should be used when:
import scanpy as sc
import pandas as pd
import numpy as np
# Configure settings
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=80, facecolor='white')
sc.settings.figdir = './figures/'
# From 10X Genomics
adata = sc.read_10x_mtx('path/to/data/')
adata = sc.read_10x_h5('path/to/data.h5')
# From h5ad (AnnData format)
adata = sc.read_h5ad('path/to/data.h5ad')
# From CSV
adata = sc.read_csv('path/to/data.csv')
The AnnData object is the core data structure in scanpy:
adata.X # Expression matrix (cells × genes)
adata.obs # Cell metadata (DataFrame)
adata.var # Gene metadata (DataFrame)
adata.uns # Unstructured annotations (dict)
adata.obsm # Multi-dimensional cell data (PCA, UMAP)
adata.raw # Raw data backup
# Access cell and gene names
adata.obs_names # Cell barcodes
adata.var_names # Gene names
Identify and filter low-quality cells and genes:
# Identify mitochondrial genes (human gene symbols use the 'MT-' prefix; mouse symbols use 'mt-')
adata.var['mt'] = adata.var_names.str.startswith('MT-')
# Calculate QC metrics
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)
# Visualize QC metrics
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'],
             jitter=0.4, multi_panel=True)
# Filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata = adata[adata.obs.pct_counts_mt < 5, :].copy()  # Remove high MT% cells; copy so later steps do not operate on a view
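To make the QC columns used above less opaque, here is a minimal NumPy sketch of what `total_counts`, `n_genes_by_counts`, and `pct_counts_mt` represent and how the 5% mitochondrial filter acts on them. The count matrix and gene list are made-up toy values, and this is an illustration of the arithmetic rather than scanpy's actual implementation.

```python
import numpy as np

# Toy count matrix: 4 cells x 5 genes (values are invented for illustration).
counts = np.array([
    [10, 0, 5, 1, 0],
    [0, 0, 0, 9, 1],
    [3, 2, 1, 0, 0],
    [50, 20, 30, 2, 1],
])
gene_names = np.array(["CD3D", "MS4A1", "NKG7", "MT-CO1", "MT-ND1"])

# Flag mitochondrial genes by prefix, mirroring adata.var['mt'].
is_mt = np.char.startswith(gene_names, "MT-")

# Per-cell metrics analogous to the columns added by calculate_qc_metrics.
total_counts = counts.sum(axis=1)
n_genes_by_counts = (counts > 0).sum(axis=1)
pct_counts_mt = 100.0 * counts[:, is_mt].sum(axis=1) / total_counts

# Keep cells below the mitochondrial threshold (5%, as in the snippet above).
keep = pct_counts_mt < 5
filtered_counts = counts[keep]
print(pct_counts_mt.round(2), keep)
```

The 5% cutoff is only a starting point; inspecting the violin plots first usually gives a better threshold for a given tissue.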
Use the QC script for automated analysis:
python scripts/qc_analysis.py input_file.h5ad --output filtered.h5ad
# Normalize to 10,000 counts per cell
sc.pp.normalize_total(adata, target_sum=1e4)
# Log-transform
sc.pp.log1p(adata)
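# Freeze the normalized, log-transformed data in .raw for later plotting and marker tests
# (at this point .raw no longer holds the original integer counts)
adata.raw = adata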
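The normalization step above can be sketched in plain NumPy as follows. This is a simplified illustration of the arithmetic behind `normalize_total(target_sum=1e4)` followed by `log1p`, using an invented 2 x 3 count matrix; it ignores scanpy options such as excluding highly expressed genes.

```python
import numpy as np

# Toy counts: 2 cells x 3 genes (invented values).
counts = np.array([
    [10.0, 0.0, 30.0],
    [1.0, 1.0, 2.0],
])
target_sum = 1e4

# Scale each cell (row) so that its counts sum to target_sum.
size_factors = counts.sum(axis=1, keepdims=True) / target_sum
normalized = counts / size_factors

# Natural log of (1 + x); zeros stay at zero.
log_normalized = np.log1p(normalized)
print(normalized.sum(axis=1), log_normalized.round(3))
```

After this step both cells have the same total, so differences in sequencing depth no longer dominate comparisons between them.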
# Identify highly variable genes
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pl.highly_variable_genes(adata)
# Subset to highly variable genes
adata = adata[:, adata.var.highly_variable].copy()  # copy so regress_out and scale modify real data, not a view
# Regress out unwanted variation
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
# Scale data
sc.pp.scale(adata, max_value=10)
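As a rough picture of what scaling with a `max_value` cap does, the sketch below z-scores each gene and clips large values. The helper `scale_genes` is hypothetical, the choice of `ddof=1` is an assumption, and only the upper side is clipped here, which may differ from the behaviour of your scanpy version.

```python
import numpy as np

def scale_genes(x, max_value=None):
    """Z-score each gene (column) and optionally clip values above max_value."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)  # assumption: unbiased estimator
    std[std == 0] = 1.0          # avoid division by zero for constant genes
    z = (x - mean) / std
    if max_value is not None:
        z = np.minimum(z, max_value)
    return z

# Toy matrix: 4 cells x 2 genes; the first gene has one extreme cell.
x = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [100.0, 1.0]])
z = scale_genes(x, max_value=1.0)
print(z)
```

Clipping limits the influence of a few extreme cells on the principal components computed next.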
# PCA
sc.tl.pca(adata, svd_solver='arpack')
sc.pl.pca_variance_ratio(adata, log=True) # Check elbow plot
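The variance ratio plot shows how much variance each principal component explains. The following sketch computes the same quantity with an SVD on synthetic data and picks the number of components that reach 90% cumulative variance. The synthetic matrix and the 90% cutoff are illustrative assumptions, not scanpy defaults; reading the elbow by eye remains a reasonable alternative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scaled matrix: 200 cells x 50 genes with 3 dominant directions.
latent = rng.normal(size=(200, 3)) * np.array([10.0, 5.0, 2.0])
loadings = rng.normal(size=(3, 50))
x = latent @ loadings + rng.normal(scale=0.5, size=(200, 50))

# PCA via SVD of the centered matrix.
x_centered = x - x.mean(axis=0)
singular_values = np.linalg.svd(x_centered, compute_uv=False)
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Smallest number of PCs whose cumulative variance reaches 90%.
cumulative = np.cumsum(variance_ratio)
n_pcs = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_pcs, variance_ratio[:5].round(3))
```

A value chosen this way could then be passed as `n_pcs` to the neighborhood graph step.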
# Compute neighborhood graph
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
# UMAP for visualization
sc.tl.umap(adata)
sc.pl.umap(adata)  # pass color='leiden' only after sc.tl.leiden has been run
# Alternative: t-SNE
sc.tl.tsne(adata)
# Leiden clustering (recommended)
sc.tl.leiden(adata, resolution=0.5)
sc.pl.umap(adata, color='leiden', legend_loc='on data')
# Try multiple resolutions to find optimal granularity
for res in [0.3, 0.5, 0.8, 1.0]:
    sc.tl.leiden(adata, resolution=res, key_added=f'leiden_{res}')
# Find marker genes for each cluster
sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')
# Visualize results
sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)
sc.pl.rank_genes_groups_heatmap(adata, n_genes=10)
sc.pl.rank_genes_groups_dotplot(adata, n_genes=5)
# Get results as DataFrame
markers = sc.get.rank_genes_groups_df(adata, group='0')
# Define marker genes for known cell types
marker_genes = ['CD3D', 'CD14', 'MS4A1', 'NKG7', 'FCGR3A']
# Visualize markers
sc.pl.umap(adata, color=marker_genes, use_raw=True)
sc.pl.dotplot(adata, var_names=marker_genes, groupby='leiden')
# Manual annotation
cluster_to_celltype = {
    '0': 'CD4 T cells',
    '1': 'CD14+ Monocytes',
    '2': 'B cells',
    '3': 'CD8 T cells',
}
adata.obs['cell_type'] = adata.obs['leiden'].map(cluster_to_celltype)
# Visualize annotated types
sc.pl.umap(adata, color='cell_type', legend_loc='on data')
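One pitfall of the dictionary-based annotation above is that any cluster missing from the dictionary silently becomes NaN. The pandas sketch below reproduces that on a hypothetical label column (standing in for `adata.obs['leiden']`) and shows one way to list the unmapped clusters and give them a placeholder label; the name "Unknown" is an arbitrary choice.

```python
import pandas as pd

# Hypothetical cluster labels, categorical like adata.obs['leiden'].
leiden = pd.Series(["0", "1", "2", "3", "4", "0"], dtype="category")

cluster_to_celltype = {
    "0": "CD4 T cells",
    "1": "CD14+ Monocytes",
    "2": "B cells",
    "3": "CD8 T cells",
}

# Clusters absent from the dictionary (here '4') map to NaN.
cell_type = leiden.map(cluster_to_celltype)
unmapped = sorted(set(leiden[cell_type.isna()]))

# Give unmapped clusters an explicit placeholder label.
cell_type = cell_type.astype(object).fillna("Unknown").astype("category")
print(unmapped, cell_type.tolist())
```

Checking `unmapped` before plotting helps catch clusters that appeared after changing the clustering resolution.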
# Save processed data
adata.write('results/processed_data.h5ad')
# Export metadata
adata.obs.to_csv('results/cell_metadata.csv')
adata.var.to_csv('results/gene_metadata.csv')
# Set high-quality defaults
sc.settings.set_figure_params(dpi=300, frameon=False, figsize=(5, 5))
sc.settings.file_format_figs = 'pdf'
# UMAP with custom styling
sc.pl.umap(adata, color='cell_type',
           palette='Set2',
           legend_loc='on data',
           legend_fontsize=12,
           legend_fontoutline=2,
           frameon=False,
           save='_publication.pdf')
# Heatmap of marker genes
sc.pl.heatmap(adata, var_names=marker_genes, groupby='cell_type',
              swap_axes=True, show_gene_labels=True,
              save='_markers.pdf')
# Dot plot
sc.pl.dotplot(adata, var_names=marker_genes, groupby='cell_type',
              save='_dotplot.pdf')
Refer to references/plotting_guide.md for comprehensive visualization examples.
# PAGA (Partition-based graph abstraction)
sc.tl.paga(adata, groups='leiden')
sc.pl.paga(adata, color='leiden')
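# Diffusion pseudotime
sc.tl.diffmap(adata)  # DPT is computed on the diffusion map
# Root cell: the first cell of cluster '0'; pick a cluster that is a plausible starting state
adata.uns['iroot'] = np.flatnonzero(adata.obs['leiden'] == '0')[0]
sc.tl.dpt(adata)
sc.pl.umap(adata, color='dpt_pseudotime')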
# Compare treated vs control within cell types
adata_subset = adata[adata.obs['cell_type'] == 'T cells'].copy()  # copy so results can be stored on the subset
sc.tl.rank_genes_groups(adata_subset, groupby='condition',
                        groups=['treated'], reference='control')
sc.pl.rank_genes_groups(adata_subset, groups=['treated'])
# Score cells for gene set expression
gene_set = ['CD3D', 'CD3E', 'CD3G']
sc.tl.score_genes(adata, gene_set, score_name='T_cell_score')
sc.pl.umap(adata, color='T_cell_score')
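`sc.tl.score_genes` compares the average expression of the gene set with the average expression of a reference set of genes. The sketch below shows that idea on a toy matrix with one loud simplification: it uses all remaining genes as the reference, whereas scanpy samples reference genes from expression bins. All values are invented.

```python
import numpy as np

gene_names = ["CD3D", "CD3E", "CD3G", "MS4A1", "CD14", "NKG7"]

# Toy log-normalized expression: 3 cells x 6 genes (invented values).
x = np.array([
    [2.0, 3.0, 1.0, 0.0, 0.0, 0.0],  # T-cell-like
    [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],  # B-cell-like
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # uniform expression
])
gene_set = ["CD3D", "CD3E", "CD3G"]

in_set = np.array([g in gene_set for g in gene_names])

# Simplification: every gene outside the set serves as the reference.
score = x[:, in_set].mean(axis=1) - x[:, ~in_set].mean(axis=1)
print(score)
```

A positive score means the set is expressed above the reference level in that cell, so scores are relative rather than absolute expression values.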
# ComBat batch correction
sc.pp.combat(adata, key='batch')
# Alternative: use Harmony or scVI (separate packages)
min_genes: Minimum genes per cell (typically 200-500)
min_cells: Minimum cells per gene (typically 3-10)
pct_counts_mt: Mitochondrial threshold (typically 5-20%)
target_sum: Target counts per cell (commonly 1e4)
n_top_genes: Number of HVGs (typically 2000-3000)
min_mean, max_mean, min_disp: HVG selection parameters
n_pcs: Number of principal components (check variance ratio plot)
n_neighbors: Number of neighbors (typically 10-30)
resolution: Clustering granularity (0.4-1.2, higher = more clusters)
Set adata.raw = adata before filtering genes
use_raw=True for gene expression plots: plots the values stored in adata.raw (log-normalized in the workflow above) instead of the scaled matrix
Automated quality control script that calculates metrics, generates plots, and filters data:
python scripts/qc_analysis.py input.h5ad --output filtered.h5ad \
    --mt-threshold 5 --min-genes 200 --min-cells 3
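The source of `scripts/qc_analysis.py` is not shown here, so the following is only a hypothetical sketch of how a script with these flags might parse its command line using `argparse`. The flag names come from the command above; the defaults, help texts, and the `build_parser` helper are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags in the command above."""
    parser = argparse.ArgumentParser(description="QC filtering for an .h5ad file")
    parser.add_argument("input", help="input .h5ad file")
    parser.add_argument("--output", required=True, help="path for the filtered .h5ad file")
    # The defaults below are assumptions, not the script's documented values.
    parser.add_argument("--mt-threshold", type=float, default=5.0,
                        help="maximum mitochondrial percentage per cell")
    parser.add_argument("--min-genes", type=int, default=200,
                        help="minimum genes detected per cell")
    parser.add_argument("--min-cells", type=int, default=3,
                        help="minimum cells expressing a gene")
    return parser

# Parse the same arguments as the command line shown above.
args = build_parser().parse_args([
    "input.h5ad", "--output", "filtered.h5ad",
    "--mt-threshold", "5", "--min-genes", "200", "--min-cells", "3",
])
print(args.input, args.output, args.mt_threshold, args.min_genes, args.min_cells)
```

The parsed values map directly onto the thresholds used in the manual QC snippet (`min_genes`, `min_cells`, and the `pct_counts_mt` cutoff).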
Complete step-by-step workflow with detailed explanations and code examples.
Read this reference when performing a complete analysis from scratch.
Quick reference guide for scanpy functions organized by module:
Reading and writing data (sc.read_*, adata.write_*)
Preprocessing (sc.pp.*)
Tools (sc.tl.*)
Plotting (sc.pl.*)
Use this for quick lookup of function signatures and common parameters.
Comprehensive visualization guide.
Consult this when creating publication-ready figures.
Complete analysis template providing a full workflow from data loading through cell type annotation. Copy and customize this template for new analyses:
cp assets/analysis_template.py my_analysis.py
# Edit parameters and run
python my_analysis.py
The template includes all standard steps with configurable parameters and helpful comments.
Use assets/analysis_template.py as a starting point
Use scripts/qc_analysis.py for initial filtering
Weekly Installs: 164
Repository: davila7/claude-code-templates
GitHub Stars: 23.5K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (137), claude-code (135), gemini-cli (129), cursor (129), codex (120), github-copilot (114)