pydeseq2 by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill pydeseq2
PyDESeq2 is a Python implementation of DESeq2 for differential expression analysis with bulk RNA-seq data. Design and execute complete workflows from data loading through result interpretation, including single-factor and multi-factor designs, Wald tests with multiple testing correction, optional apeGLM shrinkage, and integration with pandas and AnnData.
This skill should be used when:
For users who want to perform a standard differential expression analysis:
import pandas as pd
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
# 1. Load data
counts_df = pd.read_csv("counts.csv", index_col=0).T # Transpose to samples × genes
metadata = pd.read_csv("metadata.csv", index_col=0)
# 2. Filter low-count genes
genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]
# 3. Initialize and fit DESeq2
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design="~condition",
    refit_cooks=True
)
dds.deseq2()
# 4. Perform statistical testing
ds = DeseqStats(dds, contrast=["condition", "treated", "control"])
ds.summary()
# 5. Access results
results = ds.results_df
significant = results[results.padj < 0.05]
print(f"Found {len(significant)} significant genes")
Input requirements:
Common data loading patterns:
# From CSV (typical format: genes × samples, needs transpose)
counts_df = pd.read_csv("counts.csv", index_col=0).T
metadata = pd.read_csv("metadata.csv", index_col=0)
# From TSV
counts_df = pd.read_csv("counts.tsv", sep="\t", index_col=0).T
# From AnnData
import anndata as ad
adata = ad.read_h5ad("data.h5ad")
counts_df = pd.DataFrame(adata.X, index=adata.obs_names, columns=adata.var_names)
metadata = adata.obs
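Note that adata.X is often stored as a scipy sparse matrix, in which case passing it straight to the pd.DataFrame constructor may fail or produce an unusable frame. A minimal densifying guard, as a sketch (the sample and gene names below are hypothetical stand-ins for adata.obs_names / adata.var_names):

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Stand-in for adata.X: a sparse counts matrix (2 samples x 3 genes)
X = sp.csr_matrix(np.array([[0, 5, 2], [3, 0, 7]]))

# Densify only when sparse, then wrap in a DataFrame
dense = X.toarray() if sp.issparse(X) else np.asarray(X)
counts_df = pd.DataFrame(dense, index=["s1", "s2"], columns=["g1", "g2", "g3"])
print(counts_df.shape)  # → (2, 3)
```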
Data filtering:
# Remove low-count genes
genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]
# Remove samples with missing metadata
samples_to_keep = ~metadata.condition.isna()
counts_df = counts_df.loc[samples_to_keep]
metadata = metadata.loc[samples_to_keep]
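PyDESeq2 expects raw, non-negative integer counts, not TPM/FPKM or otherwise normalized values. A quick sanity check before fitting, sketched on a toy frame:

```python
import pandas as pd

counts_df = pd.DataFrame({"g1": [10, 0, 3], "g2": [5, 2, 8]},
                         index=["s1", "s2", "s3"])

# Raw counts must be non-negative whole numbers
assert (counts_df.to_numpy() >= 0).all(), "counts must be non-negative"
assert (counts_df.to_numpy() % 1 == 0).all(), "counts look normalized, not raw"
print("counts look like raw integers")
```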
The design formula specifies how gene expression is modeled.
Single-factor designs:
design = "~condition" # Simple two-group comparison
Multi-factor designs:
design = "~batch + condition" # Control for batch effects
design = "~age + condition" # Include continuous covariate
design = "~group + condition + group:condition" # Interaction effects
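The reference level of a factor determines the direction of fold changes. One common way to make it explicit is to order the pandas Categorical so the reference comes first; depending on the PyDESeq2 version there may also be a dedicated DeseqDataSet argument for this, so treat the exact mechanism as an assumption and check the API reference:

```python
import pandas as pd

metadata = pd.DataFrame({"condition": ["treated", "control", "treated", "control"]},
                        index=["s1", "s2", "s3", "s4"])

# Put "control" first so it acts as the reference level
metadata["condition"] = pd.Categorical(metadata["condition"],
                                       categories=["control", "treated"])
print(metadata["condition"].cat.categories.tolist())  # → ['control', 'treated']
```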
Design formula guidelines:
Initialize the DeseqDataSet and run the complete pipeline:
from pydeseq2.dds import DeseqDataSet
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design="~condition",
    refit_cooks=True,  # Refit after removing outliers
    n_cpus=1           # Parallel processing (adjust as needed)
)
# Run the complete DESeq2 pipeline
dds.deseq2()
What deseq2() does: fits size factors (median-of-ratios normalization), estimates genewise dispersions and the dispersion trend, shrinks dispersions toward the trend, fits log2 fold changes, and computes Cook's distances for outlier detection (refitting flagged genes when refit_cooks=True).
Perform Wald tests to identify differentially expressed genes:
from pydeseq2.ds import DeseqStats
ds = DeseqStats(
    dds,
    contrast=["condition", "treated", "control"],  # Test treated vs control
    alpha=0.05,               # Significance threshold
    cooks_filter=True,        # Filter outliers
    independent_filter=True   # Filter low-power tests
)
ds.summary()
Contrast specification:
Format: [variable, test_level, reference_level]
Example: ["condition", "treated", "control"] tests treated vs control
If the contrast is None, the last coefficient in the design is used
Result DataFrame columns:
baseMean: mean normalized count across samples
log2FoldChange: log2 fold change between conditions
lfcSE: standard error of the LFC
stat: Wald test statistic
pvalue: raw p-value
padj: adjusted p-value (FDR-corrected via Benjamini-Hochberg)
Apply shrinkage to reduce noise in fold change estimates:
ds.lfc_shrink() # Applies apeGLM shrinkage
When to use LFC shrinkage:
Important: Shrinkage affects only the log2FoldChange values, not the statistical test results (p-values remain unchanged). Use shrunk values for visualization but report unshrunken p-values for significance.
Save results and intermediate objects:
import pickle
# Export results as CSV
ds.results_df.to_csv("deseq2_results.csv")
# Save significant genes only
significant = ds.results_df[ds.results_df.padj < 0.05]
significant.to_csv("significant_genes.csv")
# Save DeseqDataSet for later use
with open("dds_result.pkl", "wb") as f:
    pickle.dump(dds.to_picklable_anndata(), f)
Standard case-control comparison:
dds = DeseqDataSet(counts=counts_df, metadata=metadata, design="~condition")
dds.deseq2()
ds = DeseqStats(dds, contrast=["condition", "treated", "control"])
ds.summary()
results = ds.results_df
significant = results[results.padj < 0.05]
Testing multiple treatment groups against control:
dds = DeseqDataSet(counts=counts_df, metadata=metadata, design="~condition")
dds.deseq2()
treatments = ["treatment_A", "treatment_B", "treatment_C"]
all_results = {}
for treatment in treatments:
    ds = DeseqStats(dds, contrast=["condition", treatment, "control"])
    ds.summary()
    all_results[treatment] = ds.results_df
    sig_count = len(ds.results_df[ds.results_df.padj < 0.05])
    print(f"{treatment}: {sig_count} significant genes")
Control for technical variation:
# Include batch in design
dds = DeseqDataSet(counts=counts_df, metadata=metadata, design="~batch + condition")
dds.deseq2()
# Test condition while controlling for batch
ds = DeseqStats(dds, contrast=["condition", "treated", "control"])
ds.summary()
Include continuous variables like age or dosage:
# Ensure continuous variable is numeric
metadata["age"] = pd.to_numeric(metadata["age"])
dds = DeseqDataSet(counts=counts_df, metadata=metadata, design="~age + condition")
dds.deseq2()
ds = DeseqStats(dds, contrast=["condition", "treated", "control"])
ds.summary()
This skill includes a complete command-line script for standard analyses:
# Basic usage
python scripts/run_deseq2_analysis.py \
--counts counts.csv \
--metadata metadata.csv \
--design "~condition" \
--contrast condition treated control \
--output results/
# With additional options
python scripts/run_deseq2_analysis.py \
--counts counts.csv \
--metadata metadata.csv \
--design "~batch + condition" \
--contrast condition treated control \
--output results/ \
--min-counts 10 \
--alpha 0.05 \
--n-cpus 4 \
--plots
Script features:
Refer users to scripts/run_deseq2_analysis.py when they need a standalone analysis tool or want to batch process multiple datasets.
# Filter by adjusted p-value
significant = ds.results_df[ds.results_df.padj < 0.05]
# Filter by both significance and effect size
sig_and_large = ds.results_df[
    (ds.results_df.padj < 0.05) &
    (abs(ds.results_df.log2FoldChange) > 1)
]
# Separate up- and down-regulated
upregulated = significant[significant.log2FoldChange > 0]
downregulated = significant[significant.log2FoldChange < 0]
print(f"Upregulated: {len(upregulated)}")
print(f"Downregulated: {len(downregulated)}")
# Sort by adjusted p-value
top_by_padj = ds.results_df.sort_values("padj").head(20)
# Sort by absolute fold change (use shrunk values)
ds.lfc_shrink()
ds.results_df["abs_lfc"] = abs(ds.results_df.log2FoldChange)
top_by_lfc = ds.results_df.sort_values("abs_lfc", ascending=False).head(20)
# Sort by a combined metric
import numpy as np
ds.results_df["score"] = -np.log10(ds.results_df.padj) * abs(ds.results_df.log2FoldChange)
top_combined = ds.results_df.sort_values("score", ascending=False).head(20)
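A caveat for the combined metric: padj can be exactly 0 for extremely significant genes, which makes -log10 return inf. Clipping to the smallest positive float before the transform keeps the score finite; a sketch on toy values:

```python
import numpy as np
import pandas as pd

results = pd.DataFrame({"padj": [0.0, 1e-10, 0.2],
                        "log2FoldChange": [3.0, -2.0, 0.5]})

# Clip zero p-values before -log10 so the score stays finite
eps = np.nextafter(0, 1)  # smallest positive double
results["score"] = -np.log10(results["padj"].clip(lower=eps)) * results["log2FoldChange"].abs()
print(np.isfinite(results["score"]).all())  # → True
```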
# Check normalization (size factors should be close to 1)
print("Size factors:", dds.obsm["size_factors"])
# Examine dispersion estimates
import matplotlib.pyplot as plt
plt.hist(dds.varm["dispersions"], bins=50)
plt.xlabel("Dispersion")
plt.ylabel("Frequency")
plt.title("Dispersion Distribution")
plt.show()
# Check p-value distribution (should be mostly flat with peak near 0)
plt.hist(ds.results_df.pvalue.dropna(), bins=50)
plt.xlabel("P-value")
plt.ylabel("Frequency")
plt.title("P-value Distribution")
plt.show()
Visualize significance vs effect size:
import matplotlib.pyplot as plt
import numpy as np
results = ds.results_df.copy()
results["-log10(padj)"] = -np.log10(results.padj)
plt.figure(figsize=(10, 6))
significant = results.padj < 0.05
plt.scatter(
    results.loc[~significant, "log2FoldChange"],
    results.loc[~significant, "-log10(padj)"],
    alpha=0.3, s=10, c='gray', label='Not significant'
)
plt.scatter(
    results.loc[significant, "log2FoldChange"],
    results.loc[significant, "-log10(padj)"],
    alpha=0.6, s=10, c='red', label='padj < 0.05'
)
plt.axhline(-np.log10(0.05), color='blue', linestyle='--', alpha=0.5)
plt.xlabel("Log2 Fold Change")
plt.ylabel("-Log10(Adjusted P-value)")
plt.title("Volcano Plot")
plt.legend()
plt.savefig("volcano_plot.png", dpi=300)
Show fold change vs mean expression:
plt.figure(figsize=(10, 6))
plt.scatter(
    np.log10(results.loc[~significant, "baseMean"] + 1),
    results.loc[~significant, "log2FoldChange"],
    alpha=0.3, s=10, c='gray'
)
plt.scatter(
    np.log10(results.loc[significant, "baseMean"] + 1),
    results.loc[significant, "log2FoldChange"],
    alpha=0.6, s=10, c='red'
)
plt.axhline(0, color='blue', linestyle='--', alpha=0.5)
plt.xlabel("Log10(Base Mean + 1)")
plt.ylabel("Log2 Fold Change")
plt.title("MA Plot")
plt.savefig("ma_plot.png", dpi=300)
Issue: "Index mismatch between counts and metadata"
Solution: Ensure sample names match exactly
print("Counts samples:", counts_df.index.tolist())
print("Metadata samples:", metadata.index.tolist())
# Take intersection if needed
common = counts_df.index.intersection(metadata.index)
counts_df = counts_df.loc[common]
metadata = metadata.loc[common]
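Subsetting both frames with the same index also puts the rows in the same order, which avoids downstream index-mismatch errors. A small self-contained check (sample names here are hypothetical):

```python
import pandas as pd

counts_df = pd.DataFrame({"g1": [1, 2, 3]}, index=["s1", "s2", "s3"])
metadata = pd.DataFrame({"condition": ["a", "b"]}, index=["s2", "s1"])

common = counts_df.index.intersection(metadata.index)
counts_df = counts_df.loc[common]
metadata = metadata.loc[common]

# Both frames now share the same samples in the same order
assert (counts_df.index == metadata.index).all()
print(len(common))  # → 2
```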
Issue: "All genes have zero counts"
Solution: Check if data needs transposition
print(f"Counts shape: {counts_df.shape}")
# If genes > samples, transpose is needed
if counts_df.shape[1] < counts_df.shape[0]:
    counts_df = counts_df.T
Issue: "Design matrix is not full rank"
Cause: Confounded variables (e.g., all treated samples in one batch)
Solution: Remove confounded variable or add interaction term
# Check confounding
print(pd.crosstab(metadata.condition, metadata.batch))
# Either simplify design or add interaction
design = "~condition" # Remove batch
# OR
design = "~condition + batch + condition:batch" # Model interaction
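To see why a fully confounded design breaks, the crosstab check can be made concrete: when every condition occurs in exactly one batch, batch and condition carry the same information and the model matrix loses rank. A toy illustration:

```python
import pandas as pd

metadata = pd.DataFrame({
    "condition": ["control", "control", "treated", "treated"],
    "batch":     ["b1",      "b1",      "b2",      "b2"],
})

ct = pd.crosstab(metadata.condition, metadata.batch)
# Each condition appears in exactly one batch -> fully confounded
confounded = (ct > 0).sum(axis=1).eq(1).all()
print(confounded)  # → True
```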
Diagnostics:
# Check dispersion distribution
plt.hist(dds.varm["dispersions"], bins=50)
plt.show()
# Check size factors
print(dds.obsm["size_factors"])
# Look at top genes by raw p-value
print(ds.results_df.nsmallest(20, "pvalue"))
Possible causes:
For comprehensive details beyond this workflow-oriented guide:
API Reference (references/api_reference.md): Complete documentation of PyDESeq2 classes, methods, and data structures. Use when needing detailed parameter information or understanding object attributes.
Workflow Guide (references/workflow_guide.md): In-depth guide covering complete analysis workflows, data loading patterns, multi-factor designs, troubleshooting, and best practices. Use when handling complex experimental designs or encountering issues.
Load these references into context when users need:
Read references/api_reference.md
Read references/workflow_guide.md
Read references/workflow_guide.md (see Troubleshooting section)
Data orientation matters: Count matrices typically load as genes × samples but need to be samples × genes. Always transpose with .T if needed.
Sample filtering: Remove samples with missing metadata before analysis to avoid errors.
Gene filtering: Filter low-count genes (e.g., < 10 total reads) to improve power and reduce computational time.
Design formula order: Put adjustment variables before the variable of interest (e.g., "~batch + condition" not "~condition + batch").
LFC shrinkage timing: Apply shrinkage after statistical testing and only for visualization/ranking purposes. P-values remain based on unshrunken estimates.
Result interpretation: Use padj < 0.05 for significance, not raw p-values. The Benjamini-Hochberg procedure controls false discovery rate.
Contrast specification: The format is [variable, test_level, reference_level] where test_level is compared against reference_level.
Save intermediate objects: Use pickle to save DeseqDataSet objects for later use or additional analyses without re-running the expensive fitting step.
uv pip install pydeseq2
System requirements:
Optional for visualization:
Weekly Installs: 122
Repository: davila7/claude-code-templates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: claude-code (104), opencode (98), gemini-cli (93), cursor (93), antigravity (86), codex (82)