tooluniverse-epigenomics by mims-harvard/tooluniverse
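npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-epigenomics

Production-ready computational skill for processing and analyzing epigenomics data. Combines local Python computation (pandas, scipy, numpy, pysam, statsmodels) with ToolUniverse annotation tools for regulatory context. Designed to solve BixBench-style questions about methylation, ChIP-seq, ATAC-seq, and multi-omics integration.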
Triggers:
Example Questions:
NOT for (use other skills instead):
- tooluniverse-rnaseq-deseq2
- tooluniverse-variant-analysis
- tooluniverse-gene-enrichment
- tooluniverse-protein-structure-retrieval

```python
# Core (MUST be available)
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.stats.multitest as mt

# Optional but useful
import pysam   # BAM/CRAM file access
import gseapy  # Enrichment of genes from methylation analysis

# ToolUniverse (for annotation)
from tooluniverse import ToolUniverse
```
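Before writing any code, parse the question to identify:
Categorize files by scanning for keywords: methyl/beta/cpg/illumina, chip/peak/narrowpeak, atac/accessibility, clinical/patient/sample, express/rnaseq/fpkm, manifest/annotation/probe.
See ANALYSIS_PROCEDURES.md for the full decision tree and parameter extraction table.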
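
The keyword scan described above could be sketched roughly as follows. The keyword groups come from the text; the category labels (`methylation`, `chipseq`, and so on) and the function names are hypothetical choices for this illustration, not names taken from ANALYSIS_PROCEDURES.md.

```python
from pathlib import Path

# Keyword groups taken from the text above; the category labels are
# hypothetical names chosen for this sketch.
KEYWORDS = {
    "methylation": ("methyl", "beta", "cpg", "illumina"),
    "chipseq": ("chip", "peak", "narrowpeak"),
    "atac": ("atac", "accessibility"),
    "clinical": ("clinical", "patient", "sample"),
    "expression": ("express", "rnaseq", "fpkm"),
    "manifest": ("manifest", "annotation", "probe"),
}


def categorize_file(path):
    """Return every category whose keywords appear in the file name."""
    name = Path(path).name.lower()
    return [cat for cat, words in KEYWORDS.items()
            if any(word in name for word in words)]


def categorize_files(paths):
    """Map each path to its matching categories (empty list if none match)."""
    return {str(p): categorize_file(p) for p in paths}
```

A file name can match more than one group (for example, `sample_annotation.csv` hits both the clinical and the manifest keywords), so this sketch returns a list and leaves the tie-break to the caller.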
Core functions for methylation analysis:
See CODE_REFERENCE.md Phase 1 for full function implementations.
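
As a rough illustration of what beta versus M-value detection can look like, here is a minimal sketch. It assumes a simple value-range heuristic and is not the implementation in CODE_REFERENCE.md. The `beta_to_m` and `m_to_beta` helpers and the `eps` clipping constant are hypothetical additions for this example.

```python
import numpy as np
import pandas as pd


def detect_methylation_type(df):
    """Guess 'beta' or 'mvalue' from the value range (heuristic)."""
    values = df.to_numpy(dtype=float)
    lo, hi = np.nanmin(values), np.nanmax(values)
    # Beta values are proportions, so they must lie in [0, 1].
    return "beta" if lo >= 0.0 and hi <= 1.0 else "mvalue"


def beta_to_m(beta, eps=1e-6):
    """Convert beta values to M-values: M = log2(beta / (1 - beta))."""
    clipped = np.clip(beta, eps, 1.0 - eps)  # avoid log of 0 or division by 0
    return np.log2(clipped / (1.0 - clipped))


def m_to_beta(m):
    """Convert M-values back to beta: beta = 2**M / (2**M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

The range check misclassifies an M-value matrix whose entries all happen to fall inside [0, 1], so a real pipeline would likely want an explicit override.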
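See CODE_REFERENCE.md Phase 2 for full function implementations.
See CODE_REFERENCE.md Phase 3 for full function implementations.
See CODE_REFERENCE.md Phase 4 for full function implementations.
See CODE_REFERENCE.md Phase 5 for full function implementations.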
Use ToolUniverse tools to add biological context after computational analysis:
See CODE_REFERENCE.md Phase 6 and TOOLS_REFERENCE.md for parameters.
See CODE_REFERENCE.md Phase 7 for full function implementations.
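
| Pattern | Input | Key Steps | Output |
|---|---|---|---|
| Differential methylation | Beta matrix + clinical | Filter probes -> define groups -> t-test -> FDR -> threshold | Count of significant DMPs |
| Age-related CpG density | Beta matrix + manifest + ages | Correlate with age -> FDR -> map to chr -> density per chr | Density ratio between chromosomes |
| Multi-omics missing data | Clinical + expression + methylation | Extract sample IDs -> intersect -> check NaN | Complete case count |
| ChIP-seq peak annotation | BED/narrowPeak + gene annotation | Load peaks -> annotate to genes -> classify regions | Fraction in promoters |
| Methylation-expression | Beta matrix + expression + probe-gene map | Align samples -> correlate -> FDR | Significant anti-correlations |

See ANALYSIS_PROCEDURES.md for detailed step-by-step flows and edge case handling.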
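
The differential methylation pattern in the table (t-test -> FDR -> threshold) might be sketched as below. This is an illustration under stated assumptions, not the skill's `differential_methylation()`: it uses a Welch t-test, the `alpha=0.05` and `min_delta=0.2` cutoffs are placeholder values, and the function names are hypothetical. Benjamini-Hochberg is written out in numpy to keep the block self-contained; `mt.multipletests(pvals, method="fdr_bh")` from statsmodels should give the same adjusted values.

```python
import numpy as np
import pandas as pd
from scipy import stats


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array without NaNs."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def differential_methylation_sketch(beta, group_a, group_b,
                                    alpha=0.05, min_delta=0.2):
    """Welch t-test per probe (rows) between two sample groups (columns)."""
    a = beta[group_a].to_numpy(dtype=float)
    b = beta[group_b].to_numpy(dtype=float)
    _, p = stats.ttest_ind(a, b, axis=1, equal_var=False, nan_policy="omit")
    res = pd.DataFrame({
        "delta_beta": np.nanmean(a, axis=1) - np.nanmean(b, axis=1),
        "pvalue": np.ma.filled(p, np.nan),  # older SciPy may return masked arrays
    }, index=beta.index).dropna()
    res["padj"] = bh_adjust(res["pvalue"].to_numpy())
    # Assumed thresholds: FDR below alpha and |delta beta| at least min_delta.
    res["significant"] = (res["padj"] < alpha) & (res["delta_beta"].abs() >= min_delta)
    return res
```

Counting significant DMPs is then `int(res["significant"].sum())`. Whether a question expects a delta-beta cutoff at all, or Student's rather than Welch's test, depends on its wording.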

| Function | Purpose | Input | Output |
|---|---|---|---|
| `load_methylation_data()` | Load beta/M-value matrix | file path | DataFrame |
| `detect_methylation_type()` | Detect beta vs M-values | DataFrame | 'beta' or 'mvalue' |
| `filter_cpg_probes()` | Filter probes by criteria | DataFrame + filters | filtered DataFrame |
| `differential_methylation()` | DM analysis between groups | beta + samples | DataFrame with padj |
| `identify_age_related_cpgs()` | Age-correlated CpGs | beta + ages | DataFrame with padj |
| `chromosome_cpg_density()` | CpG density per chromosome | probes + manifest | density DataFrame |
| `genome_wide_average_density()` | Overall genome density | density DataFrame | float |
| `chromosome_density_ratio()` | Ratio between chromosomes | density + chr names | float |
| `load_bed_file()` | Load BED/narrowPeak | file path | DataFrame |
| `peak_statistics()` | Basic peak stats | BED DataFrame | dict |
| `annotate_peaks_to_genes()` | Annotate peaks to genes | peaks + genes | annotated DataFrame |
| `find_overlaps()` | Peak overlap analysis | two BED DataFrames | overlap DataFrame |
| `missing_data_analysis()` | Cross-modality completeness | multiple DataFrames | dict |
| `correlate_methylation_expression()` | Meth-expression correlation | beta + expression | correlation DataFrame |

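
To make the peak-annotation entries in the table more concrete, here is a small sketch of how a "fraction in promoters" figure could be computed. Everything in it is an assumption for illustration: the function name `promoter_fraction`, the column names (`chrom`, `start`, `end`, `strand`), the 2000 bp upstream / 500 bp downstream promoter window, and the use of peak midpoints. It also assumes peaks and genes share one coordinate convention.

```python
import pandas as pd


def promoter_fraction(peaks, genes, upstream=2000, downstream=500):
    """Fraction of peaks whose midpoint falls in any gene's promoter window."""
    genes = genes.copy()
    plus = genes["strand"] == "+"
    # TSS is the gene start on the plus strand and the gene end on the minus strand.
    tss = genes["start"].where(plus, genes["end"])
    # Promoter window is strand-aware: upstream is measured in gene orientation.
    genes["p_start"] = (tss - upstream).where(plus, tss - downstream)
    genes["p_end"] = (tss + downstream).where(plus, tss + upstream)
    mid = (peaks["start"] + peaks["end"]) // 2
    in_promoter = []
    for chrom, m in zip(peaks["chrom"], mid):
        g = genes[genes["chrom"] == chrom]
        in_promoter.append(bool(((g["p_start"] <= m) & (m <= g["p_end"])).any()))
    return sum(in_promoter) / len(in_promoter) if in_promoter else float("nan")
```

The per-peak loop is quadratic in the worst case, which is fine for a sketch but not for a large peak set; an interval-based join would be the usual next step.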
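- `ensembl_lookup_gene` - Gene coordinates, biotype (REQUIRES species='homo_sapiens')
- `ensembl_get_regulatory_features` - Regulatory features by region (NO "chr" prefix in region)
- `ensembl_get_overlap_features` - Gene/transcript overlap data
- `SCREEN_get_regulatory_elements` - cCREs: enhancers, promoters, insulators
- `ReMap_get_transcription_factor_binding` - TF binding sites
- `RegulomeDB_query_variant` - Variant regulatory score
- `jaspar_search_matrices` - TF binding matrices
- `ENCODE_search_experiments` - Experiment metadata (assay_title must be "TF ChIP-seq" not "ChIP-seq")
- `ChIPAtlas_get_experiments` - ChIP-seq experiments (REQUIRES operation param)
- `ChIPAtlas_search_datasets` - Dataset search (REQUIRES operation param)
- `ChIPAtlas_enrichment_analysis` - Enrichment from BED/motifs/genes
- `ChIPAtlas_get_peak_data` - Peak data download (REQUIRES operation param)
- `FourDN_search_data` - Chromatin conformation data (REQUIRES operation param)
- `MyGene_query_genes` - Gene query
- `MyGene_batch_query` - Batch gene query
- `HGNC_get_gene_info` - Gene symbol, aliases, IDs
- `GO_get_annotations_for_gene` - GO annotations

See TOOLS_REFERENCE.md for full parameter details and return schemas.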
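
Two of the notes above are easy to trip over: the Ensembl region must not carry a "chr" prefix, and the ENCODE `assay_title` must be "TF ChIP-seq". The sketch below shows one way to normalize a region and assemble call payloads before handing them to ToolUniverse. The helper names are hypothetical, and the `region` argument key and the name/arguments dictionary layout are assumptions; check TOOLS_REFERENCE.md for the actual schemas.

```python
def to_ensembl_region(chrom, start, end):
    """Format ('chr17', 100, 200) as '17:100-200' (no 'chr' prefix)."""
    name = str(chrom)
    if name.lower().startswith("chr"):
        name = name[3:]
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{name}:{int(start)}-{int(end)}"


def build_tool_call(tool_name, **arguments):
    """Assemble a name/arguments payload; the dict layout is an assumption."""
    return {"name": tool_name, "arguments": arguments}


# 'region' is an assumed argument key; the coordinates are arbitrary.
regulatory_call = build_tool_call(
    "ensembl_get_regulatory_features",
    region=to_ensembl_region("chr17", 1000000, 1001000),
)

# The assay_title value follows the note in the tool list above.
encode_call = build_tool_call("ENCODE_search_experiments",
                              assay_title="TF ChIP-seq")
```

Keeping the prefix stripping in one helper means BED-style names such as `chr17` can be used everywhere else in the analysis and converted only at the point of the call.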

| Build | Species | Autosomes | Sex Chromosomes |
|---|---|---|---|
| hg38 (GRCh38) | Human | chr1-chr22 | chrX, chrY |
| hg19 (GRCh37) | Human | chr1-chr22 | chrX, chrY |
| mm10 (GRCm38) | Mouse | chr1-chr19 | chrX, chrY |

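
The build table can be turned into a small lookup for keeping only canonical chromosomes before per-chromosome calculations such as CpG density. This is a minimal sketch; the names `AUTOSOME_COUNT`, `canonical_chromosomes`, and `split_by_canonical` are hypothetical and not part of the skill.

```python
# Autosome counts from the genome build table above.
AUTOSOME_COUNT = {"hg38": 22, "hg19": 22, "mm10": 19}


def canonical_chromosomes(build):
    """Return chr1..chrN plus chrX and chrY for a supported build."""
    n = AUTOSOME_COUNT[build]
    return [f"chr{i}" for i in range(1, n + 1)] + ["chrX", "chrY"]


def split_by_canonical(chroms, build):
    """Partition chromosome names into (canonical, other), keeping order."""
    allowed = set(canonical_chromosomes(build))
    keep = [c for c in chroms if c in allowed]
    drop = [c for c in chroms if c not in allowed]
    return keep, drop
```

For example, `split_by_canonical(["chr1", "chrM", "chr20", "chrX"], "mm10")` separates `chrM` and `chr20` from the mouse canonical set, which helps catch a human manifest being paired with a mouse build.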
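- CODE_REFERENCE.md - Full Python function implementations for all phases
- TOOLS_REFERENCE.md - ToolUniverse tool parameter details and return schemas
- ANALYSIS_PROCEDURES.md - Decision trees, step-by-step analysis patterns, edge cases, fallback strategies
- QUICK_START.md - Quick start examples for common analysis types

- Weekly Installs: 142
- Repository: mims-harvard/tooluniverse
- GitHub Stars: 1.2K
- First Seen: Feb 16, 2026
- Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
- Installed on: gemini-cli 136, codex 136, opencode 136, github-copilot 134, kimi-cli 130, amp 130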