tooluniverse-image-analysis by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-image-analysis

Production-ready skill for analyzing microscopy measurement data with pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparison.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details live in the references/ directory for progressive disclosure; this document focuses on high-level decisions and workflow orchestration.
Apply when users:
BixBench coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
tooluniverse-phylogenetics
tooluniverse-rnaseq-deseq2
tooluniverse-single-cell
tooluniverse-statistical-modeling
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr
# Optional (for raw image processing)
import skimage
import cv2
import tifffile
Installation:
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
START: User question about microscopy data
│
├─ Q1: What type of data is available?
│ │
│ ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│ │ └─ Workflow: Load → Parse question → Statistical analysis
│ │ Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│ │ See: Section "Quantitative Data Analysis" below
│ │
│ └─ RAW IMAGES (TIFF, PNG, multi-channel)
│ └─ Workflow: Load → Segment → Measure → Analyze
│ See: references/image_processing.md
│
├─ Q2: What type of analysis is needed?
│ │
│ ├─ STATISTICAL COMPARISON
│ │ ├─ Two groups → t-test or Mann-Whitney
│ │ ├─ Multiple groups → ANOVA or Dunnett's test
│ │ ├─ Two factors → Two-way ANOVA
│ │ └─ Effect size → Cohen's d, power analysis
│ │ See: references/statistical_analysis.md
│ │
│ ├─ REGRESSION MODELING
│ │ ├─ Dose-response → Polynomial (quadratic, cubic)
│ │ ├─ Ratio optimization → Natural spline
│ │ └─ Model comparison → R-squared, F-statistic, AIC/BIC
│ │ See: references/statistical_analysis.md
│ │
│ ├─ CELL COUNTING
│ │ ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│ │ ├─ Brightfield → Adaptive threshold
│ │ └─ High-density → CellPose or StarDist (external)
│ │ See: references/cell_counting.md
│ │
│ ├─ COLONY SEGMENTATION
│ │ ├─ Swarming assays → Otsu threshold + morphology
│ │ ├─ Biofilms → Li threshold + fill holes
│ │ └─ Growth assays → Time-lapse tracking
│ │ See: references/segmentation.md
│ │
│ └─ FLUORESCENCE QUANTIFICATION
│ ├─ Intensity measurement → regionprops
│ ├─ Colocalization → Pearson/Manders
│ └─ Multi-channel → Channel-wise quantification
│ See: references/fluorescence_analysis.md
│
└─ Q3: When to use scikit-image vs OpenCV?
├─ scikit-image: Scientific analysis, measurements, regionprops
├─ OpenCV: Fast processing, real-time, large batches
└─ Both: Often interchangeable for basic operations
See: references/image_processing.md "Library Selection Guide"
CRITICAL FIRST STEP: Before writing ANY code, identify which data files are available and what the question is asking for.
import os, glob, pandas as pd
# Discover data files
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)
# Load and inspect first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
Common column names:
def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean',
        SD='std',
        Median='median',
        Min='min',
        Max='max',
        N='count'
    ).reset_index()
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary
# Example: Colony morphometry by genotype
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
For detailed statistical functions, see references/statistical_analysis.md.
Decision guide:
See references/statistical_analysis.md for complete implementations.
When to use each model:
Model comparison metrics:
See references/statistical_analysis.md for complete implementations.
Workflow: Load → Preprocess → Segment → Measure → Export
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image
result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,  # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
Decision guide:
| Cell type | Density | Best method | Notes |
|---|---|---|---|
| Nuclei (DAPI) | Low to medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching cells |
| Colonies | Well separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase contrast) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See references/segmentation.md and references/cell_counting.md for detailed protocols.
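The counting idea behind the table can be exercised even without scikit-image installed. The sketch below uses only scipy's connected-component labeling, as a simplified stand-in for the Otsu + watershed protocol in references/cell_counting.md (which is what you want once nuclei touch):

```python
import numpy as np
from scipy import ndimage as ndi

def count_objects(img, threshold, min_area=50):
    """Count bright objects: threshold, label connected components, drop specks."""
    mask = img > threshold
    labels, n = ndi.label(mask)
    if n == 0:
        return 0
    # Summing the boolean mask per label gives each component's pixel area
    sizes = ndi.sum(mask, labels, index=np.arange(1, n + 1))
    return int((sizes >= min_area).sum())

# Synthetic test image: two bright disks on a dark background
yy, xx = np.mgrid[0:100, 0:100]
img = (((yy - 30) ** 2 + (xx - 30) ** 2 < 100) |
       ((yy - 70) ** 2 + (xx - 70) ** 2 < 100)).astype(float)
print(count_objects(img, threshold=0.5))  # → 2
```

The `min_area` filter mirrors the `--min-area` flag on scripts/segment_cells.py: it discards segmentation specks below a plausible cell size.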
Use scikit-image when:
Use OpenCV when:
Both work for:
See references/image_processing.md, "Library Selection Guide".
问题类型:"面积最大的基因型的平均圆形度是多少?"
数据:包含 Genotype、Area、Circularity 列的 CSV
工作流:
请参阅:references/segmentation.md "菌落形态计量分析"
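This workflow reduces to a groupby plus idxmax. A sketch with a hypothetical table shaped like the CSV described above (genotype names and values are invented):

```python
import pandas as pd

# Hypothetical measurements; the real file has the same three columns
df = pd.DataFrame({
    "Genotype":    ["WT", "WT", "mutA", "mutA"],
    "Area":        [120.0, 130.0, 200.0, 210.0],
    "Circularity": [0.90, 0.88, 0.75, 0.77],
})

largest = df.groupby("Genotype")["Area"].mean().idxmax()          # genotype with largest mean area
answer = df.loc[df["Genotype"] == largest, "Circularity"].mean()  # its mean circularity
print(largest, round(answer, 3))  # → mutA 0.76
```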
问题类型:"条件间 NeuN 计数的 Cohen's d 是多少?"
数据:包含 Condition、NeuN_count、Sex、Hemisphere 列的 CSV
工作流:
请参阅:references/statistical_analysis.md "效应量计算"
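A sketch of the pooled-SD Cohen's d named in the quick-reference table (the exact formula in references/statistical_analysis.md may differ, e.g. if it applies the Hedges' g small-sample correction):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (ddof=1 per group)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

print(cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # ≈ -0.632
```

Note the sign convention: the result is negative when the first group's mean is smaller, so report |d| or fix the group order to match the question.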
问题类型:"Dunnett 检验:有多少个比例与对照等效?"
数据:包含多个共培养比例、Area、Circularity 的 CSV
工作流:
请参阅:references/statistical_analysis.md "Dunnett 检验"
问题类型:"自然样条模型的峰值频率是多少?"
数据:包含共培养频率和 Area 测量的 CSV
工作流:
请参阅:references/statistical_analysis.md "回归建模"
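A sketch of the spline-peak workflow on synthetic frequency/area data. The formula interface keeps patsy's cr() knots consistent between fit and prediction; df=4 is an assumption here and should be matched to the R model in the question:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
freq = np.linspace(0.0, 1.0, 60)
area = -(freq - 0.4) ** 2 + rng.normal(scale=0.01, size=freq.size)  # true peak at 0.4
df = pd.DataFrame({"freq": freq, "area": area})

# Natural cubic spline via patsy's cr(); the formula API remembers the fitted
# knots, so predict() on new data reuses the same basis
fit = ols("area ~ cr(freq, df=4)", data=df).fit()

# Locate the fitted peak on a fine grid
grid = pd.DataFrame({"freq": np.linspace(0.0, 1.0, 501)})
pred = np.asarray(fit.predict(grid))
peak_freq = float(grid["freq"].iloc[int(np.argmax(pred))])
print(round(peak_freq, 2))
```

Building the spline basis with a fresh dmatrix() call on the prediction grid is a common pitfall, since patsy would then re-estimate knots from the grid; going through the formula API (or patsy's build_design_matrices) avoids it.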
| Task | Primary tool | Reference |
|---|---|---|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |
Ready-to-use scripts in the scripts/ directory:
Usage:
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50
# Batch process folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
For complete implementations and protocols, see:
Some BixBench questions use R for the analysis. Python equivalents:
- Dunnett's test (multcomp::glht) → scipy.stats.dunnett() (scipy ≥ 1.10)
- Natural spline (ns(x, df=4)) → patsy.cr(x, knots=...) with explicit quantile knots
- t-test (t.test()) → scipy.stats.ttest_ind()
- ANOVA (aov()) → statsmodels.formula.api.ols() + sm.stats.anova_lm()

See references/statistical_analysis.md for exact parameter matching.
BixBench expects specific answer formats:
Round to the nearest thousand with int(round(val, -3)).

Before returning your answer, verify:
Weekly installs: 220
Repository: mims-harvard/tooluniverse
GitHub stars: 1.2K
First seen: Feb 19, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (217), gemini-cli (216), opencode (216), github-copilot (215), cursor (213), kimi-cli (212)