tooluniverse-image-analysis by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-image-analysis

Production-ready skill for analyzing microscopy measurement data with pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparison.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details live in the references/ directory for progressive disclosure; this document focuses on high-level decisions and workflow orchestration.
Apply when users:
BixBench coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
tooluniverse-phylogenetics
tooluniverse-rnaseq-deseq2
tooluniverse-single-cell
tooluniverse-statistical-modeling
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr
# Optional (for raw image processing)
import skimage
import cv2
import tifffile
Installation:
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
START: User question about microscopy data
│
├─ Q1: What type of data is available?
│ │
│ ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│ │ └─ Workflow: Load → Parse question → Statistical analysis
│ │ Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│ │ See: Section "Quantitative Data Analysis" below
│ │
│ └─ RAW IMAGES (TIFF, PNG, multi-channel)
│ └─ Workflow: Load → Segment → Measure → Analyze
│ See: references/image_processing.md
│
├─ Q2: What type of analysis is needed?
│ │
│ ├─ STATISTICAL COMPARISON
│ │ ├─ Two groups → t-test or Mann-Whitney
│ │ ├─ Multiple groups → ANOVA or Dunnett's test
│ │ ├─ Two factors → Two-way ANOVA
│ │ └─ Effect size → Cohen's d, power analysis
│ │ See: references/statistical_analysis.md
│ │
│ ├─ REGRESSION MODELING
│ │ ├─ Dose-response → Polynomial (quadratic, cubic)
│ │ ├─ Ratio optimization → Natural spline
│ │ └─ Model comparison → R-squared, F-statistic, AIC/BIC
│ │ See: references/statistical_analysis.md
│ │
│ ├─ CELL COUNTING
│ │ ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│ │ ├─ Brightfield → Adaptive threshold
│ │ └─ High-density → CellPose or StarDist (external)
│ │ See: references/cell_counting.md
│ │
│ ├─ COLONY SEGMENTATION
│ │ ├─ Swarming assays → Otsu threshold + morphology
│ │ ├─ Biofilms → Li threshold + fill holes
│ │ └─ Growth assays → Time-lapse tracking
│ │ See: references/segmentation.md
│ │
│ └─ FLUORESCENCE QUANTIFICATION
│ ├─ Intensity measurement → regionprops
│ ├─ Colocalization → Pearson/Manders
│ └─ Multi-channel → Channel-wise quantification
│ See: references/fluorescence_analysis.md
│
└─ Q3: When to use scikit-image vs OpenCV?
├─ scikit-image: Scientific analysis, measurements, regionprops
├─ OpenCV: Fast processing, real-time, large batches
└─ Both: Often interchangeable for basic operations
See: references/image_processing.md "Library Selection Guide"
CRITICAL FIRST STEP: Before writing ANY code, identify which data files are available and what the question is asking for.
import os, glob, pandas as pd
# Discover data files
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)
# Load and inspect first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
Common column names:
def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean',
        SD='std',
        Median='median',
        Min='min',
        Max='max',
        N='count'
    ).reset_index()
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary
# Example: Colony morphometry by genotype
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
For detailed statistical functions, see references/statistical_analysis.md.
Decision guide:
See references/statistical_analysis.md for complete implementations.
When to use each model:
Model comparison metrics:
See references/statistical_analysis.md for complete implementations.
Workflow: Load → Preprocess → Segment → Measure → Export
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image
result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,  # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
Decision guide:
| Cell type | Density | Best method | Notes |
|---|---|---|---|
| Nuclei (DAPI) | Low to medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching cells |
| Colonies | Well separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase contrast) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See references/segmentation.md and references/cell_counting.md for detailed protocols.
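The counting idea behind the table can be exercised even without scikit-image installed. The sketch below uses only scipy's connected-component labeling, as a simplified stand-in for the Otsu + watershed protocol in references/cell_counting.md (which is what you want once nuclei touch):

```python
import numpy as np
from scipy import ndimage as ndi

def count_objects(img, threshold, min_area=50):
    """Count bright objects: threshold, label connected components, drop specks."""
    mask = img > threshold
    labels, n = ndi.label(mask)
    if n == 0:
        return 0
    # Summing the boolean mask per label gives each component's pixel area
    sizes = ndi.sum(mask, labels, index=np.arange(1, n + 1))
    return int((sizes >= min_area).sum())

# Synthetic test image: two bright disks on a dark background
yy, xx = np.mgrid[0:100, 0:100]
img = (((yy - 30) ** 2 + (xx - 30) ** 2 < 100) |
       ((yy - 70) ** 2 + (xx - 70) ** 2 < 100)).astype(float)
print(count_objects(img, threshold=0.5))  # → 2
```

The `min_area` filter mirrors the `--min-area` flag on scripts/segment_cells.py: it discards segmentation specks below a plausible cell size.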
Use scikit-image when:
Use OpenCV when:
Both work for:
See references/image_processing.md, "Library Selection Guide".
问题类型:"面积最大的基因型的平均圆形度是多少?"
数据:包含 Genotype、Area、Circularity 列的 CSV
工作流:
请参阅:references/segmentation.md "菌落形态计量分析"
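This workflow reduces to a groupby plus idxmax. A sketch with a hypothetical table shaped like the CSV described above (genotype names and values are invented):

```python
import pandas as pd

# Hypothetical measurements; the real file has the same three columns
df = pd.DataFrame({
    "Genotype":    ["WT", "WT", "mutA", "mutA"],
    "Area":        [120.0, 130.0, 200.0, 210.0],
    "Circularity": [0.90, 0.88, 0.75, 0.77],
})

largest = df.groupby("Genotype")["Area"].mean().idxmax()          # genotype with largest mean area
answer = df.loc[df["Genotype"] == largest, "Circularity"].mean()  # its mean circularity
print(largest, round(answer, 3))  # → mutA 0.76
```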
问题类型:"条件间 NeuN 计数的 Cohen's d 是多少?"
数据:包含 Condition、NeuN_count、Sex、Hemisphere 列的 CSV
工作流:
请参阅:references/statistical_analysis.md "效应量计算"
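A sketch of the pooled-SD Cohen's d named in the quick-reference table (the exact formula in references/statistical_analysis.md may differ, e.g. if it applies the Hedges' g small-sample correction):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (ddof=1 per group)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

print(cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # ≈ -0.632
```

Note the sign convention: the result is negative when the first group's mean is smaller, so report |d| or fix the group order to match the question.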
问题类型:"Dunnett 检验:有多少个比例与对照等效?"
数据:包含多个共培养比例、Area、Circularity 的 CSV
工作流:
请参阅:references/statistical_analysis.md "Dunnett 检验"
问题类型:"自然样条模型的峰值频率是多少?"
数据:包含共培养频率和 Area 测量的 CSV
工作流:
请参阅:references/statistical_analysis.md "回归建模"
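A sketch of the spline-peak workflow on synthetic frequency/area data. The formula interface keeps patsy's cr() knots consistent between fit and prediction; df=4 is an assumption here and should be matched to the R model in the question:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
freq = np.linspace(0.0, 1.0, 60)
area = -(freq - 0.4) ** 2 + rng.normal(scale=0.01, size=freq.size)  # true peak at 0.4
df = pd.DataFrame({"freq": freq, "area": area})

# Natural cubic spline via patsy's cr(); the formula API remembers the fitted
# knots, so predict() on new data reuses the same basis
fit = ols("area ~ cr(freq, df=4)", data=df).fit()

# Locate the fitted peak on a fine grid
grid = pd.DataFrame({"freq": np.linspace(0.0, 1.0, 501)})
pred = np.asarray(fit.predict(grid))
peak_freq = float(grid["freq"].iloc[int(np.argmax(pred))])
print(round(peak_freq, 2))
```

Building the spline basis with a fresh dmatrix() call on the prediction grid is a common pitfall, since patsy would then re-estimate knots from the grid; going through the formula API (or patsy's build_design_matrices) avoids it.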
| Task | Primary tool | Reference |
|---|---|---|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |
Ready-to-use scripts in the scripts/ directory:
Usage:
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50
# Batch process folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
For complete implementations and protocols, see:
Some BixBench questions use R for the analysis. Python equivalents:
- Dunnett's test (multcomp::glht) → scipy.stats.dunnett() (scipy ≥ 1.10)
- Natural spline (ns(x, df=4)) → patsy.cr(x, knots=...) with explicit quantile knots
- t-test (t.test()) → scipy.stats.ttest_ind()
- ANOVA (aov()) → statsmodels.formula.api.ols() + sm.stats.anova_lm()

See references/statistical_analysis.md for exact parameter matching.
BixBench expects specific answer formats:
Round to the nearest thousand with int(round(val, -3)).

Before returning your answer, verify:
Weekly installs: 220
Repository: mims-harvard/tooluniverse
GitHub stars: 1.2K
First seen: Feb 19, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (217), gemini-cli (216), opencode (216), github-copilot (215), cursor (213), kimi-cli (212)