statistical-analysis by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill statistical-analysis
Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting. Apply this skill for academic research.
This skill should be used when selecting a statistical test, checking assumptions, running an analysis, or reporting results.
Use this decision tree to determine your analysis path:
START
│
├─ Need to SELECT a statistical test?
│ └─ YES → See "Test Selection Guide"
│ └─ NO → Continue
│
├─ Ready to check ASSUMPTIONS?
│ └─ YES → See "Assumption Checking"
│ └─ NO → Continue
│
├─ Ready to run ANALYSIS?
│ └─ YES → See "Running Statistical Tests"
│ └─ NO → Continue
│
└─ Need to REPORT results?
└─ YES → See "Reporting Results"
Use references/test_selection_guide.md for comprehensive guidance. Quick reference:
Comparing Two Groups: independent-samples t-test (or paired t-test for repeated measures); Mann-Whitney U or Wilcoxon signed-rank when normality is violated.
Comparing 3+ Groups: one-way ANOVA (or repeated-measures ANOVA); Kruskal-Wallis when assumptions fail.
Relationships: Pearson correlation (linear), Spearman correlation (monotonic), linear regression for prediction.
Bayesian Alternatives: All tests have Bayesian versions that provide posterior distributions, credible intervals, and direct probability statements about hypotheses. See references/bayesian_statistics.md.
ALWAYS check assumptions before interpreting test results.
Use the provided scripts/assumption_checks.py module for automated checking:
from scripts.assumption_checks import comprehensive_assumption_check
# Comprehensive check with visualizations
results = comprehensive_assumption_check(
data=df,
value_col='score',
group_col='group', # Optional: for group comparisons
alpha=0.05
)
This performs normality checks (overall and per group), Levene's test for homogeneity of variance, outlier detection, and diagnostic visualizations.
For targeted checks, use individual functions:
from scripts.assumption_checks import (
check_normality,
check_normality_per_group,
check_homogeneity_of_variance,
check_linearity,
detect_outliers
)
# Example: Check normality with visualization
result = check_normality(
data=df['score'],
name='Test Score',
alpha=0.05,
plot=True
)
print(result['interpretation'])
print(result['recommendation'])
Normality violated: use a non-parametric alternative (Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis), apply a transformation (e.g., log or square root), or rely on robustness to mild violations in large samples.
Homogeneity of variance violated: use Welch's t-test or Welch's ANOVA (pingouin's correction='auto' applies the Welch correction for t-tests).
Linearity violated (regression): transform variables, add polynomial terms, or fit a non-linear model.
See references/assumptions_and_diagnostics.md for comprehensive guidance.
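As a concrete sketch of the standard fallbacks when normality or equal-variance assumptions fail, using scipy alone (the skewed data below are simulated for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed data where normality is likely violated
rng = np.random.default_rng(42)
group_a = rng.exponential(scale=2.0, size=40)
group_b = rng.exponential(scale=3.0, size=40)

# Non-parametric alternative to the independent t-test
u_stat, p_mwu = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

# Welch's t-test: does not assume equal variances
t_stat, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_mwu:.3f}")
print(f"Welch's t = {t_stat:.2f}, p = {p_welch:.3f}")
```

Running both is a useful robustness check: if the parametric and rank-based tests agree, the conclusion does not hinge on the normality assumption.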
Primary libraries for statistical analysis:
import pingouin as pg
import numpy as np
# Run independent t-test
result = pg.ttest(group_a, group_b, correction='auto')
# Extract results
t_stat = result['T'].values[0]
df = result['dof'].values[0]
p_value = result['p-val'].values[0]
cohens_d = result['cohen-d'].values[0]
ci_lower = result['CI95%'].values[0][0]
ci_upper = result['CI95%'].values[0][1]
# Report
print(f"t({df:.0f}) = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
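If pingouin is unavailable, the same Welch-corrected test can be cross-checked with scipy alone (the groups below are simulated stand-ins for the `group_a`/`group_b` arrays above):

```python
import numpy as np
from scipy import stats

# Simulated groups (hypothetical data for illustration)
rng = np.random.default_rng(5)
group_a = rng.normal(75, 8, 48)
group_b = rng.normal(68, 9, 52)

# equal_var=False requests Welch's t-test, analogous to what
# pingouin's correction='auto' applies when variances differ
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```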
import pingouin as pg
# One-way ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(aov)
# If significant, conduct post-hoc tests
if aov['p-unc'].values[0] < 0.05:
    posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
    print(posthoc)
# Effect size
eta_squared = aov['np2'].values[0] # Partial eta-squared
print(f"Partial η² = {eta_squared:.3f}")
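When pingouin is not installed, or when normality fails, scipy offers both the parametric test and a rank-based fallback (the three conditions below are simulated for illustration):

```python
import numpy as np
from scipy import stats

# Simulated scores for three hypothetical conditions
rng = np.random.default_rng(0)
g1 = rng.normal(70, 8, 50)
g2 = rng.normal(72, 8, 50)
g3 = rng.normal(78, 8, 50)

# One-way ANOVA cross-check with scipy
f_stat, p_anova = stats.f_oneway(g1, g2, g3)

# Kruskal-Wallis: rank-based alternative when normality is violated
h_stat, p_kw = stats.kruskal(g1, g2, g3)

print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")
```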
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Fit model
X = sm.add_constant(X_predictors) # Add intercept
model = sm.OLS(y, X).fit()
# Summary
print(model.summary())
# Check multicollinearity (VIF)
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)
# Check assumptions
residuals = model.resid
fitted = model.fittedvalues
# Residual plots
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Residuals vs fitted
axes[0, 0].scatter(fitted, residuals, alpha=0.6)
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_xlabel('Fitted values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted')
# Q-Q plot
from scipy import stats
stats.probplot(residuals, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Normal Q-Q')
# Scale-Location
axes[1, 0].scatter(fitted, np.sqrt(np.abs(residuals / residuals.std())), alpha=0.6)
axes[1, 0].set_xlabel('Fitted values')
axes[1, 0].set_ylabel('√|Standardized residuals|')
axes[1, 0].set_title('Scale-Location')
# Residuals histogram
axes[1, 1].hist(residuals, bins=20, edgecolor='black', alpha=0.7)
axes[1, 1].set_xlabel('Residuals')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Histogram of Residuals')
plt.tight_layout()
plt.show()
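The VIF used above has a simple definition: regress each predictor on the others and compute VIF = 1 / (1 − R²). A numpy-only sketch with two deliberately correlated, hypothetical predictors:

```python
import numpy as np

# Two correlated predictors (hypothetical data)
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)  # correlated with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 2 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# OLS via least squares (statsmodels' sm.OLS does this plus inference)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def vif(X_pred, j):
    """VIF for predictor j: regress x_j on the others, return 1 / (1 - R^2)."""
    others = np.delete(X_pred, j, axis=1)
    others = np.column_stack([np.ones(len(X_pred)), others])
    coef, *_ = np.linalg.lstsq(others, X_pred[:, j], rcond=None)
    resid = X_pred[:, j] - others @ coef
    r2 = 1 - resid.var() / X_pred[:, j].var()
    return 1.0 / (1.0 - r2)

preds = np.column_stack([x1, x2])
print("betas:", np.round(beta, 2))
print("VIFs:", [round(vif(preds, j), 2) for j in range(2)])
```

With a true correlation of about 0.8 between the predictors, the VIFs land near 1 / (1 − 0.64) ≈ 2.8, below the common concern threshold of 5–10.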
import pymc as pm
import arviz as az
import numpy as np
with pm.Model() as model:
    # Priors
    mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
    mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)
    # Likelihood
    y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
    y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)
    # Derived quantity
    diff = pm.Deterministic('difference', mu1 - mu2)
    # Sample
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Summarize
print(az.summary(trace, var_names=['difference']))
# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['difference'].values > 0)
print(f"P(μ₁ > μ₂ | data) = {prob_greater:.3f}")
# Plot posterior
az.plot_posterior(trace, var_names=['difference'], ref_val=0)
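If PyMC is not installed, a rough numpy-only approximation of the same posterior comparison is possible under a flat prior and a normal approximation to each group mean. This is a sketch on simulated data, not a substitute for full MCMC:

```python
import numpy as np

# Hypothetical groups (simulated for illustration)
rng = np.random.default_rng(3)
group_a = rng.normal(75, 8, 48)
group_b = rng.normal(68, 9, 52)

# Under a flat prior, each mean's posterior is approximately
# Normal(sample mean, s^2 / n); draw from both and compare
draws = 100_000
post_a = rng.normal(group_a.mean(), group_a.std(ddof=1) / np.sqrt(len(group_a)), draws)
post_b = rng.normal(group_b.mean(), group_b.std(ddof=1) / np.sqrt(len(group_b)), draws)
diff = post_a - post_b

prob = np.mean(diff > 0)
lo, hi = np.percentile(diff, [2.5, 97.5])
print(f"P(mu_a > mu_b | data) ~ {prob:.3f}")
print(f"95% credible interval ~ [{lo:.2f}, {hi:.2f}]")
```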
Effect sizes quantify magnitude, while p-values only indicate existence of an effect.
See references/effect_sizes_and_power.md for comprehensive guidance.
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η²_p | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramér's V | 0.07 | 0.21 | 0.35 |
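The table's last row lists Cramér's V, which can be derived from a chi-square statistic as sqrt(chi2 / (n * k)) with k = min(rows, cols) − 1. A scipy sketch using a made-up contingency table:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table of counts
table = np.array([[30, 20, 10],
                  [15, 25, 30]])
chi2, p, dof, expected = stats.chi2_contingency(table)

# Cramér's V from chi-square
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}, Cramér's V = {cramers_v:.2f}")
```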
Important: Benchmarks are guidelines. Context matters!
Most effect sizes are automatically calculated by pingouin:
# T-test returns Cohen's d
result = pg.ttest(x, y)
d = result['cohen-d'].values[0]
# ANOVA returns partial eta-squared
aov = pg.anova(dv='score', between='group', data=df)
eta_p2 = aov['np2'].values[0]
# Correlation: r is already an effect size
corr = pg.corr(x, y)
r = corr['r'].values[0]
Always report CIs to show precision:
from pingouin import compute_effsize_from_t, compute_esci
# For a t-test: compute_effsize_from_t returns only the effect size (a float);
# the confidence interval is computed separately with compute_esci
d = compute_effsize_from_t(
    t_statistic,
    nx=len(group1),
    ny=len(group2),
    eftype='cohen'
)
ci = compute_esci(stat=d, nx=len(group1), ny=len(group2), eftype='cohen', confidence=0.95)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
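If the pingouin helpers are unavailable, a percentile-bootstrap CI for Cohen's d can be sketched from raw data with numpy alone (groups simulated for illustration):

```python
import numpy as np

# Hypothetical groups (simulated for illustration)
rng = np.random.default_rng(7)
group1 = rng.normal(75, 8, 48)
group2 = rng.normal(68, 9, 52)

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    pooled = np.sqrt(((len(x) - 1) * x.var(ddof=1) + (len(y) - 1) * y.var(ddof=1))
                     / (len(x) + len(y) - 2))
    return (x.mean() - y.mean()) / pooled

# Percentile bootstrap: resample each group with replacement, recompute d
boot = np.array([
    cohens_d(rng.choice(group1, len(group1)), rng.choice(group2, len(group2)))
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {cohens_d(group1, group2):.2f}, bootstrap 95% CI [{lo:.2f}, {hi:.2f}]")
```

The percentile bootstrap makes no normality assumption about the sampling distribution of d, at the cost of Monte Carlo noise in the interval endpoints.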
Determine required sample size before data collection:
from statsmodels.stats.power import (
tt_ind_solve_power,
FTestAnovaPower
)
# T-test: What n is needed to detect d = 0.5?
n_required = tt_ind_solve_power(
effect_size=0.5,
alpha=0.05,
power=0.80,
ratio=1.0,
alternative='two-sided'
)
print(f"Required n per group: {n_required:.0f}")
# ANOVA: What n is needed to detect f = 0.25?
anova_power = FTestAnovaPower()
n_per_group = anova_power.solve_power(
effect_size=0.25,
ngroups=3,
alpha=0.05,
power=0.80
)
print(f"Required n per group: {n_per_group:.0f}")
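As a cross-check of the statsmodels result, two-sided t-test power can be computed directly from the noncentral t distribution with scipy (a sketch; `ttest_power` is a hypothetical helper name):

```python
import numpy as np
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided independent t-test via the noncentral t distribution."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)           # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-sided critical value
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# d = 0.5 with n = 64 per group is the classic ~80% power configuration
print(f"power = {ttest_power(0.5, 64):.3f}")
```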
Determine what effect size you could detect:
# With n=50 per group, what effect could we detect?
detectable_d = tt_ind_solve_power(
effect_size=None, # Solve for this
nobs1=50,
alpha=0.05,
power=0.80,
ratio=1.0,
alternative='two-sided'
)
print(f"Study could detect d ≥ {detectable_d:.2f}")
Note: Post-hoc power analysis (calculating power after the study from the observed effect size) is generally not recommended. Use a sensitivity analysis instead.
See references/effect_sizes_and_power.md for detailed guidance.
Follow guidelines in references/reporting_standards.md.
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk:
Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity
of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.
A one-way ANOVA revealed a significant main effect of treatment condition
on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc
comparisons using Tukey's HSD indicated that Condition A (M = 78.2,
SD = 7.3) scored significantly higher than Condition B (M = 71.5,
SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9,
p < .001, d = 1.07). Conditions B and C did not differ significantly
(p = .52, d = 0.18).
Multiple linear regression was conducted to predict exam scores from
study hours, prior GPA, and attendance. The overall model was significant,
F(3, 146) = 45.2, p < .001, R² = .48, adjusted R² = .47. Study hours
(B = 1.80, SE = 0.31, β = .35, t = 5.78, p < .001, 95% CI [1.18, 2.42])
and prior GPA (B = 8.52, SE = 1.95, β = .28, t = 4.37, p < .001,
95% CI [4.66, 12.38]) were significant predictors, while attendance was
not (B = 0.15, SE = 0.12, β = .08, t = 1.25, p = .21, 95% CI [-0.09, 0.39]).
Multicollinearity was not a concern (all VIF < 1.5).
A Bayesian independent samples t-test was conducted using weakly
informative priors (Normal(0, 1) for mean difference). The posterior
distribution indicated that Group A scored higher than Group B
(M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor
BF₁₀ = 45.3 provided very strong evidence for a difference between
groups, with a 99.8% posterior probability that Group A's mean exceeded
Group B's mean. Convergence diagnostics were satisfactory (all R̂ < 1.01,
ESS > 1000).
Consider Bayesian approaches when sample sizes are small, prior information is available, or you want direct probability statements about hypotheses (e.g., P(μ₁ > μ₂ | data)).
See references/bayesian_statistics.md for comprehensive guidance.
This skill includes comprehensive reference materials:
- comprehensive_assumption_check(): Complete workflow
- check_normality(): Normality testing with Q-Q plots
- check_homogeneity_of_variance(): Levene's test with box plots
- check_linearity(): Regression linearity checks
- detect_outliers(): IQR and z-score outlier detection
When beginning a statistical analysis:
For questions about:
Key textbooks:
Online resources:
Weekly Installs: 225
Repository: davila7/claude-code-templates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (176), claude-code (175), gemini-cli (161), codex (152), cursor (149), github-copilot (132)