A/B Test Analysis Tool - Evaluate experiment results with statistical methods to improve product decisions

ab-test-analysis by phuryn/pm-skills

227 weekly installs

8,100 GitHub Stars

Install command

npx skills add https://github.com/phuryn/pm-skills --skill ab-test-analysis

Data analysis, testing, product management

A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

Context

You are analyzing A/B test results for $ARGUMENTS.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
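When the data arrives as row-level CSV, the first step is usually to reduce it to users and conversions per group. The following is a minimal sketch using only the standard library; the column names `group` and `converted` are assumptions for illustration and will differ between analytics exports.

```python
import csv
import io
from collections import defaultdict


def summarize_groups(csv_file, group_col="group", converted_col="converted"):
    """Count users and conversions per experiment group from row-level data.

    Assumes one row per user and a 0/1 conversion flag (hypothetical schema).
    """
    counts = defaultdict(lambda: {"users": 0, "conversions": 0})
    for row in csv.DictReader(csv_file):
        group = row[group_col].strip()
        counts[group]["users"] += 1
        counts[group]["conversions"] += int(row[converted_col])
    return dict(counts)


# Tiny in-memory sample standing in for a real export file.
sample = io.StringIO(
    "user_id,group,converted\n"
    "1,control,0\n"
    "2,control,1\n"
    "3,variant,1\n"
    "4,variant,1\n"
)
summary = summarize_groups(sample)
for name, c in summary.items():
    print(name, c["users"], c["conversions"], c["conversions"] / c["users"])
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.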

Instructions

Understand the experiment:
- What was the hypothesis?
- What was changed (the variant)?
- What is the primary metric? Any guardrail metrics?
- How long did the test run?
- What is the traffic split?
Validate the test setup:
- Sample size: Is the sample large enough to detect the expected effect size?
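  - Use the formula (per group): n = ((Z_{α/2} + Z_β)² × 2 × p × (1 − p)) / MDE², where p is the baseline conversion rate, MDE is the absolute minimum detectable effect, and Z_β ≈ 0.84 for 80% power
  - Flag the test if it is underpowered (<80% power)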
- Duration: Did the test run for at least 1-2 full business cycles?
- Randomization: Is there any evidence of sample ratio mismatch (SRM)?
- Novelty/primacy effects: Was there enough time to wash out initial behavior changes?
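The sample-size and SRM checks above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the sample-size function uses the common normal-approximation formula with an explicit power term, (z_{α/2} + z_β)², and treats the MDE as an absolute difference in conversion rate; the SRM check is a chi-squared goodness-of-fit test with one degree of freedom against the planned split. Function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided test on proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """p-value of a 1-degree-of-freedom chi-squared test for sample ratio mismatch."""
    total = n_control + n_variant
    exp_c = total * expected_control_share
    exp_v = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_variant - exp_v) ** 2 / exp_v
    # With 1 degree of freedom, chi2 is the square of a standard normal,
    # so P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


# Illustrative inputs: 10% baseline, 2-point absolute MDE, planned 50/50 split.
n_needed = required_sample_size(0.10, 0.02)
srm_p = srm_p_value(5000, 5300)
print(f"users needed per group: {n_needed}")
print(f"SRM p-value: {srm_p:.4f}")
```

A very small SRM p-value suggests the observed split deviates from the planned one, in which case the randomization should be investigated before the results are trusted. The cutoff to use is a team-level choice and is not specified here.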
Calculate statistical significance:
- Conversion rate for control and variant
- Relative lift: (variant - control) / control × 100
- p-value: Using a two-tailed z-test or chi-squared test
- Confidence interval: 95% CI for the difference
- Statistical significance: Is p < 0.05?
- Practical significance: Is the lift meaningful for the business?

If the user provides raw data, generate and run a Python script to calculate these.
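As one possible shape for such a script, the sketch below computes the conversion rates, relative lift, two-tailed z-test p-value, and 95% confidence interval for the absolute difference. It assumes a binary conversion metric and large enough samples for the normal approximation; it uses the pooled standard error for the test statistic and the unpooled standard error for the interval. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-tailed z-test for the difference between two conversion rates."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c

    # Pooled standard error under the null hypothesis of equal rates.
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval of the difference.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "lift_pct": diff / p_c * 100,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "significant": p_value < alpha,
    }


# Illustrative counts: 500/5000 control conversions vs 575/5000 variant.
result = two_proportion_ztest(500, 5000, 575, 5000)
print(f"lift: {result['lift_pct']:.1f}%  p-value: {result['p_value']:.4f}")
print(f"95% CI for the difference: {result['ci'][0]:.4f} to {result['ci'][1]:.4f}")
```

Statistical significance alone does not settle the decision; the interval bounds are what to compare against the smallest lift that matters for the business.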

Check guardrail metrics:
- Did any guardrail metrics (revenue, engagement, page load time) degrade?
- A winning primary metric with degraded guardrails may not be a true win
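A simple way to screen guardrails is to compute each metric's relative change and flag those that moved in the harmful direction by more than a tolerated amount. The sketch below is an assumption-laden illustration: the 1% tolerance, the field names, and the example values are all hypothetical, and the check does not test significance, so a flagged guardrail should still be examined with a proper test.

```python
def guardrail_report(guardrails, max_degradation_pct=1.0):
    """Flag guardrail metrics whose relative change is harmful beyond a tolerance."""
    report = []
    for g in guardrails:
        change_pct = (g["variant"] - g["control"]) / g["control"] * 100
        # A drop is harmful when higher is better; a rise is harmful otherwise.
        harm_pct = -change_pct if g["higher_is_better"] else change_pct
        report.append({
            "name": g["name"],
            "change_pct": change_pct,
            "degraded": harm_pct > max_degradation_pct,
        })
    return report


# Illustrative values only.
guardrails = [
    {"name": "revenue_per_user", "control": 4.20, "variant": 4.18, "higher_is_better": True},
    {"name": "page_load_ms", "control": 1200, "variant": 1290, "higher_is_better": False},
]
checks = guardrail_report(guardrails)
for row in checks:
    status = "DEGRADED" if row["degraded"] else "ok"
    print(f"{row['name']}: {row['change_pct']:+.2f}% ({status})")
```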
Interpret results:

| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
| Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
| Not significant, positive trend | Extend the test: need more data or larger effect |
| Not significant, flat | Stop the test: no meaningful difference detected |
| Significant negative lift | Don't ship: revert to control, analyze why |
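The decision table can be expressed as a small function. This is a sketch under assumptions: the `flat_band_pct` threshold that separates a "positive trend" from "flat" is hypothetical, since the table does not define one, and a non-significant negative trend (also not covered by the table) falls through to "Stop the test" here.

```python
def recommend(p_value, lift_pct, guardrails_ok, alpha=0.05, flat_band_pct=0.5):
    """Map test results to a recommendation following the decision table."""
    significant = p_value < alpha
    if significant and lift_pct > 0:
        return "Ship it" if guardrails_ok else "Investigate"
    if significant and lift_pct < 0:
        return "Don't ship"
    # Not significant: separate a positive trend from a flat result.
    if lift_pct > flat_band_pct:
        return "Extend the test"
    return "Stop the test"


print(recommend(p_value=0.015, lift_pct=15.0, guardrails_ok=True))
print(recommend(p_value=0.015, lift_pct=15.0, guardrails_ok=False))
print(recommend(p_value=0.20, lift_pct=3.0, guardrails_ok=True))
```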

Provide the analysis summary:

## A/B Test Results: [Test Name]

**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]

| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |

**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]

Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
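To save the summary as markdown, the template can be filled programmatically. The sketch below is one possible rendering, with a hypothetical function name, a hypothetical output file name, and made-up example values; it writes to the system temp directory only to keep the example side-effect free.

```python
import tempfile
from pathlib import Path


def render_summary(test_name, hypothesis, days, n_control, n_variant,
                   rows, recommendation, reasoning, next_steps, alpha=0.05):
    """Fill the analysis summary template and return it as a markdown string."""
    lines = [
        f"## A/B Test Results: {test_name}",
        "",
        f"**Hypothesis**: {hypothesis}",
        f"**Duration**: {days} days | **Sample**: {n_control} control / {n_variant} variant",
        "",
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
    ]
    for r in rows:
        sig = "Yes" if r["p_value"] < alpha else "No"
        lines.append(
            f"| {r['metric']} | {r['control']:.2%} | {r['variant']:.2%} "
            f"| {r['lift_pct']:+.1f}% | {r['p_value']:.3f} | {sig} |"
        )
    lines += [
        "",
        f"**Recommendation**: {recommendation}",
        f"**Reasoning**: {reasoning}",
        f"**Next steps**: {next_steps}",
    ]
    return "\n".join(lines) + "\n"


# Made-up values purely for illustration.
report = render_summary(
    test_name="Checkout button",
    hypothesis="A larger button increases checkout conversion",
    days=14, n_control=5000, n_variant=5000,
    rows=[{"metric": "Conversion", "control": 0.10, "variant": 0.115,
           "lift_pct": 15.0, "p_value": 0.016}],
    recommendation="Ship",
    reasoning="Significant positive lift with no guardrail degradation",
    next_steps="Roll out to 100% and monitor guardrails",
)
out_path = Path(tempfile.gettempdir()) / "ab_test_results.md"
out_path.write_text(report, encoding="utf-8")
print(report)
```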
