statistical-analysis by anthropics/knowledge-work-plugins
npx skills add https://github.com/anthropics/knowledge-work-plugins --skill statistical-analysis
Descriptive statistics, trend analysis, outlier detection, hypothesis testing, and guidance on when to be cautious about statistical claims.
Choose the right measure of center based on the data:
| Situation | Use | Why |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Most efficient estimator |
| Skewed distribution | Median | Robust to outliers |
| Categorical or ordinal data | Mode | Only option for non-numeric |
| Highly skewed with outliers (e.g., revenue per user) | Median + mean | Report both; the gap shows skew |
Always report mean and median together for business metrics. If they diverge significantly, the data is skewed and the mean alone is misleading.
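To make the mean/median check concrete, here is a minimal sketch (not from the skill itself) using a hypothetical revenue-per-user sample:

```python
import numpy as np

# Hypothetical revenue-per-user sample: many small spenders plus one whale
revenue = np.array([5, 7, 8, 9, 10, 11, 12, 14, 15, 500], dtype=float)

print(f"mean   = {revenue.mean():.1f}")      # 59.1 — pulled up by the $500 outlier
print(f"median = {np.median(revenue):.1f}")  # 10.5 — the typical user

# A large mean/median gap is the skew signal described above.
```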
Report key percentiles to tell a richer story than mean alone:
p1: Bottom 1% (floor / minimum typical value)
p5: Low end of normal range
p25: First quartile
p50: Median (typical user)
p75: Third quartile
p90: Top 10% / power users
p95: High end of normal range
p99: Top 1% / extreme users
Example narrative: "The median session duration is 4.2 minutes, but the top 10% of users spend over 22 minutes per session, pulling the mean up to 7.8 minutes."
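As an illustration, the percentile table can be computed with NumPy; the session-duration sample below is synthetic, not the skill's data:

```python
import numpy as np

# Synthetic right-skewed session durations (minutes), for illustration only
durations = np.random.default_rng(0).lognormal(mean=1.4, sigma=0.8, size=10_000)

for p in [1, 5, 25, 50, 75, 90, 95, 99]:
    print(f"p{p:<2} = {np.percentile(durations, p):6.1f} min")
```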
Characterize every numeric distribution you analyze:
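The checklist for this step did not survive extraction; as a hedged sketch (an assumption, not necessarily the skill's exact list), pandas' `describe()` plus skewness and kurtosis cover the usual characterization:

```python
import numpy as np
import pandas as pd

# Hypothetical metric; in practice substitute your own df['value']
values = pd.Series(np.random.default_rng(1).exponential(scale=2.0, size=5_000))

print(values.describe())           # count, mean, std, min, quartiles, max
print("skewness:", values.skew())  # > 0 means right-skewed (long upper tail)
print("kurtosis:", values.kurt())  # > 0 means heavier tails than normal
```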
Moving averages to smooth noise:
```python
# Assumes df is a pandas DataFrame with a 'metric' column, one row per day

# 7-day moving average (good for daily data with weekly seasonality)
df['ma_7d'] = df['metric'].rolling(window=7, min_periods=1).mean()

# 28-day moving average (smooths weekly AND monthly patterns)
df['ma_28d'] = df['metric'].rolling(window=28, min_periods=1).mean()
```
Period-over-period comparison:
Growth rates:
Simple growth: (current - previous) / previous
CAGR: (ending / beginning) ^ (1 / years) - 1
Log growth: ln(current / previous) -- better for volatile series
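A minimal sketch of the three formulas above, with hypothetical revenue figures:

```python
import math

def simple_growth(current, previous):
    return (current - previous) / previous

def cagr(ending, beginning, years):
    return (ending / beginning) ** (1 / years) - 1

def log_growth(current, previous):
    return math.log(current / previous)

# Hypothetical figures for illustration
print(simple_growth(120, 100))  # 0.20 → 20% growth
print(cagr(200, 100, 3))        # ~0.26 → ~26%/yr to double in 3 years
print(log_growth(120, 100))     # ~0.182 → log growth sums cleanly across periods
```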
Check for periodic patterns:
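One straightforward check (an illustrative sketch with synthetic data): average the metric by day of week and look for a weekday/weekend gap:

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a built-in weekday effect
rng = np.random.default_rng(2)
dates = pd.date_range("2025-01-01", periods=120, freq="D")
metric = 100 + 20 * (dates.dayofweek < 5) + rng.normal(0, 5, size=120)
df = pd.DataFrame({"metric": metric}, index=dates)

# Mean by day of week (0=Mon .. 6=Sun); a visible weekday/weekend gap
# suggests weekly seasonality worth smoothing or modeling
by_dow = df.groupby(df.index.dayofweek)["metric"].mean()
print(by_dow.round(1))
```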
For business analysts (not data scientists), use straightforward methods:
Always communicate uncertainty. Provide a range, not a point estimate:
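For example, a t-based confidence interval around a sample mean gives a defensible range; this sketch uses synthetic observations and assumes SciPy is available:

```python
import numpy as np
from scipy import stats

# Synthetic daily observations; substitute your own sample in practice
rng = np.random.default_rng(3)
sample = rng.normal(loc=50, scale=8, size=60)

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

print(f"estimate: {mean:.1f} (95% CI: {lo:.1f} to {hi:.1f})")
```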
When to escalate to a data scientist: Non-linear trends, multiple seasonalities, external factors (marketing spend, holidays), or when forecast accuracy matters for resource allocation.
Z-score method (for normally distributed data):
```python
z_scores = (df['value'] - df['value'].mean()) / df['value'].std()
outliers = df[abs(z_scores) > 3]  # more than 3 standard deviations
```
IQR method (robust to non-normal distributions):
```python
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['value'] < lower_bound) | (df['value'] > upper_bound)]
```
Percentile method (simplest):
```python
outliers = df[(df['value'] < df['value'].quantile(0.01)) |
              (df['value'] > df['value'].quantile(0.99))]
```
Do NOT automatically remove outliers. Instead:
Report what you did: "We excluded 47 records (0.3%) with transaction amounts >$50K, which represent bulk enterprise orders analyzed separately."
For detecting unusual values in a time series:
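The specific techniques listed here did not survive extraction; one common approach (an assumption, not necessarily the skill's) is a rolling z-score against a trailing window:

```python
import numpy as np
import pandas as pd

# Synthetic daily metric with one injected spike at position 60
rng = np.random.default_rng(4)
s = pd.Series(rng.normal(100, 5, size=90))
s.iloc[60] = 160  # the anomaly we want to catch

# Rolling z-score: compare each point to the trailing 14-day window
# (shift(1) excludes the current point from its own baseline)
roll_mean = s.rolling(window=14, min_periods=14).mean().shift(1)
roll_std = s.rolling(window=14, min_periods=14).std().shift(1)
z = (s - roll_mean) / roll_std

anomalies = s[z.abs() > 3]
print(anomalies)
```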
Use hypothesis testing when you need to determine whether an observed difference is likely real or could be due to random chance. Common scenarios:
| Scenario | Test | When to Use |
|---|---|---|
| Compare two group means | t-test (independent) | Normal data, two groups |
| Compare two group proportions | z-test for proportions | Conversion rates, binary outcomes |
| Compare paired measurements | Paired t-test | Before/after on same entities |
| Compare 3+ group means | ANOVA | Multiple segments or variants |
| Non-normal data, two groups | Mann-Whitney U test | Skewed metrics, ordinal data |
| Association between categories | Chi-squared test | Two categorical variables |
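To illustrate the first row of the table, a two-group comparison with SciPy (synthetic data; Welch's variant shown because it does not assume equal variances):

```python
import numpy as np
from scipy import stats

# Synthetic A/B samples (e.g., session minutes per user)
rng = np.random.default_rng(5)
control = rng.normal(loc=4.0, scale=1.5, size=500)
variant = rng.normal(loc=4.3, scale=1.5, size=500)

# Welch's t-test (equal_var=False) is a safer default than the classic t-test
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```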
Statistical significance means the difference is unlikely due to chance.
Practical significance means the difference is large enough to matter for business decisions.
A difference can be statistically significant but practically meaningless (common with large samples). Always report:
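A sketch of the large-sample trap, reporting an effect size (Cohen's d) alongside the p-value; the data is synthetic:

```python
import numpy as np
from scipy import stats

# Hypothetical: huge samples make a tiny difference "statistically significant"
rng = np.random.default_rng(6)
a = rng.normal(100.0, 10.0, size=200_000)
b = rng.normal(100.3, 10.0, size=200_000)

t_stat, p_value = stats.ttest_ind(b, a, equal_var=False)

# Cohen's d: difference in means relative to the pooled spread
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd

print(f"p = {p_value:.2e} (significant), Cohen's d = {d:.3f} (tiny effect)")
```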
When you find a correlation, explicitly consider:
What you can say: "Users who use feature X have 30% higher retention"
What you cannot say without more evidence: "Feature X causes 30% higher retention"
When you test many hypotheses, some will be "significant" by chance:
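A quick arithmetic illustration of why, plus the simplest (Bonferroni) correction:

```python
# Testing k hypotheses at alpha = 0.05: the chance of at least one false
# positive grows quickly; Bonferroni divides alpha by k to compensate.
alpha = 0.05

for k in [1, 5, 20, 100]:
    family_wise = 1 - (1 - alpha) ** k  # P(at least one false positive)
    corrected = alpha / k               # Bonferroni-adjusted threshold
    print(f"k={k:>3}: P(>=1 false positive) = {family_wise:.2f}, "
          f"corrected alpha = {corrected:.4f}")
```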
A trend in aggregated data can reverse when data is segmented:
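A minimal synthetic example of this reversal (Simpson's paradox), where a hypothetical variant B wins inside every segment yet loses in aggregate because it received mostly hard-to-convert mobile traffic:

```python
import pandas as pd

# Hypothetical A/B conversion data, split by device segment
df = pd.DataFrame({
    "variant": ["A", "A", "B", "B"],
    "segment": ["desktop", "mobile", "desktop", "mobile"],
    "users": [800, 200, 200, 800],
    "conversions": [160, 10, 50, 48],
})

# Per-segment rates: B beats A in BOTH segments (0.25 vs 0.20, 0.06 vs 0.05)
print(df.assign(rate=df.conversions / df.users))

# Aggregate rates: A appears to win overall (0.17 vs 0.098)
agg = df.groupby("variant")[["users", "conversions"]].sum()
agg["rate"] = agg.conversions / agg.users
print(agg)
```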
You can only analyze entities that "survived" to be in your dataset:
Aggregate trends may not apply to individuals:
Be wary of false precision:
Weekly Installs: 311
GitHub Stars: 8.8K
First Seen: Jan 31, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (252), codex (241), gemini-cli (235), claude-code (227), github-copilot (224), kimi-cli (207)