postmortem by alirezarezvani/claude-skills
npx skills add https://github.com/alirezarezvani/claude-skills --skill postmortem
Command: /em:postmortem <event>
Not blame. Understanding. The failed deal, the missed quarter, the feature that flopped, the hire that didn't work out. What actually happened, why, and what changes as a result.
Most post-mortems become one of two things:
The blame session — someone gets scapegoated, defensive walls go up, actual causes don't get examined, and the same problem happens again in a different form.
The whitewash — "We learned a lot, we're going to do better, here are 12 vague action items." Nothing changes. Same problem, different quarter.
A real post-mortem is neither. It's a rigorous investigation into a system failure. Not "whose fault was it" but "what conditions made this outcome predictable in hindsight?"
The purpose: extract the maximum learning value from a failure so you can prevent recurrence and improve the system.
Before analysis: describe exactly what happened.
Precision matters. "We missed Q3 revenue" is not precise enough. "We closed $420K in new ARR vs $680K target — a $260K miss driven primarily by three deals that slipped to Q4 and one deal that was lost to a competitor" is precise.
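The arithmetic behind a precise statement like the one above is small enough to sketch. The helper name `describe_miss` and its return shape are illustrative assumptions, not part of the skill; the figures are the ones quoted in the example.

```python
def describe_miss(actual: int, target: int) -> dict:
    """Return the gap to target and the attainment percentage (hypothetical helper)."""
    if target <= 0:
        raise ValueError("target must be positive")
    gap = target - actual
    attainment_pct = round(actual / target * 100, 1)
    return {"gap": gap, "attainment_pct": attainment_pct}


# Figures from the Q3 example: $420K closed against a $680K target.
q3 = describe_miss(actual=420_000, target=680_000)
print(f"Miss: ${q3['gap']:,} ({q3['attainment_pct']}% of target)")
```

Stating both the absolute gap and the attainment rate keeps the event description quantitative before any analysis starts.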
The goal: get from what happened (the symptom) to why it happened (the root cause).
Standard bad 5 Whys:
→ Conclusion: Nothing to do. It's just enterprise.
Real 5 Whys:
→ Root cause: Qualification criteria outdated, no owner, no review process. → Fix: Update criteria, assign owner, add quarterly review.
The test for a good root cause: Could you prevent recurrence with a specific, concrete change? If yes, you've found something real.
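One way to make this test explicit is to record, next to each candidate root cause, the concrete changes it would allow. The sketch below assumes a hypothetical `RootCauseCandidate` type; the check is only a bookkeeping aid and does not replace the team's judgment about whether a change would really prevent recurrence.

```python
from dataclasses import dataclass, field


@dataclass
class RootCauseCandidate:
    """A candidate root cause plus the specific changes it suggests (hypothetical type)."""
    statement: str
    concrete_fixes: list[str] = field(default_factory=list)

    def passes_test(self) -> bool:
        # A real root cause admits at least one specific, concrete change.
        return any(fix.strip() for fix in self.concrete_fixes)


# The two conclusions quoted above, encoded as candidates.
bad = RootCauseCandidate("It's just enterprise.")
good = RootCauseCandidate(
    "Qualification criteria outdated, no owner, no review process",
    ["Update criteria", "Assign owner", "Add quarterly review"],
)
print(bad.passes_test(), good.passes_test())
```

A candidate with no attached fix is a signal to keep asking why, not a conclusion.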
Most events have multiple contributing factors. Not all are root causes.
Contributing factor: Made it worse, but isn't the core reason. If removed, the outcome might have been different — but the same class of problem would recur.
Root cause: The fundamental condition that made the outcome probable. Fix this, and this class of problem doesn't recur.
Example — failed hire:
The distinction matters. If you address only contributing factors, you'll have a different-looking but structurally identical failure next time.
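The two definitions above reduce to a single question per factor: if only this were fixed, would the same class of problem still recur? A minimal sketch of that classification follows. The `Factor` type and its field name are assumptions made for illustration, and the boolean is a judgment the team records, not something the code can infer.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    description: str
    # Team judgment: if only this factor were fixed,
    # would the same class of problem still recur?
    problem_class_recurs_if_fixed: bool


def classify(factor: Factor) -> str:
    """Label a factor using the definitions in the text."""
    if factor.problem_class_recurs_if_fixed:
        return "contributing factor"
    return "root cause"


factors = [
    # Placeholder entry; substitute the factors from your own event.
    Factor("[Factor that made the outcome worse]", True),
    # Taken from the qualification example used elsewhere in this document.
    Factor("Qualification framework not updated, no owner", False),
]
for f in factors:
    print(f"{classify(f)}: {f.description}")
```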
Every failure has precursors. In hindsight, they're obvious. The value of this step is making them obvious prospectively.
Ask:
Common patterns:
This step is particularly important for systemic issues — "we didn't feel safe raising the concern" is a much deeper root cause than "the deal qualification was off."
Some failures happen despite correct decisions. Some happen because of incorrect decisions. Knowing the difference prevents both overcorrection and undercorrection.
For things out of control: what can be done to be more resilient to similar events? For things in control: what specifically needs to change?
Warning: "It was outside our control" is sometimes used to avoid accountability. Be rigorous.
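As a rough sketch, the split can be recorded as a mapping from each factor to whether it was in the team's control, with the matching follow-up question attached. The function name and the placeholder factor names below are hypothetical; the two questions are the ones stated above.

```python
FOLLOW_UP = {
    True: "What specifically needs to change?",
    False: "What can be done to be more resilient to similar events?",
}


def follow_ups(factors: dict[str, bool]) -> list[tuple[str, str]]:
    """Pair each factor with the question that applies to it.

    `factors` maps a factor description to True if it was in control.
    """
    return [(name, FOLLOW_UP[in_control]) for name, in_control in factors.items()]


# Placeholder factors; replace with the ones from your own event.
pairs = follow_ups({"[Factor in our control]": True, "[Factor outside our control]": False})
for name, question in pairs:
    print(f"{name} -> {question}")
```

Marking a factor as outside control should itself be challenged in review, for the reason given in the warning above.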
Every post-mortem ends with a change register — specific commitments, owned and dated.
Bad action items:
Good action items:
For each action:
The most commonly skipped step. Post-mortems are useless if nobody checks whether the changes actually happened and actually worked.
Set a verification date: "We'll review whether qualification criteria have been updated and whether deal slippage rate has improved at the June board meeting."
Without this, post-mortems are theater.
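A change register can be checked mechanically for the properties described above: every action has an owner, a due date, and a way to verify it. The sketch below assumes a hypothetical `ActionItem` type; the example actions, the placeholder owner, and the date are illustrative and not taken from the skill.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    action: str
    owner: Optional[str] = None
    due: Optional[date] = None
    verification: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """List what still prevents this item from being a real commitment."""
        missing = []
        if not self.owner:
            missing.append("owner")
        if self.due is None:
            missing.append("due date")
        if not self.verification:
            missing.append("verification")
        return missing


def overdue(register: list[ActionItem], today: date) -> list[ActionItem]:
    """Items whose due date has passed as of `today` (to raise at the check-in)."""
    return [item for item in register if item.due is not None and item.due < today]


# Hypothetical entries: one vague, one specific.
vague = ActionItem("Improve qualification")
specific = ActionItem(
    "Update qualification criteria",
    owner="[Name]",
    due=date(2025, 6, 1),  # assumed date, for illustration only
    verification="Review criteria and deal slippage rate at the June board meeting",
)
print(vague.missing_fields())
print(specific.missing_fields())
```

Running `overdue(register, today)` on the verification date gives the list of commitments to discuss at the check-in.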
EVENT: [Name and date]
EXPECTED: [What was supposed to happen]
ACTUAL: [What happened]
IMPACT: [Quantified]
TIMELINE
[Date]: [What happened or was visible]
[Date]: ...
5 WHYS
1. [Why did X happen?] → Because [Y]
2. [Why did Y happen?] → Because [Z]
3. [Why did Z happen?] → Because [A]
4. [Why did A happen?] → Because [B]
5. [Why did B happen?] → Because [ROOT CAUSE]
ROOT CAUSE: [One clear sentence]
CONTRIBUTING FACTORS
• [Factor] — how it contributed
• [Factor] — how it contributed
WARNING SIGNS MISSED
• [Signal visible at what date] — why it wasn't acted on
WHAT WAS IN CONTROL: [List]
WHAT WASN'T: [List]
CHANGE REGISTER
| Action | Owner | Due Date | Verification |
|--------|-------|----------|-------------|
| [Specific change] | [Name] | [Date] | [How to verify] |
VERIFICATION DATE: [Date of check-in]
Blame is cheap. Understanding is hard.
The goal isn't to establish that someone made a mistake. The goal is to understand why the system produced that outcome — so the system can be improved.
"The salesperson didn't qualify the deal properly" is blame. "Our qualification framework hadn't been updated when we moved upmarket, and no one owned keeping it current" is understanding.
The first version fires or shames someone. The second version builds a more resilient organization.
Both might be true simultaneously. The distinction is: which one actually prevents recurrence?