systematic-debugging by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill systematic-debugging
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
Use for ANY technical issue:
Use this ESPECIALLY when:
Don't skip when:
You MUST complete each phase before proceeding to the next.
BEFORE attempting ANY fix:
Read Error Messages Carefully
Reproduce Consistently
Check Recent Changes
Gather Evidence in Multi-Component Systems
WHEN system has multiple components (CI → build → signing, API → service → database):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary:
- Log what data enters component
- Log what data exits component
- Verify environment/config propagation
- Check state at each layer
Run once to gather evidence showing WHERE it breaks
THEN analyze evidence to identify failing component
THEN investigate that specific component
Example (multi-layer system):
# Layer 1: Workflow
echo "=== Secrets available in workflow: ==="
# Print only SET/UNSET so the secret value never reaches the log
echo "IDENTITY: $([ -n "${IDENTITY:-}" ] && echo SET || echo UNSET)"
# Layer 2: Build script
echo "=== Env vars in build script: ==="
# -q suppresses the matching line, which would contain the secret value
env | grep -q '^IDENTITY=' && echo "IDENTITY in environment" || echo "IDENTITY not in environment"
# Layer 3: Signing script
echo "=== Keychain state: ==="
security list-keychains
security find-identity -v
# Layer 4: Actual signing
codesign --sign "$IDENTITY" --verbose=4 "$APP"
This reveals: Which layer fails (secrets → workflow ✓, workflow → build ✗)
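The per-boundary logging can be wrapped in a small helper so every layer reports in the same format. This is a minimal sketch, not part of the skill itself: `report_var` and `log_boundary` are hypothetical names, and the variable names passed in are assumed to be valid shell identifiers.

```shell
# Hypothetical helper: say whether a variable is set, never print its value.
report_var() {
  # $1 = variable name (assumed to be a valid shell identifier)
  eval "value=\${$1:-}"
  if [ -n "$value" ]; then
    echo "$1: SET"
  else
    echo "$1: UNSET"
  fi
  unset value
}

# Hypothetical helper: label the evidence with the boundary it was taken at.
log_boundary() {
  layer="$1"
  shift
  echo "=== boundary: $layer ==="
  for name in "$@"; do
    report_var "$name"
  done
}

# Demo with a placeholder value; call the same line at the top of each layer.
IDENTITY="example-signing-identity"
log_boundary "workflow -> build" IDENTITY APP
```

Calling `log_boundary` at the start of each script gives one comparable block of output per layer, so the first layer that reports UNSET marks where propagation breaks.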
WHEN error is deep in call stack:
See root-cause-tracing.md in this directory for the complete backward tracing technique.
Quick version:
* Where does the bad value originate?
* What called this with the bad value?
* Keep tracing up until you find the source
* Fix at source, not at symptom
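The quick version above can be sketched with a three-layer call chain in which each layer logs the value it received. All function names and the `DATA_DIR` variable are hypothetical, chosen only for illustration; the full technique is in root-cause-tracing.md.

```shell
# Hypothetical tracer: record what each layer received (to stderr).
trace() {
  echo "trace: $1 received '$2'" >&2
}

# Symptom layer: this is where the failure becomes visible.
open_file() {
  trace open_file "$1"
  [ -f "$1" ]
}

# Middle layer: builds the path from whatever directory it was given.
build_path() {
  trace build_path "$1"
  open_file "$1/data.txt"
}

# Source layer: reads DATA_DIR (hypothetical) without validating it.
load_config() {
  trace load_config "${DATA_DIR:-}"
  build_path "${DATA_DIR:-}"
}

unset DATA_DIR
load_config || echo "open failed: read the trace from the last line upward"
```

Read from the bottom up, the trace shows `open_file` receiving `/data.txt` because `build_path` received an empty directory, which in turn came from the unset `DATA_DIR` in `load_config`. The fix belongs in `load_config`, not in `open_file`.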
Find the pattern before fixing:
Find Working Examples
Compare Against References
Identify Differences
Understand Dependencies
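Comparing a working example against the broken one can be as simple as a `diff` that lists every difference, however small. The sketch below uses two hypothetical config files created in a temporary directory; the file names and keys are assumptions for the demo.

```shell
# Hypothetical files: a known-good config and the one that fails.
workdir="$(mktemp -d)"
printf 'host=db.internal\nport=5432\ntimeout=30\n' > "$workdir/working.conf"
printf 'host=db.internal\nport=5433\n' > "$workdir/broken.conf"

# diff exits with status 1 when the files differ, so tolerate that status.
differences="$(diff "$workdir/working.conf" "$workdir/broken.conf" || true)"

# Lines starting with "<" come from the working file, ">" from the broken one.
echo "$differences"

rm -rf "$workdir"
```

Here the listing surfaces both the changed port and the missing `timeout` line. Neither is assumed irrelevant until it has been checked.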
Scientific method:
Form Single Hypothesis
Test Minimally
Verify Before Continuing
When You Don't Know
Fix the root cause, not the symptom:
Create Failing Test Case
Use the superpowers:test-driven-development skill for writing proper failing tests
Implement Single Fix
Verify Fix
If Fix Doesn't Work
If 3+ Fixes Failed: Question Architecture
Pattern indicating architectural problem:
* Each fix reveals new shared state, coupling, or another problem in a different place
* Fixes require "massive refactoring" to implement
* Each fix creates new symptoms elsewhere
STOP and question fundamentals:
* Is this pattern fundamentally sound?
* Are we "sticking with it through sheer inertia"?
* Should we refactor the architecture or continue fixing symptoms?
Discuss with your human partner before attempting more fixes
This is NOT a failed hypothesis - this is a wrong architecture.
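The "failing test first, then a single fix" sequence from this phase can be sketched as one reproduction that is run against the code before and after the fix. Every name here is hypothetical: `join_path_buggy` stands for the code as found, `join_path_fixed` for the code after one change.

```shell
# Code as found (hypothetical): a trailing slash yields a double slash.
join_path_buggy() {
  echo "$1/$2"
}

# Code after the single fix: strip one trailing slash before joining.
join_path_fixed() {
  echo "${1%/}/$2"
}

# Simplest possible reproduction, written before touching the code.
run_test() {
  # $1 = name of the implementation to test
  actual="$("$1" "logs/" "app.log")"
  if [ "$actual" = "logs/app.log" ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 returned $actual"
  fi
}

run_test join_path_buggy
run_test join_path_fixed
```

The test fails against the original code and passes after the one change, which is the evidence that the change addressed the reproduced bug and nothing else was bundled with it.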
If you catch yourself thinking:
ALL of these mean: STOP. Return to Phase 1.
If 3+ fixes failed: Question the architecture (see Phase 4.5)
Watch for these redirections:
When you see these: STOP. Return to Phase 1.
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |

| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
If systematic investigation reveals issue is truly environmental, timing-dependent, or external:
But: 95% of "no root cause" cases are incomplete investigation.
These techniques are part of systematic debugging and available in this directory:
* root-cause-tracing.md - Trace bugs backward through call stack to find original trigger
* defense-in-depth.md - Add validation at multiple layers after finding root cause
* condition-based-waiting.md - Replace arbitrary timeouts with condition polling
Related skills:
From debugging sessions:
First seen: Jan 21, 2026