systematic-debugging by secondsky/claude-skills
npx skills add https://github.com/secondsky/claude-skills --skill systematic-debugging
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
Use for ANY technical issue:
Use this ESPECIALLY when:
Don't skip when:
You MUST complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
1. Read Error Messages Carefully
2. Reproduce Consistently
3. Check Recent Changes
4. Gather Evidence in Multi-Component Systems
WHEN the system has multiple components (CI → build → signing, API → service → database):
BEFORE proposing fixes, add diagnostic instrumentation:
```
For EACH component boundary:
  - Log what data enters component
  - Log what data exits component
  - Verify environment/config propagation
  - Check state at each layer
Run once to gather evidence showing WHERE it breaks
THEN analyze evidence to identify failing component
THEN investigate that specific component
```
Example (multi-layer system):
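```bash
# Layer 1: Workflow (report SET/UNSET only; never echo the secret itself)
echo "=== Secrets available in workflow: ==="
if [ -n "${IDENTITY:-}" ]; then echo "IDENTITY: SET"; else echo "IDENTITY: UNSET"; fi

# Layer 2: Build script (check the variable was exported into the environment)
echo "=== Env vars in build script: ==="
if printenv IDENTITY > /dev/null; then echo "IDENTITY in environment"; else echo "IDENTITY not in environment"; fi

# Layer 3: Signing script (inspect keychains and available signing identities)
echo "=== Keychain state: ==="
security list-keychains
security find-identity -v

# Layer 4: Actual signing (verbose output shows where signing fails)
codesign --sign "$IDENTITY" --verbose=4 "$APP"
```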
This reveals: Which layer fails (secrets → workflow ✓, workflow → build ✗)
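For layers that run inside one process (for example API → service → database), the same boundary logging can be sketched in TypeScript: wrap each layer so it logs what enters and what exits, run once, and read off the first boundary where the data goes wrong. This is a minimal sketch under stated assumptions; `instrument`, `Payload`, the three layers, and the `userId` field are hypothetical names, not part of any library.

```typescript
// Hypothetical helper: wraps one layer so every call logs its input and output.
type Layer<I, O> = (input: I) => O;

const boundaryLog: string[] = [];

function instrument<I, O>(name: string, layer: Layer<I, O>): Layer<I, O> {
  return (input: I): O => {
    boundaryLog.push(`${name} <- ${JSON.stringify(input)}`);
    const output = layer(input);
    boundaryLog.push(`${name} -> ${JSON.stringify(output)}`);
    return output;
  };
}

// Three hypothetical layers; the service layer silently drops userId.
interface Payload {
  userId?: number;
  query: string;
}

const api = instrument("api", (req: Payload): Payload => ({ ...req }));
const service = instrument("service", (req: Payload): Payload => ({ query: req.query }));
const db = instrument("db", (req: Payload): string =>
  req.userId === undefined ? "ERROR: missing userId" : `rows for ${req.userId}`,
);

// Run once to gather evidence, then read the log to see WHERE it breaks.
const result = db(service(api({ userId: 42, query: "orders" })));
boundaryLog.forEach((line) => console.log(line));
console.log(result);
```

The error surfaces in `db`, but the log shows `userId` entering `service` and missing on the way out, so `service` is the component to investigate.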
WHEN the error is deep in the call stack:
See the root-cause-tracing skill for the backward tracing technique.
Quick version:
* Where does the bad value originate?
* What called this with the bad value?
* Keep tracing up until you find the source
* Fix at the source, not at the symptom
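One way to answer "what called this with the bad value?" is to intercept the assignment and capture a stack trace at the moment the bad value first appears, rather than where it later crashes. A minimal sketch, assuming a JavaScript engine that populates the non-standard `Error.prototype.stack` (V8 and JavaScriptCore do); `watchProperty`, `config`, `outputDir`, and `loadDefaults` are hypothetical.

```typescript
// Records where a bad value was assigned, so it can be fixed at its source.
// Relies on the non-standard Error.stack property (populated by V8 and JavaScriptCore).
interface Assignment {
  value: unknown;
  stack: string;
}

// Hypothetical helper: turns `key` into an accessor that logs bad assignments.
function watchProperty(
  target: object,
  key: string,
  isBad: (value: unknown) => boolean,
  sink: Assignment[],
): void {
  let current: unknown = Reflect.get(target, key);
  Object.defineProperty(target, key, {
    get: () => current,
    set: (value: unknown) => {
      if (isBad(value)) {
        // This stack includes the caller that introduced the bad value.
        sink.push({ value, stack: new Error("bad value assigned").stack ?? "(no stack)" });
      }
      current = value;
    },
    configurable: true,
    enumerable: true,
  });
}

// Hypothetical usage: an empty outputDir would later cause a confusing failure elsewhere.
const config: { outputDir: string } = { outputDir: "/tmp/out" };
const badAssignments: Assignment[] = [];
watchProperty(config, "outputDir", (v) => v === "", badAssignments);

function loadDefaults(): void {
  config.outputDir = ""; // the real source of the bad value
}

loadDefaults();
for (const entry of badAssignments) {
  console.log(`bad value ${JSON.stringify(entry.value)} assigned at:\n${entry.stack}`);
}
```

The recorded stack points at `loadDefaults`, which is where the fix belongs, not at whatever code later tripped over the empty path.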
Phase 2: Pattern Analysis
Find the pattern before fixing:
1. Find Working Examples
2. Compare Against References
3. Identify Differences
4. Understand Dependencies
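For configuration-style inputs, listing every difference between a working example and the broken case, however small, is easy to mechanize. A minimal sketch, assuming both cases can be represented as plain JSON-like objects; `diffPaths` and the sample configs are hypothetical.

```typescript
// Lists every dotted path whose value differs between a working and a broken object.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function diffPaths(working: Json | undefined, broken: Json | undefined, prefix = ""): string[] {
  if (isObject(working) && isObject(broken)) {
    const keys = Array.from(new Set([...Object.keys(working), ...Object.keys(broken)])).sort();
    const diffs: string[] = [];
    for (const key of keys) {
      const path = prefix ? `${prefix}.${key}` : key;
      diffs.push(...diffPaths(working[key], broken[key], path));
    }
    return diffs;
  }
  // Leaves and arrays are compared by their JSON form; a missing key counts as a difference.
  return JSON.stringify(working) === JSON.stringify(broken) ? [] : [prefix || "(root)"];
}

// Hypothetical working and broken configurations.
const workingConfig: Json = { timeout: 30, retries: 3, tls: { verify: true, ca: "/etc/ca.pem" } };
const brokenConfig: Json = { timeout: 30, retries: 3, tls: { verify: false }, proxy: "http://localhost:8080" };

for (const path of diffPaths(workingConfig, brokenConfig)) {
  console.log(`differs: ${path}`);
}
```

Every reported path is a candidate; none should be dismissed as "that can't matter" before it has been ruled out.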
Phase 3: Hypothesis and Testing
Scientific method:
1. Form Single Hypothesis
2. Test Minimally
3. Verify Before Continuing
4. When You Don't Know
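Testing one hypothesis at a time means changing one variable per experiment. A minimal sketch: start from a known-working configuration, apply each difference from the broken one individually, and record which single change reproduces the failure. `isolateCulprits`, `fails`, and the sample settings are hypothetical; `fails` stands in for whatever check reproduces your bug.

```typescript
// Hypothetical sketch: change ONE variable per experiment, starting from a working baseline.
type Settings = Record<string, string | number | boolean>;

function isolateCulprits(
  working: Settings,
  broken: Settings,
  fails: (settings: Settings) => boolean,
): string[] {
  const keys = Array.from(new Set([...Object.keys(working), ...Object.keys(broken)]));
  const differing = keys.filter((key) => working[key] !== broken[key]).sort();
  const culprits: string[] = [];
  for (const key of differing) {
    // Hypothesis under test: "changing only `key` causes the failure".
    const candidate: Settings = { ...working };
    const value = broken[key];
    if (value !== undefined) {
      candidate[key] = value;
    } else {
      delete candidate[key];
    }
    if (fails(candidate)) {
      culprits.push(key);
    }
  }
  return culprits;
}

// Hypothetical reproduction check: the bug appears when caching is on with a zero TTL.
const fails = (settings: Settings): boolean => settings.cache === true && settings.ttl === 0;

const working: Settings = { cache: true, ttl: 60, logLevel: "info" };
const broken: Settings = { cache: true, ttl: 0, logLevel: "debug" };

console.log(isolateCulprits(working, broken, fails));
```

If no single change reproduces the failure, the bug likely depends on a combination of changes. That is a new hypothesis to form and test, not a reason to change several things at once.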
Phase 4: Implementation
Fix the root cause, not the symptom:
1. Create Failing Test Case
* Simplest possible reproduction
* Automated test if possible
* One-off test script if no framework
* MUST have before fixing
```bash
bun test my-fix.test.ts
# Or with npm
npm test -- my-fix.test.ts
```
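When there is no test framework, a one-off script can serve as the failing test case: it encodes the expected behavior and reports whether the bug reproduces. A minimal sketch; `paginate` and its off-by-one bug are hypothetical, standing in for the code under investigation. Run it before fixing and confirm it reports the failure.

```typescript
// Hypothetical function under investigation. Pages are meant to be 1-based,
// but the offset is computed as if they were 0-based.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  const start = page * pageSize; // root cause: treats page as 0-based
  return items.slice(start, start + pageSize);
}

// One-off failing test: the simplest input that shows the bug.
function checkFirstPage(): { passed: boolean; actual: number[] } {
  const actual = paginate([1, 2, 3, 4, 5], 1, 2);
  const expected = [1, 2];
  const passed = JSON.stringify(actual) === JSON.stringify(expected);
  return { passed, actual };
}

const firstPage = checkFirstPage();
console.log(
  firstPage.passed
    ? "PASS"
    : `FAIL: page 1 returned ${JSON.stringify(firstPage.actual)}, expected [1,2]`,
);
```

Seeing this report FAIL before touching `paginate` proves the test actually exercises the bug.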
2. Implement Single Fix
* Address the root cause identified
* ONE change at a time
* No "while I'm here" improvements
* No bundled refactoring
3. Verify Fix
* Test passes now?
* No other tests broken?
* Issue actually resolved?
```bash
# Run full test suite
bun test # or: npm test
# Run specific test by name
bun test --test-name-pattern "my fix"
```
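Verification means rerunning the originally failing case and the neighboring cases together, so a regression introduced by the fix shows up immediately. A sketch, not tied to any test runner, using a hypothetical `paginate` function whose 1-based page off-by-one bug has just received a single fix.

```typescript
// Hypothetical function after a single fix: convert the 1-based page to a 0-based offset.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

interface Case {
  name: string;
  page: number;
  expected: number[];
}

const items = [1, 2, 3, 4, 5];
const cases: Case[] = [
  { name: "first page (the original failing test)", page: 1, expected: [1, 2] },
  { name: "middle page", page: 2, expected: [3, 4] },
  { name: "partial last page", page: 3, expected: [5] },
  { name: "page past the end", page: 4, expected: [] },
];

// Run every case, not just the one that motivated the fix.
const failures = cases.filter(
  (c) => JSON.stringify(paginate(items, c.page, 2)) !== JSON.stringify(c.expected),
);

for (const c of cases) {
  console.log(`${failures.includes(c) ? "FAIL" : "PASS"}: ${c.name}`);
}
console.log(failures.length === 0 ? "all cases pass" : `${failures.length} case(s) failing`);
```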
4. If Fix Doesn't Work
* STOP
* Count: How many fixes have you tried?
* If < 3: Return to Phase 1, re-analyze with new information
* **If ≥ 3: STOP and question the architecture (step 5 below)**
* DON'T attempt Fix #4 without architectural discussion
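The stopping rule in step 4 is precise enough to write down as a tiny decision function. A sketch only; `nextStep` and its return labels are hypothetical names for the decision, not part of any tool.

```typescript
// Decision rule after a fix attempt fails: fewer than 3 failed fixes means
// re-investigate from Phase 1; 3 or more means stop and question the architecture.
type NextStep = "return-to-phase-1" | "question-architecture";

function nextStep(failedFixes: number): NextStep {
  if (!Number.isInteger(failedFixes) || failedFixes < 1) {
    throw new RangeError("call this only after at least one failed fix");
  }
  return failedFixes < 3 ? "return-to-phase-1" : "question-architecture";
}

for (const attempts of [1, 2, 3, 4]) {
  console.log(`${attempts} failed fix(es): ${nextStep(attempts)}`);
}
```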
5. If 3+ Fixes Failed: Question Architecture
Pattern indicating architectural problem:
* Each fix reveals new shared state/coupling/problem in different place
* Fixes require "massive refactoring" to implement
* Each fix creates new symptoms elsewhere
STOP and question fundamentals:
* Is this pattern fundamentally sound?
* Are we "sticking with it through sheer inertia"?
* Should we refactor the architecture or keep fixing symptoms?
Discuss with your human partner before attempting more fixes
This is NOT a failed hypothesis; this is the wrong architecture.
If you catch yourself thinking:
ALL of these mean: STOP. Return to Phase 1.
If 3+ fixes failed: Question the architecture (see Phase 4, step 5)
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
If systematic investigation reveals the issue is truly environmental, timing-dependent, or external:
But: 95% of "no root cause" cases are incomplete investigation.
This skill works with:
From debugging sessions:
Weekly Installs: 80
Repository: secondsky/claude-skills
GitHub Stars: 91
First Seen: Jan 25, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: claude-code (67), gemini-cli (63), codex (63), opencode (62), cursor (60), github-copilot (57)