Root Cause Tracing: Quickly Locating the Root Cause of Software Bugs with Call-Chain Analysis

kaizen:root-cause-tracing by neolabhq/context-engineering-kit

306 weekly installs

739 GitHub Stars


Install command

npx skills add https://github.com/neolabhq/context-engineering-kit --skill kaizen:root-cause-tracing



Root Cause Tracing

Overview

Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.

Core principle: Trace backward through the call chain until you find the original trigger, then fix at the source.

When to Use

digraph when_to_use {
    "Bug appears deep in stack?" [shape=diamond];
    "Can trace backwards?" [shape=diamond];
    "Fix at symptom point" [shape=box];
    "Trace to original trigger" [shape=box];
    "BETTER: Also add defense-in-depth" [shape=box];

    "Bug appears deep in stack?" -> "Can trace backwards?" [label="yes"];
    "Can trace backwards?" -> "Trace to original trigger" [label="yes"];
    "Can trace backwards?" -> "Fix at symptom point" [label="no - dead end"];
    "Trace to original trigger" -> "BETTER: Also add defense-in-depth";
}

Use when:

Error happens deep in execution (not at entry point)
Stack trace shows long call chain
Unclear where invalid data originated
Need to find which test/code triggers the problem

The Tracing Process

1. Observe the Symptom

Error: git init failed in /Users/jesse/project/packages/core

2. Find Immediate Cause

What code directly causes this?

await execFileAsync('git', ['init'], { cwd: projectDir });

3. Ask: What Called This?

WorktreeManager.createSessionWorktree(projectDir, sessionId)
  → called by Session.initializeWorkspace()
  → called by Session.create()
  → called by test at Project.create()

4. Keep Tracing Up

What value was passed?

projectDir = '' (empty string!)
Empty string as cwd resolves to process.cwd()
That's the source code directory!

5. Find Original Trigger

Where did empty string come from?

const context = setupCoreTest(); // Returns { tempDir: '' }
Project.create('name', context.tempDir); // Accessed before beforeEach!
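The timing problem can be reproduced without a test framework. The sketch below is an assumption about how such a helper might behave: `setupCoreTest` is named in the text, but its body, the `registerBeforeEach` function standing in for the framework's `beforeEach`, and the `/tmp/core-test` path are all hypothetical.

```typescript
// Hypothetical stand-in for a test framework's beforeEach registry.
const hooks: Array<() => void> = [];
const registerBeforeEach = (fn: () => void): void => {
  hooks.push(fn);
};

// Assumed shape of the helper: tempDir starts empty and is only
// filled in when the registered hook runs.
function setupCoreTest(): { tempDir: string } {
  const testContext = { tempDir: "" };
  registerBeforeEach(() => {
    testContext.tempDir = "/tmp/core-test"; // illustrative path
  });
  return testContext;
}

const testContext = setupCoreTest();

// Read at module load time, before any hook has run: captures ''.
const capturedTooEarly = testContext.tempDir;

// The framework runs the hooks later, just before each test body.
hooks.forEach((hook) => hook());

// Read inside a test body: the hook has populated the value.
const readInsideTest = testContext.tempDir;

console.log({ capturedTooEarly, readInsideTest });
```

The value read at the top level is a copy of the empty string, so it stays empty even after the hook runs. That copy is what travels down the call chain to `git init`.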

Adding Stack Traces

When you can't trace manually, add instrumentation:

// Before the problematic operation
async function gitInit(directory: string) {
  const stack = new Error().stack;
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}

Critical: Use console.error() in tests, not the logger, whose output may be suppressed and never shown.

Run and capture:

npm test 2>&1 | grep 'DEBUG git init'

Analyze stack traces:

Look for test file names
Find the line number triggering the call
Identify the pattern (same test? same parameter?)
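When the captured output is long, filtering the stack text for test files can speed up the analysis. This is a minimal sketch: the helper name `findTestFrames`, the default file-name pattern, and the sample stack with its paths and line numbers are illustrative assumptions, not output from the session described here.

```typescript
// Return the stack frames that point into test files.
// The default pattern is an assumption about how test files are named.
function findTestFrames(
  stack: string,
  testFilePattern: RegExp = /\.test\.[jt]sx?/,
): string[] {
  return stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at ") && testFilePattern.test(line));
}

// Illustrative stack text; the paths and line numbers are made up.
const sampleStack = [
  "Error",
  "    at gitInit (/repo/src/git.ts:12:17)",
  "    at WorktreeManager.createSessionWorktree (/repo/src/worktree.ts:40:11)",
  "    at Session.create (/repo/src/session.ts:22:5)",
  "    at Object.<anonymous> (/repo/src/project.test.ts:8:30)",
].join("\n");

const testFrames = findTestFrames(sampleStack);
console.log(testFrames);
```

Each surviving frame carries the test file name and the line number that triggered the call, which are the two things the list above asks for.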

Finding Which Test Causes Pollution

If something appears during tests but you don't know which test:

Use the bisection script: @find-polluter.sh

./find-polluter.sh '.git' 'src/**/*.test.ts'

The script runs the test files one at a time and stops at the first polluter. See the script for usage.
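The one-at-a-time search described above can be sketched in TypeScript as follows. This is not the contents of find-polluter.sh, only an illustration of the idea; `findPolluter` and both callbacks are hypothetical names, and the run is simulated so that the logic can be followed without a test runner.

```typescript
// Run test files one at a time and report the first one after which the
// unwanted artifact exists. Both callbacks are injected so the search logic
// stays independent of any particular test runner or file system.
function findPolluter(
  testFiles: string[],
  runTestFile: (file: string) => void,
  pollutionExists: () => boolean,
): string | null {
  if (pollutionExists()) {
    throw new Error("Pollution already present before any test ran; clean up first");
  }
  for (const file of testFiles) {
    runTestFile(file);
    if (pollutionExists()) {
      return file; // first polluter found, stop here
    }
  }
  return null; // no test file produced the artifact
}

// Simulated run: pretend b.test.ts creates the stray .git directory.
const created = new Set<string>();
const polluter = findPolluter(
  ["a.test.ts", "b.test.ts", "c.test.ts"],
  (file) => {
    if (file === "b.test.ts") created.add(".git");
  },
  () => created.has(".git"),
);
console.log(polluter);
```

In a real project, `runTestFile` would presumably shell out to the test runner for a single file and `pollutionExists` would check the file system for the artifact, for example with `fs.existsSync`.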

Real Example: Empty projectDir

Symptom: .git created in packages/core/ (source code)

Trace chain:

git init runs in process.cwd() ← empty cwd parameter
WorktreeManager called with empty projectDir
Session.create() passed empty string
Test accessed context.tempDir before beforeEach
setupCoreTest() returns { tempDir: '' } initially

Root cause: A top-level variable initializer read context.tempDir while it was still empty

Fix: Made tempDir a getter that throws if accessed before beforeEach
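A getter of this kind might look like the sketch below. The original implementation is not shown in the text, so the function name `setupCoreTestSafe`, the `setupHooks` array standing in for `beforeEach`, the error message, and the path are all assumptions.

```typescript
// Hypothetical stand-in for the framework's beforeEach registry.
const setupHooks: Array<() => void> = [];

function setupCoreTestSafe(): { readonly tempDir: string } {
  let tempDir: string | undefined;
  setupHooks.push(() => {
    tempDir = "/tmp/core-test"; // illustrative path
  });
  return {
    // Fail loudly instead of handing out an empty string.
    get tempDir(): string {
      const dir = tempDir;
      if (dir === undefined) {
        throw new Error("tempDir accessed before beforeEach ran");
      }
      return dir;
    },
  };
}

const safeContext = setupCoreTestSafe();

let earlyAccessError = "";
try {
  void safeContext.tempDir; // too early: no hook has run yet
} catch (err) {
  earlyAccessError = (err as Error).message;
}

// Once the hooks have run, the same property returns the real directory.
setupHooks.forEach((hook) => hook());
const tempDirAfterSetup = safeContext.tempDir;
console.log({ earlyAccessError, tempDirAfterSetup });
```

The design choice is to turn a silent bad value into an immediate error at the exact line that reads it too early, so the stack trace points at the source rather than at `git init`.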

Also added defense-in-depth:

Layer 1: Project.create() validates directory
Layer 2: WorkspaceManager validates not empty
Layer 3: NODE_ENV guard refuses git init outside tmpdir
Layer 4: Stack trace logging before git init
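The first three layers could be sketched as small guard functions. The function names, error messages, and the extra `tmpRoot` parameter are assumptions made for illustration; the text only states what each layer checks, not how it is written.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Layers 1 and 2: reject an empty directory at each boundary it crosses.
function assertValidDirectory(directory: string, caller: string): string {
  if (directory.trim() === "") {
    throw new Error(`${caller}: directory must not be empty`);
  }
  return path.resolve(directory);
}

// True only if `directory` is the temp root itself or nested inside it.
function isInsideTmpdir(directory: string, tmpRoot: string = os.tmpdir()): boolean {
  const relative = path.relative(path.resolve(tmpRoot), path.resolve(directory));
  const escapes = relative === ".." || relative.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(relative);
}

// Layer 3: call just before `git init`; only enforced when NODE_ENV is "test".
function assertSafeForGitInit(
  directory: string,
  nodeEnv: string | undefined = process.env.NODE_ENV,
  tmpRoot: string = os.tmpdir(),
): void {
  const resolved = assertValidDirectory(directory, "gitInit");
  if (nodeEnv === "test" && !isInsideTmpdir(resolved, tmpRoot)) {
    throw new Error(`Refusing git init outside tmpdir during tests: ${resolved}`);
  }
}
```

Layer 4 corresponds to the console.error() instrumentation in the gitInit example earlier. Each guard is cheap on its own; the point of stacking them is that a bad value has to slip past every layer before it can reach the dangerous operation.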

Key Principle

digraph principle {
    "Found immediate cause" [shape=ellipse];
    "Can trace one level up?" [shape=diamond];
    "Trace backwards" [shape=box];
    "Is this the source?" [shape=diamond];
    "Fix at source" [shape=box];
    "Add validation at each layer" [shape=box];
    "Bug impossible" [shape=doublecircle];
    "NEVER fix just the symptom" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];

    "Found immediate cause" -> "Can trace one level up?";
    "Can trace one level up?" -> "Trace backwards" [label="yes"];
    "Can trace one level up?" -> "NEVER fix just the symptom" [label="no"];
    "Trace backwards" -> "Is this the source?";
    "Is this the source?" -> "Trace backwards" [label="no - keeps going"];
    "Is this the source?" -> "Fix at source" [label="yes"];
    "Fix at source" -> "Add validation at each layer";
    "Add validation at each layer" -> "Bug impossible";
}

NEVER fix just where the error appears. Trace back to find the original trigger.

Stack Trace Tips

In tests: Use console.error() not logger - logger may be suppressed
Before operation: Log before the dangerous operation, not after it fails
Include context: Directory, cwd, environment variables, timestamps
Capture stack: new Error().stack shows complete call chain

Real-World Impact

From debugging session (2025-10-03):

Found root cause through 5-level trace
Fixed at source (getter validation)
Added 4 layers of defense
1847 tests passed, zero pollution


First Seen

Feb 19, 2026
