The Five-Step TDD Cycle: A Practical Guide to AI Code Generation and Test-Driven Development (RED/GREEN/COMMIT)

tdd by jwilger/agent-skills

102 weekly installs

2 GitHub Stars

Install command

npx skills add https://github.com/jwilger/agent-skills --skill tdd

Software engineering, automated testing

TDD

Value: Feedback -- short cycles with verifiable evidence keep AI-generated code honest and the human in control. Tests express intent; evidence confirms progress.

Purpose

Teaches a five-step TDD cycle (RED, DOMAIN, GREEN, DOMAIN, COMMIT) that adapts to whatever harness runs it. Detects available delegation primitives and routes to guided mode (human drives each phase) or automated mode (system orchestrates phases). Prevents primitive obsession, skipped reviews, and untested complexity regardless of mode.

Practices

The Five-Step Cycle

Every feature is built by repeating: RED -> DOMAIN -> GREEN -> DOMAIN -> COMMIT.

1. RED -- Write one failing test with one assertion. Only edit test files. Write the code you wish you had -- reference types and functions that do not exist yet. Run the test. Paste the failure output. Stop. Done when: tests run and FAIL (compilation error OR assertion failure).
2. DOMAIN (after RED) -- Review the test for primitive obsession and invalid-state risks. Create type definitions with stub bodies (todo!(), raise NotImplementedError, etc.). Do not implement logic. Stop. Done when: tests COMPILE but still FAIL (assertion/panic, not compilation error).
3. GREEN -- Address the immediate error -- NEVER "make the test pass" in one go. Scope check before every change: can this be fixed with ~function-scope work (~20 lines, one file)? YES → make the change, run tests, check the next error. NO → drill down by writing a failing unit test for the smallest piece needed, then route it through a standard TDD cycle with swapped roles. Only edit production files (except when drilling down). Paste output after each change. Done when: tests PASS with minimal implementation.
4. DOMAIN (after GREEN) -- Review the implementation for domain violations: anemic models, leaked validation, primitive obsession that slipped through. If violations are found, raise a concern and propose a revision. Done when: types are clean and tests still pass.
5. COMMIT -- Run the full test suite. Stage all changes and create a git commit referencing the GWT scenario. Run git status after committing to verify no uncommitted files remain. This is a hard gate: no new RED phase may begin until this commit exists and the working tree is clean. Done when: git commit created, all tests passing, working tree clean.

After step 5, either start the next RED phase or tidy the code (structural changes only, separate commit).

A compilation failure IS a test failure. Do not pre-create types to avoid compilation errors. Types flow FROM tests, never precede them.
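
A minimal Python sketch of one pass through RED, DOMAIN, and GREEN may make the "done when" conditions concrete. The `EmailAddress` type, the test name, and the `run` helper are hypothetical and not part of the skill. In a real project the test and the production code would live in separate files; they are collapsed into one block here only so the example runs as written.

```python
# Hypothetical example: EmailAddress, the test, and run() are illustrative only.

# RED: the test comes first and references a type that does not exist yet.
def test_email_address_exposes_domain():
    assert EmailAddress("ada@example.com").domain == "example.com"


def run(test):
    """Run a test function and report its outcome as a short string."""
    try:
        test()
        return "PASS"
    except Exception as error:  # NameError, NotImplementedError, AssertionError, ...
        return f"FAIL ({type(error).__name__})"


# No EmailAddress yet: Python's analogue of a compilation error is a NameError.
red_result = run(test_email_address_exposes_domain)


# DOMAIN (after RED): add the type with a stub body only, no logic.
class EmailAddress:
    def __init__(self, raw: str) -> None:
        raise NotImplementedError


# The test now resolves the name but still fails inside the stub.
domain_result = run(test_email_address_exposes_domain)


# GREEN: address the immediate error with the smallest implementation.
class EmailAddress:  # redefined only because this sketch shares one file
    def __init__(self, raw: str) -> None:
        self.domain = raw.split("@", 1)[1]


green_result = run(test_email_address_exposes_domain)

print(red_result, domain_result, green_result, sep="\n")
```

The three results trace the cycle: a failure because the type is missing, a failure inside the stub, then a pass.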

Domain review has veto power over primitive obsession and invalid-state representability. Debate continues until resolved or escalated to the human — there is no round limit.

User-Facing Modes

Guided mode (/tdd red, /tdd domain, /tdd green, /tdd commit): Each phase loads references/{phase}.md with detailed instructions for that step. For experienced engineers who want explicit phase control. Works on any harness -- no delegation primitives required. The human decides when to advance phases.

Automated mode (/tdd or /tdd auto): The system detects harness capabilities, selects an execution strategy, and orchestrates the full cycle. The user sees working code, not sausage-making. For verbose output showing phase transitions and evidence, use /tdd auto --verbose.

Capability Detection (Automated Mode)

When automated mode activates, detect available primitives in this order:

1. Subagents available? Check for Agent tool. If present, use the subagents strategy with focused per-phase agents.
2. Fallback. Use the chaining strategy -- role-switch internally between phases within a single context.

Select the most capable strategy available. Do not attempt a higher strategy when its primitives are missing.
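
The detection order above can be sketched as a small function. This is only an illustration: how a harness exposes its tool list is an assumption, and `select_strategy` and its `available_tools` parameter are hypothetical names. Only the "Agent" tool name and the two strategy names come from the text.

```python
def select_strategy(available_tools: set[str]) -> str:
    """Return the most capable execution strategy the harness supports."""
    # Step 1: the Agent tool enables focused per-phase subagents.
    if "Agent" in available_tools:
        return "subagents"
    # Step 2: fallback, role-switching between phases in a single context.
    return "chaining"


print(select_strategy({"Agent", "Bash"}))
print(select_strategy({"Bash"}))
```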

You are the orchestrator. The agent reading this file performs capability detection and dispatches directly. Do NOT spawn a single "orchestrator" subagent to do it for you -- that hides work, bypasses strategy detection, and pre-selects the wrong strategy. Whether you were invoked by /tdd, by the pipeline, or by any other caller: you detect capabilities, you choose the strategy, you spawn the phase agents yourself.

After determining your strategy, read ONLY the entry-point file for that strategy:

Strategy	Entry-point file
Subagents	`references/orchestrator.md`
Chaining	(no entry file -- follow the chaining section below)

Do NOT read orchestrator.md when using chaining.

orchestrator.md references references/shared-rules.md for rules that apply to all strategies (domain veto, outside-in progression, pipeline integration, pre-implementation context checklist). Read shared-rules.md when directed by your strategy's entry-point file.

Execution Strategy: Chaining (Fallback)

Used when no delegation primitives are available. The agent plays each role sequentially:

1. Load references/red.md. Execute the RED phase.
2. Load references/domain.md. Execute DOMAIN review of the test.
3. Load references/green.md. Execute the GREEN phase.
4. Load references/domain.md. Execute DOMAIN review of the implementation.
5. Load references/commit.md. Execute the COMMIT phase.
6. Repeat.

Role boundaries are advisory in this mode. The agent must self-enforce phase boundaries: only edit file types permitted by the current phase (see references/phase-boundaries.md).
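
As a rough illustration, the chaining sequence can be written as data plus a loop. The phase names and reference files are taken from the steps above; `run_chained_cycle` and the `execute_phase` callback are hypothetical stand-ins for whatever the agent actually does in each phase.

```python
# Phase order and reference files as listed in the chaining steps.
CHAINING_SEQUENCE = [
    ("RED", "references/red.md"),
    ("DOMAIN", "references/domain.md"),  # review of the test
    ("GREEN", "references/green.md"),
    ("DOMAIN", "references/domain.md"),  # review of the implementation
    ("COMMIT", "references/commit.md"),
]


def run_chained_cycle(execute_phase):
    """Run one cycle, calling execute_phase(phase, reference_file) in order."""
    return [execute_phase(phase, ref) for phase, ref in CHAINING_SEQUENCE]


# Hypothetical callback that only records what would be loaded.
log = run_chained_cycle(lambda phase, ref: f"{phase}: loaded {ref}")
print("\n".join(log))
```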

Execution Strategy: Subagents

Used when the Agent tool is available for spawning focused subagents. Each phase runs in an isolated subagent with constrained scope.

Spawn each phase agent using Agent(subagent_type="<agent-name>", prompt="...") with the prompt template in references/{phase}-prompt.md.
The orchestrator follows references/orchestrator.md for coordination rules.
Structural handoff schema (references/handoff-schema.md): every phase agent must return evidence fields (test output, file paths changed, domain concerns). Missing evidence fields = handoff blocked. The orchestrator does not proceed to the next phase until the schema is satisfied.
Context isolation provides structural enforcement: each subagent receives only the files relevant to its phase.
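
A sketch of the "missing evidence blocks the handoff" rule, under stated assumptions: the actual schema lives in references/handoff-schema.md and is not reproduced here, so the field names `test_output`, `files_changed`, and `domain_concerns` are hypothetical labels for the three kinds of evidence the text mentions.

```python
# Hypothetical field names for the three kinds of evidence named in the text.
REQUIRED_EVIDENCE_FIELDS = ("test_output", "files_changed", "domain_concerns")


def missing_evidence(handoff: dict) -> list[str]:
    """Return the required evidence fields that are absent from a handoff."""
    # Presence is checked rather than truthiness: an empty list of domain
    # concerns is still a valid answer.
    return [field for field in REQUIRED_EVIDENCE_FIELDS if field not in handoff]


def may_proceed(handoff: dict) -> bool:
    """The orchestrator advances only when no evidence field is missing."""
    return not missing_evidence(handoff)


complete = {"test_output": "1 failed", "files_changed": ["tests/test_email.py"], "domain_concerns": []}
incomplete = {"test_output": "1 failed"}
print(may_proceed(complete), missing_evidence(incomplete))
```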

Named Team Member Personas (Subagent Strategy)

When .claude/agents/ definitions exist (from the ensemble-team skill), the subagent strategy uses named personas for ping and pong roles. The orchestrator selects team members based on slice context, spawns them as subagents using Agent(subagent_type="<agent-name>", prompt="..."), and collects results to pass as context to the next subagent.

See references/orchestrator.md for coordination rules and references/ping-pong-pairing.md for persona selection, rotation, and pairing history.

Phase Boundary Rules

Each phase edits only its own file types. This prevents drift. See references/phase-boundaries.md for the complete file-type matrix.

Phase	Can Edit	Cannot Edit
RED	Test files	Production code, type definitions
DOMAIN	Type definitions (stubs)	Test logic, implementation bodies
GREEN	Implementation bodies	Test files, type signatures
COMMIT	Nothing -- git operations only	All source files

If blocked by a boundary, stop and return to the orchestrator (automated) or report to the user (guided). Never circumvent boundaries.
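
The table above can be encoded as a lookup, shown here as a minimal sketch. The edit-kind labels (`test`, `type_definition`, `implementation`) are hypothetical; the complete matrix is in references/phase-boundaries.md, and classifying a real edit into one of these kinds is left out.

```python
# Which kind of edit each phase may make, following the table above.
ALLOWED_EDITS = {
    "RED": {"test"},
    "DOMAIN": {"type_definition"},
    "GREEN": {"implementation"},
    "COMMIT": set(),  # git operations only
}


def edit_allowed(phase: str, edit_kind: str) -> bool:
    """Return True if the given kind of edit is permitted in the phase."""
    return edit_kind in ALLOWED_EDITS[phase]


print(edit_allowed("RED", "test"))
print(edit_allowed("RED", "implementation"))
```

A caller that gets False would stop and report back rather than look for a workaround.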

Walking Skeleton First

The first vertical slice must be a walking skeleton: the thinnest end-to-end path proving all architectural layers connect. It may use hardcoded values or stubs. Build it before any other slice. It de-risks the architecture and gives subsequent slices a proven wiring path to extend.

Outside-In TDD

Start from an acceptance test at the application boundary -- the point where external input enters the system. Drill inward through unit tests. The outer acceptance test stays RED while inner unit tests go through their own red-green-domain-commit cycles. The slice is complete only when the outer acceptance test passes.

A test that calls internal functions directly is a unit test, not an acceptance test -- even if it asserts on user-visible behavior.

Boundary enforcement by mode:

Pipeline mode: The CYCLE_COMPLETE evidence must include boundary_type and boundary_evidence on the acceptance test. The pipeline's TDD gate rejects evidence where the acceptance test calls internal functions directly.
Automated mode (non-pipeline): The orchestrator checks boundary scope and re-delegates if the first test is not a boundary test. Advisory -- no gate blocks progression.
Guided mode: The human is responsible for ensuring boundary-level tests. The skill text instructs correct behavior but cannot enforce it.

Cycle-Complete Evidence

At the end of each complete RED-DOMAIN-GREEN-DOMAIN-COMMIT cycle, produce a CYCLE_COMPLETE evidence packet containing: slice_id, acceptance_test {file, name, output, boundary_type, boundary_evidence}, unit_tests {count, all_passing, output}, domain_reviews [{phase, verdict, concerns}], commits [{hash, message}], rework_cycles, team {ping, pong, domain_reviewer}.

When pipeline-state is provided in context metadata, the TDD skill operates in pipeline mode: it receives a slice_id and stores evidence to .factory/audit-trail/slices/<slice-id>/tdd-cycles/cycle-NNN.json. When running standalone, the evidence is informational only (not stored).

See references/cycle-evidence.md for full schema.
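
For orientation, here is a sketch of what a packet and its storage path might look like. The field names follow the list above; every value is a made-up placeholder, and reading "NNN" as a zero-padded three-digit cycle number is an assumption. The authoritative schema is the one in references/cycle-evidence.md.

```python
import json
from pathlib import PurePosixPath


def evidence_path(slice_id: str, cycle_number: int) -> PurePosixPath:
    """Build the audit-trail path for one cycle's evidence file."""
    # Assumption: "NNN" means a zero-padded, three-digit cycle number.
    file_name = f"cycle-{cycle_number:03d}.json"
    return PurePosixPath(".factory/audit-trail/slices") / slice_id / "tdd-cycles" / file_name


# Field names come from the text; all values are hypothetical placeholders.
packet = {
    "slice_id": "slice-001",
    "acceptance_test": {
        "file": "tests/acceptance/test_signup.py",
        "name": "test_signup_rejects_duplicate_email",
        "output": "1 passed",
        "boundary_type": "http",
        "boundary_evidence": "test drives the app through its HTTP entry point",
    },
    "unit_tests": {"count": 3, "all_passing": True, "output": "3 passed"},
    "domain_reviews": [
        {"phase": "after_red", "verdict": "approved", "concerns": []},
        {"phase": "after_green", "verdict": "approved", "concerns": []},
    ],
    "commits": [{"hash": "abc1234", "message": "Signup rejects duplicate email"}],
    "rework_cycles": 0,
    "team": {"ping": "persona-a", "pong": "persona-b", "domain_reviewer": "persona-c"},
}

serialized = json.dumps(packet, indent=2)
print(evidence_path(packet["slice_id"], 1))
# prints: .factory/audit-trail/slices/slice-001/tdd-cycles/cycle-001.json
```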

Harness-Specific Guidance

If running on Claude Code, also read references/claude-code.md for harness-specific rules including hook-based enforcement. For maximum mechanical enforcement, ask the bootstrap skill to install optional hooks from references/hooks/claude-code-hooks.json.

Enforcement Note

Guided mode: Advisory. The human enforces by controlling phase transitions.
Chaining mode: Advisory. The agent self-enforces phase boundaries.
Subagent mode: Structural. Context isolation and handoff schemas enforce phase boundaries. Missing evidence blocks handoffs.
Pipeline mode: Gating. Evidence gates reject incomplete phase transitions.
Optional hooks (Claude Code): Mechanical. Pre-tool-use hooks block unauthorized file edits per phase. See references/claude-code.md.

Hard constraints:

Phase boundary violation (wrong file type in wrong phase): [H]
Domain veto escalation (contested design decision): [RP]
Commit gate (no new RED before prior cycle committed): [H]

See references/constraint-resolution.md in the template directory for pipeline rework budget conflicts and domain veto resolution in pipeline mode.

Constraints

Chaining mode self-enforcement: Self-enforcement means you produce the same file-type restrictions as if separate agents were enforcing them. Writing production code during RED phase violates this constraint even though no mechanism prevents it. If you catch yourself reasoning about why a phase boundary doesn't apply in chaining mode, you are violating it.
"~20 lines, one file" scope check: This is a judgment heuristic, not a precise threshold. The spirit is: if the change touches multiple concerns, multiple files, or requires understanding distant code, it is too large for a single cycle. Do not game this by making a 40-line change across 2 functions in one file and claiming it's "one file."

Verification

After completing a cycle, verify:

Every failing test was written BEFORE its implementation
Domain review occurred after EVERY RED and GREEN phase
Phase boundary rules were respected (file-type restrictions)
Evidence (test output) was provided at each handoff
Commit exists for every completed RED-GREEN cycle
GREEN phase iterated one failure at a time (not full implementation in one pass)
Working tree clean after every COMMIT (git status verified)
Walking skeleton completed first (first vertical slice)

HARD GATE -- COMMIT (must pass before any new RED phase):

All tests pass
Git commit created with message referencing the current GWT scenario
No new RED phase started before this commit was made
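
One way to check the gate mechanically is sketched below. `commit_gate_passed` and `read_porcelain_status` are hypothetical helper names, not part of the skill. The sketch relies on `git status --porcelain` printing nothing when the working tree is clean, and it does not check that the commit message references the GWT scenario.

```python
import subprocess
from typing import Optional


def read_porcelain_status() -> str:
    """Return `git status --porcelain` output; it is empty when the tree is clean."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def commit_gate_passed(all_tests_pass: bool, commit_hash: Optional[str], porcelain_status: str) -> bool:
    """True only when tests pass, a commit exists, and no uncommitted files remain."""
    working_tree_clean = porcelain_status.strip() == ""
    return all_tests_pass and bool(commit_hash) and working_tree_clean


# The status text is passed in so the gate can be checked without a repository.
print(commit_gate_passed(True, "abc1234", ""))
print(commit_gate_passed(True, "abc1234", " M src/app.py\n"))
```

Inside a repository, the third argument would come from `read_porcelain_status()`.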

Dependencies

This skill works standalone. For enhanced workflows, it integrates with:

domain-modeling: Strengthens the domain review phases with parse-don't-validate, semantic types, and invalid-state prevention principles.
code-review: Three-stage review (spec compliance, code quality, domain integrity) after TDD cycles complete.
mutation-testing: Validates test quality by checking that tests detect injected mutations in production code.
ensemble-team: Provides real-world expert personas for pair selection and mob review.

Missing a dependency? Install with:

npx skills add jwilger/agent-skills --skill domain-modeling

Repository: jwilger/agent-skills

First Seen: Feb 19, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: claude-code (73), codex (72), cursor (72), github-copilot (70), gemini-cli (70), kimi-cli (70)