sdd:implement by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sdd:implement
Your job is to implement the solution at the best possible quality using the task specification and sub-agents. You MUST NOT stop unless it is critically necessary or you are done! Avoid asking questions unless critically necessary! Launch the implementation agent and judges, iterate until issues are fixed, and then move to the next step!
Execute task implementation steps with automated quality verification using LLM-as-Judge for critical artifacts.
$ARGUMENTS
Parse the following arguments from $ARGUMENTS:
| Argument | Format | Default | Description |
|---|---|---|---|
| task-file | Path or filename | Auto-detect | Task file name or path (e.g., add-validation.feature.md) |
| --continue | --continue | None | Continue implementation from the last completed step. Launches a judge first to verify state, then iterates with the implementation agent. |
| --refine | --refine | false | Incremental refinement mode - detect changes against git and re-implement only affected steps (from the modified step onwards). |
| --human-in-the-loop | --human-in-the-loop [step1,step2,...] | None | Steps after which to pause for human verification. If no steps are specified, pauses after every step. |
| --target-quality | --target-quality X.X or --target-quality X.X,Y.Y | 4.0 (standard) / 4.5 (critical) | Target threshold value (out of 5.0). A single value sets both. Two comma-separated values set standard,critical. |
| --max-iterations | --max-iterations N | 3 | Maximum fix→verify cycles per step. Default is 3 iterations. Set to unlimited for no limit. |
| --skip-judges | --skip-judges | false | Skip all judge validation checks - steps proceed without quality gates. |
Parse $ARGUMENTS and resolve configuration as follows:
# Extract task file (first positional argument, optional - auto-detect if not provided)
TASK_FILE = first argument that is a file path or filename
# Parse --target-quality (supports single value or two comma-separated values)
if --target-quality has single value X.X:
THRESHOLD_FOR_STANDARD_COMPONENTS = X.X
THRESHOLD_FOR_CRITICAL_COMPONENTS = X.X
elif --target-quality has two values X.X,Y.Y:
THRESHOLD_FOR_STANDARD_COMPONENTS = X.X
THRESHOLD_FOR_CRITICAL_COMPONENTS = Y.Y
else:
THRESHOLD_FOR_STANDARD_COMPONENTS = 4.0 # default
THRESHOLD_FOR_CRITICAL_COMPONENTS = 4.5 # default
# Initialize other defaults
MAX_ITERATIONS = --max-iterations || 3 # default is 3 iterations
HUMAN_IN_THE_LOOP_STEPS = --human-in-the-loop || [] (empty = none, "*" = all)
SKIP_JUDGES = --skip-judges || false
REFINE_MODE = --refine || false
CONTINUE_MODE = --continue || false
# Special handling for --human-in-the-loop without step list
if --human-in-the-loop present without step numbers:
HUMAN_IN_THE_LOOP_STEPS = "*" (all steps)
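The resolution rules above can be sketched in Python. This is a minimal sketch, not the command's actual parser; the function and key names are illustrative:

```python
def resolve_config(arguments: list[str]) -> dict:
    """Sketch of the argument-resolution rules above (names illustrative)."""
    cfg = {
        "task_file": None,          # first positional argument, auto-detect if None
        "threshold_standard": 4.0,
        "threshold_critical": 4.5,
        "max_iterations": 3,
        "human_steps": [],          # [] = none, "*" = all steps
        "skip_judges": False,
        "refine": False,
        "continue": False,
    }
    i = 0
    while i < len(arguments):
        arg = arguments[i]
        if arg == "--target-quality":
            i += 1
            values = [float(v) for v in arguments[i].split(",")]
            # Single value sets both thresholds; two values set standard,critical
            cfg["threshold_standard"] = values[0]
            cfg["threshold_critical"] = values[-1]
        elif arg == "--max-iterations":
            i += 1
            v = arguments[i]
            cfg["max_iterations"] = float("inf") if v == "unlimited" else int(v)
        elif arg == "--human-in-the-loop":
            # Optional step list may follow; a bare flag means pause after every step
            if i + 1 < len(arguments) and not arguments[i + 1].startswith("--"):
                i += 1
                cfg["human_steps"] = [int(s) for s in arguments[i].split(",")]
            else:
                cfg["human_steps"] = "*"
        elif arg == "--skip-judges":
            cfg["skip_judges"] = True
        elif arg == "--refine":
            cfg["refine"] = True
        elif arg == "--continue":
            cfg["continue"] = True
        elif cfg["task_file"] is None and not arg.startswith("--"):
            cfg["task_file"] = arg  # first positional argument is the task file
        i += 1
    return cfg
```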
### --continue

When `--continue` is used:

**Step Resolution:**
* Look for `[DONE]` markers on step titles

**State Recovery:**
* Determine which folder the task file is in (`in-progress/`, `todo/`, `done/`)
* If the task is in `todo/`, move it to `in-progress/` before continuing

### --refine

When `--refine` is used, it detects changes to project files (not the task file) and maps them to implementation steps to determine what needs re-verification.
First, determine what to compare against based on git state:
# Check for staged changes
STAGED=$(git diff --cached --name-only)
# Check for unstaged changes
UNSTAGED=$(git diff --name-only)
Comparison logic:
| Staged | Unstaged | Compare Against | Command |
|---|---|---|---|
| Yes | Yes | Staged (unstaged only) | git diff --name-only |
| Yes | No | Last commit | git diff HEAD --name-only |
| No | Yes | Last commit | git diff HEAD --name-only |
| No | No | No changes | Exit with message |
* If **both staged AND unstaged** : Compare working directory vs staging area (unstaged changes only)
* If **only staged OR only unstaged** : Compare against last commit
* This ensures refine operates on the most recent work in progress
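The comparison-mode decision above can be sketched as follows. This is a sketch assuming `git` is on `PATH`; the function names are illustrative:

```python
import subprocess

def detect_changes() -> tuple[list[str], str]:
    """Return (changed_files, comparison_mode) per the comparison table above."""
    def git_names(*args: str) -> list[str]:
        out = subprocess.run(["git", "diff", *args, "--name-only"],
                             capture_output=True, text=True, check=True).stdout
        return [line for line in out.splitlines() if line]

    staged = git_names("--cached")
    unstaged = git_names()

    if staged and unstaged:
        # Both present: refine only the unstaged work (working dir vs staging area)
        return unstaged, "unstaged_only"
    if staged or unstaged:
        # Only one kind of change: compare against the last commit
        return git_names("HEAD"), "vs_last_commit"
    return [], "no_changes"  # caller should exit with a message
```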
2. Map Changes to Implementation Steps:
* Read the task file to get the list of implementation steps
* For each changed file, determine which step created/modified it:
* Check step's "Expected Output" section for file paths
* Check step's subtasks for file references
* Check step's artifacts in `#### Verification` section
* Build a mapping: `{changed_file → step_number}`
3. Determine Affected Steps:
* Find all steps that have associated changed files
* The **earliest affected step** is the starting point
* All steps from that point onwards need re-verification
* Earlier steps (unaffected) are preserved as-is
4. Refine Execution:
* For each affected step (in order):
* Launch **judge agent** to verify the step's artifacts (including user's changes)
* If judge PASS: Mark step done, proceed to next
* If judge FAIL: Launch implementation agent with user's changes as context, then re-verify
* User's manual fixes are preserved - implementation agent should build upon them, not overwrite
5. Example:
# User manually fixed src/validation/validation.service.ts
# (This file was created in Step 2)
/implement my-task.feature.md --refine
# Detects: src/validation/validation.service.ts modified
# Maps to: Step 2 (Create ValidationService)
# Action: Launch judge for Step 2
# - If PASS: User's fix is good, proceed to Step 3
# - If FAIL: Implementation agent aligns the rest of the code with the user's changes, without overwriting them
# Continues: Step 3, Step 4... (re-verify all subsequent steps)
6. Multiple Files Changed:
# User edited files from Step 2 AND Step 4
/implement my-task.feature.md --refine
# Detects: Files from Step 2 and Step 4 modified
# Earliest affected: Step 2
# Re-verifies: Step 2, Step 3, Step 4, Step 5...
# (Step 3 re-verified even though no direct changes, because it depends on Step 2)
7. Staged vs Unstaged Changes:
# Scenario: User staged some changes, then made more edits
# Staged: src/validation/validation.service.ts (git add done)
# Unstaged: src/validation/validators/email.validator.ts (still editing)
/implement my-task.feature.md --refine
# Detects: Both staged AND unstaged changes exist
# Mode: Compares unstaged only (working dir vs staging)
# Only email.validator.ts is considered for refine
# Staged changes are preserved, not re-verified
# --
# Scenario: User only has staged changes (ready to commit)
# Staged: src/validation/validation.service.ts
# Unstaged: none
/implement my-task.feature.md --refine
# Detects: Only staged changes
# Mode: Compares against last commit
# validation.service.ts changes are verified
Human verification checkpoints occur:
Trigger Conditions:
* After a step listed in `HUMAN_IN_THE_LOOP_STEPS` passes implementation + judge verification
* If `HUMAN_IN_THE_LOOP_STEPS` is "*", triggers after every step

At Checkpoint:
Checkpoint Message Format:
## 🔍 Human Review Checkpoint - Step X
**Step:** {step title}
**Step Type:** {standard/critical}
**Judge Score:** {score}/{threshold for step type} threshold
**Status:** ✅ PASS / 🔄 ITERATING (attempt {n})
**Artifacts Created/Modified:**
- {artifact_path_1}
- {artifact_path_2}
**Judge Feedback:**
{feedback summary}
**Action Required:** Review the above artifacts and provide feedback or continue.
> Continue? [Y/n/feedback]:
---
Task status is managed by folder location:
* `.specs/tasks/todo/` - Tasks waiting to be implemented
* `.specs/tasks/in-progress/` - Tasks currently being worked on
* `.specs/tasks/done/` - Completed tasks

| When | Action |
|---|---|
| Start implementation | Move task from todo/ to in-progress/ |
| Final verification PASS | Move task from in-progress/ to done/ |
| Implementation failure (user aborts) | Keep in in-progress/ |
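The folder transitions above amount to a single move that prefers `git mv` with a plain rename as fallback. A minimal sketch (the helper name is illustrative):

```python
import subprocess
from pathlib import Path

TASKS_ROOT = Path(".specs/tasks")

def move_task(task_file: str, src: str, dst: str) -> Path:
    """Move a task file between status folders, e.g. todo -> in-progress."""
    src_path = TASKS_ROOT / src / task_file
    dst_dir = TASKS_ROOT / dst
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst_path = dst_dir / task_file
    try:
        # Prefer git mv so the rename is tracked in the repository
        subprocess.run(["git", "mv", str(src_path), str(dst_path)],
                       check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        src_path.rename(dst_path)  # fallback when git is unavailable
    return dst_path
```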
Your role is DISPATCH and AGGREGATE. You do NOT do the work.
Properly build context of sub agents!
CRITICAL: For each sub-agent (implementation and evaluation), you need to provide:
* The value of `${CLAUDE_PLUGIN_ROOT}`, so agents can resolve paths like `@${CLAUDE_PLUGIN_ROOT}/scripts/create-scratchpad.sh`

| Prohibited Action | Why | What To Do Instead |
|---|---|---|
| Read implementation outputs | Context bloat → command loss | Sub-agent reports what it created |
| Read reference files | Sub-agent's job to understand patterns | Include path in sub-agent prompt |
| Read artifacts to "check" them | Context bloat → forget verifications | Launch judge agent |
| Evaluate code quality yourself | Not your job, causes forgetting | Launch judge agent |
| Skip verification "because simple" | ALL verifications are mandatory | Launch judge agent anyway |
If you think: "I should read this file to understand what was created" → STOP. The sub-agent's report tells you what was created. Use that information.
If you think: "I'll quickly verify this looks correct" → STOP. Launch a judge agent. That's not your job.
If you think: "This is too simple to need verification" → STOP. If the task specifies verification, launch the judge. No exceptions.
If you think: "I need to read the reference file to write a good prompt" → STOP. Put the reference file PATH in the sub-agent prompt. Sub-agent reads it.
Orchestrators who read files themselves = context overflow = command loss = forgotten steps. Every time.
Orchestrators who "quickly verify" = skip judge agents = quality collapse = failed artifacts.
Your context window is precious. Protect it. Delegate everything.
* Use `THRESHOLD_FOR_STANDARD_COMPONENTS` (default 4.0) for standard steps!
* Use `THRESHOLD_FOR_CRITICAL_COMPONENTS` (default 4.5) for steps marked as critical in the task file!
* If `MAX_ITERATIONS` is set to unlimited: iterate until the quality threshold is met (no limit)
* Trigger human-in-the-loop checkpoints after steps listed in `HUMAN_IN_THE_LOOP_STEPS` (or all steps if "*")!
* If `SKIP_JUDGES` is true: skip ALL judge validation - proceed directly to the next step after each implementation completes!
* If `CONTINUE_MODE` is true: skip to `RESUME_FROM_STEP` - do not re-implement already completed steps!
* If `REFINE_MODE` is true: detect changed project files, map them to steps, and re-verify from `REFINE_FROM_STEP` - preserve the user's fixes!
* Relaunch the judge until you get valid results if any of the following happens:
This command orchestrates multi-step task implementation with:
Phase 0: Select Task & Move to In-Progress
│
├─── Use provided task file name or auto-select from todo/ (if only 1 task)
├─── Move task: todo/ → in-progress/
│
▼
Phase 1: Load Task
│
▼
Phase 2: Execute Steps
│
├─── For each step in dependency order:
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Launch sdd:developer agent │
│ │ (implementation) │
│ └─────────────────┬───────────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Launch judge agent(s) │
│ │ (verification per #### Verification section) │
│ └─────────────────┬───────────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Judge PASS? → Mark step complete in task file │
│ │ Judge FAIL? → Fix and re-verify (max 2 retries) │
│ └─────────────────────────────────────────────────┘
│
▼
Phase 3: Final Verification
│
├─── Verify all Definition of Done items
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Launch judge agent │
│ │ (verify all DoD items) │
│ └─────────────────┬───────────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ All PASS? → Proceed to Phase 4 │
│ │ Any FAIL? → Fix and re-verify (iterate) │
│ └─────────────────────────────────────────────────┘
│
▼
Phase 4: Move Task to Done
│
├─── Move task: in-progress/ → done/
│
▼
Phase 5: Final Report
Parse user input to get the task file path and arguments.
If `$ARGUMENTS` is empty or only contains flags:
Check in-progress folder first:
ls .specs/tasks/in-progress/*.md 2>/dev/null
* If exactly one file is found: set `$TASK_FILE` to that file, `$TASK_FOLDER` to `in-progress`

Check todo folder:
ls .specs/tasks/todo/*.md 2>/dev/null
* If exactly one file is found: set `$TASK_FILE` to that file, `$TASK_FOLDER` to `todo`

If `$ARGUMENTS` contains a task file name:
* Search the folders in order: `in-progress/` → `todo/` → `done/`
* Set `$TASK_FILE` and `$TASK_FOLDER` accordingly

If the task is in the `todo/` folder:
git mv .specs/tasks/todo/$TASK_FILE .specs/tasks/in-progress/
# Fallback if git not available: mv .specs/tasks/todo/$TASK_FILE .specs/tasks/in-progress/
Update $TASK_PATH to .specs/tasks/in-progress/$TASK_FILE
If the task is already in `in-progress/`: set `$TASK_PATH` to `.specs/tasks/in-progress/$TASK_FILE`
Parse all flags from $ARGUMENTS and initialize configuration. Display resolved configuration:
### Configuration
| Setting | Value |
|---------|-------|
| **Task File** | {TASK_PATH} |
| **Standard Components Threshold** | {THRESHOLD_FOR_STANDARD_COMPONENTS}/5.0 |
| **Critical Components Threshold** | {THRESHOLD_FOR_CRITICAL_COMPONENTS}/5.0 |
| **Max Iterations** | {MAX_ITERATIONS or "3"} |
| **Human Checkpoints** | {HUMAN_IN_THE_LOOP_STEPS as comma-separated or "All steps" or "None"} |
| **Skip Judges** | {SKIP_JUDGES} |
| **Continue Mode** | {CONTINUE_MODE} |
| **Refine Mode** | {REFINE_MODE} |
If `CONTINUE_MODE` is true:

Identify Last Completed Step:
* Scan the task file for `[DONE]` markers on step titles
* Find the highest step number marked `[DONE]`
* Set `LAST_COMPLETED_STEP` to that number (or 0 if none)

Verify Last Completed Step (if any):

If `LAST_COMPLETED_STEP > 0`:
* Launch a judge to verify the last completed step's state
* If verification passes: `RESUME_FROM_STEP = LAST_COMPLETED_STEP + 1`
* If verification fails: `RESUME_FROM_STEP = LAST_COMPLETED_STEP` (re-implement)

Skip to Resume Point:
* Mark all steps before `RESUME_FROM_STEP` complete
* Continue execution from `RESUME_FROM_STEP`
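The `[DONE]` scan can be sketched as follows. The `### Step N: Title` pattern is an assumption about the task-file format, and the function name is illustrative:

```python
import re

# Assumed step-title format: "### Step <N>: <Title>" with an optional "[DONE]" suffix
STEP_RE = re.compile(r"^### Step (\d+):.*?(\[DONE\])?\s*$", re.MULTILINE)

def resume_point(task_markdown: str, last_step_verified_ok: bool = True) -> int:
    """Return RESUME_FROM_STEP per the --continue rules above (sketch)."""
    done = [int(num) for num, marker in STEP_RE.findall(task_markdown) if marker]
    last_completed = max(done, default=0)
    if last_completed == 0:
        return 1                   # nothing completed yet - start from Step 1
    if last_step_verified_ok:
        return last_completed + 1  # judge PASS: continue with the next step
    return last_completed          # judge FAIL: re-implement the last step
```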
If `REFINE_MODE` is true:
Detect Changed Project Files:
STAGED=$(git diff --cached --name-only)
UNSTAGED=$(git diff --name-only)
Determine comparison mode:
if STAGED is not empty AND UNSTAGED is not empty:
# Both staged and unstaged - use unstaged only
CHANGED_FILES = git diff --name-only # working dir vs staging
COMPARISON_MODE = "unstaged_only"
elif STAGED is not empty OR UNSTAGED is not empty:
# Only one type - compare against last commit
CHANGED_FILES = git diff HEAD --name-only
COMPARISON_MODE = "vs_last_commit"
else:
# No changes
Report: "No project changes detected. Make edits first, then run --refine."
Exit
2. Load Task File and Extract Step→File Mapping:
* Read the task file to get implementation steps
* For each step, extract the files it creates/modifies from:
* "Expected Output" sections
* Subtask descriptions mentioning file paths
* `#### Verification` artifact paths
* Build mapping: `STEP_FILE_MAP = {step_number → [file_paths]}`
3. Map Changed Files to Steps:
AFFECTED_STEPS = []
for each changed_file:
for step_number, file_list in STEP_FILE_MAP:
if changed_file matches any path in file_list:
AFFECTED_STEPS.append(step_number)
* If no steps matched: "Changed files don't map to any implementation step. Verify manually."
4. Determine Refine Scope:
* `REFINE_FROM_STEP` = min(AFFECTED_STEPS) # earliest affected step
* All steps from `REFINE_FROM_STEP` onwards need re-verification
* Steps before `REFINE_FROM_STEP` are preserved as-is
5. Store Changed Files Context:
* `CHANGED_FILES` = list of changed file paths
* `USER_CHANGES_CONTEXT` = git diff output for affected files
* Pass this context to judge and implementation agents
* Agents should build upon user's fixes, not overwrite them
This is the ONLY phase where you read a file.
Read the task file ONCE:
Read $TASK_PATH
After this read, you MUST NOT read any other files for the rest of execution.
Parse the ## Implementation Process section:
* Extract all steps and their `Parallel with:` annotations
* Extract the `#### Verification` sections

Classify each step's verification needs:

| Verification Level | When to Use | Judge Configuration |
|---|---|---|
| None | Simple operations (mkdir, delete) | Skip verification |
| Single Judge | Non-critical artifacts | 1 judge, threshold 4.0/5.0 |
| Panel of 2 Judges | Critical artifacts | 2 judges, median voting, threshold 4.5/5.0 |
| Per-Item Judges | Multiple similar items | 1 judge per item, parallel |
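For the two-judge panel, median voting against a threshold can be sketched as below (with two scores, the median is simply their mean; the function name is illustrative):

```python
from statistics import median

def panel_verdict(scores: list[float], threshold: float) -> tuple[float, str]:
    """Aggregate judge scores by median vote and compare to the threshold."""
    agg = median(scores)
    return agg, "PASS" if agg >= threshold else "FAIL"
```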
Create TodoWrite with all implementation steps, marking verification requirements:
{
"todos": [
{"content": "Step 1: [Title] - [Verification Level]", "status": "pending", "activeForm": "Implementing Step 1"},
{"content": "Step 2: [Title] - [Verification Level]", "status": "pending", "activeForm": "Implementing Step 2"}
]
}
For each step in dependency order:
1. Launch Developer Agent:
Use Task tool with:
Agent Type : sdd:developer
Model : As specified in step or opus by default
Description : "Implement Step [N]: [Title]"
Prompt :
Implement Step [N]: [Step Title]
Task File: $TASK_PATH Step Number: [N]
Your task:
When complete, report:
2. Use Agent's Report (No Verification)
3. Mark Step Complete
* Mark the step title with `[DONE]` (e.g., `### Step 1: Setup [DONE]`)
* Mark subtasks `[X]` complete
* Update the TodoWrite status to `completed`

1. Launch Developer Agent:
Use Task tool with:
Agent Type : sdd:developer
Model : As specified in step or opus by default
Description : "Implement Step [N]: [Title]"
Prompt :
Implement Step [N]: [Step Title]
Task File: $TASK_PATH Step Number: [N]
Your task:
When complete, report:
2. Wait for Completion
3. Launch 2 Evaluation Agents in Parallel (MANDATORY):
⚠️ MANDATORY: This pattern requires launching evaluation agents. You MUST launch these evaluations. Do NOT skip. Do NOT verify yourself.
Use the `sdd:developer` agent type for evaluations
Evaluation 1 & 2 (launch both in parallel with same prompt structure):
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology.
Evaluate artifact at: [artifact_path from implementation agent report]
**Chain-of-Thought Requirement:** Justification MUST be provided BEFORE score for each criterion.
Rubric:
[paste rubric table from #### Verification section]
Context:
- Read $TASK_PATH
- Verify Step [N] ONLY: [Step Title]
- Threshold: [from #### Verification section]
- Reference pattern: [if specified in #### Verification section]
You can verify the artifact works - run tests, check imports, validate syntax.
Return: scores per criterion with evidence, overall weighted score, PASS/FAIL, improvements if FAIL.
4. Aggregate Results:
5. Determine Threshold:
* Check whether the step is marked critical (in the `#### Verification` section or step metadata)
* Critical: use `THRESHOLD_FOR_CRITICAL_COMPONENTS`
* Standard: use `THRESHOLD_FOR_STANDARD_COMPONENTS`

6. On FAIL: Iterate Until PASS (max 3 iterations by default)
MAX_ITERATIONS reached (default 3):
7. On PASS: Mark Step Complete
* Mark the step title with `[DONE]` (e.g., `### Step 2: Create Service [DONE]`)
* Mark subtasks `[X]` complete
* Update the TodoWrite status to `completed`

8. Human-in-the-Loop Checkpoint (if applicable):
Only after the step PASSES, if the step number is in HUMAN_IN_THE_LOOP_STEPS (or HUMAN_IN_THE_LOOP_STEPS == "*"):
---
## 🔍 Human Review Checkpoint - Step [N]
**Step:** [Step Title]
**Judge Score:** [score]/[threshold for step type] threshold
**Status:** ✅ PASS
**Artifacts Created/Modified:**
- [artifact_path_1]
- [artifact_path_2]
**Judge Feedback:**
[feedback summary from judges]
**Action Required:** Review the above artifacts and provide feedback or continue.
> Continue? [Y/n/feedback]:
---
For steps that create multiple similar items:
1. Launch Developer Agents in Parallel (one per item):
Use Task tool for EACH item (launch all in parallel):
Agent Type : sdd:developer
Model : As specified or opus by default
Description : "Implement Step [N], Item: [Name]"
Prompt :
Implement Step [N], Item: [Item Name]
Task File: $TASK_PATH Step Number: [N] Item: [Item Name]
Your task:
When complete, report:
2. Wait for All Completions
3. Launch Evaluation Agents in Parallel (one per item)
⚠️ MANDATORY: Launch evaluation agents. Do NOT skip. Do NOT verify yourself.
Use the `sdd:developer` agent type for evaluations
For each item:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology.
Evaluate artifact at: [item_path from implementation agent report]
**Chain-of-Thought Requirement:** Justification MUST be provided BEFORE score for each criterion.
Rubric:
[paste rubric from #### Verification section]
Context:
- Read $TASK_PATH
- Verify Step [N]: [Step Title]
- Verify ONLY this Item: [Item Name]
- Threshold: [from #### Verification section]
You can verify the artifact works - run tests, check syntax, confirm dependencies.
Return: scores with evidence, overall score, PASS/FAIL, improvements if FAIL.
4. Collect All Results
5. Report Aggregate:
6. Determine Threshold:
   - Use the threshold from the step's `#### Verification` section (or step metadata)
   - Critical components → `THRESHOLD_FOR_CRITICAL_COMPONENTS`
   - Standard components → `THRESHOLD_FOR_STANDARD_COMPONENTS`
7. If Any FAIL: Iterate Until ALL PASS
MAX_ITERATIONS reached (default 3):
8. On ALL PASS: Mark Step Complete
Append `[DONE]` to the step title (e.g., `### Step 3: Create Items [DONE]`), mark all step subtasks `[X]` complete, and set the step status to `completed`.
9. Human-in-the-Loop Checkpoint (if applicable):
Only after ALL items PASS, if the step number is in HUMAN_IN_THE_LOOP_STEPS (or HUMAN_IN_THE_LOOP_STEPS == "*"):
---
## 🔍 Human Review Checkpoint - Step [N]
**Step:** [Step Title]
**Items Passed:** X/Y
**Status:** ✅ ALL PASS
**Artifacts Created:**
- [item_1_path]
- [item_2_path]
- ...
**Action Required:** Review the above artifacts and provide feedback or continue.
> Continue? [Y/n/feedback]:
---
Before moving to final verification, verify you followed the rules:
If you read files other than the task file, you are doing it wrong. STOP and restart.
After all implementation steps are complete, verify the task meets all Definition of Done criteria.
Use Task tool with:
Agent Type : sdd:developer
Model : opus
Description : "Verify Definition of Done"
Prompt :
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Verify all Definition of Done items in the task file.
Task File: $TASK_PATH
Your task:
- Check each Definition of Done item and mark verified items `[X]` in the task file
- Report PASS/FAIL per item with evidence

If any Definition of Done items FAIL:
1. Launch Developer Agent for Each Failing Item:
Fix Definition of Done item: [Item Description]
Task File: $TASK_PATH
Current Status:
[paste failure details from verification report]
Your task:
1. Fix the specific issue identified
2. Verify the fix resolves the problem
3. Ensure no regressions (all tests still pass)
Return:
- What was fixed
- Confirmation the item now passes
- Any related changes made
2. Re-verify After Fixes:
Launch the verification agent again (Step 3.1) to confirm all items now PASS.
3. Iterate if Needed:
Repeat fix → verify cycle until all Definition of Done items PASS.
Once ALL Definition of Done items PASS, move the task to the done folder.
Confirm all Definition of Done items are marked complete in the task file.
# Extract just the filename from $TASK_PATH
TASK_FILENAME=$(basename "$TASK_PATH")
# Move from in-progress to done (fall back to plain mv if git is unavailable)
git mv ".specs/tasks/in-progress/$TASK_FILENAME" ".specs/tasks/done/$TASK_FILENAME" \
  || mv ".specs/tasks/in-progress/$TASK_FILENAME" ".specs/tasks/done/$TASK_FILENAME"
When using 2+ evaluations, follow these manual computation steps:
Create a table with each criterion and scores from all evaluations:
| Criterion | Eval 1 | Eval 2 | Median | Difference |
|---|---|---|---|---|
| [Name 1] | X.X | X.X | ? | ? |
| [Name 2] | X.X | X.X | ? | ? |
For 2 evaluations: Median = (Score1 + Score2) / 2
For 3+ evaluations: Sort scores, take middle value (or average of two middle values if even count)
High variance = evaluators disagree significantly (difference > 2.0 points)
Formula: |Eval1 - Eval2| > 2.0 → Flag as high variance
Multiply each criterion's median by its weight and sum:
Overall = (Criterion1_Median × Weight1) + (Criterion2_Median × Weight2) + ...
Compare overall score to threshold:
- Overall ≥ Threshold → PASS ✅
- Overall < Threshold → FAIL ❌

If evaluations significantly disagree (difference > 2.0 on any criterion):
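The manual computation steps above (per-criterion median, variance flagging, weighted overall, threshold comparison) can be sketched as follows. This is an illustrative sketch only; the criterion names, weights, and scores are placeholders, not values from any real task file:

```python
from statistics import median

def aggregate_evaluations(evals, weights, threshold, variance_limit=2.0):
    """Aggregate per-criterion judge scores into a PASS/FAIL verdict.

    evals:   list of {criterion: score} dicts, one per evaluation
    weights: {criterion: weight} dict whose weights sum to 1.0
    """
    medians, flagged = {}, []
    for criterion in weights:
        scores = [e[criterion] for e in evals]
        # For 2 evaluations, median() returns (s1 + s2) / 2;
        # for 3+ it returns the middle value (or mean of the two middle values)
        medians[criterion] = median(scores)
        # High variance: evaluators disagree by more than variance_limit points
        if max(scores) - min(scores) > variance_limit:
            flagged.append(criterion)
    # Weighted overall = sum of (criterion median × criterion weight)
    overall = sum(medians[c] * w for c, w in weights.items())
    return {
        "overall": round(overall, 2),
        "verdict": "PASS" if overall >= threshold else "FAIL",
        "high_variance": flagged,
    }

# Hypothetical panel of 2 evaluations against a 2-criterion rubric
result = aggregate_evaluations(
    evals=[{"Test Coverage": 4.0, "Pattern Adherence": 4.5},
           {"Test Coverage": 4.5, "Pattern Adherence": 4.0}],
    weights={"Test Coverage": 0.5, "Pattern Adherence": 0.5},
    threshold=4.0,
)
# → {"overall": 4.25, "verdict": "PASS", "high_variance": []}
```

High-variance criteria are returned rather than failed outright, matching the escalation path described above (disagreement > 2.0 triggers review, not automatic FAIL).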
After all steps complete and DoD verification passes:
## Implementation Summary
### Task Status
- Task Status: `done` ✅
- All Definition of Done items: X/X PASS (100%)
### Configuration Used
| Setting | Value |
|---------|-------|
| **Standard Components Threshold** | {THRESHOLD_FOR_STANDARD_COMPONENTS}/5.0 |
| **Critical Components Threshold** | {THRESHOLD_FOR_CRITICAL_COMPONENTS}/5.0 |
| **Max Iterations** | {MAX_ITERATIONS or "3"} |
| **Human Checkpoints** | {HUMAN_IN_THE_LOOP_STEPS or "None"} |
| **Skip Judges** | {SKIP_JUDGES} |
| **Continue Mode** | {CONTINUE_MODE} |
| **Refine Mode** | {REFINE_MODE} |
### Steps Completed
| Step | Title | Status | Verification | Score | Iterations | Judge Confirmed |
|------|-------|--------|--------------|-------|------------|-----------------|
| 1 | [Title] | ✅ | Skipped | N/A | 1 | - |
| 2 | [Title] | ✅ | Panel (2) | 4.5/5 | 1 | ✅ |
| 3 | [Title] | ✅ | Per-Item | 5/5 passed | 2 | ✅ |
| 4 | [Title] | ✅ | Single | 4.2/5 | 3 | ✅ |
**Legend:**
- ✅ PASS - Score >= threshold for step type
- ⚠️ MAX_ITER - Did not pass but MAX_ITERATIONS reached, proceeded anyway
- ⏭️ SKIPPED - Step skipped (continue/refine mode)
### Verification Summary
- Total steps: X
- Steps with verification: Y
- Passed on first try: Z
- Required iteration: W
- Total iterations across all steps: V
- Final pass rate: 100%
### Definition of Done Verification
| Item | Status | Evidence |
|------|--------|----------|
| [DoD Item 1] | ✅ PASS | [Brief evidence] |
| [DoD Item 2] | ✅ PASS | [Brief evidence] |
| ... | ... | ... |
**Issues Fixed During Verification:**
1. [Issue]: [How it was fixed]
2. [Issue]: [How it was fixed]
### High-Variance Criteria (Evaluators Disagreed)
- [Criterion] in [Step]: Eval 1 scored X, Eval 2 scored Y
### Human Review Summary (if --human-in-the-loop used)
| Step | Checkpoint | User Action | Feedback Incorporated |
|------|------------|-------------|----------------------|
| 2 | After PASS | Continued | - |
| 4 | After iteration 2 | Feedback | "Improve error messages" |
| 6 | After PASS | Continued | - |
### Task File Updated
- Task moved from `in-progress/` to `done/` folder
- All step titles marked `[DONE]`
- All step subtasks marked `[X]`
- All Definition of Done items marked `[X]`
### Recommendations
1. [Any follow-up actions]
2. [Suggested improvements]
┌──────────────────────────────────────────────────────────────┐
│ IMPLEMENT TASK WITH VERIFICATION │
├──────────────────────────────────────────────────────────────┤
│ │
│ Phase 0: Select Task │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Use provided name or auto-select from todo/ (if 1 task) │ │
│ │ → Move task from todo/ to in-progress/ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 1: Load Task │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Read $TASK_PATH → Parse steps │ │
│ │ → Extract #### Verification specs → Create TodoWrite │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 2: Execute Steps (Respecting Dependencies) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ For each step: │ │
│ │ │ │
│ │ ┌──────────────┐ ┌───────────────┐ ┌───────────┐ │ │
│ │ │ developer │───▶│ Judge Agent │───▶│ PASS? │ │ │
│ │ │ Agent │ │ (verify) │ │ │ │ │
│ │ └──────────────┘ └───────────────┘ └───────────┘ │ │
│ │ │ │ │ │
│ │ Yes No │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌────────┐ Fix & │ │ │
│ │ │ Mark │ Retry │ │ │
│ │ │Complete│ ↺ │ │ │
│ │ └────────┘ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 3: Final Verification │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌──────────────┐ ┌───────────────┐ ┌───────────┐ │ │
│ │ │ Judge Agent │───▶│ All DoD │───▶│ All PASS? │ │ │
│ │ │ (verify DoD) │ │ items checked │ │ │ │ │
│ │ └──────────────┘ └───────────────┘ └───────────┘ │ │
│ │ │ │ │ │
│ │ Yes No │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ Fix & │ │
│ │ Retry │ │
│ │ ↺ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 4: Move Task to Done │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ mv in-progress/$TASK → done/$TASK │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 5: Aggregate & Report │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Collect all verification results │ │
│ │ → Calculate aggregate metrics │ │
│ │ → Generate final report │ │
│ │ → Present to user │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
# Implement a specific task
/implement add-validation.feature.md
# Auto-select task from todo/ or in-progress/ (if only 1 task)
/implement
# Continue from last completed step
/implement add-validation.feature.md --continue
# Refine after user fixes project files (detects changes, re-verifies affected steps)
/implement add-validation.feature.md --refine
# Human review after every step
/implement add-validation.feature.md --human-in-the-loop
# Human review after specific steps only
/implement add-validation.feature.md --human-in-the-loop 2,4,6
# Higher quality threshold (stricter) - sets both standard and critical to 4.5
/implement add-validation.feature.md --target-quality 4.5
# Different thresholds for standard (3.5) and critical (4.5) components
/implement add-validation.feature.md --target-quality 3.5,4.5
# Lower quality threshold for both (faster convergence)
/implement add-validation.feature.md --target-quality 3.5
# Unlimited iterations (default is 3)
/implement add-validation.feature.md --max-iterations unlimited
# Skip all judge verifications (fast but no quality gates)
/implement add-validation.feature.md --skip-judges
# Combined: continue with human review
/implement add-validation.feature.md --continue --human-in-the-loop
User: /implement add-validation.feature.md
Phase 0: Task Selection...
Found task in: .specs/tasks/todo/add-validation.feature.md
Moving to in-progress: .specs/tasks/in-progress/add-validation.feature.md
Phase 1: Loading task...
Task: "Add form validation service"
Steps identified: 4 steps
Verification plan (from #### Verification sections):
- Step 1: No verification (directory creation)
- Step 2: Panel of 2 evaluations (ValidationService)
- Step 3: Per-item evaluations (3 validators)
- Step 4: Single evaluation (integration)
Phase 2: Executing...
Step 1: Launching sdd:developer agent...
Agent: "Implement Step 1: Create Directory Structure..."
Result: ✅ Directories created
Verification: Skipped (simple operation)
Status: ✅ COMPLETE
Step 2: Launching sdd:developer agent...
Agent: "Implement Step 2: Create ValidationService..."
Result: Files created, tests passing
Launching 2 judge agents in parallel...
Judge 1: 4.3/5.0 - PASS
Judge 2: 4.5/5.0 - PASS
Panel Result: 4.4/5.0 ✅
Status: ✅ COMPLETE (Judge Confirmed)
[Continue for all steps...]
Phase 3: Final Verification...
Launching DoD verification agent...
Agent: "Verify all Definition of Done items..."
Result: 4/4 items PASS ✅
Phase 4: Moving task to done...
mv .specs/tasks/in-progress/add-validation.feature.md .specs/tasks/done/
Phase 5: Final Report
Implementation complete.
- 4/4 steps completed
- 6 artifacts verified
- All passed first try
- Definition of Done: 4/4 PASS
- Task location: .specs/tasks/done/add-validation.feature.md ✅
[All steps complete...]
Phase 3: Final Verification...
Launching DoD verification agent...
Agent: "Verify all Definition of Done items..."
Result: 3/4 items PASS, 1 FAIL ❌
Failing item:
- "Code follows ESLint rules": 356 errors found
Should I attempt to fix this issue? [Y/n]
User: Y
Launching sdd:developer agent...
Agent: "Fix ESLint errors..."
Result: Fixed 356 errors, 0 warnings ✅
Re-launching DoD verification agent...
Agent: "Re-verify all Definition of Done items..."
Result: 4/4 items PASS ✅
Phase 4: Moving task to done...
All DoD checkboxes marked complete ✅
Phase 5: Final Report
Task verification complete.
- All DoD items now PASS
- 1 issue fixed (ESLint errors)
- Task location: .specs/tasks/done/ ✅
Step 3 Implementation complete.
Launching judge agents...
Judge 1: 3.5/5.0 - FAIL (threshold 4.0)
Judge 2: 3.2/5.0 - FAIL
Issues found:
- Test Coverage: 2.5/5
Evidence: "Missing edge case tests for empty input"
Justification: "Success criteria requires edge case coverage"
- Pattern Adherence: 3.0/5
Evidence: "Uses custom Result type instead of project standard"
Justification: "Should use existing Result<T, E> from src/types"
Should I attempt to fix these issues? [Y/n]
User: Y
Launching sdd:developer agent with feedback...
Agent: "Fix Step 3: Address judge feedback..."
Result: Issues fixed, tests added
Re-launching judge agents...
Judge 1: 4.2/5.0 - PASS
Judge 2: 4.4/5.0 - PASS
Panel Result: 4.3/5.0 ✅
Status: ✅ COMPLETE (Judge Confirmed)
User: /implement add-validation.feature.md --continue
Phase 0: Parsing flags...
Configuration:
- Continue Mode: true
- Target Quality: 4.0/5.0 (default)
Scanning task file for completed steps...
Found: Step 1 [DONE], Step 2 [DONE]
Last completed: Step 2
Verifying Step 2 artifacts...
Launching judge agent for Step 2...
Judge: 4.3/5.0 - PASS ✅
Marking step as complete in task file...
Resuming from Step 3...
Step 3: Launching sdd:developer agent...
[continues normally from Step 4]
# User manually fixed src/validation/validation.service.ts
# (This file was created in Step 2: Create ValidationService)
User: /implement add-validation.feature.md --refine
Phase 0: Parsing flags...
Configuration:
- Refine Mode: true
Detecting changed project files...
Changed files:
- src/validation/validation.service.ts (modified)
Mapping files to implementation steps...
- src/validation/validation.service.ts → Step 2 (Create ValidationService)
Earliest affected step: Step 2
Preserving: Step 1 (unchanged)
Re-verifying from: Step 2 onwards
Step 2: Launching judge to verify the remaining logic against the user's changes...
Judge: 4.3/5.0 - PASS ✅
Remaining logic unaffected, proceeding...
Step 3: Launching judge to verify...
Judge: TypeScript error detected in file
Launching implementation agent to fix the error and align logic with the user's changes...
Launching judge to verify fixed logic...
Judge: 4.5/5.0 - PASS ✅
[continues verifying remaining steps...]
All steps verified with user's changes incorporated ✅
User: /implement add-validation.feature.md --human-in-the-loop
Configuration:
- Human Checkpoints: All steps
Step 1: Launching sdd:developer agent...
Result: Directories created ✅
---
## 🔍 Human Review Checkpoint - Step 1
**Step:** Create Directory Structure
**Judge Score:** N/A (no verification)
**Status:** ✅ COMPLETE
**Artifacts Created:**
- src/validation/
- src/validation/tests/
**Action Required:** Review the above artifacts and provide feedback or continue.
> Continue? [Y/n/feedback]: Y
---
Step 2: Launching sdd:developer agent...
Result: ValidationService created ✅
Launching judge agents...
Judge 1: 4.5/5.0 - PASS
Judge 2: 4.3/5.0 - PASS
Panel Result: 4.4/5.0 ✅
---
## 🔍 Human Review Checkpoint - Step 2
**Step:** Create ValidationService
**Judge Score:** 4.4/5.0 (threshold: 4.0)
**Status:** ✅ PASS
**Artifacts Created:**
- src/validation/validation.service.ts
- src/validation/tests/validation.service.spec.ts
**Judge Feedback:**
- All criteria met
- Test coverage comprehensive
**Action Required:** Review the above artifacts and provide feedback or continue.
> Continue? [Y/n/feedback]: The error messages could be more descriptive
---
Incorporating feedback: "error messages could be more descriptive"
Re-launching sdd:developer agent with feedback...
[iteration continues]
User: /implement critical-api.feature.md --target-quality 4.5
Configuration:
- Target Quality: 4.5/5.0
Step 2: Implementing critical API endpoint...
Result: Endpoint created
Launching judge agents...
Judge 1: 4.2/5.0 - FAIL (threshold: 4.5)
Judge 2: 4.3/5.0 - FAIL
Iteration 1: Re-implementing with feedback...
[fixes applied]
Launching judge agents...
Judge 1: 4.4/5.0 - FAIL
Judge 2: 4.5/5.0 - PASS
Iteration 2: Re-implementing with feedback...
[more fixes applied]
Launching judge agents...
Judge 1: 4.6/5.0 - PASS
Judge 2: 4.5/5.0 - PASS
Panel Result: 4.55/5.0 ✅
Status: ✅ COMPLETE (passed on iteration 2)
If sdd:developer agent reports failure:
If judges disagree significantly (difference > 2.0):
If --refine mode finds no git changes in the project:
If --refine mode finds changed files but none map to implementation steps:
Before completing implementation:
- [ ] Parsed `$ARGUMENTS` correctly
- [ ] Applied `THRESHOLD_FOR_STANDARD_COMPONENTS` for standard steps
- [ ] Applied `THRESHOLD_FOR_CRITICAL_COMPONENTS` for critical steps
- [ ] Iterated on failures until PASS (or `MAX_ITERATIONS` reached, default 3)
- [ ] If `SKIP_JUDGES` is true: Skipped ALL judge validation
- [ ] If `CONTINUE_MODE` is true: Verified last step and resumed correctly
- [ ] If `REFINE_MODE` is true: Detected changed project files, mapped to steps, re-verified from earliest affected step
- [ ] Read ONLY the task file (`$TASK_PATH` in `.specs/tasks/in-progress/`) - no other files
- [ ] Launched implementation via `sdd:developer` agents via Task tool
- [ ] Launched evaluation via `sdd:developer` agents via Task tool (unless `SKIP_JUDGES` is true)
- [ ] Marked completed step titles `[DONE]` and subtasks `[X]` (judge-verified unless `SKIP_JUDGES`)
- [ ] Honored every checkpoint in `HUMAN_IN_THE_LOOP_STEPS`
- [ ] Moved the task from `in-progress/` to the `done/` folder
- [ ] Marked all Definition of Done items `[X]` in the task file

This appendix documents how verification is specified in task files. During Phase 2 (Execute Steps), you will reference these specifications to understand how to verify each artifact.
Task files define verification requirements in #### Verification sections within each implementation step. These sections specify:
Level: Verification complexity
- None - Simple operations (mkdir, delete) - skip verification
- Single Judge - Non-critical artifacts - 1 judge, threshold 4.0/5.0
- Panel of 2 Judges - Critical artifacts - 2 judges, median voting, threshold 4.0/5.0 or 4.5/5.0
- Per-Item Judges - Multiple similar items - 1 judge per item, parallel execution

Artifact(s): Path(s) to file(s) being verified
- e.g., `src/decision/decision.service.ts`, `src/decision/tests/decision.service.spec.ts`

Threshold: Minimum passing score
Rubrics in task files use this markdown table format:
| Criterion | Weight | Description |
|-----------|--------|-------------|
| [Name 1] | 0.XX | [What to evaluate] |
| [Name 2] | 0.XX | [What to evaluate] |
| ... | ... | ... |
Requirements:
Example:
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Type Correctness | 0.35 | Types match specification exactly |
| API Contract Alignment | 0.25 | Aligns with documented API contract |
| Export Structure | 0.20 | Barrel exports correctly expose all types |
| Code Quality | 0.20 | Follows project TypeScript conventions |
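A rubric like the example above can be applied mechanically: validate that the weights sum to 1.0, then take the weighted sum of per-criterion scores. The sketch below uses the example rubric's criteria with hypothetical judge scores (not values from any real evaluation):

```python
# Example rubric from above: criterion → weight (weights must sum to 1.0)
rubric = {
    "Type Correctness": 0.35,
    "API Contract Alignment": 0.25,
    "Export Structure": 0.20,
    "Code Quality": 0.20,
}

def weighted_score(rubric, scores):
    # Guard: rubric weights must sum to 1.0 (tolerance for float rounding)
    assert abs(sum(rubric.values()) - 1.0) < 1e-9, "rubric weights must sum to 1.0"
    # Overall = sum of (criterion score × criterion weight)
    return round(sum(scores[c] * w for c, w in rubric.items()), 2)

# Hypothetical judge scores on the 5-point scale described below
overall = weighted_score(rubric, {
    "Type Correctness": 5,
    "API Contract Alignment": 4,
    "Export Structure": 4,
    "Code Quality": 3,
})
# → 4.15, which would PASS the standard 4.0 threshold but FAIL a critical 4.5 one
```

The weight-sum guard matters: a rubric whose weights total less or more than 1.0 silently deflates or inflates every overall score.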
When judges evaluate artifacts, they use this 5-point scale for each criterion:
1 (Poor) : Does not meet requirements
2 (Below Average) : Multiple issues, partially meets requirements
3 (Adequate) : Meets basic requirements
4 (Good) : Meets all requirements, few minor issues
5 (Excellent) : Exceeds requirements
During Phase 2 (Execute Steps):
- Reference each step's `#### Verification` section in the task file

Example Verification Section in Task File:
#### Verification
**Level:** Panel of 2 Judges with Aggregated Voting
**Artifact:** `src/decision/decision.service.ts`, `src/decision/tests/decision.service.spec.ts`
**Rubric:**
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Routing Logic | ... | ... |
- If `CONTINUE_MODE` is true: Resume from `RESUME_FROM_STEP`
- If `REFINE_MODE` is true: Detect changed project files, map to steps, re-verify from `REFINE_FROM_STEP` - preserve the user's fixes!

Be thorough - check everything the task requires.
Rubric: Weighted criteria table (see format below)
Reference Pattern (Optional): Path to an example of a good implementation
- e.g., `src/app.service.ts` for NestJS service patterns