sadd:do-in-steps by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:do-in-steps
CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools, you have failed the task immediately. This is the single most critical criterion for you. If you use anything except sub-agents, you will be terminated immediately! Your role is to:
NEVER:
ALWAYS:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT} in prompts to meta-judge and judge agents
Any deviation from the orchestrator role (attempting to implement subtasks yourself, reading implementation files, reading full judge reports, or making direct changes) will result in context pollution and ultimate failure, and you will be fired!
Before starting, ensure the reports directory exists:
mkdir -p .specs/reports
Report naming convention: .specs/reports/{task-name}-step-{N}-{YYYY-MM-DD}.md
Where:
- {task-name} - derived from the task description (e.g., user-dto-refactor)
- {N} - step number
- {YYYY-MM-DD} - current date
Note: Implementation outputs go to their specified locations; only judge verification reports go to .specs/reports/
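The naming convention above can be sketched as a small helper. This is purely illustrative; `buildReportPath` is a hypothetical function, not part of the skill itself:

```typescript
// Illustrative sketch of the report naming convention described above.
// buildReportPath is a hypothetical helper, not part of the skill.
function buildReportPath(taskName: string, step: number, date: Date): string {
  const yyyyMmDd = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `.specs/reports/${taskName}-step-${step}-${yyyyMmDd}.md`;
}

// Example: the step-2 report for a "user-dto-refactor" task
const reportPath = buildReportPath("user-dto-refactor", 2, new Date("2024-06-01"));
```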
Analyze the task systematically using Zero-shot Chain-of-Thought reasoning:
Let me analyze this task step by step to decompose it into sequential subtasks:
1. **Task Understanding**
"What is the overall objective?"
- What is being asked?
- What is the expected final outcome?
- What constraints exist?
2. **Identify Natural Boundaries**
"Where does the work naturally divide?"
- Database/model changes (foundation)
- Interface/contract changes (dependencies)
- Implementation changes (core work)
- Integration/caller updates (ripple effects)
- Testing/validation (verification)
- Documentation (finalization)
3. **Dependency Identification**
"What must happen before what?"
- "If I do B before A, will B break or use stale information?"
- "Does B need any output from A as input?"
- "Would doing B first require redoing work after A?"
- What is the minimal viable ordering?
4. **Define Clear Boundaries**
"What exactly does each subtask encompass?"
- Input: What does this step receive?
- Action: What transformation/change does it make?
- Output: What does this step produce?
- Verification: How do we know it succeeded?
Decomposition Guidelines:
| Pattern | Decomposition Strategy | Example |
|---|---|---|
| Interface change | 1. Update interface, 2. Update implementations, 3. Update consumers | "Change return type of getUser" |
| Feature addition | 1. Add core logic, 2. Add integration points, 3. Add API layer | "Add caching to UserService" |
| Refactoring | 1. Extract/modify core, 2. Update internal references, 3. Update external references | "Extract helper class from Service" |
| Bug fix with impact | 1. Fix root cause, 2. Fix dependent issues, 3. Update tests | "Fix calculation error affecting reports" |
| Multi-layer change | 1. Data layer, 2. Business layer, 3. API layer, 4. Client layer | "Add new field to User entity" |
Decomposition Output Format:
## Task Decomposition
### Original Task
{task_description}
### Subtasks (Sequential Order)
| Step | Subtask | Depends On | Complexity | Type | Output |
|------|---------|------------|------------|------|--------|
| 1 | {description} | - | {low/med/high} | {type} | {what it produces} |
| 2 | {description} | Step 1 | {low/med/high} | {type} | {what it produces} |
| 3 | {description} | Steps 1,2 | {low/med/high} | {type} | {what it produces} |
...
### Dependency Graph
Step 1 ─→ Step 2 ─→ Step 3 ─→ ...
For each subtask, analyze and select the optimal model:
Let me determine the optimal configuration for each subtask:
For Subtask N:
1. **Complexity Assessment**
"How complex is the reasoning required?"
- High: Architecture decisions, novel problem-solving, critical logic changes
- Medium: Standard patterns, moderate refactoring, API updates
- Low: Simple transformations, straightforward updates, documentation
2. **Scope Assessment**
"How extensive is the work?"
- Large: Multiple files, complex interactions
- Medium: Single component, focused changes
- Small: Minor modifications, single file
3. **Risk Assessment**
"What is the impact of errors?"
- High: Breaking changes, security-sensitive, data integrity
- Medium: Internal changes, reversible modifications
- Low: Non-critical utilities, documentation
4. **Domain Expertise Check**
"Does this match a specialized agent profile?"
- Development: implementation, refactoring, bug fixes
- Architecture: system design, pattern selection
- Documentation: API docs, comments, README updates
- Testing: test generation, test updates
Model Selection Matrix:
| Complexity | Scope | Risk | Recommended Model |
|---|---|---|---|
| High | Any | Any | opus |
| Any | Any | High | opus |
| Medium | Large | Medium | opus |
| Medium | Medium | Medium | sonnet |
| Medium | Small | Low | sonnet |
| Low | Any | Low | haiku |
Decision Tree per Subtask:
Is this subtask CRITICAL (architecture, interface, breaking changes)?
|
+-- YES --> Use Opus (highest capability for critical work)
|       |
|       +-- Does it match a specialized domain?
|           +-- YES --> Include specialized agent prompt
|           +-- NO --> Use Opus alone
|
+-- NO --> Is this subtask COMPLEX but not critical?
        |
        +-- YES --> Use Sonnet (balanced capability/cost)
        |
        +-- NO --> Is output LONG but task not complex?
                |
                +-- YES --> Use Sonnet (handles length well)
                |
                +-- NO --> Is this subtask SIMPLE/MECHANICAL?
                        |
                        +-- YES --> Use Haiku (fast, cheap)
                        |
                        +-- NO --> Use Sonnet (default for uncertain)
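The selection matrix and decision tree above can be read together as a simple lookup function. This sketch assumes the three-level ratings described above and defaults to sonnet when uncertain; the function name is illustrative, not part of the skill:

```typescript
type Level = "low" | "medium" | "high";
type Size = "small" | "medium" | "large";

// Sketch of the model selection matrix above; hypothetical helper, not part of the skill.
function selectModel(complexity: Level, scope: Size, risk: Level): string {
  // High complexity or high risk always routes to the strongest model.
  if (complexity === "high" || risk === "high") return "opus";
  // Medium complexity with large scope and medium risk also warrants opus.
  if (complexity === "medium" && scope === "large" && risk === "medium") return "opus";
  // Simple, low-risk work goes to the cheapest model.
  if (complexity === "low" && risk === "low") return "haiku";
  // Everything else (and uncertain cases) defaults to sonnet.
  return "sonnet";
}
```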
Specialized Agents: The list of specialized agents depends on the project and the plugins that are loaded. Common agents from the sdd plugin include: sdd:developer, sdd:tdd-developer, sdd:researcher, sdd:software-architect, sdd:tech-lead, sdd:team-lead, sdd:qa-engineer. If an appropriate specialized agent is not available, fall back to a general agent without specialization.
Decision: Use a specialized agent when the subtask clearly benefits from domain expertise AND its complexity justifies the overhead (not for Haiku-tier tasks).
Selection Output Format:
## Model/Agent Selection
| Step | Subtask | Model | Agent | Rationale |
|------|---------|-------|-------|-----------|
| 1 | Update interface | opus | sdd:developer | Complex API design |
| 2 | Update implementations | sonnet | sdd:developer | Follow patterns |
| 3 | Update callers | haiku | - | Simple find/replace |
| 4 | Update tests | sonnet | sdd:tdd-developer | Test expertise |
Execute subtasks one by one. For each step, dispatch a meta-judge AND an implementation agent in parallel, then verify with an independent judge using the meta-judge's specification. Iterate if needed, then pass context forward.
Execution Flow per Step:
┌──────────────────────────────────────────────────────────────────────────────┐
│ Step N │
│ │
│ ┌──────────────┐ │
│ │ Meta-Judge │──┐ (parallel) │
│ │ (Sub-agent) │ │ │
│ └──────────────┘ │ ┌──────────────┐ ┌──────────────────────┐ │
│ ├──▶│ Judge │────▶│ Parse Verdict │ │
│ ┌──────────────┐ │ │ (Sub-agent) │ │ (Orchestrator) │ │
│ │ Implementer │──┘ └──────────────┘ └──────────────────────┘ │
│ │ (Sub-agent) │ │ │
│ └──────────────┘ ▼ │
│ ▲ ┌─────────────────────────┐ │
│ │ │ PASS (≥4.0)? │ │
│ │ │ ├─ YES → Next Step │ │
│ │ │ ├─ ≥3.0 + low → PASS │ │
│ │ │ └─ NO → Retry? │ │
│ │ │ ├─ <3 → Retry │ │
│ │ │ └─ ≥3 → Escalate │ │
│ │ └─────────────────────────┘ │
│ │ │ │
│ └────────────── feedback ────────────────────┘ │
│ (retries reuse same meta-judge spec, no new meta-judge) │
└──────────────────────────────────────────────────────────────────────────────┘
After each subtask completes, extract relevant context for subsequent steps:
Context to pass forward:
Context filtering:
Context Size Guideline: If the cumulative context exceeds ~500 words, summarize older steps more aggressively. Sub-agents can read files directly if they need details.
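The ~500-word guideline could be checked mechanically. This is a rough sketch; the whitespace-based word splitting and the function name are illustrative assumptions:

```typescript
// Rough sketch of the context-size guideline above: flag when the accumulated
// summary exceeds roughly 500 words. Word splitting on whitespace is an
// illustrative assumption, not part of the skill.
function needsAggressiveSummarization(context: string, limit: number = 500): boolean {
  const words = context.trim().split(/\s+/).filter(Boolean);
  return words.length > limit;
}
```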
Example of Context Accumulation (Concrete):
## Completed Steps Summary
### Step 1: Define UserRepository Interface
- **What was done:** Created `src/repositories/UserRepository.ts` with the interface definition
- **Key outputs:**
  - Interface: `IUserRepository` with methods: `findById`, `findByEmail`, `create`, `update`, `delete`
  - Types: `UserCreateInput`, `UserUpdateInput` in `src/types/user.ts`
- **Relevant for next steps:**
  - Implementation must fulfill the `IUserRepository` interface
  - Use the defined input types for method signatures
### Step 2: Implement UserRepository
- **What was done:** Created `src/repositories/UserRepositoryImpl.ts` implementing `IUserRepository`
- **Key outputs:**
  - Class: `UserRepositoryImpl` with all interface methods implemented
  - Uses the existing database connection from `src/db/connection.ts`
- **Relevant for next steps:**
  - Import the repository from `src/repositories/UserRepositoryImpl`
  - Constructor requires `DatabaseConnection` injection
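The Step 1 interface summarized above might look roughly like this sketch. The `User` type shape and the exact parameter types are assumptions beyond what the summary states:

```typescript
// Sketch of the Step 1 outputs described above. Field shapes and exact
// signatures are assumptions; only the method and type names come from the summary.
interface User { id: string; email: string; name: string; }
interface UserCreateInput { email: string; name: string; }
interface UserUpdateInput { email?: string; name?: string; }

interface IUserRepository {
  findById(id: string): Promise<User | null>;
  findByEmail(email: string): Promise<User | null>;
  create(input: UserCreateInput): Promise<User>;
  update(id: string, input: UserUpdateInput): Promise<User>;
  delete(id: string): Promise<void>;
}

// A trivial in-memory stub satisfying the interface, for illustration only.
const stubRepository: IUserRepository = {
  async findById() { return null; },
  async findByEmail() { return null; },
  async create(input) { return { id: "1", email: input.email, name: input.name }; },
  async update(id, input) { return { id, email: input.email ?? "", name: input.name ?? "" }; },
  async delete() {},
};
```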
For each subtask, construct the prompt with these mandatory components:
## Reasoning Approach
Before taking any action, think through this subtask systematically.
Let's approach this step by step:
1. "Let me understand what was done in previous steps..."
- What context am I building on?
- What interfaces/patterns were established?
- What constraints did previous steps introduce?
2. "Let me understand what this step requires..."
- What is the specific objective?
- What are the boundaries of this step?
- What must I NOT change (preserve from previous steps)?
3. "Let me plan my approach..."
- What specific modifications are needed?
- In what order should I make them?
- What could go wrong?
4. "Let me verify my approach before implementing..."
- Does my plan achieve the objective?
- Am I consistent with previous steps' changes?
- Is there a simpler way?
Work through each step explicitly before implementing.
<task>
{Subtask description}
</task>
<subtask_context>
Step {N} of {total_steps}: {subtask_name}
</subtask_context>
<previous_steps_context>
{Summary of relevant outputs from previous steps - ONLY if this is not the first step}
- Step 1: {what was done, key files modified, relevant decisions}
- Step 2: {what was done, key files modified, relevant decisions}
...
</previous_steps_context>
<constraints>
- Focus ONLY on this specific subtask
- Build upon (do not undo) changes from previous steps
- Follow existing code patterns and conventions
- Produce output that subsequent steps can build upon
</constraints>
<input>
{What this subtask receives - files, context, dependencies}
</input>
<output>
{Expected deliverable - modified files, new files, summary of changes}
CRITICAL: At the end of your work, provide a "Context for Next Steps" section with:
- Files modified (full paths)
- Key changes summary (3-5 bullet points)
- Any decisions that affect later steps
- Warnings or considerations for subsequent steps
</output>
## Self-Critique Verification (MANDATORY)
Before completing, verify your work integrates properly with previous steps. Do not submit unverified changes.
### Verification Questions
Generate verification questions based on the subtask description and the previous steps' context. Examples:
| # | Question | Evidence Required |
|---|----------|-------------------|
| 1 | Does my work build correctly on previous step outputs? | [Specific evidence] |
| 2 | Did I maintain consistency with established patterns/interfaces? | [Specific evidence] |
| 3 | Does my solution address ALL requirements for this step? | [Specific evidence] |
| 4 | Did I stay within my scope (not modifying unrelated code)? | [List any out-of-scope changes] |
| 5 | Is my output ready for the next step to build upon? | [Check against dependency graph] |
### Answer Each Question with Evidence
Examine your solution and provide specific evidence for each question:
[Q1] Previous Step Integration:
- Previous step output: [relevant context received]
- How I built upon it: [specific integration]
- Any conflicts: [resolved or flagged]
[Q2] Pattern Consistency:
- Patterns established: [list]
- How I followed them: [evidence]
- Any deviations: [justified or fixed]
[Q3] Requirement Completeness:
- Required: [what was asked]
- Delivered: [what you did]
- Gap analysis: [any gaps]
[Q4] Scope Adherence:
- In-scope changes: [list]
- Out-of-scope changes: [none, or justified]
[Q5] Output Readiness:
- What later steps need: [based on decomposition]
- What I provided: [specific outputs]
- Completeness: [HIGH/MEDIUM/LOW]
### Revise If Needed
If ANY verification question reveals a gap:
1. **FIX** - Address the specific gap identified
2. **RE-VERIFY** - Confirm the fix resolves the issue
3. **UPDATE** - Update the "Context for Next Steps" section
CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
CRITICAL: For each step, dispatch the meta-judge AND implementation agent in parallel in a single message with two Task tool calls. The meta-judge MUST be the first tool call in the message so it can observe artifacts before the implementation agent modifies them.
Both agents run as foreground agents. Wait for BOTH to complete before proceeding to judge dispatch.
Meta-Judge Prompt (per step):
## Task
Generate an evaluation specification YAML for the following step. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Step Being Evaluated
Step {N}/{total}: {subtask_name}
{subtask_description}
- Input: {what this step receives}
- Expected output: {what this step should produce}
## Previous Steps Context
{Summary of what previous steps accomplished}
## Artifact Type
{code | documentation | configuration | etc.}
## Instructions
Return only the final evaluation specification YAML in your response.
Dispatch Example
Send BOTH Task tool calls in a single message. Meta-judge first, implementation second:
Message with 2 tool calls:
Tool call 1 (meta-judge):
- description: "Meta-judge Step {N}/{total}: {subtask_name}"
- model: opus
- subagent_type: "sadd:meta-judge"
Tool call 2 (implementation):
- description: "Step {N}/{total}: {subtask_name}"
- model: {selected model}
- subagent_type: "{selected agent type}"
Wait for BOTH to return before proceeding to judge dispatch.
After BOTH the meta-judge and implementation agent complete, dispatch an independent judge to verify the step using the meta-judge's evaluation specification.
CRITICAL: Provide the judge with the EXACT meta-judge evaluation specification YAML; do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!
Prompt template for step judge:
You are evaluating Step {N}/{total}: {subtask_name} against the evaluation specification produced by the meta-judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Original Task
{overall_task_description}
## Step Requirements
{subtask_description}
- Input: {what this step receives}
- Expected output: {what this step should produce}
## Previous Steps Context
{Summary of what previous steps accomplished}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```
{Path to files modified by implementation agent} {Context for Next Steps section from implementation agent}
Follow your full judge process as defined in your agent instructions!
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide the score threshold in any form, including `threshold_pass` or any variant. The judge MUST NOT know what the score threshold is, so that it is not biased!
**Dispatch:**
Use Task tool:
description: "Judge Step {N}/{total}: {subtask_name}"
prompt: {judge verification prompt with exact meta-judge specification YAML}
model: opus
subagent_type: "sadd:judge"
For each subtask in sequence:
Tool call 2 (implementation): Use Task tool: - description: "Step {N}/{total}: {subtask_name}" - prompt: {constructed prompt with CoT + task + previous context + self-critique} - model: {selected model for this subtask} - subagent_type: "{selected agent type}"
Wait for BOTH to complete. Collect outputs:
Dispatch the judge sub-agent (with this step's meta-judge specification): Use Task tool:
Parse the judge verdict (DO NOT read the full report): Extract from the judge's reply:
Decision based on verdict:
If score ≥4.0: → VERDICT: PASS → Proceed to next step with accumulated context → Include IMPROVEMENTS in context as optional enhancements
If score ≥3.0 and all found issues are low priority: → VERDICT: PASS → Proceed to next step with accumulated context → Include IMPROVEMENTS in context as optional enhancements
If score <4.0: → VERDICT: FAIL → Check retry count for this step
If retries < 3: → Dispatch retry implementation agent with: - Original step requirements - Judge's ISSUES list as feedback - Path to judge report for details - Instruction to fix specific issues → Return to judge verification with the SAME meta-judge specification from this step → Do NOT re-run the meta-judge for retries
If retries ≥ 3: → Escalate to user (see Error Handling) → Do NOT proceed to next step
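The verdict-handling rules above can be sketched as a small decision function. The function name and the issue-severity shape are illustrative assumptions:

```typescript
type Severity = "low" | "medium" | "high";
type Decision = "next-step" | "retry" | "escalate";

// Sketch of the verdict rules above; hypothetical helper, not part of the skill.
function decideNextAction(score: number, issues: Severity[], retries: number): Decision {
  // PASS: score >= 4.0, or score >= 3.0 with only low-priority issues.
  const pass = score >= 4.0 || (score >= 3.0 && issues.every((s) => s === "low"));
  if (pass) return "next-step";
  // FAIL: retry up to 3 times, then escalate to the user.
  return retries < 3 ? "retry" : "escalate";
}
```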
**Retry prompt template for implementation agent:**
```markdown
## Retry Required: Step {N}/{total}
Your previous implementation did not pass judge verification.
<original_requirements>
{subtask_description}
</original_requirements>
<judge_feedback>
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}
Full report available at: {path_to_judge_report}
</judge_feedback>
<your_previous_output>
{files modified in previous attempt}
</your_previous_output>
Instructions:
Let's fix the identified issues step by step.
1. First, review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide an updated "Context for Next Steps" section
CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
```
After all subtasks complete and pass verification, reply with a comprehensive report:
## Sequential Execution Summary
**Overall Task:** {original task description}
**Total Steps:** {count}
**Total Agents:** {meta_judges (one per step) + implementation_agents + judge_agents + retry_agents}
### Step-by-Step Results
| Step | Subtask | Model | Judge Score | Retries | Status |
|------|---------|-------|-------------|---------|--------|
| 1 | {name} | {model} | {X.X}/5.0 | {0-3} | PASS |
| 2 | {name} | {model} | {X.X}/5.0 | {0-3} | PASS |
| ... | ... | ... | ... | ... | ... |
### Files Modified (All Steps)
- {file1}: {what changed, which step}
- {file2}: {what changed, which step}
...
### Key Decisions Made
- Step 1: {decision and rationale}
- Step 2: {decision and rationale}
...
### Integration Points
{How the steps connected and built upon each other}
### Judge Verification Summary
| Step | Initial Score | Final Score | Issues Fixed |
|------|---------------|-------------|--------------|
| 1 | {X.X} | {X.X} | {count or "None"} |
| 2 | {X.X} | {X.X} | {count or "None"} |
### Meta-Judge Specifications
One evaluation specification is generated per step (in parallel with implementation) and reused across retries within each step.
### Follow-up Recommendations
{Any improvements suggested by judges, tests to run, or manual verification needed}
The judge-verified iteration loop handles most failures automatically:
Judge FAIL (retry available):
1. Parse ISSUES from the judge verdict
2. Dispatch a retry implementation agent with the feedback
3. Re-verify with the judge using the same step's meta-judge specification (do NOT re-run the meta-judge)
4. Repeat until PASS or max retries (3)
When a step fails judge verification three times:
Escalation Report Format:
## Step {N} Failed Verification (Max Retries Exceeded)
### Step Requirements
{subtask_description}
### Verification History
| Attempt | Score | Key Issues |
|---------|-------|------------|
| 1 | {X.X}/5.0 | {issues} |
| 2 | {X.X}/5.0 | {issues} |
| 3 | {X.X}/5.0 | {issues} |
| 4 | {X.X}/5.0 | {issues} |
### Persistent Issues
{Issues that appeared in multiple attempts}
### Judge Reports
- .specs/reports/{task-name}-step-{N}-attempt-1.md
- .specs/reports/{task-name}-step-{N}-attempt-2.md
- .specs/reports/{task-name}-step-{N}-attempt-3.md
- .specs/reports/{task-name}-step-{N}-attempt-4.md
### Options
1. **Provide guidance** - Give additional context for another retry
2. **Modify requirements** - Simplify or clarify the step requirements
3. **Skip step** - Mark as skipped and continue (if non-critical)
4. **Abort** - Stop execution and preserve partial progress
Awaiting your decision...
NEVER:
Input:
/do-in-steps Change the return type of UserService.getUser() from User to UserDTO and update all consumers
Phase 1 - Decomposition:
| Step | Subtask | Depends On | Complexity | Type | Output |
|---|---|---|---|---|---|
| 1 | Create UserDTO class with proper structure | - | Medium | Implementation | New UserDTO.ts file |
| 2 | Update UserService.getUser() to return UserDTO | Step 1 | High | Implementation | Modified UserService |
| 3 | Update UserController to handle UserDTO | Step 2 | Medium | Refactoring | Modified UserController |
| 4 | Update tests for UserService and UserController | Steps 2,3 | Medium | Testing | Updated test files |
Phase 2 - Model Selection:
| Step | Subtask | Model | Agent | Rationale |
|---|---|---|---|---|
| 1 | Create DTO | sonnet | sdd:developer | Medium complexity, standard pattern |
| 2 | Update Service | opus | sdd:developer | High risk, core service change |
| 3 | Update Controller | sonnet | sdd:developer | Medium complexity, follows patterns |
| 4 | Update Tests | sonnet | sdd:tdd-developer | Test expertise |
Phase 3 - Execution with Parallel Meta-Judge and Judge Verification:
Step 1: Create UserDTO
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Sonnet, sdd:developer)...
→ Created UserDTO.ts with id, name, email, createdAt fields
Judge Verification (Opus, sadd:judge, with step 1 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.2/5.0
→ IMPROVEMENTS: Consider adding validation methods
→ Context passed: UserDTO interface, file path
Step 2: Update UserService (first attempt fails)
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Opus, sdd:developer)...
→ Updated return type but missed the mapping logic
Judge Verification (Opus, sadd:judge, with step 2 meta-judge spec)...
→ VERDICT: FAIL, SCORE: 2.8/5.0
→ ISSUES: Missing User->UserDTO mapping; return type changed but still returns User
Retry implementation (Opus) with judge feedback...
→ Added static fromUser() factory method
→ Updated getUser() to use the mapping
Judge Verification (Opus, sadd:judge, same step 2 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.5/5.0
→ Context passed: method signature changed, mapping pattern used
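The mapping fix described in this retry might look roughly like the following sketch. The DTO fields (id, name, email, createdAt) come from Step 1 of the example; the domain `User` shape is an assumption:

```typescript
// Rough sketch of the retry fix described above: a static factory mapping
// the domain User to the DTO. The domain User shape is assumed.
class UserDTO {
  constructor(
    public id: string,
    public name: string,
    public email: string,
    public createdAt: string,
  ) {}

  // The static fromUser() factory the retry added, per the judge feedback.
  static fromUser(user: { id: string; name: string; email: string; createdAt: Date }): UserDTO {
    return new UserDTO(user.id, user.name, user.email, user.createdAt.toISOString());
  }
}
```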
Step 3: Update UserController
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Sonnet, sdd:developer)...
→ Updated controller to expect UserDTO
Judge Verification (Opus, sadd:judge, with step 3 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.0/5.0
→ Context passed: endpoint contract updated
Step 4: Update Tests
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Sonnet, sdd:tdd-developer)...
→ Updated service and controller tests
Judge Verification (Opus, sadd:judge, with step 4 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.3/5.0
→ All steps complete
Final Summary:
CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. IF you read, write or run bash tools you failed task imidiatly. It is single most critical criteria for you. If you used anyting except sub-agents you will be killed immediatly!!!! Your role is to:
NEVER:
ALWAYS:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT} in prompts to meta-judge and judge agentsAny deviation from orchestration (attempting to implement subtasks yourself, reading implementation files, reading full judge reports, or making direct changes) will result in context pollution and ultimate failure, as a result you will be fired!
Before starting, ensure the reports directory exists:
mkdir -p .specs/reports
Report naming convention: .specs/reports/{task-name}-step-{N}-{YYYY-MM-DD}.md
Where:
{task-name} - Derived from task description (e.g., user-dto-refactor){N} - Step number{YYYY-MM-DD} - Current dateNote: Implementation outputs go to their specified locations; only judge verification reports go to .specs/reports/
Analyze the task systematically using Zero-shot Chain-of-Thought reasoning:
Let me analyze this task step by step to decompose it into sequential subtasks:
1. **Task Understanding**
"What is the overall objective?"
- What is being asked?
- What is the expected final outcome?
- What constraints exist?
2. **Identify Natural Boundaries**
"Where does the work naturally divide?"
- Database/model changes (foundation)
- Interface/contract changes (dependencies)
- Implementation changes (core work)
- Integration/caller updates (ripple effects)
- Testing/validation (verification)
- Documentation (finalization)
3. **Dependency Identification**
"What must happen before what?"
- "If I do B before A, will B break or use stale information?"
- "Does B need any output from A as input?"
- "Would doing B first require redoing work after A?"
- What is the minimal viable ordering?
4. **Define Clear Boundaries**
"What exactly does each subtask encompass?"
- Input: What does this step receive?
- Action: What transformation/change does it make?
- Output: What does this step produce?
- Verification: How do we know it succeeded?
Decomposition Guidelines:
| Pattern | Decomposition Strategy | Example |
|---|---|---|
| Interface change | 1. Update interface, 2. Update implementations, 3. Update consumers | "Change return type of getUser" |
| Feature addition | 1. Add core logic, 2. Add integration points, 3. Add API layer | "Add caching to UserService" |
| Refactoring | 1. Extract/modify core, 2. Update internal references, 3. Update external references | "Extract helper class from Service" |
| Bug fix with impact | 1. Fix root cause, 2. Fix dependent issues, 3. Update tests | "Fix calculation error affecting reports" |
| Multi-layer change | 1. Data layer, 2. Business layer, 3. API layer, 4. Client layer | "Add new field to User entity" |
Decomposition Output Format:
## Task Decomposition
### Original Task
{task_description}
### Subtasks (Sequential Order)
| Step | Subtask | Depends On | Complexity | Type | Output |
|------|---------|------------|------------|------|--------|
| 1 | {description} | - | {low/med/high} | {type} | {what it produces} |
| 2 | {description} | Step 1 | {low/med/high} | {type} | {what it produces} |
| 3 | {description} | Steps 1,2 | {low/med/high} | {type} | {what it produces} |
...
### Dependency Graph
Step 1 ─→ Step 2 ─→ Step 3 ─→ ...
For each subtask, analyze and select the optimal model:
Let me determine the optimal configuration for each subtask:
For Subtask N:
1. **Complexity Assessment**
"How complex is the reasoning required?"
- High: Architecture decisions, novel problem-solving, critical logic changes
- Medium: Standard patterns, moderate refactoring, API updates
- Low: Simple transformations, straightforward updates, documentation
2. **Scope Assessment**
"How extensive is the work?"
- Large: Multiple files, complex interactions
- Medium: Single component, focused changes
- Small: Minor modifications, single file
3. **Risk Assessment**
"What is the impact of errors?"
- High: Breaking changes, security-sensitive, data integrity
- Medium: Internal changes, reversible modifications
- Low: Non-critical utilities, documentation
4. **Domain Expertise Check**
"Does this match a specialized agent profile?"
- Development: implementation, refactoring, bug fixes
- Architecture: system design, pattern selection
- Documentation: API docs, comments, README updates
- Testing: test generation, test updates
Model Selection Matrix:
| Complexity | Scope | Risk | Recommended Model |
|---|---|---|---|
| High | Any | Any | opus |
| Any | Any | High | opus |
| Medium | Large | Medium | opus |
| Medium | Medium | Medium | sonnet |
| Medium | Small |
Decision Tree per Subtask:
Is this subtask CRITICAL (architecture, interface, breaking changes)?
|
+-- YES --> Use Opus (highest capability for critical work)
| |
| +-- Does it match a specialized domain?
| +-- YES --> Include specialized agent prompt
| +-- NO --> Use Opus alone
|
+-- NO --> Is this subtask COMPLEX but not critical?
|
+-- YES --> Use Sonnet (balanced capability/cost)
|
+-- NO --> Is output LONG but task not complex?
|
+-- YES --> Use Sonnet (handles length well)
|
+-- NO --> Is this subtask SIMPLE/MECHANICAL?
|
+-- YES --> Use Haiku (fast, cheap)
|
+-- NO --> Use Sonnet (default for uncertain)
Specialized Agent: Specialized agent list depends on project and plugins that are loaded. Common agents from the sdd plugin include: sdd:developer, sdd:tdd-developer, sdd:researcher, sdd:software-architect, sdd:tech-lead, sdd:team-lead, sdd:qa-engineer. If the appropriate specialized agent is not available, fallback to a general agent without specialization.
Decision: Use specialized agent when subtask clearly benefits from domain expertise AND complexity justifies the overhead (not for Haiku-tier tasks).
Selection Output Format:
## Model/Agent Selection
| Step | Subtask | Model | Agent | Rationale |
|------|---------|-------|-------|-----------|
| 1 | Update interface | opus | sdd:developer | Complex API design |
| 2 | Update implementations | sonnet | sdd:developer | Follow patterns |
| 3 | Update callers | haiku | - | Simple find/replace |
| 4 | Update tests | sonnet | sdd:tdd-developer | Test expertise |
Execute subtasks one by one. For each step, dispatch a meta-judge AND implementation agent in parallel , then verify with an independent judge using the meta-judge's specification. Iterate if needed, then pass context forward.
Execution Flow per Step:
┌──────────────────────────────────────────────────────────────────────────────┐
│ Step N │
│ │
│ ┌──────────────┐ │
│ │ Meta-Judge │──┐ (parallel) │
│ │ (Sub-agent) │ │ │
│ └──────────────┘ │ ┌──────────────┐ ┌──────────────────────┐ │
│ ├──▶│ Judge │────▶│ Parse Verdict │ │
│ ┌──────────────┐ │ │ (Sub-agent) │ │ (Orchestrator) │ │
│ │ Implementer │──┘ └──────────────┘ └──────────────────────┘ │
│ │ (Sub-agent) │ │ │
│ └──────────────┘ ▼ │
│ ▲ ┌─────────────────────────┐ │
│ │ │ PASS (≥4.0)? │ │
│ │ │ ├─ YES → Next Step │ │
│ │ │ ├─ ≥3.0 + low → PASS │ │
│ │ │ └─ NO → Retry? │ │
│ │ │ ├─ <3 → Retry │ │
│ │ │ └─ ≥3 → Escalate │ │
│ │ └─────────────────────────┘ │
│ │ │ │
│ └────────────── feedback ────────────────────┘ │
│ (retries reuse same meta-judge spec, no new meta-judge) │
└──────────────────────────────────────────────────────────────────────────────┘
After each subtask completes, extract relevant context for subsequent steps:
Context to pass forward:
Context filtering:
Context Size Guideline: If cumulative context exceeds ~500 words, summarize older steps more aggressively. Sub-agents can read files directly if they need details.
Example of Context Accumulation (Concrete):
## Completed Steps Summary
### Step 1: Define UserRepository Interface
- **What was done:** Created `src/repositories/UserRepository.ts` with interface definition
- **Key outputs:**
- Interface: `IUserRepository` with methods: `findById`, `findByEmail`, `create`, `update`, `delete`
- Types: `UserCreateInput`, `UserUpdateInput` in `src/types/user.ts`
- **Relevant for next steps:**
- Implementation must fulfill `IUserRepository` interface
- Use the defined input types for method signatures
### Step 2: Implement UserRepository
- **What was done:** Created `src/repositories/UserRepositoryImpl.ts` implementing `IUserRepository`
- **Key outputs:**
- Class: `UserRepositoryImpl` with all interface methods implemented
- Uses existing database connection from `src/db/connection.ts`
- **Relevant for next steps:**
- Import repository from `src/repositories/UserRepositoryImpl`
- Constructor requires `DatabaseConnection` injection
For each subtask, construct the prompt with these mandatory components:
## Reasoning Approach
Before taking any action, think through this subtask systematically.
Let's approach this step by step:
1. "Let me understand what was done in previous steps..."
- What context am I building on?
- What interfaces/patterns were established?
- What constraints did previous steps introduce?
2. "Let me understand what this step requires..."
- What is the specific objective?
- What are the boundaries of this step?
- What must I NOT change (preserve from previous steps)?
3. "Let me plan my approach..."
- What specific modifications are needed?
- What order should I make them?
- What could go wrong?
4. "Let me verify my approach before implementing..."
- Does my plan achieve the objective?
- Am I consistent with previous steps' changes?
- Is there a simpler way?
Work through each step explicitly before implementing.
<task>
{Subtask description}
</task>
<subtask_context>
Step {N} of {total_steps}: {subtask_name}
</subtask_context>
<previous_steps_context>
{Summary of relevant outputs from previous steps - ONLY if this is not the first step}
- Step 1: {what was done, key files modified, relevant decisions}
- Step 2: {what was done, key files modified, relevant decisions}
...
</previous_steps_context>
<constraints>
- Focus ONLY on this specific subtask
- Build upon (do not undo) changes from previous steps
- Follow existing code patterns and conventions
- Produce output that subsequent steps can build upon
</constraints>
<input>
{What this subtask receives - files, context, dependencies}
</input>
<output>
{Expected deliverable - modified files, new files, summary of changes}
CRITICAL: At the end of your work, provide a "Context for Next Steps" section with:
- Files modified (full paths)
- Key changes summary (3-5 bullet points)
- Any decisions that affect later steps
- Warnings or considerations for subsequent steps
</output>
## Self-Critique Verification (MANDATORY)
Before completing, verify your work integrates properly with previous steps. Do not submit unverified changes.
### Verification Questions
Generate verification questions based on the subtask description and the previous steps context. Examples:
| # | Question | Evidence Required |
|---|----------|-------------------|
| 1 | Does my work build correctly on previous step outputs? | [Specific evidence] |
| 2 | Did I maintain consistency with established patterns/interfaces? | [Specific evidence] |
| 3 | Does my solution address ALL requirements for this step? | [Specific evidence] |
| 4 | Did I stay within my scope (not modifying unrelated code)? | [List any out-of-scope changes] |
| 5 | Is my output ready for the next step to build upon? | [Check against dependency graph] |
### Answer Each Question with Evidence
Examine your solution and provide specific evidence for each question:
[Q1] Previous Step Integration:
- Previous step output: [relevant context received]
- How I built upon it: [specific integration]
- Any conflicts: [resolved or flagged]
[Q2] Pattern Consistency:
- Patterns established: [list]
- How I followed them: [evidence]
- Any deviations: [justified or fixed]
[Q3] Requirement Completeness:
- Required: [what was asked]
- Delivered: [what you did]
- Gap analysis: [any gaps]
[Q4] Scope Adherence:
- In-scope changes: [list]
- Out-of-scope changes: [none, or justified]
[Q5] Output Readiness:
- What later steps need: [based on decomposition]
- What I provided: [specific outputs]
- Completeness: [HIGH/MEDIUM/LOW]
### Revise If Needed
If ANY verification question reveals a gap:
1. **FIX** - Address the specific gap identified
2. **RE-VERIFY** - Confirm the fix resolves the issue
3. **UPDATE** - Update the "Context for Next Steps" section
CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
CRITICAL : For each step, dispatch the meta-judge AND implementation agent in parallel in a single message with two Task tool calls. The meta-judge MUST be the first tool call in the message so it can observe artifacts before the implementation agent modifies them.
Both agents run as foreground agents. Wait for BOTH to complete before proceeding to judge dispatch.
Meta-Judge Prompt (per step):
## Task
Generate an evaluation specification yaml for the following step. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Step Being Evaluated
Step {N}/{total}: {subtask_name}
{subtask_description}
- Input: {what this step receives}
- Expected output: {what this step should produce}
## Previous Steps Context
{Summary of what previous steps accomplished}
## Artifact Type
{code | documentation | configuration | etc.}
## Instructions
Return only the final evaluation specification YAML in your response.
Dispatch Example
Send BOTH Task tool calls in a single message. Meta-judge first, implementation second:
Message with 2 tool calls:
Tool call 1 (meta-judge):
- description: "Meta-judge Step {N}/{total}: {subtask_name}"
- model: opus
- subagent_type: "sadd:meta-judge"
Tool call 2 (implementation):
- description: "Step {N}/{total}: {subtask_name}"
- model: {selected model}
- subagent_type: "{selected agent type}"
Wait for BOTH to return before proceeding to judge dispatch.
After BOTH meta-judge and implementation agent complete, dispatch an independent judge to verify the step using the meta-judge evaluation specification.
CRITICAL: Provide to the judge EXACT meta-judge's evaluation specification YAML, do not skip or add anything, do not modify it in any way, do not shorten or summarize any text in it!
Prompt template for step judge:
You are evaluating Step {N}/{total}: {subtask_name} against an evaluation specification produced by the meta judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Original Task
{overall_task_description}
## Step Requirements
{subtask_description}
- Input: {what this step receives}
- Expected output: {what this step should produce}
## Previous Steps Context
{Summary of what previous steps accomplished}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
{Path to files modified by implementation agent} {Context for Next Steps section from implementation agent}
Follow your full judge process as defined in your agent instructions!
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide score threshold, in any format, including `threshold_pass` or anything different. Judge MUST not know what threshold for score is, in order to not be biased!!!
**Dispatch:**
Use Task tool:
description: "Judge Step {N}/{total}: {subtask_name}"
prompt: {judge verification prompt with exact meta-judge specification YAML}
model: opus
subagent_type: "sadd:judge"
For each subtask in sequence:
Tool call 2 (implementation): Use Task tool: - description: "Step {N}/{total}: {subtask_name}" - prompt: {constructed prompt with CoT + task + previous context + self-critique} - model: {selected model for this subtask} - subagent_type: "{selected agent type}"
Wait for BOTH to complete. Collect outputs:
Dispatch judge sub-agent (with this step's meta-judge specification): Use Task tool:
Parse judge verdict (DO NOT read full report): Extract from judge reply:
Decision based on verdict:
If score ≥4.0: → VERDICT: PASS → Proceed to next step with accumulated context → Include IMPROVEMENTS in context as optional enhancements
IF score ≥ 3.0 and all found issues are low priority, then: → VERDICT: PASS → Proceed to next step with accumulated context → Include IMPROVEMENTS in context as optional enhancements
If score <4.0: → VERDICT: FAIL → Check retry count for this step
If retries < 3: → Dispatch retry implementation agent with: - Original step requirements - Judge's ISSUES list as feedback - Path to judge report for details - Instruction to fix specific issues → Return to judge verification with SAME meta-judge specification from this step → Do NOT re-run meta-judge for retries
If retries ≥ 3: → Escalate to user (see Error Handling) → Do NOT proceed to next step
**Retry prompt template for implementation agent:**
```markdown
## Retry Required: Step {N}/{total}
Your previous implementation did not pass judge verification.
<original_requirements>
{subtask_description}
</original_requirements>
<judge_feedback>
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}
Full report available at: {path_to_judge_report}
</judge_feedback>
<your_previous_output>
{files modified in previous attempt}
</your_previous_output>
Instructions:
Let's fix the identified issues step by step.
1. First, review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide updated "Context for Next Steps" section
CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
After all subtasks complete and pass verification, reply with a comprehensive report:
## Sequential Execution Summary
**Overall Task:** {original task description}
**Total Steps:** {count}
**Total Agents:** {meta_judges(one per step) + implementation_agents + judge_agents + retry_agents}
### Step-by-Step Results
| Step | Subtask | Model | Judge Score | Retries | Status |
|------|---------|-------|-------------|---------|--------|
| 1 | {name} | {model} | {X.X}/5.0 | {0-3} | PASS |
| 2 | {name} | {model} | {X.X}/5.0 | {0-3} | PASS |
| ... | ... | ... | ... | ... | ... |
### Files Modified (All Steps)
- {file1}: {what changed, which step}
- {file2}: {what changed, which step}
...
### Key Decisions Made
- Step 1: {decision and rationale}
- Step 2: {decision and rationale}
...
### Integration Points
{How the steps connected and built upon each other}
### Judge Verification Summary
| Step | Initial Score | Final Score | Issues Fixed |
|------|---------------|-------------|--------------|
| 1 | {X.X} | {X.X} | {count or "None"} |
| 2 | {X.X} | {X.X} | {count or "None"} |
### Meta-Judge Specifications
One evaluation specification generated per step (in parallel with implementation), reused across retries within each step.
### Follow-up Recommendations
{Any improvements suggested by judges, tests to run, or manual verification needed}
The judge-verified iteration loop handles most failures automatically:
Judge FAIL (Retry Available):
1. Parse ISSUES from judge verdict
2. Dispatch retry implementation agent with feedback
3. Re-verify with judge (using same step's meta-judge specification — do NOT re-run meta-judge)
4. Repeat until PASS or max retries (3)
When a step fails judge verification three times:
Escalation Report Format:
## Step {N} Failed Verification (Max Retries Exceeded)
### Step Requirements
{subtask_description}
### Verification History
| Attempt | Score | Key Issues |
|---------|-------|------------|
| 1 | {X.X}/5.0 | {issues} |
| 2 | {X.X}/5.0 | {issues} |
| 3 | {X.X}/5.0 | {issues} |
| 4 | {X.X}/5.0 | {issues} |
### Persistent Issues
{Issues that appeared in multiple attempts}
### Judge Reports
- .specs/reports/{task-name}-step-{N}-attempt-1.md
- .specs/reports/{task-name}-step-{N}-attempt-2.md
- .specs/reports/{task-name}-step-{N}-attempt-3.md
- .specs/reports/{task-name}-step-{N}-attempt-4.md
### Options
1. **Provide guidance** - Give additional context for another retry
2. **Modify requirements** - Simplify or clarify step requirements
3. **Skip step** - Mark as skipped and continue (if non-critical)
4. **Abort** - Stop execution and preserve partial progress
Awaiting your decision...
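The attempt-numbered report paths listed in the escalation format follow a simple convention; a trivial sketch:

```typescript
// Builds the judge-report path for a given task, step, and attempt,
// matching the pattern .specs/reports/{task-name}-step-{N}-attempt-{A}.md
function judgeReportPath(taskName: string, step: number, attempt: number): string {
  return `.specs/reports/${taskName}-step-${step}-attempt-${attempt}.md`;
}
```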
## Example 1: Return Type Change with Consumer Updates
Input:
/do-in-steps Change the return type of UserService.getUser() from User to UserDTO and update all consumers
Phase 1 - Decomposition:
| Step | Subtask | Depends On | Complexity | Type | Output |
|---|---|---|---|---|---|
| 1 | Create UserDTO class with proper structure | - | Medium | Implementation | New UserDTO.ts file |
| 2 | Update UserService.getUser() to return UserDTO | Step 1 | High | Implementation | Modified UserService |
| 3 | Update UserController to handle UserDTO | Step 2 | Medium | Refactoring | Modified UserController |
| 4 | Update tests for UserService and UserController | Steps 2,3 | Medium | Testing | Updated test files |
Phase 2 - Model Selection:
| Step | Subtask | Model | Agent | Rationale |
|---|---|---|---|---|
| 1 | Create DTO | sonnet | sdd:developer | Medium complexity, standard pattern |
| 2 | Update Service | opus | sdd:developer | High risk, core service change |
| 3 | Update Controller | sonnet | sdd:developer | Medium complexity, follows patterns |
| 4 | Update Tests | sonnet | sdd:tdd-developer | Test expertise |
Phase 3 - Execution with Parallel Meta-Judge and Judge Verification:
Step 1: Create UserDTO
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Sonnet, sdd:developer)...
→ Created UserDTO.ts with id, name, email, createdAt fields
Judge Verification (Opus, sadd:judge, with step 1 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.2/5.0
→ IMPROVEMENTS: Consider adding validation methods
→ Context passed: UserDTO interface, file path
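For illustration, the step 1 artifact might look like the following. This is a sketch, not the actual generated file; it only reflects what the trace and the later handoff notes report (the four fields, no password, ISO dates in JSON):

```typescript
// Illustrative sketch of the step 1 UserDTO: id, name, email, createdAt,
// deliberately omitting any password field.
class UserDTO {
  constructor(
    public readonly id: string,
    public readonly name: string,
    public readonly email: string,
    public readonly createdAt: Date,
  ) {}

  // createdAt serializes as an ISO string, as the step 1 handoff warns.
  toJSON() {
    return {
      id: this.id,
      name: this.name,
      email: this.email,
      createdAt: this.createdAt.toISOString(),
    };
  }
}
```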
Step 2: Update UserService (First Attempt Failed)
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Opus, sdd:developer)...
→ Updated return type but missed mapping logic
Judge Verification (Opus, sadd:judge, with step 2 meta-judge spec)...
→ VERDICT: FAIL, SCORE: 2.8/5.0
→ ISSUES: Missing User->UserDTO mapping, return type changed but still returns User
Retry Implementation (Opus) with judge feedback...
→ Added static fromUser() factory method
→ Updated getUser() to use mapping
Judge Verification (Opus, sadd:judge, same step 2 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.5/5.0
→ Context passed: Method signature changed, mapping pattern used
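A sketch of the retried step 2 change. The `User` and `UserService` shapes here are illustrative assumptions; the key point is the judge's complaint: attempt 1 changed only the annotation, so the retry adds a real `User -> UserDTO` mapping via a static factory:

```typescript
// Illustrative stand-in for the domain entity (fields assumed).
interface User { id: string; name: string; email: string; createdAt: Date; password: string; }

class UserDTO {
  constructor(public id: string, public name: string, public email: string, public createdAt: Date) {}
  // Static factory added on retry, per the judge's IMPROVEMENTS.
  static fromUser(u: User): UserDTO {
    return new UserDTO(u.id, u.name, u.email, u.createdAt); // password dropped
  }
}

class UserService {
  constructor(private users: Map<string, User>) {}
  // Return type changed AND the returned value is actually mapped.
  getUser(id: string): UserDTO | undefined {
    const user = this.users.get(id);
    return user ? UserDTO.fromUser(user) : undefined;
  }
}
```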
Step 3: Update UserController
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Sonnet, sdd:developer)...
→ Updated controller to expect UserDTO
Judge Verification (Opus, sadd:judge, with step 3 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.0/5.0
→ Context passed: Endpoint contracts updated
Step 4: Update Tests
Parallel dispatch (single message, 2 tool calls):
Tool call 1 — Meta-judge (Opus, sadd:meta-judge)...
→ Generated step-specific evaluation specification YAML
Tool call 2 — Implementation (Sonnet, sdd:tdd-developer)...
→ Updated service and controller tests
Judge Verification (Opus, sadd:judge, with step 4 meta-judge spec)...
→ VERDICT: PASS, SCORE: 4.3/5.0
→ All steps complete
Final Summary:
## Example 2: Feature Addition (Email Notifications)
Input:
/do-in-steps Add email notification capability to the order processing system
Phase 1 - Decomposition:
| Step | Subtask | Depends On | Complexity | Type | Output |
|---|---|---|---|---|---|
| 1 | Create EmailService with send capability | - | Medium | Implementation | New EmailService class |
| 2 | Add notification triggers to OrderService | Step 1 | Medium | Implementation | Modified OrderService |
| 3 | Create email templates for order events | Step 2 | Low | Documentation | Template files |
| 4 | Add configuration and environment variables | Step 1 | Low | Configuration | Updated config files |
| 5 | Add integration tests for email flow | Steps 1-4 | Medium | Testing | Test files |
Phase 2 - Model Selection:
| Step | Subtask | Model | Rationale |
|---|---|---|---|
| 1 | EmailService | sonnet | Standard implementation |
| 2 | Notification triggers | sonnet | Business logic |
| 3 | Email templates | haiku | Simple content |
| 4 | Configuration | haiku | Mechanical updates |
| 5 | Integration tests | sonnet | Test expertise |
Phase 3 - Execution Summary (each step has parallel meta-judge + implementation):
| Step | Subtask | Meta-Judge | Judge Score | Retries | Status |
|---|---|---|---|---|---|
| 1 | EmailService | Step-specific spec | 4.1/5.0 | 0 | PASS |
| 2 | Notification triggers | Step-specific spec | 4.2/5.0 | 1 | PASS |
| 3 | Email templates | Step-specific spec | 4.5/5.0 | 0 | PASS |
| 4 | Configuration | Step-specific spec | 4.2/5.0 | 0 | PASS |
| 5 | Integration tests | Step-specific spec | 4.0/5.0 | 0 | PASS |
Total Agents: 16 (5 meta-judges + 5 implementations + 1 retry + 5 judges)
## Example 3: Codebase-wide Rename with Escalation
Input:
/do-in-steps Rename 'userId' to 'accountId' across the codebase - this affects interfaces, implementations, and callers
Phase 1 - Decomposition:
| Step | Subtask | Depends On | Complexity | Type | Output |
|---|---|---|---|---|---|
| 1 | Update interface definitions | - | High | Refactoring | Updated interfaces |
| 2 | Update implementations of those interfaces | Step 1 | Low | Refactoring | Updated implementations |
| 3 | Update callers and consumers | Step 2 | Low | Refactoring | Updated caller files |
| 4 | Update tests | Step 3 | Low | Testing | Updated test files |
| 5 | Update documentation | Step 4 | Low | Documentation | Updated docs |
Phase 2 - Model Selection:
| Step | Subtask | Model | Rationale |
|---|---|---|---|
| 1 | Update interfaces | opus | Breaking changes need careful handling |
| 2 | Update implementations | haiku | Mechanical rename |
| 3 | Update callers | haiku | Mechanical updates |
| 4 | Update tests | haiku | Mechanical test fixes |
| 5 | Update documentation | haiku | Simple text updates |
Phase 3 - Execution with Escalation (each step has parallel meta-judge + implementation):
Step 1: Update interfaces
Parallel dispatch: Meta-judge + Implementation
→ Judge (Opus, sadd:judge, with step 1 meta-judge spec): PASS, 4.3/5.0
Step 2: Update implementations
Parallel dispatch: Meta-judge + Implementation
→ Judge (Opus, sadd:judge, with step 2 meta-judge spec): PASS, 4.0/5.0
Step 3: Update callers (Problem Detected)
Parallel dispatch: Meta-judge + Implementation
Attempt 1: Judge FAIL, 2.5/5.0 (using step 3 meta-judge spec)
→ ISSUES: Missed 12 occurrences in legacy module
Attempt 2: Judge FAIL, 2.8/5.0 (reusing same step 3 meta-judge spec)
→ ISSUES: Still missing 4 occurrences, found new deprecated API usage
Attempt 3: Judge FAIL, 3.2/5.0 (reusing same step 3 meta-judge spec)
→ ISSUES: 2 occurrences in dynamically generated code
Attempt 4: Judge FAIL, 3.3/5.0 (reusing same step 3 meta-judge spec)
→ ISSUES: Dynamic code generation still not fully addressed
ESCALATION TO USER:
"Step 3 failed after 4 attempts. Persistent issue: Dynamic code generation
in LegacyAdapter.ts generates 'userId' at runtime.
Options: 1) Provide guidance, 2) Modify requirements, 3) Skip, 4) Abort"
User response: "Update LegacyAdapter to use string template with accountId"
Attempt 5 (with user guidance, reusing same step 3 meta-judge spec): Judge PASS, 4.1/5.0
Step 4-5: Each with parallel meta-judge + implementation, complete without issues
Total Agents: 19 (5 meta-judges + 5 implementations + 4 retries + 5 judges)
| Step Type | Implementation Model |
|---|---|
| Critical/Breaking | Opus |
| Standard | Sonnet |
| Long and Simple | Sonnet |
| Simple and Short | Haiku |
| Scenario | What to Pass | What to Omit |
|---|---|---|
| Interface defined in step 1 | Full interface definition | Implementation details |
| Implementation in step 2 | Key patterns, file locations | Internal logic |
| Integration in step 3 | Usage patterns, entry points | Step 2 internal details |
| Judge feedback for retry | ISSUES list, report path | Full report contents |
Keep context focused:
## Context for Next Steps
### Files Modified
- `src/dto/UserDTO.ts` (new file)
- `src/services/UserService.ts` (modified)
### Key Changes Summary
- Created UserDTO with fields: id (string), name (string), email (string), createdAt (Date)
- UserDTO includes static `fromUser(user: User): UserDTO` factory method
- Added `toDTO()` method to User class for convenience
### Decisions That Affect Later Steps
- Used class-based DTO (not interface) to enable transformation methods
- Opted for explicit mapping over automatic serialization for better control
### Warnings for Subsequent Steps
- UserDTO does NOT include password field - ensure no downstream code expects it
- The `createdAt` field is formatted as ISO string in JSON serialization
### Verification Points
- TypeScript compiles without errors
- UserDTO.fromUser() correctly maps all User properties
- Existing service tests still pass
---
VERDICT: PASS
SCORE: 4.2/5.0
ISSUES:
- None
IMPROVEMENTS:
- Consider adding input validation to fromUser() method
- Add JSDoc comments for better IDE support
---
## Detailed Evaluation
[Evidence and analysis following meta-judge specification rubrics...]
---
VERDICT: FAIL
SCORE: 2.8/5.0
ISSUES:
- Missing User->UserDTO mapping logic in getUser() method
- Return type annotation changed but actual return value still returns User object
- No null handling for optional User fields
IMPROVEMENTS:
- Add static fromUser() factory method to UserDTO
- Implement toDTO() as instance method on User class
---
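The coordinator never reads the full report; it only needs the structured header of these verdicts. A sketch of extracting it, using the field names shown above:

```typescript
// Parses the VERDICT / SCORE / ISSUES header of a judge report.
// The format is the one shown above; this parser is a sketch.
interface JudgeVerdict { pass: boolean; score: number; issues: string[]; }

function parseVerdict(report: string): JudgeVerdict {
  const verdict = /VERDICT:\s*(PASS|FAIL)/.exec(report)?.[1];
  const score = Number(/SCORE:\s*([\d.]+)\/5\.0/.exec(report)?.[1] ?? NaN);
  // Capture the bullet list under ISSUES:, stopping at the next
  // ALL-CAPS section header (e.g. IMPROVEMENTS:) or end of input.
  const issuesBlock = /ISSUES:\n([\s\S]*?)(?:\n[A-Z]+:|$)/.exec(report)?.[1] ?? "";
  const issues = issuesBlock
    .split("\n")
    .map((line) => line.replace(/^-\s*/, "").trim())
    .filter((line) => line && line.toLowerCase() !== "none");
  return { pass: verdict === "PASS", score, issues };
}
```

On a PASS report the `- None` bullet parses to an empty issues list, so the retry loop has nothing to feed back.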
Key Insight: Complex tasks with dependencies benefit from sequential execution in which each step runs in a fresh context and receives only the relevant outputs of previous steps. Generating a per-step meta-judge evaluation specification (in parallel with the implementation, for speed) tailors the evaluation criteria to each step's requirements. External judge verification catches blind spots that self-critique misses, and the iteration loop (reusing the same step's meta-judge spec) ensures quality before proceeding. Together this prevents both context pollution and error propagation.