sadd:do-competitively by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:do-competitively

Key features:
CRITICAL: You are not an implementation agent or a judge; you should not read the context files provided for sub-agents or tasks. You should not read the reports, and you should not overwhelm your context with unnecessary information. You MUST follow the process step by step. Any deviation will be treated as a failure and you will be terminated immediately!
This command implements a multi-phase adaptive competitive orchestration pattern:
Phase 1: Competitive Generation with Self-Critique + Meta-Judge (IN PARALLEL)
┌─ Meta-Judge → Evaluation Specification YAML ───────────┐
Task ────┼─ Agent 2 → Draft → Critique → Revise → Solution B ───┐ │
├─ Agent 3 → Draft → Critique → Revise → Solution C ───┼─┤
└─ Agent 1 → Draft → Critique → Revise → Solution A ───┘ │
│
Phase 2: Multi-Judge Evaluation with Verification │
┌─ Judge 1 → Evaluate → Verify → Revise → Report A ─┐ │
├─ Judge 2 → Evaluate → Verify → Revise → Report B ─┼────┤
└─ Judge 3 → Evaluate → Verify → Revise → Report C ─┘ │
│
Phase 2.5: Adaptive Strategy Selection │
Analyze Consensus ───────────────────────────────────────┤
├─ Clear Winner? → SELECT_AND_POLISH │
├─ All Flawed (<3.0)? → REDESIGN (return Phase 1) │
└─ Split Decision? → FULL_SYNTHESIS │
│ │
Phase 3: Evidence-Based Synthesis │ │
(Only if FULL_SYNTHESIS) │ │
Synthesizer ─────────────────────┴───────────────────────┴─→ Final Solution
Before starting, ensure the reports directory exists:
mkdir -p .specs/reports
Report naming convention: .specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md
Where:
- {solution-name} - derived from the output path (e.g., users-api from output specs/api/users.md)
- {YYYY-MM-DD} - the current date
- [1|2|3] - the judge number

Note: Solutions remain in their specified output locations; only evaluation reports go to .specs/reports/.
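The naming convention above can be sketched as follows. This is an illustrative guess at the derivation rule: the two examples in this document (specs/api/users.md -> users-api, specs/caching.md -> caching) suggest the stem is joined with its parent directory only when the file sits below the top-level specs/ directory; the real command may derive the name differently.

```python
from datetime import date
from pathlib import Path

def report_path(output_path: str, judge: int, on: date) -> str:
    """Build a judge's report path from the solution output path.

    Assumption (not confirmed by the source): {solution-name} joins the
    file stem with its parent directory name unless the file is directly
    under specs/ (e.g. specs/api/users.md -> users-api, specs/caching.md
    -> caching).
    """
    p = Path(output_path)
    parent = p.parent.name
    name = p.stem if parent in ("", "specs") else f"{p.stem}-{parent}"
    return f".specs/reports/{name}-{on.isoformat()}.{judge}.md"

print(report_path("specs/api/users.md", 1, date(2025, 1, 15)))
# .specs/reports/users-api-2025-01-15.1.md
```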
Launch 3 independent generator agents AND 1 meta-judge agent in parallel (4 agents total; Opus is recommended for all of them for quality):
The meta-judge runs in parallel with the 3 generators because it does not need their output; it only needs the task description to generate the evaluation criteria.
CRITICAL: Dispatch all 4 agents as foreground agents in a single message using 4 Task tool calls. The meta-judge MUST be the first tool call in the dispatch order so that it has time to gather context from the codebase before the generators modify it.
The meta-judge produces an evaluation specification YAML (rubrics, checklists, scoring criteria) tailored to this specific task, and returns the specification that all 3 judges will use.
Prompt template for the meta-judge:
## Task
Generate an evaluation specification yaml for the following task. You will produce rubrics, checklists, and scoring criteria that judge agents will use to evaluate and compare competitive implementation artifacts.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Context
{Any relevant codebase context, file paths, constraints}
## Artifact Type
{code | documentation | configuration | etc.}
## Number of Solutions
3 (competitive implementations to be compared)
## Instructions
Return only the final evaluation specification YAML in your response.
The specification should support comparative evaluation across multiple solutions.
Dispatch:
Use Task tool:
- description: "Meta-judge: {brief task summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
Solution naming convention: {solution-file}.[a|b|c].[ext]
Where:
- {solution-file} - derived from the task (e.g., a "create users.ts" task results in users as the solution file name)
- [a|b|c] - a unique identifier per sub-agent
- [ext] - the file extension (e.g., md, ts, etc.)

Key principle: Diversity through independence - agents explore different approaches.
CRITICAL: You MUST give agents and judges filenames that include the [a|b|c] identifier! Omitting it will result in your immediate termination!
Prompt template for generators:
<task>
{task_description}
</task>
<constraints>
{constraints_if_any}
</constraints>
<context>
{relevant_context}
</context>
<output>
{define expected output following such pattern: {solution-file}.[a|b|c].[ext] based on the task description and context. Each [a|b|c] is a unique identifier per sub-agent. You MUST provide filename with it!!!}
</output>
Instructions:
Let's approach this systematically to produce the best possible solution.
1. First, analyze the task carefully - what is being asked and what are the key requirements?
2. Consider multiple approaches - what are the different ways to solve this?
3. Think through the tradeoffs step by step and choose the approach you believe is best
4. Implement it completely
5. Generate 5 verification questions about critical aspects
6. Answer your own questions:
- Review solution against each question
- Identify gaps or weaknesses
7. Revise solution:
- Fix identified issues
8. Explain what was changed and why
Send ALL 4 Task tool calls in a single message: meta-judge first, then the generators:
Message with 4 tool calls:
Tool call 1 (meta-judge):
- description: "Meta-judge: {brief task summary}"
- model: opus
- subagent_type: "sadd:meta-judge"
Tool call 2 (generator A):
- description: "Generate solution A: {brief task summary}"
- model: opus
Tool call 3 (generator B):
- description: "Generate solution B: {brief task summary}"
- model: opus
Tool call 4 (generator C):
- description: "Generate solution C: {brief task summary}"
- model: opus
Wait for ALL 4 agents to return before proceeding to Phase 2.
Launch 3 independent judges in parallel (Opus recommended for rigor):
CRITICAL: Wait for ALL Phase 1 agents (meta-judge + 3 generators) to complete before dispatching the judges.
CRITICAL: Provide each judge with the EXACT meta-judge evaluation specification YAML. Do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!
Each judge writes its report to .specs/reports/{solution-name}-{date}.[1|2|3].md. Key principle: Multiple independent evaluations reduce bias and catch different issues.
Prompt template for judges:
You are evaluating {number} competitive solutions against an evaluation specification produced by the meta-judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Task
{task_description}
## Solutions
{list of paths to all candidate solutions}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```

Write full report to: {.specs/reports/{solution-name}-{date}.[1|2|3].md - each judge gets a unique number identifier}
CRITICAL: You must reply with this exact structured header format:
VOTE: [Solution A/B/C]
SCORES:
Solution A: [X.X]/5.0
Solution B: [X.X]/5.0
Solution C: [X.X]/5.0
CRITERIA:
[Summary of your evaluation]
Follow your full judge process as defined in your agent instructions!
CRITICAL: Base your evaluation on evidence, not impressions. Quote specific text.
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
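A minimal sketch of how the orchestrator might parse the structured header above. The regexes are illustrative assumptions; the actual judge agent may format its reply slightly differently.

```python
import re

def parse_judge_header(reply: str) -> dict:
    """Extract the VOTE and per-solution SCORES from a judge's reply header."""
    vote = re.search(r"VOTE:\s*Solution\s+([ABC])", reply)
    scores = {
        m.group(1): float(m.group(2))
        for m in re.finditer(r"Solution\s+([ABC]):\s*([\d.]+)/5\.0", reply)
    }
    return {"vote": vote.group(1) if vote else None, "scores": scores}

header = """VOTE: Solution A
SCORES:
Solution A: 4.5/5.0
Solution B: 3.2/5.0
Solution C: 2.8/5.0
CRITERIA:
Most RESTful design, good security coverage."""
print(parse_judge_header(header))
# {'vote': 'A', 'scores': {'A': 4.5, 'B': 3.2, 'C': 2.8}}
```

This keeps the orchestrator's context small: only the short header is parsed, never the full report files.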
**CRITICAL:** NEVER reveal the score threshold to the judges. Judges MUST NOT know what the score threshold is, so that they are not biased!
**Dispatch:**
Use the Task tool (3 calls in a single message):
* description: "Judge [1|2|3]: {brief task summary}"
* prompt: {judge prompt with exact meta-judge specification YAML}
* model: opus
* subagent_type: "sadd:judge"
### Phase 2.5: Adaptive Strategy Selection (early return)
The **orchestrator** (not a subagent) analyzes the judges' outputs to determine the optimal strategy.
#### Decision logic
**Step 1: Parse the structured headers from the judges' replies**
Parse the judges' replies.
**CRITICAL:** Do not read the report files themselves; they can overflow your context.
**Step 2: Check for a unanimous winner**
Compare all three VOTE values:
* If Judge 1 VOTE = Judge 2 VOTE = Judge 3 VOTE (same solution):
  * **Strategy: SELECT_AND_POLISH**
  * **Reason:** Clear consensus: all three judges prefer the same solution
**Step 3: Check whether all solutions are fundamentally flawed**
If there is no unanimous vote, calculate the average scores:
1. Average Solution A score: (Judge1_A + Judge2_A + Judge3_A) / 3
2. Average Solution B score: (Judge1_B + Judge2_B + Judge3_B) / 3
3. Average Solution C score: (Judge1_C + Judge2_C + Judge3_C) / 3
If (avg_A < 3.0) AND (avg_B < 3.0) AND (avg_C < 3.0):
* **Strategy: REDESIGN**
* **Reason:** All solutions fall below the quality threshold; the approaches have fundamental problems
**Step 5: Default to full synthesis**
If none of the above conditions are met:
* **Strategy: FULL_SYNTHESIS**
* **Reason:** A split decision where each solution has merit; synthesis is needed to combine the best elements
#### Strategy 1: SELECT_AND_POLISH
**When:** Clear winner (unanimous votes)
**Process:**
1. Select the winning solution as the base
2. Launch a subagent to apply specific improvements based on judge feedback
3. Cherry-pick the 1-2 best elements from the runner-up solutions
4. Document what was added and why
**Benefits:**
* Saves synthesis cost (simpler than full synthesis)
* Preserves the winning solution's proven quality
* Makes targeted improvements instead of a full rebuild
**Prompt template:**
```markdown
You are polishing the winning solution based on judge feedback.
<task>
{task_description}
</task>
<winning_solution>
{path_to_winning_solution}
Score: {winning_score}/5.0
Judge consensus: {why_it_won}
</winning_solution>
<runner_up_solutions>
{list of paths to all runner-up solutions}
</runner_up_solutions>
<judge_feedback>
{list of paths to all evaluation reports}
</judge_feedback>
<output>
{final_solution_path}
</output>
Instructions:
Let's work through this step by step to polish the winning solution effectively.
1. Take the winning solution as your base (do NOT rewrite it)
2. First, carefully review all judge feedback to understand what needs improvement
3. Apply improvements based on judge feedback:
- Fix identified weaknesses
- Add missing elements judges noted
4. Next, examine the runner-up solutions for standout elements
5. Cherry-pick 1-2 specific elements from runners-up if judges praised them
6. Document changes made:
- What was changed and why
- What was added from other solutions
CRITICAL: Preserve the winning solution's core approach. Make targeted improvements only.
#### Strategy 2: REDESIGN
**When:** All solutions scored <3.0/5.0 (fundamental issues across the board)
**Process:**
Prompt template for the new implementation:
You are analyzing why all solutions failed to meet quality standards, and implementing a new solution based on that analysis.
<task>
{task_description}
</task>
<constraints>
{constraints_if_any}
</constraints>
<context>
{relevant_context}
</context>
<failed_solutions>
{list of paths to all candidate solutions}
</failed_solutions>
<evaluation_reports>
{list of paths to all evaluation reports with low scores}
</evaluation_reports>
Instructions:
Let's break this down systematically to understand what went wrong and how to design a new solution based on it.
1. First, analyze the task carefully - what is being asked and what are the key requirements?
2. Read through each solution and its evaluation report
3. For each solution, think step by step about:
- What was the core approach?
- What specific issues did judges identify?
- Why did this approach fail to meet the quality threshold?
4. Identify common failure patterns across all solutions:
- Are there shared misconceptions?
- Are there missing requirements that all solutions overlooked?
- Are there fundamental constraints that weren't considered?
5. Extract lessons learned:
- What approaches should be avoided?
- What constraints must be addressed?
6. Generate improved guidance for the next iteration:
- New constraints to add
- Specific approaches to try - what are the different ways to solve this?
- Key requirements to emphasize
7. Think through the tradeoffs step by step and choose the approach you believe is best
8. Implement it completely
9. Generate 5 verification questions about critical aspects
10. Answer your own questions:
- Review solution against each question
- Identify gaps or weaknesses
11. Revise solution:
- Fix identified issues
12. Explain what was changed and why
#### Strategy 3: FULL_SYNTHESIS
**When:** No clear winner AND the solutions have merit (scores >= 3.0)
**Process:** Proceed to Phase 3 (Evidence-Based Synthesis)
### Phase 3: Evidence-Based Synthesis
Executed only when Strategy 3 (FULL_SYNTHESIS) is selected in Phase 2.5.
Launch 1 synthesis agent (Opus recommended for quality):
Key principle: Evidence-based synthesis leverages collective intelligence.
Prompt template for the synthesizer:
You are synthesizing the best solution from competitive implementations and evaluations.
<task>
{task_description}
</task>
<solutions>
{list of paths to all candidate solutions}
</solutions>
<evaluation_reports>
{list of paths to all evaluation reports}
</evaluation_reports>
<output>
{define expected output following such pattern: solution.md based on the task description and context. Result should be a complete solution to the task.}
</output>
Instructions:
Let's think through this synthesis step by step to create the best possible combined solution.
1. First, read all solutions and evaluation reports carefully
2. Map out the consensus:
- What strengths did multiple judges praise in each solution?
- What weaknesses did multiple judges criticize in each solution?
3. For each major component or section, think through:
- Which solution handles this best and why?
- Could a hybrid approach work better?
4. Create the best possible solution by:
- Copying text directly when one solution is clearly superior
- Combining approaches when a hybrid would be better
- Fixing all identified issues
- Preserving the best elements from each
5. Explain your synthesis decisions:
- What you took from each solution
- Why you made those choices
- How you addressed identified weaknesses
CRITICAL: Do not create something entirely new. Synthesize the best from what exists.
Output files:
- Candidate solutions: {solution-file}.[a|b|c].[ext] (in the specified output location)
- Evaluation reports: .specs/reports/{solution-name}-{date}.[1|2|3].md
- Final solution: {output_path}

Once command execution is complete, reply to the user with the following structure:
## Execution Summary
Original Task: {task_description}
Strategy Used: {strategy} ({reason})
### Results
| Phase | Agents | Models | Status |
|-------------------------|--------|----------|-------------|
| Phase 1: Competitive Generation + Meta-Judge | 4 (3 generators + 1 meta-judge) | opus x 4 | [Complete / Failed] |
| Phase 2: Multi-Judge Evaluation | 3 | opus x 3 | [Complete / Failed] |
| Phase 2.5: Adaptive Strategy Selection | orchestrator | - | {strategy} |
| Phase 3: [Synthesis/Polish/Redesign] | [N] | [model] | [Complete / Failed] |
### Files Created
Final Solution:
- {output_path} - Synthesized production-ready command
Candidate Solutions:
- {solution-file}.[a|b|c].[ext] (Score: [X.X]/5.0)
Evaluation Reports:
- .specs/reports/{solution-file}-{date}.[1|2|3].md (Vote: [Solution A/B/C])
### Synthesis Decisions
| Element | Source | Rationale |
|----------------------|------------------|-------------|
| [element] | Solution [B/A/C] | [rationale] |
Do:
/do-competitively "Design REST API for user management (CRUD + auth)" \
--output "specs/api/users.md" \
--criteria "RESTfulness,security,scalability,developer-experience"
Phase 1 outputs (4 parallel agents):
- specs/api/users.a.md - resource-based design with nested routes
- specs/api/users.b.md - action-based design with RPC-style endpoints
- specs/api/users.c.md - minimal design, missing auth considerations

Phase 2 outputs (assuming date 2025-01-15, 3 judges using the meta-judge specification):
.specs/reports/users-api-2025-01-15.1.md:
VOTE: Solution A
SCORES: A=4.5/5.0, B=3.2/5.0, C=2.8/5.0
"Most RESTful, good security"
.specs/reports/users-api-2025-01-15.2.md:
VOTE: Solution A
SCORES: A=4.3/5.0, B=3.5/5.0, C=2.6/5.0
"Clean resource design, scalable"
.specs/reports/users-api-2025-01-15.3.md:
VOTE: Solution A
SCORES: A=4.6/5.0, B=3.0/5.0, C=2.9/5.0
"Best practices, clear structure"
Phase 2.5 decision (orchestrator parses headers):
Unanimous vote: all three judges chose Solution A
Strategy: SELECT_AND_POLISH
Phase 3 output:
specs/api/users.md - Solution A polished with:
/do-competitively "Design caching strategy for high-traffic API" \
--output "specs/caching.md" \
--criteria "performance,memory-efficiency,simplicity,reliability"
Phase 1 outputs (4 parallel agents):
- specs/caching.a.md - Redis with LRU eviction
- specs/caching.b.md - multi-tier cache (memory + Redis)
- specs/caching.c.md - CDN + application cache

Phase 2 outputs (assuming date 2025-01-15, 3 judges using the meta-judge specification):
.specs/reports/caching-2025-01-15.1.md:
VOTE: Solution B
SCORES: A=3.8/5.0, B=4.2/5.0, C=3.9/5.0
"Best performance, but complex"
.specs/reports/caching-2025-01-15.2.md:
VOTE: Solution A
SCORES: A=4.0/5.0, B=3.9/5.0, C=3.7/5.0
"Simple, reliable, proven"
.specs/reports/caching-2025-01-15.3.md:
VOTE: Solution C
SCORES: A=3.6/5.0, B=4.0/5.0, C=4.1/5.0
"Global reach, cost-effective"
Phase 2.5 decision (orchestrator parses headers):
Split votes: B, A, C (no consensus)
Average scores: A=3.8, B=4.0, C=3.9 (all >=3.0)
Strategy: FULL_SYNTHESIS
Phase 3 output:
specs/caching.md - hybrid approach:
/do-competitively "Design authentication system with social login" \
--output "specs/auth.md" \
--criteria "security,user-experience,maintainability"
Phase 1 outputs (4 parallel agents):
- specs/auth.a.md - custom OAuth2 implementation
- specs/auth.b.md - session-based with social providers
- specs/auth.c.md - JWT with password-only auth

Phase 2 outputs (assuming date 2025-01-15, 3 judges using the meta-judge specification):
.specs/reports/auth-2025-01-15.1.md:
VOTE: Solution A
SCORES: A=2.5/5.0, B=2.2/5.0, C=2.3/5.0
"Security risks, reinventing the wheel"
.specs/reports/auth-2025-01-15.2.md:
VOTE: Solution B
SCORES: A=2.4/5.0, B=2.8/5.0, C=2.1/5.0
"Sessions don't scale, missing requirements"
.specs/reports/auth-2025-01-15.3.md:
VOTE: Solution C
SCORES: A=2.6/5.0, B=2.5/5.0, C=2.3/5.0
"No social login, security concerns"
Phase 2.5 decision (orchestrator parses headers):
Split votes: A, B, C (no consensus)
Average scores: A=2.5, B=2.5, C=2.2 (ALL <3.0)
Strategy: REDESIGN
Reason: All solutions are below the 3.0 threshold; fundamental issues
Do not stop: return to Phase 1; the run should eventually finish with the SELECT_AND_POLISH or FULL_SYNTHESIS strategy.
Weekly Installs: 210
Repository: https://github.com/neolabhq/context-engineering-kit
GitHub Stars: 699
First Seen: Feb 19, 2026
Installed on:
- opencode: 205
- github-copilot: 204
- codex: 204
- gemini-cli: 203
- kimi-cli: 201
- cursor: 201