sadd:do-and-judge by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:do-and-judge

Execute a single task by dispatching an implementation sub-agent, verifying with an independent judge, and iterating on feedback until it passes or max retries are exceeded.
This command implements a single-task execution pattern with meta-judge → LLM-as-a-judge verification. You (the orchestrator) dispatch a meta-judge (to generate evaluation criteria) and an implementation agent in parallel, then dispatch a judge that uses the meta-judge's evaluation specification to verify quality. If verification fails, you launch a new implementation agent with the judge's feedback and iterate until the work passes (score ≥ 4) or max retries (2) are exceeded.
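The pattern just described can be sketched as a small loop. This is illustrative only: `dispatch_agent` is a hypothetical stand-in for the Task tool, and the dict shapes are assumptions, not part of the plugin.

```python
# Sketch of the do-and-judge orchestration loop (illustrative only).
# dispatch_agent() is a hypothetical helper standing in for the Task tool.

MAX_RETRIES = 2
PASS_THRESHOLD = 4.0  # the orchestrator knows this; the judge never does

def do_and_judge(task, dispatch_agent):
    # Phase 2: meta-judge and implementation dispatched together (meta-judge first).
    spec = dispatch_agent("sadd:meta-judge", model="opus", prompt=task)
    result = dispatch_agent("general-purpose", model="opus", prompt=task)

    for attempt in range(1, MAX_RETRIES + 2):
        # Phase 3: the judge sees the spec and summary, but not the threshold.
        verdict = dispatch_agent("sadd:judge", model="opus",
                                 prompt={"spec": spec, "summary": result})
        if verdict["score"] >= PASS_THRESHOLD:
            return {"status": "PASS", "attempt": attempt}
        if attempt > MAX_RETRIES:
            break
        # Phase 5: retry a fresh implementation agent with the judge's feedback.
        result = dispatch_agent("general-purpose", model="opus",
                                prompt={"task": task, "issues": verdict["issues"]})
    return {"status": "ESCALATE", "attempt": attempt}
```

Note that the orchestrator itself never touches files: every phase is a sub-agent dispatch.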
Key benefits:
CRITICAL: You are the orchestrator only. You MUST NOT perform the task yourself: if you use the Read, Write, or Bash tools, you have failed the task immediately. This is the single most important rule for you; use nothing except sub-agents. Your role is to:
NEVER:
ALWAYS:
Include `CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}` in prompts to the meta-judge and judge agents.

Analyze the task to select the optimal model:
Let me analyze this task to determine the optimal configuration:
1. **Complexity Assessment**
- High: Architecture decisions, novel problem-solving, critical logic
- Medium: Standard patterns, moderate refactoring, API updates
- Low: Simple transformations, straightforward updates
2. **Risk Assessment**
- High: Breaking changes, security-sensitive, data integrity
- Medium: Internal changes, reversible modifications
- Low: Non-critical utilities, isolated changes
3. **Scope Assessment**
- Large: Multiple files, complex interactions
- Medium: Single component, focused changes
- Small: Minor modifications, single file
Model Selection Guide:

| Model | When to Use | Examples |
|---|---|---|
| opus | Default/standard choice. Safe for any task. Use when correctness matters, decisions are nuanced, or you are unsure. | Most implementation, code writing, business logic, architectural decisions |
| sonnet | Task is not complex but is high volume: many similar steps, large context to process, repetitive work. | Bulk file updates, processing many similar items, large refactors with clear patterns |
| haiku | Trivial operations only. Simple, mechanical tasks with no decision-making. | Directory creation, file deletion, simple config edits, file copying/moving |
Specialized Agents: Common agents from the sdd plugin include sdd:developer, sdd:researcher, sdd:software-architect, sdd:tech-lead, and sdd:qa-engineer. You MUST use the general-purpose agent whenever there is no direct correlation between the task and a specialized agent, or when the matching agent is unavailable.
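The model-selection guide above can be approximated as a small rule. This is a sketch under stated assumptions: the function name and the `"low"/"medium"/"high"` labels are illustrative, and the mapping condenses the table rather than reproducing an official algorithm.

```python
def select_model(complexity: str, risk: str, scope: str) -> str:
    """Map the Phase 1 assessment onto a model, per the guide above."""
    if complexity == "low" and risk == "low":
        # Trivial, mechanical work with no decision-making -> haiku;
        # anything larger in scope still gets sonnet for safety.
        return "haiku" if scope == "small" else "sonnet"
    if complexity == "medium" and risk != "high" and scope == "large":
        # Not complex but high volume: repetitive steps, large context.
        return "sonnet"
    # Default: correctness matters, decisions are nuanced, or we are unsure.
    return "opus"
```

The asymmetry is deliberate: the table treats opus as the safe default, so the rule only downgrades when the assessment clearly permits it.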
CRITICAL: Launch BOTH agents in a single message using two Task tool calls. The meta-judge MUST be the first tool call in the message so it can observe the artifacts before the implementation agent modifies them.
Both agents run as foreground agents. Wait for both to complete before proceeding to Phase 3.
The meta-judge generates an evaluation specification (rubrics, checklist, scoring criteria) tailored to this specific task, and returns it to you as YAML.
## Task
Generate an evaluation specification yaml for the following task. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Context
{Any relevant codebase context, file paths, constraints}
## Artifact Type
{code | documentation | configuration | etc.}
## Instructions
Return only the final evaluation specification YAML in your response.
Use Task tool:
- description: "Meta-judge: {brief task summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
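For orientation, here is a hypothetical example of what the meta-judge might return for a validation-extraction task. The structure and field names are assumptions for illustration; the actual schema is defined by the sadd:meta-judge agent's own instructions.

```yaml
# Illustrative evaluation specification; all field names are assumptions.
artifact_type: code
rubrics:
  - dimension: correctness
    weight: 0.5
    criteria: "All validation rules moved intact; behavior unchanged."
  - dimension: code_quality
    weight: 0.3
    criteria: "Follows existing patterns; minimal, focused diff."
  - dimension: testability
    weight: 0.2
    criteria: "Extracted class is unit-testable in isolation."
checklist:
  - "Validator exists as a separate class"
  - "Controller delegates to the validator"
  - "No new dependencies introduced"
scoring:
  scale: "1-5 per dimension, weighted average"
```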
Construct the implementation prompt with these mandatory components:
Zero-shot Chain-of-Thought Prefix (REQUIRED - MUST BE FIRST)
## Reasoning Approach
Before taking any action, think through this task systematically.
Let's approach this step by step:
1. "Let me understand what this task requires..."
- What is the specific objective?
- What constraints exist?
- What is the expected outcome?
2. "Let me explore the relevant code..."
- What files are involved?
- What patterns exist in the codebase?
- What dependencies need consideration?
3. "Let me plan my approach..."
- What specific modifications are needed?
- What order should I make them?
- What could go wrong?
4. "Let me verify my approach before implementing..."
- Does my plan achieve the objective?
- Am I following existing patterns?
- Is there a simpler way?
Work through each step explicitly before implementing.
Task Body
## Task
{Task description from user}
## Constraints
- Follow existing code patterns and conventions
- Make minimal changes to achieve the objective
- Do not introduce new dependencies without justification
- Ensure changes are testable
## Output
Provide your implementation along with a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
- Potential concerns or follow-up needed
Self-Critique Suffix (REQUIRED - MUST BE LAST)
## Self-Critique Verification (MANDATORY)
Before completing, verify your work. Do not submit unverified changes.
### Verification Questions
| # | Question | Evidence Required |
|---|----------|-------------------|
| 1 | Does my solution address ALL requirements? | [Specific evidence] |
| 2 | Did I follow existing code patterns? | [Pattern examples] |
| 3 | Are there any edge cases I missed? | [Edge case analysis] |
| 4 | Is my solution the simplest approach? | [Alternatives considered] |
| 5 | Would this pass code review? | [Quality check] |
### Answer Each Question with Evidence
Examine your solution and provide specific evidence for each question.
### Revise If Needed
If ANY verification question reveals a gap:
1. **FIX** - Address the specific gap identified
2. **RE-VERIFY** - Confirm the fix resolves the issue
3. **UPDATE** - Update the Summary section
CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
Dispatch
Determine the optimal agent type based on the task and the available agents, for example: code implementation → sdd:developer. If you are not sure, it is better to use the general-purpose agent than to dispatch an incorrect agent type.
Use Task tool:
- description: "Implement: {brief task summary}"
- prompt: {constructed prompt with CoT + task + self-critique}
- model: {selected model}
- subagent_type: "{selected agent type}"
Send BOTH Task tool calls in a single message. Meta-judge first, implementation second:
Message with 2 tool calls:
Tool call 1 (meta-judge):
- description: "Meta-judge: {brief task summary}"
- model: opus
- subagent_type: "sadd:meta-judge"
Tool call 2 (implementation):
- description: "Implement: {brief task summary}"
- model: {selected model}
- subagent_type: "{selected agent type}"
Wait for BOTH to return before proceeding to Phase 3.
After both the meta-judge and the implementation complete, dispatch the judge agent.
CRITICAL: Provide the judge with the meta-judge's evaluation specification YAML exactly as received. Do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!
Extract from the meta-judge output:
Extract from the implementation output:
Judge prompt template:
You are evaluating an implementation artifact against an evaluation specification produced by the meta judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```
## Implementation Summary
{Summary section from implementation agent}
{Paths to files modified}
Follow your full judge process as defined in your agent instructions!
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide the score threshold in any form, including `threshold_pass` or anything similar. The judge MUST NOT know what the score threshold is, so that it is not biased!
**Dispatch:**
Use Task tool:
Parse the judge output (do NOT read the full report):
Extract from judge reply:
- VERDICT: PASS or FAIL
- SCORE: X.X/5.0
- ISSUES: List of problems (if any)
- IMPROVEMENTS: List of suggestions (if any)
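Extraction of these fields can be sketched as follows, assuming the judge's reply begins with the uppercase keys listed above followed by `-` bulleted lists. The parsing helper itself is illustrative, not part of the plugin:

```python
import re

def parse_judge_reply(reply: str) -> dict:
    """Extract VERDICT, SCORE, ISSUES, and IMPROVEMENTS from a judge reply."""
    out = {"verdict": None, "score": None, "issues": [], "improvements": []}
    m = re.search(r"VERDICT:\s*(PASS|FAIL)", reply)
    if m:
        out["verdict"] = m.group(1)
    m = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)\s*/\s*5", reply)
    if m:
        out["score"] = float(m.group(1))

    def bullets(key: str) -> list:
        # Collect "- item" lines directly under "KEY:" until the next section.
        block = re.search(key + r":\s*\n((?:\s*-\s.*\n?)*)", reply)
        if not block:
            return []
        return [line.strip()[2:] for line in block.group(1).splitlines()]

    out["issues"] = bullets("ISSUES")
    out["improvements"] = bullets("IMPROVEMENTS")
    return out
```

Only these four fields feed the decision logic; the orchestrator deliberately ignores the rest of the report.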
Decision logic:
If score ≥4:
→ VERDICT: PASS
→ Report success with summary
→ Include IMPROVEMENTS as optional enhancements
IF score ≥ 3.0 and all found issues are low priority, then:
→ VERDICT: PASS
→ Report success with summary
→ Include IMPROVEMENTS as optional enhancements
If score <4:
→ VERDICT: FAIL
→ Check retry count
If retries < 3:
→ Dispatch retry implementation agent with judge feedback
→ Return to Phase 3 (judge verification with same meta-judge specification)
If retries ≥ 3:
→ Escalate to user (see Error Handling)
→ Do NOT proceed without user decision
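The branches above can be expressed as one small function. Thresholds are taken from the text (the retry cap follows the "retries < 3" branch above); the function name and the `issues_low_priority` flag, which would be derived from the judge's issue severities, are assumptions for illustration:

```python
PASS_SCORE = 4.0
SOFT_PASS_SCORE = 3.0
MAX_RETRIES = 3  # per the "retries < 3" branch above

def decide(score: float, issues_low_priority: bool, retries: int) -> str:
    """Return the orchestrator's next action after a judge verdict."""
    if score >= PASS_SCORE:
        return "PASS"
    if score >= SOFT_PASS_SCORE and issues_low_priority:
        return "PASS"  # soft pass: only low-priority issues remain
    if retries < MAX_RETRIES:
        return "RETRY"  # dispatch a new implementation agent with feedback
    return "ESCALATE"   # hand the decision back to the user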
Retry prompt template:
## Retry Required
Your previous implementation did not pass judge verification.
## Original Task
{Original task description}
## Judge Feedback
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}
## Your Previous Changes
{files modified in previous attempt}
## Instructions
Let's fix the identified issues step by step.
1. Review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide updated Summary section
CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
After the task passes verification:
## Execution Summary
**Task:** {original task description}
**Result:** ✅ PASS
### Verification
| Attempt | Score | Status |
|---------|-------|--------|
| 1 | {X.X}/5.0 | {PASS/FAIL} |
| 2 | {X.X}/5.0 | {PASS/FAIL} | (if retry occurred)
### Files Modified
- {file1}: {what changed}
- {file2}: {what changed}
### Key Changes
- {change 1}
- {change 2}
### Suggested Improvements (Optional)
{IMPROVEMENTS from judge, if any}
When the task exhausts its retries without passing verification:
Escalation Report Format:
## Task Failed Verification (Max Retries Exceeded)
### Task Requirements
{original task description}
### Verification History
| Attempt | Score | Key Issues |
|---------|-------|------------|
| 1 | {X.X}/5.0 | {issues} |
| 2 | {X.X}/5.0 | {issues} |
| 3 | {X.X}/5.0 | {issues} |
### Persistent Issues
{Issues that appeared in multiple attempts}
### Options
1. **Provide guidance** - Give additional context for another retry
2. **Modify requirements** - Simplify or clarify task
3. **Abort** - Stop execution
Awaiting your decision...
Input:
/do-and-judge Extract the validation logic from UserController into a separate UserValidator class
Execution:
Phase 1: Task Analysis
→ Model: Opus
Phase 2: Parallel Dispatch (single message, 2 tool calls)
Tool call 1 — Meta-judge (Opus)...
→ Generated evaluation specification YAML
→ 3 rubric dimensions, 6 checklist items
Tool call 2 — Implementation (general-purpose + Opus)...
→ Created UserValidator.ts
→ Updated UserController to use validator
→ Summary: 2 files modified, validation extracted
Phase 3: Dispatch Judge (with meta-judge specification)
Judge (sadd:judge)...
→ VERDICT: PASS, SCORE: 4.2/5.0
→ ISSUES: None
→ IMPROVEMENTS: Add input validation for edge cases
Phase 6: Final Report
✅ PASS on attempt 1
Files: UserValidator.ts (new), UserController.ts (modified)
Input:
/do-and-judge Implement rate limiting middleware with configurable limits per endpoint
Execution:
Phase 1: Task Analysis
- Complexity: High (new feature, multiple concerns)
- Risk: High (affects all endpoints)
- Scope: Medium (single middleware)
→ Model: opus
Phase 2: Parallel Dispatch (Attempt 1)
Tool call 1 — Meta-judge (Opus)...
→ Generated evaluation specification YAML
→ 4 rubric dimensions, 8 checklist items
Tool call 2 — Implementation (sdd:developer + Opus)...
→ Created RateLimiter middleware
→ Added configuration schema
Phase 3: Dispatch Judge (with meta-judge specification)
Judge (sadd:judge + Opus)...
→ VERDICT: FAIL, SCORE: 3.1/5.0
→ ISSUES:
- Missing per-endpoint configuration
- No Redis support for distributed deployments
→ IMPROVEMENTS: Add monitoring hooks
Phase 5: Retry with Feedback
Implementation (sdd:developer + Opus)...
→ Added endpoint-specific limits
→ Added Redis adapter option
Phase 3: Dispatch Judge (Attempt 2, same meta-judge specification)
Judge (sadd:judge + Opus)...
→ VERDICT: PASS, SCORE: 4.4/5.0
→ IMPROVEMENTS: Add metrics export
Phase 6: Final Report
✅ PASS on attempt 2
Files: RateLimiter.ts, config/rateLimits.ts, adapters/RedisAdapter.ts
Input:
/do-and-judge Migrate the database schema to support multi-tenancy
Execution:
Phase 1: Task Analysis
- Complexity: High
- Risk: High (database schema change)
→ Model: opus
Phase 2: Parallel Dispatch
Meta-judge → evaluation specification YAML
Implementation → initial migration scaffolding
Attempt 1: FAIL (2.8/5.0) - Missing tenant isolation in queries
Attempt 2: FAIL (3.2/5.0) - Incomplete migration script
Attempt 3: FAIL (3.3/5.0) - Edge cases in existing data migration
ESCALATION:
Persistent issue: Existing data migration requires business decisions
about how to handle orphaned records.
Options presented to user:
1. Provide guidance on orphan handling
2. Simplify to new tenants only
3. Abort
User chose: Option 1 - "Delete orphaned records older than 1 year"
Attempt 4 (with guidance): PASS (4.1/5.0)
Weekly Installs: 214
GitHub Stars: 699
First Seen: Feb 19, 2026
Installed on: opencode (208), github-copilot (207), codex (207), gemini-cli (206), cursor (205), kimi-cli (204)