sadd:do-in-parallel by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:do-in-parallel

Key benefits:
Common use cases:
CRITICAL: You are the orchestrator only: you must NOT perform the task yourself. If you read, write, or run bash tools, you have immediately failed the task. This is the single most critical criterion for you. If you use anything other than sub-agents, you will be terminated immediately! Your role is to:
NEVER:
ALWAYS:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT} in prompts to the meta-judge and judge agents

Extract targets from the command arguments:
Input patterns:
1. --files "src/a.ts,src/b.ts,src/c.ts" --> File-based targets
2. --targets "UserService,OrderService" --> Named targets
3. Infer from task description --> Parse file paths from the task
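The three input patterns above can be sketched as a small parser. This is a minimal illustration only; the helper name and the fallback heuristic for pattern 3 are assumptions, not part of the skill:

```python
import os
import re

def parse_targets(args: str) -> list[str]:
    """Hypothetical sketch: extract targets from command arguments."""
    files = re.search(r'--files\s+"([^"]+)"', args)
    if files:
        # Pattern 1: file-based targets; keep only paths that actually exist
        paths = [p.strip() for p in files.group(1).split(",")]
        return [p for p in paths if os.path.exists(p)]
    named = re.search(r'--targets\s+"([^"]+)"', args)
    if named:
        # Pattern 2: named targets are used as-is
        return [t.strip() for t in named.group(1).split(",")]
    # Pattern 3: fall back to inferring file-like paths from the task text
    return re.findall(r'\b[\w./-]+\.\w+\b', args)
```
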
Parsing rules:
- --files provided: split by comma, validate that each path exists
- --targets provided: split by comma, use as-is

Before dispatching, analyze the task systematically:
Let me analyze this parallel task step by step to determine the optimal configuration:
1. **Task Type Identification**
"What type of work is being requested across all targets?"
- Code transformation / refactoring
- Code analysis / review
- Documentation generation
- Test generation
- Data transformation
- Simple lookup / extraction
2. **Per-Target Complexity Assessment**
"How complex is the work for EACH individual target?"
- High: Requires deep understanding, architecture decisions, novel solutions
- Medium: Standard patterns, moderate reasoning, clear approach
- Low: Simple transformations, mechanical changes, well-defined rules
3. **Per-Target Output Size**
"How extensive is each target's expected output?"
- Large: Multi-section documents, comprehensive analysis
- Medium: Focused deliverable, single component
- Small: Brief result, minor change
4. **Independence Check**
"Are the targets truly independent?"
- Yes: No shared state, no cross-dependencies, order doesn't matter
- Partial: Some shared context needed, but can run in parallel
- No: Dependencies exist --> Use sequential execution instead
Verify that the tasks are truly independent before proceeding:
| Check | Question | If NO |
|---|---|---|
| File independence | Do targets share files? | Cannot parallelize - file conflicts |
| State independence | Do tasks modify shared state? | Cannot parallelize - race conditions |
| Order independence | Does execution order matter? | Cannot parallelize - sequencing required |
| Output independence | Does any target read another target's output? | Cannot parallelize - data dependency |
Independence checklist:
If ANY check fails: STOP and explain to the user why parallelization is unsafe. Recommend /launch-sub-agent for sequential execution.
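Only the file-independence check is mechanically verifiable; the others require judgment. A hypothetical sketch of that one check:

```python
def check_file_independence(targets: list[str]) -> list[str]:
    """Return targets that appear more than once (parallel file conflicts).

    Illustrative only: the state, order, and output checks cannot be
    automated this simply and still need agent or human judgment.
    """
    seen: set[str] = set()
    conflicts = []
    for t in targets:
        if t in seen:
            conflicts.append(t)  # same file targeted twice: unsafe
        seen.add(t)
    return conflicts
```

If this returns a non-empty list, the run should stop and recommend sequential execution instead.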
Select the optimal model and specialized agent based on the task analysis. All parallel agents use the same configuration (this ensures consistent quality):
| Task Profile | Recommended Model | Rationale |
|---|---|---|
| Complex per target (architecture, design) | opus | Maximum reasoning capability per task |
| Specialized domain (code review, security) | opus | Domain expertise matters |
| Medium complexity, large output | sonnet | Good capability, cost-efficient for high-volume work |
| Simple transformations (rename, format) | haiku | Fast, cheap, sufficient for mechanical tasks |
| Default (when uncertain) | opus | Prioritize quality over cost |
Decision tree:
Is EACH target's task COMPLEX (architecture, novel problem, critical decision)?
|
+-- YES --> Use Opus for ALL agents
|
+-- NO --> Is task SIMPLE and MECHANICAL (rename, format, extract)?
|
+-- YES --> Use Haiku for ALL agents
|
+-- NO --> Is output LARGE but task not complex?
|
+-- YES --> Use Sonnet for ALL agents
|
+-- NO --> Use Opus for ALL agents (default)
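The decision tree above can be expressed as a small function. The complexity and output-size labels are assumptions about how the Phase 2 analysis classifies tasks:

```python
def select_model(per_target_complexity: str, output_size: str) -> str:
    """Map the task analysis onto a model, mirroring the decision tree."""
    if per_target_complexity == "high":   # architecture, novel problem, critical decision
        return "opus"
    if per_target_complexity == "low":    # simple, mechanical: rename, format, extract
        return "haiku"
    if output_size == "large":            # large output but not complex
        return "sonnet"
    return "opus"                         # default: prioritize quality over cost
```
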
If the task matches a specialized domain, include the relevant agent prompt in ALL parallel agents. Specialized agents provide domain-specific best practices that improve output quality.
Specialized agents: the available list depends on the project and the plugins that are loaded.
Decision: use a specialized agent when:
Skip the specialized agent when:
Before dispatching the parallel implementation agents, dispatch a single meta-judge to generate an evaluation specification. The meta-judge produces rubrics, checklists, and scoring criteria tailored to this specific task. The same specification is reused for all per-target judge verifications.
Meta-judge prompt template:
## Task
Generate an evaluation specification YAML for the following task. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Context
{Any relevant codebase context, file paths, constraints}
## Artifact Type
{code | documentation | configuration | etc.}
## Instructions
Return only the final evaluation specification YAML in your response.
Dispatch:
Use Task tool:
- description: "Meta-judge: {brief task summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
Wait for the meta-judge to complete before proceeding to Phase 4.
Build an identical prompt structure for each target, customized only with target-specific details:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
- What is the core objective?
- What are the explicit requirements?
- What constraints must I respect?
2. "Let me analyze this specific target..."
- What is the current state?
- What patterns or conventions exist?
- What context is relevant?
3. "Let me plan my approach..."
- What are the concrete steps?
- What could go wrong?
- Is there a simpler approach?
Work through each step explicitly before implementing.
<task>
{Task description from $ARGUMENTS}
</task>
<target>
{Specific target for this agent: file path, component name, etc.}
</target>
<constraints>
- Work ONLY on the specified target
- Do NOT modify other files unless explicitly required
- Follow existing patterns in the target
- {Any additional constraints from context}
</constraints>
<output>
{Expected deliverable location and format}
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
- Potential concerns or follow-up needed
</output>
## Self-Critique Verification (MANDATORY)
Before completing, verify your work for this target. Do not submit unverified changes.
### 1. Generate Verification Questions
Create questions specific to your task and target. Here are some example questions:
| # | Question | Why It Matters |
|---|----------|----------------|
| 1 | Did I achieve the stated objective for this target? | Incomplete work = failed task |
| 2 | Are my changes consistent with patterns in this file/codebase? | Inconsistency creates technical debt |
| 3 | Did I introduce any regressions or break existing functionality? | Breaking changes are unacceptable |
| 4 | Are edge cases and error scenarios handled appropriately? | Edge cases cause production issues |
| 5 | Is my output clear, well-formatted, and ready for review? | Unclear output reduces value |
### 2. Answer Each Question with Evidence
For each question, provide specific evidence from your work:
[Q1] Objective Achievement:
- Required: [what was asked]
- Delivered: [what you did]
- Gap analysis: [any gaps]
[Q2] Pattern Consistency:
- Existing pattern: [observed pattern]
- My implementation: [how I followed it]
- Deviations: [any intentional deviations and why]
[Q3] Regression Check:
- Functions affected: [list]
- Tests that would catch issues: [if known]
- Confidence level: [HIGH/MEDIUM/LOW]
[Q4] Edge Cases:
- Edge case 1: [scenario] - [HANDLED/NOTED]
- Edge case 2: [scenario] - [HANDLED/NOTED]
[Q5] Output Quality:
- Well-organized: [YES/NO]
- Self-documenting: [YES/NO]
- Ready for PR: [YES/NO]
### 3. Fix Issues Before Submitting
If ANY verification reveals a gap:
1. **FIX** - Address the specific issue
2. **RE-VERIFY** - Confirm the fix resolves the issue
3. **DOCUMENT** - Note what was changed and why
CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
Launch all sub-agents simultaneously, then verify each one with an independent judge using the meta-judge's evaluation specification.
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ Phase 3.5: Meta-Judge (ONCE) │
│ ┌──────────────────────────────────────┐ │
│ │ Meta-Judge (Opus) │ │
│ │ → Evaluation Specification YAML │ │
│ └──────────────────┬───────────────────┘ │
│ │ (shared across all targets) │
│ ▼ │
│ Parallel Targets │
│ │
│ Target A Target B Target C │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Implementer│ │Implementer│ │Implementer│ │
│ │(parallel) │ │(parallel) │ │(parallel) │ │
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Judge │ │ Judge │ │ Judge │ │
│ │(per-target)│ │(per-target)│ │(per-target)│ │
│ │+meta-spec │ │+meta-spec │ │+meta-spec │ │
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Parse Verdict (per target) │ │
│ │ ├─ PASS (≥4)? → Complete │ │
│ │ ├─ Soft PASS (≥3 + low priority issues)? → Complete│ │
│ │ └─ FAIL (<4)? → Retry (max 3 per target) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
CRITICAL: Parallel dispatch pattern
Launch ALL implementation agents in a SINGLE response. Do NOT wait for one agent to complete before starting another:
## Dispatching 3 parallel tasks
[Task 1]
Use Task tool:
description: "Parallel: simplify error handling in src/services/user.ts"
prompt: [CoT prefix + task body for user.ts + critique suffix]
model: sonnet
[Task 2]
Use Task tool:
description: "Parallel: simplify error handling in src/services/order.ts"
prompt: [CoT prefix + task body for order.ts + critique suffix]
model: sonnet
[Task 3]
Use Task tool:
description: "Parallel: simplify error handling in src/services/payment.ts"
prompt: [CoT prefix + task body for payment.ts + critique suffix]
model: sonnet
[All 3 tasks launched simultaneously - results collected when all complete]
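The simultaneous-launch requirement is classic fan-out/fan-in concurrency. A sketch in asyncio terms; the `dispatch_agent` coroutine is a hypothetical stand-in (in Claude Code the parallelism comes from issuing all Task tool calls in one response, not from asyncio):

```python
import asyncio

async def dispatch_agent(target: str) -> dict:
    """Stand-in for one Task tool dispatch; replace with the real call."""
    await asyncio.sleep(0)  # placeholder for the agent's actual work
    return {"target": target, "status": "done"}

async def run_all(targets: list[str]) -> list[dict]:
    # Create every task FIRST, then await them together: the fan-out
    # happens up front, so no agent waits for another to start.
    tasks = [asyncio.create_task(dispatch_agent(t)) for t in targets]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_all(["user.ts", "order.ts", "payment.ts"]))
```
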
Parallelization guidelines:
Context isolation (IMPORTANT):
After each implementation agent completes, dispatch an independent judge for that target using the meta-judge's evaluation specification.
CRITICAL: Give the judge the EXACT meta-judge evaluation specification YAML. Do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!
Judge prompt template:
You are evaluating an implementation artifact for target {target_name} against an evaluation specification produced by the meta judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Target
{Specific target: file path or component name}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```

{Summary section from implementation agent}
{Paths to files modified}
Follow your full judge process as defined in your agent instructions!
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide the score threshold in any form, including `threshold_pass` or any equivalent. The judge MUST NOT know the passing threshold, so that its scoring is not biased!
**Dispatch judge for each target:**
Use Task tool:
description: "Judge: {target name}"
prompt: {judge verification prompt with exact meta-judge specification YAML}
model: opus
subagent_type: "sadd:judge"
Parse the judge output for each target (do NOT read the full report):
Extract from judge reply:
VERDICT: PASS or FAIL
SCORE: X.X/5.0
ISSUES: List of problems (if any)
IMPROVEMENTS: List of suggestions (if any)
Decision logic per target:
- If score >= 4.0: VERDICT: PASS -> mark the target complete -> include IMPROVEMENTS as optional enhancements
- If score >= 3.0 and all reported issues are low priority: VERDICT: PASS -> mark the target complete -> include IMPROVEMENTS as optional enhancements
- Otherwise (score < 4.0): VERDICT: FAIL -> check the retry count for this target
  - If retries < 3: dispatch a retry implementation agent with the judge feedback -> return to judge verification with the same meta-judge specification
  - If retries >= 3: mark the target as failed (isolated from the other targets) -> do NOT retry further without a user decision
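Taken together, the per-target verdict logic can be sketched as follows. The thresholds come from the rules above; the `priority` field on issues is an assumption about the judge's report format:

```python
MAX_RETRIES = 3

def decide(score: float, issues: list[dict], retries: int) -> str:
    """Return the next action for one target after judge verification."""
    if score >= 4.0:
        return "complete"
    if score >= 3.0 and all(i.get("priority") == "low" for i in issues):
        return "complete"              # soft pass: only low-priority issues
    if retries < MAX_RETRIES:
        return "retry_with_feedback"   # re-judge with the same meta-judge spec
    return "failed"                    # isolate this target; ask the user
```

Because failures are isolated, this decision is made per target; a `"failed"` result for one target never changes the decision for another.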
**IMPORTANT: Failures are isolated**
- One target failing does NOT affect the other targets
- The other parallel tasks continue to run independently
- Only the failed target is retried
#### 5.4 Retry with Feedback (If Needed)
**Retry prompt template:**
```markdown
## Retry Required for Target: {target_name}
Your previous implementation did not pass judge verification.
## Original Task
{Original task description}
## Target
{Specific target}
## Judge Feedback
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}
## Your Previous Changes
{files modified in previous attempt}
## Instructions
Let's fix the identified issues step by step.
1. Review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide updated Summary section
CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
```
After all agents complete (with retries as needed), aggregate the results:
## Parallel Execution Summary
### Configuration
- **Task:** {task description}
- **Model:** {selected model}
- **Targets:** {count} items
### Results
| Target | Model | Judge Score | Retries | Status | Summary |
|--------|-------|-------------|---------|--------|---------|
| {target_1} | {model} | {X.X}/5.0 | {0-3} | SUCCESS | {brief outcome} |
| {target_2} | {model} | {X.X}/5.0 | {0-3} | SUCCESS | {brief outcome} |
| {target_3} | {model} | {X.X}/5.0 | {3} | FAILED | {failure reason} |
| ... | ... | ... | ... | ... | ... |
### Overall Assessment
- **Completed:** {X}/{total}
- **Failed:** {Y}/{total}
- **Total Retries:** {sum of all retries}
- **Common patterns:** {any patterns across results}
### Verification Summary
{Aggregate judge verification results - any common issues?}
### Files Modified
- {list of all modified files}
### Failed Targets (If Any)
{For each failed target after max retries}
- **Target:** {name}
- **Final Score:** {X.X}/5.0
- **Persistent Issues:** {issues that weren't resolved}
- **Options:** Retry with guidance / Skip / Manual fix
### Next Steps
{If any failures, suggest remediation}
Failure handling:
Input:
/do-in-parallel "Simplify error handling to use early returns instead of nested if-else" \
--files "src/services/user.ts,src/services/order.ts,src/services/payment.ts"
Analysis:
Model selection: Sonnet (pattern-based, medium complexity)
Execution:
Phase 3.5: Dispatch Meta-Judge (ONCE)
Meta-judge (Opus)...
→ Generated evaluation specification YAML
→ 3 rubric dimensions, 5 checklist items
Phase 5: Parallel Dispatch
[All 3 implementation agents launched simultaneously]
Target: user.ts
Implementation (Sonnet)...
-> Converted 4 nested if-else blocks to early returns
Judge Verification (Opus, with meta-judge spec)...
-> VERDICT: PASS, SCORE: 4.2/5.0
-> IMPROVEMENTS: Consider extracting complex conditions
Target: order.ts
Implementation (Sonnet)...
-> Converted 6 nested if-else blocks to early returns
Judge Verification (Opus, with meta-judge spec)...
-> VERDICT: PASS, SCORE: 4.0/5.0
-> ISSUES: None
Target: payment.ts
Implementation (Sonnet)...
-> Converted 3 nested if-else blocks
Judge Verification (Opus, with meta-judge spec)...
-> VERDICT: FAIL, SCORE: 3.2/5.0
-> ISSUES: Missing edge case for null amount
Retry Implementation (Sonnet)...
-> Added null check for payment amount
Judge Verification (Opus, with same meta-judge spec)...
-> VERDICT: PASS, SCORE: 4.1/5.0
Result:
## Parallel Execution Summary
### Configuration
- **Task:** Simplify error handling to use early returns
- **Model:** Sonnet
- **Targets:** 3 files
### Results
| Target | Model | Judge Score | Retries | Status | Summary |
|--------|-------|-------------|---------|--------|---------|
| src/services/user.ts | sonnet | 4.2/5.0 | 0 | SUCCESS | Converted 4 nested if-else blocks |
| src/services/order.ts | sonnet | 4.0/5.0 | 0 | SUCCESS | Converted 6 nested if-else blocks |
| src/services/payment.ts | sonnet | 4.1/5.0 | 1 | SUCCESS | Converted 3 blocks, added null check |
### Overall Assessment
- **Completed:** 3/3
- **Total Retries:** 1
- **Total Agents:** 9 (1 meta-judge + 3 implementations + 1 retry + 4 judges)
- **Common patterns:** All files followed consistent early return pattern
Input:
/do-in-parallel "Generate JSDoc documentation for all public methods" \
--files "src/api/users.ts,src/api/products.ts,src/api/orders.ts,src/api/auth.ts"
Analysis:
Model selection: Haiku (mechanical, well-defined rules)
Dispatch: 1 meta-judge + 4 parallel agents
Execution summary:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| src/api/users.ts | haiku | 4.0/5.0 | 0 | SUCCESS |
| src/api/products.ts | haiku | 3.8/5.0 | 0 | SUCCESS |
| src/api/orders.ts | haiku | 4.2/5.0 | 0 | SUCCESS |
| src/api/auth.ts | haiku | 4.1/5.0 | 0 | SUCCESS |
Total agents: 9 (1 meta-judge + 4 implementations + 4 judges)
Input:
/do-in-parallel "Analyze for potential SQL injection vulnerabilities and suggest fixes" \
--files "src/db/queries.ts,src/db/migrations.ts,src/api/search.ts"
Analysis:
Model selection: Opus (security-critical, requires deep analysis)
Dispatch: 1 meta-judge + 3 parallel agents
Execution summary:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| src/db/queries.ts | opus | 4.5/5.0 | 0 | SUCCESS |
| src/db/migrations.ts | opus | 4.3/5.0 | 0 | SUCCESS |
| src/api/search.ts | opus | 4.0/5.0 | 1 | SUCCESS |
Total agents: 8 (1 meta-judge + 3 implementations + 1 retry + 3 judges)
Input:
/do-in-parallel "Generate unit tests achieving 80% coverage" \
--targets "UserService,OrderService,PaymentService,NotificationService"
Analysis:
Model selection: Sonnet (pattern-based, large output volume)
Dispatch: 1 meta-judge + 4 parallel agents
Execution:
Phase 3.5: Meta-judge (Opus)
→ Generated evaluation specification YAML
→ 4 rubric dimensions, 7 checklist items
Target: UserService
-> Judge (Opus, with meta-judge spec): PASS, 4.3/5.0
Target: OrderService
-> Judge (Opus, with meta-judge spec): FAIL, 3.2/5.0 (missing edge cases)
-> Retry: Judge (Opus, same meta-judge spec): PASS, 4.0/5.0
Target: PaymentService
-> Judge (Opus, with meta-judge spec): FAIL, 2.8/5.0 (wrong mock patterns)
-> Retry 1: Judge (Opus, same meta-judge spec): FAIL, 3.0/5.0 (still missing scenarios)
-> Retry 2: Judge (Opus, same meta-judge spec): FAIL, 3.1/5.0 (coverage only 65%)
-> Retry 3: Judge (Opus, same meta-judge spec): FAIL, 3.2/5.0 (coverage at 72%)
-> MARKED FAILED after max retries
Target: NotificationService
-> Judge (Opus, with meta-judge spec): PASS, 4.1/5.0
Result:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| UserService | sonnet | 4.3/5.0 | 0 | SUCCESS |
| OrderService | sonnet | 4.0/5.0 | 1 | SUCCESS |
| PaymentService | sonnet | 3.2/5.0 | 3 | FAILED |
| NotificationService | sonnet | 4.1/5.0 | 0 | SUCCESS |
Overall: 3/4 complete, 1 failed
Escalation for PaymentService:
### Failed Target: PaymentService
- **Final Score:** 3.2/5.0
- **Persistent Issues:**
- Test coverage at 72%, target is 80%
- Complex async scenarios not fully covered
- **Options:**
1. Provide guidance on specific async patterns to test
2. Accept 72% coverage as sufficient
3. Manual test writing for complex scenarios
Input:
/do-in-parallel "Apply consistent logging format to src/handlers/user.ts, src/handlers/order.ts, and src/handlers/product.ts"
Analysis:
Model selection: Haiku (simple, mechanical)
Dispatch: 1 meta-judge + 3 parallel agents
Execution summary:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| src/handlers/user.ts | haiku | 4.2/5.0 | 0 | SUCCESS |
| src/handlers/order.ts | haiku | 4.0/5.0 | 0 | SUCCESS |
| src/handlers/product.ts | haiku | 4.1/5.0 | 0 | SUCCESS |
| Scenario | Model | Reason |
|---|---|---|
| Security analysis | Opus | Critical reasoning required |
| Architecture decisions | Opus | Quality over speed |
| Simple refactoring | Haiku | Fast and sufficient |
| Documentation generation | Haiku | Mechanical task |
| Per-file code review | Sonnet | Balanced capability |
| Test generation | Sonnet | Extensive but pattern-based |

| Implementation Model | Judge Model | Rationale |
|---|---|---|
| Opus | Opus | Critical work needs strong verification |
| Sonnet | Opus | Tailored evaluation needs strong reasoning |
| Haiku | Opus | Verify even simple work with a strong evaluator |
Guideline: judges always use Opus, so that evaluation is consistent and high quality across all targets.
| Failure Type | Description | Recovery Action |
|---|---|---|
| Recoverable | Judge found issues; a retry can fix them | Retry with judge feedback (max 3 per target) |
| Wrong approach | The approach is wrong for this target | Escalate to the user with options |
| Fundamental problem | Requirements are unclear or impossible | Escalate to the user for clarification |
| Max retries exceeded | Target still fails after 3 retries | Mark as failed, continue the other targets, report at the end |
Key rules:
Key benefits:
Common use cases:
CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. IF you read, write or run bash tools you failed task imidiatly. It is single most critical criteria for you. If you used anyting except sub-agents you will be killed immediatly!!!! Your role is to:
NEVER:
ALWAYS:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT} in prompts to meta-judge and judge agentsExtract targets from the command arguments:
Input patterns:
1. --files "src/a.ts,src/b.ts,src/c.ts" --> File-based targets
2. --targets "UserService,OrderService" --> Named targets
3. Infer from task description --> Parse file paths from task
Parsing rules:
--files provided: Split by comma, validate each path exists--targets provided: Split by comma, use as-isBefore dispatching, analyze the task systematically:
Let me analyze this parallel task step by step to determine the optimal configuration:
1. **Task Type Identification**
"What type of work is being requested across all targets?"
- Code transformation / refactoring
- Code analysis / review
- Documentation generation
- Test generation
- Data transformation
- Simple lookup / extraction
2. **Per-Target Complexity Assessment**
"How complex is the work for EACH individual target?"
- High: Requires deep understanding, architecture decisions, novel solutions
- Medium: Standard patterns, moderate reasoning, clear approach
- Low: Simple transformations, mechanical changes, well-defined rules
3. **Per-Target Output Size**
"How extensive is each target's expected output?"
- Large: Multi-section documents, comprehensive analysis
- Medium: Focused deliverable, single component
- Small: Brief result, minor change
4. **Independence Check**
"Are the targets truly independent?"
- Yes: No shared state, no cross-dependencies, order doesn't matter
- Partial: Some shared context needed, but can run in parallel
- No: Dependencies exist --> Use sequential execution instead
Verify tasks are truly independent before proceeding:
| Check | Question | If NO |
|---|---|---|
| File Independence | Do targets share files? | Cannot parallelize - files conflict |
| State Independence | Do tasks modify shared state? | Cannot parallelize - race conditions |
| Order Independence | Does execution order matter? | Cannot parallelize - sequencing required |
| Output Independence | Does any target read another's output? | Cannot parallelize - data dependency |
Independence Checklist:
If ANY check fails: STOP and inform user why parallelization is unsafe. Recommend /launch-sub-agent for sequential execution.
Select the optimal model and specialized agent based on task analysis. Same configuration for all parallel agents (ensures consistent quality):
| Task Profile | Recommended Model | Rationale |
|---|---|---|
| Complex per-target (architecture, design) | opus | Maximum reasoning capability per task |
| Specialized domain (code review, security) | opus | Domain expertise matters |
| Medium complexity, large output | sonnet | Good capability, cost-efficient for volume |
| Simple transformations (rename, format) | haiku | Fast, cheap, sufficient for mechanical tasks |
| (when uncertain) |
Decision Tree:
Is EACH target's task COMPLEX (architecture, novel problem, critical decision)?
|
+-- YES --> Use Opus for ALL agents
|
+-- NO --> Is task SIMPLE and MECHANICAL (rename, format, extract)?
|
+-- YES --> Use Haiku for ALL agents
|
+-- NO --> Is output LARGE but task not complex?
|
+-- YES --> Use Sonnet for ALL agents
|
+-- NO --> Use Opus for ALL agents (default)
If the task matches a specialized domain, include the relevant agent prompt in ALL parallel agents. Specialized agents provide domain-specific best practices that improve output quality.
Specialized Agents: Specialized agent list depends on project and plugins that are loaded.
Decision: Use specialized agent when:
Skip specialized agent when:
Before dispatching parallel implementation agents, dispatch a single meta-judge to generate an evaluation specification. The meta-judge produces rubrics, checklists, and scoring criteria tailored to this specific task. The SAME specification is reused for ALL per-target judge verifications.
Meta-judge prompt template:
## Task
Generate an evaluation specification yaml for the following task. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Context
{Any relevant codebase context, file paths, constraints}
## Artifact Type
{code | documentation | configuration | etc.}
## Instructions
Return only the final evaluation specification YAML in your response.
Dispatch:
Use Task tool:
- description: "Meta-judge: {brief task summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
Wait for meta-judge to complete before proceeding to Phase 4.
Build identical prompt structure for each target, customized only with target-specific details:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
- What is the core objective?
- What are the explicit requirements?
- What constraints must I respect?
2. "Let me analyze this specific target..."
- What is the current state?
- What patterns or conventions exist?
- What context is relevant?
3. "Let me plan my approach..."
- What are the concrete steps?
- What could go wrong?
- Is there a simpler approach?
Work through each step explicitly before implementing.
<task>
{Task description from $ARGUMENTS}
</task>
<target>
{Specific target for this agent: file path, component name, etc.}
</target>
<constraints>
- Work ONLY on the specified target
- Do NOT modify other files unless explicitly required
- Follow existing patterns in the target
- {Any additional constraints from context}
</constraints>
<output>
{Expected deliverable location and format}
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
- Potential concerns or follow-up needed
</output>
## Self-Critique Verification (MANDATORY)
Before completing, verify your work for this target. Do not submit unverified changes.
### 1. Generate Verification Questions
Create questions specific to your task and target. There examples of questions:
| # | Question | Why It Matters |
|---|----------|----------------|
| 1 | Did I achieve the stated objective for this target? | Incomplete work = failed task |
| 2 | Are my changes consistent with patterns in this file/codebase? | Inconsistency creates technical debt |
| 3 | Did I introduce any regressions or break existing functionality? | Breaking changes are unacceptable |
| 4 | Are edge cases and error scenarios handled appropriately? | Edge cases cause production issues |
| 5 | Is my output clear, well-formatted, and ready for review? | Unclear output reduces value |
### 2. Answer Each Question with Evidence
For each question, provide specific evidence from your work:
[Q1] Objective Achievement:
- Required: [what was asked]
- Delivered: [what you did]
- Gap analysis: [any gaps]
[Q2] Pattern Consistency:
- Existing pattern: [observed pattern]
- My implementation: [how I followed it]
- Deviations: [any intentional deviations and why]
[Q3] Regression Check:
- Functions affected: [list]
- Tests that would catch issues: [if known]
- Confidence level: [HIGH/MEDIUM/LOW]
[Q4] Edge Cases:
- Edge case 1: [scenario] - [HANDLED/NOTED]
- Edge case 2: [scenario] - [HANDLED/NOTED]
[Q5] Output Quality:
- Well-organized: [YES/NO]
- Self-documenting: [YES/NO]
- Ready for PR: [YES/NO]
### 3. Fix Issues Before Submitting
If ANY verification reveals a gap:
1. **FIX** - Address the specific issue
2. **RE-VERIFY** - Confirm the fix resolves the issue
3. **DOCUMENT** - Note what was changed and why
CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
Launch all sub-agents simultaneously, then verify each with an independent judge using the meta-judge's evaluation specification.
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ Phase 3.5: Meta-Judge (ONCE) │
│ ┌──────────────────────────────────────┐ │
│ │ Meta-Judge (Opus) │ │
│ │ → Evaluation Specification YAML │ │
│ └──────────────────┬───────────────────┘ │
│ │ (shared across all targets) │
│ ▼ │
│ Parallel Targets │
│ │
│ Target A Target B Target C │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Implementer│ │Implementer│ │Implementer│ │
│ │(parallel) │ │(parallel) │ │(parallel) │ │
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Judge │ │ Judge │ │ Judge │ │
│ │(per-target)│ │(per-target)│ │(per-target)│ │
│ │+meta-spec │ │+meta-spec │ │+meta-spec │ │
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Parse Verdict (per target) │ │
│ │ ├─ PASS (≥4)? → Complete │ │
│ │ ├─ Soft PASS (≥3 + low priority issues)? → Complete│ │
│ │ └─ FAIL (<4)? → Retry (max 3 per target) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
CRITICAL: Parallel Dispatch Pattern
Launch ALL implementation agents in a SINGLE response. Do NOT wait for one agent to complete before starting another:
## Dispatching 3 parallel tasks
[Task 1]
Use Task tool:
description: "Parallel: simplify error handling in src/services/user.ts"
prompt: [CoT prefix + task body for user.ts + critique suffix]
model: sonnet
[Task 2]
Use Task tool:
description: "Parallel: simplify error handling in src/services/order.ts"
prompt: [CoT prefix + task body for order.ts + critique suffix]
model: sonnet
[Task 3]
Use Task tool:
description: "Parallel: simplify error handling in src/services/payment.ts"
prompt: [CoT prefix + task body for payment.ts + critique suffix]
model: sonnet
[All 3 tasks launched simultaneously - results collected when all complete]
Parallelization Guidelines:
Context Isolation (IMPORTANT):
After each implementation agent completes, dispatch an independent judge for that target using the meta-judge's evaluation specification.
CRITICAL: Provide to the judge EXACT meta-judge's evaluation specification YAML, do not skip or add anything, do not modify it in any way, do not shorten or summarize any text in it!
Judge prompt template:
You are evaluating an implementation artifact for target {target_name} against an evaluation specification produced by the meta judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Target
{Specific target: file path or component name}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
{Summary section from implementation agent} {Paths to files modified}
Follow your full judge process as defined in your agent instructions!
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide score threshold, in any format, including `threshold_pass` or anything different. Judge MUST not know what threshold for score is, in order to not be biased!!!
**Dispatch judge for each target:**
Use Task tool:
description: "Judge: {target name}"
prompt: {judge verification prompt with exact meta-judge specification YAML}
model: opus
subagent_type: "sadd:judge"
Parse judge output for each target (DO NOT read full report):
Extract from judge reply:
VERDICT: PASS or FAIL
SCORE: X.X/5.0
ISSUES: List of problems (if any)
IMPROVEMENTS: List of suggestions (if any)
Decision logic per target:
If score >= 4: -> VERDICT: PASS -> Mark target complete -> Include IMPROVEMENTS as optional enhancements
IF score >= 3.0 and all found issues are low priority, then: -> VERDICT: PASS -> Mark target complete -> Include IMPROVEMENTS as optional enhancements
If score < 4: -> VERDICT: FAIL -> Check retry count for this target
If retries < 3: -> Dispatch retry implementation agent with judge feedback -> Return to judge verification with same meta-judge specification
If retries >= 3: -> Mark target as failed (isolate from other targets) -> Do NOT proceed with more retries without user decision
**IMPORTANT: Failures are isolated**
- One target failing does NOT affect other targets
- Other parallel tasks continue independently
- Only the failed target is retried
#### 5.4 Retry with Feedback (If Needed)
**Retry prompt template:**
```markdown
## Retry Required for Target: {target_name}
Your previous implementation did not pass judge verification.
## Original Task
{Original task description}
## Target
{Specific target}
## Judge Feedback
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}
## Your Previous Changes
{files modified in previous attempt}
## Instructions
Let's fix the identified issues step by step.
1. Review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide updated Summary section
CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
After all agents complete (with retries as needed), aggregate results:
## Parallel Execution Summary
### Configuration
- **Task:** {task description}
- **Model:** {selected model}
- **Targets:** {count} items
### Results
| Target | Model | Judge Score | Retries | Status | Summary |
|--------|-------|-------------|---------|--------|---------|
| {target_1} | {model} | {X.X}/5.0 | {0-3} | SUCCESS | {brief outcome} |
| {target_2} | {model} | {X.X}/5.0 | {0-3} | SUCCESS | {brief outcome} |
| {target_3} | {model} | {X.X}/5.0 | {3} | FAILED | {failure reason} |
| ... | ... | ... | ... | ... | ... |
### Overall Assessment
- **Completed:** {X}/{total}
- **Failed:** {Y}/{total}
- **Total Retries:** {sum of all retries}
- **Common patterns:** {any patterns across results}
### Verification Summary
{Aggregate judge verification results - any common issues?}
### Files Modified
- {list of all modified files}
### Failed Targets (If Any)
{For each failed target after max retries}
- **Target:** {name}
- **Final Score:** {X.X}/5.0
- **Persistent Issues:** {issues that weren't resolved}
- **Options:** Retry with guidance / Skip / Manual fix
### Next Steps
{If any failures, suggest remediation}
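The counters in the Overall Assessment section could be computed with a small helper like this (illustrative types; not part of the skill):

```typescript
// Illustrative aggregation over per-target results.
interface TargetResult {
  target: string;
  score: number;                  // final judge score, 0.0 - 5.0
  retries: number;                // 0 - 3
  status: "SUCCESS" | "FAILED";
}

function summarize(results: TargetResult[]) {
  return {
    completed: results.filter(r => r.status === "SUCCESS").length,
    failed: results.filter(r => r.status === "FAILED").length,
    totalRetries: results.reduce((sum, r) => sum + r.retries, 0),
  };
}
```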
Failure Handling:
**Example 1**
Input:
/do-in-parallel "Simplify error handling to use early returns instead of nested if-else" \
--files "src/services/user.ts,src/services/order.ts,src/services/payment.ts"
Analysis:
Model Selection: Sonnet (pattern-based, medium complexity)
Execution:
Phase 3.5: Dispatch Meta-Judge (ONCE)
Meta-judge (Opus)...
-> Generated evaluation specification YAML
-> 3 rubric dimensions, 5 checklist items
Phase 5: Parallel Dispatch
[All 3 implementation agents launched simultaneously]
Target: user.ts
Implementation (Sonnet)...
-> Converted 4 nested if-else blocks to early returns
Judge Verification (Opus, with meta-judge spec)...
-> VERDICT: PASS, SCORE: 4.2/5.0
-> IMPROVEMENTS: Consider extracting complex conditions
Target: order.ts
Implementation (Sonnet)...
-> Converted 6 nested if-else blocks to early returns
Judge Verification (Opus, with meta-judge spec)...
-> VERDICT: PASS, SCORE: 4.0/5.0
-> ISSUES: None
Target: payment.ts
Implementation (Sonnet)...
-> Converted 3 nested if-else blocks
Judge Verification (Opus, with meta-judge spec)...
-> VERDICT: FAIL, SCORE: 3.2/5.0
-> ISSUES: Missing edge case for null amount
Retry Implementation (Sonnet)...
-> Added null check for payment amount
Judge Verification (Opus, with same meta-judge spec)...
-> VERDICT: PASS, SCORE: 4.1/5.0
Result:
## Parallel Execution Summary
### Configuration
- **Task:** Simplify error handling to use early returns
- **Model:** Sonnet
- **Targets:** 3 files
### Results
| Target | Model | Judge Score | Retries | Status | Summary |
|--------|-------|-------------|---------|--------|---------|
| src/services/user.ts | sonnet | 4.2/5.0 | 0 | SUCCESS | Converted 4 nested if-else blocks |
| src/services/order.ts | sonnet | 4.0/5.0 | 0 | SUCCESS | Converted 6 nested if-else blocks |
| src/services/payment.ts | sonnet | 4.1/5.0 | 1 | SUCCESS | Converted 3 blocks, added null check |
### Overall Assessment
- **Completed:** 3/3
- **Total Retries:** 1
- **Total Agents:** 9 (1 meta-judge + 3 implementations + 1 retry + 4 judges)
- **Common patterns:** All files followed consistent early return pattern
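For illustration, the per-file transformation in this example might look like the sketch below (hypothetical payment-service code, including the null-amount edge case the judge flagged):

```typescript
// Before: nested if-else error handling (hypothetical code)
function chargeBefore(amount: number | null): string {
  if (amount !== null) {
    if (amount > 0) {
      return "charged";
    } else {
      return "invalid amount";
    }
  } else {
    return "missing amount"; // the null edge case the judge flagged
  }
}

// After: early returns flatten the nesting without changing behavior
function chargeAfter(amount: number | null): string {
  if (amount === null) return "missing amount";
  if (amount <= 0) return "invalid amount";
  return "charged";
}
```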
**Example 2**
Input:
/do-in-parallel "Generate JSDoc documentation for all public methods" \
--files "src/api/users.ts,src/api/products.ts,src/api/orders.ts,src/api/auth.ts"
Analysis:
Model Selection: Haiku (mechanical, well-defined rules)
Dispatch: 1 meta-judge + 4 parallel agents
Execution Summary:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| src/api/users.ts | haiku | 4.0/5.0 | 0 | SUCCESS |
| src/api/products.ts | haiku | 3.8/5.0 | 0 | SUCCESS |
| src/api/orders.ts | haiku | 4.2/5.0 | 0 | SUCCESS |
| src/api/auth.ts | haiku | 4.1/5.0 | 0 | SUCCESS |
Total Agents: 9 (1 meta-judge + 4 implementations + 4 judges)
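As a sketch, the generated JSDoc might look like this (hypothetical API method, not taken from the actual files):

```typescript
/**
 * Computes a discounted price for a product.
 *
 * @param basePrice - Price in cents before discount.
 * @param discountPct - Discount percentage in the range 0-100.
 * @returns The discounted price in cents, rounded down.
 */
export function discountedPrice(basePrice: number, discountPct: number): number {
  return Math.floor(basePrice * (1 - discountPct / 100));
}
```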
**Example 3**
Input:
/do-in-parallel "Analyze for potential SQL injection vulnerabilities and suggest fixes" \
--files "src/db/queries.ts,src/db/migrations.ts,src/api/search.ts"
Analysis:
Model Selection: Opus (security-critical, requires deep analysis)
Dispatch: 1 meta-judge + 3 parallel agents
Execution Summary:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| src/db/queries.ts | opus | 4.5/5.0 | 0 | SUCCESS |
| src/db/migrations.ts | opus | 4.3/5.0 | 0 | SUCCESS |
| src/api/search.ts | opus | 4.0/5.0 | 1 | SUCCESS |
Total Agents: 9 (1 meta-judge + 3 implementations + 1 retry + 4 judges)
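A typical remediation the agents might suggest is switching to parameterized queries. The sketch below builds a node-postgres-style query object (`$1` placeholder) without executing it; the function names are illustrative:

```typescript
// Vulnerable: user input interpolated directly into the SQL string
function searchVulnerable(term: string): string {
  return `SELECT * FROM products WHERE name LIKE '%${term}%'`;
}

// Safer: SQL text and values travel separately; the driver escapes the value
function searchParameterized(term: string): { text: string; values: string[] } {
  return {
    text: "SELECT * FROM products WHERE name LIKE $1",
    values: [`%${term}%`],
  };
}
```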
**Example 4**
Input:
/do-in-parallel "Generate unit tests achieving 80% coverage" \
--targets "UserService,OrderService,PaymentService,NotificationService"
Analysis:
Model Selection: Sonnet (pattern-based, extensive output)
Dispatch: 1 meta-judge + 4 parallel agents
Execution:
Phase 3.5: Meta-judge (Opus)
-> Generated evaluation specification YAML
-> 4 rubric dimensions, 7 checklist items
Target: UserService
-> Judge (Opus, with meta-judge spec): PASS, 4.3/5.0
Target: OrderService
-> Judge (Opus, with meta-judge spec): FAIL, 3.2/5.0 (missing edge cases)
-> Retry: Judge (Opus, same meta-judge spec): PASS, 4.0/5.0
Target: PaymentService
-> Judge (Opus, with meta-judge spec): FAIL, 2.8/5.0 (wrong mock patterns)
-> Retry 1: Judge (Opus, same meta-judge spec): FAIL, 3.0/5.0 (still missing scenarios)
-> Retry 2: Judge (Opus, same meta-judge spec): FAIL, 3.1/5.0 (coverage only 65%)
-> Retry 3: Judge (Opus, same meta-judge spec): FAIL, 3.2/5.0 (coverage at 72%)
-> MARKED FAILED after max retries
Target: NotificationService
-> Judge (Opus, with meta-judge spec): PASS, 4.1/5.0
Result:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| UserService | sonnet | 4.3/5.0 | 0 | SUCCESS |
| OrderService | sonnet | 4.0/5.0 | 1 | SUCCESS |
| PaymentService | sonnet | 3.2/5.0 | 3 | FAILED |
| NotificationService | sonnet | 4.1/5.0 | 0 | SUCCESS |
Overall: 3/4 completed, 1 failed
Escalation for PaymentService:
### Failed Target: PaymentService
- **Final Score:** 3.2/5.0
- **Persistent Issues:**
- Test coverage at 72%, target is 80%
- Complex async scenarios not fully covered
- **Options:**
1. Provide guidance on specific async patterns to test
2. Accept 72% coverage as sufficient
3. Manual test writing for complex scenarios
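For flavor, a generated unit test might look like the sketch below (plain assertions for brevity; a real run would presumably use the project's test framework and mocking utilities):

```typescript
// Hypothetical service method under test
function formatNotification(user: string, message: string): string {
  if (!user) throw new Error("user required");
  return `[${user}] ${message}`;
}

// Generated tests cover both the happy path and the error path
function testFormatNotification(): void {
  if (formatNotification("ana", "hi") !== "[ana] hi") {
    throw new Error("happy path failed");
  }
  let threw = false;
  try {
    formatNotification("", "hi");
  } catch {
    threw = true;
  }
  if (!threw) throw new Error("empty user should throw");
}
```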
**Example 5**
Input:
/do-in-parallel "Apply consistent logging format to src/handlers/user.ts, src/handlers/order.ts, and src/handlers/product.ts"
Analysis:
Model Selection: Haiku (simple, mechanical)
Dispatch: 1 meta-judge + 3 parallel agents
Execution Summary:
| Target | Model | Judge Score | Retries | Status |
|---|---|---|---|---|
| src/handlers/user.ts | haiku | 4.2/5.0 | 0 | SUCCESS |
| src/handlers/order.ts | haiku | 4.0/5.0 | 0 | SUCCESS |
| src/handlers/product.ts | haiku | 4.1/5.0 | 0 | SUCCESS |
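A "consistent logging format" such a run might converge on could be as simple as one shared formatter (hypothetical helper):

```typescript
// Hypothetical shared formatter replacing ad-hoc console.log calls in handlers.
// Before: console.log("user created " + id); console.error("order " + id + " failed")
// After:  console.log(logLine("INFO", "user", "created " + id))
function logLine(level: "INFO" | "ERROR", handler: string, message: string): string {
  return `[${level}] [${handler}] ${message}`;
}
```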
Model Selection Reference:
| Scenario | Model | Reason |
|---|---|---|
| Security analysis | Opus | Critical reasoning required |
| Architecture decisions | Opus | Quality over speed |
| Simple refactoring | Haiku | Fast, sufficient |
| Documentation generation | Haiku | Mechanical task |
| Code review per file | Sonnet | Balanced capability |
| Test generation | Sonnet | Extensive but patterned |
Implementation/Judge Pairing:
| Implementation Model | Judge Model | Rationale |
|---|---|---|
| Opus | Opus | Critical work needs strong verification |
| Sonnet | Opus | Tailored evaluation requires strong reasoning |
| Haiku | Opus | Verify simple work with strong evaluation |
Guideline: Judges always use Opus for consistent, high-quality evaluation across all targets.
| Failure Type | Description | Recovery Action |
|---|---|---|
| Recoverable | Judge found issues, retry available | Retry with judge feedback (max 3 per target) |
| Approach Failure | The approach for this target is wrong | Escalate to user with options |
| Foundation Issue | Requirements unclear or impossible | Escalate to user for clarification |
| Max Retries Exceeded | Target failed after 3 retries | Mark failed, continue other targets, report at end |
Critical Rules: