Council Multi-Model Consensus Committee: Parallel Review, Brainstorming, and Research Tool for AI Agents

council by boshu2/agentops

1,500 weekly installs

225 GitHub Stars


Install command

npx skills add https://github.com/boshu2/agentops --skill council

AI/ML Development, DevOps


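/council: Multi-Model Consensus Council

Spawn parallel judges with different perspectives, then consolidate their findings into a consensus. Works for any task: validation, research, or brainstorming.

Quick Start

/council --quick validate recent                               # fast inline check
/council validate this plan                                    # validation (2 agents)
/council brainstorm caching approaches                         # brainstorm
/council validate the implementation                           # validation (critique triggers map here)
/council research kubernetes upgrade strategies                # research
/council research the CI/CD pipeline bottlenecks               # research (analyze triggers map here)
/council --preset=security-audit validate the auth system      # preset personas
/council --deep --explorers=3 research upgrade automation      # deep + explorers
/council --debate validate the auth system                     # adversarial 2-round review
/council --deep --debate validate the migration plan           # thorough + debate
/council                                                       # infers from context

Council works standalone: no RPI workflow, no ratchet chain, and no ao CLI required. No setup is needed beyond the initial install.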

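Modes

Mode	Agents	Execution Backend	Use Case
`--quick`	0 (inline)	Self	Fast single-agent check, no spawning
default	2	Runtime-native (Codex sub-agents preferred; Claude teams fallback)	Independent judges (no perspective labels)
`--deep`	3	Runtime-native	Thorough review
`--mixed`	3+3	Runtime-native + Codex CLI	Cross-vendor consensus
`--debate`	Same as base mode	Runtime-native + agent messaging	Adversarial two-round review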


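Spawn Backend (MANDATORY)

Council needs a runtime that can spawn subagents in parallel and, for --debate, send messages between agents. Use whatever multi-agent primitives your runtime provides. If no multi-agent capability is detected, fall back to --quick (inline, single agent).

Required capabilities:

Spawn subagent: create a parallel agent from a prompt (required for all modes except --quick)
Agent messaging: send a message to a specific agent (required for --debate)

Skills describe WHAT to do, not WHICH tool to call. See skills/shared/SKILL.md for the capability contract.

After detecting your backend, read the matching reference for concrete spawn/wait/message/cleanup examples:

Claude feature contract → ../shared/references/claude-code-latest-features.md
Claude Native Teams → ../shared/references/backend-claude-teams.md
Codex Sub-Agents / CLI → ../shared/references/backend-codex-subagents.md
Background Tasks → ../shared/references/backend-background-tasks.md
Inline (--quick) → ../shared/references/backend-inline.md

For the council-specific spawning flow (phases, timeouts, output collection), see also references/cli-spawning.md.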

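When to Use `--debate`

Use --debate for high-stakes or ambiguous reviews where judges are likely to disagree:

Security audits, architecture decisions, migration plans
Reviews where multiple valid perspectives exist
Cases where a missed finding has real consequences

Skip --debate for routine validation where you expect consensus. Debate adds R2 latency (judges stay alive and process a second round via backend messaging).

Incompatibilities:

--quick and --debate cannot be combined. --quick runs inline without spawning subagents; --debate requires multi-agent rounds. If both are passed, exit with the error: "Error: --quick and --debate are incompatible."
--debate is only supported in validate mode, because brainstorm and research do not produce PASS/WARN/FAIL verdicts. If combined, exit with the error: "Error: --debate is only supported with validate mode."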

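Task Types

Type	Trigger Words	Perspective Focus
validate	validate, check, review, assess, critique, feedback, improve	Is this correct? What's wrong? What could be better?
brainstorm	brainstorm, explore, options, approaches	What are the alternatives? What are the pros and cons?
research	research, investigate, deep dive, explore deeply, analyze, examine, evaluate, compare	What can we discover? What are the properties, trade-offs, and structure?

Natural language works: the skill infers the task type from your prompt.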

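First-pass rigor gate for plan/spec validation (MANDATORY)

When the mode is validate and the target is a plan, spec, or contract (or contains boundary rules, state transitions, or conformance tables), judges must apply this gate before returning PASS:

The canonical mutation + ack sequence is explicit, single-path, and non-contradictory.
The consume-at-most-once path is crash-safe, with an explicit atomic boundary and restart-recovery semantics.
Status/precedence behavior is defined by a field-level truth table, with anomaly reason codes for conflicting evidence.
Conformance includes explicit boundary failpoint tests and deterministic assertions for replay/no-duplicate-effect outcomes.

Verdict policy for this gate:

Missing or contradictory gate item: minimum WARN.
Missing deterministic conformance coverage for any gate item: minimum WARN.
Critical lifecycle invariant that cannot be verified mechanically: FAIL.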

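Architecture

Context Budget Rule (CRITICAL)

Judges write ALL of their analysis to output files. Messages to the lead contain ONLY a minimal completion signal: {"type":"verdict","verdict":"...","confidence":"...","file":"..."}. The lead reads the output files during consolidation. This prevents N judges from flooding the lead's context window with N full reports via SendMessage.

Consolidation runs inline in the lead; there is no separate chairman agent. The lead reads each judge's output file sequentially with the Read tool and synthesizes the results.

Execution Flow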

┌─────────────────────────────────────────────────────────────────┐
│  Phase 1: Build Packet (JSON)                                   │
│  - Task type (validate/brainstorm/research)                      │
│  - Target description                                           │
│  - Context (files, diffs, prior decisions)                      │
│  - Perspectives to assign                                       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 1a: Select spawn backend                                  │
│  codex_subagents | claude_teams | background_fallback            │
│  Team lead = spawner (this agent)                                │
└─────────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            ▼                                   ▼
┌───────────────────────┐           ┌───────────────────────┐
│  RUNTIME-NATIVE JUDGES│           │     CODEX AGENTS      │
│ (spawn_agent or teams)│           │  (Bash tool, parallel)│
│                       │           │  Agent 1 (independent │
│  Agent 1 (independent │           │    or with preset)    │
│    or with preset)    │           │  Agent 2              │
│  Agent 2              │           │  Agent 3              │
│  Agent 3 (--deep only)│           │  (--mixed only)       │
│  (--deep/--mixed only)│           │                       │
│                       │           │  Output: JSON + MD    │
│  Write files, then    │           │  Files: .agents/      │
│ wait()/SendMessage to │           │    council/codex-*    │
│ lead                  │           │                       │
│  Files: .agents/      │           └───────────────────────┘
│    council/claude-*   │                       │
└───────────────────────┘                       │
            │                                   │
            └─────────────────┬─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 2: Consolidation (Team Lead — inline, no extra agent)    │
│  - Receive MINIMAL completion signals (verdict + file path)     │
│  - Read each judge's output file with Read tool                 │
│  - If schema_version is missing from a judge's output, treat    │
│    as version 0 (backward compatibility)                        │
│  - Compute consensus verdict                                    │
│  - Identify shared findings                                     │
│  - Surface disagreements with attribution                       │
│  - Generate Markdown report for human                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 3: Cleanup                                               │
│  - Cleanup backend resources (close_agent / TeamDelete / none)  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Output: Markdown Council Report                                │
│  - Consensus: PASS/WARN/FAIL                                    │
│  - Shared findings                                              │
│  - Disagreements (if any)                                       │
│  - Recommendations                                              │
└─────────────────────────────────────────────────────────────────┘

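Graceful Degradation

Failure	Behavior
1 of N agents times out	Proceed with N-1, note in report
All Codex CLI agents fail	Proceed with runtime-native judges only, note degradation
All agents fail	Return error, suggest retry
Codex CLI not installed	Skip Codex CLI judges, continue with runtime judges only (warn user)
No multi-agent capability	Fall back to `--quick` (inline single-agent review)
No agent messaging	`--debate` unavailable, single-round review only
Output dir missing	Create `.agents/council/` automatically

Timeout: 120s per agent (configurable via --timeout=N, in seconds).

Minimum quorum: at least 1 agent must respond for the council to be valid. If 0 agents respond, return an error.

Pre-Flight Checks

Multi-agent capability: detect whether the runtime supports spawning parallel subagents. If not, degrade to --quick.
Agent messaging: detect whether the runtime supports agent-to-agent messaging. If not, disable --debate.
Codex CLI judges (--mixed only): check which codex, test model availability, and test --output-schema support. Downgrade mixed mode when unavailable.
Agent count: verify judges * (1 + explorers) <= MAX_AGENTS (12)
Output dir: mkdir -p .agents/council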

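Quick Mode (`--quick`)

Single-agent inline validation. No subprocess spawning, no Task tool, no Codex. The current agent performs a structured self-review using the same output schema as a full council.

When to use: routine checks, mid-implementation sanity checks, quick pre-commit scans.

Execution: gather context (files, diffs) -> perform a structured self-review inline using the council output_schema (verdict, confidence, findings, recommendation) -> write the report to .agents/council/YYYY-MM-DD-quick-<target>.md, labeled Mode: quick (single-agent).

Limitations: no cross-perspective disagreement, no cross-vendor insight, and a lower confidence ceiling. Not suitable for security audits or architecture decisions.

Packet Format (JSON)

The packet sent to each agent. File contents are inlined: agents receive the actual code/plan text in the packet, not just paths. This lets both Claude and Codex agents analyze the target without needing file access.

If .agents/ao/environment.json exists, include it in the context packet so judges can reason about the available tools and environment state.

Judge prompt boundary:

Do NOT include .agents/ references in judge prompts.
Do NOT instruct judges to search .agents/ directories. Judges operate on the council packet only.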

{
  "council_packet": {
    "version": "1.0",
    "mode": "validate | brainstorm | research",
    "target": "Implementation of user authentication system",
    "context": {
      "files": [
        {
          "path": "src/auth/jwt.py",
          "content": "<file contents inlined here>"
        },
        {
          "path": "src/auth/middleware.py",
          "content": "<file contents inlined here>"
        }
      ],
      "diff": "git diff output if applicable",
      "spec": {
        "source": "bead na-0042 | plan doc | none",
        "content": "The spec/bead description text (optional — included when wrapper provides it)"
      },
      "prior_decisions": [
        "Using JWT, not sessions",
        "Refresh tokens required"
      ],
      "empirical_results": "(optional) test output, CLI flag verification, or Wave 0 findings — include when evaluating feasibility"
    },
    "perspective": "skeptic (only when --preset or --perspectives used)",
    "perspective_description": "What could go wrong? (only when --preset or --perspectives used)",
    "output_schema": {
      "verdict": "PASS | WARN | FAIL",
      "confidence": "HIGH | MEDIUM | LOW",
      "key_insight": "Single sentence summary",
      "findings": [
        {
          "severity": "critical | significant | minor",
          "category": "security | architecture | performance | style",
          "description": "What was found",
          "location": "file:line if applicable",
          "recommendation": "How to address",
          "fix": "Specific action to resolve this finding",
          "why": "Root cause or rationale",
          "ref": "File path, spec anchor, or doc reference"
        }
      ],
      "recommendation": "Concrete next step",
      "schema_version": 2
    }
  }
}

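Empirical Evidence Rule

When evaluating implementation feasibility (e.g., "will this CLI flag work?", "can these tools coexist?"), always include empirical test results in context.empirical_results. Judges who reason from assumptions produce false verdicts: a Codex judge once returned a false FAIL on -s read-only because the Wave 0 test output was not in the packet. The rule: run the experiment first, then let the judges evaluate the evidence.

Wrapper skills (/vibe, /pre-mortem) should include relevant test output when the council target involves tooling behavior, flag combinations, or runtime compatibility.

Perspectives

Perspectives & Presets: use the Read tool on skills/council/references/personas.md for persona definitions, preset configurations, and custom perspective details.

Auto-Escalation: when --preset or --perspectives specifies more perspectives than the current judge count, the judge count is automatically escalated to match. The --count flag overrides auto-escalation.

Named Perspectives

Named perspectives assign each judge a specific viewpoint. Pass --perspectives="a,b,c" for free-form names, or --perspectives-file=<path> for a YAML file with focus descriptions:

/council --perspectives="security-auditor,performance-critic,simplicity-advocate" validate src/auth/
/council --perspectives-file=.agents/perspectives/api-review.yaml validate src/api/

YAML format for --perspectives-file: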

perspectives:
  - name: security-auditor
    focus: Find security vulnerabilities and trust boundary violations
  - name: performance-critic
    focus: Identify performance bottlenecks and scaling risks

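Flag priority: --perspectives/--perspectives-file override --preset perspectives. --count always overrides the judge count. Without --count, the judge count auto-escalates to match the number of perspectives.

See references/personas.md for all built-in presets and their perspective definitions.

Explorer Sub-Agents

Explorer Details: use the Read tool on skills/council/references/explorers.md for explorer architecture, prompts, sub-question generation, and timeout configuration.

Summary: judges can spawn explorer sub-agents (--explorers=N, max 5) for parallel deep-dive research. Total agents = judges * (1 + explorers), capped at MAX_AGENTS=12.

Debate Phase (`--debate`)

Debate Protocol: use the Read tool on skills/council/references/debate-protocol.md for the full debate execution flow, R1-to-R2 verdict injection, timeout handling, and cost analysis.

Summary: a two-round adversarial review. R1 produces independent verdicts. In R2, each judge receives the other judges' verdicts via backend messaging (send_input or SendMessage) so it can steel-man them and revise its position. Only supported in validate mode.

Agent Prompts

Agent Prompts: use the Read tool on skills/council/references/agent-prompts.md for judge prompts (default and perspective-based), the consolidation prompt, and the debate R2 message template.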

Consensus Rules

Condition	Verdict
All PASS	PASS
Any FAIL	FAIL
Mixed PASS/WARN	WARN
All WARN	WARN

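Disagreement handling:

If Claude says PASS and Codex says FAIL → DISAGREE (surface both)
Severity-weighted: a security FAIL outweighs a style WARN

DISAGREE resolution: when vendors disagree, the spawner presents both positions with their reasoning and defers to the user. There is no automatic tie-breaker: cross-vendor disagreement is a signal that warrants human attention.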
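To make the consensus table and the DISAGREE rule concrete, here is a minimal Python sketch of how a lead could combine verdicts. It is not the skill's own code: `table_verdict` and `consensus` are hypothetical names, each verdict is assumed to be tagged with its vendor, DISAGREE is assumed to mean one vendor's judges reach PASS while another vendor's reach FAIL, and severity weighting is left out.

```python
from collections import defaultdict


def table_verdict(verdicts):
    """Apply the consensus table: any FAIL -> FAIL, all PASS -> PASS, otherwise WARN."""
    if "FAIL" in verdicts:
        return "FAIL"
    if all(v == "PASS" for v in verdicts):
        return "PASS"
    return "WARN"  # mixed PASS/WARN, or all WARN


def consensus(judge_verdicts):
    """Compute the council verdict from (vendor, verdict) pairs.

    Hypothetical DISAGREE rule: one vendor's judges reach PASS while another
    vendor's judges reach FAIL. Severity weighting is not modeled here.
    """
    if not judge_verdicts:
        raise ValueError("no judge responded: minimum quorum is 1")
    by_vendor = defaultdict(list)
    for vendor, verdict in judge_verdicts:
        by_vendor[vendor].append(verdict)
    per_vendor = {table_verdict(vs) for vs in by_vendor.values()}
    if {"PASS", "FAIL"} <= per_vendor:
        return "DISAGREE"  # surface both positions and let the user decide
    return table_verdict([verdict for _, verdict in judge_verdicts])
```

With this sketch, `consensus([("claude", "PASS"), ("codex", "FAIL")])` yields DISAGREE, while a FAIL from a vendor whose counterpart only warned still yields FAIL.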

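Output Format

Report Templates: use the Read tool on skills/council/references/output-format.md for the full report templates (validate, brainstorm, research) and the debate report additions (verdict shifts, convergence detection).

All reports are written to .agents/council/YYYY-MM-DD-<type>-<target>.md.

Minimum quorum: 1 agent. Recommended: 80% of judges. On timeout, proceed with the remaining judges and note it in the report. On user cancellation, shut down all judges and generate a partial report marked INCOMPLETE.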

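Environment Variables

Variable	Default	Description
`COUNCIL_TIMEOUT`	120	Agent timeout in seconds
`COUNCIL_CODEX_MODEL`	gpt-5.3-codex	Override the Codex model for --mixed. Set it explicitly to pin Codex judge behavior; omit it to use the user's configured default.
`COUNCIL_CLAUDE_MODEL`	sonnet	Claude model for judges (sonnet by default; use opus via `--profile=thorough` for high-stakes work)
`COUNCIL_EXPLORER_MODEL`	sonnet	Model for explorer sub-agents
`COUNCIL_EXPLORER_TIMEOUT`	60	Explorer timeout in seconds
`COUNCIL_R2_TIMEOUT`	90	Maximum wait for R2 debate completion after the debate messages are sent. Shorter than R1 because judges already have context.

Flags

Flag	Description
`--deep`	3 Claude agents instead of 2
`--mixed`	Add 3 Codex agents
`--debate`	Enable an adversarial debate round (2 rounds via backend messaging, same agents). Incompatible with `--quick`.
`--timeout=N`	Override the timeout in seconds (default: 120)
`--perspectives="a,b,c"`	Custom perspective names (each name sets a judge's system prompt to adopt that viewpoint)
`--perspectives-file=<path>`	Load named perspectives from a YAML file (see Named Perspectives)
`--preset=<name>`	Built-in persona preset (security-audit, architecture, research, ops, code-review, plan-review, doc-review, retrospective, product, developer-experience)
`--count=N`	Override the agent count per vendor (e.g., `--count=4` means 4 Claude agents, or 4+4 with --mixed). Subject to the MAX_AGENTS=12 cap.
`--explorers=N`	Explorer sub-agents per judge (default: 0, max: 5). The maximum effective value depends on the judge count. Total agents are capped at 12.
`--explorer-model=M`	Override the explorer model (default: sonnet)
`--technique=<name>`	Brainstorm technique (scamper, six-hats, reverse). Case-insensitive. Brainstorm mode only; combining it with validate/research mode is an error. If omitted, brainstorming is unstructured (current behavior). See `references/brainstorm-techniques.md`.
`--profile=<name>`	Model quality profile (thorough, balanced, fast). Errors if the name is not recognized. Overridden by the `COUNCIL_CLAUDE_MODEL` environment variable (highest priority), then by explicit `--count`/`--deep`/`--mixed`. See `references/model-profiles.md`.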
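The precedence between environment variables, profiles, and explicit flags is easy to get backwards, so here is a minimal sketch of one way to resolve it. The thorough and fast values come from the examples in this document; treating balanced as the documented defaults, the error text, and the names `PROFILES`, `CouncilConfig`, and `resolve_config` are assumptions, and `--mixed` is omitted. references/model-profiles.md remains the authority.

```python
import os
from dataclasses import dataclass

# "thorough" and "fast" mirror the documented examples (opus/3/120s, haiku/2/60s).
# Mapping "balanced" to the documented defaults is an assumption.
PROFILES = {
    "thorough": {"model": "opus", "judges": 3, "timeout": 120},
    "balanced": {"model": "sonnet", "judges": 2, "timeout": 120},
    "fast": {"model": "haiku", "judges": 2, "timeout": 60},
}


@dataclass
class CouncilConfig:
    model: str
    judges: int
    timeout: int


def resolve_config(profile=None, count=None, deep=False, env=None):
    """Hypothetical resolver. Precedence, highest first:
    COUNCIL_CLAUDE_MODEL (model only) > explicit --count/--deep (judges) > --profile > defaults.
    """
    env = os.environ if env is None else env
    if profile is None:
        base = {"model": "sonnet", "judges": 2,
                "timeout": int(env.get("COUNCIL_TIMEOUT", "120"))}
    elif profile in PROFILES:
        base = PROFILES[profile]
    else:
        raise ValueError(f"Error: unknown profile '{profile}'")  # hypothetical wording
    judges = 3 if deep else base["judges"]
    if count is not None:
        judges = count  # --count always overrides the judge count
    model = env.get("COUNCIL_CLAUDE_MODEL", base["model"])  # env var wins for the model
    return CouncilConfig(model=model, judges=judges, timeout=base["timeout"])
```

For example, `resolve_config(profile="thorough", env={"COUNCIL_CLAUDE_MODEL": "sonnet"})` keeps the thorough judge count but runs on sonnet, because the environment variable outranks the profile.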

CLI Spawning: use the Read tool on skills/council/references/cli-spawning.md for team setup, Claude/Codex agent spawning, parallel execution, debate R2 commands, cleanup, and model selection.

Examples

/council validate recent                                               # 2 judges, recent commits
/council --deep --preset=architecture research the auth system         # 3 judges with architecture personas
/council --mixed validate this plan                                    # 3 Claude + 3 Codex
/council --deep --explorers=3 research upgrade patterns                # 12 agents (3 judges x 4)
/council --preset=security-audit --deep validate the API               # attacker, defender, compliance, web security
/council --preset=doc-review validate README.md                        # 4 doc judges with named perspectives
/council brainstorm caching strategies for the API                     # 2 judges explore options
/council --technique=scamper brainstorm API improvements               # structured SCAMPER brainstorm
/council --technique=six-hats brainstorm migration strategy            # parallel-perspective brainstorm
/council --profile=thorough validate the security architecture         # opus, 3 judges, 120s timeout
/council --profile=fast validate recent                                # haiku, 2 judges, 60s timeout
/council research Redis vs Memcached for session storage               # 2 judges weigh trade-offs
/council validate the implementation plan in PLAN.md                   # structured plan feedback
/council --preset=doc-review validate docs/ARCHITECTURE.md             # 4 doc judges
/council --perspectives="security-auditor,perf-critic" validate src/   # named perspectives
/council --perspectives-file=.agents/perspectives/custom.yaml validate # perspectives from file

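Fast single-agent validation

User says: /council --quick validate recent

What happens:

The agent gathers context (recent diffs, files) inline, without spawning subagents
The agent performs a structured self-review using the council output schema
The report is written to .agents/council/YYYY-MM-DD-quick-<target>.md, labeled Mode: quick (single-agent)

Result: a fast sanity check for routine validation (no cross-perspective insight or debate).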

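Adversarial debate review

User says: /council --debate validate the auth system

What happens:

The agent spawns 2 judges (runtime-native backend) with independent perspectives
R1: judges evaluate independently and write their verdicts to .agents/council/
R2: the team lead sends each judge the other judges' verdicts via backend messaging
Judges revise their positions based on the cross-perspective evidence
Consolidation: the team lead computes consensus, with convergence detection

Result: a two-round review with steel-manning and revision, suited to high-stakes decisions.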

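Cross-vendor consensus with explorers

User says: /council --mixed --explorers=2 research Kubernetes upgrade strategies

What happens:

The agent plans 3 Claude judges + 3 Codex judges (6 total)
Each judge would spawn 2 explorer sub-agents (6 x 3 = 18 total agents, exceeding MAX_AGENTS)
The agent auto-scales down to 2 judges per vendor (4 x 3 = 12 agents, exactly at the cap)
Explorers perform parallel deep-dive research and return sub-findings to their judges
Judges integrate the explorer findings with their own research

Result: cross-vendor research with deep exploration, with total agents capped at 12.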

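Troubleshooting

Problem	Cause	Solution
"Error: --quick and --debate are incompatible"	Both flags were passed	Use `--quick` for a fast inline check or `--debate` for a multi-round review, not both
"Error: --debate is only supported with validate mode"	The debate flag was used with brainstorm/research mode	Remove `--debate` or switch to validate mode; brainstorm/research have no PASS/FAIL verdict
Council spawns fewer agents than expected	`--explorers=N` exceeds MAX_AGENTS (12)	The agent auto-scales the judge count. Check the report header for the actual judge count. Reduce `--explorers` or set the judge count manually with `--count`
Codex judges skipped in --mixed mode	Codex CLI not on PATH	Install the Codex CLI (`brew install codex`). The model uses the user's configured default; no specific model is required.
No output files in `.agents/council/`	Permission error or full disk	Check directory permissions with `ls -ld .agents/council/`. Council creates the directory automatically if it is missing.
Agent times out after 120s	Slow file reads or network issues	Increase the timeout with `--timeout=300`, or check the `COUNCIL_TIMEOUT` environment variable. Default: 120s.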

Migration from judge

/council replaces the old judge skill. Migration:

Old command	New command
judge recent	`/council validate recent`
judge 2 opus	`/council recent` (default)
judge 3 opus	`/council --deep recent`

The judge skill is deprecated. Use /council.

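Multi-Agent Architecture

Council uses whatever multi-agent primitives your runtime provides. Each judge is a parallel subagent that writes its output to a file and sends a minimal completion signal to the lead.

Deliberation Protocol

The --debate flag implements the deliberation protocol pattern:

Independent assessment → evidence exchange → position revision → convergence analysis

R1: spawn judges as parallel subagents. Each judge evaluates independently, writes its verdict to a file, and sends a completion signal.
R2: the lead sends each judge a summary of the other judges' verdicts via agent messaging. Judges revise and write R2 files.
Consolidation: the lead reads all output files and computes consensus.
Cleanup: shut down the judges via the runtime's cleanup mechanism.

Communication Rules

Judges → lead only. Judges never message each other directly. This prevents anchoring.
Lead → judges. Only the lead sends follow-up messages (for debate R2).
No shared task mutations by judges. The lead manages coordination state.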

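Ralph Wiggum Compliance

Council maintains fresh-context isolation (the Ralph Wiggum pattern), with one documented exception:

--debate reuses judge context across R1 and R2. This is intentional. Judges persist within a single atomic council invocation; they do not persist across separate council invocations. The rationale:

In R2, judges benefit from their own R1 analysis context (the reasoning chain, not just the verdict JSON) when evaluating other judges' positions
Re-spawning with only a verdict summary (about 200 tokens) would lose the judge's working memory of why it reached its verdict
The exception is bounded: at most 2 rounds, within one invocation, with explicit cleanup

Without --debate, council is fully Ralph-compliant: each judge is freshly spawned, executes once, writes its output, and terminates.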

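If no multi-agent capability is detected, council falls back to --quick (inline single-agent review). If agent messaging is unavailable, --debate degrades to a single-round review, and the report notes the degradation.

Team Naming

Convention: council-YYYYMMDD-<target> (e.g., council-20260206-auth-system).

Judge names: independent judges use judge-{N} (e.g., judge-1, judge-2); with presets or perspectives, use judge-{perspective} (e.g., judge-error-paths, judge-feasibility). Use the same logical names across the Codex and Claude backends.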
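A small sketch of the naming convention. `slugify`, `team_name`, and `judge_names` are hypothetical helpers, and the slug rule (lowercase, runs of non-alphanumerics collapsed into single dashes) is an assumption; only the `council-YYYYMMDD-<target>`, `judge-{N}`, and `judge-{perspective}` shapes come from this document.

```python
import re
from datetime import date


def slugify(text):
    # Assumed slug rule: lowercase, collapse non-alphanumerics into single dashes.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def team_name(target, day):
    """council-YYYYMMDD-<target>"""
    return f"council-{day.strftime('%Y%m%d')}-{slugify(target)}"


def judge_names(count=0, perspectives=None):
    """judge-{perspective} when perspectives are assigned, otherwise judge-{N}."""
    if perspectives:
        return [f"judge-{slugify(p)}" for p in perspectives]
    return [f"judge-{n}" for n in range(1, count + 1)]


print(team_name("auth system", date(2026, 2, 6)))  # council-20260206-auth-system
print(judge_names(2))                               # ['judge-1', 'judge-2']
```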

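See Also

skills/vibe/SKILL.md: complexity + council for code validation (uses --preset=code-review when a spec is found)
skills/pre-mortem/SKILL.md: plan validation (uses --preset=plan-review, always 3 judges)
skills/post-mortem/SKILL.md: wrap-up of completed work (uses --preset=retrospective, always 3 judges + retro)
skills/swarm/SKILL.md: multi-agent orchestration
skills/standards/SKILL.md: language-specific coding standards
skills/research/SKILL.md: codebase exploration (complements council research mode)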


/council — Multi-Model Consensus Council

Spawn parallel judges with different perspectives, consolidate into consensus. Works for any task — validation, research, brainstorming.

Quick Start

/council --quick validate recent                               # fast inline check
/council validate this plan                                    # validation (2 agents)
/council brainstorm caching approaches                         # brainstorm
/council validate the implementation                          # validation (critique triggers map here)
/council research kubernetes upgrade strategies                # research
/council research the CI/CD pipeline bottlenecks               # research (analyze triggers map here)
/council --preset=security-audit validate the auth system      # preset personas
/council --deep --explorers=3 research upgrade automation      # deep + explorers
/council --debate validate the auth system                     # adversarial 2-round review
/council --deep --debate validate the migration plan           # thorough + debate
/council                                                       # infers from context

Council works independently — no RPI workflow, no ratchet chain, no ao CLI required. Zero setup beyond initial install.

Modes

Mode	Agents	Execution Backend	Use Case
`--quick`	0 (inline)	Self	Fast single-agent check, no spawning
default	2	Runtime-native (Codex sub-agents preferred; Claude teams fallback)	Independent judges (no perspective labels)
`--deep`	3	Runtime-native	Thorough review
`--mixed`	3+3	Runtime-native + Codex CLI	Cross-vendor consensus
`--debate`	Same as base mode	Runtime-native + agent messaging	Adversarial two-round review

/council --quick validate recent   # inline single-agent check, no spawning
/council recent                    # 2 runtime-native judges
/council --deep recent             # 3 runtime-native judges
/council --mixed recent            # runtime-native + Codex CLI

Spawn Backend (MANDATORY)

Council requires a runtime that can spawn parallel subagents and (for --debate) send messages between agents. Use whatever multi-agent primitives your runtime provides. If no multi-agent capability is detected, fall back to --quick (inline single-agent).

Required capabilities:

Spawn subagent — create a parallel agent with a prompt (required for all modes except --quick)
Agent messaging — send a message to a specific agent (required for --debate)

Skills describe WHAT to do, not WHICH tool to call. See skills/shared/SKILL.md for the capability contract.

After detecting your backend, read the matching reference for concrete spawn/wait/message/cleanup examples:

Claude feature contract → ../shared/references/claude-code-latest-features.md
Claude Native Teams → ../shared/references/backend-claude-teams.md
Codex Sub-Agents / CLI → ../shared/references/backend-codex-subagents.md
Background Tasks → ../shared/references/backend-background-tasks.md
Inline (--quick) → ../shared/references/backend-inline.md

See also references/cli-spawning.md for council-specific spawning flow (phases, timeouts, output collection).

When to Use `--debate`

Use --debate for high-stakes or ambiguous reviews where judges are likely to disagree:

Security audits, architecture decisions, migration plans
Reviews where multiple valid perspectives exist
Cases where a missed finding has real consequences

Skip --debate for routine validation where consensus is expected. Debate adds R2 latency (judges stay alive and process a second round via backend messaging).

Incompatibilities:

--quick and --debate cannot be combined. --quick runs inline with no spawning; --debate requires multi-agent rounds. If both are passed, exit with error: "Error: --quick and --debate are incompatible."
--debate is only supported with validate mode. Brainstorm and research do not produce PASS/WARN/FAIL verdicts. If combined, exit with error: "Error: --debate is only supported with validate mode."
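Both incompatibilities can be enforced before any agent is spawned. A hypothetical Python sketch (the function name is illustrative; only the two error strings come from this document):

```python
def validate_flags(mode, quick=False, debate=False):
    """Hypothetical argument check run before anything is spawned.

    Raising SystemExit with a string prints it to stderr and exits with status 1.
    """
    if quick and debate:
        raise SystemExit("Error: --quick and --debate are incompatible.")
    if debate and mode != "validate":
        raise SystemExit("Error: --debate is only supported with validate mode.")


validate_flags("validate", debate=True)  # accepted
validate_flags("research", quick=True)   # accepted
```

Checking `--quick` first means a brainstorm call that passes both flags reports the `--quick`/`--debate` conflict, which is the more fundamental error.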

Task Types

Type	Trigger Words	Perspective Focus
validate	validate, check, review, assess, critique, feedback, improve	Is this correct? What's wrong? What could be better?
brainstorm	brainstorm, explore, options, approaches	What are the alternatives? Pros/cons?
research	research, investigate, deep dive, explore deeply, analyze, examine, evaluate, compare	What can we discover? What are the properties, trade-offs, and structure?

Natural language works — the skill infers task type from your prompt.
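One way to implement the inference is keyword matching on the trigger words above. This is a hypothetical sketch, not how the skill actually decides: it picks the earliest trigger in the prompt, prefers the longer phrase at the same position (so "explore deeply" beats "explore"), and falls back to validate when nothing matches, which is an assumption.

```python
import re

# Trigger words copied from the Task Types table.
TRIGGERS = {
    "validate": ["validate", "check", "review", "assess", "critique", "feedback", "improve"],
    "brainstorm": ["brainstorm", "explore", "options", "approaches"],
    "research": ["research", "investigate", "deep dive", "explore deeply",
                 "analyze", "examine", "evaluate", "compare"],
}


def infer_task_type(prompt, default="validate"):
    """Return the task type of the earliest trigger found in the prompt."""
    text = prompt.lower()
    best = None  # (position, -phrase_length, task_type); the smallest tuple wins
    for task_type, phrases in TRIGGERS.items():
        for phrase in phrases:
            # \b word boundaries keep "review" from matching inside "preview".
            match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
            if match:
                candidate = (match.start(), -len(phrase), task_type)
                if best is None or candidate < best:
                    best = candidate
    return best[2] if best else default


print(infer_task_type("explore deeply how retries work"))  # research
```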

First-pass rigor gate for plan/spec validation (MANDATORY)

When mode is validate and the target is a plan/spec/contract (or contains boundary rules, state transitions, or conformance tables), judges must apply this gate before returning PASS:

Canonical mutation + ack sequence is explicit, single-path, and non-contradictory.
Consume-at-most-once path is crash-safe with explicit atomic boundary and restart recovery semantics.
Status/precedence behavior is defined with a field-level truth table and anomaly reason codes for conflicting evidence.
Conformance includes explicit boundary failpoint tests and deterministic assertions for replay/no-duplicate-effect outcomes.

Verdict policy for this gate:

Missing or contradictory gate item: minimum WARN.
Missing deterministic conformance coverage for any gate item: minimum WARN.
Critical lifecycle invariant not mechanically verifiable: FAIL.
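The verdict policy behaves like a floor: the gate can only make a verdict stricter, never more lenient. A minimal sketch, assuming the judge has already counted gate problems in each category; `Verdict` and `apply_rigor_gate` are hypothetical names.

```python
from enum import IntEnum


class Verdict(IntEnum):
    PASS = 0
    WARN = 1
    FAIL = 2


def apply_rigor_gate(verdict, missing_or_contradictory=0,
                     missing_conformance=0, unverifiable_invariants=0):
    """Raise a judge's verdict to the floor the rigor gate demands; never lower it."""
    floor = Verdict.PASS
    if missing_or_contradictory or missing_conformance:
        floor = Verdict.WARN  # "minimum WARN"
    if unverifiable_invariants:
        floor = Verdict.FAIL  # critical lifecycle invariant not mechanically verifiable
    return max(verdict, floor)


print(apply_rigor_gate(Verdict.PASS, missing_conformance=1).name)  # WARN
```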

Architecture

Context Budget Rule (CRITICAL)

Judges write ALL analysis to output files. Messages to the lead contain ONLY a minimal completion signal: {"type":"verdict","verdict":"...","confidence":"...","file":"..."}. The lead reads output files during consolidation. This prevents N judges from exploding the lead's context window with N full reports via SendMessage.

Consolidation runs inline as the lead — no separate chairman agent. The lead reads each judge's output file sequentially with the Read tool and synthesizes.
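The rule boils down to a large file plus a small message. A hypothetical sketch of both sides: the judge writes its report and returns the minimal JSON signal, and the lead parses signals and reads the files one at a time. The file naming and function names are assumptions, and a real judge would send the signal through the runtime's messaging instead of returning it.

```python
import json
import tempfile
from pathlib import Path


def finish_judge(out_dir, judge, report, verdict, confidence):
    """Judge side: write the full analysis to disk and return only a tiny signal."""
    path = Path(out_dir) / f"{judge}.md"  # hypothetical naming; the diagram uses claude-*/codex-*
    path.write_text(report, encoding="utf-8")
    return json.dumps({"type": "verdict", "verdict": verdict,
                       "confidence": confidence, "file": str(path)})


def consolidate(signals):
    """Lead side: parse each signal, then read the referenced files one by one."""
    results = []
    for raw in signals:
        signal = json.loads(raw)
        if signal.get("type") != "verdict":
            raise ValueError(f"unexpected message type: {signal.get('type')!r}")
        report = Path(signal["file"]).read_text(encoding="utf-8")
        results.append((signal["verdict"], report))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        sig = finish_judge(out, "judge-1", "# Findings\n" + "detail\n" * 500, "WARN", "HIGH")
        # The signal stays small no matter how long the report is.
        print(len(sig), consolidate([sig])[0][0])
```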

Execution Flow

┌─────────────────────────────────────────────────────────────────┐
│  Phase 1: Build Packet (JSON)                                   │
│  - Task type (validate/brainstorm/research)                      │
│  - Target description                                           │
│  - Context (files, diffs, prior decisions)                      │
│  - Perspectives to assign                                       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 1a: Select spawn backend                                  │
│  codex_subagents | claude_teams | background_fallback            │
│  Team lead = spawner (this agent)                                │
└─────────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            ▼                                   ▼
┌───────────────────────┐           ┌───────────────────────┐
│  RUNTIME-NATIVE JUDGES│           │     CODEX AGENTS      │
│ (spawn_agent or teams)│           │  (Bash tool, parallel)│
│                       │           │  Agent 1 (independent │
│  Agent 1 (independent │           │    or with preset)    │
│    or with preset)    │           │  Agent 2              │
│  Agent 2              │           │  Agent 3              │
│  Agent 3 (--deep only)│           │  (--mixed only)       │
│  (--deep/--mixed only)│           │                       │
│                       │           │  Output: JSON + MD    │
│  Write files, then    │           │  Files: .agents/      │
│ wait()/SendMessage to │           │    council/codex-*    │
│ lead                  │           │                       │
│  Files: .agents/      │           └───────────────────────┘
│    council/claude-*   │                       │
└───────────────────────┘                       │
            │                                   │
            └─────────────────┬─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 2: Consolidation (Team Lead — inline, no extra agent)    │
│  - Receive MINIMAL completion signals (verdict + file path)     │
│  - Read each judge's output file with Read tool                 │
│  - If schema_version is missing from a judge's output, treat    │
│    as version 0 (backward compatibility)                        │
│  - Compute consensus verdict                                    │
│  - Identify shared findings                                     │
│  - Surface disagreements with attribution                       │
│  - Generate Markdown report for human                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 3: Cleanup                                               │
│  - Cleanup backend resources (close_agent / TeamDelete / none)  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Output: Markdown Council Report                                │
│  - Consensus: PASS/WARN/FAIL                                    │
│  - Shared findings                                              │
│  - Disagreements (if any)                                       │
│  - Recommendations                                              │
└─────────────────────────────────────────────────────────────────┘

Graceful Degradation

Failure	Behavior
1 of N agents times out	Proceed with N-1, note in report
All Codex CLI agents fail	Proceed with runtime-native judges only, note degradation
All agents fail	Return error, suggest retry
Codex CLI not installed	Skip Codex CLI judges, continue with runtime judges only (warn user)
No multi-agent capability	Fall back to `--quick` (inline single-agent review)
No agent messaging	`--debate` unavailable, single-round review only
Output dir missing	Create `.agents/council/` automatically

Timeout: 120s per agent (configurable via --timeout=N in seconds).

Minimum quorum: At least 1 agent must respond for a valid council. If 0 agents respond, return error.
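The timeout and total-failure rows of the degradation table, plus the quorum rule, map naturally onto asyncio. A minimal sketch, assuming each judge is an async callable that returns its verdict; real judges are runtime subagents rather than coroutines, and `run_council` is a hypothetical name.

```python
import asyncio


async def run_council(judges, timeout):
    """Run judges in parallel, drop the ones that time out or fail, enforce quorum.

    judges maps a judge name to a zero-argument coroutine function that returns
    the judge's verdict.
    """

    async def guarded(name, judge):
        try:
            return name, await asyncio.wait_for(judge(), timeout)
        except Exception as exc:  # timeouts and crashes both degrade gracefully
            return name, exc

    results = await asyncio.gather(*(guarded(n, j) for n, j in judges.items()))
    verdicts = {n: r for n, r in results if not isinstance(r, Exception)}
    notes = [f"{n}: {type(r).__name__}" for n, r in results if isinstance(r, Exception)]
    if not verdicts:  # minimum quorum: at least 1 agent must respond
        raise RuntimeError("all agents failed; suggest a retry")
    return verdicts, notes  # notes go into the report ("proceeded with N-1")
```

Collecting failures as notes rather than raising keeps the N-1 path and the all-failed path distinct, which matches the table: only an empty result set is an error.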

Pre-Flight Checks

Multi-agent capability: Detect whether runtime supports spawning parallel subagents. If not, degrade to --quick.
Agent messaging: Detect whether runtime supports agent-to-agent messaging. If not, disable --debate.
Codex CLI judges (--mixed only): Check which codex, test model availability, test --output-schema support. Downgrade mixed mode when unavailable.
Agent count: Verify judges * (1 + explorers) <= MAX_AGENTS (12)
Output dir: mkdir -p .agents/council
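The checks that do not depend on the agent runtime can be sketched with the Python standard library. The helper names are hypothetical; runtime capability detection, Codex model availability, and the --output-schema probe are runtime- and CLI-specific and are deliberately left out.

```python
import os
import shutil
import tempfile

MAX_AGENTS = 12


def total_agents(judges, explorers):
    return judges * (1 + explorers)


def preflight(judges, explorers, mixed=False, out_dir=".agents/council"):
    """Hypothetical subset of the pre-flight checks."""
    os.makedirs(out_dir, exist_ok=True)  # equivalent of: mkdir -p .agents/council
    return {
        "within_cap": total_agents(judges, explorers) <= MAX_AGENTS,
        # Equivalent of `which codex`; only relevant for --mixed.
        "codex_on_path": mixed and shutil.which("codex") is not None,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(preflight(3, 3, out_dir=os.path.join(tmp, ".agents", "council")))
```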

Quick Mode (`--quick`)

Single-agent inline validation. No subprocess spawning, no Task tool, no Codex. The current agent performs a structured self-review using the same output schema as a full council.

When to use: Routine checks, mid-implementation sanity checks, pre-commit quick scan.

Execution: Gather context (files, diffs) -> perform structured self-review inline using the council output_schema (verdict, confidence, findings, recommendation) -> write report to .agents/council/YYYY-MM-DD-quick-<target>.md labeled as Mode: quick (single-agent).

Limitations: No cross-perspective disagreement, no cross-vendor insights, lower confidence ceiling. Not suitable for security audits or architecture decisions.
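A sketch of building the report location and the mode label. The path shape and the `Mode: quick (single-agent)` label come from this document; the slug rule, the header layout, and the function names are assumptions.

```python
import re
from datetime import date
from pathlib import Path


def report_path(kind, target, day, root=".agents/council"):
    """Return .agents/council/YYYY-MM-DD-<kind>-<target>.md.

    kind is "quick" for --quick, otherwise the task type. The slug rule is assumed.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", target.lower()).strip("-") or "target"
    return Path(root) / f"{day.isoformat()}-{kind}-{slug}.md"


def quick_report_header(target):
    # Hypothetical header layout; the label line is the part the docs require.
    return f"# Council Report: {target}\n\nMode: quick (single-agent)\n"


print(report_path("quick", "recent", date(2026, 2, 6)))
```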

Packet Format (JSON)

The packet sent to each agent. File contents are included inline — agents receive the actual code/plan text in the packet, not just paths. This ensures both Claude and Codex agents can analyze without needing file access.

If .agents/ao/environment.json exists, include it in the context packet so judges can reason about available tools and environment state.

Judge prompt boundary:

Do NOT include .agents/ references in judge prompts.
Do NOT instruct judges to search .agents/ directories. Judges operate on the council packet only.

{
  "council_packet": {
    "version": "1.0",
    "mode": "validate | brainstorm | research",
    "target": "Implementation of user authentication system",
    "context": {
      "files": [
        {
          "path": "src/auth/jwt.py",
          "content": "<file contents inlined here>"
        },
        {
          "path": "src/auth/middleware.py",
          "content": "<file contents inlined here>"
        }
      ],
      "diff": "git diff output if applicable",
      "spec": {
        "source": "bead na-0042 | plan doc | none",
        "content": "The spec/bead description text (optional — included when wrapper provides it)"
      },
      "prior_decisions": [
        "Using JWT, not sessions",
        "Refresh tokens required"
      ],
      "empirical_results": "(optional) test output, CLI flag verification, or Wave 0 findings — include when evaluating feasibility"
    },
    "perspective": "skeptic (only when --preset or --perspectives used)",
    "perspective_description": "What could go wrong? (only when --preset or --perspectives used)",
    "output_schema": {
      "verdict": "PASS | WARN | FAIL",
      "confidence": "HIGH | MEDIUM | LOW",
      "key_insight": "Single sentence summary",
      "findings": [
        {
          "severity": "critical | significant | minor",
          "category": "security | architecture | performance | style",
          "description": "What was found",
          "location": "file:line if applicable",
          "recommendation": "How to address",
          "fix": "Specific action to resolve this finding",
          "why": "Root cause or rationale",
          "ref": "File path, spec anchor, or doc reference"
        }
      ],
      "recommendation": "Concrete next step",
      "schema_version": 2
    }
  }
}
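A minimal Python sketch of assembling this packet with file contents inlined. The `environment` key used for .agents/ao/environment.json is hypothetical (the document does not name one) and the output schema is abridged; only the contents of that file are inlined, never its path, to respect the judge prompt boundary.

```python
import json
from pathlib import Path

# Abridged copy of output_schema from the packet above.
OUTPUT_SCHEMA = {
    "verdict": "PASS | WARN | FAIL",
    "confidence": "HIGH | MEDIUM | LOW",
    "key_insight": "Single sentence summary",
    "findings": [],
    "recommendation": "Concrete next step",
    "schema_version": 2,
}


def build_packet(mode, target, paths, diff="", prior_decisions=(),
                 empirical_results=None,
                 env_file=Path(".agents/ao/environment.json")):
    """Inline file contents (not just paths) so any judge can analyze them."""
    files = [{"path": str(p), "content": Path(p).read_text(encoding="utf-8")}
             for p in paths]
    context = {"files": files, "diff": diff, "prior_decisions": list(prior_decisions)}
    if empirical_results is not None:
        context["empirical_results"] = empirical_results
    if env_file.exists():
        # Hypothetical key. Inline the contents only: judge prompts must not
        # mention .agents/ paths.
        context["environment"] = json.loads(env_file.read_text(encoding="utf-8"))
    return {"council_packet": {"version": "1.0", "mode": mode, "target": target,
                               "context": context, "output_schema": OUTPUT_SCHEMA}}
```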

Empirical Evidence Rule

When evaluating implementation feasibility (e.g., "will this CLI flag work?", "can these tools coexist?"), always include empirical test results in context.empirical_results. Judges reasoning from assumptions produce false verdicts — a Codex judge once gave a false FAIL on -s read-only because Wave 0 test output was not in the packet. The rule: run the experiment first, then let judges evaluate the evidence.

Wrapper skills (/vibe, /pre-mortem) should include relevant test output when the council target involves tooling behavior, flag combinations, or runtime compatibility.
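A sketch of capturing an experiment's outcome as text for `context.empirical_results`. The stand-in command runs the current Python interpreter so the example is self-contained; in practice you would pass the exact command whose behavior is under review. `run_experiment` and its output layout are hypothetical.

```python
import shlex
import subprocess
import sys


def run_experiment(cmd, timeout=60):
    """Run a command and format its outcome for context.empirical_results."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return (f"$ {shlex.join(cmd)}\n"
            f"exit={proc.returncode}\n"
            f"stdout:\n{proc.stdout}"
            f"stderr:\n{proc.stderr}")


# Stand-in experiment; replace it with the real command being evaluated.
print(run_experiment([sys.executable, "-c", "print('flag accepted')"]))
```

Recording the exit code alongside both streams lets a judge distinguish "the flag was rejected" from "the flag worked but printed a warning".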

Perspectives

Perspectives & Presets: Use Read tool on skills/council/references/personas.md for persona definitions, preset configurations, and custom perspective details.

Auto-Escalation: When --preset or --perspectives specifies more perspectives than the current judge count, automatically escalate judge count to match. The --count flag overrides auto-escalation.

Named Perspectives

Named perspectives assign each judge a specific viewpoint. Pass --perspectives="a,b,c" for free-form names, or --perspectives-file=<path> for YAML with focus descriptions:

/council --perspectives="security-auditor,performance-critic,simplicity-advocate" validate src/auth/
/council --perspectives-file=.agents/perspectives/api-review.yaml validate src/api/

YAML format for --perspectives-file:

perspectives:
  - name: security-auditor
    focus: Find security vulnerabilities and trust boundary violations
  - name: performance-critic
    focus: Identify performance bottlenecks and scaling risks

Flag priority: --perspectives/--perspectives-file override --preset perspectives. --count always overrides judge count. Without --count, judge count auto-escalates to match perspective count.
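The priority and auto-escalation rules can be sketched as a small resolver. Assumptions, not stated in the source: the function names are hypothetical, perspective lists are already parsed (YAML loading is out of scope), and when both `--perspectives` and `--perspectives-file` are given, `--perspectives` wins.

```python
from typing import Optional


def parse_perspectives_flag(value: str) -> list[str]:
    """Split a --perspectives="a,b,c" value into trimmed, non-empty names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def resolve_judges(
    base_count: int,
    preset: Optional[list[str]] = None,
    cli: Optional[list[str]] = None,
    from_file: Optional[list[str]] = None,
    count: Optional[int] = None,
) -> tuple[list[str], int]:
    """Return (perspectives, judge_count) following the flag-priority rules."""
    # --perspectives / --perspectives-file override --preset perspectives.
    # Assumption: if both CLI sources are given, --perspectives wins.
    perspectives = cli or from_file or preset or []
    if count is not None:
        return perspectives, count  # --count always overrides the judge count
    # Auto-escalation: grow (never shrink) the judge count to cover every perspective.
    return perspectives, max(base_count, len(perspectives))
```

For example, three named perspectives on a default two-judge council resolve to three judges, while adding `--count=2` keeps it at two.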

See references/personas.md for all built-in presets and their perspective definitions.

Explorer Sub-Agents

Explorer Details: Use Read tool on skills/council/references/explorers.md for explorer architecture, prompts, sub-question generation, and timeout configuration.

Summary: Judges can spawn explorer sub-agents (--explorers=N, max 5) for parallel deep-dive research. Total agents = judges * (1 + explorers), capped at MAX_AGENTS=12.
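The agent budget is `judges * (1 + explorers)`. The sketch below applies that formula and shrinks the judge count until the total fits under MAX_AGENTS=12. Assumptions: removing one judge per vendor at a time (to keep `--mixed` balanced) and rejecting `--explorers` above 5 are illustrative choices; `plan_agents` is a hypothetical name.

```python
MAX_AGENTS = 12
MAX_EXPLORERS = 5


def plan_agents(judges_per_vendor: int, vendors: int, explorers: int) -> tuple[int, int]:
    """Return (judges_per_vendor, total_agents) after auto-scaling to MAX_AGENTS."""
    if not 0 <= explorers <= MAX_EXPLORERS:
        raise ValueError(f"--explorers must be between 0 and {MAX_EXPLORERS}")
    per_judge = 1 + explorers  # the judge itself plus its explorer sub-agents
    while judges_per_vendor > 1 and vendors * judges_per_vendor * per_judge > MAX_AGENTS:
        judges_per_vendor -= 1  # drop one judge per vendor to keep vendors balanced
    return judges_per_vendor, vendors * judges_per_vendor * per_judge
```

With `--deep --explorers=3` (3 judges, 1 vendor) this gives 3 x 4 = 12 agents; with `--mixed --explorers=2` (3 judges x 2 vendors) the 18-agent plan shrinks to 2 judges per vendor, 12 agents total.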

Debate Phase (`--debate`)

Debate Protocol: Use Read tool on skills/council/references/debate-protocol.md for full debate execution flow, R1-to-R2 verdict injection, timeout handling, and cost analysis.

Summary: Two-round adversarial review. R1 produces independent verdicts. R2 sends other judges' verdicts via backend messaging (send_input or SendMessage) for steel-manning and revision. Only supported with validate mode.

Agent Prompts

Agent Prompts: Use Read tool on skills/council/references/agent-prompts.md for judge prompts (default and perspective-based), consolidation prompt, and debate R2 message template.

Consensus Rules

Condition	Verdict
All PASS	PASS
Any FAIL	FAIL
Mixed PASS/WARN	WARN
All WARN	WARN

Disagreement handling:

If Claude says PASS and Codex says FAIL → DISAGREE (surface both)
Severity-weighted: Security FAIL outweighs style WARN

DISAGREE resolution: When vendors disagree, the spawner presents both positions with reasoning and defers to the user. No automatic tie-breaking — cross-vendor disagreement is a signal worth human attention.
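The consensus table can be read as a small function. This is one possible interpretation, not the skill's code: verdicts are grouped by vendor, each vendor is reduced with the table, and DISAGREE is returned when one vendor's result is PASS while another's is FAIL. Severity weighting (security FAIL outweighing style WARN) is not modeled here.

```python
def combine(verdicts: list[str]) -> str:
    """Apply the consensus table to one list of PASS/WARN/FAIL verdicts."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    if "FAIL" in verdicts:
        return "FAIL"  # any FAIL
    if all(v == "PASS" for v in verdicts):
        return "PASS"  # all PASS
    return "WARN"  # mixed PASS/WARN or all WARN


def consensus(by_vendor: dict[str, list[str]]) -> str:
    """Cross-vendor consensus; DISAGREE when one vendor passes and another fails."""
    per_vendor = {vendor: combine(vs) for vendor, vs in by_vendor.items()}
    if {"PASS", "FAIL"} <= set(per_vendor.values()):
        return "DISAGREE"  # surface both positions and defer to the user
    return combine([v for vs in by_vendor.values() for v in vs])
```

No tie-breaking happens on DISAGREE; the caller is expected to present both positions to the user.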

Output Format

Report Templates: Use Read tool on skills/council/references/output-format.md for full report templates (validate, brainstorm, research) and debate report additions (verdict shifts, convergence detection).

All reports write to .agents/council/YYYY-MM-DD-<type>-<target>.md.
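A sketch of building that path. The slug rule (lowercase, non-alphanumerics collapsed to `-`, truncated to 40 characters) is a hypothetical choice; the source only fixes the `YYYY-MM-DD-<type>-<target>.md` shape.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

REPORT_DIR = Path(".agents/council")


def slugify(text: str, max_len: int = 40) -> str:
    """Lowercase, collapse non-alphanumerics to '-', and trim (hypothetical rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def report_path(report_type: str, target: str, day: Optional[date] = None) -> Path:
    """Build .agents/council/YYYY-MM-DD-<type>-<target>.md."""
    day = day or date.today()
    return REPORT_DIR / f"{day.isoformat()}-{report_type}-{slugify(target)}.md"
```

For instance, `report_path("validate", "the auth system", date(2026, 2, 6))` yields `.agents/council/2026-02-06-validate-the-auth-system.md`.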

Configuration

Partial Completion

Minimum quorum: 1 agent. Recommended: 80% of judges. On timeout, proceed with the remaining judges and note the shortfall in the report. On user cancellation, shut down all judges and generate a partial report with an INCOMPLETE marker.
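The quorum rules reduce to a couple of comparisons. A minimal sketch, assuming the 80% recommendation rounds up (the source does not say) and that the returned notes end up in the report header; `assess_completion` is a hypothetical name.

```python
MIN_QUORUM = 1  # at least one judge must finish


def assess_completion(spawned: int, completed: int, cancelled: bool = False) -> dict:
    """Decide whether a report can be written and which caveats it must carry."""
    recommended = (4 * spawned + 4) // 5  # ceil(0.8 * spawned) in integer arithmetic
    notes = []
    if cancelled:
        notes.append("INCOMPLETE: cancelled by user; judges shut down")
    elif completed < spawned:
        notes.append(f"{spawned - completed} of {spawned} judges timed out")
    if completed < recommended:
        notes.append(f"below recommended quorum ({completed} of {recommended})")
    return {"can_report": completed >= MIN_QUORUM, "notes": notes}
```

Integer arithmetic avoids floating-point surprises such as `0.8 * 3` evaluating to `2.4000000000000004`.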

Environment Variables

Variable	Default	Description
`COUNCIL_TIMEOUT`	120	Agent timeout in seconds
`COUNCIL_CODEX_MODEL`	gpt-5.3-codex	Override Codex model for --mixed. Set explicitly to pin Codex judge behavior; omit to use user's configured default.
`COUNCIL_CLAUDE_MODEL`	sonnet	Claude model for judges (sonnet default — use opus for high-stakes via `--profile=thorough`)
`COUNCIL_EXPLORER_MODEL`	sonnet	Model for explorer sub-agents
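A sketch of reading these variables with the documented defaults. Assumptions: an explicit `--timeout=N` flag takes precedence over `COUNCIL_TIMEOUT`, and `CouncilConfig`/`load_config` are hypothetical names; the `env` parameter exists only so the function can be tested without touching the real environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CouncilConfig:
    timeout_s: int
    codex_model: str
    claude_model: str
    explorer_model: str


def load_config(env: Optional[Mapping[str, str]] = None,
                timeout_flag: Optional[int] = None) -> CouncilConfig:
    """Read the COUNCIL_* variables, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    # Assumption: an explicit --timeout=N flag beats COUNCIL_TIMEOUT.
    timeout = timeout_flag if timeout_flag is not None else int(env.get("COUNCIL_TIMEOUT", "120"))
    return CouncilConfig(
        timeout_s=timeout,
        codex_model=env.get("COUNCIL_CODEX_MODEL", "gpt-5.3-codex"),
        claude_model=env.get("COUNCIL_CLAUDE_MODEL", "sonnet"),
        explorer_model=env.get("COUNCIL_EXPLORER_MODEL", "sonnet"),
    )
```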

Flags

Flag	Description
`--deep`	3 Claude agents instead of 2
`--mixed`	Add 3 Codex agents alongside 3 Claude agents (3+3, cross-vendor)
`--debate`	Enable adversarial debate round (2 rounds via backend messaging, same agents). Incompatible with `--quick`.
`--timeout=N`	Override timeout in seconds (default: 120)
`--perspectives="a,b,c"`	Custom perspective names (each name sets the judge's system prompt to adopt that viewpoint)
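The two hard flag conflicts (listed again under Troubleshooting) can be checked before anything is spawned. A minimal sketch; `check_flags` is a hypothetical name, and the error strings are copied from the Troubleshooting table.

```python
def check_flags(mode: str, quick: bool = False, debate: bool = False) -> None:
    """Raise ValueError for flag combinations the council rejects."""
    if quick and debate:
        raise ValueError("Error: --quick and --debate are incompatible")
    if debate and mode != "validate":
        # brainstorm/research produce no PASS/FAIL verdicts to debate
        raise ValueError("Error: --debate is only supported with validate mode")
```

Failing fast here avoids spawning judges for a run that cannot complete as requested.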

CLI Spawning Commands

CLI Spawning: Use Read tool on skills/council/references/cli-spawning.md for team setup, Claude/Codex agent spawning, parallel execution, debate R2 commands, cleanup, and model selection.

Examples

/council validate recent                                        # 2 judges, recent commits
/council --deep --preset=architecture research the auth system  # 3 judges with architecture personas
/council --mixed validate this plan                             # 3 Claude + 3 Codex
/council --deep --explorers=3 research upgrade patterns         # 12 agents (3 judges x 4)
/council --preset=security-audit --deep validate the API        # attacker, defender, compliance, web-security
/council --preset=doc-review validate README.md                  # 4 doc judges with named perspectives
/council brainstorm caching strategies for the API              # 2 judges explore options
/council --technique=scamper brainstorm API improvements               # structured SCAMPER brainstorm
/council --technique=six-hats brainstorm migration strategy            # parallel perspectives brainstorm
/council --profile=thorough validate the security architecture       # opus, 3 judges, 120s timeout
/council --profile=fast validate recent                               # haiku, 2 judges, 60s timeout
/council research Redis vs Memcached for session storage        # 2 judges assess trade-offs
/council validate the implementation plan in PLAN.md            # structured plan feedback
/council --preset=doc-review validate docs/ARCHITECTURE.md             # 4 doc review judges
/council --perspectives="security-auditor,perf-critic" validate src/   # named perspectives
/council --perspectives-file=.agents/perspectives/custom.yaml validate # perspectives from file

Fast Single-Agent Validation

User says: /council --quick validate recent

What happens:

Agent gathers context (recent diffs, files) inline without spawning
Agent performs structured self-review using council output schema
Report written to .agents/council/YYYY-MM-DD-quick-<target>.md labeled Mode: quick (single-agent)

Result: Fast sanity check for routine validation (no cross-perspective insights or debate).

Adversarial Debate Review

User says: /council --debate validate the auth system

What happens:

Agent spawns 2 judges (runtime-native backend) with independent perspectives
R1: Judges assess independently, write verdicts to .agents/council/
R2: Team lead sends other judges' verdicts via backend messaging
Judges revise positions based on cross-perspective evidence
Consolidation: Team lead computes consensus with convergence detection

Result: Two-round review with steel-manning and revision, useful for high-stakes decisions.

Cross-Vendor Consensus with Explorers

User says: /council --mixed --explorers=2 research Kubernetes upgrade strategies

What happens:

Agent plans 3 Claude judges + 3 Codex judges (6 total)
With 2 explorer sub-agents per judge, this would need 6 x 3 = 18 agents, exceeding MAX_AGENTS (12)
Agent auto-scales to 2 judges per vendor (4 judges x 3 agents each = 12 agents, at the limit)
Explorers perform parallel deep-dives, return sub-findings to judges
Judges consolidate explorer findings with own research

Result: Cross-vendor research with deep exploration, capped at 12 total agents.

Troubleshooting

Problem	Cause	Solution
"Error: --quick and --debate are incompatible"	Both flags passed together	Use `--quick` for fast inline check OR `--debate` for multi-round review, not both
"Error: --debate is only supported with validate mode"	Debate flag used with brainstorm/research	Remove `--debate` or switch to validate mode — brainstorming/research have no PASS/FAIL verdicts
Council spawns fewer agents than expected	judges x (1 + `--explorers`) exceeds MAX_AGENTS (12)	Agent auto-scales the judge count. Check the report header for the actual judge count. Reduce `--explorers` or use `--count` to set the judge count manually

Migration from judge

/council replaces the old judge skill. Migration:

Old	New
judge recent	`/council validate recent`
judge 2 opus	`/council recent` (default)
judge 3 opus	`/council --deep recent`

The judge skill is deprecated. Use /council.

Multi-Agent Architecture

Council uses whatever multi-agent primitives your runtime provides. Each judge is a parallel subagent that writes output to a file and sends a minimal completion signal to the lead.

Deliberation Protocol

The --debate flag implements the deliberation protocol pattern:

Independent assessment → evidence exchange → position revision → convergence analysis

R1: Spawn judges as parallel subagents. Each assesses independently, writes verdict to file, signals completion.
R2: Lead sends other judges' verdict summaries to each judge via agent messaging. Judges revise and write R2 files.
Consolidation: Lead reads all output files, computes consensus.
Cleanup: Shut down judges via runtime's cleanup mechanism.
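The R2 fan-out step can be sketched as building one message per judge that contains only the other judges' R1 summaries, which keeps communication lead-to-judge. The message wording here is a hypothetical placeholder (the real template lives in references/agent-prompts.md), and actually delivering it is left to the runtime's messaging primitive.

```python
def build_r2_messages(r1: dict[str, dict]) -> dict[str, str]:
    """Map each judge name to the R2 message the lead sends it (lead -> judge only)."""
    messages = {}
    for judge in r1:
        others = [
            f"- {name}: {v['verdict']} ({v['confidence']}): {v['key_insight']}"
            for name, v in r1.items()
            if name != judge  # a judge only sees the *other* judges' verdicts
        ]
        messages[judge] = (
            "Round 2. Other judges' R1 verdicts:\n"
            + "\n".join(others)
            + "\nSteel-man the strongest opposing position, then revise or "
            "reaffirm your verdict and write your R2 file."
        )
    return messages
```

Sending summaries (not full transcripts) keeps R2 messages small, while each judge's own R1 reasoning stays in its retained context.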

Communication Rules

Judges → lead only. Judges never message each other directly. This prevents anchoring.
Lead → judges. Only the lead sends follow-ups (for debate R2).
No shared task mutation by judges. Lead manages coordination state.

Ralph Wiggum Compliance

Council maintains fresh-context isolation (Ralph Wiggum pattern) with one documented exception:

--debate reuses judge context across R1 and R2. This is intentional. Judges persist within a single atomic council invocation — they do NOT persist across separate council calls. The rationale:

Judges benefit from their own R1 analytical context (reasoning chain, not just the verdict JSON) when evaluating other judges' positions in R2
Re-spawning with only the verdict summary (~200 tokens) would lose the judge's working memory of WHY they reached their verdict
The exception is bounded: max 2 rounds, within one invocation, with explicit cleanup

Without --debate, council is fully Ralph-compliant: each judge is a fresh spawn, executes once, writes output, and terminates.

Degradation

If no multi-agent capability is detected, council falls back to --quick (inline single-agent review). If agent messaging is unavailable, --debate degrades to single-round review with a note in the report.
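The degradation rules map detected capabilities to an effective plan plus report notes. A minimal sketch; `ExecutionPlan` and `degrade` are hypothetical names, and capability detection itself is runtime-specific and not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionPlan:
    quick: bool
    debate: bool
    notes: list[str] = field(default_factory=list)


def degrade(quick: bool, debate: bool, can_spawn: bool, can_message: bool) -> ExecutionPlan:
    """Downgrade the requested mode to what the detected runtime supports."""
    if quick:
        return ExecutionPlan(quick=True, debate=False)  # inline by request
    if not can_spawn:
        return ExecutionPlan(quick=True, debate=False,
                             notes=["no multi-agent capability detected; fell back to --quick"])
    if debate and not can_message:
        return ExecutionPlan(quick=False, debate=False,
                             notes=["agent messaging unavailable; --debate degraded to single-round review"])
    return ExecutionPlan(quick=False, debate=debate)
```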

Judge Naming

Convention: council-YYYYMMDD-<target> (e.g., council-20260206-auth-system).

Judge names: judge-{N} for independent judges (e.g., judge-1, judge-2), or judge-{perspective} when using presets/perspectives (e.g., judge-error-paths, judge-feasibility). Use the same logical names across both Codex and Claude backends.
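Both naming conventions are simple string templates. A sketch, assuming targets and perspective names are slugged by lowercasing and collapsing non-alphanumerics to `-` (a hypothetical rule the source does not specify).

```python
import re
from datetime import date
from typing import Optional


def _slug(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def council_name(target: str, day: date) -> str:
    """council-YYYYMMDD-<target>, e.g. council-20260206-auth-system."""
    return f"council-{day.strftime('%Y%m%d')}-{_slug(target)}"


def judge_names(count: int, perspectives: Optional[list[str]] = None) -> list[str]:
    """judge-{perspective} when perspectives are set, otherwise judge-1..judge-N."""
    if perspectives:
        return [f"judge-{_slug(p)}" for p in perspectives]
    return [f"judge-{n}" for n in range(1, count + 1)]
```

Because the names do not depend on the backend, the same function serves Codex and Claude judges alike.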

Reference Documents

Weekly Installs: 1.4K

Repository: boshu2/agentops

GitHub Stars: 197

First Seen: Feb 5, 2026

Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass

Installed on: opencode (1.2K), codex (1.2K), github-copilot (1.2K), gemini-cli (1.2K), kimi-cli (1.2K), amp (1.2K)

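Modes

Mode	Agents	Execution Backend	Use Case
`--quick`	0 (inline)	Self	Fast single-agent check, no spawning
Default	2	Runtime-native (Codex sub-agents preferred; Claude teams as fallback)	Independent judges (no perspective labels)
`--deep`	3	Runtime-native	Thorough review
`--mixed`	3+3	Runtime-native + Codex CLI	Cross-vendor consensus
`--debate`	2+	Runtime-native	Adversarial refinement (2 rounds)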