multi-agent-patterns by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill multi-agent-patterns
Multi-agent architectures distribute work across multiple language model instances, each with its own context window. When designed well, this distribution enables capabilities beyond single-agent limits. When designed poorly, it introduces coordination overhead that negates benefits. The critical insight is that sub-agents exist primarily to isolate context, not to anthropomorphize role division.
Activate this skill when:
Multi-agent systems address single-agent context limitations through distribution. Three dominant patterns exist: supervisor/orchestrator for centralized control, peer-to-peer/swarm for flexible handoffs, and hierarchical for layered abstraction. The critical design principle is context isolation—sub-agents exist primarily to partition context rather than to simulate organizational roles.
Effective multi-agent systems require explicit coordination protocols, consensus mechanisms that avoid sycophancy, and careful attention to failure modes including bottlenecks, divergence, and error propagation.
The Context Bottleneck
Single agents face inherent ceilings in reasoning capability, context management, and tool coordination. As tasks grow more complex, context windows fill with accumulated history, retrieved documents, and tool outputs. Performance degrades in predictable patterns: the lost-in-the-middle effect, attention scarcity, and context poisoning.
Multi-agent architectures address these limitations by partitioning work across multiple context windows. Each agent operates in a clean context focused on its subtask. Results aggregate at a coordination layer without any single context bearing the full burden.
The Token Economics Reality
Multi-agent systems consume significantly more tokens than single-agent approaches. Production data shows:
| Architecture | Token Multiplier | Use Case |
|---|---|---|
| Single agent chat | 1× baseline | Simple queries |
| Single agent with tools | ~4× baseline | Tool-using tasks |
| Multi-agent system | ~15× baseline | Complex research/coordination |
Research on the BrowseComp evaluation found that three factors explain 95% of performance variance: token usage (80% of variance), number of tool calls, and model choice. This validates the multi-agent approach of distributing work across agents with separate context windows to add capacity for parallel reasoning.
Critically, upgrading to a better model often yields larger performance gains than doubling the token budget. Upgrading to Claude Sonnet 4.5 produced larger gains than doubling tokens on earlier Sonnet versions, and GPT-5.2's thinking mode similarly outperforms raw token increases. This suggests model selection and multi-agent architecture are complementary strategies.
The Parallelization Argument
Many tasks contain parallelizable subtasks that a single agent must execute sequentially. A research task might require searching multiple independent sources, analyzing different documents, or comparing competing approaches. A single agent processes these sequentially, accumulating context with each step.
Multi-agent architectures assign each subtask to a dedicated agent with a fresh context. All agents work simultaneously, then return results to a coordinator. The total real-world time approaches the duration of the longest subtask rather than the sum of all subtasks.
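The fan-out/gather flow above can be sketched with asyncio; `run_subagent` is a stand-in for a real model call, with a short sleep simulating inference latency:

```python
import asyncio

async def run_subagent(name: str, subtask: str) -> str:
    # Stand-in for a real LLM call; the sleep simulates inference latency.
    await asyncio.sleep(0.01)
    return f"{name}: result for {subtask!r}"

async def coordinator(subtasks: list[str]) -> list[str]:
    # Fan out: each subtask gets a dedicated agent with a fresh context.
    agents = [run_subagent(f"agent-{i}", t) for i, t in enumerate(subtasks)]
    # Gather: wall-clock time approaches the slowest subtask, not the sum.
    return await asyncio.gather(*agents)

results = asyncio.run(
    coordinator(["search sources", "analyze docs", "compare methods"])
)
```

Because the three agents run concurrently, total wall-clock time is close to one subtask's latency rather than three times that.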
The Specialization Argument
Different tasks benefit from different agent configurations: different system prompts, different tool sets, different context structures. A general-purpose agent must carry all possible configurations in context. Specialized agents carry only what they need.
Multi-agent architectures enable specialization without combinatorial explosion. The coordinator routes to specialized agents; each agent operates with lean context optimized for its domain.
Pattern 1: Supervisor/Orchestrator
The supervisor pattern places a central agent in control, delegating to specialists and synthesizing results. The supervisor maintains global state and trajectory, decomposes user objectives into subtasks, and routes to appropriate workers.
User Query -> Supervisor -> [Specialist, Specialist, Specialist] -> Aggregation -> Final Output
When to use: Complex tasks with clear decomposition, tasks requiring coordination across domains, tasks where human oversight is important.
Advantages: Strict control over workflow, easier to implement human-in-the-loop interventions, ensures adherence to predefined plans.
Disadvantages: Supervisor context becomes bottleneck, supervisor failures cascade to all workers, "telephone game" problem where supervisors paraphrase sub-agent responses incorrectly.
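A minimal supervisor loop might look like the following sketch, where `researcher` and `writer` are hypothetical stand-ins for real specialist agents:

```python
def researcher(task: str) -> str:
    # Stand-in for a research specialist running in its own context.
    return f"findings for {task}"

def writer(task: str) -> str:
    # Stand-in for a writing specialist.
    return f"draft for {task}"

SPECIALISTS = {"research": researcher, "write": writer}

def supervisor(query: str, plan: list[tuple[str, str]]) -> str:
    # query would normally drive decomposition; the plan is precomputed here.
    # plan: (specialist_name, subtask) pairs from the decomposition step.
    results = []
    for name, subtask in plan:
        worker = SPECIALISTS[name]       # route to the appropriate worker
        results.append(worker(subtask))  # worker sees only its own subtask
    # Aggregation: synthesize worker outputs into a single answer.
    return " | ".join(results)

answer = supervisor(
    "report on X", [("research", "X sources"), ("write", "X summary")]
)
```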
The Telephone Game Problem and Solution
LangGraph benchmarks found supervisor architectures initially performed 50% worse than optimized versions due to the "telephone game" problem, where supervisors paraphrase sub-agent responses incorrectly, losing fidelity.
The fix: implement a forward_message tool allowing sub-agents to pass responses directly to users:
def forward_message(message: str, to_user: bool = True):
    """
    Forward sub-agent response directly to user without supervisor synthesis.
    Use when:
    - Sub-agent response is final and complete
    - Supervisor synthesis would lose important details
    - Response format must be preserved exactly
    """
    if to_user:
        return {"type": "direct_response", "content": message}
    return {"type": "supervisor_input", "content": message}
With this pattern, swarm architectures slightly outperform supervisors because sub-agents respond directly to users, eliminating translation errors.
Implementation note: Implement direct pass-through mechanisms allowing sub-agents to pass responses directly to users rather than through supervisor synthesis when appropriate.
Pattern 2: Peer-to-Peer/Swarm
The peer-to-peer pattern removes central control, allowing agents to communicate directly based on predefined protocols. Any agent can transfer control to any other through explicit handoff mechanisms.
def transfer_to_agent_b():
    return agent_b  # Handoff via function return

agent_a = Agent(
    name="Agent A",
    functions=[transfer_to_agent_b],
)
When to use: Tasks requiring flexible exploration, tasks where rigid planning is counterproductive, tasks with emergent requirements that defy upfront decomposition.
Advantages: No single point of failure, scales effectively for breadth-first exploration, enables emergent problem-solving behaviors.
Disadvantages: Coordination complexity increases with agent count, risk of divergence without central state keeper, requires robust convergence constraints.
Implementation note: Define explicit handoff protocols with state passing. Ensure agents can communicate their context needs to receiving agents.
Pattern 3: Hierarchical
Hierarchical structures organize agents into layers of abstraction: strategic, planning, and execution layers. Strategy layer agents define goals and constraints; planning layer agents break goals into actionable plans; execution layer agents perform atomic tasks.
Strategy Layer (Goal Definition) -> Planning Layer (Task Decomposition) -> Execution Layer (Atomic Tasks)
When to use: Large-scale projects with clear hierarchical structure, enterprise workflows with management layers, tasks requiring both high-level planning and detailed execution.
Advantages: Mirrors organizational structures, clear separation of concerns, enables different context structures at different levels.
Disadvantages: Coordination overhead between layers, potential for misalignment between strategy and execution, complex error propagation.
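The three layers can be sketched as plain functions; all names and return shapes here are illustrative, not a framework API:

```python
def strategy_layer(objective: str) -> dict:
    # Strategy: define the goal and its constraints.
    return {"goal": objective, "constraints": ["stay on topic"]}

def planning_layer(goal: dict) -> list[str]:
    # Planning: decompose the goal into atomic, executable tasks.
    return [f"step {i} of {goal['goal']}" for i in (1, 2)]

def execution_layer(task: str) -> str:
    # Execution: perform one atomic task in a clean context.
    return f"done: {task}"

goal = strategy_layer("ship feature")
plan = planning_layer(goal)
outcomes = [execution_layer(t) for t in plan]
```

Each layer only sees the artifact the layer above produced, which is the context-isolation point made below.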
The primary purpose of multi-agent architectures is context isolation. Each sub-agent operates in a clean context window focused on its subtask without carrying accumulated context from other subtasks.
Isolation Mechanisms
Full context delegation: For complex tasks where the sub-agent needs complete understanding, the planner shares its entire context. The sub-agent has its own tools and instructions but receives full context for its decisions.
Instruction passing: For simple, well-defined subtasks, the planner creates instructions via function call. The sub-agent receives only the instructions needed for its specific task.
File system memory: For complex tasks requiring shared state, agents read and write to persistent storage. The file system serves as the coordination mechanism, avoiding context bloat from shared state passing.
Isolation Trade-offs
Full context delegation provides maximum capability but defeats the purpose of sub-agents. Instruction passing maintains isolation but limits sub-agent flexibility. File system memory enables shared state without context passing but introduces latency and consistency challenges.
The right choice depends on task complexity, coordination needs, and acceptable latency.
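A minimal sketch of the file-system memory mechanism, assuming a one-JSON-file-per-agent layout (the layout and field names are assumptions for illustration, not a standard):

```python
import json
import tempfile
from pathlib import Path

def write_result(workdir: Path, agent: str, payload: dict) -> None:
    # Each agent persists only its distilled result, not its transcript.
    (workdir / f"{agent}.json").write_text(json.dumps(payload))

def read_results(workdir: Path) -> dict:
    # The coordinator reads the distilled artifacts from disk on demand,
    # so shared state never inflates any agent's context window.
    return {p.stem: json.loads(p.read_text()) for p in workdir.glob("*.json")}

with tempfile.TemporaryDirectory() as d:
    workdir = Path(d)
    write_result(workdir, "researcher", {"summary": "three relevant sources"})
    write_result(workdir, "analyzer", {"summary": "trend is upward"})
    shared = read_results(workdir)
```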
The Voting Problem
Simple majority voting treats hallucinations from weak models as equal to reasoning from strong models. Without intervention, multi-agent discussions devolve into consensus on false premises due to inherent bias toward agreement.
Weighted Voting
Weight agent votes by confidence or expertise. Agents with higher confidence or domain expertise carry more weight in final decisions.
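A minimal weighted-vote sketch; how weights are derived (self-reported confidence, calibration history, domain tags) is left to the caller:

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, float]]) -> str:
    # votes: (answer, weight) pairs; weight encodes confidence or expertise.
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    # Return the answer with the highest total weight.
    return max(totals, key=totals.get)

# One strong, confident agent (0.9) outweighs two weak agreeing agents (0.4 each
# only sum to 0.8), which plain majority voting would get backwards.
winner = weighted_vote([("A", 0.9), ("B", 0.4), ("B", 0.4)])
```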
Debate Protocols
Debate protocols require agents to critique each other's outputs over multiple rounds. Adversarial critique often yields higher accuracy on complex reasoning than collaborative consensus.
Trigger-Based Intervention
Monitor multi-agent interactions for specific behavioral markers. Stall triggers activate when discussions make no progress. Sycophancy triggers detect when agents mimic each other's answers without unique reasoning.
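Both triggers can be approximated cheaply. The sketch below uses plain string similarity as a stand-in for a real semantic comparison (a production system might compare embeddings instead):

```python
import difflib

def sycophancy_trigger(answers: list[str], threshold: float = 0.9) -> bool:
    # Fire when any two agents' answers are near-duplicates of each other.
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            ratio = difflib.SequenceMatcher(None, answers[i], answers[j]).ratio()
            if ratio >= threshold:
                return True
    return False

def stall_trigger(history: list[str], window: int = 3) -> bool:
    # Fire when the last `window` turns add no new content.
    recent = history[-window:]
    return len(recent) == window and len(set(recent)) == 1
```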
Different frameworks implement these patterns with different philosophies. LangGraph uses graph-based state machines with explicit nodes and edges. AutoGen uses conversational/event-driven patterns with GroupChat. CrewAI uses role-based process flows with hierarchical crew structures.
Failure: Supervisor Bottleneck
The supervisor accumulates context from all workers, becoming susceptible to saturation and degradation.
Mitigation: Implement output schema constraints so workers return only distilled summaries. Use checkpointing to persist supervisor state without carrying full history.
Failure: Coordination Overhead
Agent communication consumes tokens and introduces latency. Complex coordination can negate parallelization benefits.
Mitigation: Minimize communication through clear handoff protocols. Batch results where possible. Use asynchronous communication patterns.
Failure: Divergence
Agents pursuing different goals without central coordination can drift from intended objectives.
Mitigation: Define clear objective boundaries for each agent. Implement convergence checks that verify progress toward shared goals. Use time-to-live limits on agent execution.
Failure: Error Propagation
Errors in one agent's output propagate to downstream agents that consume that output.
Mitigation: Validate agent outputs before passing to consumers. Implement retry logic with circuit breakers. Use idempotent operations where possible.
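The validate-then-retry mitigation can be sketched as follows; `flaky_agent` and the empty-output failure mode are illustrative stand-ins:

```python
class CircuitOpen(Exception):
    """Raised when an agent keeps producing invalid output."""

def call_with_validation(agent_fn, task, validate, max_retries=3):
    # Only output that passes `validate` ever reaches downstream agents.
    for _ in range(max_retries):
        output = agent_fn(task)
        if validate(output):
            return output
    # Open the circuit: stop hammering a persistently failing agent.
    raise CircuitOpen(f"no valid output after {max_retries} attempts")

# Simulated agent that fails twice, then succeeds.
attempts = iter(["", "", "non-empty result"])

def flaky_agent(task):
    return next(attempts)

result = call_with_validation(flaky_agent, "summarize findings", validate=bool)
```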
Example 1: Research Team Architecture
Supervisor
├── Researcher (web search, document retrieval)
├── Analyzer (data analysis, statistics)
├── Fact-checker (verification, validation)
└── Writer (report generation, formatting)
Example 2: Handoff Protocol
def handle_customer_request(request):
    if request.type == "billing":
        return transfer_to(billing_agent)
    elif request.type == "technical":
        return transfer_to(technical_agent)
    elif request.type == "sales":
        return transfer_to(sales_agent)
    else:
        return handle_general(request)
This skill builds on context-fundamentals and context-degradation.
Created: 2025-12-20 | Last Updated: 2025-12-20 | Author: Agent Skills for Context Engineering Contributors | Version: 1.0.0