sadd:multi-agent-patterns by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:multi-agent-patterns
Multi-agent architectures distribute work across multiple agent invocations, each with its own focused context. When designed well, this distribution enables capabilities beyond single-agent limits. When designed poorly, it introduces coordination overhead that negates benefits. The critical insight is that sub-agents exist primarily to isolate context, not to anthropomorphize role division.
Multi-agent systems address single-agent context limitations through distribution. Three dominant patterns exist: supervisor/orchestrator for centralized control, peer-to-peer/swarm for flexible handoffs, and hierarchical for layered abstraction. The critical design principle is context isolation—sub-agents exist primarily to partition context rather than to simulate organizational roles.
Effective multi-agent systems require explicit coordination protocols, consensus mechanisms that avoid sycophancy, and careful attention to failure modes including bottlenecks, divergence, and error propagation.
Single agents face inherent ceilings in reasoning capability, context management, and tool coordination. As tasks grow more complex, context windows fill with accumulated history, retrieved documents, and tool outputs. Performance degrades according to predictable patterns: the lost-in-middle effect, attention scarcity, and context poisoning.
Multi-agent architectures address these limitations by partitioning work across multiple context windows. Each agent operates in a clean context focused on its subtask. Results aggregate at a coordination layer without any single context bearing the full burden.
Many tasks contain parallelizable subtasks that a single agent must execute sequentially. A research task might require searching multiple independent sources, analyzing different documents, or comparing competing approaches. A single agent processes these sequentially, accumulating context with each step.
Multi-agent architectures assign each subtask to a dedicated agent with a fresh context. All agents work simultaneously, then return results to a coordinator. The total real-world time approaches the duration of the longest subtask rather than the sum of all subtasks.
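This timing property can be sketched with `asyncio`. Here `run_agent` is a stand-in for a real sub-agent call and merely sleeps for its subtask's duration:

```python
import asyncio
import time

# Hypothetical sub-agent call: a real system would invoke an LLM with a
# focused prompt; here each subtask just sleeps for its duration.
async def run_agent(name: str, duration: float) -> str:
    await asyncio.sleep(duration)
    return f"{name}: done"

async def coordinator(subtasks: dict[str, float]) -> list[str]:
    # Fan out: dispatch every subtask concurrently and gather results.
    return await asyncio.gather(
        *(run_agent(name, d) for name, d in subtasks.items())
    )

start = time.perf_counter()
results = asyncio.run(coordinator({"search": 0.3, "analyze": 0.2, "compare": 0.1}))
elapsed = time.perf_counter() - start
# Wall-clock time tracks the longest subtask (~0.3s), not the sum (~0.6s).
print(results, round(elapsed, 1))
```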
Different tasks benefit from different agent configurations: different system prompts, different tool sets, different context structures. A general-purpose agent must carry all possible configurations in context. Specialized agents carry only what they need.
Multi-agent architectures enable specialization without combinatorial explosion. The coordinator routes to specialized agents; each agent operates with lean context optimized for its domain.
The supervisor pattern places a central agent in control, delegating to specialists and synthesizing results. The supervisor maintains global state and trajectory, decomposes user objectives into subtasks, and routes to appropriate workers.
User Request -> Supervisor -> [Specialist A, Specialist B, Specialist C] -> Aggregation -> Final Output
When to use: Complex tasks with clear decomposition, tasks requiring coordination across domains, tasks where human oversight is important.
Advantages: Strict control over workflow, easier to implement human-in-the-loop interventions, ensures adherence to predefined plans.
Disadvantages: Supervisor context becomes bottleneck, supervisor failures cascade to all workers, "telephone game" problem where supervisors paraphrase sub-agent responses incorrectly.
Claude Code Implementation: Create a main command that orchestrates by calling specialized subagents using the Task tool. The supervisor command contains the coordination logic and calls subagents for specialized work.
<!-- Example supervisor command structure -->
1. Analyze the user request and decompose into subtasks
2. For each subtask, dispatch to appropriate specialist:
- Use Task tool to spawn subagent with focused context
- Pass only relevant context to each subagent
3. Collect and synthesize results from all subagents
4. Return unified response to user
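The steps above can be sketched in a few lines. `call_subagent` is a hypothetical placeholder for whatever spawns an isolated sub-agent (in Claude Code, the Task tool plays this role):

```python
# `call_subagent` is a placeholder: a real implementation would spawn a
# sub-agent whose entire (isolated) context is this prompt.
def call_subagent(prompt: str) -> str:
    return f"result for: {prompt}"

def supervisor(request: str) -> str:
    # 1. Decompose the request into focused subtasks (hard-coded aspects here).
    subtasks = [f"{request} -- {aspect}" for aspect in ("research", "draft", "review")]
    # 2. Dispatch each subtask, passing only the context it needs.
    results = [call_subagent(task) for task in subtasks]
    # 3. Synthesize; worker outputs are kept verbatim to avoid the telephone game.
    return "\n".join(results)

print(supervisor("summarize the incident report"))
```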
The Telephone Game Problem: Supervisor architectures can perform worse when supervisors paraphrase sub-agent responses incorrectly, losing fidelity. The fix: allow sub-agents to pass responses directly when synthesis would lose important details. In Claude Code, this means letting subagents write directly to shared files or return their output verbatim rather than having the supervisor rewrite everything.
The peer-to-peer pattern removes central control, allowing agents to communicate directly based on predefined protocols. Any agent can transfer control to any other through explicit handoff mechanisms.
When to use: Tasks requiring flexible exploration, tasks where rigid planning is counterproductive, tasks with emergent requirements that defy upfront decomposition.
Advantages: No single point of failure, scales effectively for breadth-first exploration, enables emergent problem-solving behaviors.
Disadvantages: Coordination complexity increases with agent count, risk of divergence without central state keeper, requires robust convergence constraints.
Claude Code Implementation: Create commands that can invoke other commands based on discovered needs. Use shared files (like task lists or state files) as the coordination mechanism.
<!-- Example peer handoff structure -->
1. Analyze current state from shared context file
2. Determine if this agent can complete the task
3. If specialized help needed:
- Write current findings to shared state
- Invoke appropriate peer command/skill
4. Continue until task complete or hand off
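A minimal sketch of this handoff, assuming a shared `state.json` file as the coordination mechanism and plain functions standing in for peer agents:

```python
import json
from pathlib import Path

STATE = Path("state.json")  # hypothetical shared coordination file
STATE.unlink(missing_ok=True)  # start from a clean state for this demo

def read_state() -> dict:
    if STATE.exists():
        return json.loads(STATE.read_text())
    return {"findings": [], "next": None}

def agent_a(task: str) -> None:
    state = read_state()
    state["findings"].append(f"A analyzed: {task}")
    state["next"] = "agent_b"  # A cannot finish alone, so it records a handoff
    STATE.write_text(json.dumps(state))

def agent_b() -> None:
    state = read_state()
    state["findings"].append("B completed the task")
    state["next"] = None
    STATE.write_text(json.dumps(state))

agent_a("parse logs")
if read_state()["next"] == "agent_b":  # explicit handoff, no central supervisor
    agent_b()
print(read_state()["findings"])
```

Because all coordination happens through the file, either agent can be re-run or replaced without the other knowing.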
Hierarchical structures organize agents into layers of abstraction: strategic, planning, and execution layers. Strategy layer agents define goals and constraints; planning layer agents break goals into actionable plans; execution layer agents perform atomic tasks.
Strategy Layer (Goal Definition) -> Planning Layer (Task Decomposition) -> Execution Layer (Atomic Tasks)
When to use: Large-scale projects with clear hierarchical structure, enterprise workflows with management layers, tasks requiring both high-level planning and detailed execution.
Advantages: Mirrors organizational structures, clear separation of concerns, enables different context structures at different levels.
Disadvantages: Coordination overhead between layers, potential for misalignment between strategy and execution, complex error propagation.
Claude Code Implementation: Structure your plugin with commands at different abstraction levels. High-level commands focus on strategy and call mid-level planning commands, which in turn call atomic execution commands.
The primary purpose of multi-agent architectures is context isolation. Each sub-agent operates in a clean context window focused on its subtask without carrying accumulated context from other subtasks.
Instruction passing: For simple, well-defined subtasks, the coordinator creates focused instructions. The sub-agent receives only the instructions needed for its specific task. In Claude Code, this means passing minimal, targeted prompts to subagents via the Task tool.
File system memory: For complex tasks requiring shared state, agents read and write to persistent storage. The file system serves as the coordination mechanism, avoiding context bloat from shared state passing. This is the most natural pattern for Claude Code—agents communicate through markdown files, JSON state files, or structured documents.
Full context delegation: For complex tasks where the sub-agent needs complete understanding, the coordinator shares its entire context. The sub-agent has its own tools and instructions but receives full context for its decisions. Use sparingly as it defeats the purpose of context isolation.
Full context delegation provides maximum capability but defeats the purpose of sub-agents. Instruction passing maintains isolation but limits sub-agent flexibility. File system memory enables shared state without context passing but introduces consistency challenges.
The right choice depends on task complexity, coordination needs, and the nature of the work.
Simple majority voting weighs an answer produced by hallucination or weak reasoning the same as one produced by sound reasoning. Without intervention, multi-agent discussions can converge on false premises because models have an inherent bias toward agreement.
Weight agent contributions by confidence or expertise. Agents with higher confidence or domain expertise carry more weight in final decisions.
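A minimal sketch of confidence-weighted voting; the confidences here are assumed to be self-reported by agents, and a production system would likely calibrate them:

```python
from collections import defaultdict

def weighted_vote(answers: list[tuple[str, float]]) -> str:
    # Sum each agent's self-reported confidence per candidate answer.
    scores: dict[str, float] = defaultdict(float)
    for answer, confidence in answers:
        scores[answer] += confidence
    return max(scores, key=scores.get)  # highest total weight wins

votes = [("Paris", 0.9), ("Lyon", 0.4), ("Paris", 0.7), ("Lyon", 0.6)]
print(weighted_vote(votes))  # "Paris": total weight 1.6 vs 1.0, despite a 2-2 split
```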
Debate protocols require agents to critique each other's outputs over multiple rounds. Adversarial critique often yields higher accuracy on complex reasoning than collaborative consensus.
Claude Code Implementation: Create a review stage where one agent critiques another's output. Structure this as separate commands: one for initial work, one for critique, and optionally one for revision based on critique.
Monitor multi-agent interactions for specific behavioral markers:
The supervisor accumulates context from all workers, becoming susceptible to saturation and degradation.
Mitigation: Implement output constraints so workers return only distilled summaries. Use file-based checkpointing to persist state without carrying full history in context.
Agent communication consumes tokens and introduces latency. Complex coordination can negate parallelization benefits.
Mitigation: Minimize communication through clear handoff protocols. Use structured file formats for inter-agent communication. Batch results where possible.
Agents pursuing different goals without central coordination can drift from intended objectives.
Mitigation: Define clear objective boundaries for each agent. Implement convergence checks that verify progress toward shared goals. Use iteration limits on agent execution.
Errors in one agent's output propagate to downstream agents that consume that output.
Mitigation: Validate agent outputs before passing to consumers. Implement retry logic. Design for graceful degradation when components fail.
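These mitigations can be combined into a small guard. `validate` is a stand-in for a domain-specific check, and `agent` is any callable worker (here, a lambda cycling through canned outputs):

```python
# Validate-then-retry guard for inter-agent handoffs.
def validate(output: str) -> bool:
    # Hypothetical check: non-empty and not flagged as an error.
    return bool(output.strip()) and "ERROR" not in output

def call_with_retry(agent, task: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        output = agent(task)
        if validate(output):
            return output  # only validated output reaches downstream agents
    # Graceful degradation: surface the failure instead of propagating bad output.
    return f"FAILED after {max_attempts} attempts: {task}"

flaky_calls = iter(["ERROR: timeout", "", "analysis complete"])
result = call_with_retry(lambda task: next(flaky_calls), "review module")
print(result)  # the third attempt passes validation
```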
To build a supervisor-based review plugin, create a main command that orchestrates the review, define subagents for specialized domains, and use the file system for inter-agent coordination:
Supervisor Command: review-code
├── Subagent: security-review (security specialist)
├── Subagent: performance-review (performance specialist)
├── Subagent: style-review (style/conventions specialist)
└── Aggregation: combine findings, deduplicate, prioritize
Each subagent receives only the code to review and their specialty focus. The supervisor aggregates all findings into a unified review.
For tasks spanning multiple sessions or requiring persistent state, use file-based memory:
- Working memory: the context window itself. Provides immediate access but vanishes when sessions end. Keep only active information; summarize completed work.
- Session files: created during a session to track progress.
- Persistent files: survive across sessions and carry state between them.
Choose the simplest memory mechanism that meets your needs. File-based memory is transparent, debuggable, and requires no infrastructure.
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context.
Simple vector stores lack relationship and temporal structure. Knowledge graphs preserve relationships for reasoning. Temporal knowledge graphs add validity periods for time-aware queries. Implementation choices depend on query complexity, infrastructure constraints, and accuracy requirements.
The Context-Memory Spectrum

Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context. Effective architectures use multiple layers along this spectrum.
The spectrum includes working memory (context window, zero latency, volatile), short-term memory (session-persistent, searchable, volatile), long-term memory (cross-session persistent, structured, semi-permanent), and permanent memory (archival, queryable, permanent). Each layer has different latency, capacity, and persistence characteristics.
Why Simple Vector Stores Fall Short

Vector RAG provides semantic retrieval by embedding queries and documents in a shared embedding space. Similarity search retrieves the most semantically similar documents. This works well for document retrieval but lacks structure for agent memory.
Vector stores lose relationship information. If an agent learns that "Customer X purchased Product Y on Date Z," a vector store can retrieve this fact if asked directly. But it cannot answer "What products did customers who purchased Product Y also buy?" because relationship structure is not preserved.
Vector stores also struggle with temporal validity. Facts change over time, but vector stores provide no mechanism to distinguish "current fact" from "outdated fact" except through explicit metadata and filtering.
The Move to Graph-Based Memory

Knowledge graphs preserve relationships between entities. Instead of isolated document chunks, graphs encode that Entity A has Relationship R to Entity B. This enables queries that traverse relationships rather than just similarity.
Temporal knowledge graphs add validity periods to facts. Each fact has a "valid from" and optionally "valid until" timestamp. This enables time-travel queries that reconstruct knowledge at specific points in time.
Benchmark Performance Comparison

The Deep Memory Retrieval (DMR) benchmark provides concrete performance data across memory architectures:
| Memory System | DMR Accuracy | Retrieval Latency | Notes |
|---|---|---|---|
| Zep (Temporal KG) | 94.8% | 2.58s | Best accuracy, fast retrieval |
| MemGPT | 93.4% | Variable | Good general performance |
| GraphRAG | ~75-85% | Variable | 20-35% gains over baseline RAG |
| Vector RAG | ~60-70% | Fast | Loses relationship structure |
| Recursive Summarization | 35.3% | Low | Severe information loss |
Zep demonstrated a 90% reduction in retrieval latency compared to the full-context baseline (2.58s vs 28.9s). This efficiency comes from retrieving only relevant subgraphs rather than the entire context history.
GraphRAG achieves approximately 20-35% accuracy gains over baseline RAG in complex reasoning tasks and reduces hallucination by up to 30% through community-based summarization.
Layer 1: Working Memory

Working memory is the context window itself. It provides immediate access to information currently being processed but has limited capacity and vanishes when sessions end.
Working memory usage patterns include scratchpad calculations where agents track intermediate results, conversation history that preserves dialogue for current task, current task state that tracks progress on active objectives, and active retrieved documents that hold information currently being used.
Optimize working memory by keeping only active information, summarizing completed work before it falls out of attention, and using attention-favored positions for critical information.
Layer 2: Short-Term Memory

Short-term memory persists across the current session but not across sessions. It provides search and retrieval capabilities without the latency of permanent storage.
Common implementations include session-scoped databases that persist until session end, file-system storage in designated session directories, and in-memory caches keyed by session ID.
Short-term memory use cases include tracking conversation state across turns without stuffing context, storing intermediate results from tool calls that may be needed later, maintaining task checklists and progress tracking, and caching retrieved information within sessions.
Layer 3: Long-Term Memory

Long-term memory persists across sessions indefinitely. It enables agents to learn from past interactions and build knowledge over time.
Long-term memory implementations range from simple key-value stores to sophisticated graph databases. The choice depends on complexity of relationships to model, query patterns required, and acceptable infrastructure complexity.
Long-term memory use cases include learning user preferences across sessions, building domain knowledge bases that grow over time, maintaining entity registries with relationship history, and storing successful patterns that can be reused.
Layer 4: Entity Memory

Entity memory specifically tracks information about entities (people, places, concepts, objects) to maintain consistency. This creates a rudimentary knowledge graph where entities are recognized across multiple interactions.
Entity memory maintains entity identity by tracking that "John Doe" mentioned in one conversation is the same person in another. It maintains entity properties by storing facts discovered about entities over time. It maintains entity relationships by tracking relationships between entities as they are discovered.
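A toy sketch of such a registry, keyed by an assumed canonical entity ID, with properties and relationships accumulating across separate interactions:

```python
# Registry keyed by a canonical entity ID (the ID scheme is illustrative).
entities: dict[str, dict] = {}

def remember(entity: str, prop: str, value: str) -> None:
    record = entities.setdefault(entity, {"props": {}, "relations": []})
    record["props"][prop] = value

def relate(subject: str, relation: str, obj: str) -> None:
    record = entities.setdefault(subject, {"props": {}, "relations": []})
    record["relations"].append((relation, obj))

# Two separate "conversations" updating the same entity.
remember("john_doe", "role", "customer")
remember("john_doe", "city", "Berlin")
relate("john_doe", "purchased", "product_y")

print(entities["john_doe"]["props"])      # {'role': 'customer', 'city': 'Berlin'}
print(entities["john_doe"]["relations"])  # [('purchased', 'product_y')]
```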
Layer 5: Temporal Knowledge Graphs

Temporal knowledge graphs extend entity memory with explicit validity periods. Facts are not just true or false but true during specific time ranges.
This enables queries like "What was the user's address on Date X?" by retrieving facts valid during that date range. It prevents context clash when outdated information contradicts new data. It enables temporal reasoning about how entities changed over time.
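A minimal sketch of point-in-time lookup over facts with validity windows (subjects, dates, and values are illustrative):

```python
from datetime import date

# Each fact carries a validity window, so "what was true on date X"
# becomes a filter rather than a guess.
facts = [
    # (subject, predicate, value, valid_from, valid_until)
    ("user", "address", "12 Oak St", date(2023, 1, 1), date(2024, 6, 1)),
    ("user", "address", "9 Elm Ave", date(2024, 6, 1), None),  # still valid
]

def valid_at(subject: str, predicate: str, when: date):
    for s, p, value, start, end in facts:
        if s == subject and p == predicate and start <= when and (end is None or when < end):
            return value
    return None

print(valid_at("user", "address", date(2023, 8, 15)))  # 12 Oak St
print(valid_at("user", "address", date(2025, 1, 1)))   # 9 Elm Ave
```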
Pattern 1: File-System-as-Memory

The file system itself can serve as a memory layer. This pattern is simple, requires no additional infrastructure, and enables the same just-in-time loading that makes file-system-based context effective.
Implementation uses the file system hierarchy for organization. Use naming conventions that convey meaning. Store facts in structured formats (JSON, YAML). Use timestamps in filenames or metadata for temporal tracking.
Advantages: Simplicity, transparency, portability. Disadvantages: No semantic search, no relationship tracking, manual organization required.
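A sketch of this pattern, assuming a hypothetical `memory/<topic>/<timestamp>.json` layout:

```python
import json
import time
from pathlib import Path

MEMORY_DIR = Path("memory")  # assumed layout: memory/<topic>/<timestamp>.json

def store_fact(topic: str, fact: dict) -> Path:
    folder = MEMORY_DIR / topic
    folder.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamps are fixed-width, so lexicographic sort == temporal order.
    path = folder / f"{time.time_ns()}.json"
    path.write_text(json.dumps(fact))
    return path

def latest_fact(topic: str):
    paths = sorted((MEMORY_DIR / topic).glob("*.json"))
    return json.loads(paths[-1].read_text()) if paths else None

store_fact("preferences", {"theme": "dark"})
store_fact("preferences", {"theme": "light"})
print(latest_fact("preferences"))  # {'theme': 'light'}
```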
Pattern 2: Vector RAG with Metadata

Vector stores enhanced with rich metadata provide semantic search with filtering capabilities.
Implementation embeds facts or documents and stores with metadata including entity tags, temporal validity, source attribution, and confidence scores. Query includes metadata filters alongside semantic search.
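A toy sketch of metadata-filtered retrieval; `embed` here is a stand-in letter-frequency "embedding", not a real model, and the store is a plain list rather than a vector database:

```python
import math

# Stand-in embedding: letter-frequency vectors, purely for illustration.
def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isascii() and ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

store = [
    {"text": "customer prefers email contact", "entity": "customer_x", "valid": True},
    {"text": "customer prefers phone contact", "entity": "customer_x", "valid": False},
]

def query(text: str, **filters) -> dict:
    # Metadata filters narrow the candidates before similarity ranking.
    candidates = [d for d in store if all(d.get(k) == v for k, v in filters.items())]
    return max(candidates, key=lambda d: cosine(embed(text), embed(d["text"])))

print(query("how should we contact the customer?", valid=True)["text"])
```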
Pattern 3: Knowledge Graph

Knowledge graphs explicitly model entities and relationships. Implementation defines entity types and relationship types, uses graph database or property graph storage, and maintains indexes for common query patterns.

Pattern 4: Temporal Knowledge Graph

Temporal knowledge graphs add validity periods to facts, enabling time-travel queries and preventing context clash from outdated information.
Semantic Retrieval

Retrieve memories semantically similar to the current query using embedding similarity search.

Entity-Based Retrieval

Retrieve all memories related to specific entities by traversing graph relationships.

Temporal Retrieval

Retrieve memories valid at a specific time or within a time range using validity-period filters.
Memories accumulate over time and require consolidation to prevent unbounded growth and remove outdated information.
Consolidation Triggers

Trigger consolidation after significant memory accumulation, when retrieval returns too many outdated results, periodically on a schedule, or when explicit consolidation is requested.

Consolidation Process

Identify outdated facts, merge related facts, update validity periods, archive or delete obsolete facts, and rebuild indexes.
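A sketch of this process over temporal facts: close superseded validity windows, then split active facts from archived ones (the fact schema is illustrative):

```python
from datetime import date

# Illustrative fact schema: key, value, and a validity window.
facts = [
    {"key": "address", "value": "12 Oak St", "from": date(2023, 1, 1), "until": None},
    {"key": "address", "value": "9 Elm Ave", "from": date(2024, 6, 1), "until": None},
]

def consolidate(facts: list[dict], today: date):
    latest: dict[str, dict] = {}
    for fact in sorted(facts, key=lambda f: f["from"]):
        prev = latest.get(fact["key"])
        if prev is not None and prev["until"] is None:
            prev["until"] = fact["from"]  # newer fact supersedes the older one
        latest[fact["key"]] = fact
    active = [f for f in facts if f["until"] is None or f["until"] > today]
    archive = [f for f in facts if f not in active]
    return active, archive

active, archive = consolidate(facts, date(2025, 1, 1))
print([f["value"] for f in active])   # ['9 Elm Ave']
print([f["value"] for f in archive])  # ['12 Oak St']
```

Note that `consolidate` mutates the fact records in place when closing validity windows; an archival step would then move the closed facts out of the hot store.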
Weekly Installs: 214
GitHub Stars: 699
First Seen: Feb 19, 2026
Installed on: opencode (209), codex (208), github-copilot (208), gemini-cli (207), kimi-cli (205), cursor (205)