memory-systems by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill memory-systems
Design short-term, long-term, and graph-based memory architectures
Use this skill when designing short-term, long-term, and graph-based memory architectures.
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
Activate this skill when designing an agent's memory layers, choosing among vector, graph, and temporal storage, or integrating persistent memory with context management.
Simple vector stores lack relationship and temporal structure. Knowledge graphs preserve relationships for reasoning. Temporal knowledge graphs add validity periods for time-aware queries. Implementation choices depend on query complexity, infrastructure constraints, and accuracy requirements.
The Context-Memory Spectrum

Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context. Effective architectures use multiple layers along this spectrum.
The spectrum includes working memory (context window, zero latency, volatile), short-term memory (session-persistent, searchable, volatile), long-term memory (cross-session persistent, structured, semi-permanent), and permanent memory (archival, queryable, permanent). Each layer has different latency, capacity, and persistence characteristics.
Why Simple Vector Stores Fall Short

Vector RAG provides semantic retrieval by embedding queries and documents in a shared embedding space. Similarity search retrieves the most semantically similar documents. This works well for document retrieval but lacks structure for agent memory.
Vector stores lose relationship information. If an agent learns that "Customer X purchased Product Y on Date Z," a vector store can retrieve this fact if asked directly. But it cannot answer "What products did customers who purchased Product Y also buy?" because relationship structure is not preserved.
Vector stores also struggle with temporal validity. Facts change over time, but vector stores provide no mechanism to distinguish "current fact" from "outdated fact" except through explicit metadata and filtering.
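The relational gap can be seen with a toy store. In this sketch, word-overlap (Jaccard) similarity stands in for real embedding search; the facts and query are illustrative:

```python
# Toy similarity store: Jaccard word overlap stands in for embedding
# similarity. Direct fact lookup works; relational (multi-hop)
# questions do not, because no relationship structure is preserved.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

facts = [
    "customer X purchased product Y on date Z",
    "customer W purchased product Y on date Q",
    "customer W purchased product V on date R",
]

def retrieve(query, k=1):
    return sorted(facts, key=lambda f: jaccard(query, f), reverse=True)[:k]

# Direct lookup: the most similar fact is the right one.
best = retrieve("what did customer X purchase")[0]
# But "what else did buyers of product Y buy?" requires joining over
# purchase relationships, which similarity ranking alone cannot express.
```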
The Move to Graph-Based Memory

Knowledge graphs preserve relationships between entities. Instead of isolated document chunks, graphs encode that Entity A has Relationship R to Entity B. This enables queries that traverse relationships rather than just similarity.
Temporal knowledge graphs add validity periods to facts. Each fact has a "valid from" and optionally "valid until" timestamp. This enables time-travel queries that reconstruct knowledge at specific points in time.
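A minimal sketch of such temporal facts, assuming a plain dict representation with `valid_from`/`valid_until` fields (all names and data are illustrative):

```python
from datetime import date

# A temporal fact is true only during [valid_from, valid_until);
# valid_until=None means "still valid".
def fact(subject, predicate, obj, valid_from, valid_until=None):
    return {"s": subject, "p": predicate, "o": obj,
            "valid_from": valid_from, "valid_until": valid_until}

def valid_at(f, when):
    return f["valid_from"] <= when and (
        f["valid_until"] is None or when < f["valid_until"])

history = [
    fact("user:1", "LIVES_AT", "12 Oak St", date(2022, 3, 1), date(2024, 6, 1)),
    fact("user:1", "LIVES_AT", "9 Elm Ave", date(2024, 6, 1)),
]

def address_at(when):
    # Time-travel query: reconstruct knowledge as of a given date.
    return next(f["o"] for f in history
                if f["p"] == "LIVES_AT" and valid_at(f, when))
```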
Benchmark Performance Comparison

The Deep Memory Retrieval (DMR) benchmark provides concrete performance data across memory architectures:
| Memory System | DMR Accuracy | Retrieval Latency | Notes |
|---|---|---|---|
| Zep (Temporal KG) | 94.8% | 2.58s | Best accuracy, fast retrieval |
| MemGPT | 93.4% | Variable | Good general performance |
| GraphRAG | ~75-85% | Variable | 20-35% gains over baseline RAG |
| Vector RAG | ~60-70% | Fast | Loses relationship structure |
| Recursive Summarization | 35.3% | Low | Severe information loss |
Zep demonstrated a 90% reduction in retrieval latency compared to the full-context baseline (2.58s vs 28.9s). This efficiency comes from retrieving only relevant subgraphs rather than the entire context history.
GraphRAG achieves approximately 20-35% accuracy gains over baseline RAG in complex reasoning tasks and reduces hallucination by up to 30% through community-based summarization.
Layer 1: Working Memory

Working memory is the context window itself. It provides immediate access to information currently being processed but has limited capacity and vanishes when sessions end.

Working memory usage patterns include scratchpad calculations where agents track intermediate results, conversation history that preserves dialogue for the current task, current task state that tracks progress on active objectives, and active retrieved documents that hold information currently in use.
Optimize working memory by keeping only active information, summarizing completed work before it falls out of attention, and using attention-favored positions for critical information.
Layer 2: Short-Term Memory

Short-term memory persists for the duration of the current session but not across sessions. It provides search and retrieval capabilities without the latency of permanent storage.
Common implementations include session-scoped databases that persist until session end, file-system storage in designated session directories, and in-memory caches keyed by session ID.
Short-term memory use cases include tracking conversation state across turns without stuffing context, storing intermediate results from tool calls that may be needed later, maintaining task checklists and progress tracking, and caching retrieved information within sessions.
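A minimal sketch of the in-memory cache keyed by session ID described above (class and method names are illustrative):

```python
from collections import defaultdict

# Session-scoped store: contents survive across turns within a session
# and are discarded wholesale when the session ends.
class SessionMemory:
    def __init__(self):
        self._sessions = defaultdict(dict)

    def put(self, session_id, key, value):
        self._sessions[session_id][key] = value

    def get(self, session_id, key, default=None):
        return self._sessions[session_id].get(key, default)

    def end_session(self, session_id):
        # Volatile by design: everything for this session is dropped.
        self._sessions.pop(session_id, None)

mem = SessionMemory()
mem.put("s1", "checklist", ["fetch data", "summarize"])
```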
Layer 3: Long-Term Memory

Long-term memory persists across sessions indefinitely. It enables agents to learn from past interactions and build knowledge over time.
Long-term memory implementations range from simple key-value stores to sophisticated graph databases. The choice depends on complexity of relationships to model, query patterns required, and acceptable infrastructure complexity.
Long-term memory use cases include learning user preferences across sessions, building domain knowledge bases that grow over time, maintaining entity registries with relationship history, and storing successful patterns that can be reused.
Layer 4: Entity Memory

Entity memory specifically tracks information about entities (people, places, concepts, objects) to maintain consistency. This creates a rudimentary knowledge graph where entities are recognized across multiple interactions.
Entity memory maintains entity identity by tracking that "John Doe" mentioned in one conversation is the same person in another. It maintains entity properties by storing facts discovered about entities over time. It maintains entity relationships by tracking relationships between entities as they are discovered.
Layer 5: Temporal Knowledge Graphs

Temporal knowledge graphs extend entity memory with explicit validity periods. Facts are not just true or false but true during specific time ranges.
This enables queries like "What was the user's address on Date X?" by retrieving facts valid during that date range. It prevents context clash when outdated information contradicts new data. It enables temporal reasoning about how entities changed over time.
Pattern 1: File-System-as-Memory

The file system itself can serve as a memory layer. This pattern is simple, requires no additional infrastructure, and enables the same just-in-time loading that makes file-system-based context effective.
Implementation uses the file system hierarchy for organization. Use naming conventions that convey meaning. Store facts in structured formats (JSON, YAML). Use timestamps in filenames or metadata for temporal tracking.
Advantages: Simplicity, transparency, portability. Disadvantages: No semantic search, no relationship tracking, manual organization required.
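A sketch of this pattern, assuming JSON facts stored under an illustrative `memory/facts` directory with Unix timestamps in the filenames:

```python
import json
import time
from pathlib import Path

# One JSON file per fact; the timestamp prefix gives temporal ordering
# and the topic suffix doubles as a crude index. Paths are illustrative.
MEMORY_ROOT = Path("memory/facts")

def store_fact(topic, fact):
    MEMORY_ROOT.mkdir(parents=True, exist_ok=True)
    path = MEMORY_ROOT / f"{int(time.time())}-{topic}.json"
    path.write_text(json.dumps(fact))
    return path

def load_facts(topic):
    # Glob by naming convention: no database needed, but also no
    # semantic search or relationship traversal.
    return [json.loads(p.read_text())
            for p in sorted(MEMORY_ROOT.glob(f"*-{topic}.json"))]
```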
Pattern 2: Vector RAG with Metadata

Vector stores enhanced with rich metadata provide semantic search with filtering capabilities.
Implementation embeds facts or documents and stores with metadata including entity tags, temporal validity, source attribution, and confidence scores. Query includes metadata filters alongside semantic search.
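A filter-then-rank sketch of this pattern. Token overlap stands in for real embedding similarity; the store contents and field names are illustrative:

```python
# Metadata-filtered retrieval: filter candidates by entity tag and
# temporal validity first, then rank survivors by (toy) similarity.
def overlap(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))

store = [
    {"text": "Acme renewed its contract", "entity": "acme",
     "valid_until": None},
    {"text": "Acme contract under old terms", "entity": "acme",
     "valid_until": "2023-12-31"},
]

def search(query, entity=None, as_of=None):
    hits = [d for d in store
            if (entity is None or d["entity"] == entity)
            and (as_of is None or d["valid_until"] is None
                 or d["valid_until"] >= as_of)]
    return sorted(hits, key=lambda d: overlap(query, d["text"]), reverse=True)

# Only the still-valid fact survives the metadata filter.
current = search("contract", entity="acme", as_of="2024-06-01")
```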
Pattern 3: Knowledge Graph

Knowledge graphs explicitly model entities and relationships. Implementation defines entity types and relationship types, uses a graph database or property-graph storage, and maintains indexes for common query patterns.

Pattern 4: Temporal Knowledge Graph

Temporal knowledge graphs add validity periods to facts, enabling time-travel queries and preventing context clash from outdated information.
Semantic Retrieval

Retrieve memories semantically similar to the current query using embedding similarity search.

Entity-Based Retrieval

Retrieve all memories related to specific entities by traversing graph relationships.

Temporal Retrieval

Retrieve memories valid at a specific time or within a time range using validity-period filters.
Memories accumulate over time and require consolidation to prevent unbounded growth and remove outdated information.
Consolidation Triggers

Trigger consolidation after significant memory accumulation, when retrieval returns too many outdated results, periodically on a schedule, or when explicit consolidation is requested.

Consolidation Process

Identify outdated facts, merge related facts, update validity periods, archive or delete obsolete facts, and rebuild indexes.
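The process above can be sketched over simple temporal-fact dicts. The supersede rule, archive threshold, and field names are illustrative assumptions:

```python
from datetime import date

# Consolidation sketch: close validity on superseded facts, then
# archive facts that have been expired longer than a threshold.
# Mutates the input facts in place for simplicity.
def consolidate(facts, today, archive_after_days=365):
    latest = {}
    # Newest fact per (subject, predicate) wins; older open facts
    # get their validity period closed at the successor's start.
    for f in sorted(facts, key=lambda f: f["valid_from"]):
        key = (f["s"], f["p"])
        if key in latest and latest[key]["valid_until"] is None:
            latest[key]["valid_until"] = f["valid_from"]
        latest[key] = f
    active, archived = [], []
    for f in facts:
        expired = f["valid_until"] is not None
        if expired and (today - f["valid_until"]).days > archive_after_days:
            archived.append(f)
        else:
            active.append(f)
    return active, archived

history = [
    {"s": "user:1", "p": "LIVES_AT", "o": "12 Oak St",
     "valid_from": date(2020, 1, 1), "valid_until": None},
    {"s": "user:1", "p": "LIVES_AT", "o": "9 Elm Ave",
     "valid_from": date(2024, 1, 1), "valid_until": None},
]
active, archived = consolidate(history, today=date(2026, 1, 1))
```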
Memories must integrate with context systems to be useful. Use just-in-time memory loading to retrieve relevant memories when needed. Use strategic injection to place memories in attention-favored positions.
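A sketch combining both ideas: memories are loaded just in time and injected near the end of the prompt, an attention-favored position (`retrieve_memories` and the prompt layout are illustrative assumptions):

```python
# Just-in-time loading + strategic injection: memories are retrieved
# only when building the prompt, and placed late so they sit near the
# query in an attention-favored position.
def build_prompt(system, history, query, retrieve_memories):
    memories = retrieve_memories(query)  # loaded on demand, not upfront
    blocks = [system] + history
    if memories:
        blocks.append("Relevant memories:\n" + "\n".join(memories))
    blocks.append(query)  # query last, adjacent to injected memories
    return "\n\n".join(blocks)

prompt = build_prompt(
    "You are a helpful agent.",
    ["user: hi"],
    "Where do I live?",
    lambda q: ["user lives at 9 Elm Ave (valid from 2024-06-01)"],
)
```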
Choose a memory architecture based on your requirements.
Example 1: Entity Tracking

```python
# Track an entity across conversations
def remember_entity(entity_id, properties):
    memory.store({
        "type": "entity",
        "id": entity_id,
        "properties": properties,
        "last_updated": now(),
    })

def get_entity(entity_id):
    return memory.retrieve_entity(entity_id)
```
Example 2: Temporal Query

```python
# What was the user's address on January 15, 2024?
def query_address_at_time(user_id, query_time):
    return temporal_graph.query("""
        MATCH (user)-[r:LIVES_AT]->(address)
        WHERE user.id = $user_id
          AND r.valid_from <= $query_time
          AND (r.valid_until IS NULL OR r.valid_until > $query_time)
        RETURN address
    """, {"user_id": user_id, "query_time": query_time})
```
This skill builds on context-fundamentals.
Created: 2025-12-20 | Last Updated: 2025-12-20 | Author: Agent Skills for Context Engineering Contributors | Version: 1.0.0