npx skills add https://github.com/crinkj/common-claude-setting --skill memory-systems
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
Activate this skill when:
Memory spans a spectrum from volatile context window to persistent storage. Key insight from benchmarks: tool complexity matters less than reliable retrieval — Letta's filesystem agents scored 74% on LoCoMo using basic file operations, beating Mem0's specialized tools at 68.5%. Start simple, add structure (graphs, temporal validity) only when retrieval quality demands it.
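The "start simple" end of this spectrum can be sketched as plain files with a naming convention. This is a hypothetical helper, not any framework's API; `FileMemory` and its file-naming scheme are illustrative assumptions:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

class FileMemory:
    """Minimal file-system memory: one JSON file per entry, named by topic and time."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._n = 0  # tie-breaker so same-microsecond writes never collide

    def add(self, topic, text):
        # Naming convention: <topic>--<UTC timestamp>-<counter>.json
        self._n += 1
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
        path = self.root / f"{topic}--{stamp}-{self._n:06d}.json"
        path.write_text(json.dumps({"topic": topic, "text": text}))

    def search(self, topic):
        # Retrieval is a glob over the naming convention -- no embeddings needed.
        return [json.loads(p.read_text())["text"]
                for p in sorted(self.root.glob(f"{topic}--*.json"))]

mem = FileMemory(tempfile.mkdtemp())
mem.add("preferences", "User prefers dark mode")
mem.add("preferences", "User uses Python 3.12")
assert mem.search("preferences") == ["User prefers dark mode", "User uses Python 3.12"]
```

No semantic search and no relationships, exactly as the table below notes, but for many prototypes this is already reliable retrieval.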
| Framework | Architecture | Best For | Trade-off |
|---|---|---|---|
| Mem0 | Vector store + graph memory, pluggable backends | Multi-tenant systems, broad integrations | Less specialized for multi-agent |
| Zep/Graphiti | Temporal knowledge graph, bi-temporal model | Enterprise requiring relationship modeling + temporal reasoning | Advanced features cloud-locked |
| Letta | Self-editing memory with tiered storage (in-context/core/archival) | Full agent introspection, stateful services | Complexity for simple use cases |
| Cognee | Multi-layer semantic graph via customizable ECL pipeline with customizable Tasks | Evolving agent memory that adapts and learns; multi-hop reasoning | Heavier ingest-time processing |
| LangMem | Memory tools for LangGraph workflows | Teams already on LangGraph | Tightly coupled to LangGraph |
| File-system | Plain files with naming conventions | Simple agents, prototyping | No semantic search, no relationships |
Zep's Graphiti engine builds a three-tier knowledge graph (episode, semantic entity, community subgraphs) with a bi-temporal model tracking both when events occurred and when they were ingested. Mem0 offers the fastest path to production with managed infrastructure. Letta provides the deepest agent control through its Agent Development Environment. Cognee produces multi-layer semantic graphs — it layers text chunks and entity types as nodes with detailed relationship edges, building an interconnected knowledge engine. Every core piece (ingestion, entity extraction, post-processing, retrieval) is customizable.
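The bi-temporal idea itself is simple to model. A minimal sketch (the `BiTemporalFact` type is a hypothetical illustration, not Graphiti's schema): each fact carries an event-time axis and a transaction-time axis.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class BiTemporalFact:
    """A fact on two time axes, in the spirit of a bi-temporal model:
    occurred_at = when it happened (event time),
    ingested_at = when the agent learned it (transaction time)."""
    subject: str
    predicate: str
    obj: str
    occurred_at: datetime
    ingested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# The agent learns on June 1 about a move that happened in January:
fact = BiTemporalFact(
    subject="alice", predicate="LIVES_IN", obj="Berlin",
    occurred_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    ingested_at=datetime(2024, 6, 1, tzinfo=timezone.utc),
)
# Event-time queries ("where did she live in March?") filter on occurred_at;
# audit queries ("what did the agent know in May?") filter on ingested_at.
assert fact.occurred_at < fact.ingested_at
```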
Benchmark Performance Comparison
| System | DMR Accuracy | LoCoMo | HotPotQA (multi-hop) | Latency |
|---|---|---|---|---|
| Cognee | — | — | Highest on EM, F1, Correctness | Variable |
| Zep (Temporal KG) | 94.8% | — | Mid-range across metrics | 2.58s |
| Letta (filesystem) | — | 74.0% | — | — |
| Mem0 | — | 68.5% | Lowest across metrics | — |
| MemGPT | 93.4% | — | — | Variable |
| GraphRAG | ~75-85% | — | — | Variable |
| Vector RAG baseline | ~60-70% | — | — | Fast |
Zep achieves up to 18.5% accuracy improvement on LongMemEval while reducing latency by 90%. Cognee outperformed Mem0, Graphiti, and LightRAG on HotPotQA multi-hop reasoning benchmarks across Exact Match, F1, and human-like correctness metrics. Letta's filesystem-based agents achieved 74% on LoCoMo using basic file operations, outperforming specialized memory tools — tool complexity matters less than reliable retrieval. No single benchmark is definitive; treat these as signals for specific retrieval dimensions rather than rankings.
| Layer | Persistence | Implementation | When to Use |
|---|---|---|---|
| Working | Context window only | Scratchpad in system prompt | Always — optimize with attention-favored positions |
| Short-term | Session-scoped | File-system, in-memory cache | Intermediate tool results, conversation state |
| Long-term | Cross-session | Key-value store → graph DB | User preferences, domain knowledge, entity registries |
| Entity | Cross-session | Entity registry + properties | Maintaining identity ("John Doe" = same person across conversations) |
| Temporal KG | Cross-session + history | Graph with validity intervals | Facts that change over time, time-travel queries, preventing context clash |
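The entity layer in the table above is essentially alias resolution plus a property store. A minimal sketch, with `EntityRegistry` as a hypothetical helper rather than any framework's API:

```python
class EntityRegistry:
    """Entity-memory sketch: resolve aliases to one canonical id so
    'John Doe' stays the same person across conversations."""

    def __init__(self):
        self._canonical = {}   # lowercased alias -> canonical id
        self._properties = {}  # canonical id -> accumulated properties

    def register(self, canonical_id, aliases=()):
        self._canonical[canonical_id.lower()] = canonical_id
        for alias in aliases:
            self._canonical[alias.lower()] = canonical_id
        self._properties.setdefault(canonical_id, {})

    def resolve(self, mention):
        return self._canonical.get(mention.lower())

    def remember(self, mention, key, value):
        # Properties accumulate on the canonical entity, whatever alias was used.
        entity = self.resolve(mention)
        if entity is not None:
            self._properties[entity][key] = value
        return entity

reg = EntityRegistry()
reg.register("john_doe", aliases=["John Doe", "John", "Mr. Doe"])
reg.remember("John", "role", "customer")       # session 1
reg.remember("Mr. Doe", "plan", "enterprise")  # session 2
# Both mentions landed on the same entity record.
```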
| Strategy | Use When | Limitation |
|---|---|---|
| Semantic (embedding similarity) | Direct factual queries | Degrades on multi-hop reasoning |
| Entity-based (graph traversal) | "Tell me everything about X" | Requires graph structure |
| Temporal (validity filter) | Facts change over time | Requires validity metadata |
| Hybrid (semantic + keyword + graph) | Best overall accuracy | Most infrastructure |
Zep's hybrid approach achieves 90% latency reduction (2.58s vs 28.9s) by retrieving only relevant subgraphs. Cognee implements hybrid retrieval through its 14 search modes — each mode combines different strategies from its three-store architecture (graph, vector, relational), letting agents select the retrieval strategy that fits the query type rather than using a one-size-fits-all approach.
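The core of any hybrid strategy is blending scores from multiple retrievers. A minimal two-signal sketch (not Zep's or Cognee's implementation; `semantic_scorer` stands in for a real embedding backend, and the blend weight `alpha` is an illustrative assumption):

```python
def hybrid_search(query, memories, semantic_scorer, alpha=0.6, top_k=3):
    """Blend a semantic score (0-1, e.g. cosine similarity from embeddings)
    with a keyword-overlap score, then return the top_k memories."""
    q_tokens = set(query.lower().split())

    def keyword_score(text):
        t = set(text.lower().split())
        return len(q_tokens & t) / max(len(q_tokens), 1)

    scored = [
        (alpha * semantic_scorer(query, m) + (1 - alpha) * keyword_score(m), m)
        for m in memories
    ]
    return [m for _, m in sorted(scored, key=lambda x: -x[0])[:top_k]]

# Toy semantic scorer for demonstration; a real system would use embeddings.
def toy_scorer(query, text):
    return 1.0 if "theme" in query.lower() and "theme" in text.lower() else 0.0

mems = ["User theme is dark mode", "User lives in Berlin", "User likes Python"]
top = hybrid_search("what theme does the user prefer", mems, toy_scorer, top_k=1)
```

A production version would add a graph-traversal signal as a third term, which is where most of the infrastructure cost in the table above comes from.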
Consolidate periodically to prevent unbounded growth. Invalidate but don't discard — preserving history matters for temporal queries. Trigger on memory count thresholds, degraded retrieval quality, or scheduled intervals. See implementation reference for working consolidation code.
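A count-threshold trigger that invalidates rather than discards can be sketched as follows. All names here are hypothetical, and `consolidate_fn` stands in for an LLM summarization call:

```python
from datetime import datetime, timezone

def maybe_consolidate(memories, max_count=500, consolidate_fn=None):
    """If the store exceeds max_count, summarize already-expired entries
    into one summary memory. Expired entries are flagged, not deleted,
    so temporal queries can still see history."""
    if len(memories) <= max_count:
        return memories
    live = [m for m in memories if m.get("valid_until") is None]
    stale = [m for m in memories if m.get("valid_until") is not None]
    summarize = consolidate_fn or (lambda ms: f"summary of {len(ms)} entries")
    now = datetime.now(timezone.utc)
    for m in stale:
        m["consolidated_at"] = now  # invalidate, don't discard
    summary = {"text": summarize(stale), "valid_until": None, "sources": len(stale)}
    return live + stale + [summary]

store = [
    {"text": "likes dark mode", "valid_until": "2024-06-01"},
    {"text": "likes light mode", "valid_until": None},
    {"text": "on Python 3.11", "valid_until": "2024-10-01"},
]
store = maybe_consolidate(store, max_count=2)
```

Degraded-retrieval and scheduled triggers would call the same function from different entry points.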
Start simple, add complexity only when retrieval fails. Most agents don't need a temporal knowledge graph on day one.
Memories must integrate with context systems to be useful. Use just-in-time memory loading to retrieve relevant memories when needed. Use strategic injection to place memories in attention-favored positions (beginning/end of context).
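Strategic injection reduces to where you splice memories into the assembled prompt. A minimal sketch under the assumptions that `memories` arrives already relevance-ranked and that the helper names are hypothetical:

```python
def build_context(system_prompt, history, memories, max_memories=5):
    """Place memories at attention-favored positions: key facts right after
    the system prompt, a short reminder at the very end, conversation in
    the middle."""
    selected = memories[:max_memories]  # stand-in for just-in-time retrieval
    head = system_prompt + "\n\nRelevant memories:\n" + "\n".join(
        f"- {m}" for m in selected
    )
    tail = "Reminder: honor the user preferences listed above."
    return "\n\n".join([head, *history, tail])

ctx = build_context(
    "You are a helpful assistant.",
    ["User: set up my editor", "Assistant: which editor do you use?"],
    ["User prefers light mode", "User runs Python 3.12"],
)
```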
Filter retrieved memories by their valid_until timestamps; if most results are expired, trigger consolidation before retrying. When facts conflict, prefer the one with the most recent valid_from, and surface the conflict to the user if confidence is low.
Example 1: Mem0 Integration
```python
from mem0 import Memory

m = Memory()
m.add("User prefers dark mode and Python 3.12", user_id="alice")
m.add("User switched to light mode", user_id="alice")

# Retrieves the current preference (light mode), not the outdated one
results = m.search("What theme does the user prefer?", user_id="alice")
```
Example 2: Temporal Query
```python
from datetime import datetime

# Illustrative temporal-graph client: `graph`, `user_node`, and
# `address_node` are assumed to exist in the surrounding application.

# Track an entity relationship with a validity period
graph.create_temporal_relationship(
    source_id=user_node,
    rel_type="LIVES_AT",
    target_id=address_node,
    valid_from=datetime(2024, 1, 15),
    valid_until=datetime(2024, 9, 1),  # moved out
)

# Query: where did the user live on March 1, 2024?
results = graph.query_at_time(
    {"type": "LIVES_AT", "source_label": "User"},
    query_time=datetime(2024, 3, 1),
)
```
Example 3: Cognee Memory Ingestion and Search
```python
import cognee
from cognee.modules.search.types import SearchType

# Ingest and build the knowledge graph
await cognee.add("./docs/")
await cognee.add("any data")
await cognee.cognify()

# Enrich memory
await cognee.memify()

# The agent retrieves relationship-aware context
results = await cognee.search(
    query_text="Any query for your memory",
    query_type=SearchType.GRAPH_COMPLETION,
)
```
This skill builds on context-fundamentals.
Created: 2025-12-20 | Last Updated: 2026-02-26 | Author: Agent Skills for Context Engineering Contributors | Version: 3.0.0