context-compression by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill context-compression
When agent sessions generate millions of tokens of conversation history, compression becomes mandatory. The naive approach is aggressive compression to minimize tokens per request. The correct optimization target is tokens per task: total tokens consumed to complete a task, including re-fetching costs when compression loses critical information.
Activate this skill when an agent session's conversation history approaches the context window limit and must be compressed without losing task-critical information.
Context compression trades token savings against information loss. Three production-ready approaches exist:
Anchored Iterative Summarization : Maintain structured, persistent summaries with explicit sections for session intent, file modifications, decisions, and next steps. When compression triggers, summarize only the newly-truncated span and merge with the existing summary. Structure forces preservation by dedicating sections to specific information types.
Opaque Compression : Produce compressed representations optimized for reconstruction fidelity. Achieves highest compression ratios (99%+) but sacrifices interpretability. Cannot verify what was preserved.
Regenerative Full Summary : Generate detailed structured summaries on each compression. Produces readable output but may lose details across repeated compression cycles due to full regeneration rather than incremental merging.
The critical insight: structure forces preservation. Dedicated sections act as checklists that the summarizer must populate, preventing silent information drift.
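A minimal sketch of that anchored merge step, assuming summaries are held as section-to-lines mappings. The section names and the merge policy (which sections are cumulative versus volatile) are illustrative, not prescribed by any particular framework:

```python
# Hypothetical sketch of anchored iterative summarization: merge a
# summary of the newly-truncated span ("delta") into the persistent
# anchored summary, section by section, so no section silently vanishes.

ANCHOR_SECTIONS = [
    "Session Intent", "Files Modified", "Decisions Made",
    "Current State", "Next Steps",
]

def merge_summaries(anchored: dict, delta: dict) -> dict:
    """Merge the delta summary into the anchored summary.
    Every anchor section must appear in the result."""
    merged = {}
    for section in ANCHOR_SECTIONS:
        old = anchored.get(section, [])
        new = delta.get(section, [])
        if section in ("Current State", "Next Steps"):
            # Volatile sections: the newest information wins.
            merged[section] = new or old
        else:
            # Cumulative sections: append new items, dropping duplicates.
            merged[section] = old + [x for x in new if x not in old]
    return merged

anchored = {
    "Session Intent": ["Debug 401 on /api/auth/login"],
    "Files Modified": ["config/redis.ts: fixed pooling"],
    "Current State": ["12 tests passing, 4 failing"],
}
delta = {
    "Files Modified": ["tests/auth.test.ts: updated mock setup"],
    "Current State": ["14 tests passing, 2 failing"],
}
result = merge_summaries(anchored, delta)
```

Because the loop iterates over the fixed section list rather than over whatever the summarizer produced, an omitted section surfaces as an empty entry instead of disappearing.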
Traditional compression metrics target tokens-per-request. This is the wrong optimization. When compression loses critical details like file paths or error messages, the agent must re-fetch information, re-explore approaches, and waste tokens recovering context.
The right metric is tokens-per-task: total tokens consumed from task start to completion. A compression strategy saving 0.5% more tokens but causing 20% more re-fetching costs more overall.
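To make the arithmetic concrete, here is a toy comparison; the request sizes, request counts, and re-fetch costs are invented for illustration:

```python
# Toy tokens-per-task comparison. All figures are hypothetical.

def tokens_per_task(tokens_per_request: int, requests: int,
                    refetch_tokens: int) -> int:
    """Total tokens from task start to completion, including the cost
    of re-fetching information that compression dropped."""
    return tokens_per_request * requests + refetch_tokens

# Structured summary: slightly larger requests, modest re-fetching.
structured = tokens_per_task(10_000, 40, refetch_tokens=50_000)

# Aggressive compression: 0.5% smaller requests, 20% more re-fetching
# because file paths and error messages must be rediscovered.
aggressive = tokens_per_task(9_950, 40, refetch_tokens=60_000)
```

Per request the aggressive strategy looks cheaper (9,950 vs 10,000 tokens), yet it costs more per task (458,000 vs 450,000) once re-fetching is counted.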
Artifact trail integrity is the weakest dimension across all compression methods, scoring 2.2-2.5 out of 5.0 in evaluations. Even structured summarization with explicit file sections struggles to maintain complete file tracking across long sessions.
Coding agents need to know which files were read, modified, or created, and the current state of each.
This problem likely requires specialized handling beyond general summarization: a separate artifact index or explicit file-state tracking in agent scaffolding.
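One possible shape for such an index, kept in the agent scaffolding rather than inside the summary so it survives compression cycles. The class and its status vocabulary are hypothetical:

```python
# Hypothetical artifact index maintained outside the compressed summary.

class ArtifactIndex:
    def __init__(self):
        self._files: dict[str, str] = {}  # path -> last known status

    def record(self, path: str, status: str) -> None:
        """Update a file's status ('read', 'modified', 'created', ...)."""
        self._files[path] = status

    def modified(self) -> list[str]:
        """Paths whose content the session has changed."""
        return sorted(p for p, s in self._files.items()
                      if s in ("modified", "created"))

    def render(self) -> str:
        """Emit a section that is appended verbatim to every summary,
        so file tracking never depends on the summarizer's recall."""
        lines = [f"- {p}: {s}" for p, s in sorted(self._files.items())]
        return "## Artifact Index\n" + "\n".join(lines)

idx = ArtifactIndex()
idx.record("auth.controller.ts", "read")
idx.record("config/redis.ts", "modified")
idx.record("tests/auth.test.ts", "modified")
```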
Effective structured summaries include explicit sections:
## Session Intent
[What the user is trying to accomplish]
## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config
## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures
## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests
## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
This structure prevents silent loss of file paths or decisions because each section must be explicitly addressed.
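That requirement can be enforced mechanically: reject a compression step whose summary leaves a required section absent or empty. A minimal validator sketch, assuming the section names from the template above:

```python
import re

REQUIRED_SECTIONS = ["Session Intent", "Files Modified", "Decisions Made",
                     "Current State", "Next Steps"]

def missing_sections(summary_md: str) -> list[str]:
    """Return required sections that are absent or empty, so a
    compression step can be rejected before information is lost."""
    present = {}
    # Split on '## ' headings and capture each section's body.
    for m in re.finditer(r"^## (.+?)\n(.*?)(?=^## |\Z)", summary_md,
                         re.M | re.S):
        present[m.group(1).strip()] = m.group(2).strip()
    return [s for s in REQUIRED_SECTIONS if not present.get(s)]

summary = "## Session Intent\nFix login bug\n## Files Modified\n\n## Next Steps\n1. Run tests\n"
```

Here `missing_sections(summary)` flags both the empty Files Modified section and the two sections the summarizer dropped entirely.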
When to trigger compression matters as much as how to compress:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable context size |
| Importance-based | Compress low-relevance sections first | Complex but preserves signal |
| Task-boundary | Compress at logical task completions | Clean summaries but unpredictable timing |
The sliding window approach with structured summaries provides the best balance of predictability and quality for most coding agent use cases.
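A sliding-window trigger is simple to sketch. The turn budget, window size, and the `summarize()` stub are placeholders for real tuned values and a real LLM call:

```python
# Sketch of a sliding-window compression trigger: keep the last N turns
# verbatim plus one summary; compress when the turn budget is exceeded.

def summarize(turns: list[str]) -> str:
    # Stand-in for an LLM call producing a structured summary.
    return f"[summary of {len(turns)} turns]"

def compress_history(history: list[str], keep_last: int = 4,
                     budget_turns: int = 8) -> list[str]:
    """If history exceeds the budget, replace everything except the
    last `keep_last` turns with a single summary message."""
    if len(history) <= budget_turns:
        return history
    head, tail = history[:-keep_last], history[-keep_last:]
    return [summarize(head)] + tail

history = [f"turn {i}" for i in range(10)]
compressed = compress_history(history)
```

The resulting context size is predictable: at most `keep_last` verbatim turns plus one summary, which is the property the table credits to this strategy.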
Traditional metrics like ROUGE or embedding similarity fail to capture functional compression quality. A summary may score high on lexical overlap while missing the one file path the agent needs.
Probe-based evaluation directly measures functional quality by asking questions after compression:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
If compression preserved the right information, the agent answers correctly. If not, it guesses or hallucinates.
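A minimal probe harness might look like the following. The `ask()` stub stands in for running the agent over the compressed context, and the expected key facts are illustrative:

```python
# Sketch of probe-based evaluation: after compressing, ask probe
# questions and score whether key facts survive in the answers.

PROBES = [
    ("recall",       "What was the original error message?", "401"),
    ("artifact",     "Which files have we modified?",        "config/redis.ts"),
    ("continuation", "What should we do next?",              "test"),
    ("decision",     "What did we decide about Redis?",      "connection pool"),
]

def ask(compressed_context: str, question: str) -> str:
    # Stand-in for an agent answering from the compressed context only.
    return compressed_context

def probe_score(compressed_context: str) -> float:
    """Fraction of probes whose expected key fact appears in the answer."""
    hits = sum(
        1 for _, question, expected in PROBES
        if expected.lower() in ask(compressed_context, question).lower()
    )
    return hits / len(PROBES)

good = "Debugging 401 on /api/auth/login; modified config/redis.ts to use a connection pool; next run tests."
bad = "We were debugging an authentication issue and fixed some config."
```

Substring matching is a crude grader; in practice an LLM judge or exact-fact checklist scores the answers, but the harness shape is the same.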
Six dimensions capture compression quality for coding agents.
Accuracy shows the largest variation between compression methods (0.6 point gap). Artifact trail is universally weak (2.2-2.5 range).
For large codebases or agent systems exceeding context windows, apply compression through three phases:
Research Phase : Produce a research document from architecture diagrams, documentation, and key interfaces. Compress exploration into a structured analysis of components and dependencies. Output: single research document.
Planning Phase : Convert research into implementation specification with function signatures, type definitions, and data flow. A 5M token codebase compresses to approximately 2,000 words of specification.
Implementation Phase : Execute against the specification. Context remains focused on the spec rather than raw codebase exploration.
When provided with a manual migration example or reference PR, use it as a template to understand the target pattern. The example reveals constraints that static analysis cannot surface: which invariants must hold, which services break on changes, and what a clean migration looks like.
This is particularly important when the agent cannot distinguish essential complexity (business requirements) from accidental complexity (legacy workarounds). The example artifact encodes that distinction.
Use anchored iterative summarization when re-fetching lost details is expensive and sessions span many compression cycles; its fixed sections prevent silent information drift.
Use opaque compression when maximum compression ratio matters more than being able to verify or read what was preserved.
Use regenerative summaries when readable one-shot output is the priority and compression happens infrequently, since full regeneration can lose details across repeated cycles.
| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Good quality, moderate compression |
| Opaque | 99.3% | 3.35 | Best compression, quality loss |
The 0.7% additional tokens retained by structured summarization buys 0.35 quality points. For any task where re-fetching costs matter, this trade-off favors structured approaches.
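The table's figures reduce to a simple calculation:

```python
# Quality-per-retained-token arithmetic from the comparison table above.
methods = {
    "anchored_iterative": {"ratio": 0.986, "quality": 3.70},
    "opaque":             {"ratio": 0.993, "quality": 3.35},
}

# Fraction of the original tokens each method retains.
retained = {name: 1 - m["ratio"] for name, m in methods.items()}

extra_retained = retained["anchored_iterative"] - retained["opaque"]
quality_gain = (methods["anchored_iterative"]["quality"]
                - methods["opaque"]["quality"])
```

Roughly 0.7% of extra retained context buys 0.35 quality points, which is the trade-off the comparison reports.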
Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages): debugging a 401 Unauthorized error on the /api/auth/login endpoint.
Structured summary after compression:
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.
## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.
## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup
## Test Status
14 passing, 2 failing (mock setup issues)
## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
Example 2: Probe Response Quality
After compression, asking "What was the original error?":
Good response (structured summarization):
"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."
Poor response (aggressive compression):
"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."
The structured response preserves endpoint, error code, and root cause. The aggressive response loses all technical detail.
Created: 2025-12-22 · Last Updated: 2025-12-26 · Author: Agent Skills for Context Engineering Contributors · Version: 1.1.0
Weekly Installs: 323
GitHub Stars: 27.1K
First Seen: Feb 1, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: gemini-cli (305), opencode (305), codex (305), github-copilot (302), kimi-cli (301), amp (299)