npx skills add https://github.com/crinkj/common-claude-setting --skill context-optimization
Context optimization extends the effective capacity of limited context windows through strategic compression, masking, caching, and partitioning. The goal is not to magically increase context windows but to make better use of available capacity. Effective optimization can double or triple effective context capacity without requiring larger models or longer contexts.
Activate this skill when:
Context optimization extends effective capacity through four primary strategies: compaction (summarizing context near limits), observation masking (replacing verbose outputs with references), KV-cache optimization (reusing cached computations), and context partitioning (splitting work across isolated contexts).
The key insight is that context quality matters more than quantity. Optimization preserves signal while reducing noise. The art lies in selecting what to keep versus what to discard, and when to apply each technique.
What is Compaction
Compaction is the practice of summarizing context contents when approaching limits, then reinitializing a new context window with the summary. This distills the contents of a context window in a high-fidelity manner, enabling the agent to continue with minimal performance degradation.
Compaction typically serves as the first lever in context optimization. The art lies in selecting what to keep versus what to discard.
Compaction Implementation
Compaction works by identifying sections that can be compressed, generating summaries that capture essential points, and replacing full content with the summaries. Compression priority runs: tool outputs first (replace with summaries), then old turns (summarize early conversation), then retrieved docs (summarize if newer versions exist); the system prompt is never compressed.
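The priority order above can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the message `kind` labels and the ordering function are assumptions.

```python
# Sketch of the compression priority order; "kind" labels are assumed.
COMPACTION_PRIORITY = {
    "tool_output": 0,    # compress first: replace with summaries
    "old_turn": 1,       # then summarize early conversation
    "retrieved_doc": 2,  # then summarize superseded documents
}

def compaction_order(messages):
    """Return compressible messages, most-compressible first.

    Kinds absent from the table (e.g. system_prompt) are never
    compressed, so they are excluded entirely.
    """
    candidates = [m for m in messages if m["kind"] in COMPACTION_PRIORITY]
    return sorted(candidates, key=lambda m: COMPACTION_PRIORITY[m["kind"]])
```

Keeping the system prompt out of the table, rather than giving it a low priority, makes "never compress" impossible to violate by accident.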
Summary Generation
Effective summaries preserve different elements depending on message type:
Tool outputs: Preserve key findings, metrics, and conclusions. Remove verbose raw output.
Conversational turns: Preserve key decisions, commitments, and context shifts. Remove filler and back-and-forth.
Retrieved documents: Preserve key facts and claims. Remove supporting evidence and elaboration.
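The per-type preservation rules can be expressed as a simple dispatch table. The field names and message shape below are assumptions made for the sketch:

```python
# Illustrative preservation rules keyed by message type (field names assumed).
PRESERVE = {
    "tool_output": {"key_findings", "metrics", "conclusions"},
    "conversation": {"decisions", "commitments", "context_shifts"},
    "document": {"key_facts", "claims"},
}

def summarize_message(message):
    """Drop every field except those worth preserving for this type."""
    keep = PRESERVE.get(message["type"], set())
    return {k: v for k, v in message.items() if k in keep}
```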
The Observation Problem
Tool outputs can comprise 80%+ of token usage in agent trajectories. Much of this is verbose output that has already served its purpose. Once an agent has used a tool output to make a decision, keeping the full output provides diminishing value while consuming significant context.
Observation masking replaces verbose tool outputs with compact references. The information remains accessible if needed but does not consume context continuously.
Masking Strategy Selection
Not all observations should be masked equally:
Never mask: Observations critical to current task, observations from the most recent turn, observations used in active reasoning.
Consider masking: Observations from 3+ turns ago, verbose outputs with key points extractable, observations whose purpose has been served.
Always mask: Repeated outputs, boilerplate headers/footers, outputs already summarized in conversation.
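The three tiers above can be sketched as a policy function. The observation flags (`repeated`, `critical`, `turn`, and so on) are assumed fields for illustration, not a real schema:

```python
# Sketch of the three-tier masking policy; observation fields are assumed.
def masking_tier(obs, current_turn):
    if obs.get("repeated") or obs.get("boilerplate") or obs.get("already_summarized"):
        return "always"
    if obs.get("critical") or obs.get("in_active_reasoning") or obs["turn"] == current_turn:
        return "never"
    if current_turn - obs["turn"] >= 3 or obs.get("purpose_served"):
        return "consider"
    return "never"  # default: keep recent, still-relevant observations
```

Checking the "always" tier first means boilerplate gets masked even if it appeared in the current turn; swap the first two branches if critical observations should win that tie instead.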
Understanding KV-Cache
The KV-cache stores Key and Value tensors computed during inference, growing linearly with sequence length. Reusing the KV-cache across requests that share an identical prefix avoids recomputation.
Prefix caching reuses KV blocks across requests with identical prefixes using hash-based block matching. This dramatically reduces cost and latency for requests with common prefixes like system prompts.
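Hash-based block matching can be sketched as follows. This is a toy model of the idea, not any engine's real implementation; the block size and hashing scheme are assumptions. The key property is that each block's hash is chained over the whole prefix, so a cached block matches only when everything before it matches too:

```python
import hashlib

BLOCK = 256  # tokens per KV block; the size is illustrative

def block_hashes(token_ids):
    """Chained hash per full block: each hash covers the entire prefix
    up to and including that block."""
    hashes, h = [], hashlib.sha256()
    full = len(token_ids) - len(token_ids) % BLOCK
    for i in range(0, full, BLOCK):
        h.update(repr(token_ids[i:i + BLOCK]).encode())
        hashes.append(h.hexdigest())
    return hashes

def reusable_blocks(cached, incoming):
    """Count leading KV blocks an incoming request can reuse from cache."""
    n = 0
    for x, y in zip(block_hashes(cached), block_hashes(incoming)):
        if x != y:
            break
        n += 1
    return n
```

A long shared system prompt at the start of both requests translates directly into reusable leading blocks, which is why stable prefixes matter.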
Cache Optimization Patterns
Optimize for caching by reordering context elements to maximize cache hits. Place stable elements first (system prompt, tool definitions), then frequently reused elements, then unique elements last.
Design prompts to maximize cache stability: avoid dynamic content like timestamps, use consistent formatting, keep structure stable across sessions.
Sub-Agent Partitioning
The most aggressive form of context optimization is partitioning work across sub-agents with isolated contexts. Each sub-agent operates in a clean context focused on its subtask without carrying accumulated context from other subtasks.
This approach achieves separation of concerns—the detailed search context remains isolated within sub-agents while the coordinator focuses on synthesis and analysis.
Result Aggregation
Aggregate results from partitioned subtasks by validating all partitions completed, merging compatible results, and summarizing if still too large.
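The partition-then-aggregate flow can be sketched as one coordinator function. `run_subagent`, `merge`, and `summarize` are caller-supplied callables in this illustration, not a real API, and character length stands in for a token count:

```python
# Sketch of partition-then-aggregate; the callables are assumptions.
def run_partitioned(subtasks, run_subagent, merge, summarize, max_chars=4000):
    # Each subtask runs in a clean, isolated context inside run_subagent;
    # only its compact result returns to the coordinator.
    results = [run_subagent(task) for task in subtasks]
    # Validate that all partitions completed before merging.
    if any(r is None for r in results):
        raise RuntimeError("a partition failed to complete")
    merged = merge(results)
    # Summarize if the merged result is still too large for the coordinator.
    return summarize(merged) if len(merged) > max_chars else merged
```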
Context Budget Allocation
Design explicit context budgets. Allocate tokens to categories: system prompt, tool definitions, retrieved docs, message history, and reserved buffer. Monitor usage against budget and trigger optimization when approaching limits.
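A budget can be as simple as a dictionary checked on every turn. The allocation numbers below are illustrative, assuming a 128k-token window:

```python
# Example explicit budget for a 128k-token window (numbers illustrative).
BUDGET = {
    "system_prompt": 4_000,
    "tool_definitions": 8_000,
    "retrieved_docs": 40_000,
    "message_history": 60_000,
    "reserved_buffer": 16_000,
}

def over_budget(usage, budget=BUDGET):
    """Return the categories whose measured usage exceeds their allocation."""
    return sorted(k for k, v in usage.items() if v > budget.get(k, 0))
```

Category-level checks tell you which optimization to reach for: message history over budget suggests compaction, retrieved docs over budget suggests masking or re-retrieval.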
Trigger-Based Optimization
Monitor signals for optimization triggers: token utilization above 80%, context-degradation indicators, and measured performance drops. When a trigger fires, apply the technique matched to the dominant context component: compaction for long message histories, masking for verbose tool outputs, partitioning for separable subtasks, and cache-friendly reordering for stable prefixes.
Compaction should achieve 50-70% token reduction with less than 5% quality degradation. Masking should achieve 60-80% reduction in masked observations. Cache optimization should achieve 70%+ hit rate for stable workloads.
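The compaction target above can be turned into a simple acceptance check. The thresholds come from the text; the function name and inputs are illustrative:

```python
def compaction_on_target(tokens_before, tokens_after, quality_drop):
    """Check a compaction run against the stated targets:
    50-70% token reduction with under 5% quality degradation."""
    reduction = 1 - tokens_after / tokens_before
    return 0.5 <= reduction <= 0.7 and quality_drop < 0.05
```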
Monitor and iterate on optimization strategies based on measured effectiveness.
Example 1: Compaction Trigger

```python
# Trigger compaction at 80% utilization (compact_context is illustrative).
if context_tokens / context_limit > 0.8:
    context = compact_context(context)
```
Example 2: Observation Masking

```python
# Replace a verbose observation with a compact reference
# (store_observation and extract_key are illustrative).
if len(observation) > max_length:
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"
```
Example 3: Cache-Friendly Ordering

```python
# Stable content first
context = [system_prompt, tool_definitions]  # cacheable across requests
context += [reused_templates]                # reused across some requests
context += [unique_content]                  # unique per request
```
This skill builds on context-fundamentals and context-degradation, and connects to the related skills in this collection.
Created: 2025-12-20 | Last Updated: 2025-12-20 | Author: Agent Skills for Context Engineering Contributors | Version: 1.0.0