context-fundamentals by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill context-fundamentals
Context is the complete state available to a language model at inference time. It includes everything the model can attend to when generating responses: system instructions, tool definitions, retrieved documents, message history, and tool outputs. Understanding context fundamentals is a prerequisite to effective context engineering.
Activate this skill when:
Context comprises several distinct components, each with different characteristics and constraints. The attention mechanism creates a finite budget that constrains effective context usage. Progressive disclosure manages this constraint by loading information only as needed. The engineering discipline is curating the smallest high-signal token set that achieves desired outcomes.
System Prompts

System prompts establish the agent's core identity, constraints, and behavioral guidelines. They are loaded once at session start and typically persist throughout the conversation. System prompts should be extremely clear and use simple, direct language at the right altitude for the agent.
The right altitude balances two failure modes. At one extreme, engineers hardcode complex, brittle logic that creates fragility and maintenance burden. At the other extreme, engineers provide vague high-level guidance that fails to give concrete signals for desired outputs or falsely assumes shared context. The optimal altitude strikes a balance: specific enough to guide behavior effectively, yet flexible enough to serve as strong heuristics.
Organize prompts into distinct sections using XML tagging or Markdown headers to delineate background information, instructions, tool guidance, and output description. The exact formatting matters less as models become more capable, but structural clarity remains valuable.
Tool Definitions

Tool definitions specify the actions an agent can take. Each tool includes a name, description, parameters, and return format. After serialization, tool definitions sit near the front of the context, typically just before or after the system prompt.
Tool descriptions collectively steer agent behavior. Poor descriptions force agents to guess; optimized descriptions include usage context, examples, and defaults. The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better.
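As an illustration, a tool definition with usage context, an inline example, and explicit defaults might look like the following sketch. The schema shape and the `search_files` tool itself are hypothetical, not taken from any particular framework:

```python
# Hypothetical tool definition: a name, a description that includes usage
# context and an example, parameters with defaults, and a stated return
# format. The schema shape is illustrative only.
search_files_tool = {
    "name": "search_files",
    "description": (
        "Search file contents by regular expression. Use this to locate code "
        "or text across a project instead of reading every file. "
        "Example: pattern='def main' finds Python entry points."
    ),
    "parameters": {
        "pattern": {"type": "string", "description": "Regex to match."},
        "path": {"type": "string", "description": "Directory to search.",
                 "default": "."},
        "max_results": {"type": "integer", "description": "Cap on matches.",
                        "default": 50},
    },
    "returns": "List of {file, line, text} match records.",
}
```

Note how the description answers the question a human engineer would ask: when should this tool be used instead of another?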
Retrieved Documents

Retrieved documents provide domain-specific knowledge, reference materials, or task-relevant information. Agents use retrieval-augmented generation to pull relevant documents into context at runtime rather than pre-loading all possible information.
The just-in-time approach maintains lightweight identifiers (file paths, stored queries, web links) and uses these references to load data into context dynamically. This mirrors human cognition: we generally do not memorize entire corpuses of information but rather use external organization and indexing systems to retrieve relevant information on demand.
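A minimal sketch of this pattern in Python, assuming plain files as the external store (the class and method names are illustrative):

```python
from pathlib import Path
from typing import Optional

class DocumentReference:
    """Lightweight identifier for a document; content loads only on demand."""

    def __init__(self, path: str):
        self.path = Path(path)          # cheap to keep in memory or context
        self._content: Optional[str] = None

    def load(self) -> str:
        """Read the document the first time it is actually needed."""
        if self._content is None:
            self._content = self.path.read_text()
        return self._content
```

Only the path circulates in context until `load()` is called; the document body never enters the context unless the task requires it.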
Message History

Message history contains the conversation between the user and agent, including previous queries, responses, and reasoning. For long-running tasks, message history can grow to dominate context usage.
Message history serves as scratchpad memory where agents track progress, maintain task state, and preserve reasoning across turns. Effective management of message history is critical for long-horizon task completion.
Tool Outputs

Tool outputs are the results of agent actions: file contents, search results, command execution output, API responses, and similar data. Tool outputs comprise the majority of tokens in typical agent trajectories, with research showing observations (tool outputs) can reach 83.9% of total context usage.
Tool outputs consume context whether they are relevant to current decisions or not. This creates pressure for strategies like observation masking, compaction, and selective tool result retention.
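One of the simplest such strategies is truncating the middle of long tool outputs, since the head and tail (the command echo, the final status, any errors) usually carry the most signal. A sketch, with an arbitrary character budget:

```python
def truncate_tool_output(output: str, max_chars: int = 2000,
                         placeholder: str = "\n...[output truncated]...\n") -> str:
    """Keep the head and tail of a long tool output, masking the middle.

    A minimal compaction sketch; the 2000-character budget is arbitrary
    and would be tuned per model and task in practice.
    """
    if len(output) <= max_chars:
        return output
    keep = (max_chars - len(placeholder)) // 2
    return output[:keep] + placeholder + output[-keep:]
```

More sophisticated variants mask whole observations once they are no longer relevant, or replace them with a reference the agent can re-fetch on demand.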
The Attention Budget Constraint

Language models process tokens through attention mechanisms that create pairwise relationships between all tokens in context. For n tokens, this creates n² relationships that must be computed and stored. As context length increases, the model's ability to capture these relationships gets stretched thin.
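The quadratic growth is easy to see concretely: doubling the context length quadruples the number of pairwise relationships.

```python
def pairwise_relationships(n_tokens: int) -> int:
    """Ordered token pairs the attention mechanism must score: n^2."""
    return n_tokens * n_tokens

# Doubling the context quadruples the attention work.
ratio = pairwise_relationships(16_000) / pairwise_relationships(8_000)
```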
Models develop attention patterns from training data distributions where shorter sequences predominate. This means models have less experience with and fewer specialized parameters for context-wide dependencies. The result is an "attention budget" that depletes as context grows.
Position Encoding and Context Extension

Position encoding interpolation lets models handle sequences longer than those they were originally trained on by mapping the longer positions into the smaller trained range. However, this adaptation degrades the model's sense of token position. Models remain highly capable at longer contexts, but show reduced precision on information retrieval and long-range reasoning compared to their performance on shorter contexts.
The Progressive Disclosure Principle

Progressive disclosure manages context efficiently by loading information only as needed. At startup, agents load only skill names and descriptions, which is sufficient to know when a skill might be relevant. Full content loads only when a skill is activated for a specific task.
This approach keeps agents fast while giving them access to more context on demand. The principle applies at multiple levels: skill selection, document loading, and even tool result retrieval.
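Under the assumption that skills are stored as files (names and paths below are illustrative), the two-stage loading can be sketched as:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

@dataclass
class Skill:
    name: str
    description: str        # always visible to the agent (cheap)
    body_path: Path         # full content stays on disk until activation
    body: Optional[str] = None

    def activate(self) -> str:
        """Load the full skill body only when the skill is actually used."""
        if self.body is None:
            self.body = self.body_path.read_text()
        return self.body

def startup_listing(skills: List[Skill]) -> str:
    """At startup, only names and descriptions enter the context."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)
```

The startup listing costs a line or two per skill; the full body costs nothing until a task actually triggers activation.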
The assumption that larger context windows solve memory problems has been empirically debunked. Context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of desired outcomes.
Several factors create pressure for context efficiency. Processing cost grows superlinearly with context length: because attention is quadratic, doubling the tokens roughly quadruples the compute, not merely the per-token cost. Model performance degrades beyond certain context lengths even when the window technically supports more tokens. Long inputs remain expensive even with prefix caching.
The guiding principle is informativity over exhaustiveness. Include what matters for the decision at hand, exclude what does not, and design systems that can access additional information on demand.
Context must be treated as a finite resource with diminishing marginal returns. Like humans with limited working memory, language models have an attention budget drawn on when parsing large volumes of context.
Every new token introduced depletes this budget by some amount. This creates the need for careful curation of available tokens. The engineering problem is optimizing utility against inherent constraints.
Context engineering is iterative and the curation phase happens each time you decide what to pass to the model. It is not a one-time prompt writing exercise but an ongoing discipline of context management.
Agents with filesystem access can use progressive disclosure naturally. Store reference materials, documentation, and data externally. Load files only when needed using standard filesystem operations. This pattern avoids stuffing context with information that may not be relevant.
The file system itself provides structure that agents can navigate. File sizes suggest complexity; naming conventions hint at purpose; timestamps serve as proxies for relevance. This metadata, available without reading any file contents, gives agents an efficient way to refine their behavior.
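A sketch of metadata-first navigation using only the standard library (the triage heuristics noted in the docstring are assumptions, not prescriptions):

```python
from pathlib import Path
from typing import Dict, List

def survey_directory(root: str) -> List[Dict]:
    """List files with size and modification time so an agent can triage
    what to read: size hints at complexity, mtime at recency/relevance."""
    entries = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            st = p.stat()
            entries.append({"path": str(p),
                            "bytes": st.st_size,
                            "mtime": st.st_mtime})
    return entries
```

An agent can scan this survey, pick the few files most likely to matter, and only then spend context tokens on their contents.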
The most effective agents employ hybrid strategies. Pre-load some context for speed (like CLAUDE.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
For contexts with less dynamic content, pre-loading more upfront makes sense. For rapidly changing or highly specific information, just-in-time loading avoids stale context.
Design with explicit context budgets in mind. Know the effective context limit for your model and task. Monitor context usage during development. Implement compaction triggers at appropriate thresholds. Design systems assuming context will degrade rather than hoping it will not.
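A compaction trigger can be as simple as a usage threshold. The 0.8 fraction below is an illustrative default, not a recommendation from any model provider:

```python
def should_compact(used_tokens: int, window_tokens: int,
                   threshold: float = 0.8) -> bool:
    """Trigger compaction once usage crosses a fraction of the window.

    The 0.8 default is illustrative; in practice the threshold would be
    tuned to where the model's effective performance starts to degrade,
    which is often well before the advertised window limit.
    """
    return used_tokens >= int(window_tokens * threshold)
```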
Effective context budgeting requires understanding not just raw token counts but also attention distribution patterns. The middle of context receives less attention than the beginning and end. Place critical information at attention-favored positions.
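A trivial assembly sketch that exploits this bias by pinning the most important segments to the edges (the segment roles are illustrative):

```python
from typing import List

def assemble_context(critical_head: str, background: List[str],
                     critical_tail: str) -> List[str]:
    """Place one critical segment first (e.g. the system prompt) and one
    last (e.g. the current task), with background material in the middle,
    where attention tends to be weakest."""
    return [critical_head] + background + [critical_tail]
```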
Example 1: Organizing System Prompts
<BACKGROUND_INFORMATION>
You are a Python expert helping a development team.
Current project: Data processing pipeline in Python 3.9+
</BACKGROUND_INFORMATION>
<INSTRUCTIONS>
- Write clean, idiomatic Python code
- Include type hints for function signatures
- Add docstrings for public functions
- Follow PEP 8 style guidelines
</INSTRUCTIONS>
<TOOL_GUIDANCE>
Use bash for shell operations, python for code tasks.
File operations should use pathlib for cross-platform compatibility.
</TOOL_GUIDANCE>
<OUTPUT_DESCRIPTION>
Provide code blocks with syntax highlighting.
Explain non-obvious decisions in comments.
</OUTPUT_DESCRIPTION>
Example 2: Progressive Document Loading
# Instead of loading all documentation at once:
# Step 1: Load summary
docs/api_summary.md # Lightweight overview
# Step 2: Load specific section as needed
docs/api/endpoints.md # Only when API calls needed
docs/api/authentication.md # Only when auth context needed
This skill provides foundational context that all other skills build upon, and should be studied first.
Created: 2025-12-20 | Last Updated: 2025-12-20 | Author: Agent Skills for Context Engineering Contributors | Version: 1.0.0
Weekly Installs: 105 | GitHub Stars: 27.1K | First Seen: Feb 1, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (100), gemini-cli (99), opencode (99), github-copilot (98), kimi-cli (97), amp (96)