agent-native-architecture by everyinc/compound-engineering-plugin
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill agent-native-architecture
<why_now>
Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: a really good coding agent is actually a really good general-purpose agent. The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding. </why_now>
<core_principles>
Whatever the user can do through the UI, the agent should be able to achieve through tools.
This is the foundational principle. Without it, nothing else matters.
Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."
If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.
The fix: Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.
This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can achieve the same outcomes. Sometimes that's a single tool (create_note). Sometimes it's composing primitives (write_file to a notes directory with proper formatting).
The discipline: When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.
A capability map helps:
| User Action | How Agent Achieves It |
|---|---|
| Create a note | write_file to notes directory, or create_note tool |
| Tag a note as urgent | update_file metadata, or tag_note tool |
| Search notes | search_files or search_notes tool |
| Delete a note | delete_file or delete_note tool |
The test: Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
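The capability map above lends itself to an automated parity audit. A minimal sketch, assuming a simple in-memory tool registry (all tool and action names here are illustrative, not a real API):

```typescript
// Hypothetical parity audit: every UI action should map to at least one
// registered agent tool.
type ParityMap = Record<string, string[]>; // UI action -> tools that can achieve it

const registeredTools = new Set(["write_file", "update_file", "search_files", "delete_file"]);

const capabilityMap: ParityMap = {
  "Create a note": ["write_file", "create_note"],
  "Tag a note as urgent": ["update_file", "tag_note"],
  "Search notes": ["search_files", "search_notes"],
  "Delete a note": ["delete_file", "delete_note"],
};

// Returns the UI actions the agent cannot achieve with any registered tool.
function findParityGaps(map: ParityMap, tools: Set<string>): string[] {
  return Object.entries(map)
    .filter(([, candidates]) => !candidates.some((t) => tools.has(t)))
    .map(([action]) => action);
}
```

Running a check like this when registering UI features catches parity gaps before users hit them.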
Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.
A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.
A feature is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.
Less granular (limits the agent):
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
More granular (empowers the agent):
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
determine appropriate locations based on content and recency,
and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
The key shift: The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.
The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.
The test: To change how a feature behaves, do you edit prose or refactor code?
With atomic tools and parity, you can create new features just by writing new prompts.
This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.
Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
The agent uses list_files, read_file, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.
This works for developers and users. You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.
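One way the prompts-as-features idea can be wired up is to treat each feature and each user customization as a named prompt section and compose them into one system prompt. This structure is an assumption for illustration, not a specific SDK's API:

```typescript
// Sketch: features and user customizations as composable prompt sections.
interface PromptSection { name: string; text: string }

const baseSections: PromptSection[] = [
  { name: "weekly-review", text: "Review files modified this week and suggest three priorities." },
];

// A user-level rule extends the application without shipping any code.
const userSections: PromptSection[] = [
  { name: "file-this", text: "When I say 'file this', move it to my Action folder and tag it urgent." },
];

function composeSystemPrompt(sections: PromptSection[]): string {
  return sections.map((s) => `## ${s.name}\n${s.text}`).join("\n\n");
}

const systemPrompt = composeSystemPrompt([...baseSections, ...userSections]);
```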
The constraint: This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.
The test: Can you add a new feature by writing a new prompt section, without adding new code?
The agent can accomplish things you didn't explicitly design for.
When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.
"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."
You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.
This reveals latent demand. Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.
The flywheel:
This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.
The test: Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.
Agent-native applications get better through accumulated context and prompt refinement.
Unlike traditional software, agent-native applications can improve without shipping code:
Accumulated context: The agent can maintain state across sessions—what exists, what the user has done, what worked, what didn't. A context.md file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.
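A "layer one" memory file can be sketched in a few lines: the agent appends observations during a session and rereads the file on the next one. The file location and bullet format here are assumptions:

```typescript
// Minimal context.md sketch: append observations, reload at session start.
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const contextPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "agent-ctx-")), "context.md");

function rememberObservation(note: string): void {
  // One bullet per observation; the agent rereads this file each session.
  fs.appendFileSync(contextPath, `- ${note}\n`, "utf8");
}

function loadContext(): string {
  return fs.existsSync(contextPath) ? fs.readFileSync(contextPath, "utf8") : "";
}
```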
Prompt refinement at multiple levels:
Self-modification (advanced): Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.
The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.
The test: Does the application work better after a month of use than on day one, even without code changes? </core_principles>
Wait for response before proceeding.
After reading the reference, apply those patterns to the user's specific context.
<architecture_checklist>
When designing an agent-native system, verify these before implementation:
- z.string() inputs when the API validates, not z.enum()
- complete_task tool (not heuristic detection)
- refresh_context tool
When designing architecture, explicitly address each checkbox in your plan. </architecture_checklist>
<quick_start>
Step 1: Define atomic tools
const tools = [
tool("read_file", "Read any file", { path: z.string() }, ...),
tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
tool("list_files", "List directory", { path: z.string() }, ...),
tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
Step 2: Write behavior in the system prompt
## Your Responsibilities
When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create/move files using your tools
4. Use your judgment about layout and formatting
5. Call complete_task when you're done
You decide the structure. Make it good.
Step 3: Let the agent work in a loop
const result = await agent.run({
prompt: userMessage,
tools: tools,
systemPrompt: systemPrompt,
// Agent loops until it calls complete_task
});
</quick_start>
<reference_index>
All references in references/:
Core Patterns:
Agent-Native Disciplines:
Platform-Specific:
<anti_patterns>
These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.
Agent as router — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.
Build the app, then add agent — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.
Request/response thinking — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.
Defensive tool design — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.
Happy path in code, agent just executes — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.
THE CARDINAL SIN: Agent executes your code instead of figuring things out
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
const category = categorize(message); // Your code decides
const priority = calculatePriority(message); // Your code decides
await store(message, category, priority); // Your code orchestrates
if (priority > 3) await notify(); // Your code decides
});
// RIGHT - Agent figures out how to process feedback
tools: store_item, send_message // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
Workflow-shaped tools — analyze_and_organize bundles judgment into the tool. Break it into primitives and let the agent compose them.
Context starvation — Agent doesn't know what resources exist in the app.
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
Fix: Inject available resources, capabilities, and vocabulary into system prompt.
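The fix can be as simple as rendering a resource inventory into the system prompt at session start. A sketch, with illustrative resource kinds and paths:

```typescript
// Sketch: render an inventory of app resources into the system prompt so the
// agent knows what "my feed" means.
interface Resource { kind: string; path: string; description: string }

const resources: Resource[] = [
  { kind: "feed", path: "content/feed.md", description: "the user's public feed" },
  { kind: "notes", path: "notes/", description: "meeting and journal notes" },
];

function renderResourceContext(items: Resource[]): string {
  const lines = items.map((r) => `- ${r.kind} (${r.path}): ${r.description}`);
  return ["## Available resources", ...lines].join("\n");
}
```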
Orphan UI actions — User can do something through the UI that the agent can't achieve. Fix: maintain parity.
Silent actions — Agent changes state but UI doesn't update. Fix: Use shared data stores with reactive binding, or file system observation.
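A shared store with reactive binding can be sketched as a tiny observable map: every write, whether from the agent or the user, notifies UI subscribers. This is illustrative, not a specific framework's API:

```typescript
// Sketch: a shared store that notifies UI subscribers on every write, so
// agent-made changes are never silent.
type Listener = (key: string, value: unknown) => void;

class SharedStore {
  private data = new Map<string, unknown>();
  private listeners: Listener[] = [];

  subscribe(fn: Listener): void {
    this.listeners.push(fn);
  }

  set(key: string, value: unknown): void {
    this.data.set(key, value);
    for (const fn of this.listeners) fn(key, value); // UI re-renders here
  }

  get(key: string): unknown {
    return this.data.get(key);
  }
}
```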
Heuristic completion detection — Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a complete_task tool.
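An explicit-completion loop can be sketched like this: the loop ends only when the model calls complete_task, never via heuristics. Here `step` is a hypothetical stand-in for one model turn returning the tool calls it wants to make:

```typescript
// Sketch: the agent loop terminates only on an explicit complete_task call.
interface ToolCall { name: string; args: Record<string, unknown> }

function runLoop(step: (iteration: number) => ToolCall[], maxIterations = 50): string {
  for (let i = 0; i < maxIterations; i++) {
    for (const call of step(i)) {
      if (call.name === "complete_task") {
        return String(call.args.summary ?? "done"); // explicit completion signal
      }
      // ...dispatch other tool calls (read_file, move_file, ...) here...
    }
  }
  throw new Error("agent hit the iteration limit without calling complete_task");
}
```

An iteration cap is still worth keeping as a safety valve, but it signals failure loudly rather than being mistaken for success.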
Static tool mapping for dynamic APIs — Building 50 tools for 50 API endpoints when a discover + access pattern would give more flexibility.
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required
// RIGHT - Dynamic capability discovery
tool("list_available_types", ...) // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...) // Access any type
Incomplete CRUD — Agent can create but not update or delete.
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...) // Missing: update, delete
Fix: Every entity needs full CRUD.
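One way to enforce this is to derive the full CRUD tool set per entity and audit the registry against it. Tool names below are illustrative; each would still need to be wired to a handler:

```typescript
// Sketch: derive the full CRUD tool set per entity so no operation is missing.
const crudOps = ["create", "read", "update", "delete", "list"] as const;

function crudToolNames(entity: string): string[] {
  return crudOps.map((op) => `${op}_${entity}`);
}

// Flags entities whose registered tools do not cover every CRUD operation.
function findMissingOps(entity: string, registered: Set<string>): string[] {
  return crudToolNames(entity).filter((name) => !registered.has(name));
}
```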
Sandbox isolation — Agent works in separate data space from user.
Documents/
├── user_files/ ← User's space
└── agent_output/ ← Agent's space (isolated)
Fix: Use shared workspace where both operate on same files.
Gates without reason — Domain tool is the only way to do something, and you didn't intend to restrict access. The default is open. Keep primitives available unless there's a specific reason to gate.
Artificial capability limits — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do. </anti_patterns>
<success_criteria>
You've built an agent-native application when:
Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.
Can it figure out how to accomplish it, operating in a loop until it succeeds?
If yes, you've built something agent-native.
If it says "I don't have a feature for that"—your architecture is still too constrained. </success_criteria>
Weekly Installs: 261
GitHub Stars: 10.9K
First Seen: Jan 20, 2026
Security Audits: Gen Agent Trust Hub: Fail, Socket: Pass, Snyk: Warn
Installed on: codex (229), opencode (227), gemini-cli (227), claude-code (214), github-copilot (210), cursor (209)