npx skills add https://github.com/shipshitdev/library --skill tool-design
Tools are the primary mechanism through which agents interact with the world. They define the contract between deterministic systems and non-deterministic agents. Unlike traditional software APIs designed for developers, tool APIs must be designed for language models that reason about intent, infer parameter values, and generate calls from natural language requests. Poor tool design creates failure modes that no amount of prompt engineering can fix. Effective tool design follows specific principles that account for how agents perceive and use tools.
Activate this skill when:
Tools are contracts between deterministic systems and non-deterministic agents. The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. Effective tool descriptions are prompt engineering that shapes agent behavior.
Key principles include: clear descriptions that answer what the tool does, when to use it, and what it returns; response formats that balance completeness and token efficiency; error messages that enable recovery; and consistent conventions that reduce cognitive load.
Tools as Contracts Tools are contracts between deterministic systems and non-deterministic agents. When humans call APIs, they understand the contract and make appropriate requests. Agents must infer the contract from descriptions and generate calls that match expected formats.
This fundamental difference requires rethinking API design. The contract must be unambiguous, examples must illustrate expected patterns, and error messages must guide correction. Every ambiguity in tool definitions becomes a potential failure mode.
Tool Description as Prompt Tool descriptions are loaded into agent context and collectively steer behavior. The descriptions are not just documentation—they are prompt engineering that shapes how agents reason about tool use.
Poor descriptions like "Search the database" with cryptic parameter names force agents to guess. Optimized descriptions include usage context, examples, and defaults. The description answers: what the tool does, when to use it, and what it produces.
Namespacing and Organization As tool collections grow, organization becomes critical. Namespacing groups related tools under common prefixes, helping agents select appropriate tools at the right time.
Namespacing creates clear boundaries between functionality. When an agent needs database information, it routes to the database namespace. When it needs web search, it routes to the web namespace.
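As a sketch, namespacing can be as simple as a shared prefix on tool names. The registry below is illustrative; the tool names and descriptions (`db_query`, `web_search`, and so on) are invented for this example.

```python
# Hypothetical tool registry; names and descriptions are invented
# for illustration only.
TOOLS = {
    # Database namespace: everything prefixed db_
    "db_query": "Run a read-only SQL query against the analytics database.",
    "db_describe_table": "Return column names and types for one table.",
    # Web namespace: everything prefixed web_
    "web_search": "Search the public web and return ranked result snippets.",
    "web_fetch": "Fetch the raw content of a single URL.",
}

def tools_in_namespace(prefix: str) -> list[str]:
    """Return the tool names that belong to one namespace."""
    return [name for name in TOOLS if name.startswith(prefix + "_")]
```

The prefix doubles as a routing hint: an agent deciding between namespaces only needs to read the prefix, not every description.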
Single Comprehensive Tools The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. This leads to a preference for single comprehensive tools over multiple narrow tools.
Instead of implementing list_users, list_events, and create_event, implement schedule_event that finds availability and schedules. The comprehensive tool handles the full workflow internally rather than requiring agents to chain multiple calls.
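A minimal sketch of what such a consolidated `schedule_event` could look like, using a toy in-memory calendar; the attendee names, times, and data model are all hypothetical:

```python
from datetime import datetime, timedelta

# Toy in-memory busy calendars; a real tool would query a calendar API.
# Attendee names and intervals are invented for illustration.
BUSY = {
    "alice": [(datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 11))],
    "bob": [(datetime(2025, 1, 6, 10), datetime(2025, 1, 6, 12))],
}

def schedule_event(attendees, duration_minutes, earliest):
    """Find the first slot where every attendee is free, then book it.

    One call replaces the list_users -> list_events -> create_event chain:
    the availability search happens inside the tool.
    """
    slot, step = earliest, timedelta(minutes=30)
    length = timedelta(minutes=duration_minutes)
    for _ in range(48):  # scan at most one day in 30-minute steps
        end = slot + length
        free = all(
            end <= busy_start or slot >= busy_end
            for person in attendees
            for busy_start, busy_end in BUSY.get(person, [])
        )
        if free:
            # A real implementation would also write the event back here.
            return {"start": slot.isoformat(), "end": end.isoformat()}
        slot += step
    return {"error": "NO_SLOT_FOUND", "hint": "widen the search window"}
```

The agent makes one call with the user's intent; the slot search that would otherwise require chained calls stays inside the deterministic system.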
Why Consolidation Works Agents have limited context and attention. Each tool in the collection competes for attention in the tool selection phase. Each tool adds description tokens that consume context budget. Overlapping functionality creates ambiguity about which tool to use.
Consolidation reduces token consumption by eliminating redundant descriptions. It eliminates ambiguity by having one tool cover each workflow. It reduces tool selection complexity by shrinking the effective tool set.
When Not to Consolidate Consolidation is not universally correct. Tools with fundamentally different behaviors should remain separate. Tools used in different contexts benefit from separation. Tools that might be called independently should not be artificially bundled.
The consolidation principle, taken to its logical extreme, leads to architectural reduction: removing most specialized tools in favor of primitive, general-purpose capabilities. Production evidence shows this approach can outperform sophisticated multi-tool architectures.
The File System Agent Pattern Instead of building custom tools for data exploration, schema lookup, and query validation, provide direct file system access through a single command execution tool. The agent uses standard Unix utilities (grep, cat, find, ls) to explore, understand, and operate on your system.
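One way to sketch such a single command execution tool, assuming a Python host; the allowlist of read-only utilities is an illustrative safety choice layered on top of the pattern, not part of the pattern itself:

```python
import shlex
import subprocess

# Illustrative allowlist: read-only Unix utilities the agent may compose.
ALLOWED = {"grep", "cat", "find", "ls", "head", "wc"}

def run_command(command: str, timeout: int = 10) -> dict:
    """Execute one shell command and return its output for the agent."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return {"error": "COMMAND_NOT_ALLOWED",
                "hint": f"allowed commands: {sorted(ALLOWED)}"}
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout)
    return {"exit_code": result.returncode,
            "stdout": result.stdout[:10_000],  # cap output for context budget
            "stderr": result.stderr[:2_000]}
```

The tool surface stays constant while the agent's capability grows with the model: better models compose `grep` and `find` in ways no fixed schema-lookup tool anticipated.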
This works because:
When Reduction Outperforms Complexity Reduction works when:
Reduction fails when:
Stop Constraining Reasoning A common anti-pattern is building tools to "protect" the model from complexity. Pre-filtering context, constraining options, wrapping interactions in validation logic. These guardrails often become liabilities as models improve.
The question to ask: are your tools enabling new capabilities, or are they constraining reasoning the model could handle on its own?
Build for Future Models Models improve faster than tooling can keep up. An architecture optimized for today's model may be over-constrained for tomorrow's. Build minimal architectures that can benefit from model improvements rather than sophisticated architectures that lock in current limitations.
See Architectural Reduction Case Study for production evidence.
Description Structure Effective tool descriptions answer four questions:
What does the tool do? Clear, specific description of functionality. Avoid vague language like "helps with" or "can be used for." State exactly what the tool accomplishes.
When should it be used? Specific triggers and contexts. Include both direct triggers ("User asks about pricing") and indirect signals ("Need current market rates").
What inputs does it accept? Parameter descriptions with types, constraints, and defaults. Explain what each parameter controls.
What does it return? Output format and structure. Include examples of successful responses and error conditions.
Default Parameter Selection Defaults should reflect common use cases. They reduce agent burden by eliminating unnecessary parameter specification. They prevent errors from omitted parameters.
Tool response size significantly impacts context usage. Implementing response format options gives agents control over verbosity.
Concise format returns essential fields only, appropriate for confirmation or basic information. Detailed format returns complete objects with all fields, appropriate when full context is needed for decisions.
Include guidance in tool descriptions about when to use each format. Agents learn to select appropriate formats based on task requirements.
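A sketch of a response formatter with a `concise` default; the customer record and the choice of key fields are invented for illustration:

```python
# Hypothetical full record as stored by the backing system.
CUSTOMER_RECORD = {
    "id": "CUST-000001", "name": "Acme Corp", "status": "active",
    "email": "ops@acme.example", "created": "2023-04-01",
    "plan": "enterprise", "notes": "renewal due Q3",
}
CONCISE_FIELDS = ("id", "name", "status")  # illustrative key fields

def format_response(record: dict, format: str = "concise") -> dict:
    """Return essential fields by default; the full record on request."""
    if format == "detailed":
        return dict(record)
    return {k: record[k] for k in CONCISE_FIELDS}
```

The concise default keeps routine confirmations cheap; the agent opts into `detailed` only when a decision needs the full context.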
Error messages serve two audiences: developers debugging issues and agents recovering from failures. For agents, error messages must be actionable. They must tell the agent what went wrong and how to correct it.
Design error messages that enable recovery. For retryable errors, include retry guidance. For input errors, include corrected format. For missing data, include what's needed.
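For example, an input validator might return a structured, recovery-oriented error like the following; the field names (`expected_format`, `retryable`) are illustrative, not a standard:

```python
import re

def validate_customer_id(customer_id):
    """Return a recovery-oriented error dict, or None if the ID is valid.

    The error fields are illustrative: the point is that each one tells
    the agent something it can act on.
    """
    if not re.fullmatch(r"CUST-\d{6}", customer_id):
        return {
            "error": "INVALID_FORMAT",
            "message": f"'{customer_id}' does not match the expected pattern",
            "expected_format": "CUST-######, e.g. CUST-000001",
            "retryable": True,  # the agent can correct the ID and call again
        }
    return None
```

Compare this with a bare `ValueError: bad input`, which tells the agent nothing about how to construct a valid retry.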
Use a consistent schema across all tools. Establish naming conventions: verb-noun pattern for tool names, consistent parameter names across tools, consistent return field names.
Research shows tool description overlap causes model confusion. More tools do not always lead to better outcomes. A reasonable guideline is 10-20 tools for most applications. If more are needed, use namespacing to create logical groupings.
Implement mechanisms to help agents select the right tool: tool grouping, example-based selection, and hierarchy with umbrella tools that route to specialized sub-tools.
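An umbrella tool can be sketched as a single dispatch function; the `analytics` tool and its sub-actions below are hypothetical:

```python
def analytics(action: str, **kwargs):
    """Umbrella tool: one entry point that routes to specialized sub-tools.

    The sub-tools here just return strings; real ones would do the work.
    """
    subtools = {
        "schema": lambda table: f"schema of {table}",
        "query": lambda sql: f"rows for {sql}",
        "export": lambda table, fmt="csv": f"{table} exported as {fmt}",
    }
    if action not in subtools:
        # Actionable error: list the valid choices so the agent can retry.
        return f"error: unknown action; choose one of {sorted(subtools)}"
    return subtools[action](**kwargs)
```

From the agent's perspective there is one tool to select; the routing decision moves into a single well-documented `action` parameter.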
When using MCP (Model Context Protocol) tools, always use fully qualified tool names to avoid "tool not found" errors.
Format: ServerName:tool_name
# Correct: Fully qualified names
"Use the BigQuery:bigquery_schema tool to retrieve table schemas."
"Use the GitHub:create_issue tool to create issues."
# Incorrect: Unqualified names
"Use the bigquery_schema tool..." # May fail with multiple servers
Without the server prefix, agents may fail to locate tools, especially when multiple MCP servers are available. Establish naming conventions that include server context in all tool references.
Claude can optimize its own tools. When given a tool and observed failure modes, it diagnoses issues and suggests improvements. Production testing shows this approach achieves 40% reduction in task completion time by helping future agents avoid mistakes.
The Tool-Testing Agent Pattern:
def optimize_tool_description(tool_spec, failure_examples):
    """
    Use an agent to analyze tool failures and improve descriptions.

    Process:
    1. Agent attempts to use tool across diverse tasks
    2. Collect failure modes and friction points
    3. Agent analyzes failures and proposes improvements
    4. Test improved descriptions against same tasks
    """
    prompt = f"""
    Analyze this tool specification and the observed failures.

    Tool: {tool_spec}

    Failures observed:
    {failure_examples}

    Identify:
    1. Why agents are failing with this tool
    2. What information is missing from the description
    3. What ambiguities cause incorrect usage

    Propose an improved tool description that addresses these issues.
    """
    return get_agent_response(prompt)
This creates a feedback loop: agents using tools generate failure data, which agents then use to improve tool descriptions, which reduces future failures.
Evaluate tool designs against criteria: unambiguity, completeness, recoverability, efficiency, and consistency. Test tools by presenting representative agent requests and evaluating the resulting tool calls.
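A toy harness for that last step might compare an agent-generated call against a reference call chosen by a human; the call shape used here is an assumption, not a standard:

```python
def evaluate_tool_call(expected: dict, actual: dict) -> dict:
    """Score one agent-generated call against the expected call.

    A toy harness: elsewhere, present a representative request to the
    agent, capture the call it produces, and compare it with what a
    human engineer would have chosen.
    """
    return {
        "correct_tool": expected["tool"] == actual["tool"],
        "missing_args": sorted(set(expected["args"]) - set(actual["args"])),
        "extra_args": sorted(set(actual["args"]) - set(expected["args"])),
    }
```

Run this over a suite of representative requests and the aggregate scores surface exactly which descriptions cause wrong tool selection or malformed arguments.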
Vague descriptions: "Search the database for customer information" leaves too many questions unanswered.
Cryptic parameter names: Parameters named x, val, or param1 force agents to guess meaning.
Missing error handling: Tools that fail with generic errors provide no recovery guidance.
Inconsistent naming: Using id in some tools, identifier in others, and customer_id in some creates confusion.
When designing tool collections:
Example 1: Well-Designed Tool
def get_customer(customer_id: str, format: str = "concise"):
    """
    Retrieve customer information by ID.

    Use when:
    - User asks about specific customer details
    - Need customer context for decision-making
    - Verifying customer identity

    Args:
        customer_id: Format "CUST-######" (e.g., "CUST-000001")
        format: "concise" for key fields, "detailed" for complete record

    Returns:
        Customer object with requested fields

    Errors:
        NOT_FOUND: Customer ID not found
        INVALID_FORMAT: ID must match CUST-###### pattern
    """
    ...
Example 2: Poor Tool Design
This example demonstrates several tool design anti-patterns:
def search(query):
    """Search the database."""
    pass
Problems with this design: the description gives no usage context, the query parameter has no type or format guidance, the return structure is unspecified, and no error behavior is documented.
Failure modes: agents guess at query syntax, cannot tell when this tool applies instead of a more specific one, and receive generic failures with no path to recovery.
Created: 2025-12-20 | Last Updated: 2025-12-23 | Author: Agent Skills for Context Engineering Contributors | Version: 1.1.0