npx skills add https://github.com/shipshitdev/library --skill tool-design
Tools are the primary mechanism through which agents interact with the world. They define the contract between deterministic systems and non-deterministic agents. Unlike traditional software APIs designed for developers, tool APIs must be designed for language models that reason about intent, infer parameter values, and generate calls from natural language requests. Poor tool design creates failure modes that no amount of prompt engineering can fix. Effective tool design follows specific principles that account for how agents perceive and use tools.
Activate this skill when:
Tools are contracts between deterministic systems and non-deterministic agents. The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. Effective tool descriptions are prompt engineering that shapes agent behavior.
Key principles include: clear descriptions that answer what the tool does, when to use it, and what it returns; response formats that balance completeness and token efficiency; error messages that enable recovery; and consistent conventions that reduce cognitive load.
Tools as Contracts Tools are contracts between deterministic systems and non-deterministic agents. When humans call APIs, they understand the contract and make appropriate requests. Agents must infer the contract from descriptions and generate calls that match expected formats.
This fundamental difference requires rethinking API design. The contract must be unambiguous, examples must illustrate expected patterns, and error messages must guide correction. Every ambiguity in tool definitions becomes a potential failure mode.
Tool Description as Prompt Tool descriptions are loaded into agent context and collectively steer behavior. The descriptions are not just documentation—they are prompt engineering that shapes how agents reason about tool use.
Poor descriptions like "Search the database" with cryptic parameter names force agents to guess. Optimized descriptions include usage context, examples, and defaults. The description answers: what the tool does, when to use it, and what it produces.
Namespacing and Organization As tool collections grow, organization becomes critical. Namespacing groups related tools under common prefixes, helping agents select appropriate tools at the right time.
Namespacing creates clear boundaries between functionality. When an agent needs database information, it routes to the database namespace. When it needs web search, it routes to the web namespace.
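As a sketch, namespacing can be as simple as a shared prefix on tool names. The registry below is illustrative; the tool names and descriptions (`db_query`, `web_search`, and so on) are invented for this example.

```python
# Hypothetical tool registry; names and descriptions are invented
# for illustration only.
TOOLS = {
    # Database namespace: everything prefixed db_
    "db_query": "Run a read-only SQL query against the analytics database.",
    "db_describe_table": "Return column names and types for one table.",
    # Web namespace: everything prefixed web_
    "web_search": "Search the public web and return ranked result snippets.",
    "web_fetch": "Fetch the raw content of a single URL.",
}

def tools_in_namespace(prefix: str) -> list[str]:
    """Return the tool names that belong to one namespace."""
    return [name for name in TOOLS if name.startswith(prefix + "_")]
```

The prefix doubles as a routing hint: an agent deciding between namespaces only needs to read the prefix, not every description.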
Single Comprehensive Tools The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. This leads to a preference for single comprehensive tools over multiple narrow tools.
Instead of implementing list_users, list_events, and create_event, implement schedule_event that finds availability and schedules. The comprehensive tool handles the full workflow internally rather than requiring agents to chain multiple calls.
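A minimal sketch of what such a consolidated `schedule_event` could look like, using a toy in-memory calendar; the attendee names, times, and data model are all hypothetical:

```python
from datetime import datetime, timedelta

# Toy in-memory busy calendars; a real tool would query a calendar API.
# Attendee names and intervals are invented for illustration.
BUSY = {
    "alice": [(datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 11))],
    "bob": [(datetime(2025, 1, 6, 10), datetime(2025, 1, 6, 12))],
}

def schedule_event(attendees, duration_minutes, earliest):
    """Find the first slot where every attendee is free, then book it.

    One call replaces the list_users -> list_events -> create_event chain:
    the availability search happens inside the tool.
    """
    slot, step = earliest, timedelta(minutes=30)
    length = timedelta(minutes=duration_minutes)
    for _ in range(48):  # scan at most one day in 30-minute steps
        end = slot + length
        free = all(
            end <= busy_start or slot >= busy_end
            for person in attendees
            for busy_start, busy_end in BUSY.get(person, [])
        )
        if free:
            # A real implementation would also write the event back here.
            return {"start": slot.isoformat(), "end": end.isoformat()}
        slot += step
    return {"error": "NO_SLOT_FOUND", "hint": "widen the search window"}
```

The agent makes one call with the user's intent; the slot search that would otherwise require chained calls stays inside the deterministic system.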
Why Consolidation Works Agents have limited context and attention. Each tool in the collection competes for attention in the tool selection phase. Each tool adds description tokens that consume context budget. Overlapping functionality creates ambiguity about which tool to use.
Consolidation reduces token consumption by eliminating redundant descriptions. It eliminates ambiguity by having one tool cover each workflow. It reduces tool selection complexity by shrinking the effective tool set.
When Not to Consolidate Consolidation is not universally correct. Tools with fundamentally different behaviors should remain separate. Tools used in different contexts benefit from separation. Tools that might be called independently should not be artificially bundled.
The consolidation principle, taken to its logical extreme, leads to architectural reduction: removing most specialized tools in favor of primitive, general-purpose capabilities. Production evidence shows this approach can outperform sophisticated multi-tool architectures.
The File System Agent Pattern Instead of building custom tools for data exploration, schema lookup, and query validation, provide direct file system access through a single command execution tool. The agent uses standard Unix utilities (grep, cat, find, ls) to explore, understand, and operate on your system.
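One way to sketch such a single command execution tool, assuming a Python host; the allowlist of read-only utilities is an illustrative safety choice layered on top of the pattern, not part of the pattern itself:

```python
import shlex
import subprocess

# Illustrative allowlist: read-only Unix utilities the agent may compose.
ALLOWED = {"grep", "cat", "find", "ls", "head", "wc"}

def run_command(command: str, timeout: int = 10) -> dict:
    """Execute one shell command and return its output for the agent."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return {"error": "COMMAND_NOT_ALLOWED",
                "hint": f"allowed commands: {sorted(ALLOWED)}"}
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout)
    return {"exit_code": result.returncode,
            "stdout": result.stdout[:10_000],  # cap output for context budget
            "stderr": result.stderr[:2_000]}
```

The tool surface stays constant while the agent's capability grows with the model: better models compose `grep` and `find` in ways no fixed schema-lookup tool anticipated.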
This works because:
When Reduction Outperforms Complexity Reduction works when:
Reduction fails when:
Stop Constraining Reasoning A common anti-pattern is building tools to "protect" the model from complexity. Pre-filtering context, constraining options, wrapping interactions in validation logic. These guardrails often become liabilities as models improve.
The question to ask: are your tools enabling new capabilities, or are they constraining reasoning the model could handle on its own?
Build for Future Models Models improve faster than tooling can keep up. An architecture optimized for today's model may be over-constrained for tomorrow's. Build minimal architectures that can benefit from model improvements rather than sophisticated architectures that lock in current limitations.
See Architectural Reduction Case Study for production evidence.
Description Structure Effective tool descriptions answer four questions:
What does the tool do? Clear, specific description of functionality. Avoid vague language like "helps with" or "can be used for." State exactly what the tool accomplishes.
When should it be used? Specific triggers and contexts. Include both direct triggers ("User asks about pricing") and indirect signals ("Need current market rates").
What inputs does it accept? Parameter descriptions with types, constraints, and defaults. Explain what each parameter controls.
What does it return? Output format and structure. Include examples of successful responses and error conditions.
Default Parameter Selection Defaults should reflect common use cases. They reduce agent burden by eliminating unnecessary parameter specification. They prevent errors from omitted parameters.
Tool response size significantly impacts context usage. Implementing response format options gives agents control over verbosity.
Concise format returns essential fields only, appropriate for confirmation or basic information. Detailed format returns complete objects with all fields, appropriate when full context is needed for decisions.
Include guidance in tool descriptions about when to use each format. Agents learn to select appropriate formats based on task requirements.
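A sketch of a response formatter with a `concise` default; the customer record and the choice of key fields are invented for illustration:

```python
# Hypothetical full record as stored by the backing system.
CUSTOMER_RECORD = {
    "id": "CUST-000001", "name": "Acme Corp", "status": "active",
    "email": "ops@acme.example", "created": "2023-04-01",
    "plan": "enterprise", "notes": "renewal due Q3",
}
CONCISE_FIELDS = ("id", "name", "status")  # illustrative key fields

def format_response(record: dict, format: str = "concise") -> dict:
    """Return essential fields by default; the full record on request."""
    if format == "detailed":
        return dict(record)
    return {k: record[k] for k in CONCISE_FIELDS}
```

The concise default keeps routine confirmations cheap; the agent opts into `detailed` only when a decision needs the full context.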
Error messages serve two audiences: developers debugging issues and agents recovering from failures. For agents, error messages must be actionable. They must tell the agent what went wrong and how to correct it.
Design error messages that enable recovery. For retryable errors, include retry guidance. For input errors, include corrected format. For missing data, include what's needed.
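For example, an input validator might return a structured, recovery-oriented error like the following; the field names (`expected_format`, `retryable`) are illustrative, not a standard:

```python
import re

def validate_customer_id(customer_id):
    """Return a recovery-oriented error dict, or None if the ID is valid.

    The error fields are illustrative: the point is that each one tells
    the agent something it can act on.
    """
    if not re.fullmatch(r"CUST-\d{6}", customer_id):
        return {
            "error": "INVALID_FORMAT",
            "message": f"'{customer_id}' does not match the expected pattern",
            "expected_format": "CUST-######, e.g. CUST-000001",
            "retryable": True,  # the agent can correct the ID and call again
        }
    return None
```

Compare this with a bare `ValueError: bad input`, which tells the agent nothing about how to construct a valid retry.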
Use a consistent schema across all tools. Establish naming conventions: verb-noun pattern for tool names, consistent parameter names across tools, consistent return field names.
Research shows tool description overlap causes model confusion. More tools do not always lead to better outcomes. A reasonable guideline is 10-20 tools for most applications. If more are needed, use namespacing to create logical groupings.
Implement mechanisms to help agents select the right tool: tool grouping, example-based selection, and hierarchy with umbrella tools that route to specialized sub-tools.
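An umbrella tool can be sketched as a single dispatch function; the `analytics` tool and its sub-actions below are hypothetical:

```python
def analytics(action: str, **kwargs):
    """Umbrella tool: one entry point that routes to specialized sub-tools.

    The sub-tools here just return strings; real ones would do the work.
    """
    subtools = {
        "schema": lambda table: f"schema of {table}",
        "query": lambda sql: f"rows for {sql}",
        "export": lambda table, fmt="csv": f"{table} exported as {fmt}",
    }
    if action not in subtools:
        # Actionable error: list the valid choices so the agent can retry.
        return f"error: unknown action; choose one of {sorted(subtools)}"
    return subtools[action](**kwargs)
```

From the agent's perspective there is one tool to select; the routing decision moves into a single well-documented `action` parameter.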
When using MCP (Model Context Protocol) tools, always use fully qualified tool names to avoid "tool not found" errors.
Format: ServerName:tool_name
# Correct: Fully qualified names
"Use the BigQuery:bigquery_schema tool to retrieve table schemas."
"Use the GitHub:create_issue tool to create issues."
# Incorrect: Unqualified names
"Use the bigquery_schema tool..." # May fail with multiple servers
Without the server prefix, agents may fail to locate tools, especially when multiple MCP servers are available. Establish naming conventions that include server context in all tool references.
Claude can optimize its own tools. When given a tool and observed failure modes, it diagnoses issues and suggests improvements. Production testing shows this approach achieves 40% reduction in task completion time by helping future agents avoid mistakes.
The Tool-Testing Agent Pattern:
def optimize_tool_description(tool_spec, failure_examples):
    """
    Use an agent to analyze tool failures and improve descriptions.

    Process:
    1. Agent attempts to use tool across diverse tasks
    2. Collect failure modes and friction points
    3. Agent analyzes failures and proposes improvements
    4. Test improved descriptions against same tasks
    """
    prompt = f"""
    Analyze this tool specification and the observed failures.

    Tool: {tool_spec}

    Failures observed:
    {failure_examples}

    Identify:
    1. Why agents are failing with this tool
    2. What information is missing from the description
    3. What ambiguities cause incorrect usage

    Propose an improved tool description that addresses these issues.
    """
    return get_agent_response(prompt)
This creates a feedback loop: agents using tools generate failure data, which agents then use to improve tool descriptions, which reduces future failures.
Evaluate tool designs against criteria: unambiguity, completeness, recoverability, efficiency, and consistency. Test tools by presenting representative agent requests and evaluating the resulting tool calls.
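A toy harness for that last step might compare an agent-generated call against a reference call chosen by a human; the call shape used here is an assumption, not a standard:

```python
def evaluate_tool_call(expected: dict, actual: dict) -> dict:
    """Score one agent-generated call against the expected call.

    A toy harness: elsewhere, present a representative request to the
    agent, capture the call it produces, and compare it with what a
    human engineer would have chosen.
    """
    return {
        "correct_tool": expected["tool"] == actual["tool"],
        "missing_args": sorted(set(expected["args"]) - set(actual["args"])),
        "extra_args": sorted(set(actual["args"]) - set(expected["args"])),
    }
```

Run this over a suite of representative requests and the aggregate scores surface exactly which descriptions cause wrong tool selection or malformed arguments.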
Vague descriptions: "Search the database for customer information" leaves too many questions unanswered.
Cryptic parameter names: Parameters named x, val, or param1 force agents to guess meaning.
Missing error handling: Tools that fail with generic errors provide no recovery guidance.
Inconsistent naming: Using id in some tools, identifier in others, and customer_id in some creates confusion.
When designing tool collections:
Example 1: Well-Designed Tool
def get_customer(customer_id: str, format: str = "concise"):
    """
    Retrieve customer information by ID.

    Use when:
    - User asks about specific customer details
    - Need customer context for decision-making
    - Verifying customer identity

    Args:
        customer_id: Format "CUST-######" (e.g., "CUST-000001")
        format: "concise" for key fields, "detailed" for complete record

    Returns:
        Customer object with requested fields

    Errors:
        NOT_FOUND: Customer ID not found
        INVALID_FORMAT: ID must match CUST-###### pattern
    """
    ...
Example 2: Poor Tool Design
This example demonstrates several tool design anti-patterns:
def search(query):
    """Search the database."""
    pass
Problems with this design: the description gives no usage context, the query parameter has no type or format guidance, the return structure is unspecified, and no error behavior is documented.
Failure modes: agents guess at query syntax, cannot tell when this tool applies instead of a more specific one, and receive generic failures with no path to recovery.
Created: 2025-12-20 | Last Updated: 2025-12-23 | Author: Agent Skills for Context Engineering Contributors | Version: 1.1.0