tool-design by guanyang/antigravity-skills
npx skills add https://github.com/guanyang/antigravity-skills --skill tool-design
Design every tool as a contract between a deterministic system and a non-deterministic agent. Unlike human-facing APIs, agent-facing tools must make the contract unambiguous through the description alone -- agents infer intent from descriptions and generate calls that must match expected formats. Every ambiguity becomes a potential failure mode that no amount of prompt engineering can fix.
Activate this skill when:
Design tools around the consolidation principle: if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. Reduce the tool set until each tool has one unambiguous purpose, because agents select tools by comparing descriptions and any overlap introduces selection errors.
Treat every tool description as prompt engineering that shapes agent behavior. The description is not documentation for humans -- it is injected into the agent's context and directly steers reasoning. Write descriptions that answer what the tool does, when to use it, and what it returns, because these three questions are exactly what agents evaluate during tool selection.
Tools as Contracts Design each tool as a self-contained contract. When humans call APIs, they read docs, understand conventions, and make appropriate requests. Agents must infer the entire contract from a single description block. Make the contract unambiguous by including format examples, expected patterns, and explicit constraints. Omit nothing that a caller needs to know, because agents cannot ask clarifying questions before making a call.
Tool Description as Prompt Write tool descriptions knowing they load directly into agent context and collectively steer behavior. A vague description like "Search the database" with cryptic parameter names forces the agent to guess -- and guessing produces incorrect calls. Instead, include usage context, parameter format examples, and sensible defaults. Every word in the description either helps or hurts tool selection accuracy.
Namespacing and Organization Namespace tools under common prefixes as the collection grows, because agents benefit from hierarchical grouping. When an agent needs database operations, it routes to the db_* namespace; when it needs web interactions, it routes to web_*. Without namespacing, agents must evaluate every tool in a flat list, which degrades selection accuracy as the count grows.
Single Comprehensive Tools Build single comprehensive tools instead of multiple narrow tools that overlap. Rather than implementing list_users, list_events, and create_event separately, implement schedule_event that finds availability and schedules in one call. The comprehensive tool handles the full workflow internally, removing the agent's burden of chaining calls in the correct order.
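The consolidation above can be sketched in a few lines. This is a hedged illustration, not the skill's reference implementation: the in-memory `BUSY` calendar and the gap-finding logic are assumptions standing in for a real backend.

```python
from datetime import datetime, timedelta

# Illustrative in-memory busy calendar; a real tool would query a backend.
BUSY = {"alice": [(datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 10))]}

def schedule_event(attendee: str, duration_minutes: int, earliest: datetime) -> dict:
    """Find the attendee's first free slot and book it in one call.

    Replaces the narrow list_users / list_events / create_event chain:
    the agent states intent once instead of sequencing three tools.
    """
    start = earliest
    for busy_start, busy_end in sorted(BUSY.get(attendee, [])):
        if start + timedelta(minutes=duration_minutes) <= busy_start:
            break  # the gap before this busy block fits the meeting
        start = max(start, busy_end)  # otherwise skip past the busy block
    end = start + timedelta(minutes=duration_minutes)
    BUSY.setdefault(attendee, []).append((start, end))
    return {"attendee": attendee, "start": start.isoformat(), "end": end.isoformat()}

booking = schedule_event("alice", 30, datetime(2026, 1, 5, 9))
# alice is busy 09:00-10:00, so the meeting lands at 10:00
```

The agent never sees the availability search or the ordering constraint between lookup and booking; the whole workflow sits behind one unambiguous entry point.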
Why Consolidation Works Apply consolidation because agents have limited context and attention. Each tool in the collection competes for attention during tool selection, each description consumes context budget tokens, and overlapping functionality creates ambiguity. Consolidation eliminates redundant descriptions, removes selection ambiguity, and shrinks the effective tool set. Vercel demonstrated this principle by reducing their agent from 17 specialized tools to 2 general-purpose tools and achieving better performance -- fewer tools meant less confusion and more reliable tool selection.
When Not to Consolidate Keep tools separate when they have fundamentally different behaviors, serve different contexts, or must be callable independently. Over-consolidation creates a different problem: a single tool with too many parameters and modes becomes hard for agents to parameterize correctly.
Push the consolidation principle to its logical extreme by removing most specialized tools in favor of primitive, general-purpose capabilities. Production evidence shows this approach can outperform sophisticated multi-tool architectures.
The File System Agent Pattern Provide direct file system access through a single command execution tool instead of building custom tools for data exploration, schema lookup, and query validation. The agent uses standard Unix utilities (grep, cat, find, ls) to explore and operate on the system. This works because file systems are a proven abstraction that models understand deeply, standard tools have predictable behavior, agents can chain primitives flexibly rather than being constrained to predefined workflows, and good documentation in files replaces summarization tools.
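A minimal sketch of that single command-execution tool. The allowlist is an assumption for illustration; a production system would sandbox execution rather than filter by utility name.

```python
import shlex
import subprocess

# Assumed allowlist of standard Unix utilities; illustrative only.
ALLOWED = {"grep", "cat", "find", "ls", "wc", "head"}

def run_command(command: str, timeout: float = 10.0) -> dict:
    """One generic execution tool: the agent composes standard Unix
    utilities instead of calling bespoke exploration tools."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return {"error": f"{command!r} does not start with an allowed utility"}
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}

blocked = run_command("rm -rf /")  # rejected by the allowlist
listing = run_command("ls")       # agents chain primitives like this freely
```

Because the primitives are ones the model already understands, the tool description can stay short: the contract is the Unix toolset itself.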
When Reduction Outperforms Complexity Choose reduction when the data layer is well-documented and consistently structured, the model has sufficient reasoning capability, specialized tools were constraining rather than enabling the model, or more time is spent maintaining scaffolding than improving outcomes. Avoid reduction when underlying data is messy or poorly documented, the domain requires specialized knowledge the model lacks, safety constraints must limit agent actions, or operations genuinely benefit from structured workflows.
Build for Future Models Design minimal architectures that benefit from model improvements rather than sophisticated architectures that lock in current limitations. Ask whether each tool enables new capabilities or constrains reasoning the model could handle on its own -- tools built as "guardrails" often become liabilities as models improve.
See Architectural Reduction Case Study for production evidence.
Description Structure Structure every tool description to answer four questions:
Default Parameter Selection Set defaults to reflect common use cases. Defaults reduce agent burden by eliminating unnecessary parameter specification and prevent errors from omitted parameters. Choose defaults that produce useful results without requiring the agent to understand every option.
Offer response format options (concise vs. detailed) because tool response size significantly impacts context usage. Concise format returns essential fields only, suitable for confirmations. Detailed format returns complete objects, suitable when full context drives decisions. Document when to use each format in the tool description so agents learn to select appropriately.
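As a sketch of the concise/detailed split, using a hypothetical customer record (the field names and the choice of essential fields are assumptions, not prescribed by the skill):

```python
FULL_RECORD = {
    "customer_id": "CUST-000001",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "address": "12 Analytical Way",
    "order_history": ["ORD-1", "ORD-2"],
    "notes": "prefers email contact",
}

ESSENTIAL_FIELDS = ("customer_id", "name", "email")  # enough for confirmations

def render(record: dict, format: str = "concise") -> dict:
    """Default to the small payload; 'detailed' opts in to full context."""
    if format == "detailed":
        return dict(record)
    return {k: record[k] for k in ESSENTIAL_FIELDS}

concise = render(FULL_RECORD)              # default keeps context spend low
detailed = render(FULL_RECORD, "detailed")  # full object when decisions need it
```

Defaulting to `"concise"` also demonstrates the default-parameter guidance above: the agent gets a useful result without specifying anything.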
Design error messages for two audiences: developers debugging issues and agents recovering from failures. For agents, every error message must be actionable -- it must state what went wrong and how to correct it. Include retry guidance for retryable errors, corrected format examples for input errors, and specific missing fields for incomplete requests. An error that says only "failed" provides zero recovery signal.
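A hedged sketch of actionable error construction; the error codes and message templates are illustrative, not a fixed vocabulary:

```python
def actionable_error(code: str, **details) -> dict:
    """Build errors that tell the agent what went wrong AND how to recover."""
    templates = {
        "INVALID_FORMAT": (
            "customer_id {value!r} does not match CUST-###### "
            "(e.g. 'CUST-000001'). Re-send with a zero-padded 6-digit ID."
        ),
        "RATE_LIMITED": "Rate limited. Retry after {retry_after}s with the same arguments.",
        "MISSING_FIELD": "Required field {field!r} was omitted. Add it and retry.",
    }
    return {"error": code, "message": templates[code].format(**details)}

err = actionable_error("INVALID_FORMAT", value="cust-1")
retry = actionable_error("RATE_LIMITED", retry_after=30)
```

Each message names the failure and the concrete next step, which is exactly the recovery signal a bare "failed" withholds.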
Establish a consistent schema across all tools. Use verb-noun pattern for tool names (get_customer, create_order), consistent parameter names across tools (always customer_id, never sometimes id and sometimes identifier), and consistent return field names. Consistency reduces the cognitive load on agents and improves cross-tool generalization.
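Consistency can be audited mechanically. A rough sketch, assuming a hypothetical registry of tool names and parameter lists; the vague-name set and the verb_noun regex are illustrative heuristics:

```python
import re
from collections import defaultdict

TOOLS = {  # hypothetical collection: tool name -> parameter names
    "get_customer": ["customer_id", "format"],
    "create_order": ["customer_id", "items"],
    "delete_order": ["identifier"],  # drift from customer_id-style naming
    "fetchData": ["x"],              # breaks the verb_noun convention
}

VAGUE = {"id", "identifier", "val", "x", "param1", "data"}

def naming_report(tools: dict) -> dict:
    """Flag tool names that break verb_noun and parameters that force guessing."""
    report = {"bad_tool_names": [], "vague_params": defaultdict(list)}
    for name, params in tools.items():
        if not re.fullmatch(r"[a-z]+_[a-z_]+", name):
            report["bad_tool_names"].append(name)
        for p in params:
            if p in VAGUE:
                report["vague_params"][name].append(p)
    return report

report = naming_report(TOOLS)
```

Running such a check whenever a tool is added keeps drift like `identifier` vs `customer_id` from accumulating.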
Limit tool collections to 10-20 tools for most applications, because research shows description overlap causes model confusion and more tools do not always lead to better outcomes. When more tools are genuinely needed, use namespacing to create logical groupings. Implement selection mechanisms: tool grouping by domain, example-based selection hints, and umbrella tools that route to specialized sub-tools.
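The umbrella-tool idea can be sketched as a prefix router. The domain registry and lambda stand-ins are assumptions; the point is only the dispatch shape:

```python
# Hypothetical flat registry; names follow the namespaced verb_noun convention.
SUB_TOOLS = {
    "db": {"db_query": lambda sql: f"rows for {sql}"},
    "web": {"web_fetch": lambda url: f"body of {url}"},
}

def route(tool_name: str, *args):
    """Umbrella dispatcher: the agent sees one entry point per domain,
    and the name prefix routes to the specialized sub-tool."""
    domain = tool_name.split("_", 1)[0]
    try:
        return SUB_TOOLS[domain][tool_name](*args)
    except KeyError:
        known = sorted(t for group in SUB_TOOLS.values() for t in group)
        return {"error": f"Unknown tool {tool_name!r}. Known tools: {known}"}

result = route("db_query", "SELECT 1")
```

Note the failure path lists the known tools, so even a bad route returns an actionable, self-correcting error rather than a dead end.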
Always use fully qualified tool names with MCP (Model Context Protocol) to avoid "tool not found" errors.
Format: ServerName:tool_name
# Correct: Fully qualified names
"Use the BigQuery:bigquery_schema tool to retrieve table schemas."
"Use the GitHub:create_issue tool to create issues."
# Incorrect: Unqualified names
"Use the bigquery_schema tool..." # May fail with multiple servers
Without the server prefix, agents may fail to locate tools when multiple MCP servers are available. Establish naming conventions that include server context in all tool references.
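A client-side resolver can enforce that convention. The server/tool listing below is hypothetical (real MCP clients expose similar listings); the deliberate `create_issue` collision shows why unqualified names fail:

```python
SERVERS = {
    "BigQuery": {"bigquery_schema", "bigquery_query"},
    "GitHub": {"create_issue", "search"},
    "Jira": {"create_issue"},  # collides with GitHub's create_issue
}

def resolve(name: str) -> str:
    """Return 'ServerName:tool_name'. Unqualified names resolve only
    when exactly one server owns the tool."""
    if ":" in name:
        server, tool = name.split(":", 1)
        if tool in SERVERS.get(server, set()):
            return f"{server}:{tool}"
        raise LookupError(f"{name} not found")
    owners = [s for s, tools in SERVERS.items() if name in tools]
    if len(owners) == 1:
        return f"{owners[0]}:{name}"
    raise LookupError(
        f"{name!r} is ambiguous across {sorted(owners)}; use ServerName:tool_name"
        if owners else f"{name!r} not found on any server"
    )

qualified = resolve("GitHub:create_issue")
```

Auditing for collisions at registration time turns a silent "tool not found" at run time into a loud error when a new provider is added.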
Feed observed tool failures back to an agent to diagnose issues and improve descriptions. Production testing shows this approach achieves 40% reduction in task completion time by helping future agents avoid mistakes.
The Tool-Testing Agent Pattern:
def optimize_tool_description(tool_spec, failure_examples):
"""
Use an agent to analyze tool failures and improve descriptions.
Process:
1. Agent attempts to use tool across diverse tasks
2. Collect failure modes and friction points
3. Agent analyzes failures and proposes improvements
4. Test improved descriptions against same tasks
"""
prompt = f"""
Analyze this tool specification and the observed failures.
Tool: {tool_spec}
Failures observed:
{failure_examples}
Identify:
1. Why agents are failing with this tool
2. What information is missing from the description
3. What ambiguities cause incorrect usage
Propose an improved tool description that addresses these issues.
"""
return get_agent_response(prompt)
This creates a feedback loop: agents using tools generate failure data, which agents then use to improve tool descriptions, which reduces future failures.
Evaluate tool designs against five criteria: unambiguity, completeness, recoverability, efficiency, and consistency. Test by presenting representative agent requests and evaluating the resulting tool calls against expected behavior.
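The completeness criterion can be smoke-tested with a checklist over the description text. This is a crude string-probe sketch, not the full evaluation (which replays representative agent requests); the required section names mirror the well-designed example that follows:

```python
REQUIRED_SECTIONS = ("use when", "args", "returns", "errors")

def audit_description(description: str) -> dict:
    """Check a docstring-style description for the sections an agent needs
    before it can call the tool without guessing."""
    lowered = description.lower()
    missing = [s for s in REQUIRED_SECTIONS if s not in lowered]
    return {"complete": not missing, "missing_sections": missing}

good = audit_description(
    "Retrieve customer information by ID.\n"
    "Use when: the user asks about a specific customer.\n"
    "Args: customer_id (format CUST-######, e.g. CUST-000001)\n"
    "Returns: customer object\n"
    "Errors: NOT_FOUND, INVALID_FORMAT"
)
bad = audit_description("Search the database.")  # the anti-pattern below
```

A description that fails this trivial check will certainly fail the harder unambiguity and recoverability tests.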
When designing tool collections:
Example 1: Well-Designed Tool
def get_customer(customer_id: str, format: str = "concise"):
"""
Retrieve customer information by ID.
Use when:
- User asks about specific customer details
- Need customer context for decision-making
- Verifying customer identity
Args:
customer_id: Format "CUST-######" (e.g., "CUST-000001")
format: "concise" for key fields, "detailed" for complete record
Returns:
Customer object with requested fields
Errors:
NOT_FOUND: Customer ID not found
INVALID_FORMAT: ID must match CUST-###### pattern
"""
Example 2: Poor Tool Design
This example demonstrates several tool design anti-patterns:
def search(query):
"""Search the database."""
pass
Problems with this design:
Failure modes:
- Cryptic parameter names: x, val, or param1 force agents to guess meaning. Use descriptive names that convey purpose without reading further documentation.
- Inconsistent naming: id in one tool, identifier in another, and customer_id in a third creates confusion. Standardize parameter names across the entire tool collection.
- Colliding tool names: when multiple servers expose the same name (e.g., search), agents cannot disambiguate. Always use the fully qualified ServerName:tool_name format and audit for collisions when adding new providers.
Created: 2025-12-20 | Last Updated: 2026-03-17 | Author: Agent Skills for Context Engineering Contributors | Version: 2.0.0
Weekly Installs
59
Repository
GitHub Stars
595
First Seen
Jan 26, 2026
Security Audits
Gen Agent Trust Hub: Pass | Socket: Pass | Snyk: Warn
Installed on
opencode: 54
codex: 53
github-copilot: 52
gemini-cli: 51
cursor: 50
antigravity: 50