OpenAI AgentKit Expert by frankxai/claude-skills-library
npx skills add https://github.com/frankxai/claude-skills-library --skill 'OpenAI AgentKit Expert'
This skill provides comprehensive guidance on building production-ready multi-agent systems using OpenAI's AgentKit platform and Agents SDK, following 2025 best practices.
Complete platform for building, deploying, and optimizing agents with enterprise-grade tooling.
Core Components:
The Agents SDK is the production evolution of the experimental Swarm framework. Use Agents SDK for all production work - Swarm is educational only.
Migration Note: If you encounter legacy Swarm code, migrate to Agents SDK immediately.
An Agent encapsulates:
Design Principle: Agents should be lightweight and specialized rather than monolithic and general-purpose.
A routine is a sequence of actions an agent can perform:
Think of it as: A mini-workflow that an agent owns and executes autonomously.
Handoffs enable agent-to-agent transitions in execution flow.
Key Pattern: When an agent encounters a task outside its specialization, it hands off to a more appropriate agent.
Example:
# Triage agent determines which specialist to use
if task.type == "refund":
    handoff_to(refund_agent)
elif task.type == "sales":
    handoff_to(sales_agent)
Use Case: Routing requests to specialized sub-agents
Structure:
User Request → Triage Agent → [Determines Category] → Specialist Agent
Example Implementation:
# Triage agent with handoff capabilities
triage_agent = Agent(
    name="Customer Service Triage",
    instructions="Analyze customer requests and route to appropriate specialist",
    functions=[analyze_request],
    handoffs=[refund_agent, sales_agent, support_agent]
)
When to Use:
Use Case: Multi-step workflows where each step has a specialist
Structure:
Step 1 Agent → [Complete] → Handoff → Step 2 Agent → ... → Final Agent
Example:
# Research → Analysis → Report Generation pipeline
research_agent → analysis_agent → report_agent
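The pipeline above can be sketched in plain Python, with each stage a stand-in function (hypothetical stubs, not an AgentKit API) that receives the shared state and hands its output to the next stage:

```python
# Minimal sketch of a sequential handoff pipeline. Each "agent" here is a
# stub function; in a real system each would wrap an Agents SDK agent call.
def research_agent(state):
    state["findings"] = f"findings for: {state['topic']}"
    return state

def analysis_agent(state):
    state["analysis"] = f"analysis of {state['findings']}"
    return state

def report_agent(state):
    state["report"] = f"report based on {state['analysis']}"
    return state

def run_pipeline(topic):
    state = {"topic": topic}
    for agent in (research_agent, analysis_agent, report_agent):
        state = agent(state)  # each step hands its output to the next
    return state
```

Because every stage reads and writes the same state dict, adding or reordering specialists is a one-line change to the tuple.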
When to Use:
Use Case: Breaking complex tasks into parallel subtasks
Structure:
Coordinator Agent
↓
├─→ Subtask Agent 1
├─→ Subtask Agent 2
└─→ Subtask Agent 3
↓
Synthesis Agent (combines results)
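A minimal sketch of this fan-out/fan-in flow, using a thread pool in place of real agent calls (the worker functions are illustrative stubs):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the orchestrator-worker pattern: the coordinator fans the task
# out to subtask "agents" (plain functions standing in for real agent calls),
# then a synthesis step merges the partial results in order.
def make_subtask_agent(label):
    def agent(task):
        return f"{label} result for {task}"
    return agent

def orchestrate(task, workers):
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        # pool.map preserves worker order, so synthesis is deterministic
        partials = list(pool.map(lambda w: w(task), workers))
    # Synthesis agent: combine the partial results into one answer
    return " | ".join(partials)
```

In production the synthesis step would itself be an agent that reconciles conflicting partial results rather than a simple join.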
When to Use:
DO:
✅ Keep agents focused on single responsibilities
✅ Provide clear, specific instructions in system prompts
✅ Define explicit handoff conditions
✅ Use descriptive agent names (helps with debugging)
✅ Test agents in isolation before integration

DON'T:
❌ Create monolithic "do-everything" agents
❌ Allow agents to communicate directly (use handoffs)
❌ Over-engineer with too many specialized agents
❌ Ignore error handling in handoffs
❌ Skip agent boundary testing
Effective Routines:
Example:
routine = {
    "name": "Process Refund",
    "entry": "User requests refund",
    "steps": [
        "Verify order exists",
        "Check refund eligibility",
        "Calculate refund amount",
        "Process payment reversal",
        "Send confirmation"
    ],
    "exit": "Refund confirmed or rejection reason provided",
    "error_handling": "Escalate to human agent if verification fails"
}
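A routine declared as data like this can be driven by a small executor that walks the steps in order and applies the routine's declared error-handling rule on the first failure. The step implementations below are hypothetical stubs; a real system would call tools or sub-agents:

```python
# Sketch of a routine executor: runs each declared step and escalates per the
# routine's error_handling rule if a step reports failure.
def execute_routine(routine, step_impls):
    for step in routine["steps"]:
        if not step_impls[step]():  # run the step; False means failure
            return {"status": "escalated", "reason": routine["error_handling"]}
    return {"status": "complete", "exit": routine["exit"]}

refund_routine = {
    "steps": ["Verify order exists", "Check refund eligibility"],
    "exit": "Refund confirmed or rejection reason provided",
    "error_handling": "Escalate to human agent if verification fails",
}
stubs = {
    "Verify order exists": lambda: True,
    "Check refund eligibility": lambda: True,
}
```

Keeping the routine as data means the entry, exit, and escalation behavior stay inspectable without reading agent code.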
Critical Elements:
Example:
def handoff_condition(state):
    """Determine if handoff needed"""
    if state.requires_specialized_knowledge:
        return specialist_agent
    if state.exceeds_authority_level:
        return supervisor_agent
    return None  # Continue with current agent
Principle: Frameworks that limit LLM involvement and rely on predefined or direct execution flows operate more efficiently.
Strategies:
Principle: Give agents only the tools they need for their specialty.
Pattern:
# Specialized agents get targeted toolsets
refund_agent.tools = [verify_order, calculate_refund, process_payment]
sales_agent.tools = [check_inventory, create_quote, process_order]
# NOT: both agents get all 6 tools
Problem: Too many agents for simple tasks creates overhead
Solution: Start simple, add agents only when complexity demands it
Problem: Agent A → Agent B → Agent A creates loops
Solution: Design clear hierarchy or state-based termination
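One way to sketch state-based termination: cap the hop budget and escalate when an agent would be revisited. The agent names, routing callback, and limit below are illustrative assumptions, not an AgentKit API:

```python
# Sketch of a loop guard for handoffs: refuse to revisit an agent and cap the
# total number of hops, escalating to a human in either case.
MAX_HANDOFFS = 5

def run_with_loop_guard(start_agent, request, route):
    visited, current = [], start_agent
    while len(visited) < MAX_HANDOFFS:
        visited.append(current)
        nxt = route(current, request)  # returns next agent name or None
        if nxt is None:
            return current, visited  # current agent handles the task
        if nxt in visited:
            return "human_escalation", visited  # loop detected -> escalate
        current = nxt
    return "human_escalation", visited  # hop budget exhausted
```

Recording the visited list also gives you a ready-made handoff trace for debugging.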
Problem: Agents lose context across handoffs
Solution: Implement proper state management and context passing
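A sketch of explicit context passing: a single context object travels with the conversation so facts extracted by one agent survive the handoff. The field names are illustrative, not an AgentKit API:

```python
from dataclasses import dataclass, field

# Shared context that travels across handoffs instead of being rebuilt
# per agent; history doubles as an audit trail of who handed off to whom.
@dataclass
class HandoffContext:
    request: str
    history: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)

    def record_handoff(self, src, dst, reason):
        self.history.append({"from": src, "to": dst, "reason": reason})

ctx = HandoffContext(request="I want a refund for order 123")
ctx.facts["order_id"] = "123"  # triage agent extracts this once
ctx.record_handoff("triage", "refund", "refund request detected")
# The refund agent now sees the order id without re-asking the user.
```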
Problem: Overlapping agent responsibilities cause conflicts
Solution: Define explicit agent domains and decision criteria
Agent-Level:
System-Level:
Unit Testing:
# Test individual agent behaviors
def test_refund_agent():
    result = refund_agent.process(valid_refund_request)
    assert result.status == "approved"
    assert result.amount > 0
Integration Testing:
# Test agent handoffs
def test_triage_to_refund():
    initial_state = {"request": "I want a refund"}
    final_state = orchestrator.run(initial_state)
    assert final_state.handling_agent == "refund_agent"
    assert final_state.completed is True
End-to-End Testing:
# Test full user journeys
def test_customer_journey():
    scenarios = load_test_scenarios()
    for scenario in scenarios:
        result = system.execute(scenario)
        assert result.meets_requirements()
Essential Observability:
Tools:
Agent Security:
Data Protection:
Horizontal Scaling:
Vertical Optimization:
from openai import OpenAI
client = OpenAI()
# Define specialized agent
support_agent = {
    "name": "Technical Support Agent",
    "model": "gpt-4o",
    "instructions": """You are a technical support specialist.
Help users troubleshoot technical issues.
If issue requires refund, hand off to refund agent.
If issue is sales-related, hand off to sales agent.""",
    "tools": [
        {"type": "function", "function": troubleshooting_guide},
        {"type": "function", "function": escalate_to_human}
    ]
}
def execute_agent_workflow(initial_request):
    current_agent = triage_agent
    context = {"request": initial_request, "history": []}
    while not is_complete(context):
        # Execute current agent (agents are plain dicts, so use key access)
        response = client.chat.completions.create(
            model=current_agent["model"],
            messages=build_messages(context, current_agent),
            tools=current_agent["tools"]
        )
        # Check for handoff
        next_agent = determine_handoff(response)
        if next_agent:
            context["history"].append({
                "from": current_agent["name"],
                "to": next_agent["name"]
            })
            current_agent = next_agent
        else:
            context["result"] = response
            break
    return context
Use AgentKit for OpenAI-based workflows, Claude SDK for Anthropic-based workflows, and MCP to bridge data sources to both.
LangGraph provides more fine-grained control flow. Use AgentKit for simpler workflows, LangGraph for complex state machines.
AgentKit agents can consume MCP servers as tools, standardizing data source connections.
Key Changes:
Replace swarm.run() with Agents SDK orchestration.
Timeline: Swarm is maintenance-only. Migrate all production code by Q2 2025.
Use OpenAI AgentKit when:
Consider alternatives when:
Official Documentation:
Community:
This skill ensures you build robust, scalable, production-ready multi-agent systems using OpenAI's latest platform capabilities in 2025.
Weekly Installs: –
GitHub Stars: 5
First Seen: –