LiveKit Agents Development Guide: Building Voice AI Agents with LiveKit Cloud

livekit-agents by livekit/agent-skills

843 weekly installs

34 GitHub Stars

Install command

npx skills add https://github.com/livekit/agent-skills --skill livekit-agents

AI/Machine Learning, Cloud Services, Audio Processing


LiveKit Agents Development for LiveKit Cloud

This skill provides opinionated guidance for building voice AI agents with LiveKit Cloud. It assumes you are using LiveKit Cloud (the recommended path) and encodes how to approach agent development, not API specifics. All factual information about APIs, methods, and configurations must come from live documentation.

This skill is for LiveKit Cloud developers. If you're self-hosting LiveKit, some recommendations (particularly around LiveKit Inference) won't apply directly.

MANDATORY: Read This Checklist Before Starting

Before writing ANY code, complete this checklist:

Read this entire skill document - Do not skip sections even if MCP is available
Ensure LiveKit Cloud project is connected - You need LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET from your Cloud project
Set up documentation access - Use MCP if available, otherwise use web search
Plan to write tests - Every agent implementation MUST include tests (see testing section below)
Verify all APIs against live docs - Never rely on model memory for LiveKit APIs

This checklist applies regardless of whether MCP is available. MCP provides documentation access but does NOT replace the guidance in this skill.

LiveKit Cloud Setup

LiveKit Cloud is the fastest way to get a voice agent running. It provides:

Managed infrastructure (no servers to deploy)
LiveKit Inference for AI models (no separate API keys needed)
Built-in noise cancellation, turn detection, and other voice features
Simple credential management

Connect to Your Cloud Project

Sign up at cloud.livekit.io if you haven't already
Create a project (or use an existing one)
Get your credentials from the project settings:
- LIVEKIT_URL - Your project's WebSocket URL (e.g., wss://your-project.livekit.cloud)
- LIVEKIT_API_KEY - API key for authentication
- LIVEKIT_API_SECRET - API secret for authentication
Set these as environment variables (typically in .env.local):

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
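A startup check can catch missing credentials before the agent tries to connect. The following is a minimal sketch using only the Python standard library; the helper names `missing_livekit_vars` and `check_livekit_env` are hypothetical and not part of any LiveKit SDK. It assumes the variables have already been loaded into the process environment (for example by your shell or a dotenv loader), and accepting `ws://` for local development is also an assumption.

```python
import os

# The three variables named in the setup steps above.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def check_livekit_env(env=None):
    """Fail early with a readable message instead of at connect time."""
    env = os.environ if env is None else env
    missing = missing_livekit_vars(env)
    if missing:
        raise RuntimeError("Missing LiveKit credentials: " + ", ".join(missing))
    url = env["LIVEKIT_URL"]
    # Assumption: the project URL is a WebSocket URL (wss://, or ws:// locally).
    if not url.startswith(("wss://", "ws://")):
        raise RuntimeError("LIVEKIT_URL should be a WebSocket URL, got: " + url)
```

Calling `check_livekit_env()` at the top of the agent entry point turns a confusing connection failure into an explicit configuration error.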

The LiveKit CLI can automate credential setup. Consult the CLI documentation for current commands.

Use LiveKit Inference for AI Models

LiveKit Inference is the recommended way to use AI models with LiveKit Cloud. It provides access to leading AI model providers—all through your LiveKit credentials with no separate API keys needed.

Benefits of LiveKit Inference:

No separate API keys to manage for each AI provider
Billing consolidated through your LiveKit Cloud account
Optimized for voice AI workloads

Consult the documentation for available models, supported providers, and current usage patterns. The documentation always has the most up-to-date information.

Critical Rule: Never Trust Model Memory for LiveKit APIs

LiveKit Agents is a fast-evolving SDK. Model training data is outdated the moment it's created. When working with LiveKit:

Never assume API signatures, method names, or configuration options from memory
Never guess SDK behavior or default values
Always verify against live documentation before writing code
Always cite the documentation source when implementing features

This rule applies even when confident about an API. Verify anyway.

REQUIRED: Use LiveKit MCP Server for Documentation

Before writing any LiveKit code, ensure access to the LiveKit documentation MCP server. This provides current, verified API information and prevents reliance on stale model knowledge.

Check for MCP Availability

Look for livekit-docs MCP tools. If available, use them for all documentation lookups:

Search documentation before implementing any feature
Verify API signatures and method parameters
Look up configuration options and their valid values
Find working examples for the specific task at hand

If MCP Is Not Available

If the LiveKit MCP server is not configured, inform the user and recommend installation. Installation instructions for all supported platforms are available at:

https://docs.livekit.io/intro/mcp-server/

Fetch the installation instructions appropriate for the user's coding agent from that page.

Fallback When MCP Unavailable

If MCP cannot be installed in the current session:

Inform the user immediately that documentation cannot be verified in real-time
Use web search to fetch current documentation from docs.livekit.io
Explicitly mark all LiveKit-specific code with a comment like # UNVERIFIED: Please check docs.livekit.io for current API
State clearly when you cannot verify something: "I cannot verify this API signature against current documentation"
Recommend the user verify against https://docs.livekit.io before using the code
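The marking step in the fallback list can be made mechanical. This is a small illustrative sketch, not a LiveKit utility: the helper name `mark_unverified` is hypothetical, and the comment text is the one quoted in the list above.

```python
UNVERIFIED_NOTE = "# UNVERIFIED: Please check docs.livekit.io for current API"


def mark_unverified(snippet: str) -> str:
    """Prefix a generated snippet with the UNVERIFIED comment, at most once."""
    if snippet.lstrip().startswith(UNVERIFIED_NOTE):
        return snippet  # already marked
    return UNVERIFIED_NOTE + "\n" + snippet
```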

Voice Agent Architecture Principles

Voice AI agents have fundamentally different requirements than text-based agents or traditional software. Internalize these principles:

Latency Is Critical

Voice conversations are real-time. Users expect responses within hundreds of milliseconds, not seconds. Every architectural decision should consider latency impact:

Minimize LLM context size to reduce inference time
Avoid unnecessary tool calls during active conversation
Prefer streaming responses over batch responses
Design for the unhappy path (network delays, API timeouts)
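Designing for the unhappy path usually means bounding how long the conversation waits on a slow dependency. The sketch below uses only `asyncio` from the standard library; `call_with_fallback`, `slow_lookup`, and `fast_lookup` are hypothetical names, and the timeout values are arbitrary placeholders rather than recommendations.

```python
import asyncio


async def call_with_fallback(coro_factory, timeout_s, fallback):
    """Await coro_factory(), but return `fallback` if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def slow_lookup():
    await asyncio.sleep(0.2)  # stands in for a slow external API
    return "order shipped"


async def fast_lookup():
    return "order shipped"


def demo():
    """Run one slow and one fast call under the same 50 ms budget."""
    filler = "Still checking on that."
    slow = asyncio.run(call_with_fallback(slow_lookup, 0.05, filler))
    fast = asyncio.run(call_with_fallback(fast_lookup, 0.05, filler))
    return slow, fast
```

In a voice agent the fallback would typically be a short spoken acknowledgement, so the user hears something while the real work finishes or is retried.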

Context Bloat Kills Performance

Large system prompts and extensive tool lists directly increase latency. A voice agent with 50 tools and a 10,000-token system prompt will feel sluggish regardless of model speed.

Design agents with minimal viable context:

Include only tools relevant to the current conversation phase
Keep system prompts focused and concise
Remove tools and context that aren't actively needed
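One way to keep context minimal is to decide, per conversation phase, which tools are exposed at all. The following is a framework-agnostic sketch: the phase names and tool names are invented for illustration, and how the resulting list is passed to an agent depends on the current LiveKit Agents API, which should be checked in the documentation.

```python
# Hypothetical phases and tool names, for illustration only.
TOOLS_BY_PHASE = {
    "greeting": [],
    "intake": ["lookup_account", "verify_identity"],
    "resolution": ["issue_refund", "schedule_callback"],
}


def tools_for_phase(phase):
    """Return only the tools relevant to the given conversation phase."""
    if phase not in TOOLS_BY_PHASE:
        raise ValueError("Unknown conversation phase: " + phase)
    return list(TOOLS_BY_PHASE[phase])  # copy so callers cannot mutate the table
```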

Users Don't Read, They Listen

Voice interface constraints differ from text:

Long responses frustrate users—keep outputs concise
Users cannot scroll back—ensure clarity on first delivery
Interruptions are normal—design for graceful handling
Silence feels broken—acknowledge processing when needed
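A rough length budget can flag replies that are too long to listen to. This sketch estimates speaking time from word count; the 150 words-per-minute rate and the 10-second limit are assumptions chosen for illustration, and the function names are hypothetical.

```python
def estimated_speech_seconds(text, words_per_minute=150):
    """Estimate how long `text` takes to speak at the given rate."""
    words = len(text.split())
    return words * 60.0 / words_per_minute


def is_speakable(text, max_seconds=10.0, words_per_minute=150):
    """True if the reply fits within the spoken-length budget."""
    return estimated_speech_seconds(text, words_per_minute) <= max_seconds
```

A check like this fits naturally into a test suite as an assertion on agent replies.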

Workflow Architecture: Handoffs and Tasks

Complex voice agents should not be monolithic. LiveKit Agents supports structured workflows that maintain low latency while handling sophisticated use cases.

The Problem with Monolithic Agents

A single agent handling an entire conversation flow accumulates:

Tools for every possible action (bloated tool list)
Instructions for every conversation phase (bloated context)
State management for all scenarios (complexity)

This creates latency and reduces reliability.

Handoffs: Agent-to-Agent Transitions

Handoffs allow one agent to transfer control to another. Use handoffs to:

Separate distinct conversation phases (greeting → intake → resolution)
Isolate specialized capabilities (general support → billing specialist)
Manage context boundaries (each agent has only what it needs)

Design handoffs around natural conversation boundaries where context can be summarized rather than transferred wholesale.
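The idea of passing a summary rather than the full history can be sketched without any SDK. In the example below, `HandoffContext` and `build_handoff` are hypothetical names, and keeping the last few turns is a crude stand-in for real summarization (which might use an LLM). The actual handoff mechanism in LiveKit Agents should be taken from the documentation.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffContext:
    """What the receiving agent gets: a short summary plus confirmed facts."""
    summary: str
    facts: dict = field(default_factory=dict)


def build_handoff(transcript, facts, max_turns=2):
    """Keep confirmed facts and only the last few turns, not the whole transcript.

    `transcript` is a list of (speaker, text) tuples.
    """
    recent = transcript[-max_turns:]
    summary = " | ".join(f"{speaker}: {text}" for speaker, text in recent)
    return HandoffContext(summary=summary, facts=dict(facts))
```

The receiving agent then starts with a few lines of context instead of inheriting every earlier turn and tool result.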

Tasks: Scoped Operations

Tasks are tightly-scoped prompts designed to achieve a specific outcome. Use tasks for:

Discrete operations that don't require full agent capabilities
Situations where a focused prompt outperforms a general-purpose agent
Reducing context when only a specific capability is needed

Consult the documentation for implementation details on handoffs and tasks.

REQUIRED: Write Tests for Agent Behavior

Voice agent behavior is code. Every agent implementation MUST include tests. Shipping an agent without tests is shipping untested code.

Mandatory Testing Workflow

When building or modifying a LiveKit agent:

Create a tests/ directory if one doesn't exist
Write at least one test before considering the implementation complete
Test the core behavior the user requested
Run the tests to verify they pass

Test-Driven Development Process

When modifying agent behavior—instructions, tool descriptions, workflows—begin by writing tests for the desired behavior:

Define what the agent should do in specific scenarios
Write test cases that verify this behavior
Implement the feature
Iterate until tests pass

This approach prevents shipping agents that "seem to work" but fail in production.

What Every Agent Test Should Cover

At minimum, write tests for:

Basic conversation flow: Agent responds appropriately to a greeting
Tool invocation (if tools exist): Tools are called with correct parameters
Error handling: Agent handles unexpected input gracefully

Focus tests on:

Tool invocation: Does the agent call the right tools with correct parameters?
Response quality: Does the agent produce appropriate responses for given inputs?
Workflow transitions: Do handoffs and tasks trigger correctly?
Edge cases: How does the agent handle unexpected input, interruptions, silence?
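The shape of a tool-invocation test can be shown with a stub. This sketch does not use LiveKit's testing framework: `RecordingTools` and `fake_agent_turn` are hypothetical stand-ins (a rule-based function instead of an LLM-backed agent) so that the assertions are runnable on their own. Real tests should follow the patterns in the LiveKit testing documentation.

```python
class RecordingTools:
    """Test double that records every tool call with its arguments."""

    def __init__(self):
        self.calls = []

    def lookup_order(self, order_id):
        self.calls.append(("lookup_order", {"order_id": order_id}))
        return {"order_id": order_id, "status": "shipped"}


def fake_agent_turn(user_text, tools):
    """Hypothetical stand-in for one agent turn: rule-based, no LLM."""
    words = user_text.replace("?", "").split()
    ids = [w for w in words if w.isdigit()]
    if "order" in words and ids:
        result = tools.lookup_order(ids[0])
        return f"Order {result['order_id']} is {result['status']}."
    return "Hi! How can I help?"


def test_order_lookup_calls_tool():
    tools = RecordingTools()
    reply = fake_agent_turn("Where is order 1234?", tools)
    # Right tool, right parameters, and the result reaches the reply.
    assert tools.calls == [("lookup_order", {"order_id": "1234"})]
    assert "shipped" in reply


def test_greeting_calls_no_tools():
    tools = RecordingTools()
    reply = fake_agent_turn("hello", tools)
    assert tools.calls == []
    assert reply  # some non-empty greeting
```

The same two assertions (which tool was called, and with what arguments) carry over when the stub is replaced by a real agent session.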

Test Implementation Pattern

Use LiveKit's testing framework. Consult the testing documentation via MCP for current patterns:

search: "livekit agents testing"

The framework supports:

Simulated user input
Verification of agent responses
Tool call assertions
Workflow transition testing

Why This Is Non-Negotiable

Agents that "seem to work" in manual testing frequently fail in production:

Prompt changes silently break behavior
Tool descriptions affect when tools are called
Model updates change response patterns

Tests catch these issues before users do.

Skipping Tests

If a user explicitly requests no tests, proceed without them but inform them:

"I've built the agent without tests as requested. I strongly recommend adding tests before deploying to production. Voice agents are difficult to verify manually and tests prevent silent regressions."

Common Mistakes to Avoid

Overloading the Initial Agent

A common failure mode is starting with one agent that "does everything" and then adding tools and instructions over time. Instead, design the workflow structure upfront, even if the initial implementation is simple.

Ignoring Latency Until It's a Problem

Latency issues compound. An agent that feels "a bit slow" in development becomes unusable in production with real network conditions. Measure and optimize latency continuously.
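Continuous measurement can start with something as small as collecting per-turn latencies and reporting the median and 95th percentile. The sketch below uses only the standard library; `LatencyTracker` is a hypothetical helper, and the nearest-rank percentile is one of several reasonable definitions. What counts as a "turn" (for example, end of user speech to first agent audio) is left as an assumption for the caller.

```python
import statistics
import time


class LatencyTracker:
    """Collects latency samples in milliseconds and summarizes them."""

    def __init__(self):
        self.samples_ms = []

    def add(self, latency_ms):
        self.samples_ms.append(float(latency_ms))

    def measure(self, fn, *args, **kwargs):
        """Time a synchronous call with a monotonic clock and record it."""
        started = time.perf_counter()
        result = fn(*args, **kwargs)
        self.add((time.perf_counter() - started) * 1000.0)
        return result

    def summary(self):
        if not self.samples_ms:
            raise ValueError("no latency samples recorded")
        ordered = sorted(self.samples_ms)
        # Nearest-rank 95th percentile using integer ceiling division.
        rank = (95 * len(ordered) + 99) // 100
        return {
            "count": len(ordered),
            "median_ms": statistics.median(ordered),
            "p95_ms": ordered[rank - 1],
        }
```

Tracking the 95th percentile alongside the median matters because occasional slow turns are what users notice in a live conversation.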

Copying Examples Without Understanding

Examples in documentation demonstrate specific patterns. Copying code without understanding its purpose leads to bloated, poorly-structured agents. Understand what each component does before including it.

Skipping Tests Because "It's Just Prompts"

Agent behavior is code. Prompt changes affect behavior as much as code changes. Test agent behavior with the same rigor as traditional software. Never deliver an agent implementation without at least one test file.

Assuming Model Knowledge Is Current

Reiterating the critical rule: never trust model memory for LiveKit APIs. The SDK evolves faster than model training cycles. Verify everything.

When to Consult Documentation

Always consult documentation for:

API method signatures and parameters
Configuration options and their valid values
SDK version-specific features or changes
Deployment and infrastructure setup
Model provider integration details
CLI commands and flags

This skill provides guidance on:

Architectural approach and design principles
Workflow structure decisions
Testing strategy
Common pitfalls to avoid

The distinction matters: this skill tells you how to think about building voice agents. The documentation tells you how to implement specific features.

Feedback Loop

When using LiveKit documentation via MCP, note any gaps, outdated information, or confusing content. Reporting documentation issues helps improve the ecosystem for all developers.

Summary

To build effective voice agents with LiveKit Cloud:

Use LiveKit Cloud + LiveKit Inference as the foundation—it's the fastest path to production
Verify everything against live documentation—never trust model memory
Minimize latency at every architectural decision point
Structure workflows using handoffs and tasks to manage complexity
Test behavior before and after changes—never ship without tests
Keep context minimal: only include what's needed for the current phase

These principles remain valid regardless of SDK version or API changes. For all implementation specifics, consult the LiveKit documentation via MCP.

First Seen

Feb 9, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Warn
Snyk: Pass

Installed on

codex: 830
opencode: 826
github-copilot: 824
gemini-cli: 820
kimi-cli: 816
amp: 814
