videoagent-audio-studio by pexoai/pexo-skills
    npx skills add https://github.com/pexoai/pexo-skills --skill videoagent-audio-studio
Use when: User asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.
VideoAgent Audio Studio is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech, sound effects, and voice cloning; fal.ai for music), and returns a ready-to-use audio URL.
| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | elevenlabs-tts-v3 | ~3s |
| Low-latency TTS (real-time) | elevenlabs-tts-turbo | <1s |
| Background music | cassetteai-music | ~15s |
| Sound effect | elevenlabs-sfx | ~5s |
| Clone a voice from audio | elevenlabs-voice-clone | ~10s |
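The routing table above can be read as a simple lookup from request type to model. The following is a minimal sketch of that idea, not the skill's actual dispatcher: the keyword lists, the `Route` class, and `pick_route` are hypothetical names introduced for illustration, and only the model names and latency figures come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str         # model name taken from the routing table
    latency_hint: str  # approximate latency quoted in the table


# Keyword lists are illustrative assumptions, not the skill's real classifier.
# More specific request types are checked first; plain narration is the fallback.
ROUTES = [
    (("clone",), Route("elevenlabs-voice-clone", "~10s")),
    (("music", "soundtrack", "compose"), Route("cassetteai-music", "~15s")),
    (("sound effect", "sfx"), Route("elevenlabs-sfx", "~5s")),
    (("real-time", "realtime", "low latency", "low-latency"),
     Route("elevenlabs-tts-turbo", "<1s")),
]
DEFAULT_ROUTE = Route("elevenlabs-tts-v3", "~3s")


def pick_route(request: str) -> Route:
    """Return the first route whose keywords appear in the request text."""
    text = request.lower()
    for keywords, route in ROUTES:
        if any(keyword in text for keyword in keywords):
            return route
    return DEFAULT_ROUTE
```

Under these assumptions, a request that mentions "background music" resolves to cassetteai-music, and anything that matches no keyword falls back to the narration model.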
    bash {baseDir}/tools/start_server.sh
This starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation.
Analyze the user's request and call the appropriate tool via the MCP server:
Text-to-Speech (TTS)
When the user asks to "narrate", "read aloud", "say", or "create a voice-over":
Use MCP tool: text_to_speech
  text: "<the text to narrate>"
  voice_id: "JBFqnCBsd6RMkjVDRZzb" # Default: "George" (professional, neutral)
  model_id: "eleven_multilingual_v2" # Use "eleven_turbo_v2_5" for low latency
Music Generation
When the user asks to "compose", "create background music", or "make a soundtrack":
Use MCP tool: text_to_sound_effects (via cassetteai-music on fal.ai)
  prompt: "<music description, e.g. 'upbeat lo-fi hip hop, 90 seconds'>"
  duration_seconds: <duration>
Sound Effect (SFX)
When the user asks for a specific sound (e.g., "a door creaking", "rain on a window"):
Use MCP tool: text_to_sound_effects
  text: "<sound description>"
  duration_seconds: <1-22>
Voice Cloning
When the user provides an audio sample and wants to clone the voice:
Use MCP tool: voice_add
  name: "<voice name>"
  files: ["<audio_file_url>"]
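The parameter blocks above can be assembled as plain dictionaries before they are handed to the MCP tools. This is a rough sketch under stated assumptions: `tts_args` and `sfx_args` are hypothetical helpers, and clamping the duration (rather than rejecting it) is a design choice made here, not documented behavior. The voice ID, model IDs, and the 1 to 22 second range come from the text above.

```python
DEFAULT_VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # "George", the default voice listed above
SFX_MIN_SECONDS, SFX_MAX_SECONDS = 1, 22   # duration range given for sound effects


def tts_args(text: str, low_latency: bool = False) -> dict:
    """Build arguments for the text_to_speech tool (hypothetical helper)."""
    model_id = "eleven_turbo_v2_5" if low_latency else "eleven_multilingual_v2"
    return {"text": text, "voice_id": DEFAULT_VOICE_ID, "model_id": model_id}


def sfx_args(description: str, duration_seconds: float) -> dict:
    """Build arguments for text_to_sound_effects, clamping duration to 1-22 s."""
    clamped = max(SFX_MIN_SECONDS, min(SFX_MAX_SECONDS, duration_seconds))
    return {"text": description, "duration_seconds": clamped}
```

Clamping keeps an over-long request such as 60 seconds from failing outright. Raising an error instead would be equally reasonable if silent truncation is undesirable.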
User: "Voice this text for me: Welcome to our product launch"
→ Route to: text_to_speech
  text: "Welcome to our product launch"
  voice_id: "JBFqnCBsd6RMkjVDRZzb"
  model_id: "eleven_multilingual_v2"
🎙️ Voiceover done! Listen here

User: "Generate 60 seconds of relaxing background music for a podcast"
→ Route to: cassetteai-music (fal.ai)
  prompt: "relaxing lo-fi background music for a podcast, gentle piano and soft beats, 60 seconds"
  duration_seconds: 60
🎵 Background music ready! Listen here

User: "Generate a sci-fi style door opening sound effect"
→ Route to: text_to_sound_effects
  text: "a futuristic sci-fi door sliding open with a hydraulic hiss"
  duration_seconds: 3
Set ELEVENLABS_API_KEY in ~/.openclaw/openclaw.json:
    {
      "skills": {
        "entries": {
          "videoagent-audio-studio": {
            "enabled": true,
            "env": {
              "ELEVENLABS_API_KEY": "your_elevenlabs_key_here"
            }
          }
        }
      }
    }
Get your key at elevenlabs.io/app/settings/api-keys.
To enable music generation, add FAL_KEY to the same env block:
    "FAL_KEY": "your_fal_key_here"
Get your key at fal.ai/dashboard/keys.
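If you generate the configuration from a script, the same structure can be built programmatically. The sketch below is illustrative only: `build_skill_config` is a hypothetical helper, and it prints the entry rather than writing to ~/.openclaw/openclaw.json so that an existing file is never overwritten.

```python
import json
from typing import Optional

SKILL_NAME = "videoagent-audio-studio"


def build_skill_config(elevenlabs_key: str, fal_key: Optional[str] = None) -> dict:
    """Return the skills entry shown above; FAL_KEY is included only if given."""
    if not elevenlabs_key:
        raise ValueError("ELEVENLABS_API_KEY is required")
    env = {"ELEVENLABS_API_KEY": elevenlabs_key}
    if fal_key:
        env["FAL_KEY"] = fal_key  # only needed for music generation
    return {"skills": {"entries": {SKILL_NAME: {"enabled": True, "env": env}}}}


if __name__ == "__main__":
    # Print the JSON so it can be reviewed and merged into openclaw.json by hand.
    print(json.dumps(build_skill_config("your_elevenlabs_key_here"), indent=2))
```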
cli.js connects to a hosted proxy by default. If you want full control, or need to serve users in regions where vercel.app is blocked, you can deploy your own instance from the proxy/ directory.
    cd proxy
    npm install
    vercel --prod
Set these in your Vercel project (Dashboard → Settings → Environment Variables):
| Variable | Required For | Where to Get |
|---|---|---|
| ELEVENLABS_API_KEY | TTS, SFX, Voice Clone | elevenlabs.io/app/settings/api-keys |
| FAL_KEY | Music generation | fal.ai/dashboard/keys |
| VALID_PRO_KEYS | (Optional) Restrict access | Comma-separated list of allowed client keys |
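VALID_PRO_KEYS is described as a comma-separated list of allowed client keys. The source does not show how the proxy checks it, so the following is only a sketch of one plausible interpretation. `parse_valid_keys` and `is_allowed` are hypothetical names, and treating an unset variable as "no restriction" is an assumption based on the variable being optional.

```python
import hmac
from typing import Optional


def parse_valid_keys(raw: Optional[str]) -> frozenset:
    """Split a comma-separated key list, ignoring whitespace and empty items."""
    if not raw:
        return frozenset()
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def is_allowed(client_key: str, valid_keys: frozenset) -> bool:
    """Check a client key against the allow-list.

    Assumption: an empty allow-list means access is unrestricted.
    """
    if not valid_keys:
        return True
    # compare_digest avoids leaking how many leading characters matched.
    return any(
        hmac.compare_digest(client_key.encode(), key.encode())
        for key in valid_keys
    )
```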
Point the skill at your deployment by setting AUDIOMIND_PROXY_URL:
    export AUDIOMIND_PROXY_URL="https://your-domain.com/api/audio"
Or set it in ~/.openclaw/openclaw.json:
    {
      "skills": {
        "entries": {
          "videoagent-audio-studio": {
            "env": {
              "AUDIOMIND_PROXY_URL": "https://your-domain.com/api/audio"
            }
          }
        }
      }
    }
If your users are in mainland China, bind a custom domain in Vercel Dashboard → Settings → Domains to avoid DNS issues with vercel.app.
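The two ways of setting AUDIOMIND_PROXY_URL suggest a lookup order along the lines of the sketch below. The precedence (shell environment first, then openclaw.json, then the hosted default) is an assumption; the source does not state how cli.js resolves it. `resolve_proxy_url` is a hypothetical helper, and the hosted default URL is passed in as a parameter because the source does not name it.

```python
from typing import Mapping

SKILL_NAME = "videoagent-audio-studio"
ENV_VAR = "AUDIOMIND_PROXY_URL"


def resolve_proxy_url(environ: Mapping[str, str], config: Mapping, default_url: str) -> str:
    """Pick the proxy URL: environment variable, then config file, then default.

    `config` is the parsed content of openclaw.json (an empty dict if absent).
    """
    from_env = environ.get(ENV_VAR)
    if from_env:
        return from_env
    skill_env = (
        config.get("skills", {})
        .get("entries", {})
        .get(SKILL_NAME, {})
        .get("env", {})
    )
    return skill_env.get(ENV_VAR) or default_url
```

In practice you would call it with `os.environ` and the result of `json.load` on the config file.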
| Model ID | Type | Provider | Notes |
|---|---|---|---|
| eleven_multilingual_v2 | TTS | ElevenLabs | Best quality, supports 29 languages |
| eleven_turbo_v2_5 | TTS | ElevenLabs | Ultra-low latency, ideal for real-time |
| eleven_monolingual_v1 | TTS | ElevenLabs | English only, fastest |
| cassetteai-music | Music | fal.ai | Reliable, fast music generation |
| elevenlabs-sfx | SFX | ElevenLabs | High-quality sound effects (up to 22s) |
| elevenlabs-voice-clone | Clone | ElevenLabs | Clone any voice from a short audio sample |

ELEVENLABS_API_KEY is all you need to get started. FAL_KEY is now optional.
Music generation uses cassetteai-music by default, which completes synchronously.
cassetteai-music is offered as a stable alternative for music generation.

Weekly Installs: 3.3K
Repository: pexoai/pexo-skills
GitHub Stars: 358
First Seen: Mar 6, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: openclaw 2.4K, claude-code 2.3K, cursor 1.1K, gemini-cli 1.1K, kimi-cli 1.1K, codex 1.1K