elevenlabs by digitalsamba/claude-code-video-toolkit
npx skills add https://github.com/digitalsamba/claude-code-video-toolkit --skill elevenlabs
Requires ELEVENLABS_API_KEY in .env.
from elevenlabs.client import ElevenLabs
from elevenlabs import save, VoiceSettings
import os
client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
audio = client.text_to_speech.convert(
text="Welcome to my video!",
voice_id="JBFqnCBsd6RMkjVDRZzb",
model_id="eleven_multilingual_v2",
voice_settings=VoiceSettings(
stability=0.5,
similarity_boost=0.75,
style=0.5,
speed=1.0
)
)
save(audio, "voiceover.mp3")
| Model | Quality | SSML Support | Notes |
|---|---|---|---|
| eleven_multilingual_v2 | Highest consistency | None | Stable, production-ready, 29 languages |
| eleven_flash_v2_5 | Good | `<break>`, `<phoneme>` | Fast, supports pause/pronunciation tags |
| eleven_turbo_v2_5 | Good | `<break>`, `<phoneme>` | Lowest latency |
| eleven_v3 | Most expressive | None | Alpha — unreliable, needs prompt engineering |
Choose: multilingual_v2 for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).
| Style | stability | similarity_boost | style | speed |
|---|---|---|---|---|
| Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
| Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
| Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
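These presets map directly onto VoiceSettings. A minimal sketch using midpoints of the ranges above (the preset names are mine; client is the ElevenLabs client from the first example):
from elevenlabs import VoiceSettings
# Midpoints of the table ranges; tune to taste
PRESETS = {
    "natural": VoiceSettings(stability=0.8, similarity_boost=0.9, style=0.05, speed=1.0),
    "conversational": VoiceSettings(stability=0.55, similarity_boost=0.85, style=0.35, speed=0.95),
    "energetic": VoiceSettings(stability=0.4, similarity_boost=0.75, style=0.6, speed=1.05),
}
audio = client.text_to_speech.convert(
    text="Welcome to my video!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    voice_settings=PRESETS["conversational"],
)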
With flash/turbo models: Use SSML break tags inline:
...end of section. <break time="1.5s" /> Start of next...
Max 3 seconds per break. Excessive breaks can cause speed artifacts.
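For instance, a sketch of a convert call with an inline break (same client and voice as above; the tag is only honored by flash/turbo):
audio = client.text_to_speech.convert(
    text='End of section. <break time="1.5s" /> Start of next.',
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # break tags require flash/turbo
)
save(audio, "with_pauses.mp3")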
With multilingual_v2 / v3: No SSML support. Options: lean on sentence punctuation for pacing, or generate passages separately and splice in silence during editing (a sketch follows the warning below).
WARNING: ... (ellipsis) is NOT a reliable pause — it can be vocalized as a word/sound. Do not use ellipsis as a pause mechanism.
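One workable substitute is to generate each passage as its own clip and splice in real silence afterwards. A sketch using pydub, which is my assumption here and not part of the toolkit:
from pydub import AudioSegment  # pip install pydub (needs ffmpeg)
# Assumes part1.mp3 / part2.mp3 came from separate convert() calls
part1 = AudioSegment.from_mp3("part1.mp3")
part2 = AudioSegment.from_mp3("part2.mp3")
pause = AudioSegment.silent(duration=1500)  # 1.5 s of true silence
(part1 + pause + part2).export("joined.mp3", format="mp3")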
Phonetic spelling (any model): Write words as you want them pronounced:
- Janus → Jan-us
- nginx → engine-x
SSML phoneme tags (flash/turbo only):
<phoneme alphabet="ipa" ph="ˈdʒeɪnəs">Janus</phoneme>
Instant Voice Clone (IVC): clone a voice from a short sample:
with open("sample.mp3", "rb") as f:
voice = client.voices.ivc.create(
name="My Voice",
files=[f],
remove_background_noise=True
)
print(f"Voice ID: {voice.voice_id}")
Gotchas:
- Use client.voices.ivc.create() (not client.voices.clone())
- Pass file handles opened in binary mode ("rb"), not paths
- Convert other formats to MP3 first: ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3
- Professional Voice Clone: Requires Creator plan+, 30+ min audio. See reference.md.
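Once created, the returned voice_id drops into the same TTS call as any stock voice (a sketch reusing the earlier convert pattern):
audio = client.text_to_speech.convert(
    text="This is my cloned voice.",
    voice_id=voice.voice_id,  # from the ivc.create() call above
    model_id="eleven_multilingual_v2",
)
save(audio, "clone_test.mp3")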
Sound effects: max 22 seconds per generation.
result = client.text_to_sound_effects.convert(
text="Thunder rumbling followed by heavy rain",
duration_seconds=10,
prompt_influence=0.3
)
with open("thunder.mp3", "wb") as f:
for chunk in result:
f.write(chunk)
Prompt tips: Be specific — "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"
Music: 10 seconds to 5 minutes. Use client.music.compose() (not .generate()).
result = client.music.compose(
prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
music_length_ms=60000,
force_instrumental=True
)
with open("music.mp3", "wb") as f:
for chunk in result:
f.write(chunk)
Prompt structure: Genre, mood, instruments, tempo, use case. Add "no vocals" or use force_instrumental=True for background music.
VOICEOVER-SCRIPT.md → voiceover.py → public/audio/ → Remotion composition

- VOICEOVER-SCRIPT.md: scene narration with durations
- voiceover.py: generates one MP3 per scene
- public/audio/: audio files with timing info
- Remotion composition: <Audio> components synced to scenes
Use the toolkit's voiceover tool to generate audio for each scene:
# Generate voiceover files for each scene
python tools/voiceover.py --scene-dir public/audio/scenes --json
# Output:
# public/audio/scenes/
# ├── scene-01-title.mp3
# ├── scene-02-problem.mp3
# ├── scene-03-solution.mp3
# └── manifest.json (durations for each file)
The manifest.json contains timing info:
{
"scenes": [
{ "file": "scene-01-title.mp3", "duration": 4.2 },
{ "file": "scene-02-problem.mp3", "duration": 12.8 },
{ "file": "scene-03-solution.mp3", "duration": 15.3 }
],
"totalDuration": 32.3
}
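If you ever need to rebuild the manifest by hand, MP3 durations can be read with mutagen; a sketch under the assumption that tools/voiceover.py writes this same shape:
import json
from pathlib import Path
from mutagen.mp3 import MP3  # pip install mutagen
scene_dir = Path("public/audio/scenes")
scenes = [
    {"file": f.name, "duration": round(MP3(f).info.length, 1)}
    for f in sorted(scene_dir.glob("scene-*.mp3"))
]
manifest = {"scenes": scenes,
            "totalDuration": round(sum(s["duration"] for s in scenes), 1)}
(scene_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))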
// src/Composition.tsx
import { Audio, staticFile, Series } from 'remotion';
// Import scene components
import { TitleSlide } from './scenes/TitleSlide';
import { ProblemSlide } from './scenes/ProblemSlide';
import { SolutionSlide } from './scenes/SolutionSlide';
// Scene durations (from manifest.json, converted to frames at 30fps)
const SCENE_DURATIONS = {
title: Math.ceil(4.2 * 30), // 126 frames
problem: Math.ceil(12.8 * 30), // 384 frames
solution: Math.ceil(15.3 * 30), // 459 frames
};
export const MainComposition: React.FC = () => {
return (
<>
{/* Scene sequence */}
<Series>
<Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
<TitleSlide />
</Series.Sequence>
<Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
<ProblemSlide />
</Series.Sequence>
<Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
<SolutionSlide />
</Series.Sequence>
</Series>
{/* Audio track - plays continuously across all scenes */}
<Audio src={staticFile('audio/voiceover.mp3')} volume={1} />
{/* Optional: Background music at lower volume */}
<Audio src={staticFile('audio/music.mp3')} volume={0.15} />
</>
);
};
For more control, add audio to each scene individually:
// src/scenes/ProblemSlide.tsx
import { Audio, staticFile } from 'remotion';
export const ProblemSlide: React.FC = () => {
return (
<div style={{ /* slide styles */ }}>
<h1>The Problem</h1>
{/* Scene content */}
{/* Audio starts when this scene starts (frame 0 of this sequence) */}
<Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
</div>
);
};
Calculate scene duration from audio, not the other way around:
// src/config/timing.ts
import manifest from '../../public/audio/scenes/manifest.json';
const FPS = 30;
// Convert audio durations to frame counts
export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
acc[name] = Math.ceil(scene.duration * FPS);
return acc;
}, {} as Record<string, number>);
// Usage in composition:
// <Series.Sequence durationInFrames={sceneDurations.title}>
import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';
// Fade in audio
export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
src,
fadeFrames = 30
}) => {
const frame = useCurrentFrame();
const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
extrapolateRight: 'clamp',
});
return <Audio src={src} volume={volume} />;
};
// Delayed audio start
export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
src,
delayFrames
}) => (
<Sequence from={delayFrames}>
<Audio src={src} />
</Sequence>
);
// Usage:
// <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
// <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
When a scene has both voiceover and demo video:
import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';
export const DemoScene: React.FC = () => {
const { durationInFrames, fps } = useVideoConfig();
// Calculate playback rate to fit demo into voiceover duration
const demoDuration = 45; // seconds (original demo length)
const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
const playbackRate = demoDuration / sceneDuration;
return (
<>
<OffthreadVideo
src={staticFile('demos/feature-demo.mp4')}
playbackRate={playbackRate}
/>
<Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
</>
);
};
To keep a render from hanging on a missing or slow audio file, gate <Audio> behind delayRender / continueRender:
import { Audio, delayRender, continueRender } from 'remotion';
import { useEffect, useState } from 'react';
export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
const [handle] = useState(() => delayRender());
const [audioReady, setAudioReady] = useState(false);
useEffect(() => {
const audio = new window.Audio(src);
audio.oncanplaythrough = () => {
setAudioReady(true);
continueRender(handle);
};
audio.onerror = () => {
console.error(`Failed to load audio: ${src}`);
continueRender(handle); // Continue without audio rather than hang
};
}, [src, handle]);
if (!audioReady) return null;
return <Audio src={src} />;
};
The /generate-voiceover command handles the full workflow:
/generate-voiceover
1. Reads VOICEOVER-SCRIPT.md
2. Extracts narration for each scene
3. Generates audio via ElevenLabs API
4. Saves to public/audio/scenes/
5. Creates manifest.json with durations
6. Updates project.json with timing info
Recommended voice IDs:
- JBFqnCBsd6RMkjVDRZzb (warm narrator)
- 21m00Tcm4TlvDq8ikWAM (clear female)
- pNInz6obpgDQGcFmaJgB (professional male)
List all: client.voices.get_all()
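To browse other voices on your account, a quick sketch (assuming the response exposes a .voices list, as in the current SDK):
for v in client.voices.get_all().voices:
    print(v.voice_id, v.name)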
For full API docs, see reference.md.
Weekly Installs: 71
Repository: github.com/digitalsamba/claude-code-video-toolkit
GitHub Stars: 108
First Seen: Jan 23, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (56), claude-code (56), codex (54), cursor (52), gemini-cli (51), github-copilot (48)