elevenlabs-remotion by maartenlouis/elevenlabs-remotion-skill
npx skills add https://github.com/maartenlouis/elevenlabs-remotion-skill --skill elevenlabs-remotion
Generate professional AI voiceovers for Remotion videos using ElevenLabs API.
Set ELEVENLABS_API_KEY in .env.local.
# Generate voiceover from text
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text here" --output public/audio/voiceover.mp3
# Generate with narrator style (more natural)
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text" --character narrator --output voiceover.mp3
# Generate scenes with request stitching
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes remotion/scenes.json --output-dir public/audio/project/
# Regenerate a single scene
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"
# List available voices and character presets
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters
Use character presets for more natural voiceovers instead of literal screen text reading:
| Character | Description | Best For |
|---|---|---|
| literal | Reads text exactly as written | Screen text, quotes |
| narrator | Professional storyteller, smooth, engaging | Explainers, documentaries |
| salesperson | Enthusiastic, persuasive, energetic | Marketing, ads |
| expert | Authoritative, confident, knowledgeable | Legal content, tutorials |
| conversational | Casual, friendly, natural | Social media, casual content |
| dramatic | Intense, emotional, impactful | Hooks, problem statements |
| calm | Soothing, reassuring, gentle | Trust-building, conclusions |
# Use narrator style globally
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/
# Or set per-scene in scenes.json
{
  "scenes": [
    { "id": "scene1", "text": "Problem statement", "character": "dramatic" },
    { "id": "scene2", "text": "Solution", "character": "calm" }
  ]
}
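As a rough illustration of how a per-scene character could override a file-level one, the sketch below resolves the effective preset for each scene. The helper name `resolveCharacters` and the precedence order (scene value, then file-level value, then the documented `literal` default) are assumptions, not taken from the skill's source.

```javascript
// Hypothetical helper: the precedence order here is an assumption.
// "literal" is the documented default for --character.
function resolveCharacters(config, fallback = "literal") {
  // A file-level "character" applies to every scene that does not set its own
  const fileLevel = config.character ?? fallback;
  return config.scenes.map((scene) => ({
    id: scene.id,
    character: scene.character ?? fileLevel,
  }));
}

const resolved = resolveCharacters({
  character: "narrator",
  scenes: [
    { id: "scene1", text: "Problem statement", character: "dramatic" },
    { id: "scene2", text: "Solution" },
  ],
});
console.log(resolved);
```

Here scene1 keeps its own preset while scene2 falls back to the file-level one.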
Generate multiple scenes with consistent prosody using ElevenLabs request stitching:
{
  "name": "product-demo",
  "voice": "George",
  "character": "narrator",
  "scenes": [
    {
      "id": "scene1",
      "text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
      "duration": 4.5,
      "character": "dramatic"
    },
    {
      "id": "scene2",
      "text": "With voice cloning, you can use your own voice for unlimited content.",
      "duration": 5.5
    },
    {
      "id": "scene3",
      "text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
      "duration": 6,
      "delay": 0.3
    }
  ]
}
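For readers curious about what request stitching involves, here is a minimal sketch and not the skill's actual implementation. It assumes the ElevenLabs text-to-speech endpoint accepts a `previous_request_ids` field (at most three IDs, as far as I know) and returns a `request-id` response header. The function names, and the `voiceId` and `apiKey` parameters, are hypothetical.

```javascript
// Assumption: ElevenLabs accepts at most three previous request IDs per call
const MAX_PREVIOUS_IDS = 3;

// Build the JSON body for one scene, linking it to the scenes generated before it
function buildStitchedBody(text, previousRequestIds, modelId = "eleven_multilingual_v2") {
  const body = { text, model_id: modelId };
  const recent = previousRequestIds.slice(-MAX_PREVIOUS_IDS);
  if (recent.length > 0) body.previous_request_ids = recent;
  return body;
}

// Generate scenes in order so each request can reference the earlier ones.
// voiceId must be an ElevenLabs voice ID, not a display name such as "George".
async function generateScenes(scenes, voiceId, apiKey) {
  const requestIds = [];
  const clips = [];
  for (const scene of scenes) {
    const response = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify(buildStitchedBody(scene.text, requestIds)),
    });
    if (!response.ok) throw new Error(`ElevenLabs request failed: ${response.status}`);
    requestIds.push(response.headers.get("request-id"));
    clips.push({ id: scene.id, audio: Buffer.from(await response.arrayBuffer()) });
  }
  return clips;
}
```

Passing earlier request IDs lets the service match tone and pacing with the preceding clips, which is the consistent prosody the scenes file is meant to achieve.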
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/product-demo-scenes.json \
--output-dir public/audio/product-demo/
This creates:
- product-demo-scene1.mp3 through sceneN.mp3
- product-demo-combined.mp3 (all scenes stitched)
- product-demo-info.json (metadata with durations)

If a scene starts too early, has wrong timing, or needs different text:
# Regenerate scene2 with new text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene2 \
--new-text "Updated scene 2 text" \
--output-dir public/audio/project/
# Regenerate scene3 with different character
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene3 \
--character salesperson \
--output-dir public/audio/project/
# Just regenerate (same text, same character)
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene1 \
--output-dir public/audio/project/
# Embed a thumbnail into an MP4 video
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/my-video.mp4 \
--thumbnail public/videos/my-thumbnail.png \
--output public/videos/my-video-with-thumb.mp4
The tool automatically updates scenes.json if --new-text is provided.

Embed a thumbnail image into MP4 videos so that platforms like Twitter and YouTube, as well as video players, display your custom thumbnail instead of the first frame.
# Basic usage - outputs to video-thumb.mp4
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/promo.mp4 \
--thumbnail public/videos/thumbnail.png
# Custom output path
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/promo.mp4 \
--thumbnail public/videos/thumbnail.png \
--output public/videos/promo-final.mp4
# 1. Render your video
npx remotion render MyVideo public/videos/my-video.mp4
# 2. Render your thumbnail (use Still composition)
npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png
# 3. Embed the thumbnail
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/my-video.mp4 \
--thumbnail public/videos/my-thumbnail.png \
--output public/videos/my-video-final.mp4
The embedding uses ffmpeg's -disposition:v:1 attached_pic flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
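The ffmpeg invocation behind this could look roughly like the sketch below. The `-disposition:v:1 attached_pic` flag comes from the text above. The remaining arguments follow the commonly used cover-art recipe and are an assumption about what the skill runs, as is the helper name `buildEmbedThumbnailArgs`. The sketch also assumes the source video has exactly one video stream, so the thumbnail becomes video stream 1.

```javascript
// Hypothetical helper that assembles ffmpeg arguments for embedding a thumbnail
function buildEmbedThumbnailArgs(videoPath, thumbnailPath, outputPath) {
  return [
    "-i", videoPath,       // input 0: the rendered video
    "-i", thumbnailPath,   // input 1: the thumbnail image
    "-map", "0",           // keep every stream from the video
    "-map", "1",           // add the image as an extra video stream
    "-c", "copy",          // copy streams without re-encoding
    "-disposition:v:1", "attached_pic", // mark the second video stream as cover art
    outputPath,
  ];
}

const args = buildEmbedThumbnailArgs(
  "public/videos/promo.mp4",
  "public/videos/thumbnail.png",
  "public/videos/promo-final.mp4"
);
console.log(["ffmpeg", ...args].join(" "));
```

The resulting array can be passed to Node's `child_process.spawn("ffmpeg", args)`.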
The skill automatically validates timing after generation using ffprobe:
| Check | Threshold | Description |
|---|---|---|
| Duration mismatch | >15% | Warns if actual differs from expected duration |
| Leading silence | >200ms | Audio starts late (voiceover delayed) |
| Trailing silence | >500ms | Unnecessary silence at end |
| Speaking rate | 2-4.5 wps | Optimal ~3 words/second |
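To make the thresholds concrete, here is a small sketch of how such checks might be applied to one scene's measurements. The function name `checkScene`, the field names, and the choice to compute speaking rate as word count divided by total audio duration are assumptions. The skill may measure these differently, so the numbers need not match its output.

```javascript
// Thresholds taken from the table above (durations in seconds)
const THRESHOLDS = {
  durationMismatch: 0.15,
  leadingSilence: 0.2,
  trailingSilence: 0.5,
  minWordsPerSecond: 2,
  maxWordsPerSecond: 4.5,
};

// Hypothetical checker returning the speaking rate and a list of warnings
function checkScene({ expected, actual, leadingSilence = 0, trailingSilence = 0, wordCount }) {
  const warnings = [];
  const mismatch = Math.abs(actual - expected) / expected;
  if (mismatch > THRESHOLDS.durationMismatch) {
    warnings.push(`duration off by ${(mismatch * 100).toFixed(0)}%`);
  }
  if (leadingSilence > THRESHOLDS.leadingSilence) {
    warnings.push("leading silence above 200ms");
  }
  if (trailingSilence > THRESHOLDS.trailingSilence) {
    warnings.push("trailing silence above 500ms");
  }
  // Assumption: rate is measured over the whole clip, silence included
  const wordsPerSecond = wordCount / actual;
  if (wordsPerSecond < THRESHOLDS.minWordsPerSecond || wordsPerSecond > THRESHOLDS.maxWordsPerSecond) {
    warnings.push("speaking rate outside 2-4.5 words/sec");
  }
  return { wordsPerSecond, warnings };
}

const shortScene = checkScene({ expected: 4.5, actual: 3.0, leadingSilence: 0.05, wordCount: 8 });
console.log(shortScene.warnings);
```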
# Validate all scenes in a project
node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/
Output example:
🔍 Validating product-demo (6 scenes)
❌ scene1: 3.00s (expected: 4.5s)
❌ Audio 1.50s shorter than expected
👍 8 words @ 3.1 words/sec
⚠️ scene2: 6.35s (expected: 5.5s)
⚠️ Leading silence: 235ms (may start late)
🐢 10 words @ 1.8 words/sec
✅ scene4: 4.36s (expected: 4s)
👍 9 words @ 2.3 words/sec
📊 Total duration: 30.80s (expected: 30.00s)
After validation, the info.json includes actual measurements:
{
  "scenes": [
    {
      "id": "scene1",
      "duration": 4.5,
      "actualDuration": 3.0,
      "leadingSilence": 0.05,
      "wordsPerSecond": 3.1
    }
  ]
}
Use actualDuration in your Remotion composition for precise sync.
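One way to turn these measurements into Remotion timing is to lay the scenes out back to back in frames. This is a sketch under assumptions: `layoutScenes` is a hypothetical helper, and it falls back to the planned `duration` when a scene has no `actualDuration`.

```javascript
// Hypothetical helper: convert scene durations (seconds) into frame offsets
function layoutScenes(scenes, fps) {
  let from = 0;
  return scenes.map((scene) => {
    // Prefer the measured duration, fall back to the planned one
    const seconds = scene.actualDuration ?? scene.duration;
    const durationInFrames = Math.round(seconds * fps);
    const placed = { id: scene.id, from, durationInFrames };
    from += durationInFrames;
    return placed;
  });
}

const layout = layoutScenes(
  [
    { id: "scene1", duration: 4.5, actualDuration: 3.0 },
    { id: "scene2", duration: 5.5 },
  ],
  30
);
console.log(layout);
```

Each entry maps directly onto the `from` and `durationInFrames` props of a `<Sequence>`.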
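| Option | Description | Default |
|---|---|---|
| --text, -t | Text to convert to speech | Required (or --file/--scenes) |
| --file, -f | Read text from file | - |
| --output, -o | Output file path | output.mp3 |
| --output-dir | Output directory for scenes | public/audio |
| --voice, -v | Voice name or ID | George |
| --model, -m | Model ID | eleven_multilingual_v2 |
| --character, -c | Character preset | literal |
| --scenes | JSON file with scenes | - |
| --scene | Regenerate single scene ID | - |
| --new-text | New text for scene regen | - |
| --validate | Validate existing audio dir | - |
| --skip-validation | Skip auto-validation | false |
| --embed-thumbnail | Video file to embed thumbnail into | - |
| --thumbnail | Thumbnail image file (PNG/JPG) | - |
| --stability | Voice stability (0-1) | varies by character |
| --similarity | Voice similarity (0-1) | varies by character |
| --style | Style exaggeration (0-1) | varies by character |
| --no-combined | Skip combined file | false |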
| Voice | Style | Best For |
|---|---|---|
| George | Warm, captivating British | Narration, explainers |
| Antoni | Professional, warm | Legal content, tutorials |
| Arnold | Authoritative, deep | Corporate, serious topics |
| Josh | Friendly, conversational | Marketing, casual content |
After generating scene voiceovers, use them in your composition:
import React from "react";
import { Audio, Sequence, staticFile, useVideoConfig } from "remotion";

// Placeholders for your own scene components
const Scene1Visual: React.FC = () => null;
const Scene2Visual: React.FC = () => null;

// Use individual scene audio files for precise sync
const SCENE_DURATIONS = {
  scene1: 4.5, // From info.json
  scene2: 5.5,
  scene3: 8.0,
};

export const VideoWithVoiceover: React.FC = () => {
  const { fps } = useVideoConfig();
  // Convert seconds to frames at the composition's frame rate
  const scene1Frames = Math.round(SCENE_DURATIONS.scene1 * fps);
  const scene2Frames = Math.round(SCENE_DURATIONS.scene2 * fps);
  return (
    <>
      <Sequence from={0} durationInFrames={scene1Frames}>
        <Audio src={staticFile("audio/project/project-scene1.mp3")} />
        <Scene1Visual />
      </Sequence>
      <Sequence from={scene1Frames} durationInFrames={scene2Frames}>
        <Audio src={staticFile("audio/project/project-scene2.mp3")} />
        <Scene2Visual />
      </Sequence>
    </>
  );
};
- Use narrator or expert for natural flow
- Use --scene to regenerate individual scenes without redoing everything

# 1. Create scenes.json with your script
# 2. Generate all scenes with narrator style
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/my-video-scenes.json \
--character narrator \
--output-dir public/audio/my-video/
# 3. Preview in Remotion, notice scene2 starts too early
# 4. Regenerate just scene2 with updated text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/my-video-scenes.json \
--scene scene2 \
--new-text "Slightly longer text to fill the visual timing" \
--output-dir public/audio/my-video/
# 5. Update video composition with new duration from info.json
# 6. Repeat until timing is perfect
Weekly installs: 293
Repository: maartenlouis/elevenlabs-remotion-skill
GitHub stars: 2
First seen: Jan 23, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: opencode (250), claude-code (234), gemini-cli (226), codex (226), cursor (204), github-copilot (199)