AI Explainer Video Production Guide: A Complete Tutorial and Toolkit from Script to Final Cut

explainer-video-guide by inferen-sh/skills

7,200 weekly installs

202 GitHub Stars

Install command

npx skills add https://github.com/inferen-sh/skills --skill explainer-video-guide

AI/Machine Learning, Marketing, Design

🇺🇸English

Explainer Video Guide

Create explainer videos from script to final cut via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh); see the install instructions.

infsh login

# Generate a scene for an explainer
infsh app run google/veo-3-1-fast --input '{
  "prompt": "Clean motion graphics style animation, abstract data flowing between connected nodes, blue and white color scheme, professional corporate aesthetic, smooth transitions"
}'

Script Formulas

Problem-Agitate-Solve (PAS) — 60 seconds

Section	Duration	Content	Word Count
Problem	10s	State the pain point the viewer has	~25 words
Agitate	10s	Show why it's worse than they think	~25 words
Solution	15s	Introduce your product/idea	~35 words
How It Works	20s	Show 3 key steps or features	~50 words
CTA	5s	One clear next action	~12 words

Before-After-Bridge (BAB) — 90 seconds

Section	Duration	Content
Before	15s	Show the current frustrating state
After	15s	Show the ideal outcome
Bridge	40s	Explain how your product gets them there
Social Proof	10s	Quick stat or testimonial
CTA	10s	Clear next step

Feature Spotlight — 30 seconds (social)

Section	Duration	Content
Hook	3s	Surprising fact or question
Feature	15s	Show one feature solving one problem
Result	7s	The outcome/benefit
CTA	5s	Try it / Learn more
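
Each formula above lists section durations that should add up to the stated runtime. A minimal sketch of that check follows; the function name `check_runtime` is hypothetical and not part of the infsh CLI.

```shell
# Sum section durations (whole seconds) and compare them with a target runtime.
# Usage: check_runtime TARGET DURATION...
check_runtime() {
  rt_target=$1
  shift
  rt_total=0
  for rt_d in "$@"; do
    rt_total=$((rt_total + rt_d))
  done
  if [ "$rt_total" -eq "$rt_target" ]; then
    echo "ok: ${rt_total}s"
  else
    echo "mismatch: sections total ${rt_total}s, target ${rt_target}s"
    return 1
  fi
}

check_runtime 60 10 10 15 20 5    # Problem-Agitate-Solve
check_runtime 90 15 15 40 10 10   # Before-After-Bridge
check_runtime 30 3 15 7 5         # Feature Spotlight
```

Running the same check after adjusting a section shows whether another section needs to shrink to keep the total on target.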

Pacing Rules

Content Type	Words Per Minute	Notes
Standard narration	150 wpm	Conversational pace
Complex/technical	120 wpm	Allow processing time
Energetic/social	170 wpm	Faster for short-form
Children's content	100 wpm	Clear and slow
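
The word counts in the script formulas follow from these rates: words ≈ seconds × wpm / 60. A small sketch of that arithmetic, assuming whole-second durations; `words_for` is an illustrative helper name, not an infsh command.

```shell
# words_for SECONDS WPM -> approximate word budget, rounded down
words_for() {
  echo $(( $1 * $2 / 60 ))
}

words_for 10 150   # 25, matches the ~25 words for a 10s section
words_for 20 150   # 50, matches the ~50 words for a 20s section
words_for 15 150   # 37, the PAS table budgets a more conservative ~35
words_for 30 170   # 85, a full 30s social spot at the energetic rate
```

Budgeting slightly under the computed figure leaves room for the pauses described in the TTS pacing section.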

Key rule: 1 scene per key message. Don't pack multiple ideas into one visual.

Scene Duration Guidelines

Establishing shot: 3-5 seconds
Feature demonstration: 5-8 seconds
Text/stat on screen: 3-4 seconds (must be readable)
Transition: 0.5-1 second
CTA screen: 3-5 seconds

Visual Production

Scene Types

# Product in context
infsh app run google/veo-3-1-fast --input '{
  "prompt": "Clean product demonstration video, hands typing on a laptop showing a dashboard interface, bright modern office, soft natural lighting, professional"
}'

# Abstract concept visualization
infsh app run bytedance/seedance-1-5-pro --input '{
  "prompt": "Abstract motion graphics, colorful data streams connecting floating geometric shapes, smooth fluid animation, dark background with glowing elements, tech aesthetic"
}'

# Lifestyle/outcome shot
infsh app run google/veo-3-1-fast --input '{
  "prompt": "Happy person relaxing on couch with laptop, smiling at screen, bright airy living room, warm afternoon light, satisfied customer feeling, lifestyle commercial style"
}'

# Before/after comparison
infsh app run falai/flux-dev-lora --input '{
  "prompt": "Split screen comparison, left side cluttered messy desk with papers and stress, right side clean organized minimalist workspace, dramatic difference, clean design"
}'

Image-to-Video for Scenes

# Generate a still frame first
infsh app run falai/flux-dev-lora --input '{
  "prompt": "Professional workspace with glowing holographic interface, futuristic but clean, blue accent lighting"
}'

# Animate it
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "Gentle camera push in, holographic elements subtly floating and rotating, soft ambient light shifts",
  "image": "path/to/workspace-still.png"
}'

Voiceover Production

Script Writing Tips

Short sentences. Max 15 words per sentence.
Active voice. "You can track your data" not "Your data can be tracked."
Conversational tone. Read it aloud — if it sounds stiff, rewrite.
One idea per sentence. One sentence per visual beat.
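
The 15-word limit above is easy to check mechanically. The following is a rough sketch that assumes a sentence ends at `.`, `!` or `?`; an ellipsis therefore splits a sentence into shorter pieces, so counts around dramatic pauses may be low. The function name `long_sentences` is hypothetical.

```shell
# Print every sentence from stdin that exceeds MAX words (default 15).
long_sentences() {
  ls_max=${1:-15}
  # Join the script into one line, then break it at sentence terminators.
  tr '\n' ' ' | tr '.!?' '\n\n\n' |
    awk -v max="$ls_max" 'NF > max {
      n = NF
      sub(/^ +/, "")
      printf "%d words: %s\n", n, $0
    }'
}

printf '%s\n' \
  "You can track your data." \
  "This particular sentence keeps going and going well past the limit of fifteen words that the guide recommends." |
  long_sentences 15
```

An empty result means every sentence fits the limit; anything printed is a candidate for splitting into two visual beats.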

Generating Voiceover

# Professional narration with Dia TTS
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Tired of spending hours on reports that nobody reads? There is a better way. Meet DataFlow. It turns your raw data into visual stories... in seconds. Just connect your source, pick a template, and share. Try DataFlow free today."
}'

Pacing Control in TTS

Technique	Effect	Example
Period `.`	Medium pause	"This changes everything. Here's how."
Ellipsis `...`	Long pause (dramatic)	"And the result... was incredible."
Comma `,`	Short pause	"Fast, simple, powerful."
Exclamation `!`	Emphasis/energy	"Start building today!"
Question `?`	Rising intonation	"What if there were a better way?"

Music & Audio

Background Music Guidelines

Volume: 20-30% under narration (duck 6-12dB when voice plays)
Style: match the brand tone (corporate = ambient electronic, startup = upbeat indie)
Structure: intro swell (first 3s) -> subtle loop under narration -> swell at CTA
No vocals: instrumental only under narration
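
Many editors and mixing tools take a linear gain factor rather than decibels. As a sketch of the conversion, an attenuation of d dB corresponds to an amplitude factor of 10^(-d/20); `duck_gain` is an illustrative name and only `awk` is assumed to be available.

```shell
# duck_gain DB -> linear amplitude multiplier for an attenuation of DB decibels
duck_gain() {
  awk -v db="$1" 'BEGIN { printf "%.2f\n", exp(-db / 20 * log(10)) }'
}

duck_gain 6    # 0.50
duck_gain 12   # 0.25
```

So ducking by 6 to 12 dB means scaling the music track's amplitude to roughly a half to a quarter of its original level while the voice plays.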

Generate background music

infsh app run <music-gen-app> --input '{ "prompt": "upbeat corporate background music, modern electronic, 90 BPM, positive and professional, no vocals, suitable for product explainer video" }'

Assembly Pipeline

Full Production Workflow

# 1. Generate voiceover
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Your script here..."
}'

# 2. Generate scene visuals (in parallel)
infsh app run google/veo-3-1-fast --input '{"prompt": "scene 1 description"}' --no-wait
infsh app run google/veo-3-1-fast --input '{"prompt": "scene 2 description"}' --no-wait
infsh app run google/veo-3-1-fast --input '{"prompt": "scene 3 description"}' --no-wait

# 3. Merge scenes into sequence
infsh app run infsh/media-merger --input '{
  "media": ["scene1.mp4", "scene2.mp4", "scene3.mp4"]
}'

# 4. Add voiceover to video
infsh app run infsh/video-audio-merger --input '{
  "video": "merged-scenes.mp4",
  "audio": "voiceover.mp3"
}'

# 5. Add captions
infsh app run infsh/caption-videos --input '{
  "video": "final-with-audio.mp4",
  "caption_file": "captions.srt"
}'
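
Step 2 repeats the same command for each scene, so it can be generated from a list of prompts. The sketch below only prints the commands (a dry run) and reuses the app name and `--no-wait` flag shown above. It assumes prompts contain no quotes or backslashes, since nothing is escaped before the prompt is placed in the JSON string; `scene_commands` is a hypothetical helper.

```shell
# Print one step-2 command per non-empty prompt line read from stdin.
# Assumption: prompts contain no single quotes, double quotes or backslashes.
scene_commands() {
  while IFS= read -r sc_prompt; do
    [ -n "$sc_prompt" ] || continue
    printf "infsh app run google/veo-3-1-fast --input '{\"prompt\": \"%s\"}' --no-wait\n" "$sc_prompt"
  done
}

printf '%s\n' \
  "scene 1 description" \
  "scene 2 description" \
  "scene 3 description" |
  scene_commands
```

Reviewing the printed commands before running them keeps a typo in one prompt from consuming a generation run.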

Video Length by Format

Format	Length	Platform
Social teaser	15-30s	TikTok, Instagram Reels, YouTube Shorts
Product demo	60-90s	Website, landing page
Feature explainer	90-120s	YouTube, email
Tutorial/walkthrough	2-5min	YouTube, help center
Investor pitch video	2-3min	Pitch deck supplement

Transition Types

Transition	When to Use	Effect
Cut	Default between related scenes	Clean, professional
Dissolve/Crossfade	Time passing, mood shift	Soft, contemplative
Wipe	New topic or section	Clear separation
Zoom/Push	Drilling into detail	Focus attention
Match cut	Visual similarity between scenes	Clever, memorable

Common Mistakes

Mistake	Problem	Fix
Script too wordy	Voiceover rushed, viewer overwhelmed	Cut to 150 wpm max
No hook in first 3s	Viewers leave immediately	Start with the problem or surprising stat
Visuals lag narration	Confusing disconnect	Visuals should match or slightly precede words
Background music too loud	Can't hear narration	Duck music 6-12dB under voice
No captions	85% of social video watched silent	Always add captions
Too many ideas	Viewer retains nothing	One core message per video
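
For the captions fix, a first-draft `captions.srt` can be estimated from the script itself. This sketch assumes one caption per input line and a constant narration rate, so the timings are approximations that should be corrected against the real voiceover audio. The function name `script_to_srt` is hypothetical.

```shell
# Convert script lines on stdin into SRT entries, timing each line by word count.
# Usage: script_to_srt [WPM]   (default 150)
script_to_srt() {
  awk -v wpm="${1:-150}" '
    # Format milliseconds as HH:MM:SS,mmm
    function ts(ms,   h, m, s) {
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      s = int(ms / 1000);    ms -= s * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    NF > 0 {
      dur = int(NF * 60000 / wpm)   # estimated reading time in milliseconds
      printf "%d\n%s --> %s\n%s\n\n", ++n, ts(start), ts(start + dur), $0
      start += dur
    }'
}

printf '%s\n' "Fast, simple, powerful." "Start building today!" | script_to_srt 150
```

At 150 wpm each three-word line is given 1.2 seconds, so the second caption starts at 00:00:01,200.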

Related Skills

npx skills add inference-sh/skills@ai-video-generation
npx skills add inference-sh/skills@video-prompting-guide
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@prompt-engineering

Browse all apps: infsh app list

First Seen

14 days ago

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Installed on

claude-code: 5.8K
gemini-cli: 5.1K
codex: 5.1K
opencode: 5.1K
amp: 5.1K
kimi-cli: 5.1K