dialogue-audio by inferen-sh/skills
```shell
npx skills add https://github.com/inferen-sh/skills --skill dialogue-audio
```
Create realistic multi-speaker dialogue with Dia TTS via inference.sh CLI.
Requires the inference.sh CLI (infsh). See the inference.sh install instructions.
```shell
infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
```
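Because --input is passed as a single-quoted JSON string, a literal apostrophe would end the shell quote; that is likely why the example prompts say "I am" and "let us" instead of using contractions. A minimal sketch of one workaround, assuming a POSIX shell with sed: a small helper (dia_input is a hypothetical name, not part of infsh) that builds the JSON from an ordinary double-quoted string. It escapes only backslashes and double quotes and assumes a single-line prompt.

```shell
# dia_input (hypothetical helper): wrap a Dia prompt in the JSON object
# that --input expects. Backslashes and double quotes are escaped, so the
# prompt itself can contain apostrophes ("can't") and quoted speech.
# Assumes a single-line prompt; tabs and other control characters are not escaped.
dia_input() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}

# Usage (requires infsh to be installed and logged in):
#   infsh app run falai/dia-tts --input "$(dia_input "[S1] I can't believe it! [S2] Believe it.")"
```

If jq is available, `jq -n --arg prompt "$text" '{prompt: $prompt}'` produces the same object with complete JSON escaping, newlines included.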
Dia TTS uses [S1] and [S2] to distinguish two speakers.
| Tag | Role | Voice |
|---|---|---|
| [S1] | Speaker 1 | Automatically assigned voice A |
| [S2] | Speaker 2 | Automatically assigned voice B |
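Dia only distinguishes [S1] and [S2], so a script written with character names has to be mapped onto those two tags. A rough sketch, assuming a plain-text script with one `Name: line` per line (that format and the to_dia_tags name are illustrative, not part of Dia or infsh); it always emits uppercase tags:

```shell
# to_dia_tags (hypothetical helper): read a script with one "Name: line"
# per line on stdin and print a single Dia prompt. The first name seen
# becomes [S1] and the second [S2]; a third speaker is rejected, since
# Dia only distinguishes two tags. Blank lines are skipped.
to_dia_tags() {
  awk '
    NF == 0 { next }                      # skip blank lines
    {
      i = index($0, ":")
      if (i == 0) {
        print "line " NR ": missing \"Name:\" prefix" > "/dev/stderr"
        err = 1; exit 1
      }
      name = substr($0, 1, i - 1)
      text = substr($0, i + 1)
      sub(/^[ \t]+/, "", name); sub(/[ \t]+$/, "", name)
      sub(/^[ \t]+/, "", text)
      if (!(name in tag)) {
        if (++n > 2) {
          print "line " NR ": more than two speakers" > "/dev/stderr"
          err = 1; exit 1
        }
        tag[name] = "[S" n "]"
      }
      out = (out == "") ? tag[name] " " text : out " " tag[name] " " text
    }
    END {
      if (err) exit 1                     # do not print a partial prompt
      if (out != "") print out
    }'
}

# Usage:
#   printf 'Ana: Have you tried it?\nBen: Not yet.\n' | to_dia_tags
#   prints: [S1] Have you tried it? [S2] Not yet.
```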
Rules:
- Use uppercase tags: [S1], not [s1].

Dia TTS interprets punctuation and non-speech cues for emotional delivery.
| Punctuation | Effect | Example |
|---|---|---|
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, questioning | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| — or -- | Interruption or pivot | "I was going to say — never mind." |
Dia TTS supports parenthetical sound descriptions:
- (laughs): laughter
- (sighs): exasperation or relief
- (clears throat): attention-getting pause
- (whispers): softer delivery
- (gasps): surprise
```shell
# Excited conversation
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
"prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching/explaining
infsh app run falai/dia-tts --input '{
"prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'
```
| Technique | Pause Length | Use For |
|---|---|---|
| Comma (,) | ~0.3 seconds | Between clauses, list items |
| Period (.) | ~0.5 seconds | Between sentences |
| Ellipsis (...) | ~1.0 seconds | Dramatic pause, thinking, hesitation |
| New speaker tag | ~0.3 seconds | Natural turn-taking gap |
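For a rough sense of how much silence a prompt adds, the approximate values in the table can be summed. A sketch under that assumption: estimate_pauses is a hypothetical helper, it treats the table's figures as additive, and it ignores ? and !, which the table does not time, so Dia's real output will differ.

```shell
# estimate_pauses (hypothetical helper): add up the approximate pauses from
# the table above for one prompt, in seconds. Uses ~1.0 s per ellipsis,
# ~0.5 s per period, ~0.3 s per comma and ~0.3 s per speaker change.
# "?" and "!" are not counted because the table gives them no length.
estimate_pauses() {
  printf '%s\n' "$1" | awk '
    {
      s = $0
      ell += gsub(/\.\.\./, "", s)        # remove ellipses first so their dots are not counted as periods
      per += gsub(/\./, "", s)
      com += gsub(/,/, "", s)
      tags += gsub(/\[S[12]\]/, "", s)
    }
    END {
      turns = (tags > 0) ? tags - 1 : 0   # the first tag has no turn change before it
      ms = ell * 1000 + per * 500 + com * 300 + turns * 300
      printf "%.1f\n", ms / 1000
    }'
}

# Usage:
#   estimate_pauses "[S1] First, we analyze. Then, we act."   # prints 1.6
```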
- Shorter sentences = faster perceived pace
- Longer sentences with commas = measured, thoughtful pace
- Questions followed by answers = engaging back-and-forth rhythm
```shell
# Fast pace: short sentences, quick exchanges
infsh app run falai/dia-tts --input '{ "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync." }'

# Measured pace: longer sentences and ellipses
infsh app run falai/dia-tts --input '{ "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now." }'
```
```shell
# Interview-style exchange
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

# Step-by-step walkthrough
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'

# Weighing two options
infsh app run falai/dia-tts --input '{
"prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'
```
Both speakers should be at a consistent volume. If one is louder, adjust the level when merging:
```shell
# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
"video": "talking-head.mp4",
"audio": "dialogue.mp3",
"audio_volume": 1.0
}'

# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
"media": ["dialogue.mp3", "background-music.mp3"]
}'
```
For conversations longer than ~30 seconds, generate in segments:
```shell
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content ('\'' closes the quote, adds an apostrophe, and reopens it)
infsh app run falai/dia-tts --input '{
"prompt": "[S1] So let us dive into today'\''s topic..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Great conversation today..."
}'

# Merge all segments (assumes the outputs were saved as segment1.mp3 to segment3.mp3)
infsh app run infsh/media-merger --input '{
"media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
```
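Instead of cutting a long script by hand, the split can be made at turn boundaries under a word budget. A minimal sketch; split_dialogue is a hypothetical helper, and its default of 75 words per segment assumes a speaking rate of roughly 150 words per minute (about 30 seconds), which is an estimate, not a Dia figure.

```shell
# split_dialogue (hypothetical helper): cut a long tagged prompt into
# segments at turn boundaries, printing one segment per line. Each segment
# stays under a word budget (default 75, an assumed ~30 s at roughly
# 150 words per minute); a single turn longer than the budget is kept whole.
split_dialogue() {
  printf '%s\n' "$1" | awk -v max="${2:-75}" '
    BEGIN { max += 0 }
    {
      s = $0
      gsub(/\[S[12]\]/, "\n&", s)             # start every turn on its own line
      n = split(s, turn, "\n")
      seg = ""; words = 0
      for (i = 1; i <= n; i++) {
        t = turn[i]
        sub(/^[ \t]+/, "", t); sub(/[ \t]+$/, "", t)
        if (t == "") continue
        w = split(t, tmp) - 1                 # words in the turn, minus the tag
        if (seg != "" && words + w > max) { print seg; seg = ""; words = 0 }
        seg = (seg == "") ? t : seg " " t
        words += w
      }
      if (seg != "") print seg
    }'
}
```

Each printed line can then be used as the prompt of its own falai/dia-tts run. Because cuts only happen at turn boundaries, every segment starts with a speaker tag.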
| Do | Don't |
|---|---|
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences same length |
| Include reactions ("Exactly!", "Hmm.") | One-sided monologues |
| Read it aloud before generating | Assume it sounds right |
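The "< 15 words" guideline is easy to check before generating. A rough sketch (long_sentences is a hypothetical helper) that prints each sentence at or over the limit with its word count; run on the interview example above, it would flag the 17-word sentence beginning "I am a product designer".

```shell
# long_sentences (hypothetical helper): print every sentence in a prompt
# with 15 or more words (or the limit given as $2), prefixed by its word count.
# Speaker tags and parenthetical cues such as (laughs) are not counted.
# Sentence splitting is naive: ".", "!" and "?" end a sentence, an ellipsis
# does not, and abbreviations like "Dr." will cause false splits.
long_sentences() {
  printf '%s\n' "$1" | awk -v limit="${2:-15}" '
    BEGIN { limit += 0 }
    {
      s = $0
      gsub(/\[S[12]\]/, " ", s)      # speaker tags are not words
      gsub(/\([^)]*\)/, " ", s)      # neither are cues like (laughs)
      gsub(/\.\.\./, " ", s)         # an ellipsis pauses but does not end the sentence
      n = split(s, sent, "[.!?]+")
      for (i = 1; i <= n; i++) {
        w = split(sent[i], tmp)
        if (w >= limit) {
          t = sent[i]
          sub(/^[ \t]+/, "", t); sub(/[ \t]+$/, "", t)
          print w ": " t
        }
      }
    }'
}
```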
| Mistake | Problem | Fix |
|---|---|---|
| Monologues longer than 3 sentences | Sounds like a lecture, not conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-speech cues |
| Missing speaker tags | Voices don't alternate | Start every turn with [S1] or [S2] |
| Formal written language | Sounds unnatural spoken | Use contractions, short sentences |
| No pauses between topics | Feels rushed | Use ... or scene breaks |
| All same energy level | Monotonous | Vary between high/low energy moments |
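Some of these mistakes can be caught mechanically before spending a generation. A minimal sketch (lint_turns is a hypothetical helper) covering a missing opening tag, tags other than uppercase [S1]/[S2], and turns longer than three sentences; it does not judge tone or energy.

```shell
# lint_turns (hypothetical helper): check a prompt for three mistakes from
# the table above. Prints one message per problem and returns non-zero if
# any is found:
#   - the prompt does not start with [S1] or [S2]
#   - it uses a tag other than uppercase [S1]/[S2] (e.g. [s1] or [S3])
#   - a turn has more than 3 sentences (or the limit given as $2)
# Sentence counting is naive: each run of ".", "!" or "?" ends a sentence,
# except an ellipsis.
lint_turns() {
  printf '%s\n' "$1" | awk -v max="${2:-3}" '
    BEGIN { max += 0 }
    {
      s = $0
      if (s !~ /^[ \t]*\[S[12]\]/) { print "prompt does not start with [S1] or [S2]"; bad = 1 }
      t = s
      gsub(/\[S[12]\]/, "", t)
      if (t ~ /\[[sS][0-9]+\]/) { print "unsupported speaker tag: use uppercase [S1] or [S2]"; bad = 1 }
      gsub(/\[S[12]\]/, "\n&", s)            # start every turn on its own line
      n = split(s, turn, "\n")
      for (i = 1; i <= n; i++) {
        u = turn[i]
        sub(/^[ \t]+/, "", u); sub(/[ \t]+$/, "", u)
        v = u
        gsub(/\.\.\./, " ", v)               # ellipses do not end sentences
        k = gsub(/[.!?]+/, "&", v)           # count sentence-ending runs
        if (k > max) { print "turn has " k " sentences: " u; bad = 1 }
      }
    }
    END { exit (bad ? 1 : 0) }'
}
```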
Related skills:
```shell
# ElevenLabs dialogue (22+ voices, voice direction)
npx skills add inference-sh/skills@elevenlabs-dialogue
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video
```
Browse all apps: infsh app list
Weekly installs: 7.3K
GitHub stars: 202
First seen: 14 days ago
Security audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Fail
Installed on: claude-code (5.7K), gemini-cli (5.2K), codex (5.2K), opencode (5.2K), amp (5.2K), kimi-cli (5.2K)