Dialogue Audio Generator: Create Realistic Multi-Speaker Conversations with Dia TTS, with Emotion Control and Pacing Adjustment

dialogue-audio by inferen-sh/skills

7,400 weekly installs

228 GitHub Stars

Install command

npx skills add https://github.com/inferen-sh/skills --skill dialogue-audio

AI/Machine Learning, Development, Audio Processing

Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh); see the install instructions.

infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
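
The prompts in this document avoid apostrophes and inner double quotes, presumably because the JSON is wrapped in shell single quotes. As a minimal sketch, a small helper can build the payload from an ordinary shell string instead. The function name `dia_payload` is hypothetical (not part of infsh), it only handles single-line prompts, and the only assumption about the API is the `{"prompt": ...}` shape already shown above.

```shell
# dia_payload: hypothetical helper that wraps a dialogue script in the
# {"prompt": "..."} JSON shape used by the commands in this document.
dia_payload() {
  # Escape backslashes first, then double quotes, so the result is valid JSON.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}

# Contractions are safe here because the script lives in a double-quoted string.
script="[S1] Have you tried it? [S2] Not yet, but I can't wait."
dia_payload "$script"

# To send it (requires infsh and a prior `infsh login`):
# infsh app run falai/dia-tts --input "$(dia_payload "$script")"
```

Building the payload this way lets the script follow the contraction advice later in this document without fighting shell quoting.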

Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

Tag	Role	Voice
`[S1]`	Speaker 1	Automatically assigned voice A
`[S2]`	Speaker 2	Automatically assigned voice B

Rules:

Always start each speaker turn with the tag
Tags must be uppercase: [S1] not [s1]
Maximum 2 speakers per generation
Each speaker maintains consistent voice within a session
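
The tag rules above can be checked before spending a generation on a malformed script. The following is a sketch only: `check_tags` is a hypothetical helper, and it checks just the three mechanical rules (opening tag, uppercase, at most two speakers), not voice consistency.

```shell
# check_tags: hypothetical linter for the speaker-tag rules listed above.
# Prints "ok", or a description of the first problem found.
check_tags() {
  script=$1
  # Rule: the script must open with a speaker tag.
  case $script in
    "[S1]"*|"[S2]"*) ;;
    *) echo "error: script must start with [S1] or [S2]"; return 1 ;;
  esac
  # Rules: uppercase tags only, and no speakers beyond S1 and S2.
  bad=$(printf '%s\n' "$script" | grep -oE '\[[sS][0-9]+\]' | grep -vE '^\[S[12]\]$' | head -n 1)
  if [ -n "$bad" ]; then
    echo "error: unsupported tag $bad"
    return 1
  fi
  echo "ok"
}

check_tags "[S1] Ready? [S2] Ready."
check_tags "[S1] Ready? [s2] Ready."
```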

Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

Punctuation	Effect	Example
`.`	Neutral, declarative, medium pause	"This is important."
`!`	Emphasis, excitement, energy	"This is amazing!"
`?`	Rising intonation, questioning	"Are you sure about that?"
`...`	Hesitation, trailing off, long pause	"I thought it would work... but it didn't."
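`,`	Short breath pause	"First, we analyze. Then, we act."
`—` or `--`	Interruption or pivot	"I was going to say — never mind."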

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:

(laughs)      — laughter
(sighs)       — exasperation or relief
(clears throat) — attention-getting pause
(whispers)    — softer delivery
(gasps)       — surprise

Examples with Emotion

# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching/explaining
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'

Pacing Control

Pause Hierarchy

Technique	Pause Length	Use For
Comma `,`	~0.3 seconds	Between clauses, list items
Period `.`	~0.5 seconds	Between sentences
Ellipsis `...`	~1.0 seconds	Dramatic pause, thinking, hesitation
New speaker tag	~0.3 seconds	Natural turn-taking gap
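
To get a feel for how much silence a script carries, the approximate values in the table can be summed. This is a rough sketch: `estimate_pauses` is a hypothetical helper, the numbers are the table's estimates rather than measured model behavior, and `?` and `!` are ignored because the table gives no value for them.

```shell
# estimate_pauses: rough total pause time in seconds implied by a script,
# using the approximate values from the pause hierarchy table.
estimate_pauses() {
  printf '%s\n' "$1" | awk '{
    line = $0
    ellipses = gsub(/\.\.\./, "", line)   # count and remove "..." first
    periods  = gsub(/\./, "", line)       # remaining single periods
    commas   = gsub(/,/, "", line)
    tags     = gsub(/\[S[12]\]/, "", line) # includes the opening tag
    total += ellipses * 1.0 + periods * 0.5 + commas * 0.3 + tags * 0.3
  } END { printf "%.1f\n", total }'
}

estimate_pauses "[S1] Ready? [S2] Ready. [S1] Well, maybe... not."
```

Ellipses are counted before periods so that `...` is not also tallied as three full stops. The opening tag is counted like any other, so the figure slightly overestimates.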

Speed Control

Shorter sentences = faster perceived pace
Longer sentences with commas = measured, thoughtful pace
Questions followed by answers = engaging back-and-forth rhythm

Fast-paced, energetic

infsh app run falai/dia-tts --input '{ "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync." }'

Slow, contemplative

infsh app run falai/dia-tts --input '{ "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now." }'

Conversation Structure Patterns

Interview Format

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

Tutorial / Explainer

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'

Debate / Discussion

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'

Post-Production Tips

Volume Normalization

Both speakers should be at a consistent volume. If one is louder:

# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'

Adding Background/Music

# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'

Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into today'\''s topic..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
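
Splitting a long script by hand is error-prone, so one option is to cut it at speaker-tag boundaries. The sketch below groups turns into segments of at most N turns. `split_turns` is a hypothetical helper, and using a turn count as a stand-in for the ~30 second guideline is an assumption: the right N depends on how long each turn is.

```shell
# split_turns: hypothetical helper that groups a script into segments of
# at most N speaker turns, printing one segment per line.
split_turns() {
  max_turns=$2
  printf '%s\n' "$1" | awk -v max="$max_turns" '{
    # Put each turn on its own line by breaking before every speaker tag.
    gsub(/ *\[S[12]\]/, "\n&")
    n = split($0, turns, "\n")
    out = ""; count = 0
    for (i = 1; i <= n; i++) {
      if (turns[i] ~ /^ *$/) continue
      sub(/^ +/, "", turns[i])
      out = (count == 0) ? turns[i] : out " " turns[i]
      if (++count == max) { print out; out = ""; count = 0 }
    }
    if (count > 0) print out
  }'
}

split_turns "[S1] Welcome back. [S2] Glad to be here. [S1] Let us begin." 2
```

Each printed line can then be used as the prompt for one `falai/dia-tts` run, with the resulting files merged as shown above. How each run's output is saved to a file is not covered in this document, so that step is left out here.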

Script Writing Tips

Do	Don't
Write how people talk	Write how people write
Short sentences (< 15 words)	Long academic sentences
Contractions ("can't", "won't")	Formal ("cannot", "will not")
Natural fillers ("So,", "Well,")	Every sentence perfectly formed
Vary sentence length	All sentences same length
Include reactions ("Exactly!", "Hmm.")	One-sided monologues
Read it aloud before generating	Assume it sounds right
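
The sentence-length guideline in the table is easy to check mechanically. As a sketch, the hypothetical helper `long_sentences` below flags sentences of 15 words or more. It treats any run of `.`, `!` or `?` as a sentence boundary, so a mid-sentence `...` splits the sentence and may hide a long one.

```shell
# long_sentences: print every sentence with 15 or more words,
# prefixed by its word count.
long_sentences() {
  printf '%s\n' "$1" | awk -v limit=15 '{
    gsub(/\[S[12]\]/, "")     # drop speaker tags
    gsub(/\([^)]*\)/, "")     # drop cues such as (laughs)
    gsub(/[.!?]+/, "&\n")     # break after sentence-ending punctuation
    n = split($0, s, "\n")
    for (i = 1; i <= n; i++) {
      words = split(s[i], w, " ")
      sub(/^ +/, "", s[i])
      if (words >= limit) print words ": " s[i]
    }
  }'
}

long_sentences "[S1] Like an assembly line? [S2] Data comes in on one end, gets processed in the middle, and comes out transformed on the other side."
```

Run against the teaching example from earlier in this document, it flags the pipeline sentence, which is a candidate for splitting in two.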

Common Mistakes

Mistake	Problem	Fix
Monologues longer than 3 sentences	Sounds like a lecture, not conversation	Break into exchanges
No emotional variation	Flat, robotic delivery	Use punctuation and non-speech cues
Missing speaker tags	Voices don't alternate	Start every turn with `[S1]` or `[S2]`
Formal written language	Sounds unnatural spoken	Use contractions, short sentences
No pauses between topics	Feels rushed	Use `...` or scene breaks
All same energy level	Monotonous	Vary between high/low energy moments
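
The first mistake in the table, monologues longer than three sentences, can also be caught with a quick check. This sketch uses a hypothetical helper, `monologues`, that counts sentence-ending punctuation per turn. A `...` inside a sentence counts as a boundary, so it can over-report.

```shell
# monologues: print every speaker turn that contains more than 3 sentences.
monologues() {
  printf '%s\n' "$1" | awk '{
    # Put each turn on its own line by breaking before every speaker tag.
    gsub(/ *\[S[12]\]/, "\n&")
    n = split($0, turns, "\n")
    for (i = 1; i <= n; i++) {
      t = turns[i]
      sub(/^ +/, "", t)
      if (t == "") continue
      body = t
      sentences = gsub(/[.!?]+/, "", body)   # each run of . ! ? ends one sentence
      if (sentences > 3) print sentences " sentences: " t
    }
  }'
}

monologues "[S1] One. Two. Three. Four. [S2] Okay."
```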

Related Skills

# ElevenLabs dialogue (22+ voices, voice direction)
npx skills add inference-sh/skills@elevenlabs-dialogue

npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video

Browse all apps: infsh app list

Weekly Installs

7.3K

Repository

inferen-sh/skills

GitHub Stars

202

First Seen

14 days ago

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Fail

Installed on

claude-code: 5.7K
gemini-cli: 5.2K
codex: 5.2K
opencode: 5.2K
amp: 5.2K
kimi-cli: 5.2K
