Text-to-Speech Tool - High-quality AI speech synthesis with the inference.sh CLI, supporting multiple models and languages

text-to-speech by inferen-sh/skills

7,200 weekly installs

184 GitHub Stars

Install command

npx skills add https://github.com/inferen-sh/skills --skill text-to-speech

AI/Machine Learning, Development, Audio Processing

🇺🇸English

Text-to-Speech

Convert text to natural speech via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh); see the install instructions. Log in before running any app:

infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'

Available Models

Model	App ID	Best For
ElevenLabs TTS	`elevenlabs/tts`	Premium quality, 22+ voices, 32 languages
DIA TTS	`infsh/dia-tts`	Conversational, expressive
Kokoro TTS	`infsh/kokoro-tts`	Fast, natural
Chatterbox	`infsh/chatterbox`	General purpose
Higgs Audio	`infsh/higgs-audio`	Emotional control
VibeVoice	`infsh/vibevoice`	Podcasts, long-form

Browse All Audio Apps

infsh app list --category audio

Examples

Basic Text-to-Speech

infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our tutorial."}'
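The inline form above wraps the JSON in single quotes, which breaks as soon as the text contains an apostrophe or a double quote. One way around this is to build the JSON body with a small helper. The sketch below is only an illustration: `json_escape` and `tts_input` are hypothetical helper names, not part of infsh, and only backslashes and double quotes are escaped.

```shell
# Escape backslashes and double quotes so the text is safe inside a JSON string.
# Limitation: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the {"text": ...} body used by the example above.
tts_input() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Show the generated body for a text with both kinds of quotes.
tts_input "It's called \"Kokoro\"."
echo

# Possible use (requires a logged-in infsh):
# infsh app run infsh/kokoro-tts --input "$(tts_input "It's called \"Kokoro\".")"
```

For longer or multi-line scripts, writing the input to a JSON file is likely the more robust route.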

Conversational TTS with DIA

infsh app sample infsh/dia-tts --save input.json

# Edit input.json:
# {
#   "text": "Hey! How are you doing today? I'm really excited to share this with you.",
#   "voice": "conversational"
# }

infsh app run infsh/dia-tts --input input.json
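Instead of editing input.json by hand, the file can also be written from the shell. This is a minimal sketch that uses only the two fields shown in the commented example (`text` and `voice`). It overwrites the file saved by `infsh app sample`, so any other fields in that sample would be lost. `write_dia_input` is a hypothetical helper name.

```shell
# Write the request body from the commented example to the given file.
# The quoted 'EOF' delimiter stops the shell from expanding $, backticks, or
# backslashes in the text, and apostrophes (as in "I'm") need no escaping.
write_dia_input() {
  cat > "$1" <<'EOF'
{
  "text": "Hey! How are you doing today? I'm really excited to share this with you.",
  "voice": "conversational"
}
EOF
}

write_dia_input input.json

# Then run it as in the example above:
# infsh app run infsh/dia-tts --input input.json
```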

Long-form Audio (Podcasts)

infsh app sample infsh/vibevoice --save input.json

# Edit input.json with your podcast script
infsh app run infsh/vibevoice --input input.json

Expressive Speech with Higgs

infsh app sample infsh/higgs-audio --save input.json

# {
#   "text": "This is absolutely incredible!",
#   "emotion": "excited"
# }

infsh app run infsh/higgs-audio --input input.json

Use Cases

Voiceovers: Product demos, explainer videos
Audiobooks: Convert text to spoken word
Podcasts: Generate podcast episodes
Accessibility: Make content accessible
IVR: Phone system voice prompts
Video Narration: Add narration to videos

Combine with Video

Generate speech, then create a talking head video:

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Your script here"}' > speech.json

# 2. Use the audio URL with OmniHuman for avatar video
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
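Step 2 needs the audio URL from speech.json, but the shape of that output is not described here. The sketch below therefore avoids naming a field and simply takes the first http(s) URL in the file. This assumes the saved output contains the audio URL as a plain string and that it is the first URL to appear. `first_url` is a hypothetical helper, and the sample response is made up for the demonstration.

```shell
# Print the first http(s) URL found in a file; fail if there is none.
# Assumption: the audio URL appears as a plain JSON string in the saved output.
first_url() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1 | grep .
}

# Demonstration on a made-up response (hypothetical shape, not real infsh output).
printf '%s\n' '{"output": {"audio": "https://example.com/speech.wav"}}' > sample-speech.json
audio_url=$(first_url sample-speech.json)
echo "$audio_url"

# Possible use with the real output from step 1:
# audio_url=$(first_url speech.json)
# infsh app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"https://portrait.jpg\", \"audio_url\": \"$audio_url\"}"
```

If the real output holds several URLs, inspect speech.json and pick the right one, or use a JSON-aware tool.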

Related Skills

# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dialogue (multi-speaker)
npx skills add inference-sh/skills@elevenlabs-dialogue

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli

# AI avatars (combine TTS with talking heads)
npx skills add inference-sh/skills@ai-avatar-video

# AI music generation
npx skills add inference-sh/skills@ai-music-generation

# Speech-to-text (transcription)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

Browse all apps: infsh app list

Documentation

Running Apps - How to run apps via CLI
Audio Transcription Example - Audio processing workflows
Apps Overview - Understanding the app ecosystem

First Seen

13 days ago

Security Audits

Gen Agent Trust Hub: Pass, Socket: Warn, Snyk: Warn

Installed on

claude-code: 5.7K
gemini-cli: 5.1K
codex: 5.1K
amp: 5.1K
opencode: 5.1K
kimi-cli: 5.1K
