AI Voice Cloning and Generation Tool - Natural Speech Synthesis, Multiple Models, Command-Line Workflow

ai-voice-cloning by inferen-sh/skills

7,300 weekly installs

202 GitHub Stars


Install command

npx skills add https://github.com/inferen-sh/skills --skill ai-voice-cloning

Tags: AI/ML development, audio processing



AI Voice Generation

Generate natural-sounding AI voices with the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`infsh`); see its install instructions.

infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Hello! This is an AI-generated voice that sounds natural and engaging.",
  "voice": "af_sarah"
}'
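The examples in this guide embed the JSON directly in single quotes, so a prompt containing an apostrophe (for example "I'm") would end the shell string early. A minimal sketch, assuming Bash, of building the `--input` payload from ordinary shell strings instead; `json_escape` and `tts_input` are hypothetical helper names, and only the `prompt` and `voice` fields used in this guide are emitted.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the infsh CLI) that build the --input
# JSON for infsh/kokoro-tts, so prompts may contain apostrophes or quotes.

# Escape backslash, double quote, newline and tab for use inside a JSON
# string. Backslashes are handled first so later escapes are not doubled.
# Other control characters are not handled.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# Print {"prompt":"...","voice":"..."} for the given text and voice ID.
tts_input() {
  printf '{"prompt":"%s","voice":"%s"}\n' "$(json_escape "$1")" "$(json_escape "$2")"
}
```

Usage: `infsh app run infsh/kokoro-tts --input "$(tts_input "Hi, I'm excited to share some updates with you today." af_sarah)"`. The double-quoted command substitution keeps the payload as a single argument.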

Available Models

Model	App ID	Best For
ElevenLabs TTS	`elevenlabs/tts`	Premium quality, 22+ voices, 32 languages
ElevenLabs Voice Changer	`elevenlabs/voice-changer`	Transform existing voice recordings
Kokoro TTS	`infsh/kokoro-tts`	Natural, multiple voices
DIA	`infsh/dia-tts`	Conversational, expressive
Chatterbox	`infsh/chatterbox`	Casual, entertainment
Higgs	`infsh/higgs-tts`	Professional narration
VibeVoice	`infsh/vibevoice`	Emotional range

Kokoro Voice Library

American English

Voice ID	Gender	Style
`af_sarah`	Female	Warm, friendly
`af_nicole`	Female	Professional
`af_sky`	Female	Youthful
`am_michael`	Male	Authoritative
`am_adam`	Male	Conversational
`am_echo`	Male	Clear, neutral

British English

Voice ID	Gender	Style
`bf_emma`	Female	Refined
`bf_isabella`	Female	Warm
`bm_george`	Male	Classic
`bm_lewis`	Male	Modern
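The voice IDs in these tables follow a visible pattern: the first letter matches the accent (`a` American, `b` British) and the second the gender (`f` female, `m` male). The sketch below decodes that prefix; treating the pattern as a general rule for other Kokoro voices is an assumption, and `describe_voice` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a Kokoro voice ID such as "bf_emma" into
# accent and gender, based only on the prefixes used in the tables above.
describe_voice() {
  local id=$1 accent gender
  # Expect <accent letter><gender letter>_<name>, e.g. af_sarah.
  if [[ $id != [ab][fm]_?* ]]; then
    echo "unrecognized voice ID: $id" >&2
    return 1
  fi
  [[ ${id:0:1} == a ]] && accent="American English" || accent="British English"
  [[ ${id:1:1} == f ]] && gender="female" || gender="male"
  printf '%s, %s\n' "$accent" "$gender"
}
```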

Voice Generation Examples

Professional Narration

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'

Conversational Style

infsh app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'
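This DIA example passes the text under `text`, while the Kokoro examples use `prompt`. A minimal sketch of choosing the field by app ID, assuming only the two input shapes shown in this guide; `input_for` is hypothetical, any other app's schema should be checked before use, and voice IDs are assumed to need no escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit the --input JSON using the text field name each
# app uses in this guide's examples (infsh/dia-tts: "text"; Kokoro: "prompt").
input_for() {
  local app=$1 text=$2 voice=$3 field bs='\' q='"'
  case $app in
    infsh/dia-tts)    field=text ;;
    infsh/kokoro-tts) field=prompt ;;
    *) echo "no known input schema for $app" >&2; return 1 ;;
  esac
  text=${text//"$bs"/"$bs$bs"}   # escape backslashes first
  text=${text//"$q"/"$bs$q"}     # then double quotes
  printf '{"%s":"%s","voice":"%s"}\n' "$field" "$text" "$voice"
}
```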

Audiobook Narration

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'

Video Voiceover

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'

Podcast Host

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'

Multi-Voice Conversation

# Generate dialogue between two speakers
# Speaker 1
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Have you seen the latest AI developments? It is incredible how fast things are moving.",
  "voice": "am_michael"
}' > speaker1.json

# Speaker 2
infsh app run infsh/kokoro-tts --input '{
  "prompt": "I know, right? Just last week I tried that new image generator and was blown away.",
  "voice": "af_sarah"
}' > speaker2.json

# Merge the clips in order; replace <speaker1-url> and <speaker2-url> with the audio URLs from the saved outputs
infsh app run infsh/media-merger --input '{
  "audio_files": ["<speaker1-url>", "<speaker2-url>"],
  "crossfade_ms": 300
}'
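With more than two speakers, writing the `audio_files` array by hand gets error-prone. A sketch, assuming Bash, that builds the `infsh/media-merger` input from a crossfade length and a list of URLs in playback order; `merger_input` is a hypothetical helper, and the URLs are assumed not to contain double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the infsh/media-merger --input JSON from a
# crossfade length in milliseconds and two or more audio URLs.
merger_input() {
  local crossfade=$1; shift
  if [[ ! $crossfade =~ ^[0-9]+$ ]] || (( $# < 2 )); then
    echo "usage: merger_input CROSSFADE_MS URL URL [URL...]" >&2
    return 1
  fi
  local files="" url
  for url in "$@"; do
    files+="${files:+,}\"$url\""   # comma-separate after the first entry
  done
  printf '{"audio_files":[%s],"crossfade_ms":%s}\n' "$files" "$crossfade"
}
```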

Long-Form Content

Chunked Processing

For content longer than 5,000 characters, split the text into chunks, generate each chunk with the same voice, and merge the results:

# Process long text in chunks
TEXT="Your very long text here..."

# Split $TEXT into chunks of at most 5,000 characters, then generate each one with the same voice
# Chunk 1
infsh app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-1>",
  "voice": "bf_emma"
}' > chunk1.json

# Chunk 2
infsh app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-2>",
  "voice": "bf_emma"
}' > chunk2.json

# Merge chunks in order; replace the placeholders with each chunk's audio URL
infsh app run infsh/media-merger --input '{
  "audio_files": ["<chunk1-url>", "<chunk2-url>"],
  "crossfade_ms": 100
}'
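The splitting step is left as placeholders above. One possible approach, sketched here assuming Bash and a POSIX awk: split on sentence-ending punctuation and pack whole sentences into chunks of at most a given number of characters, so no sentence is cut mid-way. `chunk_text` is a hypothetical helper; a single sentence longer than the limit is emitted as its own oversized chunk rather than split.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print one chunk per line,
# each made of whole sentences and at most MAX characters where possible.
chunk_text() {
  local max=${1:-5000}
  tr '\n' ' ' | awk -v max="$max" '
    # Append a finished sentence to the current chunk, starting a new
    # chunk when adding it would exceed max characters.
    function add(s) {
      if (chunk == "") chunk = s
      else if (length(chunk) + 1 + length(s) <= max) chunk = chunk " " s
      else { print chunk; chunk = s }
    }
    {
      for (i = 1; i <= NF; i++) {
        sent = (sent == "" ? $i : sent " " $i)
        if ($i ~ /[.!?]$/) { add(sent); sent = "" }
      }
    }
    END {
      if (sent != "") add(sent)
      if (chunk != "") print chunk
    }'
}
```

For example, `chunk_text 5000 < chapter.txt` (a hypothetical file) prints one chunk per line, each of which can be sent to `infsh/kokoro-tts` with the same voice. Abbreviations such as "Dr." also end in a period, so they are treated as sentence ends.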

Voice + Video Workflow

Add Voiceover to Video

# 1. Generate voiceover
infsh app run infsh/kokoro-tts --input '{
  "prompt": "This stunning footage shows the beauty of nature in its purest form.",
  "voice": "am_michael"
}' > voiceover.json

# 2. Merge with video
infsh app run infsh/media-merger --input '{
  "video_url": "https://your-video.mp4",
  "audio_url": "<voiceover-url>"
}'

Create Talking Head

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Hi, I am excited to share some updates with you today.",
  "voice": "af_sarah"
}' > speech.json

# 2. Animate with avatar
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<speech-url>"
}'

Speed and Pacing

Speed	Effect	Use For
0.8	Slow, deliberate	Audiobooks, meditation
0.9	Slightly slow	Education, tutorials
1.0	Normal	General purpose
1.1	Slightly fast	Commercials, energetic content
1.2	Fast	Quick announcements

# Slow narration
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Take a deep breath. Let yourself relax.",
  "voice": "bf_emma",
  "speed": 0.8
}'
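To keep pacing consistent across many generations, the speed table can be encoded as a small lookup. A sketch; the content-type names are hypothetical labels taken from the "Use For" column, and the values are exactly those in the table.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: map a content type to the speed suggested in the
# Speed and Pacing table above.
speed_for() {
  case $1 in
    audiobook|meditation) echo 0.8 ;;
    education|tutorial)   echo 0.9 ;;
    general)              echo 1.0 ;;
    commercial)           echo 1.1 ;;
    announcement)         echo 1.2 ;;
    *) echo "unknown content type: $1" >&2; return 1 ;;
  esac
}
```

The value can then be placed in the `speed` field, for example `"speed": $(speed_for meditation)` inside a double-quoted payload.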

Punctuation for Pacing

Use punctuation to control speech rhythm:

Punctuation	Effect
Period `.`	Full pause
Comma `,`	Brief pause
`...`	Extended pause
`!`	Emphasis
`?`	Question intonation
`-`	Quick break

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'

Best Practices

Match voice to content - Professional voice for business, casual for social
Use punctuation - Control pacing with periods and commas
Keep sentences short - Shorter sentences are easier to generate and sound more natural
Test different voices - The same text can sound quite different from one voice to another
Adjust speed - A slightly slower speed often sounds more natural
Break long content - Process long text in chunks for consistent results
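To apply "keep sentences short" before generating, a script can flag sentences above a word limit. A sketch assuming Bash and awk; the 20-word default is an arbitrary illustrative threshold, not a documented limit, and `long_sentences` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical check: read text on stdin and print every sentence (ending
# in . ! or ?) that has more than MAX words, prefixed with its word count.
long_sentences() {
  local max=${1:-20}
  tr '\n' ' ' | awk -v max="$max" '
    {
      for (i = 1; i <= NF; i++) {
        sent = (n == 0 ? $i : sent " " $i); n++
        if ($i ~ /[.!?]$/) { if (n > max) print n ": " sent; sent = ""; n = 0 }
      }
    }
    # A trailing fragment without final punctuation is checked too.
    END { if (n > max) print n ": " sent }'
}
```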

Use Cases

Voiceovers - Video narration, commercials
Audiobooks - Full book narration
Podcasts - AI hosts and guests
E-learning - Course narration
Accessibility - Screen reader content
IVR - Phone system messages
Content localization - Translate and voice

Related Skills

# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs voice changer (transform recordings)
npx skills add inference-sh/skills@elevenlabs-voice-changer

# All TTS models
npx skills add inference-sh/skills@text-to-speech

# Podcast creation
npx skills add inference-sh/skills@ai-podcast-creation

# AI avatars
npx skills add inference-sh/skills@ai-avatar-video

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Full platform skill
npx skills add inference-sh/skills@infsh-cli

Browse audio apps: infsh app list --category audio


First seen: 14 days ago

Security audits: Gen Agent Trust Hub: Pass, Socket: Warn, Snyk: Pass

Installed on: claude-code (5.8K), gemini-cli (5.2K), codex (5.2K), opencode (5.2K), amp (5.2K), kimi-cli (5.2K)

