tts by noizai/skills
npx skills add https://github.com/noizai/skills --skill tts
Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.
speak is the default — the subcommand can be omitted:
# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world" # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3
# Voice cloning — local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav
# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg
Third-party integration (Feishu/Telegram/Discord) is documented in ref_3rd_party.md.
Timeline mode gives precise per-segment timing (dubbing, subtitles, video narration). It is driven by an SRT file.
If the user doesn't have an SRT file, generate one from text:
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500
--cps = characters per second (default 4, good for Chinese; ~15 for English). The agent can also write SRT manually.
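To make the effect of --cps and --gap concrete, here is a minimal sketch of how cue timings could be derived from text. It is not the logic inside tts.py: the helper names (srt_timestamp, estimate_cues, to_srt) are hypothetical, and it assumes that --gap is a pause in milliseconds between consecutive cues.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def estimate_cues(sentences, cps=4.0, gap_ms=500):
    """Give each sentence a duration of len(text) / cps seconds.

    Assumption: gap_ms is the silence inserted between consecutive cues.
    """
    cues, cursor = [], 0
    for text in sentences:
        duration_ms = round(len(text) / cps * 1000)
        cues.append((cursor, cursor + duration_ms, text))
        cursor += duration_ms + gap_ms
    return cues


def to_srt(cues) -> str:
    """Render (start_ms, end_ms, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"
```

With cps=15, a 30-character English sentence gets a 2-second cue. With the default of 4, the same sentence would be stretched to 7.5 seconds, which is why the higher value is suggested for English.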
The voice map is a JSON file that controls the default and per-segment voice settings. Keys under segments can be a single index ("3") or a range ("5-8").
Kokoro voice map:
{
"default": { "voice": "zf_xiaoni", "lang": "cmn" },
"segments": {
"1": { "voice": "zm_yunxi" },
"5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
}
}
Noiz voice map (adds emo, reference_audio support). reference_audio can be a local path or a URL (user’s own audio; Noiz only):
{
"default": { "voice_id": "voice_123", "target_lang": "zh" },
"segments": {
"1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
"2-4": { "reference_audio": "./refs/guest.wav" }
}
}
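The key syntax used in both maps can be illustrated with a small resolver. This is a sketch under stated assumptions, not the tts.py implementation: resolve_segment is a hypothetical helper, ranges are treated as inclusive, indices are taken to be the SRT cue numbers, and per-segment values are assumed to override the default entry field by field.

```python
import json

KOKORO_MAP = json.loads("""
{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}
""")


def resolve_segment(voice_map: dict, index: int) -> dict:
    """Return the settings for one segment (hypothetical helper).

    Assumptions: "5-8" is an inclusive range, and a matching segment
    entry overrides the default entry one field at a time.
    """
    settings = dict(voice_map.get("default", {}))
    for key, overrides in voice_map.get("segments", {}).items():
        start, _, end = key.partition("-")
        low = int(start)
        high = int(end) if end else low
        if low <= index <= high:
            settings.update(overrides)
    return settings


if __name__ == "__main__":
    for cue in (1, 3, 6):
        print(cue, resolve_segment(KOKORO_MAP, cue))
```

Under these assumptions, segment 1 keeps the default lang but switches voice, segment 3 falls back to the default entry, and segments 5 through 8 use the English voice at reduced speed.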
Dynamic reference audio slicing: if you are translating or dubbing a video and want each sentence to use the audio from the original video at the same timestamp as its reference audio, pass --ref-audio-track instead of setting reference_audio in the map:
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav
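The idea behind slicing a reference track by cue timestamps can be sketched as follows. This only illustrates the concept and is not how tts.py necessarily does it: srt_time_to_seconds and slice_command are hypothetical helpers, and the ffmpeg options used (-ss, -to, -vn) are standard ffmpeg flags chosen for this example.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:01,500 to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def slice_command(track: str, timing_line: str, out_path: str) -> list:
    """Build an ffmpeg argument list that cuts one cue's audio from the track.

    timing_line is the 'start --> end' line of an SRT cue. The command is
    only constructed here, not executed.
    """
    start, end = (srt_time_to_seconds(part) for part in timing_line.split("-->"))
    return [
        "ffmpeg", "-y", "-i", track,
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
        "-vn", out_path,  # -vn drops the video stream, keeping audio only
    ]
```

Each resulting slice could then serve as the reference audio for the matching segment, which is the behaviour the --ref-audio-track option is described as automating.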
See examples/ for full samples.
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav
| Need | Recommended |
|---|---|
| Just read text aloud, no fuss | Kokoro (default) |
| EPUB/PDF audiobook with chapters | Kokoro (native support) |
| Voice blending ("v1:60,v2:40") | Kokoro |
| Voice cloning from reference audio | Noiz |
| Emotion control (emo param) | Noiz |
| Exact server-side duration per segment | Noiz |
When the user needs emotion control + voice cloning + precise duration together, Noiz is the only backend that supports all three.
When no API key is configured, tts.py automatically falls back to guest mode — a limited Noiz endpoint that requires no authentication. Guest mode only supports --voice-id, --speed, and --format; voice cloning, emotion, duration, and timeline rendering are not available.
# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav
# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro
Available guest voices (15 built-in):
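| voice_id | name | lang | gender | tone |
|---|---|---|---|---|
| 063a4491 | 販売員(なおみ) | ja | F | 喜び |
| 4252b9c8 | 落ち着いた女性 | ja | F | 穏やか |
| 578b4be2 | 熱血漢(たける) | ja | M | 怒り |
| a9249ce7 | 安らぎ(みなと) | ja | M | 穏やか |
| f00e45a1 | 旅人(かいと) | ja | M | 穏やか |
| b4775100 | 悦悦|社交分享 | zh | F | Joyful |
| 77e15f2c | 婉青|情绪抚慰 | zh | F | Calm |
| ac09aeb4 | 阿豪|磁性主持 | zh | M | Calm |
| 87cb2405 | 建国|知识科普 | zh | M | Calm |
| 3b9f1e27 | 小明|科技达人 | zh | M | Joyful |
| 95814add | Science Narration | en | M | Calm |
| 883b6b7c | The Mentor (Alex) | en | M | Joyful |
| a845c7de | The Naturalist (Silas) | en | M | Calm |
| 5a68d66b | The Healer (Serena) | en | F | Calm |
| 0e4ab6ec | The Mentor (Maya) | en | F | Calm |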
This skill performs the following file and network operations at runtime:
- When you run config --set-api-key, the key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported as an alternative.
- If ~/.noiz_api_key exists and ~/.config/noiz/api_key does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.
- The Noiz backend sends requests to https://noiz.ai/v1/ for synthesis. No data is sent unless you invoke a Noiz command.
- When --ref-audio is a URL, the file is downloaded to a temp file, used for the API call, then deleted. If no voice-id or ref-audio is provided, a default reference audio is downloaded from storage.googleapis.com or noiz.ai.
- ffmpeg is invoked in render mode to assemble the final audio.

No files outside the output path and ~/.config/noiz/ are modified. The Kokoro backend runs entirely offline with no network access.
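The key storage and legacy migration behaviour described above could look roughly like this. It is a sketch, not the code in tts.py: load_api_key is a hypothetical helper, and the choice to check NOIZ_API_KEY before the key file is an assumption, since the text only says the variable is supported as an alternative.

```python
import os
import shutil
from pathlib import Path


def load_api_key(home: Path, env: dict):
    """Return the Noiz API key, or None when guest mode would apply.

    Assumption: the NOIZ_API_KEY environment variable wins over the file.
    """
    env_key = env.get("NOIZ_API_KEY", "").strip()
    if env_key:
        return env_key

    new_path = home / ".config" / "noiz" / "api_key"
    legacy_path = home / ".noiz_api_key"

    # Legacy migration: copy (never delete) the old key file.
    if not new_path.exists() and legacy_path.exists():
        new_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(legacy_path, new_path)
        os.chmod(new_path, 0o600)
        print(f"Copied {legacy_path} to {new_path}; remove the old file manually.")

    if new_path.exists():
        return new_path.read_text(encoding="utf-8").strip() or None
    return None
```

A caller would typically pass Path.home() and os.environ. Taking both as parameters keeps the sketch easy to exercise against a temporary directory.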
Requirements:

- ffmpeg in PATH (timeline mode only)
- requests package: uv pip install requests (required for Noiz backend)
- To set a Noiz API key, run python3 skills/tts/scripts/tts.py config --set-api-key YOUR_KEY (guest mode works without a key but has limited features)
- Pass --backend kokoro to use the local backend

Use only the base64-encoded API key as the Authorization header value, with no prefix (e.g. no APIKEY or Bearer). Any prefix causes a 401 error.
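The authentication rule can be expressed as a small guard that builds the header. This is a sketch only: build_auth_header is a hypothetical helper, and it assumes the key is already base64 text in the standard alphabet as issued, so it is passed through unchanged rather than encoded again.

```python
import base64
import binascii


def build_auth_header(api_key: str) -> dict:
    """Return request headers carrying the bare key, with no scheme prefix.

    Assumption: the stored key is already base64 (standard alphabet).
    """
    key = api_key.strip()
    # A prefix such as "Bearer " or "APIKEY " would introduce whitespace.
    if not key or any(ch.isspace() for ch in key):
        raise ValueError("API key must be a single token with no prefix")
    try:
        base64.b64decode(key, validate=True)
    except binascii.Error as exc:
        raise ValueError("API key does not look like base64") from exc
    return {"Authorization": key}
```

The returned dictionary could be passed as the headers argument of a requests call. Rejecting prefixed values locally surfaces the mistake before the server answers with 401.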
For backend details and full argument reference, see reference.md.