speakturbo-tts: ultra-low-latency text-to-speech tool with ~90 ms response for real-time conversation and 8 built-in voices

speakturbo-tts by emzod/speak-turbo

948 weekly installs

17 GitHub Stars

Install command

npx skills add https://github.com/emzod/speak-turbo --skill speakturbo-tts

AI/Machine Learning, Automation, Audio Processing

January 27, 2026

speakturbo - Talk to your Claude!

Give your agent the ability to speak to you in real time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.

Quick Start

# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms

# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav  # Should show ~50-100KB file

Output explained: ⚡ = first audio received, ▶ = playback started, ✓ = done
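For scripts that want to log latency, the status line can be split into its three timings. This is a minimal sketch that assumes the line looks exactly like the sample output above. The helper names are hypothetical, and because the source does not say whether the status line goes to stdout or stderr, the functions take the line as an argument.

```shell
# Extract the three timings (in ms) from a speakturbo status line.
# Assumes the format shown above: "⚡ 92ms → ▶ 93ms → ✓ 1245ms".
status_timings() {
  # Print each number that is directly followed by "ms", one per line.
  printf '%s\n' "$1" | grep -oE '[0-9]+ms' | tr -d 'ms'
}

# Hypothetical helper: time to first audio is the first of the three values.
first_audio_ms() {
  status_timings "$1" | sed -n '1p'
}

# Hypothetical helper: total time is the last value.
total_ms() {
  status_timings "$1" | sed -n '$p'
}
```

With the sample line, `first_audio_ms` yields 92 and `total_ms` yields 1245.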

First Run

The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Subsequent calls reach first sound in ~90ms.

# First run (slow - daemon starting)
speakturbo "Starting up"  # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"  # ~90ms

Usage

# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices

Available Voices

Voice	Type
`alba`	Female (default)
`marius`	Male
`javert`	Male
`jean`	Male
`fantine`	Female
`cosette`	Female
`eponine`	Female
`azelma`	Female

Performance

Metric	Value
Time to first sound	~90ms (daemon warm)
First run	2-5s (daemon startup)
Real-time factor	~4x faster than real time
Sample rate	24kHz mono
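The sample rate gives a quick way to sanity-check the size of a saved file. The sketch below is back-of-envelope arithmetic, not something the tool provides: it takes 24kHz mono from the table, and it assumes 16-bit PCM samples and a 44-byte WAV header, neither of which the source states. It also reads "~4x" as generating audio about four times faster than it plays back.

```shell
# Estimate the size in bytes of a WAV file holding $1 milliseconds of audio.
# From the table: 24000 samples per second, 1 channel.
# Assumed (not stated in the source): 2 bytes per sample (16-bit PCM)
# and a 44-byte header.
wav_bytes() {
  echo $(( 44 + 24000 * 2 * $1 / 1000 ))
}

# Estimate generation time in ms for $1 ms of audio at ~4x real time.
generation_ms() {
  echo $(( $1 / 4 ))
}
```

Under those assumptions `wav_bytes 1245` prints 59804, roughly 58 KB, which is consistent with the 50-100KB range quoted in Quick Start for a clip of about 1.2 seconds.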

Architecture

speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)

Text Input

Encoding: UTF-8
Quotes in text: Use escaping: speakturbo "She said \"hello\""
Long text: Supported, streams as it generates
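The quoting rule above is ordinary shell quoting rather than anything specific to speakturbo. As a small illustration, `printf` stands in for the CLI here so the argument it would receive is visible as text:

```shell
# Both commands pass the same single argument to the program.
# printf stands in for speakturbo so the result can be inspected.
printf '%s\n' "She said \"hello\""   # escaped double quotes
printf '%s\n' 'She said "hello"'     # single-quoted alternative

# Keeping the text in a variable avoids re-quoting at the call site:
text='She said "hello"'
printf '%s\n' "$text"
```

All three lines print `She said "hello"`, so either quoting style should work when calling `speakturbo`.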

Output Path Security

The -o flag only writes to directories that are on the allowlist. By default, these are:

/tmp and system temp directories
Your current working directory
~/.speakturbo/

If you need to write elsewhere, use --allow-dir:

speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path

To permanently allow a directory, add it to ~/.speakturbo/config:

mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config

The config file lists one directory per line. Lines starting with # are comments.
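To make the config format concrete, here is a sketch of how a script could check a path against such a file before calling `speakturbo -o`. It is an illustration of the format only, not the CLI's actual logic: the function name is hypothetical, and plain prefix matching is an assumption that does not resolve symlinks or `..` segments.

```shell
# Return 0 if path $2 is inside one of the directories listed in file $1.
# File format, as described above: one directory per line, "#" starts a comment.
# Sketch only; the real CLI may handle edge cases differently.
is_allowed() {
  [ -f "$1" ] || return 1                      # no config file, nothing allowed here
  while IFS= read -r dir || [ -n "$dir" ]; do  # also handle a last line without newline
    case "$dir" in
      ''|'#'*) continue ;;                     # skip blank lines and comments
    esac
    case "$2" in
      "${dir%/}"/*) return 0 ;;                # path sits under this directory
    esac
  done < "$1"
  return 1
}
```

For example, with `/custom/path` in the file, `is_allowed ~/.speakturbo/config /custom/path/audio.wav` succeeds, while `/custom/pathology/x.wav` does not, because the match requires a `/` after the directory name.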

Exit Codes

Code	Meaning
0	Success (audio played/saved)
1	Error (daemon connection failed, invalid args)
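Because there are only two exit codes, a calling script can branch on success or failure without parsing any output. A minimal wrapper might look like the following. `SPEAKTURBO_BIN` is a hypothetical override introduced for this sketch (it is not a documented setting) so the function can be tried without the real binary.

```shell
# Speak a message quietly and report any failure on stderr.
# SPEAKTURBO_BIN is a hypothetical override for this sketch;
# it defaults to the speakturbo command on PATH.
say() {
  "${SPEAKTURBO_BIN:-speakturbo}" "$1" -q
  status=$?
  if [ "$status" -ne 0 ]; then
    # Per the table, 1 covers daemon connection failures and invalid arguments.
    echo "speakturbo failed with exit code $status" >&2
  fi
  return "$status"
}
```

A caller can then write `say "Build finished" || echo "falling back to text"` and keep going when audio is unavailable.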

When to Use

Use speakturbo when:

You need instant audio feedback (~90ms)
Speed matters more than voice variety
Built-in voices are sufficient

Use speak instead when:

You need custom voice cloning (Morgan Freeman, etc.) → speak "text" --voice ~/.chatter/voices/morgan_freeman.wav
You need emotion tags like [laugh], [sigh]
Quality/variety matters more than speed

See the speak skill documentation for full usage.
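The guidance above can be folded into a small dispatcher. This sketch is one possible reading of it: the function name is hypothetical, and it only recognizes the two emotion tags named above, whereas speak may support others.

```shell
# Pick a tool for a piece of text, following the guidance above.
# $1 = text, $2 = optional path to a custom voice wav (for cloning).
pick_tts() {
  if [ -n "${2:-}" ]; then
    echo speak                              # custom voice cloning needs speak
    return
  fi
  case "$1" in
    *'[laugh]'*|*'[sigh]'*) echo speak ;;   # emotion tags need speak
    *) echo speakturbo ;;                   # otherwise favor low latency
  esac
}
```

The function only prints the tool name. The caller decides how to invoke it, for example `"$(pick_tts "$text")" "$text"` when no custom voice is involved.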

Troubleshooting

No audio plays:

# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav  # macOS
aplay /tmp/test.wav   # Linux

Daemon won't start:

# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"  # Auto-restarts daemon

First run is slow: This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).

Daemon Management

The daemon auto-starts on first use and auto-shuts down after 1 hour of inactivity.

# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
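For scripts that need a yes/no answer from the status check, the health response can be tested for the ready state. The sketch below relies on the response shape shown in the Troubleshooting example. Matching with `grep` instead of a JSON parser is a simplification, and the function name is hypothetical.

```shell
# Read a /health response on stdin and succeed only if it reports "ready".
# Expected shape (from the Troubleshooting example):
#   {"status":"ready","voices":["alba","marius",...]}
health_is_ready() {
  grep -q '"status"[[:space:]]*:[[:space:]]*"ready"'
}

# Usage against a running daemon (not executed here):
#   curl -s http://127.0.0.1:7125/health | health_is_ready && echo "daemon ready"
```

If the daemon is not running, `curl` produces no output and the check fails, which is the same result a caller would want for a daemon that is still loading.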

Comparison with speak

Feature	speakturbo	speak
Time to first sound	~90ms	~4-8s
Voice cloning	❌	✅
Emotion tags	❌	✅
Voices	8 built-in	Custom wav files
Engine	pocket-tts	Chatterbox

Weekly Installs

946

Repository

emzod/speak-turbo

GitHub Stars

First Seen

Jan 27, 2026

Security Audits

Gen Agent Trust Hub: Fail, Socket: Pass, Snyk: Pass

Installed on

github-copilot	902
antigravity	841
continue	827
codebuddy	825
opencode	416
codex	407
