text-to-speech by elevenlabs/skills
npx skills add https://github.com/elevenlabs/skills --skill text-to-speech
Generate natural speech from text - supports 70+ languages, multiple models for quality vs latency tradeoffs.
Setup: See the Installation Guide. For JavaScript, use @elevenlabs/* packages only.
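Before running the examples, install an SDK and export your API key. A minimal setup sketch (package names match the imports below, and the curl example reads the same environment variable; verify details against the installation guide):

```shell
# Install the official SDKs (pick your language)
pip install elevenlabs
npm install @elevenlabs/elevenlabs-js

# The clients and the curl example read the key from this variable
export ELEVENLABS_API_KEY="your-api-key-here"
```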
from elevenlabs import ElevenLabs

client = ElevenLabs()
audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2"
)
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createWriteStream } from "fs";

const client = new ElevenLabsClient();
const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "Hello, welcome to ElevenLabs!",
  modelId: "eleven_multilingual_v2",
});
audio.pipe(createWriteStream("output.mp3"));
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
-H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
-d '{"text": "Hello!", "model_id": "eleven_multilingual_v2"}' --output output.mp3
| Model ID | Languages | Latency | Best For |
|---|---|---|---|
| eleven_v3 | 70+ | Standard | Highest quality, emotional range |
| eleven_multilingual_v2 | 29 | Standard | High quality, long-form content |
| eleven_flash_v2_5 | 32 | ~75ms | Ultra-low latency, real-time |
| eleven_flash_v2 | English | ~75ms | English-only, fastest |
| eleven_turbo_v2_5 | 32 | ~250-300ms | Balanced quality/speed |
| eleven_turbo_v2 | English | ~250-300ms | English-only, balanced |
Use pre-made voices or create custom voices in the dashboard.
Popular voices:
JBFqnCBsd6RMkjVDRZzb - George (male, narrative)
EXAVITQu4vr4xnSDxMaL - Sarah (female, soft)
onwK4e9ZLuTAKqWW03F9 - Daniel (male, authoritative)
XB0fDUnXU5powFXDhCwa - Charlotte (female, conversational)
voices = client.voices.get_all()
for voice in voices.voices:
    print(f"{voice.voice_id}: {voice.name}")
Fine-tune how the voice sounds:
Stability: How consistent the voice stays. Lower values = more emotional range and variation, but can sound unstable. Higher values = steady, predictable delivery.
Similarity boost: How closely to match the original voice sample. Higher values sound more like the original but may amplify audio artifacts.
Style: Exaggerates the voice's unique style characteristics (only works with v2+ models).
Speaker boost: Post-processing that enhances clarity and voice similarity.
from elevenlabs import VoiceSettings
audio = client.text_to_speech.convert(
    text="Customize my voice settings.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        speed=1.0,  # 0.25 to 4.0 (default 1.0)
        use_speaker_boost=True
    )
)
Force specific language for pronunciation:
audio = client.text_to_speech.convert(
    text="Bonjour, comment allez-vous?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # language enforcement requires a Flash/Turbo v2.5 model
    language_code="fr"  # ISO 639-1 code
)
The apply_text_normalization parameter controls how numbers, dates, and abbreviations are converted to spoken words. For example, "01/15/2026" becomes "January fifteenth, twenty twenty-six":
"auto" (default): Model decides based on context
"on": Always normalize (use when you want natural speech)
"off": Speak literally (use when you want "zero one slash one five...")
audio = client.text_to_speech.convert(
    text="Call 1-800-555-0123 on 01/15/2026",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    apply_text_normalization="on"
)
When generating long audio in multiple requests, the audio can have pops, unnatural pauses, or tone shifts at the boundaries. Request stitching solves this by letting each request know what comes before/after it:
# First request
audio1 = client.text_to_speech.convert(
    text="This is the first part.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    next_text="And this continues the story."
)

# Second request using previous context
audio2 = client.text_to_speech.convert(
    text="And this continues the story.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    previous_text="This is the first part."
)
| Format | Description |
|---|---|
| mp3_44100_128 | MP3 44.1kHz 128kbps (default) - compressed, good for web/apps |
| mp3_44100_192 | MP3 44.1kHz 192kbps (Creator+) - higher quality compressed |
| mp3_44100_64 | MP3 44.1kHz 64kbps - lower quality, smaller files |
| mp3_22050_32 | MP3 22.05kHz 32kbps - smallest MP3 files |
| pcm_16000 | Raw PCM 16kHz - use for real-time processing |
| pcm_22050 | Raw PCM 22.05kHz |
| pcm_24000 | Raw PCM 24kHz - good balance for streaming |
| pcm_44100 | Raw PCM 44.1kHz (Pro+) - CD quality |
| pcm_48000 | Raw PCM 48kHz (Pro+) - highest quality |
| ulaw_8000 | μ-law 8kHz - standard for phone systems (Twilio, telephony) |
| alaw_8000 | A-law 8kHz - telephony (alternative to μ-law) |
| opus_48000_64 | Opus 48kHz 64kbps - efficient streaming codec |
| wav_44100 | WAV 44.1kHz - uncompressed with headers |
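The format is selected per request via the output_format parameter (string IDs as in the table above). A small illustrative helper, not part of the SDK, for mapping a delivery target to a format:

```python
# Illustrative mapping from delivery target to an output_format string.
# The format IDs come from the table above; the helper itself is not SDK API.
FORMAT_BY_TARGET = {
    "web": "mp3_44100_128",        # compressed MP3, the default
    "telephony": "ulaw_8000",      # Twilio / phone systems
    "realtime": "pcm_16000",       # raw PCM for real-time processing
    "streaming": "opus_48000_64",  # efficient streaming codec
}

def pick_output_format(target: str) -> str:
    """Fall back to the default MP3 format for unknown targets."""
    return FORMAT_BY_TARGET.get(target, "mp3_44100_128")

# Hypothetical usage (assumes a client as in the earlier examples):
# audio = client.text_to_speech.convert(
#     text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb",
#     output_format=pick_output_format("telephony"),
# )
```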
For real-time applications, use the stream method (returns audio chunks as they're generated):
audio_stream = client.text_to_speech.stream(
    text="This text will be streamed as audio.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5"  # Ultra-low latency
)
for chunk in audio_stream:
    play_audio(chunk)  # play_audio is a placeholder for your own playback code
See references/streaming.md for WebSocket streaming.
try:
    audio = client.text_to_speech.convert(
        text="Generate speech",
        voice_id="invalid-voice-id"
    )
except Exception as e:
    print(f"API error: {e}")
Common errors:
Monitor character usage via response headers (x-character-count, request-id):
response = client.text_to_speech.convert.with_raw_response(
    text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb", model_id="eleven_multilingual_v2"
)
audio = response.parse()
print(f"Characters used: {response.headers.get('x-character-count')}")
Weekly Installs: 2.2K
Repository: github.com/elevenlabs/skills
GitHub Stars: 143
First Seen: Jan 27, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (1.8K), gemini-cli (1.8K), opencode (1.8K), github-copilot (1.6K), kimi-cli (1.6K), amp (1.5K)