Alibaba Cloud Qwen TTS Speech Synthesis API Guide - Python SDK Integration and Audio Generation Tutorial

alicloud-ai-audio-tts by cinience/alicloud-skills

286 weekly installs

364 GitHub Stars

Install command

npx skills add https://github.com/cinience/alicloud-skills --skill alicloud-ai-audio-tts

AI/Machine Learning, Cloud Services, Audio Processing

🇺🇸English

Category: provider

Model Studio Qwen TTS

Validation

mkdir -p output/alicloud-ai-audio-tts
python -m py_compile skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py && echo "py_compile_ok" > output/alicloud-ai-audio-tts/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-tts/validate.txt is generated.
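
The same check can be sketched in Python for environments where a shell one-liner is inconvenient. This is a minimal sketch, not part of the skill itself: the function name `validate_script` is hypothetical, and it only mirrors the pass criteria above (the script compiles, and `validate.txt` is written).

```python
import py_compile
from pathlib import Path


def validate_script(script: str, out_dir: str) -> bool:
    """Compile `script` and write validate.txt into `out_dir` on success."""
    try:
        # doraise=True turns compile errors into exceptions instead of stderr output.
        py_compile.compile(script, doraise=True)
    except (py_compile.PyCompileError, OSError):
        return False
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "validate.txt").write_text("py_compile_ok\n", encoding="utf-8")
    return True
```

A call such as `validate_script("skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py", "output/alicloud-ai-audio-tts")` would correspond to the shell commands above.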

Output And Evidence

Save generated audio links, sample audio files, and request payloads to output/alicloud-ai-audio-tts/.
Keep one validation log per execution.
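
One way to keep such evidence is a small JSON record per call. The sketch below is illustrative only: `save_evidence`, the record fields, and the file naming scheme are assumptions, not something the skill prescribes.

```python
import json
import time
from pathlib import Path


def save_evidence(payload: dict, audio_url: str,
                  out_dir: str = "output/alicloud-ai-audio-tts") -> Path:
    """Write one JSON record holding the request payload and the audio link."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    record = {
        "request": payload,  # do not include the API key in the payload
        "audio_url": audio_url,
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    # A nanosecond timestamp keeps file names unique across executions.
    path = directory / f"evidence-{time.time_ns()}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```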

Critical model names

Use one of the recommended models:

qwen3-tts-flash
qwen3-tts-instruct-flash
qwen3-tts-instruct-flash-2026-01-26

Prerequisites

Install the SDK (a venv is recommended to avoid PEP 668 restrictions):

python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (env takes precedence).

Normalized interface (tts.generate)

Request

text (string, required)
voice (string, required)
language_type (string, optional; default Auto)
instruction (string, optional; recommended for instruct models)
stream (bool, optional; default false)

Response

audio_url (string, when stream=false)
audio_base64_pcm (string, when stream=true)
sample_rate (int, 24000)
format (string, wav or pcm depending on mode)
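
The request and response shapes above could be modelled as plain dataclasses. This is a sketch under stated assumptions: the class names `TtsRequest` and `TtsResponse` are hypothetical, and the mapping of `wav` to the URL mode and `pcm` to the streaming mode is inferred from the field list rather than confirmed by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsRequest:
    text: str
    voice: str
    language_type: str = "Auto"
    instruction: Optional[str] = None
    stream: bool = False

    def validate(self) -> None:
        """Reject requests that miss a required field."""
        if not self.text.strip():
            raise ValueError("text is required")
        if not self.voice.strip():
            raise ValueError("voice is required")


@dataclass
class TtsResponse:
    sample_rate: int = 24000
    audio_url: Optional[str] = None         # set when stream=false
    audio_base64_pcm: Optional[str] = None  # set when stream=true

    @property
    def format(self) -> str:
        # Assumption: streaming yields raw pcm, the URL mode yields wav.
        return "pcm" if self.audio_base64_pcm is not None else "wav"
```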

Quick start (Python + DashScope SDK)

import os
import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)

audio_url = response.output.audio.url
print(audio_url)

Streaming notes

stream=True returns Base64-encoded PCM chunks at 24kHz.
Decode each chunk, then play it or append it to a PCM buffer.
The response contains finish_reason == "stop" when the stream ends.
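
The decode-and-concatenate step can be sketched with the standard library alone. The helper names below are hypothetical, and the sketch assumes 16-bit mono samples, which the notes above do not state; how the Base64 strings are pulled out of each streamed response is left to the caller.

```python
import base64
import wave
from typing import Iterable


def collect_pcm(chunks_b64: Iterable[str]) -> bytes:
    """Decode Base64 chunks and join them into one PCM buffer."""
    buffer = bytearray()
    for chunk in chunks_b64:
        if chunk:  # skip empty chunks, e.g. a final status-only message
            buffer.extend(base64.b64decode(chunk))
    return bytes(buffer)


def write_wav(pcm: bytes, path: str, sample_rate: int = 24000) -> None:
    """Wrap a raw PCM buffer in a WAV container so common players can open it."""
    # Assumption: 16-bit mono samples; adjust if the service reports otherwise.
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

Writing a WAV header is optional; the raw buffer can also be saved as a `.pcm` file if downstream tools expect that.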

Operational guidance

Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
Use language_type consistent with the text to improve pronunciation.
Use instruction only when you need explicit style/tone control.
Cache by (text, voice, language_type) to avoid repeat costs.
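
Two of these points, splitting long text and caching, lend themselves to small helpers. The sketch below is an illustration only: `split_text`, `cache_key`, and the 300-character default are hypothetical choices, and the splitter assumes sentences end with `.`, `!`, or `?` followed by whitespace.

```python
import hashlib
import re
from typing import List


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Split text at sentence boundaries into pieces of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            pieces.append(current)
        # Hard-cut a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        pieces.append(current)
    return pieces


def cache_key(text: str, voice: str, language_type: str = "Auto") -> str:
    """Build a stable key from the (text, voice, language_type) tuple."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    raw = "\x1f".join((text, voice, language_type))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

If the model or `instruction` also varies between calls, it may make sense to add them to the key so that differently styled audio is not served from the same cache entry.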

Output location

Default output: output/alicloud-ai-audio-tts/audio/
Override the base directory with the OUTPUT_DIR environment variable.
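
Resolving the output directory might look like the following. This is a sketch under an assumption the text does not settle: it treats OUTPUT_DIR as a replacement for the leading `output` directory, so the skill-specific subpath is kept. The function name `resolve_audio_dir` is hypothetical.

```python
import os
from pathlib import Path


def resolve_audio_dir() -> Path:
    """Return the audio output directory, creating it if needed."""
    # Assumption: OUTPUT_DIR replaces the leading "output" directory.
    base = Path(os.environ.get("OUTPUT_DIR", "output"))
    audio_dir = base / "alicloud-ai-audio-tts" / "audio"
    audio_dir.mkdir(parents=True, exist_ok=True)
    return audio_dir
```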

Workflow

Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
Run one minimal read-only query first to verify connectivity and permissions.
Execute the target operation with explicit parameters and bounded scope.
Verify results and save output/evidence files.

References

See references/api_reference.md for the parameter mapping and a streaming example.
Realtime mode is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.
Voice cloning and voice design are provided by skills/ai/audio/alicloud-ai-audio-tts-voice-clone/ and skills/ai/audio/alicloud-ai-audio-tts-voice-design/, respectively.
Source list: references/sources.md

Weekly Installs

183

Repository

cinience/alicloud-skills

GitHub Stars

340

First Seen

Feb 7, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Installed on

gemini-cli: 182
github-copilot: 182
codex: 182
kimi-cli: 182
amp: 182
opencode: 182