HeyGen Starfish TTS API: Text-to-Speech Tool for Generating High-Quality Audio Files

text-to-speech by heygen-com/skills

611 weekly installs

92 GitHub Stars


Install command

npx skills add https://github.com/heygen-com/skills --skill text-to-speech

AI/Machine Learning, Audio Processing, API


Text-to-Speech (HeyGen Starfish)

Generate speech audio files from text using HeyGen's in-house Starfish TTS model. This skill is for standalone audio generation — separate from video creation.

Authentication

All requests require the X-Api-Key header. Set the HEYGEN_API_KEY environment variable.

curl -X GET "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"
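Every call needs the same header, so a small helper can build it and fail fast when the key is missing instead of sending `X-Api-Key: undefined`. This is a sketch: `heygenHeaders` is a hypothetical name, and the key is passed in explicitly (for example `process.env.HEYGEN_API_KEY`) so the helper does not depend on where the key is stored.

```typescript
// Hypothetical helper: builds the headers every HeyGen request needs.
// The caller supplies the key, e.g. heygenHeaders(process.env.HEYGEN_API_KEY).
function heygenHeaders(
  apiKey: string | undefined,
  json = false,
): Record<string, string> {
  // Fail fast with a clear message rather than sending an empty or "undefined" key.
  if (!apiKey || apiKey.trim() === "") {
    throw new Error("HEYGEN_API_KEY is not set");
  }
  const headers: Record<string, string> = { "X-Api-Key": apiKey.trim() };
  // Requests with a JSON body (POST) also need a Content-Type header.
  if (json) {
    headers["Content-Type"] = "application/json";
  }
  return headers;
}

// Usage (GET /v1/audio/voices):
// const res = await fetch("https://api.heygen.com/v1/audio/voices", {
//   headers: heygenHeaders(process.env.HEYGEN_API_KEY),
// });
```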

Tool Selection

If HeyGen MCP tools are available (mcp__heygen__*), prefer them over direct HTTP API calls.

Task	MCP Tool	Fallback (Direct API)
List TTS voices	`mcp__heygen__list_audio_voices`	`GET /v1/audio/voices`
Generate speech audio	`mcp__heygen__text_to_speech`	`POST /v1/audio/text_to_speech`

Default Workflow

List voices with mcp__heygen__list_audio_voices (or GET /v1/audio/voices)
Pick a voice that matches the desired language, gender, and features
Call mcp__heygen__text_to_speech (or POST /v1/audio/text_to_speech) with text and voice_id
Use the returned audio_url to download or play the audio
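The four steps above can be chained into one function. In this sketch the list and generate calls are injected as parameters (for example the `listTTSVoices` and `textToSpeech` functions defined below, or thin wrappers around the MCP tools), which keeps the control flow testable without network access. `runDefaultWorkflow` and its parameter types are illustrative names, not part of the HeyGen API, and the voice match is an exact, case-insensitive comparison on `language` rather than a substring match.

```typescript
// Minimal shapes for the fields this sketch uses.
interface Voice {
  voice_id: string;
  language: string;
  gender: string;
}

interface Speech {
  audio_url: string;
  duration: number;
}

// Hypothetical dependency bundle: real code would pass API or MCP wrappers.
interface WorkflowDeps {
  listVoices: () => Promise<Voice[]>; // step 1: GET /v1/audio/voices
  generate: (req: { text: string; voice_id: string }) => Promise<Speech>; // step 3
}

async function runDefaultWorkflow(
  deps: WorkflowDeps,
  text: string,
  language: string,
  gender?: string,
): Promise<Speech> {
  // Step 1: list the voices that support Starfish TTS.
  const voices = await deps.listVoices();

  // Step 2: pick the first voice whose language matches (and gender, if given).
  const wanted = language.toLowerCase();
  const voice = voices.find(
    (v) =>
      v.language.toLowerCase() === wanted &&
      (gender === undefined || v.gender === gender),
  );
  if (!voice) {
    throw new Error(`No voice for language=${language}, gender=${gender ?? "any"}`);
  }

  // Step 3: generate speech with the chosen voice.
  // Step 4 (download or play result.audio_url) is left to the caller.
  return deps.generate({ text, voice_id: voice.voice_id });
}
```

With the functions from this document, a call might look like `await runDefaultWorkflow({ listVoices: listTTSVoices, generate: textToSpeech }, "Hello!", "English")`, followed by a download of the returned `audio_url`.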

List TTS Voices

Retrieve voices compatible with the Starfish TTS model.

Note: This uses GET /v1/audio/voices — a different endpoint from the video voices API (GET /v2/voices). Not all video voices support Starfish TTS.

curl

curl -X GET "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"

TypeScript

interface TTSVoice {
  voice_id: string;
  language: string;
  gender: "female" | "male" | "unknown";
  name: string;
  preview_audio_url: string | null;
  support_pause: boolean;
  support_locale: boolean;
  type: string;
}

interface TTSVoicesResponse {
  error: null | string;
  data: {
    voices: TTSVoice[];
  };
}

async function listTTSVoices(): Promise<TTSVoice[]> {
  const response = await fetch("https://api.heygen.com/v1/audio/voices", {
    headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! },
  });

  const json: TTSVoicesResponse = await response.json();

  if (json.error) {
    throw new Error(json.error);
  }

  return json.data.voices;
}

Python

import requests
import os

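def list_tts_voices() -> list[dict]:
    """Return the voices that support Starfish TTS."""
    response = requests.get(
        "https://api.heygen.com/v1/audio/voices",
        headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]},
        timeout=30,  # client-side limit so a stalled request cannot hang forever
    )

    data = response.json()
    if data.get("error"):
        raise RuntimeError(data["error"])

    return data["data"]["voices"]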

Response Format

{
  "error": null,
  "data": {
    "voices": [
      {
        "voice_id": "f38a635bee7a4d1f9b0a654a31d050d2",
        "name": "Chill Brian",
        "language": "English",
        "gender": "male",
        "preview_audio_url": "https://resource.heygen.ai/text_to_speech/WpSDQvmLGXEqXZVZQiVeg6.mp3",
        "support_pause": true,
        "support_locale": false,
        "type": "public"
      }
    ]
  }
}
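Each voice carries capability flags, so the list can be narrowed before one is picked. The sketch below filters a `GET /v1/audio/voices` result by language, gender, `support_locale`, `support_pause`, and the presence of a `preview_audio_url`. `VoiceFilter` and `filterVoices` are hypothetical names; the `TTSVoice` shape is copied from the TypeScript interface above.

```typescript
interface TTSVoice {
  voice_id: string;
  language: string;
  gender: "female" | "male" | "unknown";
  name: string;
  preview_audio_url: string | null;
  support_pause: boolean;
  support_locale: boolean;
  type: string;
}

// Hypothetical filter options; every field is optional.
interface VoiceFilter {
  language?: string; // case-insensitive exact match, e.g. "English"
  gender?: TTSVoice["gender"];
  needsLocale?: boolean; // keep only voices with support_locale === true
  needsPause?: boolean; // keep only voices with support_pause === true
  needsPreview?: boolean; // keep only voices with a non-null preview_audio_url
}

function filterVoices(voices: TTSVoice[], f: VoiceFilter): TTSVoice[] {
  return voices.filter(
    (v) =>
      (f.language === undefined ||
        v.language.toLowerCase() === f.language.toLowerCase()) &&
      (f.gender === undefined || v.gender === f.gender) &&
      (!f.needsLocale || v.support_locale) &&
      (!f.needsPause || v.support_pause) &&
      (!f.needsPreview || v.preview_audio_url !== null),
  );
}
```

Applied to the sample response above, `filterVoices(voices, { language: "english", needsPause: true })` keeps "Chill Brian", while adding `needsLocale: true` excludes it because its `support_locale` is false.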

Generate Speech Audio

Convert text to speech audio using a specified voice.

Endpoint

POST https://api.heygen.com/v1/audio/text_to_speech

Request Fields

Field	Type	Req	Description
`text`	string	Y	Text content to convert to speech
`voice_id`	string	Y	Voice ID from `GET /v1/audio/voices`
`speed`	number		Speech speed, 0.5-1.5 (default: 1)
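`pitch`	integer		Pitch, -50 to 50 (default: 0)
`locale`	string		Accent/locale for multilingual voices (e.g. `en-US`, `pt-BR`)
`elevenlabs_settings`	object		Advanced settings for ElevenLabs voices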
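The documented ranges can be checked on the client before an API call is spent. This sketch encodes `speed` as 0.5-1.5 and `pitch` as an integer from -50 to 50, and requires non-empty `text` and `voice_id`. How the server treats out-of-range values is not described here, so `validateTTSRequest` (a hypothetical helper) only collects messages and leaves the decision to the caller.

```typescript
interface TTSRequest {
  text: string;
  voice_id: string;
  speed?: number;
  pitch?: number;
  locale?: string;
}

// Hypothetical validator: returns a list of problems; empty means the request looks valid.
function validateTTSRequest(req: TTSRequest): string[] {
  const errors: string[] = [];
  if (req.text.trim() === "") errors.push("text must not be empty");
  if (req.voice_id.trim() === "") errors.push("voice_id must not be empty");
  // speed: documented range 0.5-1.5 (default 1); the negated check also catches NaN.
  if (req.speed !== undefined && !(req.speed >= 0.5 && req.speed <= 1.5)) {
    errors.push(`speed ${req.speed} is outside 0.5-1.5`);
  }
  // pitch: documented as an integer from -50 to 50 (default 0).
  if (
    req.pitch !== undefined &&
    (!Number.isInteger(req.pitch) || req.pitch < -50 || req.pitch > 50)
  ) {
    errors.push(`pitch ${req.pitch} must be an integer in -50..50`);
  }
  return errors;
}
```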

ElevenLabs Settings (optional)

Field	Type	Description
`model`	string	Model selection (`eleven_v3`, `eleven_turbo_v2_5`, etc.)
`similarity_boost`	number	Voice similarity, 0.0-1.0
`stability`	number	Output consistency, 0.0-1.0
`style`	number	Style intensity, 0.0-1.0
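The three numeric ElevenLabs settings share a 0.0-1.0 range, so one builder can clamp them and omit unset keys before the object is passed as `elevenlabs_settings`. Clamping rather than rejecting is a choice made for this sketch, not documented API behavior, and `buildElevenLabsSettings` is a hypothetical helper.

```typescript
interface ElevenLabsSettings {
  model?: string; // e.g. "eleven_v3" or "eleven_turbo_v2_5"
  similarity_boost?: number;
  stability?: number;
  style?: number;
}

// Clamp a value into [0, 1]; undefined and NaN are treated as "not provided".
function clamp01(x: number | undefined): number | undefined {
  if (x === undefined || Number.isNaN(x)) return undefined;
  return Math.min(1, Math.max(0, x));
}

// Hypothetical builder: clamps numeric settings and drops keys that were not given.
function buildElevenLabsSettings(input: ElevenLabsSettings): ElevenLabsSettings {
  const out: ElevenLabsSettings = {};
  if (input.model) out.model = input.model;
  const numericKeys = ["similarity_boost", "stability", "style"] as const;
  for (const key of numericKeys) {
    const value = clamp01(input[key]);
    // Omit missing values instead of sending null or undefined.
    if (value !== undefined) out[key] = value;
  }
  return out;
}
```

For example, `buildElevenLabsSettings({ model: "eleven_v3", stability: 1.4 })` yields `{ model: "eleven_v3", stability: 1 }`, which can be passed as `elevenlabs_settings` to `textToSpeech` when the chosen voice is an ElevenLabs voice.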

curl

curl -X POST "https://api.heygen.com/v1/audio/text_to_speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "YOUR_VOICE_ID",
    "speed": 1.0
  }'

TypeScript

interface TTSRequest {
  text: string;
  voice_id: string;
  speed?: number;
  pitch?: number;
  locale?: string;
  elevenlabs_settings?: {
    model?: string;
    similarity_boost?: number;
    stability?: number;
    style?: number;
  };
}

interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

interface TTSResponse {
  error: null | string;
  data: {
    audio_url: string;
    duration: number;
    request_id: string;
    word_timestamps: WordTimestamp[];
  };
}

async function textToSpeech(request: TTSRequest): Promise<TTSResponse["data"]> {
  const response = await fetch(
    "https://api.heygen.com/v1/audio/text_to_speech",
    {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(request),
    }
  );

  const json: TTSResponse = await response.json();

  if (json.error) {
    throw new Error(json.error);
  }

  return json.data;
}
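`textToSpeech` calls `response.json()` unconditionally, so a non-JSON body (such as an HTML error page from a gateway) surfaces as an opaque `SyntaxError`, and the HTTP status is never consulted. One way to harden it is to read the body with `response.text()` and interpret it in a separate function. This sketch assumes that JSON error bodies use the same `{ error, data }` envelope as successful ones; `parseHeygenBody` is a hypothetical helper.

```typescript
interface Envelope<T> {
  error: unknown;
  data: T;
}

// Hypothetical helper: turn an HTTP status plus raw body text into data or a descriptive error.
function parseHeygenBody<T>(status: number, bodyText: string): T {
  let parsed: unknown;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    // Non-JSON body (for example an HTML error page from a proxy).
    throw new Error(`HTTP ${status}: non-JSON response: ${bodyText.slice(0, 200)}`);
  }
  if (parsed === null || typeof parsed !== "object") {
    throw new Error(`HTTP ${status}: unexpected response body`);
  }
  const json = parsed as Partial<Envelope<T>>;
  // Prefer the API's own error message when one is present.
  if (json.error) {
    const msg = typeof json.error === "string" ? json.error : JSON.stringify(json.error);
    throw new Error(`HTTP ${status}: ${msg}`);
  }
  if (status < 200 || status >= 300) {
    throw new Error(`HTTP ${status} with no error message in the body`);
  }
  return json.data as T;
}

// Usage inside textToSpeech, replacing the response.json() call:
// const data = parseHeygenBody<TTSResponse["data"]>(response.status, await response.text());
```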

Python

import requests
import os

def text_to_speech(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    pitch: int = 0,
    locale: str | None = None,
) -> dict:
    """Generate speech with Starfish TTS and return the response's data object."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
        "pitch": pitch,
    }

    # Only send locale when given; it applies to multilingual voices.
    if locale:
        payload["locale"] = locale

    response = requests.post(
        "https://api.heygen.com/v1/audio/text_to_speech",
        headers={
            "X-Api-Key": os.environ["HEYGEN_API_KEY"],
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=120,  # client-side limit so a stalled request cannot hang forever
    )

    data = response.json()
    if data.get("error"):
        raise RuntimeError(data["error"])

    return data["data"]

Response Format

{
  "error": null,
  "data": {
    "audio_url": "https://resource2.heygen.ai/text_to_speech/.../id=365d46bb.wav",
    "duration": 5.526,
    "request_id": "p38QJ52hfgNlsYKZZmd9",
    "word_timestamps": [
      { "word": "<start>", "start": 0.0, "end": 0.0 },
      { "word": "Hey", "start": 0.079, "end": 0.219 },
      { "word": "there,", "start": 0.239, "end": 0.459 },
      { "word": "<end>", "start": 5.526, "end": 5.526 }
    ]
  }
}
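The `word_timestamps` array carries enough information to build captions. This sketch drops the `<start>` and `<end>` marker entries that appear in the sample response, groups the remaining words into cues of at most `maxWordsPerCue` words, and emits WebVTT text. The grouping rule and the `toWebVTT`/`vttTime` names are illustrative choices made for this example.

```typescript
interface WordTimestamp {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm
function vttTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Hypothetical converter from word_timestamps to a WebVTT document.
function toWebVTT(words: WordTimestamp[], maxWordsPerCue = 7): string {
  const size = Math.max(1, Math.floor(maxWordsPerCue)); // guard against 0 or fractions
  // Skip the <start>/<end> markers seen in the sample response.
  const spoken = words.filter((w) => w.word !== "<start>" && w.word !== "<end>");
  const lines = ["WEBVTT", ""];
  for (let i = 0; i < spoken.length; i += size) {
    const cue = spoken.slice(i, i + size);
    lines.push(`${vttTime(cue[0].start)} --> ${vttTime(cue[cue.length - 1].end)}`);
    lines.push(cue.map((w) => w.word).join(" "));
    lines.push(""); // blank line terminates the cue
  }
  return lines.join("\n");
}
```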

Usage Examples

Basic TTS

const result = await textToSpeech({
  text: "Welcome to our quarterly earnings call.",
  voice_id: "YOUR_VOICE_ID",
});

console.log(`Audio URL: ${result.audio_url}`);
console.log(`Duration: ${result.duration}s`);
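`audio_url` points to a hosted file, so a common next step is to save it locally. The sketch below assumes Node.js 18 or later (global `fetch`) and uses `writeFile` from `node:fs/promises`. It takes the file extension from the URL path and falls back to `.wav`, the format of the sample response URL. `downloadAudio` and `audioExtension` are hypothetical helpers.

```typescript
import { writeFile } from "node:fs/promises";

// Hypothetical helper: extract ".wav", ".mp3", etc. from the URL path (query ignored).
function audioExtension(audioUrl: string, fallback = ".wav"): string {
  const path = new URL(audioUrl).pathname;
  const match = /\.([a-z0-9]+)$/i.exec(path);
  return match ? `.${match[1].toLowerCase()}` : fallback;
}

// Hypothetical helper: download the generated audio and write it to baseName + extension.
async function downloadAudio(audioUrl: string, baseName: string): Promise<string> {
  const res = await fetch(audioUrl);
  if (!res.ok) {
    throw new Error(`Download failed: HTTP ${res.status}`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  const filePath = baseName + audioExtension(audioUrl);
  await writeFile(filePath, bytes);
  return filePath;
}

// Usage after the basic example above:
// const saved = await downloadAudio(result.audio_url, "earnings-intro");
```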

With Speed Adjustment

const result = await textToSpeech({
  text: "We're thrilled to announce our newest feature!",
  voice_id: "YOUR_VOICE_ID",
  speed: 1.1,
});

With Locale for Multilingual Voices

const result = await textToSpeech({
  text: "Bem-vindo ao nosso produto.",
  voice_id: "MULTILINGUAL_VOICE_ID",
  locale: "pt-BR",
});
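As the best practices note, `locale` only applies to voices whose `support_locale` flag is true. A guard can enforce this before the request is sent. This sketch throws instead of silently dropping the locale, and it accepts only the `language-REGION` shape of the documented examples (`en-US`, `pt-BR`); both are assumptions to relax as needed. `buildLocalizedRequest` is a hypothetical helper.

```typescript
interface VoiceInfo {
  voice_id: string;
  support_locale: boolean;
}

interface LocalizedRequest {
  text: string;
  voice_id: string;
  locale?: string;
}

// Conservative pattern modeled on the documented examples, e.g. "en-US" or "pt-BR".
const LOCALE_PATTERN = /^[a-z]{2,3}-[A-Z]{2}$/;

// Hypothetical guard: only attach a locale when the voice supports it.
function buildLocalizedRequest(
  voice: VoiceInfo,
  text: string,
  locale?: string,
): LocalizedRequest {
  const req: LocalizedRequest = { text, voice_id: voice.voice_id };
  if (locale === undefined) return req;
  if (!voice.support_locale) {
    throw new Error(`Voice ${voice.voice_id} does not support locale selection`);
  }
  if (!LOCALE_PATTERN.test(locale)) {
    throw new Error(`Unexpected locale format: ${locale}`);
  }
  req.locale = locale;
  return req;
}
```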

Find a Voice and Generate Audio

async function generateSpeech(text: string, language: string): Promise<string> {
  const voices = await listTTSVoices();
  const voice = voices.find(
    (v) => v.language.toLowerCase().includes(language.toLowerCase())
  );

  if (!voice) {
    throw new Error(`No TTS voice found for language: ${language}`);
  }

  const result = await textToSpeech({
    text,
    voice_id: voice.voice_id,
  });

  return result.audio_url;
}

const audioUrl = await generateSpeech("Hello and welcome!", "english");

Pauses with Break Tags

Use SSML-style break tags in your text for pauses:

word <break time="1s"/> word

Rules:

Use seconds with an s suffix: <break time="1.5s"/>
Put a space before and after the tag
Use the self-closing tag format (ending in />)
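The rules above are easy to break by hand, for example by omitting the `s` suffix or gluing the tag to a word. A helper that joins text segments with correctly spaced, self-closing break tags keeps the input consistent. `breakTag` and `joinWithPauses` are hypothetical names. The `support_pause` flag returned by `GET /v1/audio/voices` suggests that not every voice honors pauses, but that reading is an assumption.

```typescript
// Hypothetical helper: format a pause in seconds as <break time="Ns"/>.
function breakTag(seconds: number): string {
  if (!(seconds > 0) || !Number.isFinite(seconds)) {
    throw new Error(`Pause must be a positive number of seconds, got ${seconds}`);
  }
  return `<break time="${seconds}s"/>`;
}

// Join non-empty segments with a pause between each pair, spaced on both sides of the tag.
function joinWithPauses(segments: string[], seconds: number): string {
  const tag = breakTag(seconds);
  return segments
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join(` ${tag} `);
}
```

For example, `joinWithPauses(["Welcome.", "Let's begin."], 1.5)` returns `Welcome. <break time="1.5s"/> Let's begin.`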

Best Practices

Use GET /v1/audio/voices to find compatible voices; not all voices from GET /v2/voices support Starfish TTS
Check support_locale before setting a locale; only multilingual voices support locale selection
Keep speed between 0.8 and 1.2 for natural-sounding output
Preview voices with preview_audio_url before generating (it may be null for some voices)
Use word_timestamps from the response for caption syncing or timed text overlays
Use SSML break tags in your text for pauses: word <break time="1s"/> word


First Seen

Mar 3, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Installed on

claude-code: 580

codex: 83

cursor: 82

github-copilot: 81

cline: 81

gemini-cli: 81
