fal.ai Audio Processing: Text-to-Speech and Speech-to-Text AI Models with Multilingual Support and High-Quality Audio Generation

fal-audio by fal-ai-community/skills

188 weekly installs

39 GitHub Stars

Install command

npx skills add https://github.com/fal-ai-community/skills --skill fal-audio

AI/Machine Learning, Automation, Audio Processing

🇺🇸English

fal.ai Audio

Text-to-speech and speech-to-text using state-of-the-art audio models on fal.ai.

How It Works

1. The user provides text (for TTS) or an audio URL (for STT).
2. The script selects an appropriate model.
3. The script sends a request to the fal.ai API.
4. The script returns an audio URL (TTS) or the transcription text (STT).
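
The steps above can be sketched as a small dispatcher. This is only an illustration: `build_audio_command` is a hypothetical helper that is not part of the skill, and the rule "input starting with http:// or https:// is an audio URL" is an assumption made for the example. It prints the command it would run instead of calling the API, and it does not escape quotes inside the input.

```shell
#!/bin/sh
# Hypothetical helper: pick the TTS or STT script from the shape of the input.
# Assumption: anything starting with http:// or https:// is an audio URL.
SCRIPTS_DIR="/mnt/skills/user/fal-audio/scripts"

build_audio_command() {
  input=$1
  case "$input" in
    http://*|https://*)
      # Looks like a URL: transcribe it (STT).
      printf 'bash %s/speech-to-text.sh --audio-url "%s"\n' "$SCRIPTS_DIR" "$input"
      ;;
    *)
      # Anything else is treated as text to speak (TTS).
      printf 'bash %s/text-to-speech.sh --text "%s"\n' "$SCRIPTS_DIR" "$input"
      ;;
  esac
}

# Print the command for each kind of input (display only, nothing is executed).
build_audio_command "https://example.com/audio.mp3"
build_audio_command "Hello, welcome to the future of AI."
```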

Finding Models

To discover the best and latest audio models, use the search API:

# Search for text-to-speech models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --category "text-to-speech"

# Search for speech-to-text models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --category "speech-to-text"

# Search for music generation models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --query "music generation"

Or use the search_models MCP tool with relevant keywords such as "tts", "speech", or "music".

Usage

Text-to-Speech

bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh [options]

Arguments:

--text - Text to convert to speech (required)
--model - TTS model (defaults to fal-ai/minimax/speech-2.8-turbo)
--voice - Voice ID or name (model-specific)

Examples:

# Basic TTS (fast, good quality)
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Hello, welcome to the future of AI."

# High quality with MiniMax HD
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "This is premium quality speech." \
  --model "fal-ai/minimax/speech-2.8-hd"

# Natural voices with ElevenLabs
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Natural sounding voice generation" \
  --model "fal-ai/elevenlabs/tts/eleven-v3"

# Multi-language TTS
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Bonjour, bienvenue dans le futur." \
  --model "fal-ai/chatterbox/text-to-speech/multilingual"

Speech-to-Text

bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh [options]

Arguments:

--audio-url - URL of audio file to transcribe (required)
--model - STT model (defaults to fal-ai/whisper)
--language - Language code (optional; auto-detected when omitted)

Examples:

# Transcribe with Whisper
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/audio.mp3"

# Transcribe with speaker diarization
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/meeting.mp3" \
  --model "fal-ai/elevenlabs/speech-to-text/scribe-v2"

# Transcribe specific language
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/spanish.mp3" \
  --language "es"

MCP Tool Alternative

Use the search_models MCP tool or search-models.sh to find the best current model, then call mcp__fal-ai__generate with the discovered modelId.

Output

Text-to-Speech Output

Generating speech...
Model: fal-ai/minimax/speech-2.8-turbo

Speech generated!

Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s
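
If the script's output needs to be consumed by another script, the "Label: value" lines can be picked out with `sed`. The following is a minimal sketch that assumes the output format matches the sample above exactly; `extract_field` is a hypothetical helper, and the real script's output may differ.

```shell
#!/bin/sh
# Hypothetical helper: read script output on stdin and print the value of
# the first "Label: value" line whose label matches the argument.
extract_field() {
  label=$1
  sed -n "s/^${label}: //p" | head -n 1
}

# Sample output copied from the example above (not a live API response).
sample_output='Generating speech...
Model: fal-ai/minimax/speech-2.8-turbo

Speech generated!

Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s'

audio_url=$(printf '%s\n' "$sample_output" | extract_field "Audio URL")
duration=$(printf '%s\n' "$sample_output" | extract_field "Duration")

echo "$audio_url"
echo "$duration"
```

The same helper would work for the speech-to-text output with the labels `Text`, `Duration`, and `Language`, under the same assumption about the format.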

Speech-to-Text Output

Transcribing audio...
Model: fal-ai/whisper

Transcription complete!

Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en

Present Results to User

For TTS:

Here's the generated speech:

[Download audio](https://v3.fal.media/files/.../speech.mp3)

• Duration: 5.2s | Model: fal-ai/minimax/speech-2.8-turbo

For STT:

Here's the transcription:

"Hello, this is the transcribed text from the audio file."

• Duration: 12.5s | Language: English
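
The two message templates above could be rendered with `printf`. This is a sketch under stated assumptions: `format_tts_result` and `format_stt_result` are hypothetical names, and the caller is assumed to already hold the URL, duration, model, and a human-readable language name.

```shell
#!/bin/sh
# Hypothetical helpers that render the TTS and STT result messages.

# Arguments: audio URL, duration, model name.
format_tts_result() {
  url=$1; duration=$2; model=$3
  printf "Here's the generated speech:\n\n[Download audio](%s)\n\n• Duration: %s | Model: %s\n" \
    "$url" "$duration" "$model"
}

# Arguments: transcribed text, duration, language name.
format_stt_result() {
  text=$1; duration=$2; language=$3
  printf "Here's the transcription:\n\n\"%s\"\n\n• Duration: %s | Language: %s\n" \
    "$text" "$duration" "$language"
}

format_tts_result "https://v3.fal.media/files/abc123/speech.mp3" "5.2s" "fal-ai/minimax/speech-2.8-turbo"
format_stt_result "Hello, this is the transcribed text from the audio file." "12.5s" "English"
```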

Model Selection Tips

Text-to-Speech: Search the text-to-speech category. Consider the tradeoff between quality and speed.
Text-to-Music: Search for "music generation". Some models specialize in vocals, others in instrumentals.
Speech-to-Text: Search the speech-to-text category. Consider whether you need speaker diarization or multi-language support.

Troubleshooting

Empty Audio

Error: Generated audio is empty

Check that your text is not empty and contains valid content.
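
A pre-flight check can catch this case before any request is sent. The sketch below uses a hypothetical helper, `has_speakable_text`, and treats "valid content" as "at least one non-whitespace character", which is an assumption; the service may apply stricter rules.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the text contains
# at least one non-whitespace character.
has_speakable_text() {
  stripped=$(printf '%s' "$1" | tr -d '[:space:]')
  [ -n "$stripped" ]
}

text="   "
if has_speakable_text "$text"; then
  echo "Text looks usable"
else
  echo "Refusing to send empty text" >&2
fi
```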

Unsupported Audio Format

Error: Audio format not supported

Supported formats: MP3, WAV, M4A, FLAC, OGG
Convert your audio to a supported format.
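
The format list above can be checked locally before submitting a URL. This sketch assumes the file extension in the URL reflects the real format, which may not hold if the service inspects file content instead; `is_supported_audio_url` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-flight check based on the file extension in the URL.
is_supported_audio_url() {
  # Drop any query string and fragment, then lowercase for a
  # case-insensitive comparison.
  path=${1%%\?*}
  path=${path%%#*}
  lower=$(printf '%s' "$path" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *.mp3|*.wav|*.m4a|*.flac|*.ogg) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_audio_url "https://example.com/clip.aac"; then
  echo "Format looks supported"
else
  echo "Convert the file to MP3, WAV, M4A, FLAC, or OGG first" >&2
fi
```

One common way to convert a file, assuming ffmpeg is installed, is `ffmpeg -i input.aac output.mp3`.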

Language Detection Failed

Warning: Could not detect language, defaulting to English

Specify the language explicitly with the --language option.

First Seen

Jan 27, 2026

Security Audits

Gen Agent Trust Hub: Fail
Socket: Pass
Snyk: Pass

Installed on

opencode: 155
gemini-cli: 153
codex: 149
github-copilot: 144
cursor: 138
kimi-cli: 127
