fal-audio by fal-ai-community/skills
npx skills add https://github.com/fal-ai-community/skills --skill fal-audio
Text-to-speech and speech-to-text using state-of-the-art audio models on fal.ai.
To discover the best and latest audio models, use the search API:
# Search for text-to-speech models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --category "text-to-speech"
# Search for speech-to-text models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --category "speech-to-text"
# Search for music generation models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --query "music generation"
Or use the search_models MCP tool with relevant keywords like "tts", "speech", "music".
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh [options]
Arguments:
--text - Text to convert to speech (required)
--model - TTS model (defaults to fal-ai/minimax/speech-2.8-turbo)
--voice - Voice ID or name (model-specific)
Examples:
# Basic TTS (fast, good quality)
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "Hello, welcome to the future of AI."
# High quality with MiniMax HD
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "This is premium quality speech." \
--model "fal-ai/minimax/speech-2.8-hd"
# Natural voices with ElevenLabs
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "Natural sounding voice generation" \
--model "fal-ai/elevenlabs/tts/eleven-v3"
# Multi-language TTS
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "Bonjour, bienvenue dans le futur." \
--model "fal-ai/chatterbox/text-to-speech/multilingual"
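The examples above trade speed for quality by switching the --model value. A minimal sketch of a helper that maps a quality tier to one of the two MiniMax model IDs shown above could look like the following. The function name tts_model_for and the tier names "fast" and "hd" are hypothetical and are not part of the fal-audio scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of fal-audio): map a quality tier to a model ID.
# The model IDs are the ones used in the examples above; the tier names are made up.
tts_model_for() {
  case "$1" in
    fast) echo "fal-ai/minimax/speech-2.8-turbo" ;;  # the documented default
    hd)   echo "fal-ai/minimax/speech-2.8-hd" ;;     # higher-quality variant
    *)    echo "unknown tier: $1" >&2; return 1 ;;   # reject anything else
  esac
}

# Possible usage with the script documented above:
#   bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
#     --text "This is premium quality speech." \
#     --model "$(tts_model_for hd)"
```

Keeping the mapping in one place means a model ID found later through search-models.sh only has to be updated once.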
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh [options]
Arguments:
--audio-url - URL of audio file to transcribe (required)
--model - STT model (defaults to fal-ai/whisper)
--language - Language code (optional, auto-detected)
Examples:
# Transcribe with Whisper
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
--audio-url "https://example.com/audio.mp3"
# Transcribe with speaker diarization
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
--audio-url "https://example.com/meeting.mp3" \
--model "fal-ai/elevenlabs/speech-to-text/scribe-v2"
# Transcribe specific language
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
--audio-url "https://example.com/spanish.mp3" \
--language "es"
Use the search_models MCP tool or search-models.sh to find the best current model, then call mcp__fal-ai__generate with the discovered modelId.
Generating speech...
Model: fal-ai/minimax/speech-2.8-turbo
Speech generated!
Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s
Transcribing audio...
Model: fal-ai/whisper
Transcription complete!
Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en
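When a later step needs only the audio URL or the transcript, the labelled lines in the output above can be filtered with sed. This is a sketch that assumes the scripts print exactly the "Audio URL: ..." and "Text: "..."" line formats shown above to standard output. The function names extract_audio_url and extract_transcript are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of fal-audio): pull single fields out of the
# plain-text output. They assume the line formats shown in the sample output.

extract_audio_url() {
  # Print the value of the first "Audio URL:" line read from stdin.
  sed -n 's/^Audio URL: //p' | head -n 1
}

extract_transcript() {
  # Print the first "Text:" value with its surrounding double quotes removed.
  sed -n 's/^Text: "\(.*\)"$/\1/p' | head -n 1
}

# Possible usage, assuming the script writes its report to stdout:
#   bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh --text "Hello" \
#     | extract_audio_url
```

If the real output format differs from the sample, the sed patterns would need to be adjusted accordingly.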
Here's the generated speech:
[Download audio](https://v3.fal.media/files/.../speech.mp3)
• Duration: 5.2s | Model: Maya TTS
Here's the transcription:
"Hello, this is the transcribed text from the audio file."
• Duration: 12.5s | Language: English
Text-to-speech: search the text-to-speech category. Consider quality vs. speed tradeoffs.
Music: search for music generation. Some models specialize in vocals, others in instrumentals.
Speech-to-text: search the speech-to-text category. Consider whether you need speaker diarization or multi-language support.
Error: Generated audio is empty
Check that your text is not empty and contains valid content.
Error: Audio format not supported
Supported formats: MP3, WAV, M4A, FLAC, OGG
Convert your audio to a supported format.
Warning: Could not detect language, defaulting to English
Specify the language explicitly with the --language option.
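Two of the errors above can often be caught before calling the scripts. The sketch below checks that the text is not blank and that the audio URL ends in one of the listed extensions. Both function names are hypothetical, and the extension check is only a heuristic: a URL without a file extension may still point to valid audio.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks (not part of the fal-audio scripts).

has_text() {
  # Succeeds only if the argument contains at least one non-whitespace character.
  [[ "$1" =~ [^[:space:]] ]]
}

is_supported_audio_url() {
  # Drop any query string or fragment, then compare the lowercased extension
  # against the supported formats listed above.
  local path="${1%%[?#]*}"
  local ext
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage:
#   is_supported_audio_url "$url" || echo "Convert the audio to a supported format first." >&2
```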
Weekly Installs: 188
Repository: fal-ai-community/skills
GitHub Stars: 39
First Seen: Jan 27, 2026
Security Audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Pass)
Installed on: opencode (155), gemini-cli (153), codex (149), github-copilot (144), cursor (138), kimi-cli (127)