Install:

```
npx skills add https://github.com/marswaveai/skills --skill tts
```
Convert text into natural-sounding speech audio. Two paths:

- Quick (`/v1/tts`): single voice, low latency, synchronous MP3 stream. For casual chat, reading snippets, instant audio.
- Script (`/v1/speech`): multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.

Conventions:

- See shared/authentication.md for the API key and request headers.
- See shared/common-patterns.md for error handling and interaction patterns.
- Use the built-in defaults in shared/speaker-selection.md as a fallback only; fetch from the speakers API when the user wants to change voice.
- Read the config per shared/config-pattern.md before any interaction.
- Use shared/speaker-selection.md for speaker selection (text table + free-text input).
- Avoid ~/Downloads/ or /tmp/ as primary output; save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming).

Determine the mode from the user's input automatically, before asking any questions:
| Signal | Mode |
|---|---|
| "多角色", "脚本", "对话", "script", "dialogue", "multi-speaker" | Script |
| Multiple characters mentioned by name or role | Script |
| Input contains structured segments (A: ..., B: ...) | Script |
| Single paragraph of text, no character markers | Quick |
| "读一下", "read this", "TTS", "朗读" with plain text | Quick |
| Ambiguous | Quick (default) |
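The signal table above can be approximated in shell for quick checks. This is a heuristic sketch only, not part of the upstream skill; the real decision is made by the agent reading the input.

```shell
# Heuristic mode detection mirroring the signal table (illustrative).
detect_mode() {
  input="$1"
  # Explicit multi-speaker keywords (English or Chinese)
  if printf '%s' "$input" | grep -qE 'script|dialogue|multi-speaker|多角色|脚本|对话'; then
    echo "script"
  # Two or more structured segments like "A: ..." / "B: ..."
  elif [ "$(printf '%s' "$input" | grep -cE '^[[:alnum:]]+:')" -ge 2 ]; then
    echo "script"
  else
    echo "quick"   # default for plain or ambiguous input
  fi
}
```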
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If the config file doesn't exist, silently create it with defaults and proceed:

```
mkdir -p ".listenhub/tts"
echo '{"outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/tts/config.json"
CONFIG_PATH=".listenhub/tts/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If the config file exists, read it silently and proceed:

```
CONFIG_PATH=".listenhub/tts/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/tts/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
Only run when the user explicitly asks to reconfigure. Display current settings:
Current config (tts):
  Output mode: {inline / download / both}
  Language preference: {zh / en / not set}
  Default speaker: {speakerName / built-in default}
Then ask:
outputMode : Follow shared/output-mode.md § Setup Flow Question.
Language (optional): "Default language?" Leave `language` as null if the user prefers to choose each time.

After collecting answers, save immediately:

```
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
# Save language if the user chose one (not "choose manually each time")
if [ "$LANGUAGE" != "null" ]; then
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```
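Later steps read OUTPUT_MODE (and the language preference) back out of this config; with jq, already used above, that is a one-liner. An example config value is shown inline:

```shell
# Read settings back out of the saved config (example value inline).
CONFIG='{"outputMode":"inline","language":null,"defaultSpeakers":{}}'
OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode')
LANGUAGE=$(echo "$CONFIG" | jq -r '.language // "unset"')   # null falls back to "unset"
```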
Quick mode (POST /v1/tts)

Step 1: Extract text
Get the text to convert. If the user hasn't provided it, ask:
"What text would you like me to read aloud?"
Step 2: Determine voice
- If config.defaultSpeakers.{language}[0] is set → use it silently (skip to Step 4)
- Otherwise → use the built-in default from shared/speaker-selection.md for the detected language (skip to Step 4)

Step 3: Save preference
After the user explicitly selects a new voice (not when using defaults):
Question: "Save {voice name} as your default voice for {language}?"
Options:
- "Yes" — update .listenhub/tts/config.json
- "No" — use for this session only
Step 4: Confirm
Ready to generate:
Text: "{first 80 chars}..."
Voice: {voice name}
Proceed?
Step 5: Generate
```
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d '{"input": "...", "voice": "..."}' \
  --output /tmp/tts-output.mp3
```
Step 6: Present result
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
Use a timestamped jobId: $(date +%s)
inline or both (TTS quick returns a sync audio stream — no audioUrl):
```
JOB_ID=$(date +%s)
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d '{"input": "...", "voice": "..."}' \
  --output /tmp/tts-${JOB_ID}.mp3
```
Then use the Read tool on /tmp/tts-{jobId}.mp3.
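Because the quick endpoint streams bytes directly, a failed request can leave a JSON error body in the .mp3 file. A small sanity check before presenting helps; this is an illustrative sketch, not part of the upstream skill:

```shell
# Sanity-check a downloaded TTS file: an empty file or a JSON body
# means the request failed and the error should be surfaced instead.
check_audio() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "error: empty response"
  elif [ "$(head -c 1 "$file")" = "{" ]; then
    echo "error: JSON error body"
  else
    echo "ok"
  fi
}
```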
Present:
Audio generated!
download or both: Generate a topic slug from the text content following shared/config-pattern.md § Artifact Naming.
```
SLUG="{topic-slug}"   # e.g. "server-maintenance-notice"
NAME="${SLUG}.mp3"
# Dedup: if the file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d '{"input": "...", "voice": "..."}' \
  --output "$NAME"
```
Present:
Audio generated!
Saved to the current directory:
{NAME}
Script mode (POST /v1/speech)

Step 1: Get scripts
Determine whether the user already has a scripts array:
Already provided (JSON or clear segments): parse and display for confirmation
Not yet provided : help the user structure segments. Ask:
"Please provide the script with speaker assignments. Format: each line as
SpeakerName: text content. I'll convert it."
Once the user provides the script, parse it into the scripts JSON format.
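Parsing "SpeakerName: text" lines into the scripts array can be sketched with jq. This is illustrative only: the speakerId emitted here is the raw speaker name and still needs to be mapped to a real voice ID in Step 2.

```shell
# Turn "Speaker: content" lines on stdin into a JSON scripts array.
# speakerId initially holds the speaker *name*; Step 2 replaces it
# with an actual voice ID.
parse_scripts() {
  jq -Rn '[inputs
           | capture("^(?<speaker>[^:]+):[[:space:]]*(?<content>.*)$")
           | {content: .content, speakerId: .speaker}]'
}
```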
Step 2: Assign voices per character
For each unique character in the script:
- If config.defaultSpeakers.{language} has saved voices → auto-assign silently (one per character, in order)
- Otherwise → use the built-in defaults from shared/speaker-selection.md (Primary for the first character, Secondary for the second)

Step 3: Save preferences
After all voices are assigned (if any were new):
Question: "Save these voice assignments for future sessions?"
Options:
- "Yes" — update defaultSpeakers in .listenhub/tts/config.json
- "No" — use for this session only
Step 4: Confirm
Ready to generate:
Characters:
{name}: {voice}
{name}: {voice}
Segments: {count}
Title: (auto-generated)
Proceed?
Step 5: Generate
Write the request body to a temp file, then submit:
```
# Write request to temp file
cat > /tmp/lh-speech-request.json << 'ENDJSON'
{
  "scripts": [
    {"content": "...", "speakerId": "..."},
    {"content": "...", "speakerId": "..."}
  ]
}
ENDJSON

# Submit
curl -sS -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d @/tmp/lh-speech-request.json
rm /tmp/lh-speech-request.json
```
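The curl above prints the JSON response to stdout; capturing it and formatting the fields Step 6 presents can be sketched as below. The flat response shape with these exact key names is an assumption inferred from the fields referenced in Step 6.

```shell
# Format the /v1/speech JSON response for display. Assumes a flat
# object with audioUrl, subtitlesUrl, audioDuration (ms), credits.
present_speech_result() {
  jq -r '"Listen online: \(.audioUrl)\nSubtitles: \(.subtitlesUrl)\nDuration: \(.audioDuration / 1000)s\nCredits used: \(.credits)"'
}
```

Usage: `RESPONSE=$(curl ... -d @/tmp/lh-speech-request.json); echo "$RESPONSE" | present_speech_result`.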
Step 6: Present result
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Display the audioUrl and subtitlesUrl as clickable links.
Present:
Audio generated!
Listen online: {audioUrl}
Subtitles: {subtitlesUrl}
Duration: {audioDuration / 1000}s
Credits used: {credits}
download or both: Also download the file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.
```
SLUG="{topic-slug}"   # e.g. "welcome-dialogue"
NAME="${SLUG}.mp3"
# Dedup: if the file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
curl -sS -o "$NAME" "{audioUrl}"
```
Present:
Saved to the current directory:
{NAME}
When saving preferences, merge into .listenhub/tts/config.json — do not overwrite unchanged keys.
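The merge described above can be sketched with jq: assigning into one path leaves every other key intact. LANG_KEY and SPEAKER_ID are illustrative values standing in for the session's selections.

```shell
# Merge a new default speaker into the config without clobbering
# other keys. CONFIG / LANG_KEY / SPEAKER_ID are example values.
CONFIG='{"outputMode":"inline","defaultSpeakers":{"zh":["voice-zh-1"]}}'
LANG_KEY="en"
SPEAKER_ID="voice-en-1"
NEW_CONFIG=$(echo "$CONFIG" | jq --arg l "$LANG_KEY" --arg id "$SPEAKER_ID" \
  '.defaultSpeakers[$l] = [$id]')
# then write NEW_CONFIG back to $CONFIG_PATH as in the reconfigure flow
```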
- Quick mode: set defaultSpeakers.{language}[0] to the selected speakerId
- Script mode: set defaultSpeakers.{language} to the full array assigned this session
- Set language if the user explicitly specifies it

References:

- shared/api-tts.md
- shared/api-speakers.md
- shared/speaker-selection.md
- shared/common-patterns.md § Error Handling
- shared/common-patterns.md § Long Text Input

Examples:

Quick mode:

"TTS this: The server will be down for maintenance at midnight."

1. quickVoice is null → use the default voice
2. POST /v1/tts with input + voice
3. Save to /tmp/tts-output.mp3

Script mode:

"帮我做一段双人对话配音,A说:欢迎大家,B说:谢谢邀请" ("Make a two-person dialogue voiceover; A says: Welcome everyone, B says: Thanks for the invitation")

1. scriptVoices is empty → fetch zh speakers, assign voices to A and B
2. POST /v1/speech with the scripts array
3. Present audioUrl, subtitlesUrl, duration