Install:

npx skills add https://github.com/marswaveai/skills --skill asr
Transcribe audio files to text using coli asr, which runs fully offline via local speech recognition models. No API key required. Supports Chinese, English, Japanese, Korean, and Cantonese (sensevoice model) or English-only (whisper model).
Run coli asr --help for current CLI options and supported flags.
Read shared/config-pattern.md before any interaction, and shared/common-patterns.md for interaction patterns.

Before config setup, silently check the environment:
COLI_OK=$(command -v coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(command -v ffmpeg >/dev/null 2>&1 && echo yes || echo no)
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" 2>/dev/null | grep -q sherpa && echo yes || echo no)
| Issue | Action |
|---|---|
| coli not found | Block. Tell the user to run npm install -g @marswave/coli first |
| ffmpeg not found | Warn (WAV files still work). Suggest brew install ffmpeg / sudo apt install ffmpeg |
| Models not downloaded | Inform the user: the first transcription will auto-download models (~60MB) to ~/.coli/models/ |
If coli is missing, stop here and do not proceed.
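The issue table above can be sketched as a small decision helper, assuming the yes/no flags from the environment check (COLI_OK, FFMPEG_OK) are already set; the function name gate is hypothetical:

```shell
# Hypothetical helper mapping the environment-check flags to an action.
# "block" means stop here (coli missing); "warn" means proceed but note
# that only WAV input works without ffmpeg; "ok" means proceed normally.
gate() {
  coli_ok=$1
  ffmpeg_ok=$2
  if [ "$coli_ok" = "no" ]; then
    echo "block"
  elif [ "$ffmpeg_ok" = "no" ]; then
    echo "warn"
  else
    echo "ok"
  fi
}
```

For example, `gate "$COLI_OK" "$FFMPEG_OK"` would print the action to take; the models check only produces an informational note, so it stays outside the gate.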
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If the config file doesn't exist — silently create it with defaults and proceed:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow with sensible defaults (sensevoice model, polish enabled).
If the config file exists — read it silently and proceed:
CONFIG_PATH=".listenhub/asr/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/asr/config.json"
CONFIG=$(cat "$CONFIG_PATH")
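For the flat JSON shape written by the defaults above, the two settings can be pulled back out with standard tools; a minimal sketch (the sed pattern assumes exactly that flat key layout, and the sample CONFIG value here stands in for the file contents):

```shell
# Read model and polish back out of the config JSON (flat shape assumed).
CONFIG='{"model":"sensevoice","polish":true}'   # sample; normally $(cat "$CONFIG_PATH")
MODEL=$(printf '%s' "$CONFIG" | sed -n 's/.*"model":"\([^"]*\)".*/\1/p')
if printf '%s' "$CONFIG" | grep -q '"polish": *true'; then POLISH=true; else POLISH=false; fi
```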
Only run when the user explicitly asks to reconfigure. Display current settings:
Current config (asr):
  Model: sensevoice / whisper-tiny.en
  Polish: on / off
Ask in order:
model : "Which speech recognition model should be used by default?"
polish : "Polish the transcript with AI after transcription? (fix punctuation, remove filler words, improve readability)"
Options: polish: true / polish: false. Save all answers at once after collecting them.
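Saving both answers in one write might look like this sketch (the answer values shown are hypothetical):

```shell
# Persist both reconfigured answers with a single write of the config file.
MODEL="whisper-tiny.en"   # hypothetical answers collected above
POLISH=false
mkdir -p ".listenhub/asr"
printf '{"model":"%s","polish":%s}\n' "$MODEL" "$POLISH" > ".listenhub/asr/config.json"
```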
If the user hasn't provided a file path, ask:
"Please provide the path of the audio file to transcribe."
Verify the file exists before proceeding.
Ready to transcribe:
  File: {file}
  Model: {model}
  Polish: {yes / no}

Proceed?
Run coli asr with JSON output (to get metadata):
coli asr -j --model {model} "{file}"
On first run, coli will automatically download the required model. This may take a moment — inform the user if models haven't been downloaded yet.
Parse the JSON result to extract text, lang, emotion, event, duration.
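A sketch of extracting those fields in the shell; the field names match the list above, but the sample RESULT value is made up, and real runs would capture the output of coli asr -j instead:

```shell
# Extract fields from the -j output (sample JSON stands in for real output).
RESULT='{"text":"hello world","lang":"en","emotion":"neutral","event":"speech","duration":3.2}'
TEXT=$(printf '%s' "$RESULT" | sed -n 's/.*"text":"\([^"]*\)".*/\1/p')
LANG_CODE=$(printf '%s' "$RESULT" | sed -n 's/.*"lang":"\([^"]*\)".*/\1/p')
DURATION=$(printf '%s' "$RESULT" | sed -n 's/.*"duration":\([0-9.]*\).*/\1/p')
```

The sed patterns assume flat, unescaped string values; a transcript containing escaped quotes would need a real JSON parser.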
If polish is true, take the raw text from the transcription result and rewrite it to fix punctuation, remove filler words, and improve readability. Preserve the original meaning and speaker intent. Do not summarize or paraphrase.
Display the transcript directly in the conversation:
Transcription complete

{transcript text}
─────────────────
Language: {lang} · Emotion: {emotion} · Duration: {duration}s
If polished, show the polished version with a note that it was AI-refined. Offer to show the raw original on request.
After presenting the result, ask:
Question: "Save as a Markdown file in the current directory?"
Options:
- "Yes" — save to the current directory
- "No" — done
If yes, write {audio-filename}-transcript.md to the current working directory (where the user is running Claude Code). The file should contain the transcript text (polished version if polish was enabled), with a front-matter header:
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---
{transcript text}
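Writing that file could look like the following sketch; every variable value here is a hypothetical placeholder for data gathered earlier in the flow:

```shell
# Write {audio-filename}-transcript.md with the front-matter header above.
AUDIO="meeting.m4a"; MODEL="sensevoice"; DURATION="3.2"; LANG_CODE="en"
TEXT="hello world"
OUT="${AUDIO%.*}-transcript.md"
{
  printf -- '---\n'
  printf 'source: %s\n' "$AUDIO"
  printf 'date: %s\n' "$(date +%Y-%m-%d)"
  printf 'model: %s\n' "$MODEL"
  printf 'duration: %ss\n' "$DURATION"
  printf 'lang: %s\n' "$LANG_CODE"
  printf -- '---\n'
  printf '\n%s\n' "$TEXT"
} > "$OUT"
```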
"Transcribe this file for me: meeting.m4a"
coli asr -j --model sensevoice "meeting.m4a"

"transcribe interview.wav, no polish"
coli asr -j --model sensevoice "interview.wav"