Install:

npx skills add https://github.com/marswaveai/skills --skill asr
Transcribe audio files to text using coli asr, which runs fully offline via local speech recognition models. No API key required. Supports Chinese, English, Japanese, Korean, and Cantonese (sensevoice model) or English-only (whisper model).
Run coli asr --help for current CLI options and supported flags.
Read shared/config-pattern.md before any interaction, and shared/common-patterns.md for interaction patterns.

Before config setup, silently check the environment:
COLI_OK=$(command -v coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(command -v ffmpeg >/dev/null 2>&1 && echo yes || echo no)
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" 2>/dev/null | grep -q sherpa && echo yes || echo no)
| Issue | Action |
|---|---|
| coli not found | Block. Tell the user to run npm install -g @marswave/coli first |
| ffmpeg not found | Warn (WAV files still work). Suggest brew install ffmpeg / sudo apt install ffmpeg |
| Models not downloaded | Inform the user: the first transcription will auto-download models (~60MB) to ~/.coli/models/ |
If coli is missing, stop here and do not proceed.
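The issue table above can be sketched as a small decision helper, assuming the yes/no flags from the environment check (COLI_OK, FFMPEG_OK) are already set; the function name gate is hypothetical:

```shell
# Hypothetical helper mapping the environment-check flags to an action.
# "block" means stop here (coli missing); "warn" means proceed but note
# that only WAV input works without ffmpeg; "ok" means proceed normally.
gate() {
  coli_ok=$1
  ffmpeg_ok=$2
  if [ "$coli_ok" = "no" ]; then
    echo "block"
  elif [ "$ffmpeg_ok" = "no" ]; then
    echo "warn"
  else
    echo "ok"
  fi
}
```

For example, `gate "$COLI_OK" "$FFMPEG_OK"` would print the action to take; the models check only produces an informational note, so it stays outside the gate.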
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If the config file doesn't exist — silently create it with defaults and proceed:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow with sensible defaults (sensevoice model, polish enabled).
If the config file exists — read it silently and proceed:
CONFIG_PATH=".listenhub/asr/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/asr/config.json"
CONFIG=$(cat "$CONFIG_PATH")
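For the flat JSON shape written by the defaults above, the two settings can be pulled back out with standard tools; a minimal sketch (the sed pattern assumes exactly that flat key layout, and the sample CONFIG value here stands in for the file contents):

```shell
# Read model and polish back out of the config JSON (flat shape assumed).
CONFIG='{"model":"sensevoice","polish":true}'   # sample; normally $(cat "$CONFIG_PATH")
MODEL=$(printf '%s' "$CONFIG" | sed -n 's/.*"model":"\([^"]*\)".*/\1/p')
if printf '%s' "$CONFIG" | grep -q '"polish": *true'; then POLISH=true; else POLISH=false; fi
```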
Only run when the user explicitly asks to reconfigure. Display current settings:
Current config (asr):
  Model: sensevoice / whisper-tiny.en
  Polish: on / off
Ask in order:
model : "Which speech recognition model should be used by default?"
polish : "Polish the transcript with AI after transcription? (fix punctuation, remove filler words, improve readability)"
Options: polish: true / polish: false. Save all answers at once after collecting them.
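Saving both answers in one write might look like this sketch (the answer values shown are hypothetical):

```shell
# Persist both reconfigured answers with a single write of the config file.
MODEL="whisper-tiny.en"   # hypothetical answers collected above
POLISH=false
mkdir -p ".listenhub/asr"
printf '{"model":"%s","polish":%s}\n' "$MODEL" "$POLISH" > ".listenhub/asr/config.json"
```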
If the user hasn't provided a file path, ask:
"Please provide the path of the audio file to transcribe."
Verify the file exists before proceeding.
Ready to transcribe:
  File: {file}
  Model: {model}
  Polish: {yes / no}

Proceed?
Run coli asr with JSON output (to get metadata):
coli asr -j --model {model} "{file}"
On first run, coli will automatically download the required model. This may take a moment — inform the user if models haven't been downloaded yet.
Parse the JSON result to extract text, lang, emotion, event, duration.
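A sketch of extracting those fields in the shell; the field names match the list above, but the sample RESULT value is made up, and real runs would capture the output of coli asr -j instead:

```shell
# Extract fields from the -j output (sample JSON stands in for real output).
RESULT='{"text":"hello world","lang":"en","emotion":"neutral","event":"speech","duration":3.2}'
TEXT=$(printf '%s' "$RESULT" | sed -n 's/.*"text":"\([^"]*\)".*/\1/p')
LANG_CODE=$(printf '%s' "$RESULT" | sed -n 's/.*"lang":"\([^"]*\)".*/\1/p')
DURATION=$(printf '%s' "$RESULT" | sed -n 's/.*"duration":\([0-9.]*\).*/\1/p')
```

The sed patterns assume flat, unescaped string values; a transcript containing escaped quotes would need a real JSON parser.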
If polish is true, take the raw text from the transcription result and rewrite it to fix punctuation, remove filler words, and improve readability. Preserve the original meaning and speaker intent. Do not summarize or paraphrase.
Display the transcript directly in the conversation:
Transcription complete

{transcript text}
─────────────────
Language: {lang} · Emotion: {emotion} · Duration: {duration}s
If polished, show the polished version with a note that it was AI-refined. Offer to show the raw original on request.
After presenting the result, ask:
Question: "Save as a Markdown file in the current directory?"
Options:
- "Yes" — save to the current directory
- "No" — done
If yes, write {audio-filename}-transcript.md to the current working directory (where the user is running Claude Code). The file should contain the transcript text (polished version if polish was enabled), with a front-matter header:
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---
{transcript text}
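Writing that file could look like the following sketch; every variable value here is a hypothetical placeholder for data gathered earlier in the flow:

```shell
# Write {audio-filename}-transcript.md with the front-matter header above.
AUDIO="meeting.m4a"; MODEL="sensevoice"; DURATION="3.2"; LANG_CODE="en"
TEXT="hello world"
OUT="${AUDIO%.*}-transcript.md"
{
  printf -- '---\n'
  printf 'source: %s\n' "$AUDIO"
  printf 'date: %s\n' "$(date +%Y-%m-%d)"
  printf 'model: %s\n' "$MODEL"
  printf 'duration: %ss\n' "$DURATION"
  printf 'lang: %s\n' "$LANG_CODE"
  printf -- '---\n'
  printf '\n%s\n' "$TEXT"
} > "$OUT"
```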
"Transcribe this file for me: meeting.m4a"
coli asr -j --model sensevoice "meeting.m4a"

"transcribe interview.wav, no polish"
coli asr -j --model sensevoice "interview.wav"