Paratran Transcription by briansunter/paratran
npx skills add https://github.com/briansunter/paratran --skill 'Paratran Transcription'
Audio transcription for Apple Silicon using parakeet-mlx. #1 on Open ASR Leaderboard, ~30x faster than Whisper via MLX.
Three interfaces: CLI, REST API, and MCP server.
uvx paratran recording.wav
uv tool install paratran
git clone https://github.com/briansunter/paratran.git
cd paratran
uv sync
uv run paratran recording.wav
# Transcribe to text (default)
paratran recording.wav
# Multiple files with verbose output
paratran -v file1.wav file2.mp3 file3.m4a
# Output as SRT subtitles
paratran --output-format srt recording.wav
# All formats (txt, json, srt, vtt) to a directory
paratran --output-format all --output-dir ./output recording.wav
# Beam search decoding
paratran --decoding beam recording.wav
# Custom model and cache directory
paratran --model mlx-community/parakeet-tdt-1.1b-v2 --cache-dir /path/to/models recording.wav
| Flag | Default | Description |
|---|---|---|
| --model | mlx-community/parakeet-tdt-0.6b-v3 | HuggingFace model ID or local path |
| --cache-dir | HuggingFace default | Model cache directory |
| --output-dir | . | Output directory |
| --output-format | txt | txt, json, srt, vtt, or all |
| --decoding | greedy | greedy or beam |
| --chunk-duration | 120 | Chunk duration in seconds (0 to disable) |
| --overlap-duration | 15 | Overlap between chunks |
| --beam-size | 5 | Beam size (beam decoding) |
| --fp32 | | Use FP32 precision instead of BF16 |
| -v | | Verbose output |
Environment variables: PARATRAN_MODEL, PARATRAN_MODEL_DIR.
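When paratran is driven from a script, it can help to assemble the flags from the table programmatically. The following is a minimal sketch, not part of paratran itself: `build_paratran_command` is a hypothetical helper that only builds the argument list from the documented flags and does not execute anything.

```python
import shlex

# Values taken from the flag table above.
OUTPUT_FORMATS = {"txt", "json", "srt", "vtt", "all"}
DECODINGS = {"greedy", "beam"}


def build_paratran_command(files, output_format="txt", output_dir=".",
                           decoding="greedy", beam_size=None,
                           fp32=False, verbose=False):
    """Return a paratran invocation as an argument list (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    if decoding not in DECODINGS:
        raise ValueError(f"unsupported decoding: {decoding}")

    cmd = ["paratran",
           "--output-format", output_format,
           "--output-dir", output_dir,
           "--decoding", decoding]
    # --beam-size only matters for beam decoding, so add it only then.
    if decoding == "beam" and beam_size is not None:
        cmd += ["--beam-size", str(beam_size)]
    if fp32:
        cmd.append("--fp32")
    if verbose:
        cmd.append("-v")
    return cmd + list(files)


if __name__ == "__main__":
    cmd = build_paratran_command(["recording.wav"], output_format="srt",
                                 decoding="beam", beam_size=5)
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps file names with spaces intact if the list is later handed to `subprocess.run`.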
# Start server
paratran serve
# Custom host, port, and model cache
paratran serve --host 127.0.0.1 --port 9000 --cache-dir /path/to/models
GET /health: returns the model name, status, and cache directory.
POST /transcribe: upload an audio file; returns the transcription as JSON.
# Basic transcription
curl -X POST http://localhost:8000/transcribe -F "file=@recording.m4a"
# With beam search and sentence splitting
curl -X POST "http://localhost:8000/transcribe?decoding=beam&max_words=20" -F "file=@recording.m4a"
# Extract just text
curl -s -X POST http://localhost:8000/transcribe -F "file=@audio.m4a" | jq -r '.text'
Query parameters: decoding, beam_size, length_penalty, patience, duration_reward, max_words, silence_gap, max_duration, chunk_duration, overlap_duration, fp32.
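As an illustration of how these query parameters combine into a request URL, here is a small sketch using only the Python standard library. The helper name `transcribe_url` is hypothetical, and the default base URL assumes the server is running on localhost port 8000 as in the curl examples.

```python
from urllib.parse import urlencode

# Parameter names as listed in the documentation above.
KNOWN_PARAMS = {
    "decoding", "beam_size", "length_penalty", "patience", "duration_reward",
    "max_words", "silence_gap", "max_duration", "chunk_duration",
    "overlap_duration", "fp32",
}


def transcribe_url(base="http://localhost:8000", **params):
    """Build a /transcribe URL, rejecting parameter names not listed above."""
    unknown = set(params) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"unknown query parameters: {sorted(unknown)}")
    query = urlencode(params)
    return f"{base}/transcribe" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(transcribe_url(decoding="beam", max_words=20))
```

The resulting URL can be used in place of the quoted URL in the curl commands.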
{
  "text": "Full transcription text.",
  "duration": 3.52,
  "processing_time": 0.176,
  "sentences": [
    {
      "text": "Full transcription text.",
      "start": 0.0,
      "end": 3.52,
      "tokens": [
        { "text": "Full", "start": 0.0, "end": 0.24 },
        { "text": " transcription", "start": 0.24, "end": 0.8 }
      ]
    }
  ]
}
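To show how the sentence-level timestamps in this response might be used, the sketch below converts a response of the shape shown above into SRT cues. It is an illustration only and not paratran's own SRT writer (the CLI already offers `--output-format srt`); `srt_timestamp` and `sentences_to_srt` are hypothetical helper names.

```python
import json

# Sample response copied from the example above.
SAMPLE = """
{
  "text": "Full transcription text.",
  "duration": 3.52,
  "processing_time": 0.176,
  "sentences": [
    {
      "text": "Full transcription text.",
      "start": 0.0,
      "end": 3.52,
      "tokens": [
        { "text": "Full", "start": 0.0, "end": 0.24 },
        { "text": " transcription", "start": 0.24, "end": 0.8 }
      ]
    }
  ]
}
"""


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def sentences_to_srt(response):
    """Turn each sentence in the response into one numbered SRT cue."""
    cues = []
    for index, sentence in enumerate(response["sentences"], start=1):
        start = srt_timestamp(sentence["start"])
        end = srt_timestamp(sentence["end"])
        cues.append(f"{index}\n{start} --> {end}\n{sentence['text'].strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    print(sentences_to_srt(json.loads(SAMPLE)))
```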
Interactive API docs at http://localhost:8000/docs.
Paratran includes an MCP server so Claude Code, Claude Desktop, or any MCP client can transcribe audio files directly.
For Claude Code, add to .claude/settings.json:
{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}
For Claude Desktop, add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}
Optionally set PARATRAN_MODEL_DIR in the env block to customize the model cache location.
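A configuration with the env block might look like the output of the sketch below. It assumes the common MCP client layout in which `env` sits next to `command` and `args` inside the server entry; `paratran_mcp_config` is a hypothetical helper and `/path/to/models` is a placeholder path.

```python
import json


def paratran_mcp_config(model_dir=None):
    """Build the mcpServers entry shown above, optionally with an env block."""
    server = {
        "command": "uvx",
        "args": ["--from", "paratran", "paratran-mcp"],
    }
    if model_dir:
        # Assumption: the client reads environment variables from "env".
        server["env"] = {"PARATRAN_MODEL_DIR": model_dir}
    return {"mcpServers": {"paratran": server}}


if __name__ == "__main__":
    print(json.dumps(paratran_mcp_config("/path/to/models"), indent=2))
```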
The transcribe tool accepts:
file_path (required): absolute path to the audio file
decoding, beam_size, length_penalty, patience, duration_reward, max_words, silence_gap, max_duration, chunk_duration, overlap_duration, fp32
It returns a JSON string with the full text, duration, processing time, and sentences with word-level timestamps.
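Because the tool returns a JSON string rather than a parsed object, a client has to decode it before reading the timestamps. The sketch below flattens the word-level entries into a list; it assumes the tool result has the same shape as the REST response example (a `sentences` list whose items carry `tokens`), and `word_timestamps` is a hypothetical helper name.

```python
import json


def word_timestamps(tool_result):
    """Decode the JSON string and return (word, start, end) tuples in order."""
    payload = json.loads(tool_result)
    return [
        (token["text"].strip(), token["start"], token["end"])
        for sentence in payload["sentences"]
        for token in sentence["tokens"]
    ]


if __name__ == "__main__":
    # Abbreviated result string in the assumed shape.
    result = json.dumps({
        "text": "Full transcription text.",
        "sentences": [{
            "text": "Full transcription text.",
            "start": 0.0,
            "end": 3.52,
            "tokens": [
                {"text": "Full", "start": 0.0, "end": 0.24},
                {"text": " transcription", "start": 0.24, "end": 0.8},
            ],
        }],
    })
    for word, start, end in word_timestamps(result):
        print(f"{start:6.2f} {end:6.2f} {word}")
```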