whisper-transcription by guia-matthieu/clawfu-skills
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill whisper-transcription
Transcribe any audio or video to text using OpenAI's Whisper model, the same technology powering ChatGPT voice features.
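| Claude Does | You Decide |
|---|---|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |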
pip install openai-whisper torch ffmpeg-python click
# ffmpeg must also be installed on the system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
python scripts/main.py translate foreign-audio.mp3 --to en
python scripts/main.py timestamps podcast.mp3 --format json
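The source of scripts/main.py is not shown here, so the following is only a sketch of how the transcribe and translate commands could map onto the openai-whisper Python library. `whisper.load_model` and `model.transcribe(..., task=...)` are real library calls; the function name `transcribe_file` and the injectable `load_model` parameter are assumptions made for illustration.

```python
from typing import Any, Callable, Optional


def transcribe_file(
    path: str,
    model_name: str = "small",
    task: str = "transcribe",
    load_model: Optional[Callable[[str], Any]] = None,
) -> dict:
    """Run Whisper on one file and return its result dictionary.

    task="transcribe" keeps the spoken language; task="translate"
    asks Whisper to produce English text instead.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    if load_model is None:
        # Imported lazily so this module loads even without openai-whisper.
        import whisper

        load_model = whisper.load_model
    model = load_model(model_name)
    # The result holds "text", "segments" (with start/end times) and "language".
    return model.transcribe(path, task=task)
```

Calling `transcribe_file("audio.mp3", model_name="medium")["text"]` would then give the plain transcript, assuming openai-whisper and ffmpeg are installed.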
# Transcribe a 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium
# Output: episode-42.txt (full transcript with timestamps)
# Processing time: about 5 minutes per hour of audio on an M1 Mac
# Generate an SRT file for a video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt
# Output: marketing-video.srt
# Can be uploaded directly to YouTube/Vimeo
# Transcribe all recordings in a folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt
# Output: ./customer-interviews/*.txt (one per audio file)
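The batch behaviour described above (one output file per recording, written next to the source unless an output folder is given) could be planned roughly as follows. This is a sketch under assumptions: the extension list and the name `plan_batch` are hypothetical and are not taken from the skill's code.

```python
from pathlib import Path

# Assumed set of input extensions; the skill's real list is not documented.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4", ".mov"}


def plan_batch(input_dir, output_dir=None, fmt="txt"):
    """Return sorted (source, target) path pairs, one per media file in input_dir."""
    src_dir = Path(input_dir)
    dst_dir = Path(output_dir) if output_dir is not None else src_dir
    pairs = []
    for src in sorted(src_dir.iterdir()):
        # Skip sub-folders and files that are not audio or video.
        if src.is_file() and src.suffix.lower() in MEDIA_EXTENSIONS:
            pairs.append((src, dst_dir / f"{src.stem}.{fmt}"))
    return pairs
```

Each pair can then be handed to whatever transcription call is in use, which keeps file discovery separate from the slow model work.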
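| Model | Speed | Accuracy | VRAM | Best For |
|---|---|---|---|---|
| tiny | Fastest | ~70% | 1GB | Quick drafts, short clips |
| base | Fast | ~80% | 1GB | Social media clips |
| small | Medium | ~85% | 2GB | Podcasts, interviews |
| medium | Slow | ~90% | 5GB | Professional transcripts |
| large | Slowest | ~95% | 10GB | Critical accuracy needs |
Recommendation: Start with small for most marketing content. Use medium for client deliverables.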
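As one way to apply the model table, the sketch below picks the largest model whose listed VRAM requirement fits a given budget. The VRAM numbers are copied from the table; the function name and the "largest that fits" policy are assumptions, and the recommendation above (small by default, medium for client work) may be the better rule when speed matters.

```python
# VRAM figures (GB) copied from the model table, smallest model first.
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(available_vram_gb: float) -> str:
    """Return the last (most accurate) model whose VRAM requirement fits the budget."""
    fitting = [name for name, need in MODEL_VRAM_GB if need <= available_vram_gb]
    if not fitting:
        raise ValueError("no model in the table fits the available VRAM")
    return fitting[-1]
```

For example, a 4 GB budget selects small, because medium is listed at 5GB.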
| Format | Extension | Use Case |
|---|---|---|
| txt | .txt | Blog posts, analysis |
| srt | .srt | Video subtitles (YouTube) |
| vtt | .vtt | Web video subtitles |
| json | .json | Programmatic access |
| tsv | .tsv | Spreadsheet analysis |
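To illustrate what the srt format contains, here is a minimal sketch that renders timed segments as SRT text. It assumes segments shaped like the entries in openai-whisper's `result["segments"]` (dicts with "start", "end" and "text"); the function names are hypothetical and this is not necessarily how the skill itself writes subtitles.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with "start", "end", "text") as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        # Each cue is: running number, timing line, text, then a blank line.
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

The vtt format is closely related; it mainly differs in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.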
--language
Mode: cyborg
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
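Weekly Installs: 125
GitHub Stars: 47
First Seen: Feb 13, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode 120, gemini-cli 120, cursor 119, codex 118, github-copilot 117, kimi-cli 116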