Whisper Audio Transcription Tool - Convert audio and video to text/subtitles (SRT/VTT) with OpenAI models

whisper-transcription by guia-matthieu/clawfu-skills

163 weekly installs

61 GitHub Stars


Install command

npx skills add https://github.com/guia-matthieu/clawfu-skills --skill whisper-transcription

AI/Machine Learning, Automation, Audio Processing

🇨🇳 Chinese Introduction

Whisper Transcription

Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

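When to Use This Skill

Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
Video subtitles - Generate SRT/VTT files for YouTube, social media
Interview extraction - Pull quotes and insights from recorded calls
Content audit - Make audio/video libraries searchable
Translation - Transcribe and translate foreign language content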

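What Claude Does vs What You Decide

Claude Does	You Decide
Structures production workflow	Final creative direction
Suggests technical approaches	Equipment and tool choices
Creates templates and checklists	Quality standards
Identifies best practices	Brand/voice decisions
Generates script outlines	Final script approval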

Dependencies

pip install openai-whisper torch ffmpeg-python click
# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg


Example 1: Podcast to Blog Post

# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac

Example 2: YouTube Subtitles

# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo

Example 3: Batch Process Interview Library

# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)

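Model	Speed	Accuracy	VRAM	Best For
`tiny`	Fastest	~70%	1GB	Quick drafts, short clips
`base`	Fast	~80%	1GB	Social media clips
`small`	Medium	~85%	2GB	Podcasts, interviews
`medium`	Slow	~90%	5GB	Professional transcripts
`large`	Slowest	~95%	10GB	Accuracy-critical work

Recommendation: Start with small for most marketing content. Use medium for client deliverables.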

Format	Extension	Use Case
`txt`	.txt	Blog posts, analysis
`srt`	.srt	Video subtitles (YouTube)
`vtt`	.vtt	Web video subtitles
`json`	.json	Programmatic access
`tsv`	.tsv	Spreadsheet analysis

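GPU acceleration - 10x faster with CUDA GPU
Audio extraction - Script auto-extracts audio from video
Chunking - Long files auto-split for memory efficiency
Language detection - Automatic, or specify with --language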

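What This Skill Does Well

Structuring audio production workflows
Providing technical guidance
Creating quality checklists
Suggesting creative approaches

What This Skill Cannot Do

Replace audio engineering expertise
Make subjective creative decisions
Access or edit audio files directly
Guarantee commercial success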

video-processing - Extract audio from video
youtube-downloader - Download videos to transcribe
content-repurposer - Transform transcripts to content
podcast-production - Create podcasts

Mode: cyborg

category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week

🇺🇸 English

Whisper Transcription

Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

When to Use This Skill

Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
Video subtitles - Generate SRT/VTT files for YouTube, social media
Interview extraction - Pull quotes and insights from recorded calls
Content audit - Make audio/video libraries searchable
Translation - Transcribe and translate foreign language content

What Claude Does vs What You Decide

Claude Does	You Decide
Structures production workflow	Final creative direction
Suggests technical approaches	Equipment and tool choices
Creates templates and checklists	Quality standards
Identifies best practices	Brand/voice decisions
Generates script outlines	Final script approval

Dependencies

pip install openai-whisper torch ffmpeg-python click
# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
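
Before running any command, it can help to confirm that both the Python packages and the separate ffmpeg binary are visible. The following is a minimal sketch, not part of the skill itself; the helper name `check_dependencies` is hypothetical, and the import names (`whisper` for openai-whisper, `ffmpeg` for ffmpeg-python) are the ones those packages normally install under.

```python
import importlib.util
import shutil


def check_dependencies(modules=("whisper", "torch", "ffmpeg", "click")):
    """Report which prerequisites are visible from this Python environment."""
    # find_spec returns None when a top-level module cannot be imported.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # The ffmpeg *binary* is separate from the ffmpeg-python package.
    report["ffmpeg (binary)"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A missing "ffmpeg (binary)" entry usually means the system package from the brew/apt step above still needs to be installed.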

Commands

Transcribe Single File

python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
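
How scripts/main.py implements `transcribe` is not shown here, so the following is only a sketch of what a single-file transcription might look like with the openai-whisper Python API (`whisper.load_model` and `model.transcribe`). The function names `default_output_path` and `transcribe_to_text` are hypothetical, and the default output naming is an assumption based on the examples in this document.

```python
from pathlib import Path


def default_output_path(media_path, fmt="txt"):
    """Mirror the naming in the examples: episode-42.mp3 -> episode-42.txt."""
    return Path(media_path).with_suffix(f".{fmt}")


def transcribe_to_text(media_path, model_name="medium", output=None):
    """Transcribe one file and write the plain-text transcript to disk."""
    import whisper  # from the openai-whisper package; imported lazily

    model = whisper.load_model(model_name)
    # transcribe() returns a dict with "text", "segments" and "language".
    result = model.transcribe(str(media_path))
    out_path = Path(output) if output else default_output_path(media_path)
    out_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out_path
```

Whisper decodes the input through ffmpeg, so a video file such as video.mp4 can be passed in the same way as an audio file.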

Batch Transcription

python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
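
The batch command maps every recording in a folder to one transcript file. A rough sketch of that planning step is below; the extension list, the non-recursive scan, and the names `plan_outputs` and `plan_batch` are all assumptions for illustration, and the real script may behave differently.

```python
from pathlib import Path

# Assumed set of extensions; the real script may accept a different list.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov", ".mkv"}


def plan_outputs(media_paths, output_dir, fmt="txt"):
    """Pair each media file with the transcript path it would produce."""
    output_dir = Path(output_dir)
    jobs = []
    for media in sorted(Path(p) for p in media_paths):
        if media.suffix.lower() in MEDIA_EXTENSIONS:
            jobs.append((media, output_dir / f"{media.stem}.{fmt}"))
    return jobs


def plan_batch(input_dir, output_dir=None, fmt="txt"):
    """Scan one folder (non-recursively) and plan one transcript per media file."""
    input_dir = Path(input_dir)
    files = [p for p in input_dir.iterdir() if p.is_file()]
    # Without an explicit output folder, transcripts land next to the recordings.
    return plan_outputs(files, output_dir or input_dir, fmt)
```

Separating the planning from the transcription makes it easy to print the job list as a dry run before committing to a long batch.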

Transcribe + Translate

python scripts/main.py translate foreign-audio.mp3 --to en
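
In the openai-whisper library, translation is a decoding task passed to `model.transcribe` (`task="translate"`), and the model's own translation output is English. The sketch below shows one way the command above could map onto that API; whether scripts/main.py supports targets other than `en` is not stated, and the helper names are hypothetical.

```python
def build_transcribe_options(task="transcribe", language=None):
    """Build keyword options for whisper's model.transcribe()."""
    if task not in {"transcribe", "translate"}:
        raise ValueError(f"unsupported task: {task!r}")
    options = {"task": task}
    if language is not None:
        # Source language code such as "fr"; omit it to let Whisper detect it.
        options["language"] = language
    return options


def translate_to_english(media_path, model_name="medium", source_language=None):
    """Transcribe foreign-language speech directly into English text."""
    import whisper  # from the openai-whisper package; imported lazily

    model = whisper.load_model(model_name)
    options = build_transcribe_options("translate", source_language)
    return model.transcribe(str(media_path), **options)["text"]
```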

Extract Timestamps

python scripts/main.py timestamps podcast.mp3 --format json
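
Whisper's transcription result carries a `segments` list whose entries include `start`, `end` (both in seconds) and `text`. As an illustration of what a JSON timestamp export could contain, here is a small sketch; the output schema and the name `segments_to_json` are assumptions, not the script's documented format.

```python
import json


def segments_to_json(segments, indent=2):
    """Keep only start/end/text from Whisper-style segments and dump as JSON."""
    rows = [
        {
            "start": round(seg["start"], 3),
            "end": round(seg["end"], 3),
            "text": seg["text"].strip(),
        }
        for seg in segments
    ]
    # ensure_ascii=False keeps non-English transcripts readable in the file.
    return json.dumps(rows, ensure_ascii=False, indent=indent)
```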

Examples

Example 1: Podcast to Blog Post

# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac

Example 2: YouTube Subtitles

# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo

Example 3: Batch Process Interview Library

# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)

Model Selection Guide

Model	Speed	Accuracy	VRAM	Best For
`tiny`	Fastest	~70%	1GB	Quick drafts, short clips
`base`	Fast	~80%	1GB	Social media clips
`small`	Medium	~85%	2GB	Podcasts, interviews
`medium`	Slow	~90%	5GB	Professional transcripts
`large`	Slowest	~95%	10GB	Accuracy-critical work

Recommendation: Start with small for most marketing content. Use medium for client deliverables.
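
The VRAM figures in this guide (1GB for tiny and base, 2GB for small, 5GB for medium, 10GB for large) can be turned into a simple lookup. The sketch below picks the largest model that fits a given amount of GPU memory; the helper name `largest_model_that_fits` is hypothetical, and the numbers are approximate requirements rather than hard limits.

```python
# Approximate VRAM needs in GB, ordered from smallest to largest model.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(vram_gb):
    """Return the biggest model whose approximate VRAM need fits, else None."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # Dicts preserve insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None
```

For example, a 4GB GPU would resolve to `small`, which matches the recommended starting point for most marketing content.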

Output Formats

Format	Extension	Use Case
`txt`	.txt	Blog posts, analysis
`srt`	.srt	Video subtitles (YouTube)
`vtt`	.vtt	Web video subtitles
`json`	.json	Programmatic access
`tsv`	.tsv	Spreadsheet analysis
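
The two subtitle formats differ mainly in their timestamp separator (SRT uses a comma before the milliseconds, WebVTT a period) and in WebVTT's `WEBVTT` header line. The following sketch converts Whisper-style segments (dicts with `start`, `end`, `text`) into both; it is an illustrative formatter with hypothetical function names, not the code the skill ships.

```python
def format_timestamp(seconds, decimal_marker=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """Render segments as WebVTT: a header line, then unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)
```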

Performance Tips

GPU acceleration - 10x faster with CUDA GPU
Audio extraction - Script auto-extracts audio from video
Chunking - Long files auto-split for memory efficiency
Language detection - Automatic, or specify with --language
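
Two of the tips above can be sketched in a few lines: choosing a CUDA device when one is available, and splitting a long recording into time ranges. The chunk length, the overlap, and both function names are assumptions for illustration; the document does not say how the script actually chunks files.

```python
def pick_device():
    """Prefer a CUDA GPU when PyTorch reports one, otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def chunk_bounds(duration, chunk=600.0, overlap=5.0):
    """Split [0, duration] seconds into (start, end) ranges with a small overlap."""
    if chunk <= overlap:
        raise ValueError("chunk must be longer than overlap")
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        # Step back slightly so words cut at a boundary appear in both chunks.
        start = end - overlap
    return bounds
```

The device string can be passed to `whisper.load_model(name, device=...)`, and each time range could be cut out with ffmpeg before transcription.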

Skill Boundaries

What This Skill Does Well

Structuring audio production workflows
Providing technical guidance
Creating quality checklists
Suggesting creative approaches

What This Skill Cannot Do

Replace audio engineering expertise
Make subjective creative decisions
Access or edit audio files directly
Guarantee commercial success

Related Skills

video-processing - Extract audio from video
youtube-downloader - Download videos to transcribe
content-repurposer - Transform transcripts to content
podcast-production - Create podcasts

Skill Metadata

Mode: cyborg

category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week

Weekly Installs: 125

Repository: guia-matthieu/c…u-skills

First Seen: Feb 13, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: opencode (120), gemini-cli (120), cursor (119), codex (118), github-copilot (117), kimi-cli (116)
