video-understand by heygen-com/skills

npx skills add https://github.com/heygen-com/skills --skill video-understand

Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Fully offline, no API keys required.
ffmpeg + ffprobe (required): brew install ffmpeg
openai-whisper (optional, for transcription): pip install openai-whisper

# Scene detection + transcription (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4
# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe
# Fixed-interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval
# Limit the number of extracted frames
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10
# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small
# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe
# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q
# Output to a file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
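For scripted use, the flags above compose naturally. A minimal sketch of building the command line programmatically (the helper itself is hypothetical, not part of the skill; only the script path and flag names come from the examples above):

```python
import shlex

def build_command(video: str, mode: str = "scene", max_frames: int = 20) -> list[str]:
    """Assemble the CLI invocation shown in the examples above."""
    return [
        "python3", "skills/video-understand/scripts/understand_video.py",
        video,
        "-m", mode,
        "--max-frames", str(max_frames),
        "-q",  # quiet: JSON only on stdout, easy to parse programmatically
    ]

cmd = build_command("video.mp4", mode="interval", max_frames=10)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, capture_output=True, text=True)` gives the result JSON on stdout, since `-q` suppresses progress messages.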
| Flag | Description |
|---|---|
| video | Input video file (positional, required) |
| -m, --mode | Extraction mode: scene (default), keyframe, interval |
| --max-frames | Maximum frames to keep (default: 20) |
| --whisper-model | Whisper model size: tiny, base, small, medium, large (default: base) |
| --no-transcribe | Skip audio transcription, extract frames only |
| -o, --output | Write result JSON to a file instead of stdout |
| -q, --quiet | Suppress progress messages, output only JSON |
| Mode | How it works | Best for |
|---|---|---|
| scene | Detects scene changes via ffmpeg select='gt(scene,0.3)' | Most videos, varied content |
| keyframe | Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
| interval | Evenly spaced frames based on duration and max-frames | Fixed sampling, predictable output |

If scene mode detects no scene changes, it automatically falls back to interval mode.
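Interval mode's evenly spaced sampling can be illustrated with a short Python sketch (a hypothetical helper; the actual script's spacing rule may differ in detail):

```python
def interval_timestamps(duration: float, max_frames: int) -> list[float]:
    """Evenly spaced sample times across a clip, in the spirit of interval mode."""
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    # Sample at the midpoint of each segment so frames avoid the extreme start/end.
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]

print(interval_timestamps(10.0, 4))  # → [1.25, 3.75, 6.25, 8.75]
```

This is why interval mode's output is predictable: the frame count and spacing depend only on the clip duration and the --max-frames setting, not on the content.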
The script outputs JSON to stdout (or to a file with -o). See references/output-format.md for the full schema.
{
"video": "video.mp4",
"duration": 18.076,
"resolution": {"width": 1224, "height": 1080},
"mode": "scene",
"frames": [
{"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
],
"frame_count": 12,
"transcript": [
{"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
],
"text": "Full transcript...",
"note": "Use the Read tool to view frame images for visual understanding."
}
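A short Python sketch of consuming this schema, e.g. pairing each frame with the transcript segments active at its timestamp (the field names come from the example above; the sample data below is made up, and would normally be read from result.json):

```python
import json

# Hypothetical result matching the schema shown above.
result = json.loads("""
{
  "video": "video.mp4",
  "duration": 18.076,
  "frame_count": 2,
  "frames": [
    {"path": "/tmp/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"},
    {"path": "/tmp/frame_0002.jpg", "timestamp": 9.0, "timestamp_formatted": "00:09"}
  ],
  "transcript": [{"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}]
}
""")

# For each extracted frame, collect the transcript text spoken at that moment.
for frame in result["frames"]:
    t = frame["timestamp"]
    texts = [s["text"] for s in result["transcript"] if s["start"] <= t <= s["end"]]
    print(frame["timestamp_formatted"], frame["path"], texts)
```

Combining the frame paths (viewed with the Read tool) with the time-aligned transcript segments gives both the visual and spoken context for each moment of the video.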
Use the Read tool on the frame image paths to visually inspect the extracted frames.
references/output-format.md -- Full JSON output schema documentation