video-understand by heygen-com/skills

npx skills add https://github.com/heygen-com/skills --skill video-understand

Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Fully offline, no API keys required.
ffmpeg + ffprobe (required): brew install ffmpeg
openai-whisper (optional, for transcription): pip install openai-whisper

# Scene detection + transcription (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4
# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe
# Fixed-interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval
# Limit the number of extracted frames
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10
# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small
# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe
# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q
# Output to a file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
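For scripted use, the flags above compose naturally. A minimal sketch of building the command line programmatically (the helper itself is hypothetical, not part of the skill; only the script path and flag names come from the examples above):

```python
import shlex

def build_command(video: str, mode: str = "scene", max_frames: int = 20) -> list[str]:
    """Assemble the CLI invocation shown in the examples above."""
    return [
        "python3", "skills/video-understand/scripts/understand_video.py",
        video,
        "-m", mode,
        "--max-frames", str(max_frames),
        "-q",  # quiet: JSON only on stdout, easy to parse programmatically
    ]

cmd = build_command("video.mp4", mode="interval", max_frames=10)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, capture_output=True, text=True)` gives the result JSON on stdout, since `-q` suppresses progress messages.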
| Flag | Description |
|---|---|
| video | Input video file (positional, required) |
| -m, --mode | Extraction mode: scene (default), keyframe, interval |
| --max-frames | Maximum frames to keep (default: 20) |
| --whisper-model | Whisper model size: tiny, base, small, medium, large (default: base) |
| --no-transcribe | Skip audio transcription, extract frames only |
| -o, --output | Write result JSON to a file instead of stdout |
| -q, --quiet | Suppress progress messages, output only JSON |
| Mode | How it works | Best for |
|---|---|---|
| scene | Detects scene changes via ffmpeg select='gt(scene,0.3)' | Most videos, varied content |
| keyframe | Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
| interval | Evenly spaced frames based on duration and max-frames | Fixed sampling, predictable output |

If scene mode detects no scene changes, it automatically falls back to interval mode.
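Interval mode's evenly spaced sampling can be illustrated with a short Python sketch (a hypothetical helper; the actual script's spacing rule may differ in detail):

```python
def interval_timestamps(duration: float, max_frames: int) -> list[float]:
    """Evenly spaced sample times across a clip, in the spirit of interval mode."""
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    # Sample at the midpoint of each segment so frames avoid the extreme start/end.
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]

print(interval_timestamps(10.0, 4))  # → [1.25, 3.75, 6.25, 8.75]
```

This is why interval mode's output is predictable: the frame count and spacing depend only on the clip duration and the --max-frames setting, not on the content.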
The script outputs JSON to stdout (or to a file with -o). See references/output-format.md for the full schema.
{
"video": "video.mp4",
"duration": 18.076,
"resolution": {"width": 1224, "height": 1080},
"mode": "scene",
"frames": [
{"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
],
"frame_count": 12,
"transcript": [
{"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
],
"text": "Full transcript...",
"note": "Use the Read tool to view frame images for visual understanding."
}
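A short Python sketch of consuming this schema, e.g. pairing each frame with the transcript segments active at its timestamp (the field names come from the example above; the sample data below is made up, and would normally be read from result.json):

```python
import json

# Hypothetical result matching the schema shown above.
result = json.loads("""
{
  "video": "video.mp4",
  "duration": 18.076,
  "frame_count": 2,
  "frames": [
    {"path": "/tmp/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"},
    {"path": "/tmp/frame_0002.jpg", "timestamp": 9.0, "timestamp_formatted": "00:09"}
  ],
  "transcript": [{"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}]
}
""")

# For each extracted frame, collect the transcript text spoken at that moment.
for frame in result["frames"]:
    t = frame["timestamp"]
    texts = [s["text"] for s in result["transcript"] if s["start"] <= t <= s["end"]]
    print(frame["timestamp_formatted"], frame["path"], texts)
```

Combining the frame paths (viewed with the Read tool) with the time-aligned transcript segments gives both the visual and spoken context for each moment of the video.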
Use the Read tool on the frame image paths to visually inspect the extracted frames.
references/output-format.md -- Full JSON output schema documentation