VideoDB: a video AI processing platform for desktop recording, live stream monitoring, intelligent search, and editing

videodb by video-db/skills

269 weekly installs

62 GitHub Stars

Install command

npx skills add https://github.com/video-db/skills --skill videodb

Content creation, automation, computer vision

VideoDB Skill

Perception + memory + actions for video, live streams, and desktop sessions.

Use this skill when you need to:

1) Desktop Perception

Start/stop a desktop session capturing screen, mic, and system audio
Stream live context and store episodic session memory
Run real-time alerts/triggers on what's spoken and what's happening on screen
Produce session summaries, a searchable timeline, and playable evidence links

2) Video ingest + stream

Ingest a file or URL and return a playable web stream link
Transcode/normalize: codec, bitrate, fps, resolution, aspect ratio

3) Index + search (timestamps + evidence)

Build visual, spoken, and keyword indexes
Search and return exact moments with timestamps and playable evidence
Auto-create clips from search results

4) Timeline editing + generation

Subtitles: generate, translate, burn-in
Overlays: text/image/branding, motion captions
Audio: background music, voiceover, dubbing
Programmatic composition and exports via timeline operations

5) Live streams (RTSP) + monitoring

Connect RTSP/live feeds
Run real-time visual and spoken understanding and emit events/alerts for monitoring workflows

Common inputs

Local file path, public URL, or RTSP URL
Desktop capture request: start / stop / summarize session
Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

Common outputs

Stream URL — make it playable: https://console.videodb.io/player?url={STREAM_URL}
Search results with timestamps and evidence links
Generated assets: subtitles, audio, images, clips
Event/alert payloads for live streams
Desktop session summaries and memory entries
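
The player link listed under the stream URL output is a plain template substitution. A minimal sketch, assuming the stream URL is dropped into `{STREAM_URL}` verbatim as the template shows; `player_link` is a hypothetical helper name (not part of the SDK) and the example stream URL is made up:

```python
PLAYER_TEMPLATE = "https://console.videodb.io/player?url={STREAM_URL}"


def player_link(stream_url: str) -> str:
    """Return a console player link for a stream URL (hypothetical helper)."""
    if not stream_url:
        raise ValueError("stream_url must be a non-empty string")
    # Plain substitution, exactly as the template is written in this document.
    return PLAYER_TEMPLATE.format(STREAM_URL=stream_url)


print(player_link("https://example.com/stream.m3u8"))
```

Any `stream_url` returned by `results.compile()`, `video.add_subtitle()`, or `timeline.generate_stream()` can be passed through the same helper.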

Canonical prompts (examples)

“Start desktop capture and alert when a password field appears.”
“Record my session and produce an actionable summary when it ends.”
“Ingest this file and return a playable stream link.”
“Index this folder and find every scene with people, return timestamps.”
“Generate subtitles, burn them in, and add light background music.”
“Connect this RTSP URL and alert when a person enters the zone.”

Running Python code

CRITICAL: Always cd to the user's project directory before running Python code. This ensures load_dotenv(".env") finds the correct .env file.

from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()

This reads VIDEO_DB_API_KEY from:

Environment (if already exported)
Project's .env file in current directory

If the key is missing, videodb.connect() raises AuthenticationError automatically.

Do NOT write a script file when a short inline command works.

When writing inline Python (python -c "..."), always use properly formatted code — use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF

Setup

When the user asks to "setup videodb" or similar:

1. Install SDK

pip install "videodb[capture]" python-dotenv

If videodb[capture] fails on Linux, install without the capture extra:

pip install videodb python-dotenv

2. Configure API key

The user must set VIDEO_DB_API_KEY using either method:

Export in terminal (recommended): export VIDEO_DB_API_KEY=your-key
Project .env file: Save VIDEO_DB_API_KEY=your-key in the project's .env file

Get a free API key at https://console.videodb.io (50 free uploads, no credit card).

Do NOT read, write, or handle the API key yourself. Always let the user set it.

Quick Reference

Upload media

# URL
video = coll.upload(url="https://example.com/video.mp4")

# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Local file
video = coll.upload(file_path="/path/to/video.mp4")
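
When the source arrives as a single string (URL or local path), the choice between `url=` and `file_path=` can be made mechanically. A small sketch, assuming only `http`/`https` sources should go through `url=`; `upload_kwargs` is a hypothetical helper, not an SDK function:

```python
from urllib.parse import urlparse


def upload_kwargs(source: str) -> dict:
    """Map a source string to the keyword argument coll.upload() is shown taking."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return {"url": source}
    # Everything else (including Windows drive letters) is treated as a local path.
    return {"file_path": source}


print(upload_kwargs("https://www.youtube.com/watch?v=VIDEO_ID"))
print(upload_kwargs("/path/to/video.mp4"))
```

With a collection in hand, the call becomes `video = coll.upload(**upload_kwargs(source))`.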

Transcript + subtitle

# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()

Search inside videos

from videodb.exceptions import InvalidRequestError

video.index_spoken_words(force=True)

# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
    results = video.search("product demo")
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
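
The try/except pattern above repeats for every search, so it can be factored into one helper. This is a sketch, not SDK code: `shots_or_empty` is a hypothetical name, and the `Fake*` classes are stand-ins so the block runs without the SDK or network access.

```python
def shots_or_empty(run_search, error_type, marker="No results found"):
    """Run a search callable and return its shots, or [] when nothing matched."""
    try:
        return run_search().get_shots()
    except error_type as exc:
        if marker in str(exc):
            return []
        raise  # any other failure is a real error and should surface


# Stand-ins for the SDK objects (assumptions for demonstration only).
class FakeRequestError(Exception):
    pass


class FakeResults:
    def get_shots(self):
        return ["shot-1", "shot-2"]


def found():
    return FakeResults()


def not_found():
    raise FakeRequestError("No results found")


print(shots_or_empty(found, FakeRequestError))      # ['shot-1', 'shot-2']
print(shots_or_empty(not_found, FakeRequestError))  # []
```

With the real SDK the call would look like `shots_or_empty(lambda: video.search("product demo"), InvalidRequestError)`.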

Scene search

import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError

# index_scenes() has no force parameter — it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
    scene_index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        prompt="Describe the visual content in this scene.",
    )
except Exception as e:
    match = re.search(r"id\s+([a-f0-9]+)", str(e))
    if match:
        scene_index_id = match.group(1)
    else:
        raise

# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
    results = video.search(
        query="person writing on a whiteboard",
        search_type=SearchType.semantic,
        index_type=IndexType.scene,
        scene_index_id=scene_index_id,
        score_threshold=0.3,
    )
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
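
The ID-recovery step can be isolated and checked without calling the API. A minimal sketch using the same regular expression as the snippet above; it assumes the error text has the form `Scene index with id XXXX already exists` with a lowercase hex ID, and `existing_scene_index_id` is a hypothetical helper name:

```python
import re
from typing import Optional

# Same pattern as in the snippet above: "id", whitespace, then a hex string.
SCENE_ID_PATTERN = re.compile(r"id\s+([a-f0-9]+)")


def existing_scene_index_id(error_message: str) -> Optional[str]:
    """Return the scene index ID embedded in an error message, or None."""
    match = SCENE_ID_PATTERN.search(error_message)
    return match.group(1) if match else None


print(existing_scene_index_id("Scene index with id 3f9a1c already exists"))  # 3f9a1c
print(existing_scene_index_id("Request timed out"))                          # None
```

Returning `None` instead of raising lets the caller decide whether to re-raise the original exception.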

Timeline editing

Use the Editor API to compose videos, images, audio, and text. See reference/editor.md for the full workflow.

from videodb.editor import Timeline, Track, Clip, VideoAsset, ImageAsset, AudioAsset, Fit

timeline = Timeline(conn)
timeline.resolution = "1280x720"

video_track = Track()
video_track.add_clip(0, Clip(asset=VideoAsset(id=video.id, start=10), duration=20))

audio_track = Track()
audio_track.add_clip(0, Clip(asset=AudioAsset(id=music.id, volume=0.2), duration=20))

timeline.add_track(video_track)
timeline.add_track(audio_track)
stream_url = timeline.generate_stream()
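
The pitfalls listed for this skill include negative timestamps on a Timeline silently producing a broken stream, so it is worth validating `start` before building a `VideoAsset`. A sketch of such a guard; `clip_window` is a hypothetical helper, and clamping the duration to the video length is an added assumption rather than documented SDK behaviour:

```python
from typing import Optional, Tuple


def clip_window(start: float, duration: float,
                video_length: Optional[float] = None) -> Tuple[float, float]:
    """Validate a (start, duration) pair before it is used on a Timeline."""
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if duration <= 0:
        raise ValueError(f"duration must be > 0, got {duration}")
    if video_length is not None:
        if start >= video_length:
            raise ValueError("start lies beyond the end of the video")
        # Assumption: trim the clip rather than run past the end of the source.
        duration = min(duration, video_length - start)
    return start, duration


print(clip_window(10, 20))                   # (10, 20)
print(clip_window(10, 20, video_length=25))  # (10, 15)
```

The validated values then feed the clip, for example `Clip(asset=VideoAsset(id=video.id, start=start), duration=duration)`.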

Transcode video (resolution / quality change)

from videodb import TranscodeMode, VideoConfig, AudioConfig

# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
    source="https://example.com/video.mp4",
    callback_url="https://example.com/webhook",
    mode=TranscodeMode.economy,
    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
    audio_config=AudioConfig(mute=False),
)
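
To sanity-check what a `VideoConfig` asks for, the target frame size can be worked out by hand. This sketch assumes `resolution` means frame height in pixels (as in "720p") and rounds the width up to an even number, which is a common encoder constraint and not something this document states; `frame_size` is a hypothetical helper:

```python
from typing import Tuple


def frame_size(height: int, aspect_ratio: str) -> Tuple[int, int]:
    """Return (width, height) for a frame height and a 'W:H' aspect ratio."""
    ratio_w, ratio_h = (int(part) for part in aspect_ratio.split(":"))
    if height <= 0 or ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("height and both ratio terms must be positive")
    width = round(height * ratio_w / ratio_h)
    width += width % 2  # bump odd widths up to the next even number
    return width, height


print(frame_size(720, "16:9"))  # (1280, 720)
print(frame_size(720, "1:1"))   # (720, 720)
```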

Reframe aspect ratio (for social platforms)

Warning: reframe() is a slow server-side operation. For long videos it can take several minutes and may time out. Best practices:

Always limit to a short segment using start/end when possible
For full-length videos, use callback_url for async processing
Trim the video on a Timeline first, then reframe the shorter result

from videodb import ReframeMode

# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")

# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
reframed = video.reframe(start=0, end=60, target="square")

# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
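
The best practices above reduce to a choice between a short blocking segment and a full-length async call. A sketch of that decision as a keyword-argument builder; `reframe_kwargs` is a hypothetical helper and the 60-second cut-off is an assumption taken from the example segment length, not a documented limit:

```python
from typing import Optional, Tuple, Union

PRESETS = ("vertical", "square", "landscape")  # 9:16, 1:1, 16:9
MAX_SYNC_SECONDS = 60  # hypothetical cut-off for a blocking call


def reframe_kwargs(target: Union[str, dict],
                   segment: Optional[Tuple[float, float]] = None,
                   callback_url: Optional[str] = None) -> dict:
    """Build keyword arguments for video.reframe() (hypothetical helper)."""
    if isinstance(target, str):
        if target not in PRESETS:
            raise ValueError(f"unknown preset {target!r}; expected one of {PRESETS}")
    elif set(target) != {"width", "height"}:
        raise ValueError("custom target needs exactly 'width' and 'height'")

    if segment is not None:
        start, end = segment
        if not 0 <= start < end:
            raise ValueError("segment must satisfy 0 <= start < end")
        if end - start > MAX_SYNC_SECONDS:
            raise ValueError("segment too long for a blocking call; trim it first")
        return {"target": target, "start": start, "end": end}

    # No segment means the whole video, which should go through the webhook path.
    if not callback_url:
        raise ValueError("full-length reframes should pass a callback_url")
    return {"target": target, "callback_url": callback_url}


print(reframe_kwargs("vertical", segment=(0, 60)))
print(reframe_kwargs("square", callback_url="https://example.com/webhook"))
```

The result is passed straight through, as in `video.reframe(**reframe_kwargs("vertical", segment=(0, 60)))`.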

Generative media

image = coll.generate_image(
    prompt="a sunset over mountains",
    aspect_ratio="16:9",
)

Error handling

import videodb
from videodb.exceptions import AuthenticationError, InvalidRequestError

try:
    conn = videodb.connect()
except AuthenticationError:
    print("Check your VIDEO_DB_API_KEY")

try:
    video = coll.upload(url="https://example.com/video.mp4")
except InvalidRequestError as e:
    print(f"Upload failed: {e}")

Common pitfalls

Scenario	Error message	Solution
Indexing an already-indexed video	`Spoken word index for video already exists`	Use `video.index_spoken_words(force=True)` to skip if already indexed
Scene index already exists	`Scene index with id XXXX already exists`	Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))`
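Search finds no matches	`InvalidRequestError: No results found`	Catch the exception and treat it as an empty result (`shots = []`)
Reframe times out	Blocks indefinitely on long videos	Limit the segment with `start`/`end`, or pass `callback_url` for async processing
Negative timestamps on a Timeline	Silently produces a broken stream	Always validate `start >= 0` before creating a `VideoAsset`
`generate_video()` / `create_collection()` fails	`Operation not allowed` or `maximum limit`	Plan-gated features: tell the user about the plan limit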
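
The message fragments in this table can be turned into a small lookup so that a caught error maps to a next step. A sketch only: `suggest_fix` and the wording of the suggestions are hypothetical, and the fragments are matched as plain substrings, which assumes the server text contains them unchanged. The two plan-limit fragments (`Operation not allowed`, `maximum limit`) are the ones reported for `generate_video()` / `create_collection()`.

```python
from typing import Optional

# (fragment expected in the error text, suggested next step)
PITFALLS = [
    ("Spoken word index for video already exists", "call index_spoken_words(force=True)"),
    ("Scene index with id", "reuse the scene index ID from the message"),
    ("No results found", "treat as an empty result"),
    ("Operation not allowed", "plan limit: tell the user"),
    ("maximum limit", "plan limit: tell the user"),
]


def suggest_fix(error_message: str) -> Optional[str]:
    """Return the suggested next step for a known error message, else None."""
    for fragment, fix in PITFALLS:
        if fragment in error_message:
            return fix
    return None


print(suggest_fix("InvalidRequestError: No results found"))  # treat as an empty result
print(suggest_fix("Connection reset"))                        # None
```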

Additional docs

Reference documentation is in the reference/ directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.

reference/api-reference.md - Complete VideoDB Python SDK API reference
reference/search.md - In-depth guide to video search (spoken word and scene-based)
reference/editor.md - Timeline editing workflow guide (4-layer model, use cases, examples)
reference/editor-reference.md - Editor code reference (constructors, parameters, enums)
reference/streaming.md - HLS streaming and instant playback
reference/generative.md - AI-powered media generation (images, video, audio)
reference/rtstream.md - Live stream ingestion workflow (RTSP/RTMP)
reference/rtstream-reference.md - RTStream SDK methods and AI pipelines
reference/capture.md - Desktop capture workflow
reference/capture-reference.md - Capture SDK and WebSocket events
reference/use-cases.md - Common video processing patterns and examples

Screen Recording (Desktop Capture)

Use ws_listener.py to capture WebSocket events during recording sessions. Desktop capture supports macOS only.

Quick Start

Start listener: python scripts/ws_listener.py --cwd=<PROJECT_ROOT> &
Get WebSocket ID: cat /tmp/videodb_ws_id
Run capture code (see reference/capture.md for full workflow)
Events written to: /tmp/videodb_events.jsonl
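
Because the listener is started in the background, the ID file may not exist yet when the next step runs. A sketch of a polling wait, assuming the listener writes the WebSocket ID as plain text to `/tmp/videodb_ws_id`; `wait_for_ws_id`, the timeout, and the poll interval are hypothetical choices:

```python
import time
from pathlib import Path


def wait_for_ws_id(path: str = "/tmp/videodb_ws_id",
                   timeout: float = 10.0, poll: float = 0.2) -> str:
    """Block until the listener has written a non-empty WebSocket ID file."""
    target = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if target.exists():
            value = target.read_text().strip()
            if value:  # ignore a file that exists but is still empty
                return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no WebSocket ID in {path} after {timeout}s")
        time.sleep(poll)
```

Typical use is `ws_id = wait_for_ws_id()` right after launching the listener, before running the capture code.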

Query Events

import json
import time

# Read one JSON event per line; the context manager closes the file.
with open("/tmp/videodb_events.jsonl") as f:
    events = [json.loads(line) for line in f if line.strip()]

# Get all transcripts
transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]

# Get visual descriptions from last 5 minutes
cutoff = time.time() - 300
recent_visual = [e for e in events
                 if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
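
While a session is still recording, the last line of the JSONL file may be half-written. A more defensive sketch of the same queries, assuming each event is a JSON object with `channel` and `unix_ts` fields as used above; `parse_events` and `recent_events` are hypothetical helpers and the sample events are made up:

```python
import json
import time


def parse_events(lines):
    """Parse JSONL lines into dicts, skipping blank or malformed lines."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(event, dict):
            events.append(event)
    return events


def recent_events(events, channel, window_seconds, now=None):
    """Return events on a channel newer than now - window_seconds."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    return [e for e in events
            if e.get("channel") == channel and e.get("unix_ts", 0) > cutoff]


sample = [
    '{"channel": "transcript", "unix_ts": 1000, "data": {"text": "hello"}}',
    '{"channel": "visual_index", "unix_ts": 1290, "data": {"text": "editor open"}}',
    '{"channel": "visual_index", "unix_ts": 900, "data": {"text": "old frame"}}',
    '{"channel": "visual_index", "unix_',  # truncated line, skipped
]
events = parse_events(sample)
print(len(events))                                         # 3
print(recent_events(events, "visual_index", 300, now=1300))
```

Against the real file, pass the open file object directly: `with open("/tmp/videodb_events.jsonl") as f: events = parse_events(f)`.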

Utility Scripts

scripts/ws_listener.py - WebSocket event listener (dumps to JSONL)

For the complete capture workflow, see reference/capture.md.

Do not use ffmpeg, moviepy, or local encoding tools when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, volume control, fade transitions, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (speed changes, crop/zoom, colour grading, keyframe animation).

When to use what

Problem	VideoDB solution
Platform rejects video aspect ratio or resolution	`video.reframe()` or `conn.transcode()` with `VideoConfig`
Need to resize video for Twitter/Instagram/TikTok	`video.reframe(target="vertical")` or `target="square"`
Need to change resolution (e.g. 1080p → 720p)	`conn.transcode()` with `VideoConfig(resolution=720)`
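Need to overlay audio/music on video	`AudioAsset` on an Editor `Timeline` with volume control
Need to add subtitles	`video.add_subtitle()` or `CaptionAsset` on an Editor `Timeline`
Need to combine/trim clips	`VideoAsset` on an Editor `Timeline`
Need to combine images with a voiceover	`ImageAsset` + `AudioAsset` on separate Editor tracks
Need to generate voiceover, music, or SFX	`coll.generate_voice()`, `generate_music()`, `generate_sound_effect()`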

Weekly Installs

134

Repository

video-db/skills

GitHub Stars

First Seen

Feb 25, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn

Installed on

codex: 134
github-copilot: 133
kimi-cli: 133
gemini-cli: 133
cursor: 133
amp: 133
