Veo 3.1 Video Generation Tool: Text-to-Video, Image-to-Video, AI Video Production, and Art Style Application

generate-video by b-open-io/gemskills


Install command

npx skills add https://github.com/b-open-io/gemskills --skill generate-video





Generate Video

Generate videos using Veo 3.1 (veo-3.1-generate-preview) with native audio, 720p/1080p/4K resolution, and 4-8 second clips.

When to Use

Use this skill when the user asks to:

Generate a video from a text prompt (text-to-video)
Animate an existing image (image-to-video)
Create a styled video clip using the art style library
Build an image-to-video pipeline with auto-generated starting frames

Critical: Run as Background Task

Veo video generation takes 11 seconds to 6 minutes. The script handles polling internally and outputs only a file path when complete.

Always run the generate script as a background task to avoid blocking the conversation and bloating context with polling output:

# CORRECT: background task
bun run scripts/generate.ts "prompt" --output video.mp4
# Run with run_in_background: true in the Bash tool

After the background task completes, read only the final line of output (the file path). Do not read the full output; everything before that line is progress dots written to stderr.
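As a rough sketch of that rule, the helper below pulls the file path out of captured task output by taking the last non-empty line. The name `finalOutputLine` is hypothetical and not part of the gemskills scripts; it assumes the captured output is already available as a string.

```typescript
// Hypothetical helper (not part of gemskills): return the last non-empty
// line of captured output, which is assumed to be the generated file path.
function finalOutputLine(output: string): string | undefined {
  const lines = output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 ? lines[lines.length - 1] : undefined;
}

// Progress dots followed by the path on the final line.
console.log(finalOutputLine("....\n......\n/tmp/video.mp4\n")); // prints /tmp/video.mp4
```

Only the returned line needs to enter the conversation; the progress dots can be discarded.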

Style Selection

If the user hasn't specified a style, present a multiple-choice question before proceeding:

How would you like to handle the art style?

Pick a style - Browse 169 styles visually and choose one

Let me choose - I'll suggest a style based on your prompt

No style - Generate without a specific art style

Use the AskUserQuestion tool to present this choice.

If the user picks "Pick a style", launch the interactive style picker:

STYLE_JSON=$(bun run ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/preview_server.ts --pick --port=3456)

Pass the selected style via --style <id> to the generate command.

If the user already specified a style, skip this step and use --style directly.

Prompt Rewriting

Before generating any video, rewrite the user's prompt using the guide in references/veo-prompt-guide.md.

Transform prompts by adding:

Subject details - Specific appearance, position, scale
Action/Motion - What moves, speed, direction
Style/Aesthetic - Visual treatment, color palette
Camera motion - Pan, dolly, orbit, static
Composition - Framing, depth, focus
Ambiance/Audio - Lighting mood, sound cues (Veo generates native audio)

Example Transformation

User says: "ocean waves"
Rewritten: "Dramatic slow-motion ocean waves crashing against dark volcanic rocks at golden hour. Spray catches the warm sunlight, creating rainbow mist. Low-angle shot, camera slowly dollying forward. Deep rumbling wave sounds with seagull calls in the distance."
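One way to keep the six elements from being skipped is to collect them in a small structure and join them into the final prompt. This is only a sketch: `PromptElements` and `buildPrompt` are hypothetical names, and the real guidance lives in references/veo-prompt-guide.md.

```typescript
// Hypothetical structure mirroring the six elements listed above.
interface PromptElements {
  subject: string;
  action: string;
  style?: string;
  camera?: string;
  composition?: string;
  ambiance?: string;
}

// Join the supplied elements in order, skipping any that are missing or blank
// and ending each one with sentence punctuation.
function buildPrompt(p: PromptElements): string {
  const parts = [p.subject, p.action, p.style, p.camera, p.composition, p.ambiance];
  return parts
    .filter((part): part is string => typeof part === "string" && part.trim().length > 0)
    .map((part) => {
      const text = part.trim();
      return /[.!?]$/.test(text) ? text : `${text}.`;
    })
    .join(" ");
}

console.log(
  buildPrompt({
    subject: "Ocean waves crashing against dark volcanic rocks at golden hour",
    action: "Dramatic slow motion, spray catching the warm sunlight",
    camera: "Low-angle shot, camera slowly dollying forward",
    ambiance: "Deep rumbling wave sounds with seagull calls in the distance",
  }),
);
```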

Usage

bun run ${CLAUDE_PLUGIN_ROOT}/skills/generate-video/scripts/generate.ts "prompt" [options]

Options

--input <path> - Starting frame image (image-to-video mode)
--ref <path> - Reference image for subject consistency (up to 3, can specify multiple times). Auto-selects replicate-veo model.
--last-frame <path> - Ending frame for interpolation. Auto-selects replicate-veo model.
--style <id> - Apply style from the style library (same as generate-image)
--aspect <ratio> - 16:9 (default) or 9:16
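--resolution <res> - 720p (Gemini default), 1080p (Replicate default), 4k (Gemini API only)
--duration <sec> - 4, 6, 8 (default: 8)
--negative <text> - Negative prompt (what to avoid)
--seed <n> - Random seed for reproducibility
--output <path> - Output .mp4 path
--model <name> - veo (default, Gemini API), replicate-veo (Replicate Veo 3.1), or grok (third-tier fallback)
--no-audio - Disable audio generation (Replicate Veo only)
--auto-image - With --style, auto-generate a styled starting frame first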
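When the command is assembled programmatically, a typed options object can help catch misspelled flags or out-of-range values before the script runs. The sketch below assumes only the flags documented for generate.ts; `GenerateOptions` and `buildArgs` are hypothetical names, not part of the script itself.

```typescript
// Hypothetical typed view of the documented generate.ts flags.
interface GenerateOptions {
  input?: string;
  refs?: string[];
  lastFrame?: string;
  style?: string;
  aspect?: "16:9" | "9:16";
  resolution?: "720p" | "1080p" | "4k";
  duration?: 4 | 6 | 8;
  negative?: string;
  seed?: number;
  output?: string;
  model?: "veo" | "replicate-veo" | "grok";
  noAudio?: boolean;
  autoImage?: boolean;
}

// Build the argument list that follows the script path. Returning an array
// (rather than one string) avoids shell-quoting problems in the prompt.
function buildArgs(prompt: string, o: GenerateOptions = {}): string[] {
  const args: string[] = [prompt];
  const push = (flag: string, value: string | number | undefined): void => {
    if (value !== undefined) args.push(flag, String(value));
  };
  push("--input", o.input);
  for (const ref of o.refs ?? []) push("--ref", ref); // --ref may repeat
  push("--last-frame", o.lastFrame);
  push("--style", o.style);
  push("--aspect", o.aspect);
  push("--resolution", o.resolution);
  push("--duration", o.duration);
  push("--negative", o.negative);
  push("--seed", o.seed);
  push("--output", o.output);
  push("--model", o.model);
  if (o.noAudio) args.push("--no-audio");
  if (o.autoImage) args.push("--auto-image");
  return args;
}

console.log(buildArgs("Waterfall in lush forest", { aspect: "9:16", resolution: "1080p", output: "waterfall.mp4" }));
```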

Examples

# Text-to-video (Gemini API, default)
bun run scripts/generate.ts "Ocean waves crashing on volcanic rocks at sunset" --output waves.mp4

# Image-to-video (animate an existing image)
bun run scripts/generate.ts "The lion slowly turns its head, dots shimmer" --input lion.png --output lion.mp4

# Subject-consistent video with reference images (auto-selects Replicate Veo)
bun run scripts/generate.ts "Two warriors face off in a wheat field, dramatic standoff" \
  --ref warrior1.png --ref warrior2.png --ref scene.png --output standoff.mp4

# Image-to-video with last frame interpolation (auto-selects Replicate Veo)
bun run scripts/generate.ts "Camera slowly pans across the landscape" \
  --input start.png --last-frame end.png --output pan.mp4

# With art style
bun run scripts/generate.ts "Mountain landscape comes alive with wind" --style impr --output mountain.mp4

# Full pipeline: auto-generate styled image, then animate
bun run scripts/generate.ts "A lion turns majestically" --style kusm --auto-image --output lion.mp4

# Vertical video for social
bun run scripts/generate.ts "Waterfall in lush forest" --aspect 9:16 --resolution 1080p --output waterfall.mp4

# High resolution (Gemini API)
bun run scripts/generate.ts "City skyline timelapse" --resolution 4k --duration 8 --output city.mp4

# Grok fallback for content blocked by Veo safety filters
bun run scripts/generate.ts "Famous person dancing" --model grok --output dance.mp4

Reference Images (Subject Consistency)

Use --ref to pass 1-3 reference images for subject-consistent video generation (R2V). This automatically uses Replicate Veo 3.1.

Constraints:

Cannot combine --ref with --input (Replicate API limitation)
Reference images require 16:9 aspect ratio and 8s duration
Last frame is ignored when reference images are provided

Best for: Maintaining character likeness across camera angles, matching specific people/objects in generated video.
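The constraints above can be checked before a long-running job is started. The following is a minimal sketch with hypothetical names (`RefRequest`, `checkRefConstraints`); it encodes only the rules stated in this section and is not how the script itself validates input.

```typescript
// Hypothetical description of a request that may use --ref.
interface RefRequest {
  refs: string[];
  input?: string;
  lastFrame?: string;
  aspect: "16:9" | "9:16";
  duration: 4 | 6 | 8;
}

// Return a list of human-readable problems; an empty list means the
// request satisfies the documented reference-image constraints.
function checkRefConstraints(r: RefRequest): string[] {
  const problems: string[] = [];
  if (r.refs.length === 0) return problems; // rules apply only when --ref is used
  if (r.refs.length > 3) problems.push("at most 3 reference images are allowed");
  if (r.input !== undefined) problems.push("--ref cannot be combined with --input");
  if (r.aspect !== "16:9") problems.push("reference images require 16:9 aspect ratio");
  if (r.duration !== 8) problems.push("reference images require 8s duration");
  if (r.lastFrame !== undefined) problems.push("--last-frame is ignored when --ref is used");
  return problems;
}

console.log(checkRefConstraints({ refs: ["hero.png"], input: "start.png", aspect: "9:16", duration: 8 }));
```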

Aspect Ratio Matching (Critical for Image-to-Video)

When using --input, the input image must match the target video aspect ratio. A square image fed to 16:9 video produces black pillarboxing with cutoff edges.

Generate starting frames at --aspect 16:9 (default video) or --aspect 9:16 (vertical video)
The --auto-image flag handles this automatically
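A quick dimension check can flag a mismatched starting frame before it is passed to --input. This sketch assumes the image width and height are already known (reading them from a file is left out), and the 1% tolerance is an arbitrary illustrative choice; `matchesAspect` is a hypothetical helper.

```typescript
type VideoAspect = "16:9" | "9:16";

// Width-to-height ratio for each supported video aspect.
const TARGET_RATIO: Record<VideoAspect, number> = {
  "16:9": 16 / 9,
  "9:16": 9 / 16,
};

// True when the image ratio is within `tolerance` (relative) of the target.
function matchesAspect(width: number, height: number, aspect: VideoAspect, tolerance = 0.01): boolean {
  if (width <= 0 || height <= 0) return false;
  const target = TARGET_RATIO[aspect];
  return Math.abs(width / height - target) / target <= tolerance;
}

console.log(matchesAspect(1920, 1080, "16:9")); // prints true
console.log(matchesAspect(1024, 1024, "16:9")); // prints false
```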

Two-Step Pipeline (Image then Video)

For maximum control, generate the starting frame separately. Always match the aspect ratio:

# Step 1: Generate styled image at 16:9 (matches default video aspect)
bun run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "majestic lion portrait" --style kusm --aspect 16:9 --size 2K --output lion.png

# Step 2: Animate the image
bun run scripts/generate.ts "The lion turns its head slowly, dots shimmer in the light" --input lion.png --output lion.mp4
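If the two steps are scripted together, deriving both argument lists from a single aspect value is one way to keep them from drifting apart. The sketch below uses only flags shown in this document; `planTwoStep` and the default file names are hypothetical.

```typescript
interface TwoStepPlan {
  imageArgs: string[]; // arguments for the generate-image script
  videoArgs: string[]; // arguments for the generate-video script
}

// Build both argument lists from one aspect value so the starting frame
// and the video always agree.
function planTwoStep(
  imagePrompt: string,
  motionPrompt: string,
  style: string,
  aspect: "16:9" | "9:16",
  frame = "frame.png",
  video = "video.mp4",
): TwoStepPlan {
  return {
    imageArgs: [imagePrompt, "--style", style, "--aspect", aspect, "--size", "2K", "--output", frame],
    videoArgs: [motionPrompt, "--input", frame, "--aspect", aspect, "--output", video],
  };
}

console.log(planTwoStep("majestic lion portrait", "The lion turns its head slowly", "kusm", "16:9", "lion.png", "lion.mp4"));
```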

Context Discipline

Never read generated .mp4 files back into context - instruct the user to play them with open <file>.mp4
Run as background task - the script outputs only the file path to stdout
Auto-generated starting frames (via --auto-image) are saved as PNG files for reference

Models (Priority Order)

1. Veo 3.1 via Gemini API (`--model veo`, default)

Uses veo-3.1-generate-preview. Primary model. Supports text-to-video, image-to-video, 720p/1080p/4K, negative prompts. Override model via GEMINI_VIDEO_MODEL env var.
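The override can be pictured as a simple lookup with a default. How generate.ts actually reads the variable is not shown here, so treat this as an assumption-laden sketch; in particular, treating an empty value as unset is a choice made for illustration, and `resolveGeminiModel` is a hypothetical name.

```typescript
const DEFAULT_GEMINI_VIDEO_MODEL = "veo-3.1-generate-preview";

// Pick the model ID from an environment-like record, falling back to the
// documented default when GEMINI_VIDEO_MODEL is missing or blank.
function resolveGeminiModel(env: Record<string, string | undefined>): string {
  const override = env["GEMINI_VIDEO_MODEL"]?.trim();
  return override ? override : DEFAULT_GEMINI_VIDEO_MODEL;
}

console.log(resolveGeminiModel({})); // prints veo-3.1-generate-preview
```

In Bun or Node, `process.env` can be passed as the argument.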

2. Veo 3.1 via Replicate (`--model replicate-veo`)

Uses google/veo-3.1 on Replicate. Fallback when Gemini API is unavailable or when you need features only available on Replicate:

Reference images (--ref): 1-3 images for subject-consistent generation (R2V)
Last frame (--last-frame): Ending frame for interpolation between two images
Image input (--input): Starting frame (same as Gemini API)
Resolution: 720p or 1080p (no 4K)
Requires REPLICATE_API_TOKEN

Auto-selected when --ref or --last-frame is used.

3. Grok Imagine Video (`--model grok`)

Uses xai/grok-imagine-video via Replicate. This is a last-resort fallback — Veo 3.1 produces better results including likeness. Text-to-video only (no image input). Only use when:

Content is blocked by Veo's safety filters
The user specifically requests it
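The priority order can be summarized as a small selection function. This is a sketch under stated assumptions: it assumes auto-selection for --ref and --last-frame takes precedence over an explicit --model, which this document does not confirm, and the names `selectModel`, `defaultResolution`, and `supports4k` are hypothetical.

```typescript
type Model = "veo" | "replicate-veo" | "grok";

interface ModelRequest {
  refs?: string[];
  lastFrame?: string;
  model?: Model; // explicit --model, if any
}

// --ref or --last-frame selects Replicate Veo; otherwise use the explicit
// model, defaulting to Veo via the Gemini API.
// Assumption: auto-selection wins over an explicit --model.
function selectModel(r: ModelRequest): Model {
  if ((r.refs?.length ?? 0) > 0 || r.lastFrame !== undefined) return "replicate-veo";
  return r.model ?? "veo";
}

// Documented defaults: 720p on the Gemini API, 1080p on Replicate.
function defaultResolution(model: "veo" | "replicate-veo"): "720p" | "1080p" {
  return model === "veo" ? "720p" : "1080p";
}

// 4K is documented as Gemini API only.
function supports4k(model: Model): boolean {
  return model === "veo";
}

console.log(selectModel({ refs: ["hero.png"] })); // prints replicate-veo
console.log(selectModel({})); // prints veo
```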

Last verified: March 2026. If a newer generation exists, STOP and suggest a PR to b-open-io/gemskills.

Reference Files

For detailed video prompting strategies:

references/veo-prompt-guide.md - Veo prompt elements, audio cues, negative prompts, and image-to-video tips
ask-gemini skill's references/gemini-api.md - Current Gemini/Veo models and SDK info


