generate-video by b-open-io/gemskills
npx skills add https://github.com/b-open-io/gemskills --skill generate-video
Generate videos using Veo 3.1 (veo-3.1-generate-preview) with native audio, 720p/1080p/4K resolution, and 4-8 second clips.
Use this skill when the user asks to:
Veo video generation takes 11 seconds to 6 minutes. The script handles polling internally and outputs only a file path when complete.
Always run the generate script as a background task to avoid blocking the conversation and bloating context with polling output:
# CORRECT: background task
bun run scripts/generate.ts "prompt" --output video.mp4
# Run with run_in_background: true in the Bash tool
After the background task completes, read only the final line of output (the file path). Do not read the full output — it contains only stderr progress dots.
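The "read only the final line" rule can be sketched as a small helper. This is a minimal illustration, not code from generate.ts: the function name `finalOutputLine` is hypothetical, and it assumes the captured output is available as a single string whose last non-empty line is the file path.

```typescript
// Hypothetical helper (not part of generate.ts): return the last non-empty
// line of captured output, which by the rule above should be the file path.
function finalOutputLine(output: string): string | undefined {
  const lines = output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 ? lines[lines.length - 1] : undefined;
}

// Progress dots followed by the path, as described above.
const captured = "....\n.....\n/tmp/video.mp4\n";
console.log(finalOutputLine(captured)); // prints "/tmp/video.mp4"
```

Returning `undefined` for empty output lets the caller distinguish "no path yet" from a real result.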
If the user hasn't specified a style, present a multiple-choice question before proceeding:
How would you like to handle the art style?
- Pick a style - Browse 169 styles visually and choose one
- Let me choose - I'll suggest a style based on your prompt
- No style - Generate without a specific art style
Use the AskUserQuestion tool to present this choice.
If the user picks "Pick a style", launch the interactive style picker:
STYLE_JSON=$(bun run ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/preview_server.ts --pick --port=3456)
Pass the selected style via --style <id> to the generate command.
If the user already specified a style, skip this step and use --style directly.
Before generating any video, rewrite the user's prompt using the guide in references/veo-prompt-guide.md.
Transform prompts by adding:
User says: "ocean waves"
Rewritten: "Dramatic slow-motion ocean waves crashing against dark volcanic rocks at golden hour. Spray catches the warm sunlight, creating rainbow mist. Low-angle shot, camera slowly dollying forward. Deep rumbling wave sounds with seagull calls in the distance."
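One way to think about the rewrite is as assembling a prompt from separate elements. The sketch below is only an illustration: the `PromptParts` fields (subject, lighting, camera, audio) are assumptions inferred from the example above, not a list taken from references/veo-prompt-guide.md, and `composePrompt` is a hypothetical name.

```typescript
// Hypothetical structure: these fields are inferred from the example rewrite,
// not from the prompt guide itself.
interface PromptParts {
  subject: string;   // what happens in the shot
  lighting?: string; // light and atmosphere details
  camera?: string;   // framing and camera movement
  audio?: string;    // sound cues for native audio
}

// Join the provided parts into one prompt, one sentence per part.
function composePrompt(parts: PromptParts): string {
  return [parts.subject, parts.lighting, parts.camera, parts.audio]
    .filter((p): p is string => typeof p === "string" && p.trim().length > 0)
    .map((p) => {
      const text = p.trim();
      return /[.!?]$/.test(text) ? text : text + ".";
    })
    .join(" ");
}

console.log(
  composePrompt({
    subject: "Dramatic slow-motion ocean waves crashing against dark volcanic rocks at golden hour",
    lighting: "Spray catches the warm sunlight, creating rainbow mist",
    camera: "Low-angle shot, camera slowly dollying forward",
    audio: "Deep rumbling wave sounds with seagull calls in the distance",
  })
);
```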
bun run ${CLAUDE_PLUGIN_ROOT}/skills/generate-video/scripts/generate.ts "prompt" [options]
--input <path> - Starting frame image (image-to-video mode)
--ref <path> - Reference image for subject consistency (up to 3, can be specified multiple times). Auto-selects the replicate-veo model.
--last-frame <path> - Ending frame for interpolation. Auto-selects the replicate-veo model.
--style <id> - Apply a style from the style library (same as generate-image)
--aspect <ratio> - 16:9 (default) or 9:16
--resolution <res> - 720p (Gemini default), 1080p (Replicate default), 4k (Gemini API only)
# Text-to-video (Gemini API, default)
bun run scripts/generate.ts "Ocean waves crashing on volcanic rocks at sunset" --output waves.mp4
# Image-to-video (animate an existing image)
bun run scripts/generate.ts "The lion slowly turns its head, dots shimmer" --input lion.png --output lion.mp4
# Subject-consistent video with reference images (auto-selects Replicate Veo)
bun run scripts/generate.ts "Two warriors face off in a wheat field, dramatic standoff" \
--ref warrior1.png --ref warrior2.png --ref scene.png --output standoff.mp4
# Image-to-video with last frame interpolation (auto-selects Replicate Veo)
bun run scripts/generate.ts "Camera slowly pans across the landscape" \
--input start.png --last-frame end.png --output pan.mp4
# With art style
bun run scripts/generate.ts "Mountain landscape comes alive with wind" --style impr --output mountain.mp4
# Full pipeline: auto-generate styled image, then animate
bun run scripts/generate.ts "A lion turns majestically" --style kusm --auto-image --output lion.mp4
# Vertical video for social
bun run scripts/generate.ts "Waterfall in lush forest" --aspect 9:16 --resolution 1080p --output waterfall.mp4
# High resolution (Gemini API)
bun run scripts/generate.ts "City skyline timelapse" --resolution 4k --duration 8 --output city.mp4
# Grok fallback for content blocked by Veo safety filters
bun run scripts/generate.ts "Famous person dancing" --model grok --output dance.mp4
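The examples above all follow the same shape: a prompt, a set of flags, and an output path. As a rough sketch, the argument list could be assembled from a typed options object. The flag names are the ones documented for generate.ts; the `GenerateOptions` interface and `buildArgs` function are hypothetical and not part of the skill.

```typescript
// Hypothetical options object mirroring the documented generate.ts flags.
interface GenerateOptions {
  prompt: string;
  output: string;
  input?: string;
  refs?: string[];
  lastFrame?: string;
  style?: string;
  aspect?: "16:9" | "9:16";
  resolution?: "720p" | "1080p" | "4k";
  duration?: 4 | 6 | 8;
  negative?: string;
  seed?: number;
  model?: "veo" | "replicate-veo" | "grok";
  noAudio?: boolean;
  autoImage?: boolean;
}

// Build the argument list for `bun`, skipping options that were not set.
function buildArgs(o: GenerateOptions): string[] {
  const args: string[] = ["run", "scripts/generate.ts", o.prompt];
  const flag = (name: string, value: string | number | undefined): void => {
    if (value !== undefined) args.push(name, String(value));
  };
  flag("--input", o.input);
  for (const ref of o.refs ?? []) flag("--ref", ref); // --ref may repeat
  flag("--last-frame", o.lastFrame);
  flag("--style", o.style);
  flag("--aspect", o.aspect);
  flag("--resolution", o.resolution);
  flag("--duration", o.duration);
  flag("--negative", o.negative);
  flag("--seed", o.seed);
  flag("--model", o.model);
  if (o.noAudio) args.push("--no-audio");
  if (o.autoImage) args.push("--auto-image");
  flag("--output", o.output);
  return args;
}

console.log(
  buildArgs({
    prompt: "Waterfall in lush forest",
    aspect: "9:16",
    resolution: "1080p",
    output: "waterfall.mp4",
  }).join(" ")
);
```

Keeping the arguments as an array rather than one string means a prompt containing spaces or quotes can be handed to a process spawner without extra shell escaping.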
Use --ref to pass 1-3 reference images for subject-consistent video generation (R2V). This automatically uses Replicate Veo 3.1.
Constraints:
- --ref cannot be combined with --input (Replicate API limitation)
Best for: maintaining character likeness across camera angles and matching specific people or objects in generated video.
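The reference-image rules lend themselves to a quick pre-flight check before a generation is started. The sketch below assumes the constraints are: at most 3 reference images, and --ref not used together with --input. The function name `checkReferenceImages` is hypothetical, and the real script may enforce these rules differently.

```typescript
// Hypothetical pre-flight check; returns a list of human-readable problems.
// Assumes: at most 3 --ref images, and --ref is not combined with --input.
function checkReferenceImages(refs: string[], input?: string): string[] {
  const problems: string[] = [];
  if (refs.length > 3) {
    problems.push(`--ref accepts at most 3 images, got ${refs.length}`);
  }
  if (refs.length > 0 && input !== undefined) {
    problems.push("--ref cannot be combined with --input");
  }
  return problems;
}

console.log(checkReferenceImages(["warrior1.png", "warrior2.png", "scene.png"])); // prints []
console.log(checkReferenceImages(["warrior1.png"], "start.png").length); // prints 1
```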
When using --input, the input image must match the target video aspect ratio. A square image fed to 16:9 video produces black pillarboxing with cutoff edges.
- Generate starting frames with --aspect 16:9 (default video) or --aspect 9:16 (vertical video)
- The --auto-image flag handles this automatically
For maximum control, generate the starting frame separately. Always match the aspect ratio:
# Step 1: Generate styled image at 16:9 (matches default video aspect)
bun run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "majestic lion portrait" --style kusm --aspect 16:9 --size 2K --output lion.png
# Step 2: Animate the image
bun run scripts/generate.ts "The lion turns its head slowly, dots shimmer in the light" --input lion.png --output lion.mp4
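Whether a starting frame matches the target video can be checked from its pixel dimensions before running step 2. This is a minimal sketch under stated assumptions: `matchesAspect` is a hypothetical helper, the 1% tolerance is an arbitrary choice, and reading the actual width and height from the image file is left to the caller.

```typescript
type Aspect = "16:9" | "9:16";

// Hypothetical check: does a width x height image fit the target aspect ratio
// within a relative tolerance (1% by default, an arbitrary choice)?
function matchesAspect(
  width: number,
  height: number,
  aspect: Aspect,
  tolerance: number = 0.01
): boolean {
  if (width <= 0 || height <= 0) return false;
  const target = aspect === "16:9" ? 16 / 9 : 9 / 16;
  return Math.abs(width / height - target) / target <= tolerance;
}

console.log(matchesAspect(1920, 1080, "16:9")); // prints true
console.log(matchesAspect(1024, 1024, "16:9")); // prints false (square image)
console.log(matchesAspect(1080, 1920, "9:16")); // prints true
```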
- Play output videos with open <file>.mp4
- Starting frames generated with --auto-image are saved as PNG files for reference
--model veo (default): Uses veo-3.1-generate-preview. Primary model. Supports text-to-video, image-to-video, 720p/1080p/4K, and negative prompts. Override the model via the GEMINI_VIDEO_MODEL environment variable.
--model replicate-veo: Uses google/veo-3.1 on Replicate. Fallback when the Gemini API is unavailable or when you need features only available on Replicate:
- --ref: 1-3 images for subject-consistent generation (R2V)
- --last-frame: Ending frame for interpolation between two images
- --input: Starting frame (same as Gemini API)
Requires REPLICATE_API_TOKEN to be set. Auto-selected when --ref or --last-frame is used.
--model grok: Uses xai/grok-imagine-video via Replicate. This is a last-resort fallback; Veo 3.1 produces better results, including likeness. Text-to-video only (no image input). Only use when:
Last verified: March 2026. If a newer generation exists, STOP and suggest a PR to b-open-io/gemskills.
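The model rules described above can be summarized as a small planning function. This is a sketch under several assumptions, not the logic of generate.ts: it assumes --ref or --last-frame takes priority over an explicit --model, that the default resolutions are 720p for the Gemini API and 1080p for Replicate, and that no default is known for grok. The names `VideoRequest`, `Plan`, and `planRequest` are hypothetical.

```typescript
type Model = "veo" | "replicate-veo" | "grok";
type Resolution = "720p" | "1080p" | "4k";

// Hypothetical request shape; field names are illustrative only.
interface VideoRequest {
  model?: Model;
  resolution?: Resolution;
  input?: string;
  refs?: string[];
  lastFrame?: string;
}

interface Plan {
  model: Model;
  resolution: Resolution | undefined;
  problems: string[];
}

function planRequest(req: VideoRequest): Plan {
  // Assumption: --ref or --last-frame forces Replicate Veo even if --model is set.
  const needsReplicate =
    (req.refs?.length ?? 0) > 0 || req.lastFrame !== undefined;
  const model: Model = needsReplicate ? "replicate-veo" : (req.model ?? "veo");

  // Documented defaults; no default resolution is documented for grok.
  const defaults: Record<Model, Resolution | undefined> = {
    veo: "720p",
    "replicate-veo": "1080p",
    grok: undefined,
  };
  const resolution = req.resolution ?? defaults[model];

  const problems: string[] = [];
  if (resolution === "4k" && model !== "veo") {
    problems.push("4k is only available through the Gemini API (--model veo)");
  }
  if (model === "grok" && req.input !== undefined) {
    problems.push("grok is text-to-video only; --input is not supported");
  }
  return { model, resolution, problems };
}

console.log(planRequest({}).model); // prints "veo"
console.log(planRequest({ refs: ["warrior1.png"] }).model); // prints "replicate-veo"
```

Collecting conflicts in `problems` instead of throwing lets a caller report every issue, such as 4k requested together with --ref, before any generation starts.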
For detailed video prompting strategies:
- references/veo-prompt-guide.md - Veo prompt elements, audio cues, negative prompts, and image-to-video tips
- references/gemini-api.md - Current Gemini/Veo models and SDK info
Remaining options for generate.ts (--resolution accepts 720p, 1080p, or 4k):
--duration <sec> - 4, 6, or 8 (default: 8)
--negative <text> - Negative prompt (what to avoid)
--seed <n> - Random seed for reproducibility
--output <path> - Output .mp4 path
--model <name> - veo (default, Gemini API), replicate-veo (Replicate Veo 3.1), or grok (third-tier fallback)
--no-audio - Disable audio generation (Replicate Veo only)
--auto-image - With --style, auto-generate a styled starting frame first