comfyui-video-pipeline by mckruz/comfyui-expert
npx skills add https://github.com/mckruz/comfyui-expert --skill comfyui-video-pipeline
Orchestrates video generation across three engines, selecting the best one based on requirements and available resources.
VIDEO REQUEST
|
|-- Need film-level quality?
| |-- Yes + 24GB+ VRAM → Wan 2.2 MoE 14B
| |-- Yes + 8GB VRAM → Wan 2.2 1.3B
|
|-- Need long video (>10 seconds)?
| |-- Yes → FramePack (60 seconds on 6GB)
|
|-- Need fast iteration?
| |-- Yes → AnimateDiff Lightning (4-8 steps)
|
|-- Need camera/motion control?
| |-- Yes → AnimateDiff V3 + Motion LoRAs
|
|-- Need first+last frame control?
| |-- Yes → Wan 2.2 MoE (exclusive feature)
|
|-- Default → Wan 2.2 (best general quality)
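The decision tree above can be expressed as a small helper. This is a hypothetical sketch (the function name and flags are ours, not part of the skill), following the branches in the order they appear:

```python
def pick_engine(vram_gb, duration_s=5.0, need_quality=False, fast_iter=False,
                camera_control=False, first_last_frame=False):
    # Mirrors the VIDEO REQUEST decision tree, checked top to bottom.
    if need_quality:
        return "Wan 2.2 MoE 14B" if vram_gb >= 24 else "Wan 2.2 1.3B"
    if duration_s > 10:
        return "FramePack"
    if fast_iter:
        return "AnimateDiff Lightning"
    if camera_control:
        return "AnimateDiff V3 + Motion LoRAs"
    if first_last_frame:
        return "Wan 2.2 MoE"
    return "Wan 2.2"
```

For example, `pick_engine(6, duration_s=60)` selects FramePack, while `pick_engine(12)` falls through to the Wan 2.2 default.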
Prerequisites:
- wan2.1_i2v_720p_14b_bf16.safetensors in models/diffusion_models/
- umt5_xxl_fp8_e4m3fn_scaled.safetensors in models/clip/
- open_clip_vit_h_14.safetensors in models/clip_vision/
- wan_2.1_vae.safetensors in models/vae/

Settings:
| Parameter | Value | Notes |
|---|---|---|
| Resolution | 1280x720 (landscape) or 720x1280 (portrait) | Native training resolution |
| Frames | 81 (~5 seconds at 16fps) | Must be 4n + 1 |
| Steps | 30-50 | Higher = better quality |
| CFG | 5-7 | |
| Sampler | uni_pc | Recommended for Wan |
| Scheduler | normal | |
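The prerequisite files listed earlier can be verified with a short check before running. This is an illustrative sketch assuming a standard ComfyUI directory layout; adjust the root path to your install:

```python
from pathlib import Path

# Required Wan model files and their expected directories (from the
# prerequisites list above).
REQUIRED = {
    "models/diffusion_models": "wan2.1_i2v_720p_14b_bf16.safetensors",
    "models/clip": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/clip_vision": "open_clip_vit_h_14.safetensors",
    "models/vae": "wan_2.1_vae.safetensors",
}

def missing_models(comfy_root="."):
    # Return the prerequisite files that are not yet in place.
    return [f"{d}/{name}" for d, name in REQUIRED.items()
            if not (Path(comfy_root) / d / name).is_file()]
```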
Frame count guide:
| Duration | Frames (16fps) |
|---|---|
| 1 second | 17 |
| 3 seconds | 49 |
| 5 seconds | 81 |
| 10 seconds | 161 |
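The frame counts in this guide all follow the 4n + 1 rule: multiply the duration by the frame rate, round down to a multiple of 4, and add 1. A minimal sketch (helper name is ours):

```python
def wan_frame_count(seconds, fps=16):
    # Wan expects frame counts of the form 4n + 1.
    raw = int(round(seconds * fps))
    return (raw // 4) * 4 + 1
```

For example, 5 seconds at 16fps gives 80 raw frames, which becomes 81.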
VRAM optimization:
Text-to-Video (T2V): Same as I2V, but uses wan2.1_t2v_14b_bf16.safetensors and EmptySD3LatentImage instead of image conditioning.
First/last frame control: Wan 2.2 MoE allows specifying both the first and last frame, enabling precise video planning.
FramePack: VRAM usage is invariant to video length; it generates 60-second videos at 30fps on just 6GB of VRAM.
How it works:
| Parameter | Value | Notes |
|---|---|---|
| Resolution | 640x384 to 1280x720 | Depends on VRAM |
| Duration | Up to 60 seconds | VRAM-invariant |
| Quality | High (comparable to Wan) | Uses same base models |
AnimateDiff settings:
| Parameter | Value (Standard) | Value (Lightning) |
|---|---|---|
| Motion Module | v3_sd15_mm.ckpt | animatediff_lightning_4step.safetensors |
| Steps | 20-25 | 4-8 |
| CFG | 7-8 | 1.5-2.0 |
| Sampler | euler_ancestral | lcm |
| Resolution | 512x512 | 512x512 |
| Context Length | 16 | 16 |
| Context Overlap | 4 | 4 |
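The context length and overlap settings above imply a sliding-window schedule over the frame sequence: each window is 16 frames, and consecutive windows share 4 frames. A simplified sketch of this scheduling (the actual AnimateDiff scheduler may differ in details):

```python
def context_windows(total_frames, length=16, overlap=4):
    # Sliding windows over the frame sequence; stride = length - overlap.
    stride = length - overlap
    starts = list(range(0, max(total_frames - length, 0) + 1, stride))
    if starts and starts[-1] + length < total_frames:
        # Add a final window flush with the end so no frames are dropped.
        starts.append(total_frames - length)
    return [(s, s + length) for s in starts]
```

With the defaults, a 32-frame clip is processed in three overlapping 16-frame windows.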
Motion LoRAs:
| LoRA | Effect |
|---|---|
| v2_lora_ZoomIn | Camera zooms in |
| v2_lora_ZoomOut | Camera zooms out |
| v2_lora_PanLeft | Camera pans left |
| v2_lora_PanRight | Camera pans right |
| v2_lora_TiltUp | Camera tilts up |
| v2_lora_TiltDown | Camera tilts down |
| v2_lora_RollingClockwise | Camera rolls clockwise |
After any video generation:
Frame interpolation (RIFE) doubles or quadruples the frame count for smoother motion:
Input (16fps) → RIFE 2x → Output (32fps)
Input (16fps) → RIFE 4x → Output (64fps)
Use the rife47 or rife49 model.
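Interpolation at factor k inserts k − 1 synthetic frames between each adjacent pair, so N input frames become (N − 1) × k + 1 output frames at k times the frame rate. A small sketch of the arithmetic (function name is ours):

```python
def rife_output(frames, fps, factor=2):
    # RIFE inserts (factor - 1) frames between each adjacent pair,
    # so N input frames become (N - 1) * factor + 1 output frames.
    return (frames - 1) * factor + 1, fps * factor
```

For example, an 81-frame, 16fps Wan clip becomes 161 frames at 32fps after RIFE 2x.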
Apply FaceDetailer to each frame:
Reduces temporal inconsistencies between frames.
Maintain consistent color grading across frames.
Final output via VHS Video Combine:
frame_rate: 16 (native) or 24/30 (after interpolation)
format: "video/h264-mp4"
crf: 19 (high quality) to 23 (smaller file)
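The same encode settings can be reproduced outside ComfyUI with ffmpeg. This sketch builds the equivalent command line, assuming frames were exported as numbered PNGs (the pattern and output name are placeholders):

```python
def ffmpeg_encode_args(fps=16, crf=19, pattern="frame_%05d.png", out="output.mp4"):
    # Mirrors the VHS Video Combine settings above: H.264 MP4 at the
    # given frame rate and CRF, with a widely compatible pixel format.
    return ["ffmpeg", "-framerate", str(fps), "-i", pattern,
            "-c:v", "libx264", "-crf", str(crf),
            "-pix_fmt", "yuv420p", out]
```

Pass the list to `subprocess.run()` to perform the encode.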
Complete pipeline for character dialogue:
1. Generate audio → comfyui-voice-pipeline
2. Generate base video → This skill (Wan I2V or AnimateDiff)
- Prompt: "{character}, talking naturally, slight head movement"
- Duration: match audio length
3. Apply lip-sync → Wav2Lip or LatentSync
4. Enhance faces → FaceDetailer + CodeFormer
5. Final output → video-assembly
Before marking video as complete:
- references/workflows.md - Workflow templates for Wan and AnimateDiff
- references/models.md - Video model download links
- references/research-log.md - Latest video generation advances
- state/inventory.json - Available video models

Weekly Installs: 107
GitHub Stars: 25
First Seen: Feb 24, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: gemini-cli (104), codex (104), kimi-cli (104), cursor (104), opencode (104), github-copilot (104)