realistic-ugc-video by dennisonbertram/claude-media-skills
npx skills add https://github.com/dennisonbertram/claude-media-skills --skill realistic-ugc-video
Create long-form AI videos that look and sound authentically human. This skill orchestrates a multi-step workflow using Nano Banana for realistic base images and Kling AI for video generation, with specific techniques to avoid the "AI look".
| Problem | Solution |
|---|---|
| Too perfect/clean skin | Add imperfections: micro-pores, natural oils, fine lines |
| Studio lighting | Use available/natural light, mixed color temps |
| Character too still | Add micro-movements, head tilts, natural sway |
| Inconsistent pacing | Use 55-60 syllables per clip |
| Robotic voice | Process through Adobe Podcast or Resemble AI |
| Obvious jump cuts | Cover with B-roll or animations |
| Weird AI hands | Crop hands out of frame or keep completely static |
AI video models (Kling, Veo, etc.) struggle with realistic hand movement. Fingers morph, gestures look unnatural, and hands are often the biggest tell.
Best practice: Keep hands OUT of frame or completely static.
Options:
Before starting, gather from the user:
Use the Nano Banana skill to generate the character image. Critical: Apply the imperfection techniques from CHARACTER-PROMPTING.md.
Key elements for realistic UGC:
Command:
~/.claude/skills/nano-banana/scripts/generate.sh "[enhanced prompt]" --aspect 9:16 --size 2K
Then optionally upscale through Enhancor AI for additional texture.
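The command above can also be assembled programmatically, which avoids shell-quoting problems with long prompts. The following is a minimal sketch: the script path and the `--aspect` and `--size` flags come from the command shown above, while the helper name `build_generate_command` and its defaults are hypothetical.

```python
import os
import shlex

# Path and flags are copied from the command above; everything else is illustrative.
GENERATE_SCRIPT = "~/.claude/skills/nano-banana/scripts/generate.sh"


def build_generate_command(prompt: str, aspect: str = "9:16", size: str = "2K") -> list[str]:
    """Return the argv list for one image generation call (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return [
        os.path.expanduser(GENERATE_SCRIPT),  # expand "~" to the home directory
        prompt,                               # passed as a single argument, no shell quoting needed
        "--aspect", aspect,
        "--size", size,
    ]


cmd = build_generate_command("[enhanced prompt]")
# Print the command for inspection; it could then be run with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing the prompt as one list element keeps quotes and brackets in the prompt text intact.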
This is the most important step for natural pacing. See SCRIPT-CHUNKING.md.
The 55-60 Syllable Rule:
Example chunking:
Chunk 1 (58 syllables):
"Hey everyone, I wanted to share something that completely changed how I think about productivity. It's not another app or system."
Chunk 2 (56 syllables):
"It's actually about understanding your own energy patterns throughout the day. Once I figured this out, everything clicked into place."
For each script chunk, generate a 10-second video clip. Use the Kling AI skill with movement prompts from MOVEMENT-PROMPTING.md.
Key elements:
Spawn background agent for each clip:
Task tool:
- subagent_type: "general-purpose"
- run_in_background: true
- prompt: [Include image URL, script chunk, movement prompt, output path]
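One way to prepare these per-clip requests is to build them as plain data before handing them to the Task tool. This is a sketch under assumptions: the three field names mirror the list above, but the `ClipJob` class, the helper names, the `clips/clip_NN.mp4` naming scheme, and the example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClipJob:
    index: int
    image_url: str
    script_chunk: str
    movement_prompt: str
    output_path: str


def build_task_request(job: ClipJob) -> dict:
    """Mirror the Task tool fields listed above for a single clip."""
    prompt = "\n".join([
        f"Generate clip {job.index} with the Kling AI skill.",
        f"Image URL: {job.image_url}",
        f"Script chunk: {job.script_chunk}",
        f"Movement prompt: {job.movement_prompt}",
        f"Output path: {job.output_path}",
    ])
    return {
        "subagent_type": "general-purpose",
        "run_in_background": True,
        "prompt": prompt,
    }


def plan_clips(image_url: str, chunks: list[str], movement_prompt: str,
               out_dir: str = "clips") -> list[dict]:
    """Create one request per script chunk, numbered from 1."""
    jobs = [
        ClipJob(i, image_url, chunk, movement_prompt, f"{out_dir}/clip_{i:02d}.mp4")
        for i, chunk in enumerate(chunks, start=1)
    ]
    return [build_task_request(job) for job in jobs]


requests = plan_clips(
    "https://example.com/character.png",  # placeholder URL
    ["First script chunk.", "Second script chunk."],
    "[movement prompt]",
)
for request in requests:
    print(request["prompt"], end="\n\n")
```

Each request corresponds to one 10-second clip, so a script split into six chunks yields roughly 60 seconds of footage before editing.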
See POST-PRODUCTION.md for detailed guidance.
Before generating the full video, test the workflow with one clip:
A vertical 9:16 UGC-style video frame captured on an iPhone 11 resting on a tripod.
Medium-wide portrait at true eye-level with slightly forward-leaning posture.
[CHARACTER]: A [age] [gender] with [ethnicity] complexion, [eye color] eyes beneath
[eyebrow description], [nose description]. [Jawline/facial hair]. [Hair description].
Expression is [emotion]—[specific expression details].
[CLOTHING]: [Fitted/casual garment], [collar detail].
[SKIN TEXTURE]: Visible pores across T-zone, faint smile lines, natural oils catching
light on forehead and nose. [Age-appropriate details]. No filter, no foundation.
[FOREGROUND]: Hands rest naturally on [surface], fingers relaxed, visible veins and
knuckle texture. Nearby: [everyday objects like water bottle, phone, notebook].
[CAMERA]: Native iPhone 11 lens (26mm equivalent), slightly wide perspective, mild
barrel softness at edges. Only tiny pockets of neural blur around hair edges.
[LIGHTING]: Available light mix—cool overcast daylight from window left, warm tungsten
from desk lamp right. Soft asymmetric shadows, natural falloff. ISO noise 500-900.
[BACKGROUND]: [Realistic home/office elements]—bookshelf, [furniture], clearly visible
not heavily blurred.
[REALITY DETAILS]: Gentle 35mm film grain, light fingerprint smudge on lens, tiny dust
haze in air. No cinematic bloom, no studio finish.
Styling: raw UGC realism, available indoor light, mixed color temperature, minimal
depth blur, visible ISO noise, emphasis on authenticity.
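Because the template above mixes section labels such as `[CHARACTER]:` with fill-in slots such as `[age]`, it is easy to send a prompt with a slot left unfilled. The sketch below shows one possible check. It assumes that any bracketed token not followed by a colon is a slot; the helper names and sample values are hypothetical.

```python
import re

# A bracketed token NOT followed by ":" is treated as a slot;
# "[CHARACTER]:" style section labels are left alone.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\](?!:)")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace known slots and leave unknown ones untouched."""
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def unfilled(text: str) -> list[str]:
    """List the slot names still present in the text."""
    return PLACEHOLDER.findall(text)


excerpt = "[CHARACTER]: A [age] [gender] with [ethnicity] complexion, [eye color] eyes"
filled = fill_template(excerpt, {"age": "35-year-old", "gender": "woman"})
print(filled)
print("Still to fill:", unfilled(filled))
```

Running the check before each generation call catches leftover slots before they reach the image model.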
Hand "Home Base" Protocol: Hands default to Active Idle. Fingers shift, thumbs rub,
wrists rotate slightly while anchored. Gestures only for key emphasis.
[0.0s-0.5s] Pre-roll: Sharp inhale, eyes lock to lens, head still
[0.5s-3.0s] Hands in Active Idle (fingers interlocked), head tilts slightly right,
brows furrow in seriousness
[3.0s-6.0s] Hands break clasp for quick open-palm rotation then return, head drifts
forward, natural blink
[6.0s-8.0s] Hands return to Active Idle (loose clasp, thumbs tapping), head nods
encouragingly, cheeks lift in natural smile
[8.0s-10.0s] Hands anchored (wrist shifts), chin lifts in quick final nod, natural blink
[Script]: "[CHUNK TEXT HERE]"
[Tone]: [Urgent/Conversational/Professional/etc.]
[Pacing]: Rapid fire delivery, high energy, viral UGC style, confident, 2x speed
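The timed segments in a movement prompt are meant to cover the whole 10-second clip. A small check can confirm that an edited timeline still starts at 0 s, has no gaps or overlaps, and ends at the clip length. This sketch assumes the `[0.0s-0.5s]` timestamp format used above; the function names are hypothetical.

```python
import re

SEGMENT = re.compile(r"^\[(\d+(?:\.\d+)?)s-(\d+(?:\.\d+)?)s\]")


def parse_segments(prompt: str) -> list[tuple[float, float]]:
    """Extract (start, end) pairs from lines that begin with a timestamp range."""
    segments = []
    for line in prompt.splitlines():
        match = SEGMENT.match(line.strip())
        if match:
            segments.append((float(match.group(1)), float(match.group(2))))
    return segments


def check_timeline(segments: list[tuple[float, float]], clip_length: float = 10.0) -> list[str]:
    """Return a list of problems; an empty list means the timeline is contiguous."""
    if not segments:
        return ["no timed segments found"]
    problems = []
    if segments[0][0] != 0.0:
        problems.append(f"timeline starts at {segments[0][0]}s, not 0s")
    for start, end in segments:
        if end <= start:
            problems.append(f"segment {start}s-{end}s has no duration")
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start != prev_end:
            problems.append(f"gap or overlap between {prev_end}s and {next_start}s")
    if segments[-1][1] != clip_length:
        problems.append(f"timeline ends at {segments[-1][1]}s, not {clip_length}s")
    return problems


movement_prompt = """\
[0.0s-0.5s] Pre-roll: Sharp inhale, eyes lock to lens, head still
[0.5s-3.0s] Hands in Active Idle (fingers interlocked), head tilts slightly right,
brows furrow in seriousness
[3.0s-6.0s] Hands break clasp for quick open-palm rotation then return
[6.0s-8.0s] Hands return to Active Idle (loose clasp, thumbs tapping)
[8.0s-10.0s] Hands anchored (wrist shifts), chin lifts in quick final nod
"""
print(check_timeline(parse_segments(movement_prompt)) or "timeline OK")
```

Continuation lines without a timestamp are ignored, so wrapped descriptions do not affect the check.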
For simpler long-form videos, consider InfiniteTalk (infinitetalk.ai).
Pros: Single generation for longer videos
Cons: Less control over pacing (no timer/duration control), charged by output length
Use the syllable method above when precise pacing control is needed.
Weekly Installs: 52
Repository: dennisonbertram/claude-media-skills
GitHub Stars: 2
First Seen: Jan 29, 2026
Security Audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Warn)
Installed on: opencode (44), codex (42), gemini-cli (41), github-copilot (39), cursor (37), amp (33)