Kling 3.0 Prompting Guide: Cinematic Storytelling and Multimodal Video Generation Techniques

kling-3-prompting by aedev-tools/kling-3-prompting-skill

132 weekly installs

5 GitHub Stars

Install command

npx skills add https://github.com/aedev-tools/kling-3-prompting-skill --skill kling-3-prompting

AI/Machine Learning · Content Creation · Prompt Engineering

Overview

Kling 3.0 is a unified multimodal video model. It understands cinematic direction, not keyword lists. Write prompts like a director: describe what the audience sees, hears, and feels over time.

Core shift: Description → Direction. Think "direct a scene" not "describe an image."

Interactive Builder Workflow

When invoked, guide the user through these steps using AskUserQuestion:

digraph builder {
  "1. Generation mode?" [shape=diamond];
  "Text-to-Video" [shape=box];
  "Image-to-Video" [shape=box];
  "Multi-Shot Sequence" [shape=box];
  "Keyframe Transition" [shape=box];
  "2. Gather scene details" [shape=box];
  "3. Assemble prompt" [shape=box];
  "4. Present & refine" [shape=box];

  "1. Generation mode?" -> "Text-to-Video";
  "1. Generation mode?" -> "Image-to-Video";
  "1. Generation mode?" -> "Multi-Shot Sequence";
  "1. Generation mode?" -> "Keyframe Transition";
  "Text-to-Video" -> "2. Gather scene details";
  "Image-to-Video" -> "2. Gather scene details";
  "Multi-Shot Sequence" -> "2. Gather scene details";
  "Keyframe Transition" -> "2. Gather scene details";
  "2. Gather scene details" -> "3. Assemble prompt";
  "3. Assemble prompt" -> "4. Present & refine";
}

Step 1: Determine Generation Mode

Ask the user which mode to use:

Text-to-Video — prompt from scratch
Image-to-Video — animate a reference image
Multi-Shot Sequence — 2-6 shot storyboard (up to 15s)
Keyframe Transition — start frame → end frame with interpolated motion

Step 2: Gather Scene Details

Ask about each element (adapt questions to mode):

Element	Question	Why it matters
Subject	Who/what is the focus? Specific appearance details?	Anchors consistency — define distinguishing traits early
Action	What happens? Describe the timeline (first → then → finally)	Kling 3.0 excels at sequential action across arcs of up to 15s
Environment	Where? Be specific (not "a street" but "narrow Tokyo alley, steam from grates")	Grounds the scene physically
Camera	Shot type and movement? (See camera reference below)	Cinematic language produces far better results
Lighting	What light sources? Name them specifically	"Flickering neon" beats "dramatic lighting"
Mood/Emotion	What should the audience feel?	Drives color grade, pacing, music
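Audio	Dialogue? Ambient sound? Music?	Kling 3.0 generates native audio + lip-sync
Duration	How long? (3-15s)	Longer clips need more progression described over time
Aspect ratio	16:9 / 9:16 / 1:1 / 21:9?	16:9 cinematic, 9:16 social, 21:9 ultrawide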
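A builder could keep the Step 2 answers together in one record and check that record before assembly. The sketch below uses a hypothetical `SceneDetails` dataclass. It assumes durations of 3-15 s and the aspect ratios 16:9, 9:16, 1:1 and 21:9. It also assumes that every element except audio is required. The skill itself does not define this structure.

```python
from dataclasses import dataclass

# Assumed limits, taken from the checklist above (not an official Kling API).
MIN_SECONDS, MAX_SECONDS = 3, 15
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "21:9"}
REQUIRED = ("subject", "action", "environment", "camera", "lighting", "mood")


@dataclass
class SceneDetails:
    subject: str
    action: str
    environment: str
    camera: str
    lighting: str
    mood: str
    audio: str = ""          # assumption: a clip may have no dialogue or music
    duration_s: int = 5      # assumption: default clip length
    aspect_ratio: str = "16:9"


def validate(scene: SceneDetails) -> list[str]:
    """Return a list of problems; an empty list means the checklist is satisfied."""
    problems = [f"missing {name}" for name in REQUIRED if not getattr(scene, name).strip()]
    if not MIN_SECONDS <= scene.duration_s <= MAX_SECONDS:
        problems.append(f"duration {scene.duration_s}s outside {MIN_SECONDS}-{MAX_SECONDS}s")
    if scene.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {scene.aspect_ratio}")
    return problems
```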

Image-to-Video: Focus on how the scene evolves from the image — movement, camera motion, environmental change. The model preserves identity/layout from the source.

Keyframes: Ask for start and end frame descriptions. Frames should match in color, style, and lighting. Prompt sparingly — Kling infers motion well.

Multi-Shot: Define each shot separately with its own framing, subject, action, and duration. Label shots explicitly.
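The mode-specific notes above amount to a small lookup. The shared questions are asked every time, and each mode adds or drops a few. The sketch below is one hypothetical way to encode that. `Mode`, `BASE_QUESTIONS` and `questions_for` are illustrative names, and skipping the subject question for Image-to-Video is an assumption. It rests on the note that the model preserves identity/layout from the source image.

```python
from enum import Enum


class Mode(Enum):
    TEXT_TO_VIDEO = "Text-to-Video"
    IMAGE_TO_VIDEO = "Image-to-Video"
    MULTI_SHOT = "Multi-Shot Sequence"
    KEYFRAME = "Keyframe Transition"


# Shared Step 2 questions, keyed by element.
BASE_QUESTIONS = {
    "subject": "Who/what is the focus? Specific appearance details?",
    "action": "What happens? Describe the timeline (first -> then -> finally).",
    "environment": "Where? Be specific.",
    "camera": "Shot type and movement?",
    "lighting": "What light sources? Name them specifically.",
    "mood": "What should the audience feel?",
    "audio": "Dialogue? Ambient sound? Music?",
}

# Mode-specific follow-ups, paraphrasing the notes above.
EXTRA_QUESTIONS = {
    Mode.IMAGE_TO_VIDEO: [
        "How should the scene evolve from the image (movement, camera motion, environmental change)?"
    ],
    Mode.KEYFRAME: ["Describe the start frame.", "Describe the end frame."],
    Mode.MULTI_SHOT: ["For each shot: framing, subject, action, and duration?"],
}


def questions_for(mode: Mode) -> list[str]:
    """Return the questions to ask for a mode, in order."""
    keys = list(BASE_QUESTIONS)
    if mode is Mode.IMAGE_TO_VIDEO:
        # Identity/layout come from the source image, so skip the appearance question.
        keys.remove("subject")
    return [BASE_QUESTIONS[k] for k in keys] + EXTRA_QUESTIONS.get(mode, [])
```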

Step 3: Assemble the Prompt

Use the Master Formula:

[Scene/Environment] + [Subject & Appearance] + [Action Timeline] + [Camera Movement] + [Audio & Atmosphere] + [Technical Specs]
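Here is a minimal sketch of the formula as code, assuming the six slots arrive as plain strings. The slot names and the `assemble_prompt` helper are hypothetical. Rather than emitting one sentence per slot, the sketch pairs the slots into at most three sentences. The aim is to stay within the 1-3 sentences per shot that the writing rules recommend.

```python
# The six Master Formula slots, in the order the guide lists them.
FORMULA_ORDER = ("scene", "subject", "action", "camera", "audio_atmosphere", "tech_specs")
# Pair adjacent slots so the result stays within 1-3 sentences per shot.
SENTENCE_GROUPS = (("scene", "subject"), ("action", "camera"), ("audio_atmosphere", "tech_specs"))


def assemble_prompt(parts: dict[str, str]) -> str:
    """Join non-empty slots in formula order, two slots per sentence."""
    unknown = set(parts) - set(FORMULA_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    sentences = []
    for group in SENTENCE_GROUPS:
        clauses = [parts[k].strip().rstrip(".") for k in group if parts.get(k, "").strip()]
        if clauses:
            sentence = ", ".join(clauses)
            sentences.append(sentence[0].upper() + sentence[1:] + ".")
    return " ".join(sentences)
```

Empty slots are simply skipped, so a short text-to-video prompt with no audio or technical specs yields two sentences instead of three.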

Writing rules:

Use cinematic motion verbs: dolly push, whip-pan, crash zoom, rack focus, tracking shot — NOT "moves" or "goes"
Name real light sources: neon signs, candlelight, golden hour, LED panels — NOT "dramatic lighting"
Include texture for credibility: grain, lens flares, condensation, fabric sheen, smoke, sweat
Describe temporal flow: beginning → middle → end
Keep to 1-3 rich sentences per shot (specificity > length)
For dialogue: use character labels, assign voice tone/emotion, use transitional words ("Immediately," "Pause")
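A rough lint pass can check a shot against some of these writing rules. This is a heuristic sketch, not part of the skill. The word lists are assumptions, and `lint_shot` is a hypothetical helper. The sentence split is naive (on `.`, `!` and `?`), so quoted dialogue with question marks inflates the count. A word like "go" inside dialogue will also be flagged.

```python
import re

# Heuristic word lists drawn from the writing rules; extend as needed.
VAGUE_MOTION = {"move", "moves", "go", "goes"}
GENERIC_LIGHTING = ("dramatic lighting",)


def lint_shot(text: str) -> list[str]:
    """Return warnings for one shot's prompt text (empty list = nothing flagged)."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    warnings = [
        f'vague motion verb "{verb}": use a cinematic verb (dolly push, whip-pan, ...)'
        for verb in sorted(VAGUE_MOTION & words)
    ]
    warnings += [
        f'generic "{phrase}": name a real light source'
        for phrase in GENERIC_LIGHTING
        if phrase in lowered
    ]
    # Naive split on terminal punctuation; quoted dialogue can inflate the count.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not 1 <= len(sentences) <= 3:
        warnings.append(f"{len(sentences)} sentences: aim for 1-3 rich sentences per shot")
    return warnings
```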

Step 4: Present & Refine

Present the assembled prompt. Ask if they want to:

Adjust any element
Add a negative prompt
Generate variations (different duration, different camera, different mood)
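One way to generate variations is to change a single field at a time. Each variant then isolates one decision (duration, camera, or mood) and is easy to compare against the base. The sketch below assumes a hypothetical frozen `PromptSpec` record and relies on `dataclasses.replace` to copy it with one field swapped.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptSpec:
    scene: str
    camera: str
    mood: str
    duration_s: int


def variations(base: PromptSpec, **options: list) -> list[PromptSpec]:
    """Build variants that each change exactly one field of `base`."""
    out = []
    for field_name, values in options.items():
        for value in values:
            # Skip values identical to the base so every variant actually differs.
            if getattr(base, field_name) != value:
                out.append(replace(base, **{field_name: value}))
    return out
```

If the user wants every combination rather than one change at a time, `itertools.product` over the option lists would be the natural swap, at the cost of many more variants.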

Quick Reference

Camera Movements

Movement	Effect	Example phrase
Dolly push-in	Builds intimacy/tension	"slow dolly push-in toward her face"
Dolly zoom	Vertigo/dramatic reveal	"dolly zoom creating disorienting depth shift"
Tracking shot	Follows subject laterally	"camera tracks alongside as she walks"
Whip-pan	Energy/surprise	"whip-pan to reveal the door"
Crash zoom	Shock/emphasis	"sudden crash zoom on the object"
Rack focus	Shift attention	"rack focus from foreground hand to background figure"
Handheld/shoulder-cam	Raw/documentary feel	"handheld shoulder-cam with subtle sway"
Static tripod	Composed/observational	"locked-off static tripod, wide shot"
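FPV drone	High-energy immersion	"dynamic FPV drone shot chasing through the corridor"
Low-angle tracking	Heroic/imposing	"low-angle tracking shot, subject towering above"
Truck left/right	Lateral reveal	"camera trucks right, revealing the cityscape"
Tilt up/down	Vertical reveal	"slow tilt up from boots to face"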

Lens & Film Stock

Phrase	Effect
"Shot on 35mm film"	Warm grain, organic texture
"Macro 85mm lens"	Tight detail, shallow depth of field
"Wide-angle steadicam"	Smooth, immersive, spatial
"Handheld camcorder"	Raw VHS energy, nostalgic
"Anamorphic lens flare"	Cinematic horizontal streaks

Lighting

Use specific sources, not adjectives:

"Golden hour sun cutting through dusty warehouse windows"
"Flickering neon casting magenta/cyan across wet pavement"
"Single bare bulb swinging, casting moving shadows"
"Cool blue LED panels reflecting off glass surfaces"
"Candlelight warming skin tones, deep shadows beyond"

Color & Grade

"Desaturated teal grade, crushed blacks"
"Amber nightclub strobe cutting through smoke"
"Cool blue haze filling the corridor"
"Magenta neon reflecting off wet asphalt"
"Overexposed highlights, blown-out whites"

Multi-Character Dialogue

Rule	Do	Don't
Name characters	`[Character A: Silver-haired CEO]`	`[Man] says...`
Anchor to action	Agent slams table. [Agent, angrily]: "Where is it?"	Just dialogue without visual action
Assign voice tone	`[CEO, deep authoritative gravelly voice]`	Generic "says"
Control timing	"Immediately," "Pause," "After a beat"	Back-to-back dialogue without transitions
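The dialogue rules can be enforced with a small formatter. In this sketch, a missing speaker or tone is rejected, an optional transition and visual action lead into the line, and the output matches the "Do" column. `introduce` and `dialogue_beat` are hypothetical helper names. The exact punctuation handling is an assumption about how the pieces should join.

```python
def introduce(label: str, description: str) -> str:
    """Cast label such as [Character A: Silver-haired CEO]."""
    return f"[{label}: {description}]"


def dialogue_beat(speaker: str, tone: str, line: str, action: str = "", transition: str = "") -> str:
    """One beat: optional transition, optional visual action, then the labeled line."""
    if not speaker.strip() or not tone.strip():
        # Unlabeled or toneless lines are the "Don't" column above.
        raise ValueError("every line needs a named speaker and a voice tone")
    parts = []
    for piece in (transition, action):
        if piece:
            # Close each lead-in fragment unless it already ends in punctuation.
            parts.append(piece if piece.endswith((".", ",", "!", "?")) else piece + ".")
    parts.append(f'[{speaker}, {tone}]: "{line}"')
    return " ".join(parts)
```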

Multi-Shot Structure

Shot 1 (0-5s): [Wide establishing shot description]
Shot 2 (5-10s): [Medium/close-up with action progression]
Shot 3 (10-15s): [Resolution/reaction with camera payoff]

Atmosphere: [Overall mood, color grade]
Audio: [Sound design, music, dialogue]

Label every shot. Assign durations. Describe framing + subject + motion per shot.
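The structure above can be rendered mechanically once each shot has a description and a duration. The timestamps are then cumulative, and the 2-6 shot and 15 s limits from Step 1 can be checked up front. This is a minimal sketch; `Shot` and `multi_shot_prompt` are hypothetical names, and the shots are assumed to play back to back with no gaps.

```python
from dataclasses import dataclass

MIN_SHOTS, MAX_SHOTS = 2, 6   # "2-6 shot storyboard"
MAX_TOTAL_S = 15              # "up to 15s"


@dataclass
class Shot:
    description: str  # framing + subject + motion for this shot
    duration_s: int


def multi_shot_prompt(shots: list[Shot], atmosphere: str, audio: str) -> str:
    """Render labeled, timestamped shots followed by Atmosphere and Audio lines."""
    if not MIN_SHOTS <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"need {MIN_SHOTS}-{MAX_SHOTS} shots, got {len(shots)}")
    if any(s.duration_s <= 0 for s in shots):
        raise ValueError("every shot needs a positive duration")
    total = sum(s.duration_s for s in shots)
    if total > MAX_TOTAL_S:
        raise ValueError(f"total {total}s exceeds {MAX_TOTAL_S}s")
    lines, start = [], 0
    for i, shot in enumerate(shots, start=1):
        end = start + shot.duration_s
        lines.append(f"Shot {i} ({start}-{end}s): {shot.description}")
        start = end  # shots are back to back, so the next one starts here
    lines += ["", f"Atmosphere: {atmosphere}", f"Audio: {audio}"]
    return "\n".join(lines)
```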

Start & End Frame Tips

Frames should match in color palette, style, and lighting
Identical start/end frames = seamless loop
Prompt sparingly — Kling infers motion between frames well
Simple camera directions: zoom in/out, pan left/right, tilt up/down
5s for dynamic transitions, 10s for complex transformations
Start frame aspect ratio drives the whole clip
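Two of these tips are mechanical enough to encode. The first is that the start frame's aspect ratio drives the clip. The second is 5 s for dynamic transitions versus 10 s for complex transformations. The sketch below assumes the frames are described only by their pixel sizes. `keyframe_plan` is a hypothetical helper, and it reports the start frame's ratio in lowest terms. It compares the two ratios exactly by cross-multiplication, so no float tolerance is needed.

```python
from math import gcd


def keyframe_plan(start_size: tuple[int, int], end_size: tuple[int, int],
                  complex_transform: bool = False) -> dict:
    """Derive aspect ratio and duration for a start/end keyframe pair."""
    sw, sh = start_size
    ew, eh = end_size
    g = gcd(sw, sh)
    warnings = []
    # Exact ratio comparison: sw/sh == ew/eh  <=>  sw*eh == ew*sh.
    if sw * eh != ew * sh:
        warnings.append("end frame aspect ratio differs from the start frame; "
                        "the start frame's ratio drives the clip")
    return {
        "aspect_ratio": f"{sw // g}:{sh // g}",
        # 5s for dynamic transitions, 10s for complex transformations.
        "duration_s": 10 if complex_transform else 5,
        "warnings": warnings,
    }
```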

Negative Prompts

Use to prevent common AI defaults:

smiling, laughing, cartoonish, bright saturated colors, low resolution,
morphing, blurry text, disfigured hands, extra fingers, static pose,
frozen expression, stock photo aesthetic

Customize based on scene — remove items that conflict with your intent.
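Customizing the list can be partly automated. The sketch below starts from the defaults above and drops any term the user explicitly removes. As a heuristic, it also drops any term that appears verbatim in the positive prompt, since negating something you asked for is a direct conflict. `negative_prompt` and its parameters are hypothetical. The substring check is deliberately simple and will miss paraphrases (for example "smiles" versus "smiling").

```python
DEFAULT_NEGATIVES = [
    "smiling", "laughing", "cartoonish", "bright saturated colors", "low resolution",
    "morphing", "blurry text", "disfigured hands", "extra fingers", "static pose",
    "frozen expression", "stock photo aesthetic",
]


def negative_prompt(scene_prompt: str = "", remove=(), extra=()) -> str:
    """Default negatives minus explicit removals and terms the scene itself asks for."""
    text = scene_prompt.lower()
    drop = {r.lower() for r in remove}
    terms = [t for t in DEFAULT_NEGATIVES if t not in drop and t not in text]
    for t in extra:
        if t not in terms:
            terms.append(t)
    return ", ".join(terms)
```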

Weak → Strong

Element	Weak	Strong
Camera	"Camera follows person"	"Handheld shoulder-cam drifts behind subject with subtle sway"
Subject	"A woman walking"	"Woman in red dress, heels clicking on wet cobblestones"
Environment	"In a city"	"Narrow Tokyo alley, steam from grates, glowing vending machines"
Lighting	"Dramatic lighting"	"Flickering neon casting magenta/cyan across wet pavement"
Texture	"It looks realistic"	"Rain beading on leather jacket, condensation on glass, visible breath"
Motion	"She walks away"	"She turns slowly, hair catches light, disappears around corner"

Common Mistakes

Mistake	Fix
Keyword lists instead of scene direction	Write as if directing a shot: subject + action + camera + environment
Vague motion ("moves," "goes")	Use cinematic verbs: dolly, track, whip-pan, crash zoom
Generic lighting ("dramatic")	Name the source: neon, candle, golden hour, LED panel
Overlong prompts	1-3 rich sentences per shot; specificity > length
No temporal progression	Describe beginning → middle → end of the shot
Mismatched keyframes	Match color, lighting, and style between start/end frames
Unattributed dialogue	Label every speaker with name, tone, and emotion
Cramming multi-shot into one paragraph	Separate and label each shot with duration

First Seen: Feb 9, 2026

Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Pass

Installed on: codex (51) · opencode (50) · gemini-cli (50) · github-copilot (46) · amp (42) · kimi-cli (42)
