kling-3-prompting by aedev-tools/kling-3-prompting-skill
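npx skills add https://github.com/aedev-tools/kling-3-prompting-skill --skill kling-3-prompting
Kling 3.0 is a unified multimodal video model. It understands cinematic direction, not keyword lists. Write prompts like a director: describe what the audience sees, hears, and feels over time.
Core shift: from description to direction. Think "direct a scene," not "describe an image."
When invoked, guide the user through these steps using AskUserQuestion: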
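digraph builder {
"1. Generation mode?" [shape=diamond];
"Text-to-Video" [shape=box];
"Image-to-Video" [shape=box];
"Multi-Shot Sequence" [shape=box];
"Keyframe Transition" [shape=box];
"2. Gather scene details" [shape=box];
"3. Assemble prompt" [shape=box];
"4. Present & refine" [shape=box];
"1. Generation mode?" -> "Text-to-Video";
"1. Generation mode?" -> "Image-to-Video";
"1. Generation mode?" -> "Multi-Shot Sequence";
"1. Generation mode?" -> "Keyframe Transition";
"Text-to-Video" -> "2. Gather scene details";
"Image-to-Video" -> "2. Gather scene details";
"Multi-Shot Sequence" -> "2. Gather scene details";
"Keyframe Transition" -> "2. Gather scene details";
"2. Gather scene details" -> "3. Assemble prompt";
"3. Assemble prompt" -> "4. Present & refine";
}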
Ask the user which mode: Text-to-Video, Image-to-Video, Multi-Shot Sequence, or Keyframe Transition.
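Ask about each element (adapt the questions to the mode):
| Element | Question | Why it matters |
|---|---|---|
| Subject | Who or what is the focus? Specific appearance details? | Anchors consistency; define distinguishing traits early |
| Action | What happens? Describe the timeline (first → then → finally) | Kling 3.0 excels at sequential action across arcs of up to 15 seconds |
| Environment | Where? Be specific (not "a street" but "narrow Tokyo alley, steam from grates") | Grounds the scene physically |
| Camera | Shot type and movement? (See the camera reference below) | Cinematic language produces far better results |
| Lighting | What light sources? Name them specifically | "Flickering neon" beats "dramatic lighting" |
| Mood/Emotion | What should the audience feel? | Drives color grade, pacing, music |
| Audio | Dialogue? Ambient sound? Music? | Kling 3.0 generates native audio and lip-sync |
| Duration | How long? (3-15s) | Longer clips need a description of progression over time |
| Aspect Ratio | 16:9 / 9:16 / 1:1 / 21:9? | 16:9 cinematic, 9:16 social, 21:9 ultra-wide |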
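The two technical rows of the table (duration and aspect ratio) have fixed ranges, so they can be checked before a prompt is assembled. The following is a minimal sketch; `SceneSpec` and `problems` are hypothetical names chosen for illustration and are not part of any Kling API.

```python
from dataclasses import dataclass

# Aspect ratios listed in the element table.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "21:9"}


@dataclass
class SceneSpec:
    """Hypothetical container for the technical answers gathered from the user."""
    duration_s: int
    aspect_ratio: str

    def problems(self):
        """Return a list of human-readable problems; empty means the spec looks fine."""
        issues = []
        if not 3 <= self.duration_s <= 15:
            issues.append(f"duration {self.duration_s}s is outside the 3-15s range")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            issues.append(f"aspect ratio {self.aspect_ratio!r} is not a listed option")
        return issues
```

Returning a list of problems rather than raising lets the caller ask the user about every issue in a single follow-up question.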
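Image-to-Video: Focus on how the scene evolves from the image: movement, camera motion, environmental change. The model preserves identity and layout from the source image.
Keyframes: Ask for start and end frame descriptions. The frames should match in color, style, and lighting. Prompt sparingly, because Kling infers motion well.
Multi-Shot: Define each shot separately, with its own framing, subject, action, and duration. Label shots explicitly.
Use the Master Formula:
[Scene/Environment] + [Subject & Appearance] + [Action Timeline] + [Camera Movement] + [Audio & Atmosphere] + [Technical Specs]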
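As an illustration of the formula, the six parts can be joined in the stated order. This is a sketch under assumptions: `assemble_prompt` is a hypothetical helper, and joining the parts as sentences is one possible rendering, not a format the model requires.

```python
def assemble_prompt(scene, subject, action, camera, audio, specs):
    """Join the six formula parts in order, skipping any that are empty."""
    parts = [scene, subject, action, camera, audio, specs]
    # Trim whitespace and trailing periods so the separators stay consistent.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return ". ".join(cleaned) + "." if cleaned else ""


prompt = assemble_prompt(
    scene="Narrow Tokyo alley, steam from grates",
    subject="Woman in red dress",
    action="She turns slowly, then disappears around the corner",
    camera="Slow dolly push-in",
    audio="",  # no audio direction given for this scene
    specs="16:9, 10s",
)
```

Empty parts are dropped rather than padded, which keeps the prompt short when a mode does not need every element.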
Writing rules:
Present the assembled prompt. Ask if they want to:
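| Movement | Effect | Example phrase |
|---|---|---|
| Dolly push-in | Builds intimacy/tension | "slow dolly push-in toward her face" |
| Dolly zoom | Vertigo/dramatic reveal | "dolly zoom creating disorienting depth shift" |
| Tracking shot | Follows subject laterally | "camera tracks alongside as she walks" |
| Whip-pan | Energy/surprise | "whip-pan to reveal the door" |
| Crash zoom | Shock/emphasis | "sudden crash zoom on the object" |
| Rack focus | Shifts attention | "rack focus from foreground hand to background figure" |
| Handheld/shoulder-cam | Raw/documentary feel | "handheld shoulder-cam with subtle sway" |
| Static tripod | Composed/observational | "locked-off static tripod, wide shot" |
| FPV drone | High-energy immersion | "dynamic FPV drone shot chasing through corridor" |
| Low-angle tracking | Heroic/imposing | "low-angle tracking shot, subject towers above" |
| Truck left/right | Lateral reveal | "camera trucks right revealing the cityscape" |
| Tilt up/down | Vertical reveal | "slow tilt up from boots to face" |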
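| Phrase | Effect |
|---|---|
| "Shot on 35mm film" | Warm grain, organic texture |
| "Macro 85mm lens" | Tight detail, shallow depth of field |
| "Wide-angle steadicam" | Smooth, immersive, spatial |
| "Handheld camcorder" | Raw VHS energy, nostalgic |
| "Anamorphic lens flare" | Cinematic horizontal streaks |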
Use specific light sources, not adjectives:
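| Rule | Do | Don't |
|---|---|---|
| Name characters | [Character A: Silver-haired CEO] | [Man] says... |
| Anchor to action | Agent slams table. [Agent, angrily]: "Where is it?" | Dialogue with no visual action |
| Assign voice tone | [CEO, deep authoritative gravelly voice] | Generic "says" |
| Control timing | "Immediately," "Pause," "After a beat" | Back-to-back dialogue without transitions |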
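The four dialogue rules combine into one line shape: an optional timing cue, a visual action, then a bracketed speaker tag with tone. The sketch below follows the "Do" column of the table; `dialogue_line` is a hypothetical helper, not a documented Kling syntax.

```python
def dialogue_line(action, speaker, tone, line, timing=""):
    """Format one attributed line: timing cue, visual action, then [speaker, tone]: "line"."""
    prefix = f"{timing} " if timing else ""
    return f'{prefix}{action.rstrip(".")}. [{speaker}, {tone}]: "{line}"'


first = dialogue_line("Agent slams table", "Agent", "angrily", "Where is it?")
reply = dialogue_line(
    "the CEO leans back",
    "CEO",
    "deep authoritative gravelly voice",
    "Not here.",
    timing="After a beat,",
)
```

Making `speaker` and `tone` required arguments means an unattributed line cannot be produced by accident.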
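Shot 1 (0-5s): [Wide establishing shot description]
Shot 2 (5-10s): [Medium/close-up with action progression]
Shot 3 (10-15s): [Resolution/reaction with camera payoff]
Atmosphere: [Overall mood, color grade]
Audio: [Sound design, music, dialogue]
Label every shot. Assign durations. Describe framing + subject + motion per shot.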
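The shot labels in the template can be generated from per-shot durations so that the time ranges stay contiguous. This is a minimal sketch; `render_shots` is a hypothetical helper, and applying the 15-second limit to the whole sequence is an assumption based on the 3-15s duration range given earlier.

```python
def render_shots(shots, atmosphere, audio):
    """Render labeled shots; `shots` is an ordered list of (duration_seconds, description)."""
    lines = []
    start = 0
    for number, (duration, description) in enumerate(shots, start=1):
        end = start + duration
        lines.append(f"Shot {number} ({start}-{end}s): {description}")
        start = end  # the next shot begins where this one ends
    if start > 15:
        # Assumption: the 15s maximum applies to the full sequence.
        raise ValueError(f"total duration {start}s exceeds the 15s maximum")
    lines.append(f"Atmosphere: {atmosphere}")
    lines.append(f"Audio: {audio}")
    return "\n".join(lines)


sequence = render_shots(
    [
        (5, "Wide establishing shot of the alley"),
        (5, "Medium shot, she turns toward the light"),
        (5, "Close-up reaction, slow dolly push-in"),
    ],
    atmosphere="tense, teal color grade",
    audio="rain, distant traffic",
)
```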
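Use the following terms to prevent common AI defaults:
smiling, laughing, cartoonish, bright saturated colors, low resolution,
morphing, blurry text, disfigured hands, extra fingers, static pose,
frozen expression, stock photo aesthetic
Customize based on the scene: remove items that conflict with your intent.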
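Customizing the list amounts to filtering out the terms that the scene actually wants. The sketch below assumes the list is used as a comma-separated negative prompt; `build_exclusions` is a hypothetical helper name.

```python
# The default terms, copied from the list above.
DEFAULT_EXCLUSIONS = [
    "smiling", "laughing", "cartoonish", "bright saturated colors", "low resolution",
    "morphing", "blurry text", "disfigured hands", "extra fingers", "static pose",
    "frozen expression", "stock photo aesthetic",
]


def build_exclusions(conflicts=()):
    """Return the default terms as one comma-separated string, minus any conflicts."""
    drop = {term.lower() for term in conflicts}
    return ", ".join(term for term in DEFAULT_EXCLUSIONS if term not in drop)


# A comedy scene wants smiles and laughter, so those two terms are removed.
comedy_exclusions = build_exclusions(["Smiling", "laughing"])
```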
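| Element | Weak | Strong |
|---|---|---|
| Camera | "Camera follows person" | "Handheld shoulder-cam drifts behind subject with subtle sway" |
| Subject | "A woman walking" | "Woman in red dress, heels clicking on wet cobblestone" |
| Environment | "In a city" | "Narrow Tokyo alley, steam from grates, glowing vending machines" |
| Lighting | "Dramatic lighting" | "Flickering neon casting magenta/cyan across wet pavement" |
| Texture | "It looks realistic" | "Rain beading on leather jacket, condensation on glass, visible breath" |
| Motion | "She walks away" | "She turns slowly, hair catches light, disappears around corner" |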
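| Mistake | Fix |
|---|---|
| Keyword lists instead of scene direction | Write as if directing a shot: subject + action + camera + environment |
| Vague motion ("moves," "goes") | Use cinematic verbs: dolly, track, whip-pan, crash zoom |
| Generic lighting ("dramatic") | Name the source: neon, candle, golden hour, LED panel |
| Overlong prompts | 1-3 rich sentences per shot; specificity > length |
| No temporal progression | Describe beginning → middle → end of the shot |
| Mismatched keyframes | Match color, lighting, and style between start/end frames |
| Unattributed dialogue | Label every speaker with name, tone, and emotion |
| Cramming multi-shot into one paragraph | Separate and label each shot with its duration |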
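Two of the mistakes in the table (vague motion verbs and generic lighting) are literal phrases, so a draft prompt can be scanned for them. The following is a rough sketch: `lint_prompt` and its term list are illustrative assumptions that cover only the phrases quoted in the table.

```python
import re

# Vague phrases quoted in the table, each mapped to the fix the table suggests.
VAGUE_TERMS = {
    "moves": "use a cinematic verb such as dolly, track, whip-pan, or crash zoom",
    "goes": "use a cinematic verb such as dolly, track, whip-pan, or crash zoom",
    "dramatic lighting": "name the source: neon, candle, golden hour, LED panel",
}


def lint_prompt(prompt):
    """Return one finding per vague phrase that appears as a whole word or phrase."""
    lowered = prompt.lower()
    findings = []
    for term, advice in VAGUE_TERMS.items():
        # Word boundaries keep "removes" from matching "moves".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            findings.append(f"'{term}': {advice}")
    return findings
```

A check like this catches only wording problems; structural mistakes such as missing temporal progression still need a human read.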
Weekly installs: 55
GitHub stars: 4
First seen: Feb 9, 2026
Installed on: codex (51), opencode (50), gemini-cli (50), github-copilot (46), amp (42), kimi-cli (42)