AI Avatar Video Production Framework: A Guide from Script Writing to Multi-Scene Talking-Head Videos

ai-avatar-video by creatify-ai/ai-avatar-video

Weekly installs: 1

17 GitHub Stars

Install command

npx skills add https://github.com/creatify-ai/ai-avatar-video --skill ai-avatar-video

AI/Machine Learning, Content Creation, Marketing

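See Also

video-ad-generator: product URL → video ad pipeline
ai-ad-prompt-guide: battle-tested prompt guide for AI ad creative
ad-creative-evaluator: score any video ad with a panel of AI experts
video-ad-reverse-engineer: reverse-engineer competitor ads
static-ad-concept-generator: 320+ proven ad concept templates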

🇺🇸English

AI Avatar Video

Complete framework for creating realistic AI talking-head videos — from script writing to multi-scene production.

Part 1: Avatar Video Strategy (Standalone)

1.1 Script Writing for Talking-Head Content

Avatar scripts must feel like natural speech, not written copy. Follow these rules:

Pacing Rules

Tone	Words per Second	Words per 30s	Style
Conversational	2.5-3.0	75-90	Natural pauses, filler words OK
Professional	2.0-2.5	60-75	Clean, measured delivery
Energetic/Sales	3.0-3.5	90-105	Fast, punchy, short sentences
Educational	1.8-2.2	54-66	Slower, with pauses for comprehension
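
The pacing table can be turned into a quick word-budget check. The following is a minimal sketch: the rates are copied from the table, while the names `PACING_WPS`, `word_budget`, and `estimated_seconds` are hypothetical helpers introduced here for illustration.

```python
# Words-per-second ranges copied from the pacing table.
PACING_WPS = {
    "conversational": (2.5, 3.0),
    "professional": (2.0, 2.5),
    "energetic": (3.0, 3.5),
    "educational": (1.8, 2.2),
}


def word_budget(tone: str, seconds: float) -> tuple[int, int]:
    """Return the (min, max) word count for a script of the given length."""
    low, high = PACING_WPS[tone]
    return round(low * seconds), round(high * seconds)


def estimated_seconds(script: str, tone: str) -> tuple[float, float]:
    """Return the (fastest, slowest) speaking time for a script, in seconds."""
    words = len(script.split())
    low, high = PACING_WPS[tone]
    return words / high, words / low
```

For example, `word_budget("conversational", 30)` reproduces the 75-90 words listed in the table, and `estimated_seconds` gives a rough read on whether a draft fits its slot before any credits are spent.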

Authenticity Markers

Scripts that sound like real people include:

Contractions: "I'm" not "I am", "don't" not "do not"
Sentence fragments: "Pretty cool, right?" not "This is quite impressive, is it not?"
Casual transitions: "So here's the thing..." / "Now, this is where it gets interesting..."
Direct address: "You know what? You're gonna love this."
Self-correction: "It's fast... actually, it's really fast."

Hook-to-CTA Structure for Avatar Scripts

15-second script template:

HOOK (0-3s): [Pattern interrupt or question — 8-12 words]
BRIDGE (3-7s): [Connect hook to product — 15-20 words]
BENEFIT (7-12s): [Core value proposition — 15-20 words]
CTA (12-15s): [Clear next step — 8-12 words]

30-second script template:

HOOK (0-3s): [Attention grab — 8-12 words]
PROBLEM (3-8s): [Relatable pain point — 15-25 words]
SOLUTION (8-15s): [Product introduction + key feature — 20-30 words]
PROOF (15-22s): [Social proof or demonstration — 15-25 words]
CTA (22-30s): [Urgency + next step — 15-25 words]

60-second script template:

HOOK (0-5s): [Strong opening — 12-18 words]
STORY/PROBLEM (5-15s): [Relatable scenario — 25-40 words]
DISCOVERY (15-25s): [How you found the product — 25-35 words]
FEATURES (25-40s): [2-3 key benefits with specifics — 35-50 words]
PROOF (40-50s): [Results, testimonials, data — 25-35 words]
CTA (50-60s): [Compelling close — 20-30 words]
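
A draft can be checked against a template mechanically. The sketch below encodes the 15-second template and flags segments whose word count falls outside the stated range; `TEMPLATE_15S`, `total_word_range`, and `check_script` are hypothetical names used only for illustration. Note that the template's segment ranges add up to 46-64 words, which is about 3.1-4.3 words per second over 15 seconds, so the upper bounds are probably best read as ceilings rather than targets (the fastest tone in the pacing table allows roughly 52 words in 15 seconds).

```python
# (label, start_s, end_s, min_words, max_words) copied from the 15-second template.
TEMPLATE_15S = [
    ("HOOK", 0, 3, 8, 12),
    ("BRIDGE", 3, 7, 15, 20),
    ("BENEFIT", 7, 12, 15, 20),
    ("CTA", 12, 15, 8, 12),
]


def total_word_range(template) -> tuple[int, int]:
    """Sum the per-segment word ranges into one (min, max) total."""
    return sum(seg[3] for seg in template), sum(seg[4] for seg in template)


def check_script(segments: dict[str, str], template) -> list[str]:
    """Return one warning per segment whose word count is outside its range."""
    warnings = []
    for label, _start, _end, low, high in template:
        count = len(segments.get(label, "").split())
        if not low <= count <= high:
            warnings.append(f"{label}: {count} words, expected {low}-{high}")
    return warnings
```

The 30-second and 60-second templates can be encoded the same way by adding their rows as further lists.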

Script Do's and Don'ts

Do	Don't
Use short sentences (8-15 words)	Write long compound sentences
Include natural pauses with "..."	Rush from point to point
Write phonetically for hard words	Use jargon or acronyms without context
End on a clear action	Trail off or end abruptly
Match script tone to avatar age/style	Use Gen Z slang with a professional avatar
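
The short-sentence rule is easy to lint automatically. This is a rough sketch with hypothetical helper names; the sentence splitter is deliberately naive (it breaks after `.`, `!`, or `?` followed by whitespace), so dotted spellings such as "A.I." will be treated as sentence ends.

```python
import re


def split_sentences(script: str) -> list[str]:
    """Split after ., ! or ? (including a trailing ellipsis) followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [part for part in parts if part]


def long_sentences(script: str, max_words: int = 15) -> list[str]:
    """Return the sentences that contain more than max_words words."""
    return [s for s in split_sentences(script) if len(s.split()) > max_words]
```

Running `long_sentences(draft)` before submitting a script gives a list of sentences worth breaking up.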

1.2 Avatar/Persona Selection Framework

Choosing the right avatar is as important as the script. Match demographics to your target audience.

Trust Signals by Vertical

Vertical	Ideal Avatar Profile	Why
Healthcare/Supplements	30-50, professional appearance	Credibility and trust
Beauty/Skincare	20-35, relatable, well-groomed	Peer recommendation effect
Tech/SaaS	25-40, casual-professional	Approachable expertise
Finance/Insurance	35-55, suited, authoritative	Trust and stability
Fitness	25-35, athletic, energetic	Aspirational but attainable
Food/Beverage	25-45, warm, approachable	Relatable lifestyle
Education	30-50, friendly, professional	Authority without intimidation
DTC/E-commerce	20-30, casual, authentic	UGC/peer recommendation

Diversity Considerations

Test multiple demographics: different audiences respond to different presenters
Match market: Use local-looking avatars for geo-targeted campaigns
A/B test gender: Some products convert better with male or female presenters (test, don't assume)
Age alignment: Your avatar should look like your customer or like someone your customer trusts

1.3 Multi-Scene Composition Guide

Multi-scene videos feel more dynamic and retain attention better than single-shot talking heads.

When to Switch Scenes

Every 5-8 seconds for fast-paced content (TikTok/Reels)
Every 8-15 seconds for medium-paced (YouTube, Feed ads)
At every major transition point (problem → solution, feature → feature)

Scene Transition Best Practices

Cut on speech breaks — switch scenes at natural pauses
Alternate speaker/background — change avatar, background, or both
Use B-roll inserts — product shots between talking segments
Progress the story — each scene should advance the narrative
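
Cutting on speech breaks within a target scene length can be approximated by grouping whole sentences. The sketch below is one possible approach, not a feature of any particular tool: `plan_scenes` is a hypothetical helper, and the default speaking rate of 2.5 words per second and the 8-second cap are assumptions that should be tuned to the tone and platform.

```python
import re


def plan_scenes(script: str, words_per_second: float = 2.5,
                max_scene_seconds: float = 8.0) -> list[str]:
    """Group sentences into scenes, cutting only at sentence boundaries."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    scenes, current, current_words = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new scene if this sentence would push the current one past the cap.
        if current and (current_words + words) / words_per_second > max_scene_seconds:
            scenes.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += words
    if current:
        scenes.append(" ".join(current))
    return scenes
```

A single sentence longer than the cap still becomes its own scene, since the function never cuts mid-sentence.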

Recommended Scene Structures

2-Scene (15s):

Scene 1: Hook + Problem (avatar talking, neutral background)
Scene 2: Solution + CTA (avatar talking, product-relevant background)

3-Scene (30s):

Scene 1: Hook + Problem (avatar A, office background)
Scene 2: Solution + Features (avatar A, product demo background)
Scene 3: Social Proof + CTA (avatar A or B, branded background)

5-Scene (60s):

Scene 1: Hook (avatar, eye-catching background)
Scene 2: Problem deep-dive (avatar, relatable setting)
Scene 3: Product introduction (product B-roll or demo)
Scene 4: Features + Proof (avatar with data/reviews overlay)
Scene 5: CTA (avatar, clean branded background)

1.4 Audio & Voice Best Practices

Emotion Modulation

Excitement: Slightly faster pace, higher energy, emphasis on benefit words
Empathy: Slower pace, softer tone, pause after pain points
Authority: Measured pace, confident tone, declarative sentences
Urgency: Fast pace, short sentences, emphasis on time/scarcity words

Pronunciation Guidance

For product names, brand names, or technical terms:

Write phonetically in the script: "Creatify" → "cree-ATE-ih-fy"
Use periods to force letter-by-letter reading: "A.I." is read as "A I", not "ai"
Spell out numbers: write "fifteen percent", not "15%"
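
These three rules can be applied as a preprocessing pass before a script is sent for speech generation. The following is a minimal sketch under stated assumptions: `prepare_for_tts` and `number_to_words` are hypothetical helpers, only whole percentages from 0 to 100 are expanded, and how a given voice actually reads the dotted or phonetic spellings should be verified by listening.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 100."""
    if not 0 <= n <= 100:
        raise ValueError("only 0-100 is supported in this sketch")
    if n == 100:
        return "one hundred"
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def prepare_for_tts(script: str, phonetic: dict[str, str], acronyms: set[str]) -> str:
    """Apply phonetic spellings, dot acronyms, and expand whole percentages."""
    for word, spoken in phonetic.items():
        script = re.sub(rf"\b{re.escape(word)}\b", lambda _m, s=spoken: s, script)
    for acronym in sorted(acronyms):
        dotted = ".".join(acronym) + "."
        # An existing trailing period is absorbed so "AI." does not become "A.I..".
        script = re.sub(rf"\b{re.escape(acronym)}\b\.?", lambda _m, d=dotted: d, script)
    # Skip decimals such as "1.5%" and values above 100.
    return re.sub(r"(?<![\d.])(100|\d{1,2})%",
                  lambda m: f"{number_to_words(int(m.group(1)))} percent", script)
```

Keeping the phonetic map and acronym set in one place makes it easier to reuse the same pronunciations across every script in a campaign.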

Pause Placement

After the hook (let it sink in)
Before the CTA (build anticipation)
After stating a surprising stat
Between major sections

Accent Selection by Market

US Market: Standard American English
UK Market: British RP or a light regional accent
Global/Neutral: Standard American English (widely understood)
Local campaigns: Match regional accent to target geography

1.5 Green Screen / Transparent Background Techniques

AI avatars with transparent backgrounds can be overlaid on:

Use Case	Application
Website widgets	Avatar explaining features on your landing page
Product demos	Avatar narrating over screen recordings
Email thumbnails	Avatar thumbnail that links to full video
Presentations	Avatar presenter in corner of slides
Social ads	Avatar over product imagery or B-roll

Best Practices for Transparent Overlays

Use 9:16 format for mobile overlays, 1:1 for square placements
Position avatar in lower-third or right side (don't block main content)
Keep gestures contained — wide arm movements may clip at edges
Match avatar lighting to background lighting direction
Use WebM format for transparency (H.264 MP4 output has no alpha channel)
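
Placement in the lower third or right side comes down to a pixel offset for the overlay. The sketch below computes that offset; `overlay_position` and `overlay_filter` are hypothetical helpers, the 40-pixel margin is an arbitrary default, and the `overlay=x:y` string assumes the compositing is done with ffmpeg's overlay filter, which is only one of several possible tools.

```python
def overlay_position(frame_w: int, frame_h: int, avatar_w: int, avatar_h: int,
                     anchor: str = "bottom-right", margin: int = 40) -> tuple[int, int]:
    """Return the top-left (x, y) pixel offset for placing the avatar on the frame."""
    if avatar_w + 2 * margin > frame_w or avatar_h + 2 * margin > frame_h:
        raise ValueError("avatar plus margins does not fit inside the frame")
    y = frame_h - avatar_h - margin  # sit the avatar at the bottom of the frame
    if anchor == "bottom-right":
        x = frame_w - avatar_w - margin
    elif anchor == "bottom-left":
        x = margin
    elif anchor == "bottom-center":
        x = (frame_w - avatar_w) // 2
    else:
        raise ValueError(f"unknown anchor: {anchor}")
    return x, y


def overlay_filter(x: int, y: int) -> str:
    """Format the offsets as an ffmpeg overlay filter expression."""
    return f"overlay={x}:{y}"
```

Scaling the avatar so that its gestures stay inside the frame should happen before this step, since the function only positions a box of a known size.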

1.6 UGC-Style Avatar Strategy

Making AI avatars feel like authentic user-generated content:

Visual Authenticity Cues

Casual backgrounds: Living room, kitchen, outdoors (not a studio)
Natural lighting: Slightly warm, not perfectly even
Casual framing: Slightly off-center, phone-selfie angle
Minimal branding: No logos in the first 3 seconds

Script Tone for UGC

First person: "I've been using this for 2 weeks and..."
Imperfect language: "Honestly? I was skeptical at first"
Specific details: "I ordered the blue one on Tuesday" (not "I purchased the product")
Emotional reactions: "I was literally shook when I saw the results"
Conversational asides: "Okay but wait, it gets even better"

UGC Avatar Selection

Choose avatars that look 20-35, casually dressed
Avoid "too polished" presenters — slightly imperfect = more authentic
Match the avatar to your customer demographic
Test multiple avatars — UGC performance varies wildly by presenter

Part 2: API Automation

Automate avatar video production at scale.

2.1 Setup & Authentication

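import os

import requests

# Read credentials from the environment; the placeholders are used only if unset.
CREATIFY_API_ID = os.environ.get("CREATIFY_API_ID", "your-api-id")
CREATIFY_API_KEY = os.environ.get("CREATIFY_API_KEY", "your-api-key")

HEADERS = {
    "Content-Type": "application/json",
    "X-API-ID": CREATIFY_API_ID,
    "X-API-KEY": CREATIFY_API_KEY,
}
BASE_URL = "https://api.creatify.ai/api"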

Don't have an API key yet? You can get one in under 2 minutes:

1. Sign up free at creatify.ai
2. Go to Settings → API
3. Copy your API ID and API Key. New accounts get free credits to start.

import time

def poll_until_done(url, headers, max_wait=600, interval=10):
    """Poll a status endpoint until the job completes."""
    deadline = time.monotonic() + max_wait  # wall-clock budget, includes request time
    while time.monotonic() < deadline:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        data = resp.json()
        if data.get("status") == "done":
            return data
        if data.get("status") in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        time.sleep(interval)
    raise TimeoutError(f"Job did not complete within {max_wait}s")
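
A fixed polling interval is simple, but a growing delay cuts down on requests for long jobs. The following is a sketch of the same polling idea with exponential backoff. `poll_with_backoff` is a hypothetical helper that takes any zero-argument `fetch` callable returning the status dict, so it can be tried without network access; the status values `done`, `failed`, and `error` mirror the ones used in this guide, and the delay defaults are arbitrary.

```python
import time
from typing import Callable


def poll_with_backoff(fetch: Callable[[], dict], max_wait: float = 600.0,
                      first_delay: float = 2.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the job is done, doubling the delay after each attempt."""
    waited, delay = 0.0, first_delay
    while True:
        data = fetch()
        status = data.get("status")
        if status == "done":
            return data
        if status in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        if waited >= max_wait:
            raise TimeoutError(f"Job did not complete within {max_wait}s")
        sleep(delay)
        waited += delay  # counts sleep time only, not time spent inside fetch()
        delay = min(delay * 2, max_delay)
```

With the helpers in this guide, a call might look like `poll_with_backoff(lambda: check_avatar_status(video_id))`.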

2.2 AI Avatar v1 (Single Scene)

Generate a video of a single avatar speaking from text. Simple, fast, great for short content.

Cost: 5 credits per 30 seconds

List Available Personas

def list_personas():
    """Get all 1,500+ available avatar personas."""
    resp = requests.get(f"{BASE_URL}/personas/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()  # Each has: id, name, gender, thumbnail, etc.
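
With a large persona catalog, filtering client-side is often the quickest way to build a shortlist. The sketch below assumes the response is a list of dicts with the `name` and `gender` fields mentioned in the comment above; the exact `gender` values (shown here as `"m"` and `"f"`) are an assumption, and `filter_personas` is a hypothetical helper.

```python
from typing import Optional


def filter_personas(personas: list[dict], gender: Optional[str] = None,
                    name_contains: Optional[str] = None) -> list[dict]:
    """Filter a persona list client-side by gender and/or a name substring."""
    matches = []
    for persona in personas:
        if gender is not None and persona.get("gender") != gender:
            continue
        name = str(persona.get("name") or "")
        if name_contains is not None and name_contains.lower() not in name.lower():
            continue
        matches.append(persona)
    return matches
```

For example, `filter_personas(list_personas(), gender="f")` would narrow the catalog before picking candidates for an A/B test.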

Create Avatar Video

def create_avatar_video(text, creator_id, aspect_ratio="9:16", model_version="aurora_v1_fast"):
    """Generate a single-scene avatar video from text."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "model_version": model_version,
    })
    resp.raise_for_status()
    return resp.json()

def check_avatar_status(lipsync_id):
    """Check avatar video generation status."""
    resp = requests.get(f"{BASE_URL}/lipsyncs/{lipsync_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

Create Transparent Background Avatar

def create_transparent_avatar(text, creator_id, aspect_ratio="9:16"):
    """Generate avatar with transparent background (WebM format)."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "transparent_background": True,
    })
    resp.raise_for_status()
    return resp.json()

2.3 AI Avatar v2 (Multi-Scene)

Create multi-scene videos with different avatars, voices, backgrounds, and CTAs per scene.

Cost: 5 credits per 30 seconds

def create_multi_scene_video(scenes, aspect_ratio="9:16", webhook_url=None):
    """Create a multi-scene avatar video.

    scenes: list of dicts, each with:
        - text (str): Script for this scene
        - creator (str): Avatar persona ID
        - voice_id (str, optional): Override voice
        - background (str, optional): Background image/video URL
    """
    payload = {
        "scenes": scenes,
        "aspect_ratio": aspect_ratio,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/lipsyncs_v2/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

# Example: 3-scene product ad
scenes = [
    {
        "text": "Stop what you're doing. I need to tell you about something.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "This serum literally transformed my skin in two weeks. No exaggeration.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "Link is in my bio. Trust me, your future self will thank you.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
]

video = create_multi_scene_video(scenes, aspect_ratio="9:16")
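
Since each request costs credits, it can help to validate the scene list locally before submitting it. The following sketch checks only the two required keys named in the docstring above (`text` and `creator`); `build_scenes` and `validate_scenes` are hypothetical helpers, and the API may enforce further rules that are not modeled here.

```python
def build_scenes(texts: list[str], creator_id: str) -> list[dict]:
    """Build one scene dict per script fragment, all using the same avatar."""
    return [{"text": text, "creator": creator_id} for text in texts]


def validate_scenes(scenes: list[dict]) -> list[str]:
    """Return a list of problems found in a scenes payload (empty if none)."""
    problems = []
    if not scenes:
        problems.append("scenes list is empty")
    for index, scene in enumerate(scenes, start=1):
        for key in ("text", "creator"):
            value = scene.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"scene {index}: missing or empty '{key}'")
    return problems
```

Calling `validate_scenes(scenes)` and aborting on a non-empty result catches empty fragments before they reach the API.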

2.4 Aurora (Image + Audio → Video)

Generate studio-grade avatar videos from a reference image and audio file. Best-in-class lip sync.

Cost: 5 credits per 30 seconds

def create_aurora_video(image_url, audio_url, model_version="aurora_v1_fast", webhook_url=None):
    """Generate a studio-grade avatar video from image + audio."""
    payload = {
        "image": image_url,
        "audio": audio_url,
        "model_version": model_version,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/aurora/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_aurora_status(aurora_id):
    """Check Aurora generation status."""
    resp = requests.get(f"{BASE_URL}/aurora/{aurora_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.5 Text to Speech

Convert scripts into studio-quality voiceover audio.

Cost: 1 credit per 30 seconds

def list_voices():
    """List all available TTS voices and accents."""
    resp = requests.get(f"{BASE_URL}/voices/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def generate_tts(script, accent_id, webhook_url=None):
    """Generate voiceover audio from a script."""
    payload = {
        "script": script,
        "accent": accent_id,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/text_to_speech/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_tts_status(tts_id):
    """Check TTS generation status."""
    resp = requests.get(f"{BASE_URL}/text_to_speech/{tts_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.6 Voice Cloning

Clone a custom voice for brand consistency.

def clone_voice(audio_url, name):
    """Clone a voice from an audio sample."""
    resp = requests.post(f"{BASE_URL}/voices/clone/", headers=HEADERS, json={
        "audio_url": audio_url,
        "name": name,
    })
    resp.raise_for_status()
    return resp.json()

2.7 Custom Avatars (BYOA)

Upload your own video to create a custom avatar persona.

Note: Custom avatar creation takes 1-2 days for processing/approval.

def create_custom_avatar(lipsync_video_url, name, gender="m", scene="office"):
    """Create a custom avatar from your own video."""
    resp = requests.post(f"{BASE_URL}/personas/", headers=HEADERS, json={
        "lipsync_input": lipsync_video_url,
        "creator_name": name,
        "gender": gender,
        "video_scene": scene,
    })
    resp.raise_for_status()
    return resp.json()

def check_custom_avatar_status(persona_id):
    """Check custom avatar creation status."""
    resp = requests.get(f"{BASE_URL}/personas/{persona_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.8 Recipes

Recipe: TTS → Aurora Pipeline

Generate audio first, then pair with any image for avatar video.

def tts_to_aurora(script, accent_id, image_url):
    """Pipeline: script → audio → avatar video."""
    # Step 1: Generate audio
    tts = generate_tts(script, accent_id)
    tts_result = poll_until_done(
        f"{BASE_URL}/text_to_speech/{tts['id']}/", HEADERS, max_wait=120
    )
    audio_url = tts_result["output"]

    # Step 2: Generate Aurora video
    aurora = create_aurora_video(image_url, audio_url)
    aurora_result = poll_until_done(
        f"{BASE_URL}/aurora/{aurora['id']}/", HEADERS, max_wait=600
    )

    return aurora_result

Recipe: Batch Avatar A/B Test

Test multiple avatars with the same script to find the best performer.

def batch_avatar_ab_test(script, creator_ids, aspect_ratio="9:16"):
    """Generate the same script with multiple avatars for A/B testing."""
    jobs = []
    for creator_id in creator_ids:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"creator_id": creator_id, "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "creator_id": job["creator_id"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done"
            })
        except Exception as e:
            results.append({
                "creator_id": job["creator_id"],
                "error": str(e),
                "status": "failed"
            })

    return results

Recipe: Multi-Script Avatar Batch

Generate multiple scripts with the same avatar for hook testing.

def multi_script_batch(scripts, creator_id, aspect_ratio="9:16"):
    """Generate multiple scripts with the same avatar."""
    jobs = []
    for script in scripts:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"script": script[:50], "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "script_preview": job["script"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done"
            })
        except Exception as e:
            results.append({
                "script_preview": job["script"],
                "error": str(e),
                "status": "failed"
            })

    return results
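
Both batch recipes submit every job before polling, so the server-side work already overlaps. Polling one job at a time, however, means each job's timeout only starts once the previous job has finished. The sketch below polls all jobs in parallel with a thread pool so that every timeout runs from the same starting point. `poll_jobs_concurrently` is a hypothetical helper, and `poll` is any callable that takes a video ID and returns the finished job dict.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def poll_jobs_concurrently(jobs: list[dict], poll: Callable[[str], dict],
                           max_workers: int = 5) -> list[dict]:
    """Poll every job in parallel and return one result dict per job, in order."""

    def run(job: dict) -> dict:
        try:
            result = poll(job["video_id"])
            url = result.get("output") or result.get("video_output")
            return {**job, "video_url": url, "status": "done"}
        except Exception as exc:  # keep one failure from aborting the batch
            return {**job, "error": str(exc), "status": "failed"}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, jobs))
```

With the helpers in this guide, `poll` could be `lambda vid: poll_until_done(f"{BASE_URL}/lipsyncs/{vid}/", HEADERS)`.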

2.9 Credit Costs & Latency Reference

Endpoint	Credits	Typical Latency
AI Avatar v1	5 per 30s	~10s of processing per 1s of video (15s video ≈ 150s)
AI Avatar v2 (multi-scene)	5 per 30s	~2-5 minutes
Aurora	5 per 30s	~2-3 minutes
Text to Speech	1 per 30s	~30-60 seconds
Voice Cloning	Varies	Minutes
Custom Avatar Creation	Free (slot required)	1-2 days
Preview (v1 or v2)	1 per 30s	~1-2 minutes
Render (v2)	4 per 30s	~2-3 minutes
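
The table can be used for a rough budget estimate before launching a batch. This is a sketch under one explicit assumption: that usage is billed in whole 30-second blocks, rounded up. The table does not state how partial blocks are charged, so actual billing should be checked against the account. `CREDITS_PER_30S`, `estimate_credits`, and `estimate_v1_latency_seconds` are hypothetical names.

```python
import math

# Credits per 30 seconds, copied from the table.
CREDITS_PER_30S = {
    "avatar_v1": 5,
    "avatar_v2": 5,
    "aurora": 5,
    "text_to_speech": 1,
    "preview": 1,
    "render_v2": 4,
}


def estimate_credits(endpoint: str, seconds: float, videos: int = 1) -> int:
    """Estimate credits, ASSUMING billing rounds up to whole 30-second blocks."""
    blocks = math.ceil(seconds / 30)
    return CREDITS_PER_30S[endpoint] * blocks * videos


def estimate_v1_latency_seconds(video_seconds: float) -> float:
    """Apply the table's rough 1:10 ratio for AI Avatar v1 processing time."""
    return video_seconds * 10
```

For instance, `estimate_credits("avatar_v1", 30, videos=5)` gives the 25 credits expected for a five-avatar A/B test of one 30-second script, and `estimate_v1_latency_seconds(15)` gives a starting point for choosing `max_wait`.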

2.10 Decision Matrix

I want to...	Use this	Credits
Quick single-avatar video	AI Avatar v1	5/30s
Multi-scene video with transitions	AI Avatar v2	5/30s
Best possible lip sync quality	Aurora	5/30s
Just generate audio narration	Text to Speech	1/30s
Use my own face/person	Custom Avatar	Free (slot)
Use my own voice	Voice Cloning	Varies
Avatar over custom background	Transparent + overlay	5/30s
A/B test 5 avatar styles	Batch Avatar v1 x5	25/30s
