AI Avatar Video Generation Tool - Talking Heads and Lipsync, with Multilingual Dubbing

ai-avatar-video by inferen-sh/skills

7,300 weekly installs

228 GitHub Stars

GitHub

Install command

npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video

AI/Machine Learning, Marketing, Audio Processing


AI Avatar & Talking Head Videos

Create AI avatars and talking-head videos with the inference.sh CLI.


Quick Start

Requires the inference.sh CLI (infsh); see the install instructions.

infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
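The Quick Start payload is wrapped in single quotes, so shell variables inside it are not expanded. The sketch below builds the same `image_url`/`audio_url` payload from variables instead. The helper names `json_escape` and `avatar_input` are hypothetical, and the escaping covers only backslashes and double quotes, assuming URLs without control characters.

```shell
# Hypothetical helper: escape backslashes and double quotes so a value can sit
# inside a JSON string. Control characters (newlines, tabs) are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the Quick Start --input payload from two values.
avatar_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

For example: `infsh app run bytedance/omnihuman-1-5 --input "$(avatar_input "$IMAGE_URL" "$AUDIO_URL")"`. The double quotes around the command substitution keep the payload a single argument.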

Available Models

Model	App ID	Best For
OmniHuman 1.5	`bytedance/omnihuman-1-5`	Multi-character, best quality
OmniHuman 1.0	`bytedance/omnihuman-1-0`	Single character
Fabric 1.0	`falai/fabric-1-0`	Image talks with lipsync
PixVerse Lipsync	`falai/pixverse-lipsync`	Highly realistic
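When scripting, the table can be encoded as a small lookup. Only the app IDs come from the table. The keywords `multi`, `single`, `image` and `realistic`, and the function name `pick_avatar_app`, are illustrative choices made here and are not CLI options.

```shell
# Hypothetical lookup: map a need from the Available Models table to its app ID.
pick_avatar_app() {
  case "$1" in
    multi)     echo "bytedance/omnihuman-1-5" ;;  # Multi-character, best quality
    single)    echo "bytedance/omnihuman-1-0" ;;  # Single character
    image)     echo "falai/fabric-1-0" ;;         # Image talks with lipsync
    realistic) echo "falai/pixverse-lipsync" ;;   # Highly realistic
    *) echo "unknown need: $1" >&2; return 1 ;;
  esac
}
```

For example: `infsh app run "$(pick_avatar_app multi)" --input '...'`.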

Search Avatar Apps

infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
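A loop can run the three searches together. `search_avatar_apps` and the `INFSH` override variable are hypothetical conveniences added for illustration. Setting `INFSH` to a stub allows a dry run; by default the loop calls `infsh`.

```shell
# Hypothetical wrapper: run each avatar search from the list above in turn.
# INFSH is an illustrative override (e.g. a stub for dry runs); defaults to infsh.
search_avatar_apps() {
  for term in omnihuman lipsync fabric; do
    "${INFSH:-infsh}" app list --search "$term" || return 1
  done
}
```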

Examples

OmniHuman 1.5 (Multi-Character)

infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'

PixVerse Lipsync

infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Generates highly realistic lipsync from any audio.
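In the examples above, OmniHuman 1.5, Fabric 1.0 and PixVerse Lipsync all take the same `image_url`/`audio_url` input, so one portrait and audio track can be sent to all three for comparison. This is a sketch with several assumptions. `compare_avatar_apps` and `INFSH` are hypothetical. Each result is saved to a JSON file named after the app, which is a convention chosen here. The URLs are inserted without JSON escaping.

```shell
# Hypothetical helper: run the same portrait + audio through each image-driven app
# and save every result as <app_id with / replaced by _>.json in the current directory.
compare_avatar_apps() {
  image_url=$1
  audio_url=$2
  for app in bytedance/omnihuman-1-5 falai/fabric-1-0 falai/pixverse-lipsync; do
    out_file="$(printf '%s' "$app" | tr '/' '_').json"
    "${INFSH:-infsh}" app run "$app" --input \
      "{\"image_url\": \"$image_url\", \"audio_url\": \"$audio_url\"}" > "$out_file" || return 1
    echo "saved $out_file"
  done
}
```

For example: `compare_avatar_apps https://portrait.jpg https://speech.mp3`.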

Full Workflow: TTS + Avatar

# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
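Step 2 needs the audio URL produced in step 1. The output format of `infsh app run` is not described here, so the sketch below relies on no field names. It assumes only that `speech.json` contains the generated audio URL as a plain `https://` string, and it takes the first match. `first_url` is a hypothetical helper. If the result contains several URLs, inspect the file and pick the right field instead.

```shell
# Hypothetical helper: print the first https:// URL found in a saved result file.
# Assumes the URL appears as an unescaped JSON string value (no "\/" escaping).
first_url() {
  grep -o 'https://[^"]*' "$1" | head -n 1
}
```

For example, set `AUDIO_URL=$(first_url speech.json)` and pass it to step 2 in a double-quoted payload such as `--input "{\"image_url\": \"https://presenter-photo.jpg\", \"audio_url\": \"$AUDIO_URL\"}"`.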

Full Workflow: Dub Video in Another Language

# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
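The dubbing workflow pauses at step 2 for translation, so it maps naturally onto separate shell functions with a manual step in between. The following names are hypothetical: `dub_transcribe`, `dub_speak`, `dub_sync`, `text_file_to_json_string` and the `INFSH` override. The translated text is assumed to be in a plain text file, and its newlines are flattened to spaces. The new audio URL for step 4 must still be read from `new_speech.json` by hand, because its schema is not documented here.

```shell
# Hypothetical helper: turn a text file into one JSON-safe string.
# Newlines, tabs and carriage returns become spaces; backslashes and double
# quotes are escaped; trailing spaces are trimmed. Other control chars are not handled.
text_file_to_json_string() {
  tr '\n\t\r' '   ' < "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/ *$//'
}

# Step 1: transcribe the original video (INFSH defaults to infsh).
dub_transcribe() {
  "${INFSH:-infsh}" app run infsh/fast-whisper-large-v3 \
    --input "{\"audio_url\": \"$1\"}" > transcript.json
}

# Step 2 is manual: translate transcript.json into a text file, e.g. translated.txt.

# Step 3: synthesize the translated text ($1 = path to the text file).
dub_speak() {
  "${INFSH:-infsh}" app run infsh/kokoro-tts \
    --input "{\"text\": \"$(text_file_to_json_string "$1")\"}" > new_speech.json
}

# Step 4: lipsync the original video ($1) to the new audio URL ($2).
dub_sync() {
  "${INFSH:-infsh}" app run infsh/latentsync-1-6 \
    --input "{\"video_url\": \"$1\", \"audio_url\": \"$2\"}"
}
```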

Use Cases

Marketing: product demos with an AI presenter
Education: course videos and explainers
Localization: dubbing content into multiple languages
Social Media: a consistent virtual influencer
Corporate: training videos and announcements

Tips

Use high-quality portrait photos (front-facing, good lighting)
Audio should be clear with minimal background noise
OmniHuman 1.5 supports multiple people in one image
LatentSync is best for syncing existing videos to new audio
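The last two tips suggest a simple routing rule: a still portrait goes to OmniHuman 1.5, and an existing video goes to LatentSync. The sketch below applies that rule by file extension, which is a simplifying assumption. Extensions are matched case-sensitively, and URLs without a recognized extension are rejected. `lipsync_from` and `INFSH` are hypothetical, and the URLs are not JSON-escaped.

```shell
# Hypothetical router following the tips: video -> LatentSync, image -> OmniHuman 1.5.
# Matching is by (lowercase) file extension only.
lipsync_from() {
  src=$1
  audio=$2
  case "$src" in
    *.mp4|*.mov|*.webm)
      "${INFSH:-infsh}" app run infsh/latentsync-1-6 \
        --input "{\"video_url\": \"$src\", \"audio_url\": \"$audio\"}" ;;
    *.jpg|*.jpeg|*.png)
      "${INFSH:-infsh}" app run bytedance/omnihuman-1-5 \
        --input "{\"image_url\": \"$src\", \"audio_url\": \"$audio\"}" ;;
    *)
      echo "unsupported source (expected a video or image URL): $src" >&2
      return 1 ;;
  esac
}
```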

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
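The related skills can also be installed in one loop. This is a convenience sketch. `install_related_skills` and the `NPX` override are hypothetical, and the loop stops at the first failed install.

```shell
# Hypothetical wrapper: install each related skill listed above, stopping on the first failure.
# NPX is an illustrative override (e.g. a stub for a dry run); it defaults to npx.
install_related_skills() {
  for skill in infsh-cli text-to-speech speech-to-text ai-video-generation ai-image-generation; do
    "${NPX:-npx}" skills add "inference-sh/skills@$skill" || return 1
  done
}
```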

Browse all video apps: infsh app list --category video

Documentation

Running Apps - How to run apps via CLI
Content Pipeline Example - Building media workflows
Streaming Results - Real-time progress updates


First Seen

Mar 12, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn

Installed on

claude-code: 5.8K
gemini-cli: 5.3K
codex: 5.2K
opencode: 5.2K
amp: 5.2K
github-copilot: 5.2K
