gemini-live-api-dev by google-gemini/gemini-skills
```shell
npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-live-api-dev
```
The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
Key capabilities:

- Bidirectional streaming of audio, video, and text over a single WebSocket session
- Native audio output with affective dialog and proactive audio
- Input and output transcription
- Interruption handling (the model stops when the user barges in)
> [!NOTE]
> The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.
Models:

- `gemini-2.5-flash-native-audio-preview-12-2025` — Native audio output, affective dialog, proactive audio, thinking. 128k context window. This is the recommended model for all Live API use cases.

> [!WARNING]
> The following Live API models are deprecated and will be shut down. Migrate to `gemini-2.5-flash-native-audio-preview-12-2025`.

- `gemini-live-2.5-flash-preview` — Released June 17, 2025. Shutdown: December 9, 2025.
- `gemini-2.0-flash-live-001` — Released April 9, 2025. Shutdown: December 9, 2025.
SDKs:

- Python: `google-genai` — `pip install google-genai`
- JavaScript/TypeScript: `@google/genai` — `npm install @google/genai`

> [!WARNING]
> The legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Use the new SDKs above.
To streamline real-time audio/video app development, use a third-party integration that supports the Gemini Live API over WebRTC or WebSockets:
Real-time audio input uses raw PCM at 16 kHz (`audio/pcm;rate=16000`).

> [!IMPORTANT]
> Use `send_realtime_input`/`sendRealtimeInput` for all real-time user input (audio, video, and text). Use `send_client_content`/`sendClientContent` only for incremental conversation history updates (appending prior turns to context), not for sending new user messages.

> [!WARNING]
> Do not use `media` in `sendRealtimeInput`. Use the specific keys: `audio` for audio data, `video` for images/video frames, and `text` for text input.
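Most capture libraries deliver mono float samples in [-1.0, 1.0], while the wire format above is raw little-endian 16-bit PCM. A minimal sketch of that conversion; `floats_to_pcm16` is a hypothetical helper, not part of the SDK:

```python
import struct

def floats_to_pcm16(samples):
    """Convert mono float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)

# One second of silence at 16 kHz is 16000 samples -> 32000 bytes.
chunk = floats_to_pcm16([0.0] * 16000)
```

The resulting bytes can be passed directly as the `data` of the audio `Blob` shown below.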
```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
```
```javascript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });
```
```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    system_instruction=types.Content(
        parts=[types.Part(text="You are a helpful assistant.")]
    )
)

async with client.aio.live.connect(
    model="gemini-2.5-flash-native-audio-preview-12-2025", config=config
) as session:
    pass  # Session is now active
```
```javascript
const session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  config: {
    responseModalities: ['audio'],
    systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] }
  },
  callbacks: {
    onopen: () => console.log('Connected'),
    onmessage: (response) => console.log('Message:', response),
    onerror: (error) => console.error('Error:', error),
    onclose: () => console.log('Closed')
  }
});
```
```python
await session.send_realtime_input(text="Hello, how are you?")
```

```javascript
session.sendRealtimeInput({ text: 'Hello, how are you?' });
```
```python
# chunk: raw 16-bit PCM bytes at 16 kHz
await session.send_realtime_input(
    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
)
```

```javascript
session.sendRealtimeInput({
  audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' }
});
```
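The `chunk` variable in the snippets above is assumed to be a short slice of raw PCM bytes. A sketch of slicing a captured buffer into fixed-duration chunks for streaming; the 50 ms chunk size and the helper name are illustrative, not from the SDK:

```python
CHUNK_MS = 50                # assumed chunk duration; tune for latency
SAMPLE_RATE = 16000          # Live API input rate
BYTES_PER_SAMPLE = 2         # 16-bit PCM
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000

def iter_pcm_chunks(pcm: bytes):
    """Yield fixed-size chunks of a raw PCM byte stream, preserving sample alignment."""
    for offset in range(0, len(pcm), CHUNK_BYTES):
        yield pcm[offset:offset + CHUNK_BYTES]
```

Because `CHUNK_BYTES` is a multiple of the sample width, chunk boundaries never split a 16-bit sample.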
```python
# frame: raw JPEG-encoded bytes
await session.send_realtime_input(
    video=types.Blob(data=frame, mime_type="image/jpeg")
)
```

```javascript
session.sendRealtimeInput({
  video: { data: frame.toString('base64'), mimeType: 'image/jpeg' }
});
```
```python
async for response in session.receive():
    content = response.server_content
    if content:
        # Audio
        if content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    audio_data = part.inline_data.data
        # Transcription
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")
        # Interruption
        if content.interrupted is True:
            pass  # Stop playback, clear audio queue
```
```javascript
// Inside the onmessage callback
const content = response.serverContent;
if (content?.modelTurn?.parts) {
  for (const part of content.modelTurn.parts) {
    if (part.inlineData) {
      const audioData = part.inlineData.data; // Base64 encoded
    }
  }
}
if (content?.inputTranscription) console.log('User:', content.inputTranscription.text);
if (content?.outputTranscription) console.log('Gemini:', content.outputTranscription.text);
if (content?.interrupted) { /* Stop playback, clear audio queue */ }
```
Key rules:

- `response_modalities` supports TEXT or AUDIO per session, not both.
- Use `send_realtime_input` for all real-time user input (audio, video, text). Reserve `send_client_content` only for injecting conversation history.
- Send `audioStreamEnd` when the mic is paused to flush cached audio.

For detailed API documentation, fetch from the official docs index:
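These rules imply some client-side bookkeeping: transcription text arrives in fragments, and an `interrupted` signal means any queued-but-unplayed audio should be discarded. A minimal sketch operating on plain dicts shaped like the JS server messages above; `LiveSessionState` is illustrative, not an SDK class:

```python
class LiveSessionState:
    """Accumulates transcripts and manages the playback queue across server messages."""

    def __init__(self):
        self.user_text = ""
        self.model_text = ""
        self.playback_queue = []   # audio chunks received but not yet played

    def on_server_content(self, content):
        # Transcriptions arrive as incremental fragments; append them.
        if content.get("inputTranscription"):
            self.user_text += content["inputTranscription"]["text"]
        if content.get("outputTranscription"):
            self.model_text += content["outputTranscription"]["text"]
        # Queue any audio parts for playback.
        for part in (content.get("modelTurn") or {}).get("parts", []):
            if "inlineData" in part:
                self.playback_queue.append(part["inlineData"]["data"])
        # User barged in: drop audio the model produced but we haven't played.
        if content.get("interrupted"):
            self.playback_queue.clear()
```

The same shape works for the Python SDK's attribute-style objects by swapping `content.get(...)` for attribute access.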
`llms.txt` URL: https://ai.google.dev/gemini-api/docs/llms.txt
This index contains links to all documentation pages in `.md.txt` format. Use web fetch tools to:

- Fetch `llms.txt` to discover available documentation pages
- Fetch individual pages (e.g. https://ai.google.dev/gemini-api/docs/live-session.md.txt)

> [!IMPORTANT]
> These are not all the documentation pages. Use the `llms.txt` index to discover the full set of available pages.
The Live API supports more than 70 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Hindi, Arabic, and Russian. Native audio models automatically detect and switch languages.
- Weekly Installs: 657
- GitHub Stars: 2.3K
- First Seen: Mar 3, 2026
- Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
- Installed on: gemini-cli (607), codex (600), cursor (599), opencode (597), kimi-cli (596), github-copilot (596)