Google Gemini API Complete Guide: Latest SDK Migration, Model Comparison, and Hands-On Tutorial (2025 Edition)

google-gemini-api by jezweb/claude-skills

Install command

npx skills add https://github.com/jezweb/claude-skills --skill google-gemini-api

Google Gemini API - Complete Guide

Version : 3.0.0 (14 Known Issues Added)
Package : @google/genai@1.35.0 (⚠️ NOT @google/generative-ai)
Last Updated : 2026-01-21

⚠️ CRITICAL SDK MIGRATION WARNING

DEPRECATED SDK : @google/generative-ai (sunset November 30, 2025)
CURRENT SDK : @google/genai v1.27+

If you see code using @google/generative-ai, it's outdated!

This skill uses the correct current SDK and provides a complete migration guide.

Status

✅ Phase 1 Complete :

✅ Text Generation (basic + streaming)
✅ Multimodal Inputs (images, video, audio, PDFs)
✅ Function Calling (basic + parallel execution)
✅ System Instructions & Multi-turn Chat
✅ Thinking Mode Configuration
✅ Generation Parameters (temperature, top-p, top-k, stop sequences)
✅ Both Node.js SDK (@google/genai) and fetch approaches

✅ Phase 2 Complete :

✅ Context Caching (cost optimization with TTL-based caching)
✅ Code Execution (built-in Python interpreter and sandbox)
✅ Grounding with Google Search (real-time web information + citations)

📦 Separate Skills :

Embeddings : See google-gemini-embeddings skill for text-embedding-004

Phase 1 - Core Features :

1. Quick Start
2. Current Models (2025)
3. SDK vs Fetch Approaches
4. Text Generation
5. Streaming
6. Multimodal Inputs
7. Function Calling
8. System Instructions
9. Multi-turn Chat
10. Thinking Mode
11. Generation Configuration

Phase 2 - Advanced Features :

12. Context Caching
13. Code Execution
14. Grounding with Google Search

Common Reference :

15. Known Issues Prevention
16. Error Handling
17. Rate Limits
18. SDK Migration Guide
19. Production Best Practices

Quick Start

Installation

CORRECT SDK:

npm install @google/genai@1.35.0

❌ WRONG (DEPRECATED):

npm install @google/generative-ai  # DO NOT USE!

Environment Setup

export GEMINI_API_KEY="..."

Or create .env file:

GEMINI_API_KEY=...

First Text Generation (Node.js SDK)

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain quantum computing in simple terms'
});

console.log(response.text);

First Text Generation (Fetch - Cloudflare Workers)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Explain quantum computing in simple terms' }] }]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

Current Models (2025)

Gemini 3 Series (December 2025)

gemini-3-flash

Context : 1,048,576 input tokens / 65,536 output tokens
Status : 🆕 Generally Available (December 2025)
Description : Google's fastest and most efficient Gemini 3 model for production workloads
Best for : High-throughput applications, low-latency responses, cost-sensitive production
Features : Enhanced multimodal, function calling, streaming, thinking mode
Benchmark Performance : Matches gemini-2.5-pro quality at gemini-2.5-flash speed/cost
Recommended for : Production use cases requiring speed + quality balance

gemini-3-pro-preview

Context : TBD (documentation pending)
Status : Preview release (November 18, 2025)
Description : Google's newest and most intelligent AI model with state-of-the-art reasoning
Best for : Most complex reasoning tasks, advanced multimodal understanding, benchmark-critical applications
Features : Enhanced multimodal (text, image, video, audio, PDF), function calling, streaming
Benchmark Performance : Outperforms Gemini 2.5 Pro on every major AI benchmark
⚠️ Preview Models Warning : Preview models have NO SLAs and can change or be deprecated with little notice. Use GA (generally available) models for production. See Issue #13

Gemini 2.5 Series (General Availability - Stable)

gemini-2.5-pro

Context : 1,048,576 input tokens / 65,536 output tokens
Description : State-of-the-art thinking model for complex reasoning
Best for : Code, math, STEM, complex problem-solving
Features : Thinking mode (default on), function calling, multimodal, streaming
Knowledge cutoff : January 2025

gemini-2.5-flash

Context : 1,048,576 input tokens / 65,536 output tokens
Description : Best price-performance workhorse model
Best for : Large-scale processing, low-latency, high-volume, agentic use cases
Features : Thinking mode (default on), function calling, multimodal, streaming
Knowledge cutoff : January 2025

gemini-2.5-flash-lite

Context : 1,048,576 input tokens / 65,536 output tokens
Description : Cost-optimized, fastest 2.5 model
Best for : High throughput, cost-sensitive applications
Features : Thinking mode (default on), function calling, multimodal, streaming
Knowledge cutoff : January 2025

Model Feature Matrix

Feature	3-Flash	3-Pro (Preview)	2.5-Pro	2.5-Flash	2.5-Flash-Lite
Thinking Mode	✅ Default ON	TBD	✅ Default ON	✅ Default ON	✅ Default ON
Function Calling	✅	✅	✅	✅	✅
Multimodal	✅ Enhanced	✅ Enhanced	✅	✅	✅
Streaming	✅	✅	✅	✅	✅
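System Instructions	✅	✅	✅	✅	✅
Context Window	1,048,576 input	TBD	1,048,576 input	1,048,576 input	1,048,576 input
Output Tokens	65,536 max	TBD	65,536 max	65,536 max	65,536 max
Status	Generally Available	Preview	Stable	Stable	Stable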

⚠️ Context Window Correction

ACCURATE (Gemini 2.5) : Gemini 2.5 models support 1,048,576 input tokens (NOT 2M!)
OUTDATED : Only Gemini 1.5 Pro (previous generation) had a 2M token context window
GEMINI 3 : Context window specifications pending official documentation

Common mistake : Claiming Gemini 2.5 has 2M tokens. It doesn't. This skill prevents this error.

SDK vs Fetch Approaches

Node.js SDK (@google/genai)

Pros:

Type-safe with TypeScript
Easier API (simpler syntax)
Built-in chat helpers
Automatic SSE parsing for streaming
Better error handling

Cons:

Requires Node.js or compatible runtime
Larger bundle size
May not work in all edge runtimes

Use when: Building Node.js apps, Next.js Server Actions/Components, or any environment with Node.js compatibility

Fetch-based (Direct REST API)

Pros:

Works in any JavaScript environment (Cloudflare Workers, Deno, Bun, browsers)
Minimal dependencies
Smaller bundle size
Full control over requests

Cons:

More verbose syntax
Manual SSE parsing for streaming
No built-in chat helpers
Manual error handling

Use when: Deploying to Cloudflare Workers, browser clients, or lightweight edge runtimes

Text Generation

Basic Text Generation (SDK)

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Write a haiku about artificial intelligence'
});

console.log(response.text);

Basic Text Generation (Fetch)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: 'Write a haiku about artificial intelligence' }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
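
The fetch example above reads the response without checking the HTTP status, so a quota or authentication error would surface as a confusing TypeError. A minimal sketch of a wrapper that checks the status first is shown here. The names `generateText` and `fetchImpl` are hypothetical (not part of any Gemini SDK); `fetchImpl` is injectable only so the helper can be exercised without network access.

```javascript
// Hypothetical helper wrapping the generateContent REST call shown above.
// `fetchImpl` defaults to the global fetch (Node 18+, Workers, browsers).
async function generateText(prompt, { apiKey, model = 'gemini-2.5-flash', fetchImpl = fetch }) {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': apiKey,
    },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });

  // Non-2xx responses carry an error payload instead of candidates.
  if (!response.ok) {
    const detail = await response.text();
    throw new Error(`Gemini request failed (${response.status}): ${detail}`);
  }

  const data = await response.json();
  // Join every text part of the first candidate; empty string if there are none.
  const parts = data.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? '').join('');
}
```

In a Cloudflare Worker this could be called as `await generateText('Write a haiku', { apiKey: env.GEMINI_API_KEY })`.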

Response Structure

{
  text: string,                  // Convenience accessor for text content
  candidates: [
    {
      content: {
        parts: [
          { text: string }       // Generated text
        ],
        role: string             // "model"
      },
      finishReason: string,      // "STOP" | "MAX_TOKENS" | "SAFETY" | "OTHER"
      index: number
    }
  ],
  usageMetadata: {
    promptTokenCount: number,
    candidatesTokenCount: number,
    totalTokenCount: number
  }
}
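
To make the structure above concrete, here is a small sketch that reads its main fields defensively. `summarizeResponse` is a hypothetical helper name, and the sample payload contains made-up values shaped like the structure above.

```javascript
// Hypothetical helper that reads the fields listed in the response structure.
function summarizeResponse(data) {
  const candidate = data.candidates?.[0];
  const parts = candidate?.content?.parts ?? [];
  return {
    text: parts.map((part) => part.text ?? '').join(''),
    finishReason: candidate?.finishReason ?? null,
    // MAX_TOKENS means generation stopped at the output token limit.
    truncated: candidate?.finishReason === 'MAX_TOKENS',
    totalTokens: data.usageMetadata?.totalTokenCount ?? 0,
  };
}

// Sample payload with made-up values, for illustration only.
const sampleResponse = {
  candidates: [
    {
      content: { parts: [{ text: 'Silicon dreams' }], role: 'model' },
      finishReason: 'MAX_TOKENS',
      index: 0,
    },
  ],
  usageMetadata: { promptTokenCount: 8, candidatesTokenCount: 2, totalTokenCount: 10 },
};

console.log(summarizeResponse(sampleResponse));
```

Checking `finishReason` before using the text helps distinguish a complete answer from one that was cut off or blocked.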

Streaming

Streaming with SDK (Async Iteration)

const response = await ai.models.generateContentStream({
  model: 'gemini-2.5-flash',
  contents: 'Write a 200-word story about time travel'
});

for await (const chunk of response) {
  process.stdout.write(chunk.text ?? ''); // chunk.text can be undefined for chunks without text
}

Streaming with Fetch (SSE Parsing)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Write a 200-word story about time travel' }] }]
    }),
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';

  for (const line of lines) {
    if (line.trim() === '' || line.startsWith('data: [DONE]')) continue;
    if (!line.startsWith('data: ')) continue;

    try {
      const data = JSON.parse(line.slice(6));
      const text = data.candidates?.[0]?.content?.parts?.[0]?.text;
      if (text) {
        process.stdout.write(text);
      }
    } catch (e) {
      // Skip invalid JSON
    }
  }
}

Key Points:

Use the streamGenerateContent endpoint with ?alt=sse (not generateContent); without alt=sse the API streams a JSON array instead of SSE events
Parse Server-Sent Events (SSE) format: data: {json}\n\n
Handle incomplete chunks in buffer
Skip empty lines and [DONE] markers
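
The buffering described in the key points can be isolated into a small parser that is easy to test without a network connection. This is a sketch under the assumption that each event arrives as a single `data: {json}` line; `createSseParser` is a hypothetical name, not an SDK function.

```javascript
// Minimal SSE line parser: feed it decoded text chunks, get parsed JSON events.
function createSseParser(onEvent) {
  let buffer = '';
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // keep the trailing partial line for the next chunk
      for (const rawLine of lines) {
        const line = rawLine.trim();
        if (!line.startsWith('data:')) continue; // skips blank separator lines
        const payload = line.slice(5).trim();
        if (payload === '' || payload === '[DONE]') continue;
        try {
          onEvent(JSON.parse(payload));
        } catch {
          // Ignore lines that are not valid JSON.
        }
      }
    },
  };
}

// Usage: one event split across two network chunks is reassembled by the buffer.
const streamedTexts = [];
const sseParser = createSseParser((event) => {
  const text = event.candidates?.[0]?.content?.parts?.[0]?.text;
  if (text) streamedTexts.push(text);
});
sseParser.push('data: {"candidates":[{"content":{"parts":[{"te');
sseParser.push('xt":"Hello"}]}}]}\n\n');
console.log(streamedTexts.join('')); // prints "Hello"
```

In the fetch loop above, each `decoder.decode(value, { stream: true })` result would be passed to `push`.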

Multimodal Inputs

Gemini 2.5 models support text + images + video + audio + PDFs in the same request.

Images (Vision)

SDK Approach

import { GoogleGenAI } from '@google/genai';
import fs from 'fs';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// From file
const imageData = fs.readFileSync('/path/to/image.jpg');
const base64Image = imageData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'What is in this image?' },
        {
          inlineData: {
            data: base64Image,
            mimeType: 'image/jpeg'
          }
        }
      ]
    }
  ]
});

console.log(response.text);

Fetch Approach

const imageData = fs.readFileSync('/path/to/image.jpg');
const base64Image = imageData.toString('base64');

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: 'What is in this image?' },
            {
              inlineData: {
                data: base64Image,
                mimeType: 'image/jpeg'
              }
            }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

Supported Image Formats:

JPEG (.jpg, .jpeg)
PNG (.png)
WebP (.webp)
HEIC (.heic)
HEIF (.heif)

Max Image Size : 20MB per image
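
The format list and size limit above can be enforced before a request is sent. The sketch below builds an `inlineData` part from raw bytes and a filename. `toInlineImagePart` is a hypothetical helper, and treating "20MB" as 20 × 1024 × 1024 bytes is an assumption.

```javascript
// MIME types for the image formats listed above, keyed by file extension.
const IMAGE_MIME_TYPES = new Map([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['webp', 'image/webp'],
  ['heic', 'image/heic'],
  ['heif', 'image/heif'],
]);

// Assumption: the 20MB limit is interpreted as 20 * 1024 * 1024 bytes.
const MAX_IMAGE_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: validates an image and returns an inlineData part.
function toInlineImagePart(bytes, filename) {
  const extension = filename.split('.').pop().toLowerCase();
  const mimeType = IMAGE_MIME_TYPES.get(extension);
  if (!mimeType) {
    throw new Error(`Unsupported image format: .${extension}`);
  }
  if (bytes.length > MAX_IMAGE_BYTES) {
    throw new Error(`Image is ${bytes.length} bytes; limit is ${MAX_IMAGE_BYTES}`);
  }
  return {
    inlineData: {
      data: Buffer.from(bytes).toString('base64'),
      mimeType,
    },
  };
}
```

A caller would typically pass `fs.readFileSync(path)` as `bytes` and place the returned part next to a `{ text: ... }` part in `contents`.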

Video

// Video must be < 2 minutes for inline data
const videoData = fs.readFileSync('/path/to/video.mp4');
const base64Video = videoData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Describe what happens in this video' },
        {
          inlineData: {
            data: base64Video,
            mimeType: 'video/mp4'
          }
        }
      ]
    }
  ]
});

console.log(response.text);

Supported Video Formats:

MP4 (.mp4)
MPEG (.mpeg)
MOV (.mov)
AVI (.avi)
FLV (.flv)
MPG (.mpg)
WebM (.webm)
WMV (.wmv)

Max Video Length (inline) : 2 minutes
Max Video Size : 2GB (use File API for larger files - Phase 2)

Audio

const audioData = fs.readFileSync('/path/to/audio.mp3');
const base64Audio = audioData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Transcribe and summarize this audio' },
        {
          inlineData: {
            data: base64Audio,
            mimeType: 'audio/mp3'
          }
        }
      ]
    }
  ]
});

console.log(response.text);

Supported Audio Formats:

MP3 (.mp3)
WAV (.wav)
FLAC (.flac)
AAC (.aac)
OGG (.ogg)
OPUS (.opus)

Max Audio Size : 20MB

PDFs

const pdfData = fs.readFileSync('/path/to/document.pdf');
const base64Pdf = pdfData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Summarize the key points in this PDF' },
        {
          inlineData: {
            data: base64Pdf,
            mimeType: 'application/pdf'
          }
        }
      ]
    }
  ]
});

console.log(response.text);

Max PDF Size : 30MB
PDF Limitations : Text-based PDFs work best; scanned images may have lower accuracy

Multiple Inputs

You can combine multiple modalities in one request:

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Compare these two images and describe the differences:' },
        { inlineData: { data: base64Image1, mimeType: 'image/jpeg' } },
        { inlineData: { data: base64Image2, mimeType: 'image/jpeg' } }
      ]
    }
  ]
});

Function Calling

Gemini supports function calling (tool use) to connect models with external APIs and systems.

Basic Function Calling (SDK)

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Define function declarations
const getCurrentWeather = {
  name: 'get_current_weather',
  description: 'Get the current weather for a location',
  parametersJsonSchema: {
    type: 'object',
    properties: {
      location: {
        type: 'string',
        description: 'City name, e.g. San Francisco'
      },
      unit: {
        type: 'string',
        enum: ['celsius', 'fahrenheit']
      }
    },
    required: ['location']
  }
};

// Make request with tools
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What\'s the weather in Tokyo?',
  config: {
    tools: [
      { functionDeclarations: [getCurrentWeather] }
    ]
  }
});

// Check if model wants to call a function
const functionCall = response.candidates[0].content.parts[0].functionCall;

if (functionCall) {
  console.log('Function to call:', functionCall.name);
  console.log('Arguments:', functionCall.args);

  // Execute the function (your implementation)
  const weatherData = await fetchWeather(functionCall.args.location);

  // Send function result back to model
  const finalResponse = await ai.models.generateContent({
    model: 'gemini-2.5-flash',
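    contents: [
      // Each turn is an explicit Content object with a role; the SDK rejects arrays that mix strings and Content objects
      { role: 'user', parts: [{ text: 'What\'s the weather in Tokyo?' }] },
      response.candidates[0].content, // Original model response with function call
      {
        role: 'user',
        parts: [
          {
            functionResponse: {
              name: functionCall.name,
              response: weatherData // Must be a JSON object
            }
          }
        ]
      }
    ],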
    config: {
      tools: [
        { functionDeclarations: [getCurrentWeather] }
      ]
    }
  });

  console.log(finalResponse.text);
}

Function Calling (Fetch)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        { parts: [{ text: 'What\'s the weather in Tokyo?' }] }
      ],
      tools: [
        {
          functionDeclarations: [
            {
              name: 'get_current_weather',
              description: 'Get the current weather for a location',
              parameters: {
                type: 'object',
                properties: {
                  location: {
                    type: 'string',
                    description: 'City name'
                  }
                },
                required: ['location']
              }
            }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();
const functionCall = data.candidates[0]?.content?.parts[0]?.functionCall;

if (functionCall) {
  // Execute function and send result back (same flow as SDK)
}

Parallel Function Calling

Gemini can call multiple independent functions simultaneously:

const tools = [
  {
    functionDeclarations: [
      {
        name: 'get_weather',
        description: 'Get weather for a location',
        parametersJsonSchema: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      },
      {
        name: 'get_population',
        description: 'Get population of a city',
        parametersJsonSchema: {
          type: 'object',
          properties: {
            city: { type: 'string' }
          },
          required: ['city']
        }
      }
    ]
  }
];

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather and population of Tokyo?',
  config: { tools }
});

// Model may return MULTIPLE function calls in parallel
const functionCalls = response.candidates[0].content.parts.filter(
  part => part.functionCall
);

console.log(`Model wants to call ${functionCalls.length} functions in parallel`);
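
Once the model returns several function calls, they can be executed concurrently and turned into `functionResponse` parts. The following is a minimal sketch: `runFunctionCalls` and `localHandlers` are hypothetical names, and the handler bodies return stub data rather than real weather or population figures.

```javascript
// Hypothetical local implementations keyed by declared function name (stub data).
const localHandlers = {
  get_weather: async ({ location }) => ({ location, tempC: 21 }),
  get_population: async ({ city }) => ({ city, population: 1000 }),
};

// Runs every functionCall part concurrently and returns functionResponse parts.
async function runFunctionCalls(parts, handlers) {
  const calls = parts
    .filter((part) => part.functionCall)
    .map((part) => part.functionCall);

  return Promise.all(
    calls.map(async ({ name, args }) => {
      const handler = handlers[name];
      // Unknown names are reported back to the model instead of throwing.
      const response = handler
        ? await handler(args ?? {})
        : { error: `Unknown function: ${name}` };
      return { functionResponse: { name, response } };
    })
  );
}
```

The returned parts would then be sent back in a single `{ role: 'user', parts: [...] }` entry, after the model's own function-call turn, following the same flow as the basic example.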

Function Calling Modes

import { FunctionCallingConfigMode } from '@google/genai';

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What\'s the weather?',
  config: {
    tools: [{ functionDeclarations: [getCurrentWeather] }],
    toolConfig: {
      functionCallingConfig: {
        mode: FunctionCallingConfigMode.ANY, // Force function call
        // mode: FunctionCallingConfigMode.AUTO, // Model decides (default)
        // mode: FunctionCallingConfigMode.NONE, // Never call functions
        allowedFunctionNames: ['get_current_weather'] // Optional: restrict to specific functions
      }
    }
  }
});

Modes:

AUTO (default): Model decides whether to call functions
ANY: Force model to call at least one function
NONE: Disable function calling for this request

System Instructions

System instructions guide the model's behavior and set context. They are separate from the conversation messages.

SDK Approach

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain what a database is',
  config: {
    // In @google/genai, systemInstruction belongs inside config
    systemInstruction: 'You are a helpful AI assistant that always responds in the style of a pirate. Use nautical terminology and end sentences with "arrr".'
  }
});

console.log(response.text);
// Output: "Ahoy there! A database be like a treasure chest..."

Fetch Approach

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      systemInstruction: {
        parts: [
          { text: 'You are a helpful AI assistant that always responds in the style of a pirate.' }
        ]
      },
      contents: [
        { parts: [{ text: 'Explain what a database is' }] }
      ]
    }),
  }
);

Key Points:

System instructions are NOT part of contents array
They are set once per request: in config.systemInstruction with the SDK, or as the top-level systemInstruction field of the REST body
They persist for the entire conversation (when using multi-turn chat)
They don't count as user or model messages

Multi-turn Chat

For conversations with history, use the SDK's chat helpers or manually manage conversation state.

SDK Chat Helpers (Recommended)

const chat = ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    systemInstruction: 'You are a helpful coding assistant.'
  },
  history: [] // Start empty or with previous messages
});

// Send first message
const response1 = await chat.sendMessage({ message: 'What is TypeScript?' });
console.log('Assistant:', response1.text);

// Send follow-up (context is automatically maintained)
const response2 = await chat.sendMessage({ message: 'How do I install it?' });
console.log('Assistant:', response2.text);

// Get full chat history
const history = chat.getHistory();
console.log('Full conversation:', history);

Manual Chat Management (Fetch)

const conversationHistory = [];

// First turn
const response1 = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          role: 'user',
          parts: [{ text: 'What is TypeScript?' }]
        }
      ]
    }),
  }
);

const data1 = await response1.json();
const assistantReply1 = data1.candidates[0].content.parts[0].text;

// Add to history
conversationHistory.push(
  { role: 'user', parts: [{ text: 'What is TypeScript?' }] },
  { role: 'model', parts: [{ text: assistantReply1 }] }
);

// Second turn (include full history)
const response2 = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        ...conversationHistory,
        { role: 'user', parts: [{ text: 'How do I install it?' }] }
      ]
    }),
  }
);

Message Roles:

user: User messages
model: Assistant responses

⚠️ Important : Chat helpers are SDK-only. With fetch, you must manually manage conversation history.
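
For the fetch approach, the history bookkeeping can be wrapped in a small object so each turn only has to build `contents` and record the reply. This is a sketch, not an SDK feature; `createConversation` and its methods are hypothetical names.

```javascript
// Hypothetical helper that tracks alternating user/model turns for REST calls.
function createConversation() {
  const history = [];
  return {
    // Returns the `contents` array for the next request without mutating history.
    buildContents(userText) {
      return [...history, { role: 'user', parts: [{ text: userText }] }];
    },
    // Records a completed turn once the model reply has arrived.
    record(userText, modelText) {
      history.push(
        { role: 'user', parts: [{ text: userText }] },
        { role: 'model', parts: [{ text: modelText }] }
      );
    },
    // Number of stored messages (two per completed turn).
    get length() {
      return history.length;
    },
  };
}

// Usage: the second request carries the first turn plus the new question.
const conversation = createConversation();
conversation.record('What is TypeScript?', 'A typed superset of JavaScript.');
console.log(JSON.stringify(conversation.buildContents('How do I install it?'), null, 2));
```

Recording a turn only after the reply arrives keeps a failed request from leaving an unanswered user message in the history.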

Thinking Mode

Gemini 2.5 models have thinking mode enabled by default for enhanced quality. You can configure the thinking budget.

Configure Thinking Budget (SDK)

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Solve this complex math problem: ...',
  config: {
    thinkingConfig: {
      thinkingBudget: 8192 // Max tokens for thinking (default: model-dependent)
    }
  }
});

Configure Thinking Budget (Fetch)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Solve this complex math problem: ...' }] }],
      generationConfig: {
        thinkingConfig: {
          thinkingBudget: 8192
        }
      }
    }),
  }
);

Configure Thinking Level (SDK) - New in v1.30.0

const response = await ai.models.generateContent({
  model: 'gemini-3-flash', // thinkingLevel targets Gemini 3 models; Gemini 2.5 models are configured with thinkingBudget
  contents: 'Solve this complex problem: ...',
  config: {
    thinkingConfig: {
      thinkingLevel: 'MEDIUM' // 'LOW' | 'MEDIUM' | 'HIGH'
    }
  }
});

Thinking Levels:

LOW: Minimal internal reasoning (faster, lower quality)
MEDIUM: Balanced reasoning (default)
HIGH: Maximum reasoning depth (slower, higher quality)

Key Points:

Thinking is on by default for Gemini 2.5 models; gemini-2.5-pro cannot turn it off, while gemini-2.5-flash accepts thinkingBudget: 0 to disable it
Higher thinking budgets allow more internal reasoning (may increase latency)
thinkingLevel provides simpler control than thinkingBudget (new in v1.30.0)
Default budget varies by model (usually sufficient for most tasks)
Only increase budget/level for very complex reasoning tasks

Generation Configuration

Customize model behavior with generation parameters.

All Configuration Options (SDK)

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Write a creative story',
  config: {
    temperature: 0.9,           // Randomness (0.0-2.0, default: 1.0)
    topP: 0.95,                 // Nucleus sampling (0.0-1.0)
    topK: 40,                   // Top-k sampling
    maxOutputTokens: 2048,      // Max tokens to generate
    stopSequences: ['END'],     // Stop generation if these appear
    responseMimeType: 'text/plain', // Or 'application/json' for JSON mode
    candidateCount: 1           // Number of response candidates (usually 1)
  }
});

All Configuration Options (Fetch)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Write a creative story' }] }],
      generationConfig: {
        temperature: 0.9,
        topP: 0.95,
        topK: 40,
        maxOutputTokens: 2048,
        stopSequences: ['END'],
        responseMimeType: 'text/plain',
        candidateCount: 1
      }
    }),
  }
);

Parameter Guidelines

Parameter	Range	Default	Use Case
temperature	0.0-2.0	1.0	Lower = more focused, higher = more creative
topP	0.0-1.0	0.95	Nucleus sampling threshold
topK	1-100+	40	Limit to top K tokens
maxOutputTokens	1-65536	Model max	Control response length
stopSequences	Array	None	Stop generation at specific strings

Tips:

For factual tasks : Use low temperature (0.0-0.3)
For creative tasks : Use high temperature (0.7-1.5)
topP and topK both control randomness; use one or the other (not both)
Always set maxOutputTokens to prevent excessive generation
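
The ranges in the table above can be checked before a request is sent, which catches typos such as a temperature of 20 early. The sketch below is one possible validator; `validateGenerationConfig` is a hypothetical helper, and the limits it encodes are simply the ones listed in the table.

```javascript
// Hypothetical validator using the ranges from the parameter table above.
// Returns a list of human-readable problems; an empty list means the config passed.
function validateGenerationConfig(config) {
  const errors = [];
  const inRange = (value, min, max) =>
    typeof value === 'number' && value >= min && value <= max;

  if (config.temperature !== undefined && !inRange(config.temperature, 0, 2)) {
    errors.push('temperature must be between 0.0 and 2.0');
  }
  if (config.topP !== undefined && !inRange(config.topP, 0, 1)) {
    errors.push('topP must be between 0.0 and 1.0');
  }
  if (config.topK !== undefined && !(Number.isInteger(config.topK) && config.topK >= 1)) {
    errors.push('topK must be an integer of at least 1');
  }
  if (
    config.maxOutputTokens !== undefined &&
    !(Number.isInteger(config.maxOutputTokens) && inRange(config.maxOutputTokens, 1, 65536))
  ) {
    errors.push('maxOutputTokens must be an integer between 1 and 65536');
  }
  if (config.stopSequences !== undefined && !Array.isArray(config.stopSequences)) {
    errors.push('stopSequences must be an array of strings');
  }
  return errors;
}

console.log(validateGenerationConfig({ temperature: 3, maxOutputTokens: 0 }));
```

Returning a list rather than throwing on the first problem lets a caller report every invalid field at once.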

Context Caching

Context caching allows you to cache frequently used content (like system instructions, large documents, or video files) to reduce costs by up to 90% and improve latency.

How It Works

Create a cache with your repeated content
Reference the cache in subsequent requests
Save tokens - cached tokens cost significantly less
TTL management - caches expire after specified time

Benefits

Cost savings : Up to 90% reduction on cached tokens
Reduced latency : Faster responses by reusing processed content
Consistent context : Same large context across multiple requests

Cache Creation (SDK)

import { GoogleGenAI } from '@google/genai';
import fs from 'fs';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Create a cache for a large document
const documentText = fs.readFileSync('./large-document.txt', 'utf-8');

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash',
  config: {
    displayName: 'large-doc-cache', // Identifier for the cache
    systemInstruction: 'You are an expert at analyzing legal documents.',
    contents: documentText,
    ttl: '3600s', // Cache for 1 hour
  }
});

console.log('Cache created:', cache.name);
console.log('Expires at:', cache.expireTime);

Cache Creation (Fetch)

const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/cachedContents',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      model: 'models/gemini-2.5-flash',
      displayName: 'large-doc-cache',
      systemInstruction: {
        parts: [{ text: 'You are an expert at analyzing legal documents.' }]
      },
      contents: [
        { parts: [{ text: documentText }] }
      ],
      ttl: '3600s'
    }),
  }
);

const cache = await response.json();
console.log('Cache created:', cache.name);

Using a Cache (SDK)

// Generate content using the cache
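const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash', // Same model the cache was created with
  contents: 'Summarize the key points in the document',
  config: {
    cachedContent: cache.name // Reference the cache by its resource name
  }
});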

console.log(response.text);

Using a Cache (Fetch)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      cachedContent: cache.name, // Resource name returned when the cache was created
      contents: [
        { role: 'user', parts: [{ text: 'Summarize the key points in the document' }] }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

Update Cache TTL (SDK)

await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s' // Extend to 2 hours
  }
});

Update Cache with Expiration Time (SDK)

// Set a specific expiration time (RFC 3339 timestamp string)
const in10Minutes = new Date(Date.now() + 10 * 60 * 1000);

await ai.caches.update({
  name: cache.name,
  config: {
    expireTime: in10Minutes.toISOString()
  }
});

List and Delete Caches (SDK)

// List all caches (list() returns a pager, consumed with for await)
const pager = await ai.caches.list();
for await (const cached of pager) {
  console.log(cached.name, cached.displayName);
}

// Delete a specific cache
await ai.caches.delete({ name: cache.name });

Caching with Video Files

import { GoogleGenAI, createPartFromUri, createUserContent } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Upload video file (in Node.js, pass a file path)
let videoFile = await ai.files.upload({
  file: './video.mp4',
  config: { mimeType: 'video/mp4' }
});

// Wait for processing (state is a string such as 'PROCESSING' or 'ACTIVE')
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

// Create cache with video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash',
  config: {
    displayName: 'video-analysis-cache',
    systemInstruction: 'You are an expert video analyzer.',
    contents: createUserContent(createPartFromUri(videoFile.uri, videoFile.mimeType)),
    ttl: '300s' // 5 minutes
  }
});

// Use cache for multiple queries
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name }
});

const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Describe the main characters',
  config: { cachedContent: cache.name }
});

Key Points

When to Use Caching:

Large system instructions used repeatedly
Long documents analyzed multiple times
Video/audio files queried with different prompts
Consistent context across conversation sessions

TTL Guidelines:

Short sessions: 300s (5 min) to 3600s (1 hour)
Long sessions: 3600s (1 hour) to 86400s (24 hours)
Maximum: 7 days

Cost Savings:

Cached input tokens: ~90% cheaper than regular tokens
Output tokens: Same price (not cached)
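
To see what the ~90% figure means in practice, the arithmetic can be written out. This is a rough sketch: `estimateInputCost` is a hypothetical helper, the per-million-token price is a placeholder rather than a real Gemini rate, and cache storage fees and the initial uncached request are ignored.

```javascript
// Compares input-token cost with and without caching.
// pricePerMillion is a hypothetical rate; cachedDiscount 0.9 reflects the ~90% figure above.
function estimateInputCost({ cachedTokens, freshTokens, requests, pricePerMillion, cachedDiscount = 0.9 }) {
  const perToken = pricePerMillion / 1e6;
  // Without caching, the shared context is billed at full price on every request.
  const withoutCache = (cachedTokens + freshTokens) * requests * perToken;
  // With caching, only the fresh tokens are billed at full price.
  const withCache = (cachedTokens * (1 - cachedDiscount) + freshTokens) * requests * perToken;
  return { withoutCache, withCache, savings: withoutCache - withCache };
}

// Example: a 100,000-token document queried 50 times with 200-token questions.
const estimate = estimateInputCost({
  cachedTokens: 100000,
  freshTokens: 200,
  requests: 50,
  pricePerMillion: 1.0, // placeholder price, not an actual rate
});
console.log(estimate.withoutCache.toFixed(2), estimate.withCache.toFixed(2));
```

With these placeholder numbers the input cost drops from roughly 5.01 to roughly 0.51, because the large shared context dominates each request.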

Important:

You must use explicit model version suffixes (e.g., gemini-2.5-flash-001, NOT just gemini-2.5-flash)
Caches are automatically deleted after TTL expires
Update TTL before expiration to extend cache lifetime

Code Execution

Gemini models can generate and execute Python code to solve problems requiring computation, data analysis, or visualization.

How It Works

Model generates executable Python code
Code runs in secure sandbox
Results are returned to the model
Model incorporates results into response

Supported Operations

Mathematical calculations
Data analysis and statistics
File processing (CSV, JSON, etc.)
Chart and graph generation
Algorithm implementation
Data transformations

Available Python Packages

Standard Library:

math, statistics, random, datetime, json, csv, re
collections, itertools, functools

Data Science:

numpy, pandas, scipy

Visualization:

matplotlib, seaborn

Note : Limited package availability compared to full Python environment

Basic Code Execution (SDK)

import { GoogleGenAI } from '@google/genai'; // Tool and ToolCodeExecution are unused type-only names

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the sum of the first 50 prime numbers? Generate and run code for the calculation.',
  config: {
    tools: [{ codeExecution: {} }]
  }
});

// Parse response parts
for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log('Text:', part.text);
  }
  if (part.executableCode) {
    console.log('Generated Code:', part.executableCode.code);
  }
  if (part.codeExecutionResult) {
    console.log('Execution Output:', part.codeExecutionResult.output);
  }
}

Basic Code Execution (Fetch)

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      tools: [{ code_execution: {} }],
      contents: [
        {
          parts: [
            { text: 'What is the sum of the first 50 prime numbers? Generate and run code.' }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();

for (const part of data.candidates[0].content.parts) {
  if (part.text) {
    console.log('Text:', part.text);
  }
  if (part.executableCode) {
    console.log('Code:', part.executableCode.code);
  }
  if (part.codeExecutionResult) {
    console.log('Result:', part.codeExecutionResult.output);
  }
}

Chat with Code Execution (SDK)

// chats.create() is synchronous; sendMessage() takes an object with a `message` field
const chat = ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    tools: [{ codeExecution: {} }]
  }
});

let response = await chat.sendMessage({ message: 'I have a math question for you.' });
console.log(response.text);

response = await chat.sendMessage({
  message: 'Calculate the Fibonacci sequence up to the 20th number and sum them.'
});

// Model will generate and execute code, then provide answer
for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.executableCode) console.log('Code:', part.executableCode.code);
  if (part.codeExecutionResult) console.log('Output:', part.codeExecutionResult.output);
}

Data Analysis Example

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: `
    Analyze this sales data and calculate:
    1. Total revenue
    2. Average sale price
    3. Best-selling month

    Data (CSV format):
    month,sales,revenue
    Jan,150,45000
    Feb,200,62000
    Mar,175,53000
    Apr,220,68000
  `,
  config: {
    tools: [{ codeExecution: {} }]
  }
});

// Model will generate pandas/numpy code to analyze data
for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.executableCode) console.log('Analysis Code:', part.executableCode.code);
  if (part.codeExecutionResult) console.log('Results:', part.codeExecutionResult.output);
}

Visualization Example

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Create a bar chart showing the distribution of prime numbers under 100 by their last digit. Generate the chart and describe the pattern.',
  config: {
    tools: [{ codeExecution: {} }]
  }
});

// Model generates matplotlib code, executes it, and describes results
for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.executableCode) console.log('Chart Code:', part.executableCode.code);
  if (part.codeExecutionResult) {
    // Note: output holds text only; a generated chart image, if any, is typically returned as a separate inlineData part
    console.log('Execution completed');
  }
}
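If the chart itself is needed, the image parts can be pulled out of the response. This is a minimal sketch, assuming generated charts arrive as parts shaped like `inlineData: { mimeType, data }` with base64-encoded `data`; the helper name `extractInlineImages` and the `chart-N` filenames are hypothetical.

```javascript
// Collect image parts from a response so they can be written to disk.
// ASSUMPTION: charts are returned as inlineData parts with base64 data.
function extractInlineImages(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter(part => part.inlineData?.mimeType?.startsWith('image/'))
    .map((part, index) => ({
      // Derive the extension from the MIME subtype, e.g. image/png -> png
      filename: `chart-${index + 1}.${part.inlineData.mimeType.split('/')[1]}`,
      bytes: Buffer.from(part.inlineData.data, 'base64')
    }));
}
```

Each returned entry can then be saved with `fs.writeFileSync(image.filename, image.bytes)`.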

Response Structure

{
  candidates: [
    {
      content: {
        parts: [
          { text: "I'll calculate that for you." },
          {
            executableCode: {
              language: "PYTHON",
              code: "def is_prime(n):\n  if n <= 1:\n    return False\n  ..."
            }
          },
          {
            codeExecutionResult: {
              outcome: "OUTCOME_OK", // or "OUTCOME_FAILED"
              output: "5117\n"
            }
          },
          { text: "The sum of the first 50 prime numbers is 5117." }
        ]
      }
    }
  ]
}

Error Handling

for (const part of response.candidates[0].content.parts) {
  if (part.codeExecutionResult) {
    if (part.codeExecutionResult.outcome === 'OUTCOME_FAILED') {
      console.error('Code execution failed:', part.codeExecutionResult.output);
    } else {
      console.log('Success:', part.codeExecutionResult.output);
    }
  }
}
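The part-by-part loops above can be folded into one defensive helper. The sketch below groups text, generated code, successful outputs, and failures, and tolerates responses with no candidates. The helper name `summarizeCodeExecution` is hypothetical, and treating every outcome other than OUTCOME_OK as a failure is a simplifying assumption.

```javascript
// Group the parts of a code-execution response by kind.
function summarizeCodeExecution(response) {
  // Optional chaining guards against blocked or empty responses.
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  const summary = { text: [], code: [], outputs: [], failures: [] };
  for (const part of parts) {
    if (part.text) summary.text.push(part.text);
    if (part.executableCode) summary.code.push(part.executableCode.code);
    if (part.codeExecutionResult) {
      const { outcome, output } = part.codeExecutionResult;
      // ASSUMPTION: anything other than OUTCOME_OK counts as a failure.
      const bucket = outcome === 'OUTCOME_OK' ? summary.outputs : summary.failures;
      bucket.push(output ?? '');
    }
  }
  return summary;
}
```

A caller can then check `summary.failures.length` once instead of repeating the outcome test in every loop.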

Key Points

When to Use Code Execution:

Complex mathematical calculations
Data analysis and statistics
Algorithm implementations
File parsing and processing
Chart generation
Computational problems

Limitations:

Sandbox environment (limited file system access)
Limited Python package availability
Execution timeout limits
No network access from code
No persistent state between executions

Best Practices:

Specify what calculation or analysis you need clearly
Request code generation explicitly ("Generate and run code...")
Check outcome field for errors
Use for deterministic computations, not for general programming

Important:

Available on all Gemini 2.5 models (Pro, Flash, Flash-Lite)
Code runs in isolated sandbox for security
Supports Python with standard library and common data science packages

Grounding with Google Search

Grounding connects the model to real-time web information, reducing hallucinations and providing up-to-date, fact-checked responses with citations.

How It Works

Model determines if it needs current information
Automatically performs Google Search
Processes search results
Incorporates findings into response
Provides citations and source URLs

Benefits

Real-time information : Access to current events and data
Reduced hallucinations : Answers grounded in web sources
Verifiable : Citations allow fact-checking
Up-to-date : Not limited to model's training cutoff

Grounding Options

1. Google Search (`googleSearch`) - Recommended for Gemini 2.5

const groundingTool = {
  googleSearch: {}
};

Features:

Simple configuration
Automatic search when needed
Available on all Gemini 2.5 models

2. FileSearch - New in v1.29.0 (Preview)

const fileSearchTool = {
  fileSearch: {
    fileSearchStoreId: 'store-id-here' // Created via FileSearchStore APIs
  }
};

Features:

Search through your own document collections
Upload and index custom knowledge bases
Alternative to web search for proprietary data
Preview feature (requires FileSearchStore setup)

Note : See FileSearch documentation for store creation and management.

3. Google Search Retrieval (`googleSearchRetrieval`) - Legacy (Gemini 1.5)

const retrievalTool = {
  googleSearchRetrieval: {
    dynamicRetrievalConfig: {
      mode: 'MODE_DYNAMIC',
      dynamicThreshold: 0.7 // Search only when the predicted benefit of searching scores >= 0.7
    }
  }
};

Features:

Dynamic threshold control
Used with Gemini 1.5 models
More configuration options

Basic Grounding (SDK) - Gemini 2.5

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Who won the euro 2024?',
  config: {
    tools: [{ googleSearch: {} }]
  }
});

console.log(response.text);

// Check if grounding was used
if (response.candidates[0].groundingMetadata) {
  console.log('Search was performed!');
  console.log('Sources:', response.candidates[0].groundingMetadata);
}

Basic Grounding (Fetch) - Gemini 2.5

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        { parts: [{ text: 'Who won the euro 2024?' }] }
      ],
      tools: [
        { google_search: {} }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

if (data.candidates[0].groundingMetadata) {
  console.log('Grounding metadata:', data.candidates[0].groundingMetadata);
}

Dynamic Retrieval (SDK) - Gemini 1.5

import { GoogleGenAI, DynamicRetrievalConfigMode } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-1.5-flash', // dynamic retrieval applies to Gemini 1.5 models
  contents: 'Who won the euro 2024?',
  config: {
    tools: [
      {
        googleSearchRetrieval: {
          dynamicRetrievalConfig: {
            mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
            dynamicThreshold: 0.7 // Search only when the predicted benefit of searching scores >= 0.7
          }
        }
      }
    ]
  }
});

console.log(response.text);

if (!response.candidates[0].groundingMetadata) {
  console.log('Model answered from its own knowledge (high confidence)');
}

Grounding Metadata Structure

{
  groundingMetadata: {
    webSearchQueries: ["euro 2024 winner"],
    searchEntryPoint: {
      renderedContent: "<!-- HTML/CSS for the search suggestions widget -->"
    },
    groundingChunks: [
      {
        web: {
          uri: "https://example.com/euro-2024-results",
          title: "UEFA Euro 2024 Final Results"
        }
      }
    ],
    groundingSupports: [
      {
        segment: { startIndex: 42, endIndex: 47, text: "Spain" },
        groundingChunkIndices: [0] // indexes into groundingChunks
      }
    ]
  }
}

Chat with Grounding (SDK)

const chat = await ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    tools: [{ googleSearch: {} }]
  }
});

let response = await chat.sendMessage({ message: 'What are the latest developments in quantum computing?' });
console.log(response.text);

// Check grounding sources (each grounding chunk carries a web source)
if (response.candidates[0].groundingMetadata) {
  const sources = response.candidates[0].groundingMetadata.groundingChunks || [];
  console.log(`Sources used: ${sources.length}`);
  sources.forEach(source => {
    console.log(`- ${source.web?.title}: ${source.web?.uri}`);
  });
}

// Follow-up still has grounding enabled
response = await chat.sendMessage({ message: 'Which company made the biggest breakthrough?' });
console.log(response.text);

Combining Grounding with Function Calling

const weatherFunction = {
  name: 'get_current_weather',
  description: 'Get current weather for a location',
  parametersJsonSchema: {
    type: 'object',
    properties: {
      location: { type: 'string', description: 'City name' }
    },
    required: ['location']
  }
};

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather like in the city that won Euro 2024?',
  config: {
    tools: [
      { googleSearch: {} },
      { functionDeclarations: [weatherFunction] }
    ]
  }
});

// Model will:
// 1. Use Google Search to find Euro 2024 winner
// 2. Call get_current_weather function with the city
// 3. Combine both results in response

Checking if Grounding was Used

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is 2+2?', // Model knows this without search
  config: {
    tools: [{ googleSearch: {} }]
  }
});

if (!response.candidates[0].groundingMetadata) {
  console.log('Model answered from its own knowledge (no search needed)');
} else {
  console.log('Search was performed');
}

Key Points

When to Use Grounding:

Current events and news
Real-time data (stock prices, sports scores, weather)
Fact-checking and verification
Questions about recent developments
Information beyond model's training cutoff

When NOT to Use:

General knowledge questions
Mathematical calculations
Code generation
Creative writing
Tasks requiring internal reasoning only

Cost Considerations:

Grounding adds latency (search takes time)
Additional token costs for retrieved content
Use dynamicThreshold to control when searches happen (Gemini 1.5)

Important Notes:

Grounding requires Google Cloud project (not just API key)
Search results quality depends on query phrasing
Citations may not cover all facts in response
Search is performed automatically based on confidence

Gemini 2.5 vs 1.5:

Gemini 2.5 : Use googleSearch (simple, recommended)
Gemini 1.5 : Use googleSearchRetrieval with dynamicThreshold

Best Practices:

Always check groundingMetadata to see if search was used
Display citations to users for transparency
Use specific, well-phrased questions for better search results
Combine with function calling for hybrid workflows
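Displaying citations can be sketched as a small post-processing step. The sketch assumes the metadata exposes `groundingChunks[].web.uri` and `groundingSupports[].segment.endIndex` with `groundingChunkIndices`, and that `endIndex` can be used as a string offset (which holds for ASCII text but may not for multi-byte text). The helper name `addCitations` is hypothetical.

```javascript
// Append markdown-style source links after each supported text segment.
function addCitations(text, groundingMetadata) {
  const chunks = groundingMetadata?.groundingChunks ?? [];
  const supports = groundingMetadata?.groundingSupports ?? [];
  // Insert from the end of the text backwards so earlier offsets stay valid.
  const ordered = [...supports].sort(
    (a, b) => (b.segment?.endIndex ?? 0) - (a.segment?.endIndex ?? 0)
  );
  let result = text;
  for (const support of ordered) {
    const end = support.segment?.endIndex;
    const indices = support.groundingChunkIndices ?? [];
    if (end === undefined || indices.length === 0) continue;
    const markers = indices
      .filter(i => chunks[i]?.web?.uri)
      .map(i => `[${i + 1}](${chunks[i].web.uri})`)
      .join(', ');
    if (markers) result = result.slice(0, end) + ' ' + markers + result.slice(end);
  }
  return result;
}
```

Called as `addCitations(response.text, response.candidates[0].groundingMetadata)`, it returns the original text unchanged when no search was performed.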

Known Issues Prevention

This skill prevents 14 documented issues:

Issue #1: Multi-byte Character Corruption in Streaming

Error : Garbled text or � symbols when streaming responses with non-English text
Source : GitHub Issue #764
Why It Happens : The TextDecoder converts chunks to strings without the {stream: true} option. Multi-byte UTF-8 characters (Chinese, Japanese, Korean, emoji) split across chunks create invalid strings.

Prevention :

// The SDK already fixes this, but if implementing custom streaming:
const decoder = new TextDecoder();
const { value } = await reader.read();
const text = decoder.decode(value, { stream: true }); // ← stream: true required

Affected : All non-English languages using multi-byte characters
Status : Fixed in SDK, but documented for custom implementations
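The effect is easy to reproduce without any network call. The sketch below splits the UTF-8 bytes of a two-character string in the middle of the first character, then decodes the chunks both ways; the variable names are illustrative only.

```javascript
// '你好' encodes to 6 bytes in UTF-8, 3 per character.
const bytes = new TextEncoder().encode('你好');
// Split inside the first character to mimic an unlucky network chunk boundary.
const chunks = [bytes.slice(0, 2), bytes.slice(2)];

// Naive: each chunk is decoded in isolation, so the partial character is lost.
const naive = chunks.map(chunk => new TextDecoder().decode(chunk)).join('');

// Streaming: one decoder buffers incomplete sequences between calls.
const decoder = new TextDecoder();
let streamed = '';
for (const chunk of chunks) {
  streamed += decoder.decode(chunk, { stream: true });
}
streamed += decoder.decode(); // flush any bytes still buffered

console.log(naive.includes('\uFFFD')); // true: replacement characters appear
console.log(streamed === '你好');      // true: text survives intact
```

The key design point is reusing a single decoder instance across chunks; creating a new TextDecoder per chunk discards the buffered bytes even when `stream: true` is passed.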

Issue #2: Safety Settings Method Parameter Not Supported

Error : "method parameter is not supported in Gemini API"
Source : GitHub Issue #810
Why It Happens : The method parameter in safetySettings only works with Vertex AI Gemini API, not Gemini Developer API or Google AI Studio. The SDK allows passing it without validation.

Prevention :

// ❌ WRONG - Fails with Gemini Developer API:
config: {
  safetySettings: [{
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    method: HarmBlockMethod.SEVERITY // Not supported!
  }]
}

// ✅ CORRECT - Omit 'method' for Gemini Developer API:
config: {
  safetySettings: [{
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE
    // No 'method' field
  }]
}

Affected : Gemini Developer API and Google AI Studio users
Status : Known limitation, use Vertex AI if you need method parameter

Issue #3: Safety Settings Have Model-Specific Thresholds

Error : Content passes through despite strict safety settings, or safetyRatings shows NEGLIGIBLE with empty output
Source : GitHub Issue #872
Why It Happens : Different models have different blocking thresholds. gemini-2.5-flash blocks more strictly than gemini-2.0-flash. Additionally, promptFeedback only appears when INPUT is blocked; if the model generates a refusal message, safetyRatings may show NEGLIGIBLE.

Prevention :

// Check promptFeedback (blocked input), finishReason (blocked output), AND empty responses:
if (response.promptFeedback?.blockReason ||
    response.candidates?.[0]?.finishReason === 'SAFETY' ||
    !response.text || response.text.trim() === '') {
  console.log('Content blocked or refused');
}

// Be aware: Different models have different thresholds
// gemini-2.5-flash: Lower threshold (stricter blocking)
// gemini-2.0-flash: Higher threshold (more permissive)

Affected : All models when using safety settings
Status : Known behavior, model-specific thresholds are by design

Issue #4: FunctionCallingConfigMode.ANY Causes Infinite Loop

Error : Model loops forever calling tools, never returns text response
Source : GitHub Issue #908
Why It Happens : When FunctionCallingConfigMode.ANY is set with automatic function calling (CallableTool), the model is forced to call at least one tool on every turn, so it can never return a plain text answer and loops until the max invocations limit is reached.

Prevention :

// ❌ WRONG - Loops forever:
config: {
  toolConfig: {
    functionCallingConfig: {
      mode: FunctionCallingConfigMode.ANY // Forces tool calls forever
    }
  }
}

// ✅ CORRECT - Use AUTO mode (model decides):
config: {
  toolConfig: {
    functionCallingConfig: {
      mode: FunctionCallingConfigMode.AUTO // Model can choose to answer directly
    }
  }
}

// Or use manual function calling (check for functionCall, execute, send back)

Affected : Automatic function calling with CallableTool
Status : Known limitation, use AUTO mode or manual function calling
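The manual alternative (check for a function call, execute it, send the result back) can be sketched with a hard cap on turns so it cannot loop forever. This is a minimal sketch, assuming `ai` is a GoogleGenAI client, that `response.functionCalls` lists the requested calls, and that results are returned as `functionResponse` parts; the `runWithTools` name and the `handlers` map are hypothetical.

```javascript
// Manual function-calling loop with an upper bound on tool turns.
// ASSUMPTIONS: response.functionCalls is an array of { name, args }, and
// `handlers` maps function names to local implementations (hypothetical shape).
async function runWithTools(ai, { model, prompt, tools, handlers, maxTurns = 5 }) {
  const contents = [{ role: 'user', parts: [{ text: prompt }] }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await ai.models.generateContent({ model, contents, config: { tools } });
    const calls = response.functionCalls ?? [];
    if (calls.length === 0) return response.text; // model answered directly
    // Keep the model's function-call turn in the history, then answer each call.
    contents.push(response.candidates[0].content);
    const parts = [];
    for (const call of calls) {
      const handler = handlers[call.name];
      const result = handler
        ? await handler(call.args)
        : { error: `Unknown function: ${call.name}` };
      parts.push({ functionResponse: { name: call.name, response: result } });
    }
    contents.push({ role: 'user', parts });
  }
  throw new Error(`No final answer after ${maxTurns} tool turns`);
}
```

Because the loop owns the turn counter, the cap applies regardless of the function calling mode configured on the request.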

Issue #5: Structured Output Doesn't Preserve Escaped Backslashes (Gemini 3)

Error : JSON.parse fails on structured output, or keys with backslashes are incorrect
Source : GitHub Issue #1226
Why It Happens : When using responseMimeType: "application/json" with schema keys containing escaped backslashes (e.g., \\a for key \a), the model output doesn't preserve JSON escaping. It emits a single backslash, causing invalid JSON.

Prevention :

// Avoid using backslashes in JSON schema keys
// Or manually post-process if required:
let jsonText = response.text;
// Add custom escaping logic if needed

Affected : Gemini 3 models with structured output using backslashes in keys
Status : Known issue, workaround required

Issue #6: Large PDFs from S3 Signed URLs Fail with "Document has no pages"

Error : ApiError: {"error":{"code":400,"message":"The document has no pages.","status":"INVALID_ARGUMENT"}}
Source : GitHub Issue #1259
Why It Happens : Larger PDFs (e.g., 20MB) from AWS S3 signed URLs fail when passed via fileData.fileUri. The API cannot fetch or process the PDF from signed URLs.

Prevention :

// ❌ WRONG - Fails with large PDFs from S3:
contents: [{
  parts: [{
    fileData: {
      fileUri: 'https://bucket.s3.region.amazonaws.com/file.pdf?X-Amz-Algorithm=...'
    }
  }]
}]

// ✅ CORRECT - Fetch and encode to base64:
const pdfResponse = await fetch(signedUrl);
const pdfBuffer = await pdfResponse.arrayBuffer();
const base64Pdf = Buffer.from(pdfBuffer).toString('base64');

contents: [{
  parts: [{
    inlineData: {
      data: base64Pdf,
      mimeType: 'application/pdf'
    }
  }]
}]

Affected : PDF files from external signed URLs
Status : Known limitation, use base64 inline data instead

Issue #7: 404 NOT_FOUND with Uploaded Video on Gemini 3 Models

Error : 404 NOT_FOUND when using uploaded video files with Gemini 3 models
Source : GitHub Issue #1220
Why It Happens : Some Gemini 3 models (gemini-3-flash-preview, gemini-3-pro-preview) are not available in the free tier or have limited access even with paid accounts. Video file uploads fail with 404.

Prevention :

// ❌ WRONG - 404 error with Gemini 3:
const response = await ai.models.generateContent({
  model: 'gemini-3-pro-preview', // 404 error
  contents: [{
    parts: [
      { text: 'Describe this video' },
      { fileData: { fileUri: videoFile.uri }}
    ]
  }]
});

// ✅ CORRECT - Use Gemini 2.5 for video understanding:
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash', // Works
  contents: [{
    parts: [
      { text: 'Describe this video' },
      { fileData: { fileUri: videoFile.uri }}
    ]
  }]
});

Affected : Gemini 3 preview models with video uploads
Status : Known limitation, use Gemini 2.5 models for video

Issue #8: Batch API Returns 429 Despite Being Under Quota

Error : 429 RESOURCE_EXHAUSTED when using Batch API, even when under documented quota
Source : GitHub Issue #1264
Why It Happens : The Batch API may have dynamic rate limiting based on server load or undocumented limits beyond static quotas.

Prevention :

// Implement exponential backoff for Batch API:
async function batchWithRetry(request, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await ai.batches.create(request);
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}

Affected : Batch API users on paid tier
Status : Under investigation, use retry logic

Issue #9: Context Caching Only Works with Gemini 1.5 Models

Error : 404 NOT FOUND when creating caches with Gemini 2.0, 2.5, or 3.0 models
Source : GitHub Issue #339
Why It Happens : Context caching only supports Gemini 1.5 Pro and Gemini 1.5 Flash models. Documentation examples incorrectly show Gemini 2.0+ models.

Prevention :

// ❌ WRONG - 404 error:
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash', // Not supported
  config: { /* ... */ }
});

// ✅ CORRECT - Use Gemini 1.5 with explicit version:
const cache = await ai.caches.create({
  model: 'gemini-1.5-flash-001', // Explicit version required
  config: { /* ... */ }
});

Affected : All Gemini 2.x and 3.x users trying to use context caching
Status : Known limitation, only Gemini 1.5 models support caching

Issue #10: Structured Output Occasionally Returns Backticks Causing JSON.parse Error

Error : SyntaxError: Unexpected token '`' when parsing JSON responses
Source : GitHub Issue #976 (https://github.com/googleapis/js-genai/issues/976)
Why It Happens : When using responseMimeType: "application/json", the response occasionally wraps the JSON in markdown code-fence backticks (```json\n{...}\n```), which breaks JSON.parse().

Prevention :

// Strip markdown code fences before parsing:
let jsonText = response.text.trim();

if (jsonText.startsWith('```json')) {
  jsonText = jsonText.replace(/^```json\n/, '').replace(/\n```$/, '');
} else if (jsonText.startsWith('```')) {
  jsonText = jsonText.replace(/^```\n/, '').replace(/\n```$/, '');
}

const data = JSON.parse(jsonText);

Affected : All models when using structured output with responseMimeType: "application/json"
Status : Known intermittent issue, workaround required

Issue #11: Gemini 3 Temperature Below 1.0 Causes Looping/Degraded Reasoning

Error : Infinite loops or degraded reasoning quality on complex tasks
Source : Official Troubleshooting Docs
Why It Happens : Gemini 3 models are optimized for temperature 1.0. Lowering temperature below 1.0 may cause looping behavior or degraded performance on complex mathematical/reasoning tasks.

Prevention :

// ❌ WRONG - May cause issues with Gemini 3:
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Solve this complex math problem: ...',
  config: {
    temperature: 0.3 // May cause looping/degradation
  }
});

// ✅ CORRECT - Keep default temperature:
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Solve this complex math problem: ...',
  config: {
    temperature: 1.0 // Recommended for Gemini 3
  }
});
// Or omit temperature config entirely (uses default 1.0)

Affected : Gemini 3 series models
Status : Official recommendation, keep temperature at 1.0

Issue #12: Massive Rate Limit Reductions in December 2025 (Free Tier)

Error : Sudden 429 RESOURCE_EXHAUSTED errors after December 6, 2025
Source : LaoZhang AI Blog | HowToGeek
Why It Happens : Google reduced free tier rate limits by 80-90% without wide announcement, catching developers off guard.

Changes :

Gemini 2.5 Pro: 80% reduction in daily requests (100 RPD, was ~250)
Gemini 2.5 Flash: ~20 requests per day (was ~250) - 90% reduction
Free tier now impractical for production

Prevention :

// For production, upgrade to paid tier:
// https://ai.google.dev/pricing

// For free tier, implement aggressive rate limiting:
const rateLimiter = {
  requests: 0,
  resetTime: Date.now() + 24 * 60 * 60 * 1000,
  async checkLimit() {
    if (Date.now() > this.resetTime) {
      this.requests = 0;
      this.resetTime = Date.now() + 24 * 60 * 60 * 1000;
    }
    if (this.requests >= 20) {
      throw new Error('Daily limit reached');
    }
    this.requests++;
  }
};

await rateLimiter.checkLimit();
const response = await ai.models.generateContent({/* ... */});

Affected : Free tier users (December 6, 2025 onwards)
Status : Permanent change, upgrade to paid tier for production

Issue #13: Preview Models Have No SLAs and Can Change Without Warning

Error : Unexpected behavior changes, deprecation, or service interruptions
Source : Arsturn Blog | Official docs
Why It Happens : Preview and experimental models (e.g., gemini-2.5-flash-preview, gemini-3-pro-preview) have no service level agreements (SLAs) and are inherently unstable. Google can change or deprecate them with little notice.

Prevention :

// ❌ WRONG - Using preview models in production:
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-preview', // No SLA!
  contents: 'Production traffic'
});

// ✅ CORRECT - Use GA (generally available) models:
const re
