Gemini AI Image Generation Skill - Generate and edit website images with Google AI, with support for 4K resolution and text rendering

image-gen by jezweb/claude-skills

350 weekly installs

643 GitHub Stars

Install command

npx skills add https://github.com/jezweb/claude-skills --skill image-gen

AI/Machine Learning, Content Creation, Development


Image Generation Skill

Generate and edit website images using Gemini Native Image Generation.

⚠️ Critical: SDK Migration Required

IMPORTANT: The @google/generative-ai package is deprecated as of November 30, 2025. All new projects must use @google/genai.

Migration required:

// ❌ OLD (deprecated, support ended Nov 30, 2025)
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(API_KEY);

// ✅ NEW (required)
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: API_KEY });

Source: GitHub Repository Migration Notice

Models

Model	ID	Status	Best For
Gemini 3 Pro Image	`gemini-3-pro-image-preview`	Preview (Nov 20, 2025)	4K, complex prompts, text
Gemini 2.5 Flash Image	`gemini-2.5-flash-image`	GA (Oct 2, 2025)	Fast iteration, general use
Imagen 4.0	`imagen-4.0-generate-001`	GA (Aug 14, 2025)	Alternative platform

Deprecated Models (do not use):

gemini-2.0-flash-exp-image-generation - Shut down Nov 11, 2025
gemini-2.0-flash-preview-image-generation - Shut down Nov 11, 2025
gemini-2.5-flash-image-preview - Scheduled shutdown Jan 15, 2026

Source: Google AI Changelog
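
A small guard can keep a retired model ID from reaching the API. The following is only a sketch: `RETIRED_MODELS` and `assertCurrentModel` are hypothetical names, and the IDs and dates are copied from the deprecated list above.

```typescript
// Hypothetical guard; IDs and dates are copied from the deprecated list above.
const RETIRED_MODELS = new Map<string, string>([
  ["gemini-2.0-flash-exp-image-generation", "shut down Nov 11, 2025"],
  ["gemini-2.0-flash-preview-image-generation", "shut down Nov 11, 2025"],
  ["gemini-2.5-flash-image-preview", "scheduled shutdown Jan 15, 2026"],
]);

// Returns the ID unchanged when it is not on the deprecated list, otherwise throws.
function assertCurrentModel(modelId: string): string {
  const note = RETIRED_MODELS.get(modelId);
  if (note !== undefined) {
    throw new Error(`Model "${modelId}" is deprecated (${note}).`);
  }
  return modelId;
}
```

Calling `assertCurrentModel(...)` where the `model` field is built would surface a stale ID before any request is sent.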

Capabilities

Feature	Supported
Generate from text	✅
Edit existing images	✅
Change aspect ratio	✅
Widen/extend images	✅
Style transfer	✅
Change colours	✅
Add/remove elements	✅
Text in images	✅ (legible!)
Multiple reference images	✅ (up to 14: max 5 humans, 9 objects)
4K resolution	✅ (Pro only)

Note: Exceeding 5 human reference images causes unpredictable character consistency. Keep human images ≤ 5 for reliable results.

Aspect Ratios

1:1   | 2:3  | 3:2  | 3:4  | 4:3
4:5   | 5:4  | 9:16 | 16:9 | 21:9

Resolutions (Pro only)

Size	1:1	16:9	4:3
1K	1024x1024	1376x768	1184x880
2K	2048x2048	2752x1536	2368x1760
4K	4096x4096	5504x3072	4736x3520
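
The table can be encoded as a lookup so that layout code knows the pixel dimensions to expect. This is an illustrative sketch covering only the three ratios shown in the table; `PRO_DIMENSIONS` and `expectedDimensions` are hypothetical names.

```typescript
type ImageSize = "1K" | "2K" | "4K";
type TableRatio = "1:1" | "16:9" | "4:3";

// [width, height] pairs copied from the resolution table above.
const PRO_DIMENSIONS: Record<ImageSize, Record<TableRatio, [number, number]>> = {
  "1K": { "1:1": [1024, 1024], "16:9": [1376, 768], "4:3": [1184, 880] },
  "2K": { "1:1": [2048, 2048], "16:9": [2752, 1536], "4:3": [2368, 1760] },
  "4K": { "1:1": [4096, 4096], "16:9": [5504, 3072], "4:3": [4736, 3520] },
};

// Looks up the expected output dimensions for a size/ratio pair.
function expectedDimensions(size: ImageSize, ratio: TableRatio): { width: number; height: number } {
  const [width, height] = PRO_DIMENSIONS[size][ratio];
  return { width, height };
}
```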

Quick Start

import * as fs from "node:fs";
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Generate new image
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "A professional plumber in hi-vis working in modern Australian home",
  config: {
    responseModalities: ["TEXT", "IMAGE"],  // BOTH required - cannot use ["IMAGE"] alone
    imageGenerationConfig: {
      aspectRatio: "16:9",
    },
  },
});

// Extract image
for (const part of response.candidates[0].content.parts) {
  if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("hero.png", buffer);
  }
}

Important: responseModalities must include both ["TEXT", "IMAGE"]. Using ["IMAGE"] alone may fail or produce unexpected results.
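
Because the response mixes text and image parts, the extraction loop has to tolerate missing candidates or parts. The sketch below is one defensive way to do that. The interfaces are a minimal assumed shape mirroring only the fields the Quick Start reads, `extractInlineImages` is a hypothetical helper, and the `"image/png"` fallback is an assumption. Unlike the Quick Start loop, it scans every candidate rather than only the first.

```typescript
// Assumed minimal response shape; only the fields read by the Quick Start loop.
interface InlineDataLike { data?: string; mimeType?: string }
interface PartLike { text?: string; inlineData?: InlineDataLike }
interface ResponseLike { candidates?: { content?: { parts?: PartLike[] } }[] }

interface ExtractedImage { data: string; mimeType: string }

// Collects base64 image payloads, skipping text parts and empty candidates.
function extractInlineImages(response: ResponseLike): ExtractedImage[] {
  const images: ExtractedImage[] = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      const data = part.inlineData?.data;
      if (data) {
        // Fallback MIME type is an assumption for parts that omit it.
        images.push({ data, mimeType: part.inlineData?.mimeType ?? "image/png" });
      }
    }
  }
  return images;
}
```

Each returned `data` string can then be decoded with `Buffer.from(data, "base64")` and written to disk as in the Quick Start.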

Model Selection

Requirement	Use
Fast iteration	Gemini 2.5 Flash Image
4K resolution	Gemini 3 Pro Image Preview
Text in images	Gemini 3 Pro (94% legibility at 4K)
Simple edits	Gemini 2.5 Flash Image
Complex compositions	Gemini 3 Pro Image Preview
Infographics/diagrams	Gemini 3 Pro Image Preview
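
The selection table reduces to a single rule: any requirement in a Pro row selects the Pro model, otherwise Flash is used. A minimal sketch of that rule, with `ImageRequirements` and `pickImageModel` as hypothetical names:

```typescript
// Each flag corresponds to a row in the table above that maps to the Pro model.
interface ImageRequirements {
  needs4K?: boolean;
  needsLegibleText?: boolean;
  complexComposition?: boolean;
  infographic?: boolean;
}

// Returns the model ID to use for the given requirements.
function pickImageModel(req: ImageRequirements): string {
  const needsPro =
    req.needs4K || req.needsLegibleText || req.complexComposition || req.infographic;
  return needsPro ? "gemini-3-pro-image-preview" : "gemini-2.5-flash-image";
}
```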

Text Rendering Benchmarks (4K resolution):

Gemini 3 Pro Image: 94% legible text
DALL-E 3: 78% legible text
Midjourney: Decorative pseudo-text only

When to Use

Use Gemini Image Gen when:

Stock photos don't fit brand/context
Need Australian-specific imagery
Need text in images (infographics, diagrams)
Need consistent style across multiple images
Need to edit/modify existing images
Client has no photos of their work

Don't use when:

Client has good photos of actual work
Real team photos needed (discuss first)
Product shots (use real products)
Legal/compliance concerns

Known Issues Prevention

This skill prevents 5 documented issues:

Issue #1: Resolution Parameter Case Sensitivity

Error: Request fails with an invalid parameter error.
Source: Google AI Image Generation Docs
Why It Happens: Resolution values are case-sensitive and must use an uppercase 'K'.
Prevention: Always use "4K", "2K", or "1K"; never lowercase "4k".

// ❌ WRONG - causes request failure
config: { imageGenerationConfig: { resolution: "4k" } }

// ✅ CORRECT - uppercase required
config: { imageGenerationConfig: { resolution: "4K" } }
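
When the resolution comes from user input or a config file, normalizing it once avoids the lowercase pitfall. A minimal sketch; `normalizeResolution` is a hypothetical helper, not part of the SDK.

```typescript
type Resolution = "1K" | "2K" | "4K";

// Trims and uppercases the input, then rejects anything outside the three documented values.
function normalizeResolution(input: string): Resolution {
  const value = input.trim().toUpperCase();
  if (value === "1K" || value === "2K" || value === "4K") {
    return value;
  }
  throw new Error(`Unsupported resolution "${input}"; expected 1K, 2K, or 4K.`);
}
```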

Issue #2: Aspect Ratio May Be Ignored (Sept 2025+)

Error: Returns a 1:1 square image despite requesting 16:9 or another ratio.
Source: Google Support Thread
Why It Happens: A backend update in September 2025 affected the Gemini 2.5 Flash Image model's aspect ratio handling.
Prevention: Use Gemini 3 Pro Image Preview for reliable aspect ratio control, or generate 1:1 and use multi-turn editing to extend.

// May ignore aspectRatio on Gemini 2.5 Flash Image
model: "gemini-2.5-flash-image",
config: { imageGenerationConfig: { aspectRatio: "16:9" } }

// More reliable for aspect ratio control
model: "gemini-3-pro-image-preview",
config: { imageGenerationConfig: { aspectRatio: "16:9" } }

Status: Google confirmed it is working on a fix (Sept 2025).
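
Since the model may silently return a square image, it can help to check the result before using it. The sketch below assumes the image's pixel width and height are already known (for example from an image library, not shown). `matchesAspectRatio` and the 2% default tolerance are assumptions; the tolerance is there because the documented sizes, such as 1376x768, are only approximately 16:9.

```typescript
// Returns true when width/height is within `tolerance` (relative) of the requested "W:H" ratio.
function matchesAspectRatio(
  width: number,
  height: number,
  requested: string,
  tolerance = 0.02,
): boolean {
  const [w, h] = requested.split(":").map(Number);
  // Rejects malformed ratios (missing, zero, or non-numeric parts) and non-positive dimensions.
  if (!w || !h || width <= 0 || height <= 0) return false;
  const expected = w / h;
  const actual = width / height;
  return Math.abs(actual - expected) / expected <= tolerance;
}
```

A `false` result is a signal to retry with the Pro model or fall back to the 1:1 plus multi-turn extension approach described above.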

Issue #3: Exceeding 5 Human Reference Images

Error: Unpredictable character consistency in generated images.
Source: Google AI Image Generation Docs
Why It Happens: Gemini 3 Pro Image supports up to 14 reference images in total, but only 5 of them can be human images if character consistency is to hold.
Prevention: Limit human images to 5 or fewer. Use the remaining slots (up to 14 total) for objects/scenes.

// ❌ WRONG - 7 human images exceeds the limit
// (img1..img7 stand for base64-encoded PNG strings)
const tooManyHumans = [img1, img2, img3, img4, img5, img6, img7];
const badPrompt = [
  { text: "Generate consistent characters" },
  ...tooManyHumans.map(img => ({ inlineData: { data: img, mimeType: "image/png" }})),
];

// ✅ CORRECT - max 5 human images
// (assumes `images` lists human references first, then objects)
const humanImages = images.slice(0, 5);  // Limit to 5
const objectImages = images.slice(5, 14);  // Up to 9 more for objects
const prompt = [
  { text: "Generate consistent characters" },
  ...humanImages.map(img => ({ inlineData: { data: img, mimeType: "image/png" }})),
  ...objectImages.map(img => ({ inlineData: { data: img, mimeType: "image/png" }})),
];
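
Slicing by position only works when the human references come first in the array. If each reference is tagged with its kind, the caps can be applied explicitly. A sketch under that assumption: the `ReferenceImage` shape and `capReferenceImages` are hypothetical, and the limits are the 5 human and 9 object figures quoted above.

```typescript
type ReferenceKind = "human" | "object";

interface ReferenceImage {
  kind: ReferenceKind;
  data: string;      // base64-encoded image
  mimeType: string;
}

const MAX_HUMAN_REFERENCES = 5;
const MAX_OBJECT_REFERENCES = 9;

// Keeps at most 5 human and 9 object references, so the total never exceeds 14.
function capReferenceImages(refs: ReferenceImage[]): ReferenceImage[] {
  const humans = refs.filter((r) => r.kind === "human").slice(0, MAX_HUMAN_REFERENCES);
  const objects = refs.filter((r) => r.kind === "object").slice(0, MAX_OBJECT_REFERENCES);
  return [...humans, ...objects];
}
```

The result can be mapped to `inlineData` parts exactly as in the snippet above.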

Issue #4: SynthID Watermark Cannot Be Disabled

Error: N/A (documented limitation)
Source: Google AI Image Generation Docs
Why It Happens: All generated images automatically include a SynthID watermark for content authenticity tracking.
Prevention: Be aware of this limitation for commercial use cases. Developers cannot disable the watermark.

Issue #5: Google Search Grounding Excludes Image Results

Error: Generated images reflect only text search results, not visual ones.
Source: Google AI Image Generation Docs
Why It Happens: When using the Google Search tool with image generation, "image-based search results are not passed to the generation model."
Prevention: Only text-based search results inform the visual output. Don't expect the model to reference images from search results.

// Google Search tool enabled
const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Generate image of latest iPhone design",
  config: {
    responseModalities: ["TEXT", "IMAGE"],
    tools: [{ googleSearch: {} }],  // in @google/genai, tools are passed inside config
  },
});
// Result: Only text search results used, not image results from web search

Pricing

Current Pricing (as of November 2025):

Gemini 2.5 Flash Image: ~$0.039 per image
- Input: 258 tokens per image
- Output: 1290 tokens per image
- Rate: $30.00 per 1M output tokens

Note: The generateImages API (Imagen models) does not return usageMetadata in responses. Track costs manually based on the pricing above.
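
For manual tracking, the output cost follows directly from the token count and rate listed above: output tokens per image times the per-token rate. A minimal sketch; `estimateOutputCostUsd` is a hypothetical helper, and it ignores input tokens and any text output.

```typescript
// Figures copied from the pricing list above.
const OUTPUT_TOKENS_PER_IMAGE = 1290;
const USD_PER_MILLION_OUTPUT_TOKENS = 30.0;

// Estimated output-token cost in USD for a number of generated images.
function estimateOutputCostUsd(imageCount: number): number {
  const tokens = imageCount * OUTPUT_TOKENS_PER_IMAGE;
  return (tokens * USD_PER_MILLION_OUTPUT_TOKENS) / 1e6;
}
```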

Source: Google Developers Blog - Gemini 2.5 Flash Image

Reference Files

references/prompting.md - Effective prompt patterns
references/website-images.md - Hero, service, background templates
references/editing.md - Multi-turn editing patterns
references/local-imagery.md - Australian-specific details
references/integration.md - API code examples

Last verified: 2026-01-21 | Skill version: 2.0.0 | Changes: Added SDK migration notice (critical), updated to current model names (gemini-3-pro-image-preview, gemini-2.5-flash-image), added 5 Known Issues (resolution case sensitivity, aspect ratio bug, reference image limits, SynthID watermark, Google Search grounding), added pricing section, added text rendering benchmarks.

First Seen

Jan 20, 2026

Security Audits

Gen Agent Trust Hub: Fail | Socket: Pass | Snyk: Warn

Installed on

claude-code: 284
gemini-cli: 230
opencode: 227
cursor: 217
antigravity: 211
codex: 202
