ai-image-generator by jezweb/claude-skills
```shell
npx skills add https://github.com/jezweb/claude-skills --skill ai-image-generator
```
Generate images using AI APIs (Google Gemini and OpenAI GPT). This skill teaches the prompting patterns and API mechanics for producing professional images directly from Claude Code.
Managed alternative: If you don't want to manage API keys, ImageBot provides a managed image generation service with album templates and brand kit support.
Choose the right model for the job:
| Need | Model | Why |
|---|---|---|
| Scenes / stock photos | Gemini 3.1 Flash Image | Best depth, complexity, environmental context |
| Transparent icons / logos | GPT Image 1.5 | Native RGBA alpha channel (background: "transparent") |
| Text on images | GPT Image 1.5 | 90% accurate text rendering |
| Drafts / iteration | Gemini 2.5 Flash Image | Free tier (~500/day) |
| Final client assets | Gemini 3 Pro Image | Higher detail, better style consistency |
| Model | API ID | Provider |
|---|---|---|
| Gemini 3.1 Flash Image | gemini-3.1-flash-image-preview | Google AI |
| Gemini 3 Pro Image | gemini-3-pro-image-preview | Google AI |
| Gemini 2.5 Flash Image | gemini-2.5-flash-image | Google AI |
| GPT Image 1.5 | gpt-image-1.5 | OpenAI |
Verify model IDs before use — they change frequently:
```shell
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" \
  | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models'] if 'image' in m['name'].lower()]"
```
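The same check applies on the OpenAI side: `GET https://api.openai.com/v1/models` with a Bearer token returns the account's available models. A small sketch of the filtering step, shown here against a stubbed response shape so it runs offline:

```python
def image_models(models_response: dict) -> list:
    """Filter an OpenAI /v1/models response down to image-related model IDs."""
    return sorted(
        m["id"] for m in models_response.get("data", [])
        if "image" in m["id"] or "dall" in m["id"]
    )

# Stubbed response with the same {"data": [{"id": ...}]} shape as the real endpoint
sample = {"data": [{"id": "gpt-image-1.5"}, {"id": "gpt-4o"}, {"id": "dall-e-3"}]}
print(image_models(sample))  # → ['dall-e-3', 'gpt-image-1.5']
```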
Build prompts in this order for consistent results:
1. Set the genre: "A photorealistic photograph", "An isometric illustration", "A flat vector icon"
2. Who or what, with specific details: "of a warm, approachable Australian woman in her early 30s, smiling naturally"
3. Setting and spatial relationships: "in a bright modern home with terracotta decor on wooden shelves behind her"
4. Camera and lighting: "Shot at 85mm f/2.0, natural window light, head and shoulders framing"
5. What to exclude: "Photorealistic, no text, no watermarks, no logos"
BAD — keyword soup:

```text
professional woman, spa, warm lighting, high quality, 4K
```

GOOD — narrative direction:

```text
A professional skin treatment scene in a warm clinical setting.
A practitioner wearing blue medical gloves uses a microneedling pen
on the client's forehead. The client lies on a white treatment bed,
eyes closed, relaxed. Warm golden-hour light from a window to the
left. Terracotta-toned wall visible in the background. Shot at
85mm f/2.0, shallow depth of field. No text, no watermarks.
```
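The five-part framework above can be assembled mechanically. This helper (the function name is my own, not part of the skill) just joins the parts in order, which keeps each section reviewable on its own:

```python
def build_prompt(genre, subject, setting, camera, exclusions):
    """Join the five framework parts into one narrative prompt."""
    return " ".join([genre, subject, setting, camera, exclusions])

prompt = build_prompt(
    "A photorealistic photograph",
    "of a warm, approachable Australian woman in her early 30s, smiling naturally,",
    "in a bright modern home with terracotta decor on wooden shelves behind her.",
    "Shot at 85mm f/2.0, natural window light, head and shoulders framing.",
    "No text, no watermarks, no logos.",
)
print(prompt)
```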
| Purpose | Aspect Ratio | Model |
|---|---|---|
| Hero banner | 16:9 or 21:9 | Gemini |
| Service card | 4:3 or 3:4 | Gemini |
| Profile / avatar | 1:1 | Gemini |
| Icon / badge | 1:1 | GPT (transparent) |
| OG / social share | 1.91:1 | Gemini |
| Instagram post | 1:1 or 4:5 | Gemini |
| Mobile hero | 9:16 | Gemini |
Use the 5-part framework. Refer to references/prompting-guide.md for detailed photography parameters.
```shell
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    print("Set GEMINI_API_KEY environment variable"); sys.exit(1)

model = "gemini-3.1-flash-image-preview"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={GEMINI_API_KEY}"

prompt = """A professional photograph of a modern co-working space in
Newcastle, Australia. Natural light floods through floor-to-ceiling
windows. Three people collaborate at a standing desk — one pointing
at a laptop screen. Exposed brick wall, potted fiddle-leaf fig,
coffee cups on the desk. Shot at 35mm f/4.0, environmental portrait
style. No text, no watermarks, no logos."""

payload = json.dumps({
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "temperature": 0.8
    }
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "User-Agent": "ImageGen/1.0"
})
resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

# Extract the first inline image from the response parts
for part in result["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_data = base64.b64decode(part["inlineData"]["data"])
        output_path = "hero-image.png"
        with open(output_path, "wb") as f:
            f.write(img_data)
        print(f"Saved: {output_path} ({len(img_data):,} bytes)")
        break
PYEOF
```
```shell
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    print("Set OPENAI_API_KEY environment variable"); sys.exit(1)

url = "https://api.openai.com/v1/images/generations"
payload = json.dumps({
    "model": "gpt-image-1.5",
    "prompt": "A minimal, clean plumbing wrench icon. Flat design, single consistent stroke weight, modern style. On a transparent background.",
    "n": 1,
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
})
resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

img_data = base64.b64decode(result["data"][0]["b64_json"])
with open("icon-wrench.png", "wb") as f:
    f.write(img_data)
print(f"Saved: icon-wrench.png ({len(img_data):,} bytes)")
PYEOF
```
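Both scripts call `urlopen` once and give up on any failure. Image endpoints rate-limit aggressively, so a retry wrapper is worth having; this sketch (my own helper, not part of the skill) backs off on 429 and transient 5xx responses:

```python
import json
import time
import urllib.error
import urllib.request

def post_json(url: str, payload: dict, headers: dict, retries: int = 3) -> dict:
    """POST a JSON payload, retrying with exponential backoff on 429/5xx."""
    data = json.dumps(payload).encode()
    for attempt in range(retries):
        req = urllib.request.Request(url, data=data, headers=headers)
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as e:
            if e.code in (429, 500, 502, 503) and attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between attempts
                continue
            raise
```

Swap it in for the single `urlopen` call in either script; connection-level errors (`URLError`) still propagate immediately, which is usually what you want for a bad key or hostname.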
Save generated images to .jez/artifacts/ or the user's specified path.
Post-processing (optional):
```shell
# Convert to WebP for web use
python3 -c "
from PIL import Image
img = Image.open('hero-image.png')
img.save('hero-image.webp', 'WEBP', quality=85)
print(f'WebP: {img.size[0]}x{img.size[1]}')
"

# Trim transparent padding from icons (getbbox finds the non-empty region)
python3 -c "
from PIL import Image
img = Image.open('icon.png')
trimmed = img.crop(img.getbbox())
trimmed.save('icon-trimmed.png')
"
```
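A quick dimension sanity check after generation doesn't need Pillow at all: a PNG stores its width and height in the fixed-position IHDR chunk, so the standard library can read them. A minimal sketch (helper name is my own):

```python
import struct

def png_size(data: bytes) -> tuple:
    """Read (width, height) from a PNG's IHDR chunk without Pillow."""
    if data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    # Bytes 16-23 are big-endian width then height
    return struct.unpack(">II", data[16:24])

# Fabricated header for illustration: a 1024x1024 image
header = (b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">II", 1024, 1024) + b"\x08\x06\x00\x00\x00")
print(png_size(header))  # → (1024, 1024)
```

Pass it the first 33 bytes of a saved file, e.g. `png_size(open("hero-image.png", "rb").read(33))`, to confirm the model returned the size you requested.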
Send the generated image back to a vision model for QA:
```python
# Send to Gemini Flash alongside the image for critique
critique_prompt = """Review this image for:
1. AI artifacts (extra fingers, floating objects, text errors)
2. Technical accuracy (wrong equipment, unsafe positioning)
3. Composition issues (awkward cropping, cluttered background)
4. Style consistency with a professional stock photo
List any issues found, or say 'PASS' if the image is production-ready."""
```
If issues are found, append them as negative guidance to the original prompt and regenerate.
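The append-and-regenerate step is simple enough to fold into a helper. This sketch (my own naming) turns critique findings into an exclusion clause on the original prompt:

```python
def with_negative_guidance(prompt: str, issues: list) -> str:
    """Fold critique findings back into the prompt as explicit exclusions."""
    if not issues:
        return prompt
    return prompt.rstrip() + " Avoid: " + "; ".join(issues) + "."

p = with_negative_guidance("A kitchen scene. No text.", ["extra fingers", "floating mug"])
print(p)  # → A kitchen scene. No text. Avoid: extra fingers; floating mug.
```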
Gemini supports editing a generated image across conversation turns. The key requirement: preserve thought signatures from model responses.
```python
# Turn 1: generate the base image
contents = [{"role": "user", "parts": [{"text": "Scene prompt..."}]}]

# The response includes thoughtSignature on parts — preserve them ALL

# Turn 2: edit the image
contents = [
    {"role": "user", "parts": [{"text": "Original prompt"}]},
    {"role": "model", "parts": response_parts_with_signatures},  # keep intact
    {"role": "user", "parts": [{"text": "Edit: change the wall colour to blue. Keep everything else exactly the same."}]}
]
```
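Because the model turn must go back byte-for-byte, it is worth building the edit turn with a function that never touches the model parts. A sketch (function name is my own):

```python
def edit_turn(original_prompt: str, model_parts: list, edit_instruction: str) -> list:
    """Build the multi-turn `contents` list, passing the model's parts through
    untouched so any thoughtSignature fields on them survive."""
    return [
        {"role": "user", "parts": [{"text": original_prompt}]},
        {"role": "model", "parts": model_parts},  # do not strip or rewrite
        {"role": "user", "parts": [{"text": edit_instruction}]},
    ]

turns = edit_turn(
    "Original prompt",
    [{"text": "ok", "thoughtSignature": "sig"}],  # as returned by turn 1
    "Edit: change the wall colour to blue. Keep everything else exactly the same.",
)
print(turns[1]["parts"][0]["thoughtSignature"])  # → sig
```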
Edit prompt pattern: Always specify what to KEEP unchanged, not just what to change. The model treats unlisted elements as free to modify.
```text
GOOD: "Edit this image: keep the people, desk, and window unchanged.
Only change: wall colour from terracotta to ocean blue."

BAD: "Now make the wall blue."
(The model may change everything else too.)
```
| Provider | Get key at | Env variable |
|---|---|---|
| Google Gemini | aistudio.google.com | GEMINI_API_KEY |
| OpenAI | platform.openai.com | OPENAI_API_KEY |
```shell
export GEMINI_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"
```
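Both generation scripts repeat the same key check inline; a shared fail-fast helper (my own sketch) keeps the error message consistent:

```python
import os
import sys

def require_key(name: str) -> str:
    """Exit with a clear message if an API key environment variable is unset."""
    value = os.environ.get(name, "")
    if not value:
        sys.exit(f"Set {name} before running (see the provider table above).")
    return value

# Usage inside a script:
#   GEMINI_API_KEY = require_key("GEMINI_API_KEY")
```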
| Mistake | Fix |
|---|---|
| Using curl for Gemini prompts | Use Python — shell escaping breaks on apostrophes |
| "Beautiful, professional, high quality" | Use concrete specs: "85mm f/1.8, golden hour light" |
| Not specifying what to exclude | Always end with "No text, no watermarks, no logos" |
| Requesting transparent PNG from Gemini | Gemini cannot do transparency — use GPT with background: "transparent" |
| American defaults for AU businesses | Explicitly specify "Australian" + local architecture, vegetation |
| Relying on memorized model IDs | Verify current model IDs — they change frequently |
Weekly installs: 178 · GitHub stars: 643 · First seen: 11 days ago
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: github-copilot (173), opencode (173), kimi-cli (172), gemini-cli (172), amp (172), cline (172)