ai-image-generator by jezweb/claude-skills
```shell
npx skills add https://github.com/jezweb/claude-skills --skill ai-image-generator
```
Generate images using AI APIs (Google Gemini and OpenAI GPT). This skill teaches the prompting patterns and API mechanics for producing professional images directly from Claude Code.
Managed alternative: If you don't want to manage API keys, ImageBot provides a managed image generation service with album templates and brand kit support.
Choose the right model for the job:
| Need | Model | Why |
|---|---|---|
| Scenes / stock photos | Gemini 3.1 Flash Image | Best depth, complexity, environmental context |
| Transparent icons / logos | GPT Image 1.5 | Native RGBA alpha channel (background: "transparent") |
| Text on images | GPT Image 1.5 | 90% accurate text rendering |
| Drafts / iteration | Gemini 2.5 Flash Image | Free tier (~500/day) |
| Final client assets | Gemini 3 Pro Image | Higher detail, better style consistency |
| Model | API ID | Provider |
|---|---|---|
| Gemini 3.1 Flash Image | gemini-3.1-flash-image-preview | Google AI |
| Gemini 3 Pro Image | gemini-3-pro-image-preview | Google AI |
| Gemini 2.5 Flash Image | gemini-2.5-flash-image | Google AI |
| GPT Image 1.5 | gpt-image-1.5 | OpenAI |
Verify model IDs before use — they change frequently:
```shell
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" \
  | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models'] if 'image' in m['name'].lower()]"
```
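The same check applies on the OpenAI side: `GET https://api.openai.com/v1/models` with a Bearer token returns the account's available models. A small sketch of the filtering step, shown here against a stubbed response shape so it runs offline:

```python
def image_models(models_response: dict) -> list:
    """Filter an OpenAI /v1/models response down to image-related model IDs."""
    return sorted(
        m["id"] for m in models_response.get("data", [])
        if "image" in m["id"] or "dall" in m["id"]
    )

# Stubbed response with the same {"data": [{"id": ...}]} shape as the real endpoint
sample = {"data": [{"id": "gpt-image-1.5"}, {"id": "gpt-4o"}, {"id": "dall-e-3"}]}
print(image_models(sample))  # → ['dall-e-3', 'gpt-image-1.5']
```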
Build prompts in this order for consistent results:
1. Set the genre: "A photorealistic photograph", "An isometric illustration", "A flat vector icon"
2. Who or what, with specific details: "of a warm, approachable Australian woman in her early 30s, smiling naturally"
3. Setting and spatial relationships: "in a bright modern home with terracotta decor on wooden shelves behind her"
4. Camera and lighting: "Shot at 85mm f/2.0, natural window light, head and shoulders framing"
5. What to exclude: "Photorealistic, no text, no watermarks, no logos"
BAD — keyword soup:

```text
professional woman, spa, warm lighting, high quality, 4K
```

GOOD — narrative direction:

```text
A professional skin treatment scene in a warm clinical setting.
A practitioner wearing blue medical gloves uses a microneedling pen
on the client's forehead. The client lies on a white treatment bed,
eyes closed, relaxed. Warm golden-hour light from a window to the
left. Terracotta-toned wall visible in the background. Shot at
85mm f/2.0, shallow depth of field. No text, no watermarks.
```
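The five-part framework above can be assembled mechanically. This helper (the function name is my own, not part of the skill) just joins the parts in order, which keeps each section reviewable on its own:

```python
def build_prompt(genre, subject, setting, camera, exclusions):
    """Join the five framework parts into one narrative prompt."""
    return " ".join([genre, subject, setting, camera, exclusions])

prompt = build_prompt(
    "A photorealistic photograph",
    "of a warm, approachable Australian woman in her early 30s, smiling naturally,",
    "in a bright modern home with terracotta decor on wooden shelves behind her.",
    "Shot at 85mm f/2.0, natural window light, head and shoulders framing.",
    "No text, no watermarks, no logos.",
)
print(prompt)
```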
| Purpose | Aspect Ratio | Model |
|---|---|---|
| Hero banner | 16:9 or 21:9 | Gemini |
| Service card | 4:3 or 3:4 | Gemini |
| Profile / avatar | 1:1 | Gemini |
| Icon / badge | 1:1 | GPT (transparent) |
| OG / social share | 1.91:1 | Gemini |
| Instagram post | 1:1 or 4:5 | Gemini |
| Mobile hero | 9:16 | Gemini |
Use the 5-part framework. Refer to references/prompting-guide.md for detailed photography parameters.
```shell
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    print("Set GEMINI_API_KEY environment variable"); sys.exit(1)

model = "gemini-3.1-flash-image-preview"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={GEMINI_API_KEY}"

prompt = """A professional photograph of a modern co-working space in
Newcastle, Australia. Natural light floods through floor-to-ceiling
windows. Three people collaborate at a standing desk — one pointing
at a laptop screen. Exposed brick wall, potted fiddle-leaf fig,
coffee cups on the desk. Shot at 35mm f/4.0, environmental portrait
style. No text, no watermarks, no logos."""

payload = json.dumps({
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "temperature": 0.8
    }
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "User-Agent": "ImageGen/1.0"
})
resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

# Extract the first inline image from the response parts
for part in result["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_data = base64.b64decode(part["inlineData"]["data"])
        output_path = "hero-image.png"
        with open(output_path, "wb") as f:
            f.write(img_data)
        print(f"Saved: {output_path} ({len(img_data):,} bytes)")
        break
PYEOF
```
```shell
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    print("Set OPENAI_API_KEY environment variable"); sys.exit(1)

url = "https://api.openai.com/v1/images/generations"
payload = json.dumps({
    "model": "gpt-image-1.5",
    "prompt": "A minimal, clean plumbing wrench icon. Flat design, single consistent stroke weight, modern style. On a transparent background.",
    "n": 1,
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
})
resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

img_data = base64.b64decode(result["data"][0]["b64_json"])
with open("icon-wrench.png", "wb") as f:
    f.write(img_data)
print(f"Saved: icon-wrench.png ({len(img_data):,} bytes)")
PYEOF
```
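Both scripts call `urlopen` once and give up on any failure. Image endpoints rate-limit aggressively, so a retry wrapper is worth having; this sketch (my own helper, not part of the skill) backs off on 429 and transient 5xx responses:

```python
import json
import time
import urllib.error
import urllib.request

def post_json(url: str, payload: dict, headers: dict, retries: int = 3) -> dict:
    """POST a JSON payload, retrying with exponential backoff on 429/5xx."""
    data = json.dumps(payload).encode()
    for attempt in range(retries):
        req = urllib.request.Request(url, data=data, headers=headers)
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as e:
            if e.code in (429, 500, 502, 503) and attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between attempts
                continue
            raise
```

Swap it in for the single `urlopen` call in either script; connection-level errors (`URLError`) still propagate immediately, which is usually what you want for a bad key or hostname.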
Save generated images to .jez/artifacts/ or the user's specified path.
Post-processing (optional):
```shell
# Convert to WebP for web use
python3 -c "
from PIL import Image
img = Image.open('hero-image.png')
img.save('hero-image.webp', 'WEBP', quality=85)
print(f'WebP: {img.size[0]}x{img.size[1]}')
"

# Trim transparent padding from icons (getbbox finds the non-empty region)
python3 -c "
from PIL import Image
img = Image.open('icon.png')
trimmed = img.crop(img.getbbox())
trimmed.save('icon-trimmed.png')
"
```
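A quick dimension sanity check after generation doesn't need Pillow at all: a PNG stores its width and height in the fixed-position IHDR chunk, so the standard library can read them. A minimal sketch (helper name is my own):

```python
import struct

def png_size(data: bytes) -> tuple:
    """Read (width, height) from a PNG's IHDR chunk without Pillow."""
    if data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    # Bytes 16-23 are big-endian width then height
    return struct.unpack(">II", data[16:24])

# Fabricated header for illustration: a 1024x1024 image
header = (b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">II", 1024, 1024) + b"\x08\x06\x00\x00\x00")
print(png_size(header))  # → (1024, 1024)
```

Pass it the first 33 bytes of a saved file, e.g. `png_size(open("hero-image.png", "rb").read(33))`, to confirm the model returned the size you requested.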
Send the generated image back to a vision model for QA:
```python
# Send to Gemini Flash alongside the image for critique
critique_prompt = """Review this image for:
1. AI artifacts (extra fingers, floating objects, text errors)
2. Technical accuracy (wrong equipment, unsafe positioning)
3. Composition issues (awkward cropping, cluttered background)
4. Style consistency with a professional stock photo
List any issues found, or say 'PASS' if the image is production-ready."""
```
If issues are found, append them as negative guidance to the original prompt and regenerate.
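The append-and-regenerate step is simple enough to fold into a helper. This sketch (my own naming) turns critique findings into an exclusion clause on the original prompt:

```python
def with_negative_guidance(prompt: str, issues: list) -> str:
    """Fold critique findings back into the prompt as explicit exclusions."""
    if not issues:
        return prompt
    return prompt.rstrip() + " Avoid: " + "; ".join(issues) + "."

p = with_negative_guidance("A kitchen scene. No text.", ["extra fingers", "floating mug"])
print(p)  # → A kitchen scene. No text. Avoid: extra fingers; floating mug.
```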
Gemini supports editing a generated image across conversation turns. The key requirement: preserve thought signatures from model responses.
```python
# Turn 1: generate the base image
contents = [{"role": "user", "parts": [{"text": "Scene prompt..."}]}]

# The response includes thoughtSignature on parts — preserve them ALL

# Turn 2: edit the image
contents = [
    {"role": "user", "parts": [{"text": "Original prompt"}]},
    {"role": "model", "parts": response_parts_with_signatures},  # keep intact
    {"role": "user", "parts": [{"text": "Edit: change the wall colour to blue. Keep everything else exactly the same."}]}
]
```
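Because the model turn must go back byte-for-byte, it is worth building the edit turn with a function that never touches the model parts. A sketch (function name is my own):

```python
def edit_turn(original_prompt: str, model_parts: list, edit_instruction: str) -> list:
    """Build the multi-turn `contents` list, passing the model's parts through
    untouched so any thoughtSignature fields on them survive."""
    return [
        {"role": "user", "parts": [{"text": original_prompt}]},
        {"role": "model", "parts": model_parts},  # do not strip or rewrite
        {"role": "user", "parts": [{"text": edit_instruction}]},
    ]

turns = edit_turn(
    "Original prompt",
    [{"text": "ok", "thoughtSignature": "sig"}],  # as returned by turn 1
    "Edit: change the wall colour to blue. Keep everything else exactly the same.",
)
print(turns[1]["parts"][0]["thoughtSignature"])  # → sig
```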
Edit prompt pattern: Always specify what to KEEP unchanged, not just what to change. The model treats unlisted elements as free to modify.
```text
GOOD: "Edit this image: keep the people, desk, and window unchanged.
Only change: wall colour from terracotta to ocean blue."

BAD: "Now make the wall blue."
(The model may change everything else too.)
```
| Provider | Get key at | Env variable |
|---|---|---|
| Google Gemini | aistudio.google.com | GEMINI_API_KEY |
| OpenAI | platform.openai.com | OPENAI_API_KEY |
```shell
export GEMINI_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"
```
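Both generation scripts repeat the same key check inline; a shared fail-fast helper (my own sketch) keeps the error message consistent:

```python
import os
import sys

def require_key(name: str) -> str:
    """Exit with a clear message if an API key environment variable is unset."""
    value = os.environ.get(name, "")
    if not value:
        sys.exit(f"Set {name} before running (see the provider table above).")
    return value

# Usage inside a script:
#   GEMINI_API_KEY = require_key("GEMINI_API_KEY")
```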
| Mistake | Fix |
|---|---|
| Using curl for Gemini prompts | Use Python — shell escaping breaks on apostrophes |
| "Beautiful, professional, high quality" | Use concrete specs: "85mm f/1.8, golden hour light" |
| Not specifying what to exclude | Always end with "No text, no watermarks, no logos" |
| Requesting transparent PNG from Gemini | Gemini cannot do transparency — use GPT with background: "transparent" |
| American defaults for AU businesses | Explicitly specify "Australian" + local architecture, vegetation |
| Relying on memorized model IDs | Verify current model IDs — they change frequently |
Weekly installs: 178 · GitHub stars: 643 · First seen: 11 days ago
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: github-copilot (173), opencode (173), kimi-cli (172), gemini-cli (172), amp (172), cline (172)