AI Image Generation Tool - Create cover images, illustrations, and concept art from text descriptions

image-gen by marswaveai/skills

460 weekly installs

34 GitHub Stars

Install command

npx skills add https://github.com/marswaveai/skills --skill image-gen

AI/Machine Learning, Content Creation, Design

When to Use

User wants to generate an AI image from a text description
User says "generate image", "draw", "create picture", "配图"
User says "生成图片", "画一张", "AI图"
User needs a cover image, illustration, or concept art

When NOT to Use

User wants to create audio content (use /podcast, /speech)
User wants to create a video (use /explainer)
User wants to edit an existing image (not supported)
User wants to extract content from a URL (use /content-parser)

Purpose

Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files.

Hard Constraints

No shell scripts. Construct curl commands from the API reference files listed under API Reference
Always read shared/authentication.md for API key and headers
Follow shared/common-patterns.md for error handling
Image generation uses a different base URL: https://api.marswave.ai/openapi/v1
Always read config following shared/config-pattern.md before any interaction
Output saved to .listenhub/image-gen/YYYY-MM-DD-{jobId}/ — never ~/Downloads/

Step -1: API Key Check

Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
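As a rough illustration of the stop-on-missing-key rule, the check might look like the sketch below. It assumes the key is exported as LISTENHUB_API_KEY, the variable used in the curl examples in this document; the authoritative check is whatever shared/config-pattern.md specifies, and require_api_key is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when an API key is available.
# Assumes the key is exported as LISTENHUB_API_KEY (as in the curl examples).
require_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set; stopping before any API call." >&2
    return 1
  fi
}
```

A caller would run `require_api_key || exit 1` before any other step.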

Step 0: Config Setup

Follow shared/config-pattern.md Step 0 (Zero-Question Boot).

If file doesn't exist — silently create with defaults and proceed:

mkdir -p ".listenhub/image-gen"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > ".listenhub/image-gen/config.json"
CONFIG_PATH=".listenhub/image-gen/config.json"
CONFIG=$(cat "$CONFIG_PATH")

Do NOT ask any setup questions. Proceed directly to the Interaction Flow.

If file exists — read config silently and proceed:

CONFIG_PATH=".listenhub/image-gen/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/image-gen/config.json"
CONFIG=$(cat "$CONFIG_PATH")
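The two branches above can be folded into one sequence. The sketch below is only one possible arrangement: it reuses the paths and default JSON from the commands above, and load_config is a hypothetical helper name, not something defined by shared/config-pattern.md.

```shell
# Hypothetical helper: locate the config, creating the default if none exists.
# Sets CONFIG_PATH and CONFIG in the calling shell.
load_config() {
  CONFIG_PATH=".listenhub/image-gen/config.json"
  # Fall back to the home-directory copy only when the project copy is absent.
  if [ ! -f "$CONFIG_PATH" ] && [ -f "$HOME/.listenhub/image-gen/config.json" ]; then
    CONFIG_PATH="$HOME/.listenhub/image-gen/config.json"
  fi
  # Neither copy exists: silently write the defaults into the project directory.
  if [ ! -f "$CONFIG_PATH" ]; then
    mkdir -p "$(dirname "$CONFIG_PATH")"
    echo '{"outputDir":".listenhub","outputMode":"inline"}' > "$CONFIG_PATH"
  fi
  CONFIG=$(cat "$CONFIG_PATH")
}
```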

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings using the Chinese template below ("Current config (image-gen)" with an "Output mode" line):

当前配置 (image-gen)：
  输出方式：{inline / download / both}

Then ask:

outputMode: Follow shared/output-mode.md § Setup Flow Question.

Save immediately:

NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")

Interaction Flow

Step 1: Image Description

Free text input. Ask the user:

Describe the image you want to generate.

If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
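The 10-word threshold can be checked mechanically. The sketch below counts whitespace-separated words with wc -w, which is an assumption about what "word" means here: a Chinese prompt written without spaces would count as a single word, so treat the result as a hint rather than a rule. is_short_prompt is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the prompt has fewer than 10
# whitespace-separated words.
is_short_prompt() {
  # The arithmetic expansion strips the padding some wc implementations add.
  word_count=$(( $(printf '%s\n' "$1" | wc -w) ))
  [ "$word_count" -lt 10 ]
}
```

For example, `is_short_prompt "cyberpunk city at night"` succeeds (4 words), so the agent would offer enrichment unless the user asked for verbatim generation.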

Step 2: Model

Ask:

Question: "Which model?"
Options:
  - "pro (recommended)" — gemini-3-pro-image-preview, higher quality
  - "flash" — gemini-3.1-flash-image-preview, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)

Step 3: Resolution and Aspect Ratio

Ask both together (independent parameters):

Question: "What resolution?"
Options:
  - "1K" — Standard quality
  - "2K (recommended)" — High quality, good balance
  - "4K" — Ultra high quality, slower generation



Question: "What aspect ratio?"
Options (all models):
  - "16:9" — Landscape, widescreen
  - "1:1" — Square
  - "9:16" — Portrait, phone screen
  - "Other" — 2:3, 3:2, 3:4, 4:3, 21:9

If flash model was selected, also offer: 1:4 (narrow portrait), 4:1 (wide landscape), 1:8 (extreme portrait), 8:1 (panoramic)
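The model-dependent ratio rule can be expressed as a small lookup. This is a sketch based only on the ratios listed in this step; ratio_allowed is a hypothetical helper name, and the API itself may accept or reject values differently.

```shell
# Hypothetical helper: ratio_allowed MODEL_CHOICE RATIO
# MODEL_CHOICE is "pro" or "flash"; succeeds when the ratio is offered for it.
ratio_allowed() {
  model_choice="$1"
  ratio="$2"
  case "$ratio" in
    # Offered for all models.
    16:9|1:1|9:16|2:3|3:2|3:4|4:3|21:9) return 0 ;;
    # Extreme ratios are offered only when flash was selected.
    1:4|4:1|1:8|8:1) [ "$model_choice" = "flash" ] ;;
    *) return 1 ;;
  esac
}
```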

Step 4: Reference Images (optional)

Question: "Any reference images for style guidance?"
Options:
  - "Yes, I have URL(s)" — Provide reference image URLs
  - "Yes, I have local file(s)" — Provide local file paths (base64 mode)
  - "No references" — Generate from prompt only

If URL mode: Collect URLs (comma-separated, max 14). For each URL, infer mimeType from suffix and build:

{ "fileData": { "fileUri": "<url>", "mimeType": "<inferred>" } }

Suffix mapping: .jpg/.jpeg → image/jpeg, .png → image/png, .webp → image/webp, .gif → image/gif
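The suffix mapping for URL references might be implemented along these lines. Both helper names are hypothetical; ignoring query strings and letter case is an assumption added for robustness, not something the mapping above states.

```shell
# Hypothetical helper: map a reference URL to a MIME type using the table above.
mime_for_url() {
  # Drop any query string or fragment so "cat.png?w=800" still ends in "png".
  url_path="${1%%[?#]*}"
  # Lowercase the text after the last dot so ".PNG" and ".png" match.
  url_suffix=$(printf '%s' "${url_path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$url_suffix" in
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png" ;;
    webp)     echo "image/webp" ;;
    gif)      echo "image/gif" ;;
    *)        return 1 ;;  # unknown suffix: the caller should ask the user
  esac
}

# Hypothetical helper: count comma-separated entries so the 14-item cap
# can be enforced before building the request.
ref_count() {
  printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}
```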

If local file (base64) mode: Collect file paths (comma-separated, max 14). For each file, encode to base64 and infer mimeType from suffix:

# macOS
BASE64_REF=$(base64 -i /path/to/image.png)
# Linux
BASE64_REF=$(base64 -w 0 /path/to/image.png)

Build:

{ "inlineData": { "data": "<base64-encoded>", "mimeType": "<inferred>" } }

Suffix mapping: .jpg/.jpeg → image/jpeg, .png → image/png, .webp → image/webp, .heic → image/heic, .heif → image/heif
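The macOS and Linux commands above differ only in how they name the input file and suppress line wrapping. One way to avoid the platform split, sketched here with a hypothetical helper name, is to feed the file on stdin and strip newlines afterwards.

```shell
# Hypothetical helper: print the base64 encoding of a file on a single line.
# Reading from stdin and deleting newlines sidesteps the -i / -w 0 difference.
encode_ref() {
  base64 < "$1" | tr -d '\n'
}
```

The single-line form matters because a wrapped payload would put raw newlines inside the JSON string.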

Step 5: Confirm & Generate

Summarize all choices:

Ready to generate image:

  Prompt: {prompt text}
  Model: {pro / flash}
  Resolution: {1K / 2K / 4K}
  Aspect ratio: {ratio}
  References: {yes — N URL(s) / yes — N local file(s) / no}

  Proceed?

Wait for explicit confirmation before calling the API.

Workflow

Build request: Construct JSON with provider, model, prompt, imageConfig, and optional referenceImages (URL-based via fileData or base64 via inlineData)
Encode local files (if base64 mode): For each local file path, encode to base64 and build inlineData objects
Submit: POST https://api.marswave.ai/openapi/v1/images/generation with timeout of 600s
Extract image: Parse base64 data from response
Decode and present result
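For the build step, one option is to let jq assemble the body so that quotes or backslashes in the user's prompt cannot break the JSON. The sketch below assumes jq is installed (the document already uses it elsewhere); the input variable names are hypothetical, and the field names are copied from the examples in this document rather than from shared/api-image.md.

```shell
# Hypothetical inputs; in the skill these come from Steps 1 to 3.
MODEL_CHOICE="pro"
PROMPT='a shop sign that says "open" at dawn'
IMAGE_SIZE="2K"
ASPECT_RATIO="16:9"

# Map the short model choice to the model ID named in Step 2.
case "$MODEL_CHOICE" in
  pro)   MODEL_ID="gemini-3-pro-image-preview" ;;
  flash) MODEL_ID="gemini-3.1-flash-image-preview" ;;
  *)     echo "unknown model choice: $MODEL_CHOICE" >&2; MODEL_ID="" ;;
esac

# jq -n builds the object from scratch; --arg passes each value as a JSON
# string, so special characters in the prompt are escaped automatically.
REQUEST_JSON=$(jq -n \
  --arg model "$MODEL_ID" \
  --arg prompt "$PROMPT" \
  --arg size "$IMAGE_SIZE" \
  --arg ratio "$ASPECT_RATIO" \
  '{provider: "google", model: $model, prompt: $prompt,
    imageConfig: {imageSize: $size, aspectRatio: $ratio}}')

printf '%s\n' "$REQUEST_JSON"
```

The resulting string can then be handed to curl as the request body in place of a hand-quoted literal.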

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

inline or both: Decode base64 to a temp file, then use the Read tool.

JOB_ID=$(date +%s)
echo "$BASE64_DATA" | base64 --decode > "/tmp/image-gen-${JOB_ID}.jpg"

Then use the Read tool on /tmp/image-gen-{jobId}.jpg. The image displays inline in the conversation.

Present (the message is Chinese for "Image generated!"):

图片已生成！

download or both: Save to the artifact directory.

JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"

Present (Chinese for "Image generated!", then "Saved to" with the directory and file name):

图片已生成！

已保存到 .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/：
  {jobId}.jpg
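The inline and download branches above differ only in where the decoded file goes. A compact way to see the mapping is the sketch below, which prints the target path or paths for a given mode. plan_outputs is a hypothetical helper name; the paths are the ones used in the commands above, and the actual behavior is defined by shared/output-mode.md.

```shell
# Hypothetical helper: plan_outputs MODE JOB_ID DATE
# Prints one target path per line for the given output mode.
plan_outputs() {
  out_mode="$1"
  out_job="$2"
  out_date="$3"
  inline_path="/tmp/image-gen-${out_job}.jpg"
  saved_path=".listenhub/image-gen/${out_date}-${out_job}/${out_job}.jpg"
  case "$out_mode" in
    inline)   printf '%s\n' "$inline_path" ;;
    download) printf '%s\n' "$saved_path" ;;
    both)     printf '%s\n%s\n' "$inline_path" "$saved_path" ;;
    *)        echo "unknown outputMode: $out_mode" >&2; return 1 ;;
  esac
}
```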

Base64 decoding (cross-platform):

# Linux (GNU coreutils) and macOS: the long option works on both
echo "$BASE64_DATA" | base64 --decode > output.jpg

# Linux short form
echo "$BASE64_DATA" | base64 -d > output.jpg

# macOS short form
echo "$BASE64_DATA" | base64 -D > output.jpg

Retry logic: On 429 (rate limit), wait 15 seconds and retry. Max 3 retries.
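The retry rule can be sketched as a loop around any command that prints an HTTP status code. retry_on_429 and RETRY_WAIT_SECONDS are hypothetical names introduced for illustration; the 15-second wait and the limit of 3 retries come from the rule above, and other error codes are left to shared/common-patterns.md.

```shell
# Hypothetical helper: retry_on_429 CMD [ARGS...]
# CMD must print an HTTP status code on stdout. The final code is echoed;
# the function fails if the request is still rate-limited after 3 retries.
retry_on_429() {
  max_retries=3
  wait_seconds="${RETRY_WAIT_SECONDS:-15}"
  retries_done=0
  while :; do
    http_code=$("$@")
    if [ "$http_code" != "429" ]; then
      printf '%s\n' "$http_code"
      return 0
    fi
    if [ "$retries_done" -ge "$max_retries" ]; then
      printf '%s\n' "$http_code"
      return 1
    fi
    retries_done=$((retries_done + 1))
    sleep "$wait_seconds"
  done
}

# Example wiring (not executed here): have curl write the body to a file and
# print only the status code, e.g.
#   send() { curl -sS -o response.json -w '%{http_code}' --max-time 600 ... ; }
#   retry_on_429 send
```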

Prompt Handling

Default: Pass the user's prompt directly without modification.

When to offer optimization:

Prompt is very short (a few words) AND user hasn't requested verbatim
Ask: "Would you like help enriching the prompt with style/lighting/composition details?"

When to never modify:

Long, detailed, or structured prompts — treat the user as experienced
User says "use this prompt exactly"

Optimization techniques (if user agrees):

Style: "cyberpunk" → add "neon lights, futuristic, dystopian"
Scene: time of day, lighting, weather
Quality: "highly detailed", "8K quality", "cinematic composition"
Always use English keywords (models trained on English)
Show optimized prompt before submitting

API Reference

Image generation: shared/api-image.md
Error handling: shared/common-patterns.md § Error Handling

Composability

Invokes: nothing (direct API call)
Invoked by: platform skills for cover images (Phase 2)

Example

User: "Generate an image: cyberpunk city at night"

Agent workflow:

Prompt is short → offer enrichment → user declines
Ask model → "pro"
Ask resolution → "2K"
Ask ratio → "16:9"
No references

# Submit the request; the JSON body mirrors the choices made above
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  --max-time 600 \
  -d '{
    "provider": "google",
    "model": "gemini-3-pro-image-preview",
    "prompt": "cyberpunk city at night",
    "imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"}
  }')

# Pull the base64 payload out of the response
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
# Save under .listenhub/image-gen/YYYY-MM-DD-{jobId}/
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"

Decode the base64 data per outputMode (see shared/output-mode.md).
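With jq -r, the filter used above prints the literal string null when neither field is present, and that string would then be decoded into a junk file. A slightly more defensive variant is sketched below, assuming the same two response paths as the example; extract_image_b64 is a hypothetical helper name.

```shell
# Hypothetical helper: read the API response on stdin and print the base64
# payload. "// empty" suppresses output when both paths are missing, and
# -e makes jq exit non-zero in that case so the caller can stop.
extract_image_b64() {
  jq -er '.candidates[0].content.parts[0].inlineData.data // .data // empty'
}
```

A caller could then write `BASE64_DATA=$(echo "$RESPONSE" | extract_image_b64) || echo "no image data in response" >&2` and skip the decode step on failure.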

Example 2 — With Local Reference Image (base64)

User: "Generate an image in this style" (provides a local file path)

Agent workflow:

Ask prompt → "a serene mountain lake at dawn"
Ask model → "pro"
Ask resolution → "2K"
Ask ratio → "16:9"
References → local file → /path/to/style-reference.png

# Encode the local reference image on a single line (works with GNU and macOS base64)
BASE64_REF=$(base64 < /path/to/style-reference.png | tr -d '\n')

# Submit the request with the reference image embedded as inlineData
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 600 \
  -d "{
    \"provider\": \"google\",
    \"model\": \"gemini-3-pro-image-preview\",
    \"prompt\": \"a serene mountain lake at dawn\",
    \"imageConfig\": {\"imageSize\": \"2K\", \"aspectRatio\": \"16:9\"},
    \"referenceImages\": [{\"inlineData\": {\"data\": \"$BASE64_REF\", \"mimeType\": \"image/png\"}}]
  }")

# Pull the base64 payload out of the response and save it
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"

Decode the base64 data per outputMode (see shared/output-mode.md).

Weekly Installs: 365
Repository: marswaveai/skills
First Seen: 11 days ago
Security Audits: Gen Agent Trust Hub (Pass), Socket (Warn), Snyk (Warn)
Installed on: codex (361), gemini-cli (359), cursor (359), opencode (359), kimi-cli (358), amp (358)
