Gemini Image Analysis Tool - Analyze images, extract text, debug code, and get UI design feedback with AI vision

gemini-image by johnlindquist/claude

608 weekly installs

21 GitHub Stars

GitHub

Install command

npx skills add https://github.com/johnlindquist/claude --skill gemini-image

AI/Machine Learning, Development Automation

Gemini Image Analysis

Analyze images using Gemini Pro's vision capabilities.

Prerequisites

pip install google-generativeai
export GEMINI_API_KEY=your_api_key
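
Before running the commands in the sections that follow, it can help to confirm the environment is ready. The snippet here is a minimal sketch, not part of the skill itself: `check_gemini_env` is a hypothetical helper name, and it assumes only what this page states, namely that the key is read from `GEMINI_API_KEY` and that the CLI is invoked as `gemini`.

```shell
# Hypothetical preflight helper: returns 0 only when the API key is set
# and a command named `gemini` can be resolved by the shell.
check_gemini_env() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    return 1
  fi
  if ! command -v gemini >/dev/null 2>&1; then
    echo "gemini command not found" >&2
    return 1
  fi
  return 0
}
```

A typical call would be `check_gemini_env && gemini -m pro -f image.png "Describe this image in detail"`, so the analysis only runs when both checks pass.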

CLI Reference

Basic Image Analysis

# Analyze an image
gemini -m pro -f /path/to/image.png "Describe this image in detail"

# Ask a specific question
gemini -m pro -f screenshot.png "What error message is shown?"

# Multiple images
gemini -m pro -f image1.png -f image2.png "Compare these two images"
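
When the number of images varies, repeating `-f` by hand gets tedious. The wrapper below is a rough sketch under the assumption that `-f` may be repeated once per image, as in the multiple-image example above; `analyze_images` is a hypothetical name and is not provided by the CLI.

```shell
# Hypothetical wrapper: analyze_images "<prompt>" <image>...
# Builds one "-f <image>" pair per image, then calls gemini once.
analyze_images() {
  prompt=$1
  shift
  # Rewrite the positional parameters from "img1 img2 ..." into
  # "-f img1 -f img2 ..."; each pass appends a pair and drops one original.
  for img in "$@"; do
    set -- "$@" -f "$img"
    shift
  done
  gemini -m pro "$@" "$prompt"
}
```

For example, `analyze_images "Compare these two images" image1.png image2.png` expands to the same command as the last example above. Paths containing spaces are kept intact because every parameter stays quoted.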

Analysis Operations

General Description

gemini -m pro -f image.png "Describe this image comprehensively:
1. Main subject/content
2. Colors and composition
3. Visible text (if any)
4. Context and purpose
5. Notable details"

Extract Text (OCR)

gemini -m pro -f screenshot.png "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements."
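
To run the same OCR prompt over several screenshots, a small loop may be enough. This is a sketch that assumes the `gemini` command prints its response on standard output, which this page does not confirm; `ocr_images` is a hypothetical helper name.

```shell
# Hypothetical helper: run the OCR prompt on each image and write the
# response next to it as <image>.txt.
ocr_images() {
  for img in "$@"; do
    gemini -m pro -f "$img" "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements." > "$img.txt" ||
      echo "OCR failed for $img" >&2
  done
}
```

Calling `ocr_images shots/*.png` would leave one `.txt` file beside each screenshot, which makes it easier to review the extracted text against the original image.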

Code from Screenshot

gemini -m pro -f code-screenshot.png "Extract the code from this screenshot.
Provide as properly formatted code with correct indentation.
Note any parts that are unclear or partially visible."

UI Analysis

gemini -m pro -f ui-screenshot.png "Analyze this UI:
1. What application/website is this?
2. What page/screen is shown?
3. Main UI elements and their purpose
4. User flow/actions available
5. Any UX issues or suggestions"

Error Analysis

gemini -m pro -f error-screenshot.png "Analyze this error:
1. What error is shown?
2. What is the likely cause?
3. How to fix it?
4. Any related information visible?"

Diagram Understanding

gemini -m pro -f diagram.png "Explain this diagram:
1. What type of diagram is this?
2. Main components and their relationships
3. Data/process flow
4. Key takeaways"

Specific Use Cases

Debug Screenshot

gemini -m pro -f debug-screen.png "I'm debugging an issue. From this screenshot:
1. What is the current state?
2. What errors or warnings are visible?
3. What should I look at?
4. Suggested next steps"

Compare Before/After

gemini -m pro -f before.png -f after.png "Compare these before and after images:
1. What changed?
2. Is this an improvement?
3. Any issues in the 'after' version?
4. Anything missing?"

Design Feedback

gemini -m pro -f design.png "Provide design feedback:
1. Visual hierarchy
2. Color usage
3. Typography
4. Spacing and alignment
5. Accessibility concerns
6. Suggestions for improvement"

Data Extraction

gemini -m pro -f chart.png "Extract data from this chart:
1. Chart type
2. Data series and values
3. Axis labels and ranges
4. Key trends or insights
5. Output as structured data if possible"

Form Analysis

gemini -m pro -f form.png "Analyze this form:
1. Form purpose
2. Fields and their types
3. Required vs optional
4. Visible validation rules
5. UX suggestions"

Workflow Patterns

Screenshot to Issue

# Capture screenshot (macOS)
screencapture -i /tmp/bug.png

# Analyze and format as issue
gemini -m pro -f /tmp/bug.png "Create a bug report from this screenshot:

## Summary
[One-line description]

## Steps to Reproduce
[Inferred from screenshot]

## Expected Behavior
[What should happen]

## Actual Behavior
[What the screenshot shows]

## Environment
[Any visible system info]"
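
The two steps above can be wrapped so the report lands in a file ready to paste into an issue tracker. This is a sketch only: `screenshot_to_issue` is a hypothetical helper, and it assumes the `gemini` command writes its response to standard output.

```shell
# Hypothetical helper: screenshot_to_issue <image> <report.md>
# Sends the bug-report prompt with the image and saves the response.
screenshot_to_issue() {
  image=$1
  report=$2
  gemini -m pro -f "$image" "Create a bug report from this screenshot:

## Summary
[One-line description]

## Steps to Reproduce
[Inferred from screenshot]

## Expected Behavior
[What should happen]

## Actual Behavior
[What the screenshot shows]

## Environment
[Any visible system info]" > "$report"
}
```

On macOS this could be chained as `screencapture -i /tmp/bug.png && screenshot_to_issue /tmp/bug.png /tmp/bug-report.md`. The reproduction steps are inferred from a single image, so they are worth reviewing before filing.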

UI to Code

gemini -m pro -f ui-design.png "Generate React component code that recreates this UI:
- Use Tailwind CSS for styling
- Make it responsive
- Include proper TypeScript types
- Add appropriate accessibility attributes"

Documentation

gemini -m pro -f app-screen.png "Write user documentation for this screen:
- What this screen is for
- How to use each feature
- Common tasks
- Tips and notes"

Image Types Supported

PNG, JPEG, GIF, WebP
Screenshots
Photos
Diagrams and charts
UI mockups
Code snippets
Documents
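
A script that feeds files to the CLI may want to skip anything outside the four listed formats. The check below is a minimal sketch based on file extensions only; it does not inspect file contents. `is_supported_image` is a hypothetical name, and accepting both `.jpg` and `.jpeg` for JPEG is an assumption about how the files are named.

```shell
# Hypothetical helper: succeeds when the file name ends in an extension
# matching one of the listed formats (PNG, JPEG, GIF, WebP).
is_supported_image() {
  # Lower-case the name so that .PNG and .png are treated alike.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.png|*.jpg|*.jpeg|*.gif|*.webp) return 0 ;;
    *) return 1 ;;
  esac
}
```

It can guard a call like `is_supported_image "$file" && gemini -m pro -f "$file" "Describe this image in detail"`.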

Best Practices

Use clear images - Higher-quality images produce better analysis
Crop to the relevant area - Remove unnecessary context
Ask specific questions - Vague prompts get vague answers
Provide context - Tell Gemini what you're looking for
Verify extracted text - OCR isn't perfect
Use multiple angles - Supply several images for complex subjects

Weekly Installs

580

Repository

johnlindquist/claude

GitHub Stars

First Seen

Jan 21, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass

Installed on

opencode: 555
gemini-cli: 553
codex: 548
cursor: 546
github-copilot: 533
amp: 521
