Desktop Control Skill: Mouse, Keyboard, and Screenshot Automation for AI Agents (a PyAutoGUI Desktop Control Tool)

Desktop Control by patrickporto/desktop-agent


Installation command

npx skills add https://github.com/patrickporto/desktop-agent --skill 'Desktop Control'




Security audit

Gen Agent Trust Hub: Fail
Socket: Pass
Snyk: Warn


Desktop Control Skill

This skill provides comprehensive desktop automation through PyAutoGUI, allowing AI agents to control the mouse and keyboard, take screenshots, and interact with the desktop environment.

How to Use This Skill

As an AI agent, you can invoke desktop automation commands using the uvx desktop-agent CLI.

Command Structure

All commands follow this pattern:

uvx desktop-agent <category> <command> [arguments] [options]

Categories:

mouse - Mouse control
keyboard - Keyboard input
screen - Screenshots and screen analysis
message - User dialogs
app - Application control (open, focus, list windows)
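To call the CLI from a program instead of typing commands by hand, an agent has to turn the pattern above into an argument list and read back the JSON that every command prints. The sketch below does this in Python with only the standard library. `build_argv` and `run` are hypothetical helper names. Mapping keyword arguments to `--flag value` pairs, with `True` producing a bare flag such as `--active`, is a convention of this sketch and is not defined by the tool.

```python
import json
import subprocess
from typing import Any


def build_argv(category: str, command: str, *args: Any, **options: Any) -> list[str]:
    """Build `uvx desktop-agent <category> <command> [arguments] [options]`.

    Keyword options become `--name value` (underscores turn into hyphens).
    True emits a bare flag such as `--active`; None or False omits the option.
    """
    argv = ["uvx", "desktop-agent", category, command]
    argv += [str(a) for a in args]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not None and value is not False:
            argv += [flag, str(value)]
    return argv


def run(category: str, command: str, *args: Any, **options: Any) -> dict:
    """Run one command and return its parsed JSON response.

    The exit code is not checked; callers should inspect the `success` field.
    """
    proc = subprocess.run(
        build_argv(category, command, *args, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)
```

For example, `run("mouse", "move", 960, 540, duration=0.5)` executes `uvx desktop-agent mouse move 960 540 --duration 0.5`. Passing an argument list to `subprocess.run` instead of a shell string avoids quoting problems with text such as window titles.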


Available Commands

🖱️ Mouse Control (`mouse`)

Control cursor movement and clicks.

# Move cursor to coordinates
uvx desktop-agent mouse move <x> <y> [--duration SECONDS]

# Click at current position or specific coordinates
uvx desktop-agent mouse click [x] [y] [--button left|right|middle] [--clicks N]

# Specialized clicks
uvx desktop-agent mouse double-click [x] [y]
uvx desktop-agent mouse right-click [x] [y]
uvx desktop-agent mouse middle-click [x] [y]

# Drag to coordinates
uvx desktop-agent mouse drag <x> <y> [--duration SECONDS] [--button BUTTON]

# Scroll (positive=up, negative=down)
uvx desktop-agent mouse scroll <clicks> [x] [y]

# Get current mouse position
uvx desktop-agent mouse position

Examples:

# Move to center of 1920x1080 screen
uvx desktop-agent mouse move 960 540 --duration 0.5

# Right-click at specific location
uvx desktop-agent mouse right-click 500 300

# Scroll down 5 clicks
uvx desktop-agent mouse scroll -5
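The "center of a 1920x1080 screen" example hard-codes 960,540. The minimal sketch below derives the center from a `screen size` response instead. It assumes the `data.size.width` / `data.size.height` shape shown in this document's output examples, and `screen_center` is a hypothetical name.

```python
import json


def screen_center(size_response: str) -> tuple[int, int]:
    """Return the (x, y) center of the screen from a `screen size` JSON response."""
    resp = json.loads(size_response)
    if not resp["success"]:
        raise RuntimeError(resp["error"]["message"])
    size = resp["data"]["size"]
    # Floor division keeps the result on whole pixels for odd sizes too.
    return size["width"] // 2, size["height"] // 2


example = (
    '{"success": true, "command": "screen.size", "timestamp": "...", '
    '"duration_ms": 5, "data": {"size": {"width": 1920, "height": 1080}}, "error": null}'
)
print(screen_center(example))  # (960, 540)
```

The resulting pair can be passed straight to `mouse move`, so the same step stays correct on screens of other sizes.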

⌨️ Keyboard Control (`keyboard`)

Type text and execute keyboard shortcuts.

# Type text
uvx desktop-agent keyboard write "<text>" [--interval SECONDS]

# Press keys
uvx desktop-agent keyboard press <key> [--presses N] [--interval SECONDS]

# Execute hotkey combination (comma-separated)
uvx desktop-agent keyboard hotkey "<key1>,<key2>,..."

# Hold/release keys
uvx desktop-agent keyboard keydown <key>
uvx desktop-agent keyboard keyup <key>

Examples:

# Type text with natural delay
uvx desktop-agent keyboard write "Hello World" --interval 0.05

# Copy selected text
uvx desktop-agent keyboard hotkey "ctrl,c"

# Open Task Manager
uvx desktop-agent keyboard hotkey "ctrl,shift,esc"

# Press Enter 3 times
uvx desktop-agent keyboard press enter --presses 3

Common Key Names:

Modifiers: ctrl, shift, alt, win
Special: enter, tab, esc, space, backspace, delete
Function: f1 through f12
Arrows: up, down, left, right
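`keyboard hotkey` takes its keys as one comma-separated string. The sketch below builds that string from a list and rejects names outside the lists above, so a typo fails before anything is typed. On macOS (`sys.platform == "darwin"`) it swaps `ctrl` for `command`. That follows this document's note that macOS uses `command` for the Cmd key, but applying the swap automatically is an assumption of this sketch, because some macOS shortcuts really do use Control. `KNOWN_KEYS` is illustrative and is not the tool's complete key list. `hotkey_arg` is a hypothetical name.

```python
import sys

# Key names from the lists above plus "command"; illustrative, not exhaustive.
KNOWN_KEYS = {
    "ctrl", "shift", "alt", "win", "command",
    "enter", "tab", "esc", "space", "backspace", "delete",
    "up", "down", "left", "right",
    *(f"f{n}" for n in range(1, 13)),
}


def hotkey_arg(keys: list[str], platform: str = sys.platform) -> str:
    """Join key names into the comma-separated string `keyboard hotkey` expects."""
    out = []
    for key in keys:
        key = key.strip().lower()
        if platform == "darwin" and key == "ctrl":
            key = "command"  # sketch convention: Cmd replaces Ctrl on macOS
        # Single characters ("c", "s", "3") are accepted as-is.
        if key not in KNOWN_KEYS and len(key) != 1:
            raise ValueError(f"unrecognized key name: {key!r}")
        out.append(key)
    return ",".join(out)
```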

🖼️ Screen & Screenshots (`screen`)

Capture screenshots and analyze screen content. Supports targeting specific windows.

# Take screenshot
uvx desktop-agent screen screenshot <filename> [--region "x,y,width,height"] [--window <title>] [--active]

# Locate image on screen or within window
uvx desktop-agent screen locate <image_path> [--confidence 0.0-1.0] [--window <title>] [--active]
uvx desktop-agent screen locate-center <image_path> [--confidence 0.0-1.0] [--window <title>] [--active]

# Locate text using OCR within window
uvx desktop-agent screen locate-text-coordinates <text> [--window <title>] [--active]
uvx desktop-agent screen read-all-text [--window <title>] [--active]

# Utility commands
uvx desktop-agent screen pixel <x> <y>
uvx desktop-agent screen size
uvx desktop-agent screen on-screen <x> <y>

Examples:

# Screenshot of active window
uvx desktop-agent screen screenshot active.png --active

# Screenshot of a specific application
uvx desktop-agent screen screenshot chrome.png --window "Google Chrome"

# Locate image within Notepad
uvx desktop-agent screen locate-center button.png --window "Notepad"
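Before clicking or capturing a region, it helps to check coordinates against the screen size. The sketch below assumes the usual PyAutoGUI convention that valid pixels run from 0 to width-1 and from 0 to height-1, which is the rule `pyautogui.onScreen` applies. `on_screen` and `region_arg` are hypothetical helpers. Clamping an oversized region to the screen, rather than rejecting it, is a design choice of this sketch.

```python
def on_screen(x: int, y: int, width: int, height: int) -> bool:
    """True if (x, y) is a valid pixel on a width x height screen."""
    return 0 <= x < width and 0 <= y < height


def region_arg(x: int, y: int, w: int, h: int, screen_w: int, screen_h: int) -> str:
    """Clamp a capture region to the screen and format it for --region."""
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(screen_w, x + w), min(screen_h, y + h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("region lies entirely off screen")
    return f"{x0},{y0},{x1 - x0},{y1 - y0}"


print(on_screen(1920, 540, 1920, 1080))              # False: x must be below 1920
print(region_arg(1800, 1000, 300, 200, 1920, 1080))  # 1800,1000,120,80
```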

💬 Message Dialogs (`message`)

Display user interaction dialogs.

# Show alert
uvx desktop-agent message alert "<text>" [--title TITLE] [--button BUTTON]

# Show confirmation dialog
uvx desktop-agent message confirm "<text>" [--title TITLE] [--buttons "OK,Cancel"]

# Prompt for input
uvx desktop-agent message prompt "<text>" [--title TITLE] [--default TEXT]

# Password input
uvx desktop-agent message password "<text>" [--title TITLE] [--mask CHAR]

Examples:

# Simple alert
uvx desktop-agent message alert "Task completed!"

# Get user confirmation
uvx desktop-agent message confirm "Continue with operation?"

# Ask for user input
uvx desktop-agent message prompt "Enter your name:"

📱 Application Control (`app`)

Control applications across Windows, macOS, and Linux.

# Open an application by name
uvx desktop-agent app open <name> [--arg ARGS...]

# Focus on a window by title/name
uvx desktop-agent app focus <name>

# List all visible windows
uvx desktop-agent app list

Examples:

# Windows: Open Notepad
uvx desktop-agent app open notepad

# Windows: Open Chrome with a URL
uvx desktop-agent app open "chrome" --arg "https://google.com"

# macOS: Open Safari
uvx desktop-agent app open "Safari"

# Focus on a specific window
uvx desktop-agent app focus "Untitled - Notepad"

# List all open windows
uvx desktop-agent app list

Common Automation Workflows

Workflow 1: Open Application and Type

# Open Notepad (Windows; on macOS or Linux, substitute an editor such as TextEdit or gedit)
uvx desktop-agent app open notepad

# Wait for app to open, then focus it
uvx desktop-agent app focus notepad

# Type some text
uvx desktop-agent keyboard write "Hello from Desktop Skill!"
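Notepad ships only with Windows, so this workflow needs a different editor on other systems. The sketch below picks one per platform from the editors named in this document's platform notes (notepad, TextEdit, gedit) and returns the three commands as argument lists. It assumes that `app focus` accepts those same names as window names on every platform. `open_and_type_steps` is a hypothetical name.

```python
import sys


def open_and_type_steps(text: str, platform: str = sys.platform) -> list[list[str]]:
    """Return Workflow 1 (open, focus, type) with a platform-appropriate editor."""
    if platform.startswith("win"):
        editor = "notepad"
    elif platform == "darwin":
        editor = "TextEdit"
    elif platform.startswith("linux"):
        editor = "gedit"
    else:
        raise ValueError(f"unsupported platform: {platform}")
    base = ["uvx", "desktop-agent"]
    return [
        base + ["app", "open", editor],
        base + ["app", "focus", editor],  # assumes focus matches the app name
        base + ["keyboard", "write", text],
    ]
```

Each list can be executed with `subprocess.run(argv, check=True)`. The commands would otherwise fire back to back, so a short pause before `keyboard write` gives the editor time to open.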

Workflow 2: Screenshot + Analysis

# Get screen size first
uvx desktop-agent screen size

# Take full screenshot
uvx desktop-agent screen screenshot current_screen.png

# Check if specific UI element is visible
uvx desktop-agent screen locate save_button.png

Workflow 3: Form Filling

# Click first field
uvx desktop-agent mouse click 300 200

# Fill field
uvx desktop-agent keyboard write "John Doe"

# Tab to next field
uvx desktop-agent keyboard press tab

# Fill second field
uvx desktop-agent keyboard write "john@example.com"

# Submit form (Enter)
uvx desktop-agent keyboard press enter
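The workflow above relies on Tab order. The sketch below generates the more reliable click-each-field variant described in this document's form-filling guidance. It produces a list of argument lists that an agent can review before running anything. The coordinates and values are the example's own, and `fill_form_steps` is a hypothetical name.

```python
import shlex


def fill_form_steps(fields: list[tuple[int, int, str]], submit: bool = True) -> list[list[str]]:
    """Build commands that click each field before typing into it."""
    base = ["uvx", "desktop-agent"]
    steps: list[list[str]] = []
    for x, y, value in fields:
        steps.append(base + ["mouse", "click", str(x), str(y)])  # focus the field explicitly
        steps.append(base + ["keyboard", "write", value])
    if submit:
        steps.append(base + ["keyboard", "press", "enter"])  # submit the form
    return steps


steps = fill_form_steps([(300, 200, "John Doe"), (300, 250, "john@example.com")])
for argv in steps:
    print(shlex.join(argv))  # shell-quoted, so "John Doe" stays one argument
```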

Workflow 4: Copy/Paste Operations

# Select all text
uvx desktop-agent keyboard hotkey "ctrl,a"

# Copy
uvx desktop-agent keyboard hotkey "ctrl,c"

# Click destination
uvx desktop-agent mouse click 500 600

# Paste
uvx desktop-agent keyboard hotkey "ctrl,v"

Safety Considerations

When using this skill, AI agents should:

Verify coordinates: Use screen size and on-screen before clicking
Add delays: Insert appropriate delays between commands so the UI can respond
Validate images: Ensure image files exist before using locate commands
Handle failures: Commands may fail if windows change or elements move
User safety: Always confirm destructive actions with the user via message confirm

Troubleshooting

PyAutoGUI Fail-Safe

PyAutoGUI has a fail-safe: moving the mouse to a corner of the screen aborts the running operation. This is a deliberate safety feature, not a bug in the skill.

Image not found

When screen locate fails, check that:

The image file exists and the path is correct
The --confidence value is not too strict (try 0.7-0.9)
The image matches the on-screen appearance exactly (resolution, colors)

Getting Help

# Show all available commands
uvx desktop-agent --help

# Show commands for specific category
uvx desktop-agent mouse --help
uvx desktop-agent keyboard --help
uvx desktop-agent screen --help
uvx desktop-agent message --help

# Show help for specific command
uvx desktop-agent mouse move --help

Integration Tips for AI Agents

Always check screen size first when working with absolute coordinates
Use relative positioning when possible (e.g., get current position, calculate offset)
Combine commands for complex workflows
Validate before executing (e.g., check if image exists on screen)
Provide user feedback using message dialogs for important operations
Handle errors gracefully - commands may fail if UI state changes

Performance Notes

Mouse movements with --duration are animated and take time
Image location (locate) can be slow on large screens - use regions when possible
Keyboard commands are generally fast (< 100ms)
Screenshots depend on screen resolution and region size

Output Format

All commands output structured JSON by default, ideal for programmatic use by AI agents:

uvx desktop-agent mouse position
# Output: {"success": true, "command": "mouse.position", "timestamp": "2026-01-31T10:00:00Z", "duration_ms": 5, "data": {"position": {"x": 960, "y": 540}}}

Response Schema

All JSON responses follow this schema:

{
  "success": true,
  "command": "category.command",
  "timestamp": "2026-01-31T10:00:00Z",
  "duration_ms": 150,
  "data": { ... },
  "error": null
}
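The sketch below reads any response into a typed record, assuming only the fields in the schema above. It tolerates missing `timestamp`, `duration_ms`, or `error` fields, because the `mouse position` example omits `error`. `Response` and `parse_response` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Response:
    success: bool
    command: str
    timestamp: str
    duration_ms: int
    data: Any
    error: Optional[dict]


def parse_response(text: str) -> Response:
    """Parse one command's JSON output and check the schema's core fields."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("response is not a JSON object")
    missing = {"success", "command", "data"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Response(
        success=bool(raw["success"]),
        command=raw["command"],
        timestamp=raw.get("timestamp", ""),
        duration_ms=int(raw.get("duration_ms", 0)),
        data=raw["data"],
        error=raw.get("error"),  # absent in some examples, e.g. mouse position
    )
```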

Error Response Schema

{
  "success": false,
  "command": "category.command",
  "timestamp": "2026-01-31T10:00:00Z",
  "duration_ms": 50,
  "data": null,
  "error": {
    "code": "image_not_found",
    "message": "Image file 'button.png' not found",
    "details": {},
    "recoverable": true
  }
}

Error Codes

Code	Description
`success`	Command succeeded
`invalid_argument`	Invalid command arguments
`coordinates_out_of_bounds`	Coordinates outside screen
`image_not_found`	Image file not found or not on screen
`window_not_found`	Target window not found
`ocr_failed`	OCR operation failed
`application_not_found`	Application not found
`permission_denied`	Permission denied
`platform_not_supported`	Platform not supported
`timeout`	Operation timed out
`unknown_error`	Unknown error
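Error codes are most useful when they decide the next step. The mapping below paraphrases this document's error-recovery patterns. Which codes get which advice, and treating `recoverable: false` as a signal to stop, are choices made by this sketch rather than documented behavior. `RECOVERY` and `next_step` are hypothetical names.

```python
# Suggested follow-up per error code, paraphrased from this document's
# error-recovery patterns. Illustrative, not part of the tool.
RECOVERY = {
    "window_not_found": "run `app list`, then retry `app focus` with the exact title",
    "image_not_found": "retry `screen locate` with a lower --confidence, then take a screenshot",
    "coordinates_out_of_bounds": "check `screen size` and `screen on-screen`, then recompute",
    "invalid_argument": "check the command's --help output and fix the arguments",
}


def next_step(error: dict) -> str:
    """Suggest a follow-up for the `error` object of a failed response."""
    if not error.get("recoverable", False):
        return "stop and report the error to the user"
    return RECOVERY.get(error.get("code", ""), "observe the screen again and retry once")
```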

Mouse move:

uvx desktop-agent mouse move 960 540

{"success": true, "command": "mouse.move", "timestamp": "...", "duration_ms": 150, "data": {"x": 960, "y": 540, "duration": 0}, "error": null}

Screen size:

uvx desktop-agent screen size

{"success": true, "command": "screen.size", "timestamp": "...", "duration_ms": 5, "data": {"size": {"width": 1920, "height": 1080}}, "error": null}

Locate image:

uvx desktop-agent screen locate button.png

{"success": true, "command": "screen.locate", "timestamp": "...", "duration_ms": 250, "data": {"image_found": true, "bounding_box": {"left": 100, "top": 200, "width": 50, "height": 30, "center_x": 125, "center_y": 215}}, "error": null}

List windows:

uvx desktop-agent app list

{"success": true, "command": "app.list", "timestamp": "...", "duration_ms": 100, "data": {"windows": ["Untitled - Notepad", "Google Chrome", "Visual Studio Code"]}, "error": null}

Error example:

uvx desktop-agent screen locate missing.png

{"success": false, "command": "screen.locate", "timestamp": "...", "duration_ms": 50, "data": null, "error": {"code": "image_not_found", "message": "Image file 'missing.png' not found", "details": {}, "recoverable": true}}

Effective Usage Guide for AI Agents

This section teaches AI agents how to use this skill effectively with optimal command sequences and best practices.

🎯 Core Strategy: Observe First, Then Act

Always understand the current state before performing actions. This avoids clicking wrong coordinates or typing in the wrong window.

Recommended Initial Sequence:

# 1. Get screen dimensions to understand your workspace
uvx desktop-agent screen size
# 2. List open windows to see what is running
uvx desktop-agent app list
# 3. Check where the cursor currently is
uvx desktop-agent mouse position

📋 Recommended Command Sequences by Task

Open and Interact with Application

# ✅ CORRECT: Open, wait, verify, then interact
uvx desktop-agent app open notepad              # Step 1: Open app
uvx desktop-agent app list                      # Step 2: Verify the window appeared
uvx desktop-agent app focus "Notepad"           # Step 3: Focus it
uvx desktop-agent keyboard write "Hello World"  # Step 4: Now safe to type

# ❌ WRONG: Type immediately without verification
uvx desktop-agent app open notepad
uvx desktop-agent keyboard write "Hello World"  # May type in wrong window!
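The correct sequence says to wait before verifying, but the commands documented here include no wait command. The polling sketch below assumes that `list_windows` returns the `data.windows` list from `app list` (as in the `app list` output example), and that a case-insensitive substring is enough to recognize the new window. `wait_for_window` is a hypothetical name, and the 10-second timeout and 0.5-second interval are arbitrary defaults.

```python
import time
from typing import Callable, Optional


def wait_for_window(
    list_windows: Callable[[], list[str]],
    fragment: str,
    timeout: float = 10.0,
    interval: float = 0.5,
) -> Optional[str]:
    """Poll until a window title containing `fragment` appears; None on timeout."""
    deadline = time.monotonic() + timeout
    needle = fragment.lower()
    while True:
        for title in list_windows():
            if needle in title.lower():
                return title
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In practice `list_windows` would run `uvx desktop-agent app list`, parse the JSON, and return `data["windows"]`. A `None` result means the app never appeared, so typing should not start.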

Find and Click UI Element (Image-Based)

# ✅ CORRECT: Locate first, click if found
uvx desktop-agent screen locate-center button.png --confidence 0.8
# Check if success=true and coordinates are valid
uvx desktop-agent mouse click 125 215  # Use returned coordinates

# ❌ WRONG: Click without verifying element exists
uvx desktop-agent mouse click 125 215  # Might click wrong area!
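An agent does not need to copy 125 and 215 by hand; it can derive the click from the response. This sketch uses `screen locate` because its `bounding_box` shape, with `center_x` and `center_y`, is the one shown in this document's output examples. The `locate-center` response shape is not shown here. `click_argv_from_locate` is a hypothetical name.

```python
import json


def click_argv_from_locate(locate_output: str) -> list[str]:
    """Turn a `screen locate` JSON response into a `mouse click` command.

    Raises LookupError when the image was not found, so no blind click is sent.
    """
    resp = json.loads(locate_output)
    data = resp.get("data") or {}
    if not resp.get("success") or not data.get("image_found"):
        error = resp.get("error") or {}
        raise LookupError(error.get("message", "image not found on screen"))
    box = data["bounding_box"]
    # Click the center of the match, not its top-left corner.
    return ["uvx", "desktop-agent", "mouse", "click", str(box["center_x"]), str(box["center_y"])]
```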

Find and Click UI Element (Text-Based with OCR)

# ✅ CORRECT: Read screen text, then locate specific text
uvx desktop-agent screen read-all-text --active
uvx desktop-agent screen locate-text-coordinates "Save" --active
# Use returned coordinates to click

# For window-specific OCR:
uvx desktop-agent screen locate-text-coordinates "OK" --window "Dialog Title"

Fill a Form with Multiple Fields

# ✅ CORRECT: Click each field explicitly before typing
uvx desktop-agent mouse click 300 200           # Click first field
uvx desktop-agent keyboard write "John Doe"
uvx desktop-agent mouse click 300 250           # Click second field (more reliable)
uvx desktop-agent keyboard write "john@example.com"
uvx desktop-agent mouse click 300 300           # Click third field
uvx desktop-agent keyboard write "555-1234"

# OR use Tab navigation (less reliable if field order changes)
uvx desktop-agent mouse click 300 200
uvx desktop-agent keyboard write "John Doe"
uvx desktop-agent keyboard press tab
uvx desktop-agent keyboard write "john@example.com"
uvx desktop-agent keyboard press tab
uvx desktop-agent keyboard write "555-1234"
uvx desktop-agent keyboard press enter          # Submit

Take Targeted Screenshots for Analysis

# ✅ CORRECT: Screenshot specific windows for faster processing
uvx desktop-agent app list --json                           # Find exact window title
uvx desktop-agent screen screenshot app.png --window "Google Chrome"

# For active window only
uvx desktop-agent screen screenshot active.png --active

# Full screen only when necessary (slower, larger file)
uvx desktop-agent screen size
uvx desktop-agent screen screenshot full.png

Safe Drag and Drop

# ✅ CORRECT: Move to start, verify position, then drag
uvx desktop-agent mouse move 100 200                 # Move to source
uvx desktop-agent mouse position              # Verify position
uvx desktop-agent mouse drag 500 400 --duration 0.5  # Drag to destination

# For precision, use slower duration
uvx desktop-agent mouse drag 500 400 --duration 1.0

🔄 Error Recovery Patterns

When Window Not Found

# Pattern: List windows, find closest match, retry
uvx desktop-agent app focus "Chrome"             # Fails with window_not_found
uvx desktop-agent app list                # See actual window titles
# Output shows: "Google Chrome - My Page"
uvx desktop-agent app focus "Google Chrome"      # Use correct title
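The "find closest match" step can be automated. The sketch below prefers a case-insensitive substring match, choosing the shortest matching title. Failing that, it falls back to `difflib.get_close_matches` from the Python standard library. The 0.6 cutoff is difflib's default, and `best_window_match` is a hypothetical name.

```python
import difflib
from typing import Optional


def best_window_match(wanted: str, titles: list[str]) -> Optional[str]:
    """Pick the window title that best matches `wanted`, or None."""
    needle = wanted.lower()
    hits = [t for t in titles if needle in t.lower()]
    if hits:
        return min(hits, key=len)  # shortest containing title is the tightest match
    close = difflib.get_close_matches(wanted, titles, n=1, cutoff=0.6)
    return close[0] if close else None
```

The returned title is then passed to `app focus`. A `None` result means nothing is close enough, and the agent should ask the user instead of guessing.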

When Image Not Found

# Pattern: Adjust confidence or take new screenshot
uvx desktop-agent screen locate button.png --confidence 0.9
uvx desktop-agent screen locate button.png --confidence 0.7
# If still failing, capture current state for analysis
uvx desktop-agent screen screenshot current.png --active
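The same fallback can be written as a loop. In the sketch below, `locate` is a caller-supplied function that runs `screen locate <image> --confidence <c>` and returns the parsed JSON. The confidence steps are arbitrary, and lower values make false matches more likely. For reference, PyAutoGUI's own `confidence` argument works only when OpenCV is installed; whether this CLI bundles OpenCV is not stated here. `locate_with_fallback` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence


def locate_with_fallback(
    locate: Callable[[float], dict],
    confidences: Sequence[float] = (0.9, 0.8, 0.7),
) -> Optional[dict]:
    """Try progressively lower confidence values; return the first successful response.

    None means every attempt failed: take a fresh screenshot and re-examine.
    """
    for c in confidences:
        resp = locate(c)
        if resp.get("success") and (resp.get("data") or {}).get("image_found"):
            return resp
    return None
```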

When Click Seems to Miss

# Pattern: Verify coordinates are on screen
uvx desktop-agent screen size             # Get screen bounds
uvx desktop-agent screen on-screen 1500 900      # Check if coords are valid
uvx desktop-agent mouse move 1500 900            # Move first to visualize
uvx desktop-agent mouse click                    # Then click at current position

⚡ Performance Optimization

Minimize Screenshots

# ✅ GOOD: Screenshot only the region you need
uvx desktop-agent screen screenshot button_area.png --region "100,200,200,100"

# ✅ GOOD: Screenshot specific window instead of full screen  
uvx desktop-agent screen screenshot chrome.png --window "Google Chrome"

# ❌ SLOW: Full screen capture when you only need a small area
uvx desktop-agent screen screenshot full.png

Batch Keyboard Input

# ✅ FASTER: Write entire text at once
uvx desktop-agent keyboard write "This is a complete sentence with all the text."

# ❌ SLOWER: Multiple write commands
uvx desktop-agent keyboard write "This is "
uvx desktop-agent keyboard write "a complete "
uvx desktop-agent keyboard write "sentence."

Use Hotkeys Over Mouse When Possible

# ✅ FASTER: Use keyboard shortcuts
uvx desktop-agent keyboard hotkey "ctrl,s"       # Save
uvx desktop-agent keyboard hotkey "ctrl,a"       # Select all
uvx desktop-agent keyboard hotkey "ctrl,shift,s" # Save as

# ❌ SLOWER: Navigate menu with mouse
uvx desktop-agent mouse click 50 30              # Click File menu
uvx desktop-agent mouse click 60 80              # Click Save option

🛡️ Defensive Programming Patterns

Always Verify Critical Actions

# Before destructive action, confirm with user
uvx desktop-agent message confirm "This will delete all files. Continue?" --title "Warning"
# Check output: if "Cancel" was clicked, abort operation
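The sketch below is a fail-closed guard around a destructive step. The action runs only when the answer is exactly the OK label, so Cancel, a closed dialog, or unexpected output all abort. `confirm` is a caller-supplied function that runs `message confirm` and returns the label of the clicked button. Where that label appears in the CLI's JSON is not documented here, so that part is an assumption. `guarded` is a hypothetical name.

```python
from typing import Callable, Optional


def guarded(
    action: Callable[[], None],
    confirm: Callable[[str], Optional[str]],
    question: str,
    ok_label: str = "OK",
) -> bool:
    """Run `action` only if the user clicked `ok_label`; return whether it ran."""
    answer = confirm(question)
    if answer != ok_label:
        # Cancel, a dismissed dialog (None), or anything unexpected: do nothing.
        return False
    action()
    return True
```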

Use JSON Mode for Reliable Parsing

# ✅ RELIABLE: Parse structured JSON output
uvx desktop-agent screen locate button.png
# Parse: {"success": true, "data": {"image_found": true, "bounding_box": {"center_x": 125, "center_y": 215}}}

# ❌ FRAGILE: Parse text output
uvx desktop-agent screen locate button.png
# Parse: "Found at: Box(left=100, top=200, width=50, height=30)"

Validate Before Multi-Step Operations

# Multi-step file operation with validation
uvx desktop-agent app list
uvx desktop-agent screen locate-text-coordinates "File" --active
uvx desktop-agent mouse click <returned_x> <returned_y>
uvx desktop-agent screen locate-text-coordinates "Save As" --active
uvx desktop-agent mouse click <returned_x> <returned_y>
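The sequence above feeds each lookup's coordinates into the next click. The sketch below chains those steps and stops at the first `success: false`, so a failed lookup never leads to a click at stale coordinates. `run` is a caller-supplied function that executes one argument list and returns its parsed JSON. The `x`/`y` field names read from the `locate-text-coordinates` response are hypothetical, because that response shape is not shown in this document. `run_chain` and `save_as_steps` are hypothetical names.

```python
from typing import Any, Callable, Optional

# A step builds the next argv from the previous step's `data`.
Step = Callable[[Optional[Any]], list[str]]
BASE = ["uvx", "desktop-agent"]


def run_chain(steps: list[Step], run: Callable[[list[str]], dict]) -> dict:
    """Run dependent steps in order; return the last response, stopping on failure."""
    resp: dict = {"success": True, "data": None, "error": None}
    data: Optional[Any] = None
    for make_argv in steps:
        resp = run(make_argv(data))
        if not resp.get("success"):
            break  # never click coordinates from an earlier, unrelated lookup
        data = resp.get("data")
    return resp


# File > Save As; the "x"/"y" keys are hypothetical field names.
save_as_steps: list[Step] = [
    lambda _: BASE + ["screen", "locate-text-coordinates", "File", "--active"],
    lambda d: BASE + ["mouse", "click", str(d["x"]), str(d["y"])],
    lambda _: BASE + ["screen", "locate-text-coordinates", "Save As", "--active"],
    lambda d: BASE + ["mouse", "click", str(d["x"]), str(d["y"])],
]
```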

🎮 Platform-Specific Considerations

Windows

# Common Windows shortcuts
uvx desktop-agent keyboard hotkey "win,d"        # Show desktop
uvx desktop-agent keyboard hotkey "win,e"        # Open Explorer
uvx desktop-agent keyboard hotkey "alt,tab"      # Switch windows
uvx desktop-agent keyboard hotkey "win,r"        # Run dialog

# Open apps by name
uvx desktop-agent app open notepad
uvx desktop-agent app open calc
uvx desktop-agent app open mspaint

macOS

# Common macOS shortcuts (use 'command' for Cmd key)
uvx desktop-agent keyboard hotkey "command,space"   # Spotlight
uvx desktop-agent keyboard hotkey "command,tab"     # App switcher
uvx desktop-agent keyboard hotkey "command,q"       # Quit app
uvx desktop-agent keyboard hotkey "command,shift,3" # Screenshot

# Open apps
uvx desktop-agent app open "Safari"
uvx desktop-agent app open "TextEdit"

Linux

# Open apps (uses xdg-open or direct command)
uvx desktop-agent app open firefox
uvx desktop-agent app open gedit

# Common shortcuts may vary by desktop environment
uvx desktop-agent keyboard hotkey "alt,f2"       # Run dialog (many desktop environments)

📊 Decision Tree: Choosing the Right Command

Want to interact with an app?
├── App not running → `app open <name>`
├── App running but not focused → `app focus <name>` 
└── Need to verify windows → `app list`

Want to find a UI element?
├── Have reference image → `screen locate-center <image>`
├── Know the text label → `screen locate-text-coordinates "<text>"`
└── Need to see all text → `screen read-all-text --active`

Want to click something?
├── Know exact coordinates → `mouse click <x> <y>`
├── Need to find first → Use locate commands above, then click returned coords
└── Not sure if on screen → `screen on-screen <x> <y>` first

Want to type something?
├── Regular text → `keyboard write "<text>"`
├── Keyboard shortcut → `keyboard hotkey "<key1>,<key2>"`
├── Single key press → `keyboard press <key>`
└── Multiple of same key → `keyboard press <key> --presses N`

