npx skills add https://github.com/patrickporto/desktop-agent --skill 'Desktop Control'
This skill provides comprehensive desktop automation capabilities through PyAutoGUI, allowing AI agents to control the mouse and keyboard, take screenshots, and interact with the desktop environment.
As an AI agent, you can invoke desktop automation commands using the uvx desktop-agent CLI.
All commands follow this pattern:
uvx desktop-agent <category> <command> [arguments] [options]
Categories:
- `mouse` - Mouse control
- `keyboard` - Keyboard input
- `screen` - Screenshots and screen analysis
- `message` - User dialogs
- `app` - Application control (open, focus, list windows)
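Since every command shares the `uvx desktop-agent <category> <command> [arguments] [options]` shape, an agent that drives the CLI from code can assemble argument lists in one place. The Python sketch below is illustrative only: `build_argv` is a hypothetical helper, not part of desktop-agent, and the rule that keyword arguments become `--kebab-case` flags is an assumption that happens to match the flags documented here (`--duration`, `--presses`, `--active`). It builds the list without running anything.

```python
from typing import Union

Scalar = Union[str, int, float]
Option = Union[str, int, float, bool, None]


# Hypothetical helper (not part of desktop-agent).
def build_argv(category: str, command: str, *args: Scalar, **options: Option) -> list[str]:
    """Assemble `uvx desktop-agent <category> <command> [arguments] [options]`.

    Keyword arguments become `--kebab-case VALUE` flags (an assumption).
    True becomes a bare flag (active=True -> --active); False and None are skipped.
    """
    argv = ["uvx", "desktop-agent", category, command]
    argv.extend(str(a) for a in args)
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv
```

To actually execute a command, the list can be passed to `subprocess.run(argv, capture_output=True, text=True)` and the JSON read from `stdout`; passing a list rather than a shell string avoids quoting problems with window titles and typed text.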
Mouse Control (`mouse`)
Control cursor movement and clicks.
# Move cursor to coordinates
uvx desktop-agent mouse move <x> <y> [--duration SECONDS]
# Click at current position or specific coordinates
uvx desktop-agent mouse click [x] [y] [--button left|right|middle] [--clicks N]
# Specialized clicks
uvx desktop-agent mouse double-click [x] [y]
uvx desktop-agent mouse right-click [x] [y]
uvx desktop-agent mouse middle-click [x] [y]
# Drag to coordinates
uvx desktop-agent mouse drag <x> <y> [--duration SECONDS] [--button BUTTON]
# Scroll (positive=up, negative=down)
uvx desktop-agent mouse scroll <clicks> [x] [y]
# Get current mouse position
uvx desktop-agent mouse position
Examples:
# Move to center of 1920x1080 screen
uvx desktop-agent mouse move 960 540 --duration 0.5
# Right-click at specific location
uvx desktop-agent mouse right-click 500 300
# Scroll down 5 clicks
uvx desktop-agent mouse scroll -5
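The centering example hard-codes 960 540, which is only correct for a 1920x1080 screen. A minimal sketch, assuming the `screen size` output shape shown in the JSON examples later in this document (`data.size.width` and `data.size.height`), derives the center for any resolution; `screen_center` and `move_to_center_argv` are hypothetical helper names.

```python
import json


# Hypothetical helper (not part of desktop-agent).
def screen_center(size_json: str) -> tuple[int, int]:
    """Return the (x, y) screen center from `screen size` JSON output.

    Assumes the documented shape: {"data": {"size": {"width": W, "height": H}}}.
    """
    payload = json.loads(size_json)
    if not payload.get("success"):
        raise RuntimeError(f"screen size failed: {payload.get('error')}")
    size = payload["data"]["size"]
    return size["width"] // 2, size["height"] // 2


# Hypothetical helper: the 1920x1080 centering example, generalized to any resolution.
def move_to_center_argv(size_json: str, duration: float = 0.5) -> list[str]:
    x, y = screen_center(size_json)
    return ["uvx", "desktop-agent", "mouse", "move", str(x), str(y),
            "--duration", str(duration)]
```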
Keyboard Input (`keyboard`)
Type text and execute keyboard shortcuts.
# Type text
uvx desktop-agent keyboard write "<text>" [--interval SECONDS]
# Press keys
uvx desktop-agent keyboard press <key> [--presses N] [--interval SECONDS]
# Execute hotkey combination (comma-separated)
uvx desktop-agent keyboard hotkey "<key1>,<key2>,..."
# Hold/release keys
uvx desktop-agent keyboard keydown <key>
uvx desktop-agent keyboard keyup <key>
Examples:
# Type text with natural delay
uvx desktop-agent keyboard write "Hello World" --interval 0.05
# Copy selected text
uvx desktop-agent keyboard hotkey "ctrl,c"
# Open Task Manager
uvx desktop-agent keyboard hotkey "ctrl,shift,esc"
# Press Enter 3 times
uvx desktop-agent keyboard press enter --presses 3
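`keydown` and `keyup` have to be paired: if a step between them fails, the key can stay held and modify every later keystroke or click. One way to guarantee the release is a context manager. This sketch assumes a caller-supplied `run(argv)` function that executes one CLI command (hypothetical, for example a thin wrapper around `subprocess.run`); `held_key` and `hotkey_arg` are illustrative names, not desktop-agent APIs.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

# Hypothetical runner type: executes one CLI command given as an argv list.
Runner = Callable[[Sequence[str]], object]


@contextmanager
def held_key(run: Runner, key: str) -> Iterator[None]:
    """Send `keydown` on entry and always send `keyup` on exit, even if the body raises."""
    base = ["uvx", "desktop-agent", "keyboard"]
    run([*base, "keydown", key])
    try:
        yield
    finally:
        run([*base, "keyup", key])


def hotkey_arg(*keys: str) -> str:
    """Join key names into the comma-separated form that `keyboard hotkey` takes."""
    if not keys or any(not k or "," in k for k in keys):
        raise ValueError("each key must be a non-empty name without commas")
    return ",".join(keys)
```

For example, `with held_key(run, "shift"): run(["uvx", "desktop-agent", "mouse", "click", "500", "300"])` would shift-click and then release Shift even if the click command raises.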
Common Key Names:
- `ctrl`, `shift`, `alt`, `win`
- `enter`, `tab`, `esc`, `space`, `backspace`, `delete`
- `f1` through `f12`
- `up`, `down`, `left`, `right`
Screenshots and Screen Analysis (`screen`)
Capture screenshots and analyze screen content. Supports targeting specific windows.
# Take screenshot
uvx desktop-agent screen screenshot <filename> [--region "x,y,width,height"] [--window <title>] [--active]
# Locate image on screen or within window
uvx desktop-agent screen locate <image_path> [--confidence 0.0-1.0] [--window <title>] [--active]
uvx desktop-agent screen locate-center <image_path> [--confidence 0.0-1.0] [--window <title>] [--active]
# Locate text using OCR within window
uvx desktop-agent screen locate-text-coordinates <text> [--window <title>] [--active]
uvx desktop-agent screen read-all-text [--window <title>] [--active]
# Utility commands
uvx desktop-agent screen pixel <x> <y>
uvx desktop-agent screen size
uvx desktop-agent screen on-screen <x> <y>
Examples:
# Screenshot of active window
uvx desktop-agent screen screenshot active.png --active
# Screenshot of a specific application
uvx desktop-agent screen screenshot chrome.png --window "Google Chrome"
# Locate image within Notepad
uvx desktop-agent screen locate-center button.png --window "Notepad"
User Dialogs (`message`)
Display user interaction dialogs.
# Show alert
uvx desktop-agent message alert "<text>" [--title TITLE] [--button BUTTON]
# Show confirmation dialog
uvx desktop-agent message confirm "<text>" [--title TITLE] [--buttons "OK,Cancel"]
# Prompt for input
uvx desktop-agent message prompt "<text>" [--title TITLE] [--default TEXT]
# Password input
uvx desktop-agent message password "<text>" [--title TITLE] [--mask CHAR]
Examples:
# Simple alert
uvx desktop-agent message alert "Task completed!"
# Get user confirmation
uvx desktop-agent message confirm "Continue with operation?"
# Ask for user input
uvx desktop-agent message prompt "Enter your name:"
Application Control (`app`)
Control applications across Windows, macOS, and Linux.
# Open an application by name
uvx desktop-agent app open <name> [--arg ARGS...]
# Focus on a window by title/name
uvx desktop-agent app focus <name>
# List all visible windows
uvx desktop-agent app list
Examples:
# Windows: Open Notepad
uvx desktop-agent app open notepad
# Windows: Open Chrome with a URL
uvx desktop-agent app open "chrome" --arg "https://google.com"
# macOS: Open Safari
uvx desktop-agent app open "Safari"
# Focus on a specific window
uvx desktop-agent app focus "Untitled - Notepad"
# List all open windows
uvx desktop-agent app list
# Open Notepad directly (Notepad is Windows-only; `app open` itself is cross-platform)
uvx desktop-agent app open notepad
# Wait for app to open, then focus it
uvx desktop-agent app focus notepad
# Type some text
uvx desktop-agent keyboard write "Hello from Desktop Skill!"
# Get screen size first
uvx desktop-agent screen size
# Take full screenshot
uvx desktop-agent screen screenshot current_screen.png
# Check if specific UI element is visible
uvx desktop-agent screen locate save_button.png
# Click first field
uvx desktop-agent mouse click 300 200
# Fill field
uvx desktop-agent keyboard write "John Doe"
# Tab to next field
uvx desktop-agent keyboard press tab
# Fill second field
uvx desktop-agent keyboard write "john@example.com"
# Submit form (Enter)
uvx desktop-agent keyboard press enter
# Select all text
uvx desktop-agent keyboard hotkey "ctrl,a"
# Copy
uvx desktop-agent keyboard hotkey "ctrl,c"
# Click destination
uvx desktop-agent mouse click 500 600
# Paste
uvx desktop-agent keyboard hotkey "ctrl,v"
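Each workflow above is a chain in which a step only makes sense if the previous one succeeded (pasting after a failed click lands in the wrong place). A hedged sketch of running such a chain: it assumes a hypothetical `run(argv)` callable that returns the command's JSON stdout, and stops at the first response whose `success` is `false`.

```python
import json
from typing import Callable, Sequence

# Hypothetical runner type: takes an argv list, returns the command's JSON stdout.
Runner = Callable[[Sequence[str]], str]


def run_steps(run: Runner, steps: Sequence[Sequence[str]]) -> list[dict]:
    """Run CLI steps in order and stop at the first response with "success": false.

    Returns every parsed response collected so far, including the failing one,
    so the caller can inspect error.code before deciding what to do next.
    """
    results: list[dict] = []
    for argv in steps:
        response = json.loads(run(argv))
        results.append(response)
        if not response.get("success", False):
            break  # later steps depend on this one, so do not continue
    return results
```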
When using this skill, AI agents should:
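- Use `screen size` and `on-screen` to verify coordinates before clicking
- Ensure image files exist before running `locate` commands
- Confirm destructive actions with the user via `message confirm`
PyAutoGUI has a fail-safe: moving the mouse to a corner of the screen aborts the running operation. This is a safety feature.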
When using `screen locate`:
- Use `--confidence` for approximate matching (try 0.7-0.9)
Getting help:
# Show all available commands
uvx desktop-agent --help
# Show commands for specific category
uvx desktop-agent mouse --help
uvx desktop-agent keyboard --help
uvx desktop-agent screen --help
uvx desktop-agent message --help
# Show help for specific command
uvx desktop-agent mouse move --help
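Performance notes:
- Mouse movements with `--duration` are animated and take time
- Image searches (`locate`) can be slow on large screens; restrict the search (for example with `--window` or `--active`) when possible
All commands output structured JSON by default, ideal for programmatic use by AI agents: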
uvx desktop-agent mouse position
# Output: {"success": true, "command": "mouse.position", "timestamp": "2026-01-31T10:00:00Z", "duration_ms": 5, "data": {"position": {"x": 960, "y": 540}}, "error": null}
All JSON responses follow this schema:
{
"success": true,
"command": "category.command",
"timestamp": "2026-01-31T10:00:00Z",
"duration_ms": 150,
"data": { ... },
"error": null
}
Error response:
{
"success": false,
"command": "category.command",
"timestamp": "2026-01-31T10:00:00Z",
"duration_ms": 50,
"data": null,
"error": {
"code": "image_not_found",
"message": "Image file 'button.png' not found",
"details": {},
"recoverable": true
}
}
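Because success and error responses share one envelope, parsing can be done once. The sketch below maps the schema above onto Python dataclasses; `Response`, `ErrorInfo`, and `parse_response` are hypothetical names, and only the fields shown in the schema are assumed to exist.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical types mirroring the documented JSON envelope.
@dataclass
class ErrorInfo:
    code: str
    message: str
    recoverable: bool = False
    details: dict = field(default_factory=dict)


@dataclass
class Response:
    success: bool
    command: str
    data: Optional[dict[str, Any]]
    error: Optional[ErrorInfo]


def parse_response(text: str) -> Response:
    """Parse one desktop-agent JSON response into typed fields.

    A missing "error" key is treated the same as "error": null.
    """
    raw = json.loads(text)
    err = raw.get("error")
    return Response(
        success=bool(raw["success"]),
        command=raw.get("command", ""),
        data=raw.get("data"),
        error=ErrorInfo(
            code=err["code"],
            message=err.get("message", ""),
            recoverable=bool(err.get("recoverable", False)),
            details=err.get("details") or {},
        ) if err else None,
    )
```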
| Code | Description |
|---|---|
| `success` | Command succeeded |
| `invalid_argument` | Invalid command arguments |
| `coordinates_out_of_bounds` | Coordinates outside the screen |
| `image_not_found` | Image file not found or image not visible on screen |
| `window_not_found` | Target window not found |
| `ocr_failed` | OCR operation failed |
| `application_not_found` | Application not found |
| `permission_denied` | Permission denied |
| `platform_not_supported` | Platform not supported |
| `timeout` | Operation timed out |
| `unknown_error` | Unknown error |
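The table lists the possible codes but not which failures are worth retrying; each error object instead carries a `recoverable` flag. A minimal retry sketch, assuming a hypothetical `run(argv)` callable that returns JSON stdout: it retries only while `recoverable` is `true`, with a fixed delay and a bounded number of attempts. The attempt count and delay are arbitrary choices, not values taken from desktop-agent.

```python
import json
import time
from typing import Callable, Sequence

# Hypothetical runner type: takes an argv list, returns the command's JSON stdout.
Runner = Callable[[Sequence[str]], str]


def run_with_retry(
    run: Runner,
    argv: Sequence[str],
    attempts: int = 3,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run one command, retrying only while its error is marked recoverable.

    Successes, non-recoverable errors, and the final failed attempt are all
    returned unchanged so the caller can inspect error.code.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    response: dict = {}
    for attempt in range(1, attempts + 1):
        response = json.loads(run(argv))
        error = response.get("error") or {}
        if response.get("success") or not error.get("recoverable", False):
            return response
        if attempt < attempts:
            sleep(delay_s)  # give the UI time to settle before retrying
    return response
```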
Mouse move:
uvx desktop-agent mouse move 960 540
{"success": true, "command": "mouse.move", "timestamp": "...", "duration_ms": 150, "data": {"x": 960, "y": 540, "duration": 0}, "error": null}
Screen size:
uvx desktop-agent screen size
{"success": true, "command": "screen.size", "timestamp": "...", "duration_ms": 5, "data": {"size": {"width": 1920, "height": 1080}}, "error": null}
Locate image:
uvx desktop-agent screen locate button.png
{"success": true, "command": "screen.locate", "timestamp": "...", "duration_ms": 250, "data": {"image_found": true, "bounding_box": {"left": 100, "top": 200, "width": 50, "height": 30, "center_x": 125, "center_y": 215}}, "error": null}
List windows:
uvx desktop-agent app list
{"success": true, "command": "app.list", "timestamp": "...", "duration_ms": 100, "data": {"windows": ["Untitled - Notepad", "Google Chrome", "Visual Studio Code"]}, "error": null}
Error example:
uvx desktop-agent screen locate missing.png
{"success": false, "command": "screen.locate", "timestamp": "...", "duration_ms": 50, "data": null, "error": {"code": "image_not_found", "message": "Image file 'missing.png' not found", "details": {}, "recoverable": true}}
This section teaches AI agents how to use this skill effectively with optimal command sequences and best practices.
Always understand the current state before performing actions. This avoids clicking wrong coordinates or typing in the wrong window.
Recommended Initial Sequence:
# 1. Get screen dimensions to understand your workspace
uvx desktop-agent screen size
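# 2. List open windows to see which apps are available
uvx desktop-agent app list
# 3. Check where the mouse cursor currently is
uvx desktop-agent mouse position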
# ✅ CORRECT: Open, wait, verify, then interact
uvx desktop-agent app open notepad # Step 1: Open app
uvx desktop-agent app list # Step 2: Verify the window exists
uvx desktop-agent app focus "Notepad" # Step 3: Focus it
uvx desktop-agent keyboard write "Hello World" # Step 4: Now safe to type
# ❌ WRONG: Type immediately without verification
uvx desktop-agent app open notepad
uvx desktop-agent keyboard write "Hello World" # May type in wrong window!
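The correct sequence opens, waits, and verifies, but there is no dedicated wait command. One approach is to poll `app list` until a title containing the expected text appears. This sketch assumes a hypothetical `list_windows()` callable that runs `app list` and returns `data.windows` (the list of titles shown in the `app list` output example); `wait_for_window`, the timeout, and the poll interval are illustrative.

```python
import time
from typing import Callable, Optional


# Hypothetical helper (not part of desktop-agent).
def wait_for_window(
    list_windows: Callable[[], list[str]],
    needle: str,
    timeout_s: float = 10.0,
    poll_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a window title contains `needle` (case-insensitive).

    Returns the full matching title, or None once the timeout has expired.
    """
    deadline = clock() + timeout_s
    while True:
        for title in list_windows():
            if needle.lower() in title.lower():
                return title
        if clock() >= deadline:
            return None
        sleep(poll_s)
```

The returned full title can then be passed to `app focus`, which avoids guessing at the exact title.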
# ✅ CORRECT: Locate first, click if found
uvx desktop-agent screen locate-center button.png --confidence 0.8
# Check if success=true and coordinates are valid
uvx desktop-agent mouse click 125 215 # Use returned coordinates
# ❌ WRONG: Click without verifying element exists
uvx desktop-agent mouse click 125 215 # Might click wrong area!
# ✅ CORRECT: Read screen text, then locate specific text
uvx desktop-agent screen read-all-text --active
uvx desktop-agent screen locate-text-coordinates "Save" --active
# Use returned coordinates to click
# For window-specific OCR:
uvx desktop-agent screen locate-text-coordinates "OK" --window "Dialog Title"
# ✅ CORRECT: Click each field explicitly before typing
uvx desktop-agent mouse click 300 200 # Click first field
uvx desktop-agent keyboard write "John Doe"
uvx desktop-agent mouse click 300 250 # Click second field (more reliable)
uvx desktop-agent keyboard write "john@example.com"
uvx desktop-agent mouse click 300 300 # Click third field
uvx desktop-agent keyboard write "555-1234"
# OR use Tab navigation (less reliable if field order changes)
uvx desktop-agent mouse click 300 200
uvx desktop-agent keyboard write "John Doe"
uvx desktop-agent keyboard press tab
uvx desktop-agent keyboard write "john@example.com"
uvx desktop-agent keyboard press tab
uvx desktop-agent keyboard write "555-1234"
uvx desktop-agent keyboard press enter # Submit
# ✅ CORRECT: Screenshot specific windows for faster processing
uvx desktop-agent app list # Find exact window title (output is JSON by default)
uvx desktop-agent screen screenshot app.png --window "Google Chrome"
# For active window only
uvx desktop-agent screen screenshot active.png --active
# Full screen only when necessary (slower, larger file)
uvx desktop-agent screen size
uvx desktop-agent screen screenshot full.png
# ✅ CORRECT: Move to start, verify position, then drag
uvx desktop-agent mouse move 100 200 # Move to source
uvx desktop-agent mouse position # Verify position
uvx desktop-agent mouse drag 500 400 --duration 0.5 # Drag to destination
# For precision, use a longer duration (slower drag)
uvx desktop-agent mouse drag 500 400 --duration 1.0
# Pattern: List windows, find closest match, retry
uvx desktop-agent app focus "Chrome" # Fails with window_not_found
uvx desktop-agent app list # See actual window titles
# Output shows: "Google Chrome - My Page"
uvx desktop-agent app focus "Google Chrome" # Use correct title
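The recovery pattern above can be automated: given the titles from `app list`, pick the one that best matches what the agent asked for. This document does not say how `app focus` matches names, so the sketch returns a full title to pass back. It prefers case-insensitive substring matches and falls back to Python's standard `difflib.get_close_matches`; `best_window_match` and the 0.6 cutoff are illustrative choices.

```python
import difflib
from typing import Optional, Sequence


# Hypothetical helper (not part of desktop-agent).
def best_window_match(wanted: str, titles: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Pick the `app list` title that best matches `wanted`.

    1. Prefer titles containing `wanted` (case-insensitive); the shortest wins.
    2. Otherwise fall back to difflib's similarity ratio, ignoring case.
    """
    lowered = wanted.lower()
    containing = [t for t in titles if lowered in t.lower()]
    if containing:
        return min(containing, key=len)
    by_lower = {t.lower(): t for t in titles}
    close = difflib.get_close_matches(lowered, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None
```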
# Pattern: Adjust confidence or take new screenshot
uvx desktop-agent screen locate button.png --confidence 0.9
uvx desktop-agent screen locate button.png --confidence 0.7
# If still failing, capture current state for analysis
uvx desktop-agent screen screenshot current.png --active
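Trying several confidence levels by hand can be wrapped in a loop. A sketch, assuming a hypothetical `run(argv)` callable that returns JSON stdout; the ladder (0.9, 0.8, 0.7) stays inside the 0.7-0.9 range recommended in this document but is otherwise arbitrary. Lower confidence raises the risk of matching the wrong element, so the level that succeeded is returned for logging.

```python
import json
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical runner type: takes an argv list, returns the command's JSON stdout.
Runner = Callable[[Sequence[str]], str]


def locate_with_fallback(
    run: Runner,
    image: str,
    confidences: Sequence[float] = (0.9, 0.8, 0.7),
    extra: Sequence[str] = (),
) -> Tuple[Optional[float], dict]:
    """Try `screen locate` at decreasing confidence levels.

    Returns (confidence_used, response) on the first success, or
    (None, last_response) if every level fails. `extra` can carry
    documented flags such as ("--active",).
    """
    last: dict = {}
    for conf in confidences:
        argv = ["uvx", "desktop-agent", "screen", "locate", image,
                "--confidence", str(conf), *extra]
        last = json.loads(run(argv))
        # image_found appears in the documented output; default to True if absent.
        if last.get("success") and (last.get("data") or {}).get("image_found", True):
            return conf, last
    return None, last
```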
# Pattern: Verify coordinates are on screen
uvx desktop-agent screen size # Get screen bounds
uvx desktop-agent screen on-screen 1500 900 # Check if coords are valid
uvx desktop-agent mouse move 1500 900 # Move first to visualize
uvx desktop-agent mouse click # Then click at current position
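When the `screen size` result is already known, the bounds check that `screen on-screen` performs can also be done locally, saving a CLI call per click. A minimal sketch assuming a single monitor and 0-based pixel coordinates, so valid x values run from 0 to width - 1 (which is how PyAutoGUI's own `onScreen()` treats the primary screen); `on_screen` and `checked_click_argv` are hypothetical helpers.

```python
# Hypothetical helpers (not part of desktop-agent).
def on_screen(x: int, y: int, width: int, height: int) -> bool:
    """Local equivalent of `screen on-screen` for a single monitor.

    Pixel coordinates are 0-based, so the last valid column is width - 1
    and the last valid row is height - 1.
    """
    return 0 <= x < width and 0 <= y < height


def checked_click_argv(x: int, y: int, width: int, height: int) -> list[str]:
    """Build a `mouse click` command only if the target is inside the screen."""
    if not on_screen(x, y, width, height):
        raise ValueError(f"({x}, {y}) is outside a {width}x{height} screen")
    return ["uvx", "desktop-agent", "mouse", "click", str(x), str(y)]
```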
# ✅ GOOD: Screenshot only the region you need
uvx desktop-agent screen screenshot button_area.png --region "100,200,200,100"
# ✅ GOOD: Screenshot specific window instead of full screen
uvx desktop-agent screen screenshot chrome.png --window "Google Chrome"
# ❌ SLOW: Full screen capture when you only need a small area
uvx desktop-agent screen screenshot full.png
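A region capture needs the `--region "x,y,width,height"` string. When the target was found with `screen locate`, its bounding box (the `left`, `top`, `width`, and `height` fields in the locate output example) can be padded and turned into that string. A sketch with a hypothetical `region_around` helper; the padding amount is the caller's choice, and the result is clipped so the region never extends past the screen edges.

```python
# Hypothetical helper (not part of desktop-agent).
def region_around(box: dict, pad: int, screen_w: int, screen_h: int) -> str:
    """Build a --region "x,y,width,height" string around a locate bounding box.

    `box` uses the keys from the `screen locate` output: left, top, width, height.
    The padded rectangle is clipped to the screen.
    """
    left = max(box["left"] - pad, 0)
    top = max(box["top"] - pad, 0)
    right = min(box["left"] + box["width"] + pad, screen_w)
    bottom = min(box["top"] + box["height"] + pad, screen_h)
    if right <= left or bottom <= top:
        raise ValueError("box lies outside the screen")
    return f"{left},{top},{right - left},{bottom - top}"
```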
# ✅ FASTER: Write entire text at once
uvx desktop-agent keyboard write "This is a complete sentence with all the text."
# ❌ SLOWER: Multiple write commands
uvx desktop-agent keyboard write "This is "
uvx desktop-agent keyboard write "a complete "
uvx desktop-agent keyboard write "sentence."
# ✅ FASTER: Use keyboard shortcuts
uvx desktop-agent keyboard hotkey "ctrl,s" # Save
uvx desktop-agent keyboard hotkey "ctrl,a" # Select all
uvx desktop-agent keyboard hotkey "ctrl,shift,s" # Save as
# ❌ SLOWER: Navigate menu with mouse
uvx desktop-agent mouse click 50 30 # Click File menu
uvx desktop-agent mouse click 60 80 # Click Save option
# Before destructive action, confirm with user
uvx desktop-agent message confirm "This will delete all files. Continue?" --title "Warning"
# Check output: if "Cancel" was clicked, abort operation
# ✅ RELIABLE: Parse structured JSON output
uvx desktop-agent screen locate button.png
# Parse: {"success": true, "data": {"center_x": 125, "center_y": 215}}
# ❌ FRAGILE: Parse text output
uvx desktop-agent screen locate button.png
# Parse: "Found at: Box(left=100, top=200, width=50, height=30)"
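The sample outputs in this document place the click target in slightly different spots: the full `screen locate` example nests `center_x` and `center_y` under `data.bounding_box`, the abbreviated parse comment shows them directly under `data`, and `locate-center` output is not shown at all. A defensive sketch that accepts either layout and falls back to computing the center from `left`, `top`, `width`, and `height`; `center_from_locate` is a hypothetical name.

```python
import json
from typing import Tuple


# Hypothetical helper (not part of desktop-agent).
def center_from_locate(text: str) -> Tuple[int, int]:
    """Return the (x, y) to click from locate-style JSON output.

    Accepts center_x/center_y under data.bounding_box or directly under data,
    and otherwise derives the center from left/top/width/height.
    """
    payload = json.loads(text)
    if not payload.get("success"):
        code = (payload.get("error") or {}).get("code", "unknown_error")
        raise LookupError(f"locate failed: {code}")
    data = payload.get("data") or {}
    box = data.get("bounding_box", data)
    if "center_x" in box and "center_y" in box:
        return int(box["center_x"]), int(box["center_y"])
    return box["left"] + box["width"] // 2, box["top"] + box["height"] // 2
```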
# Multi-step file operation with validation
uvx desktop-agent app list
uvx desktop-agent screen locate-text-coordinates "File" --active
uvx desktop-agent mouse click <returned_x> <returned_y>
uvx desktop-agent screen locate-text-coordinates "Save As" --active
uvx desktop-agent mouse click <returned_x> <returned_y>
# Common Windows shortcuts
uvx desktop-agent keyboard hotkey "win,d" # Show desktop
uvx desktop-agent keyboard hotkey "win,e" # Open Explorer
uvx desktop-agent keyboard hotkey "alt,tab" # Switch windows
uvx desktop-agent keyboard hotkey "win,r" # Run dialog
# Windows: open apps by name
uvx desktop-agent app open notepad
uvx desktop-agent app open calc
uvx desktop-agent app open mspaint
# Common macOS shortcuts (use 'command' for Cmd key)
uvx desktop-agent keyboard hotkey "command,space" # Spotlight
uvx desktop-agent keyboard hotkey "command,tab" # App switcher
uvx desktop-agent keyboard hotkey "command,q" # Quit app
uvx desktop-agent keyboard hotkey "command,shift,3" # Screenshot
# macOS: open apps
uvx desktop-agent app open "Safari"
uvx desktop-agent app open "TextEdit"
# Linux: open apps (uses xdg-open or a direct command)
uvx desktop-agent app open firefox
uvx desktop-agent app open gedit
# Linux: common shortcuts may vary by desktop environment (DE)
uvx desktop-agent keyboard hotkey "alt,f2" # Run dialog (many DEs)
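Many shortcuts differ only in the primary modifier: `ctrl` on Windows and Linux, `command` on macOS. A sketch that picks the modifier from Python's `sys.platform` (`"darwin"` on macOS) so one shortcut table can serve all three platforms. This is an assumption about the target apps: not every Ctrl shortcut has a Cmd equivalent, and Windows-key shortcuts such as `win,d` have no direct macOS counterpart. `primary_modifier` and `shortcut` are hypothetical helpers.

```python
import sys
from typing import Optional


# Hypothetical helpers (not part of desktop-agent).
def primary_modifier(platform: Optional[str] = None) -> str:
    """Return 'command' on macOS and 'ctrl' elsewhere (Windows, Linux)."""
    platform = sys.platform if platform is None else platform
    return "command" if platform == "darwin" else "ctrl"


def shortcut(key: str, *, shift: bool = False, platform: Optional[str] = None) -> str:
    """Build the comma-separated argument for `keyboard hotkey`, e.g. 'ctrl,shift,s'."""
    keys = [primary_modifier(platform)]
    if shift:
        keys.append("shift")
    keys.append(key)
    return ",".join(keys)
```

For example, `shortcut("s", shift=True)` yields the Save As combination for whichever platform the agent is running on.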
Want to interact with an app?
├── App not running → `app open <name>`
├── App running but not focused → `app focus <name>`
└── Need to verify windows → `app list`
Want to find a UI element?
├── Have reference image → `screen locate-center <image>`
├── Know the text label → `screen locate-text-coordinates "<text>"`
└── Need to see all text → `screen read-all-text --active`
Want to click something?
├── Know exact coordinates → `mouse click <x> <y>`
├── Need to find first → Use locate commands above, then click returned coords
└── Not sure if on screen → `screen on-screen <x> <y>` first
Want to type something?
├── Regular text → `keyboard write "<text>"`
├── Keyboard shortcut → `keyboard hotkey "<key1>,<key2>"`
├── Single key press → `keyboard press <key>`
└── Multiple of same key → `keyboard press <key> --presses N`