Browser Automation Skill: Natural-Language Control of Chrome with the Stagehand CLI and Claude

Browser Automation by browserbase/agent-browse

473 GitHub Stars

Install command

npx skills add https://github.com/browserbase/agent-browse --skill 'Browser Automation'

AI/Machine Learning, Automated Testing

Browser Automation

Automate browser interactions using Stagehand CLI with Claude. This skill provides natural language control over a Chrome browser through command-line tools for navigation, interaction, data extraction, and screenshots.

Overview

This skill uses a CLI-based approach where Claude Code calls browser automation commands via bash. The browser stays open between commands for faster sequential operations and preserves browser state (cookies, sessions, etc.).

Setup Verification

IMPORTANT: Before using any browser commands, you MUST check setup.json in this directory.

First-Time Setup Check

Read setup.json (located at .claude/skills/browser-automation/setup.json)
Check the setupComplete field:
- If true: All prerequisites are met, proceed with browser commands
- If false: Setup required - follow the steps below
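
The check above can be scripted. The following is a minimal sketch, assuming a POSIX shell and that setup.json stores the flag as a plain JSON boolean; setup_complete is a hypothetical helper name, not part of the skill.

```shell
# Hypothetical helper: succeeds (exit status 0) only when the file exists
# and contains "setupComplete": true. Pass a different path as $1 if needed.
setup_complete() {
  setup_file="${1:-.claude/skills/browser-automation/setup.json}"
  [ -f "$setup_file" ] &&
    grep -Eq '"setupComplete"[[:space:]]*:[[:space:]]*true' "$setup_file"
}

# Example use:
#   if setup_complete; then
#     browser navigate https://example.com
#   else
#     echo "Setup required: follow the setup steps first"
#   fi
```

A pattern match keeps the sketch free of extra tools; a JSON-aware parser would be stricter if one is available.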

If Setup is Required (`setupComplete: false`)

Run these commands in the plugin directory:

# 1. Install dependencies and build (REQUIRED)
# This automatically builds TypeScript
npm install
# or: pnpm install
# or: bun install

# 2. Link the browser command globally (REQUIRED)
npm link

# 3. Configure API key (REQUIRED)
# Option 1 (RECOMMENDED): Export in your terminal
export ANTHROPIC_API_KEY="your-api-key-here"

# Option 2: Or use .env file
cp .env.example .env
# Then edit .env and add: ANTHROPIC_API_KEY="your-api-key-here"

# 4. Verify Chrome is installed
# Chrome should be at standard location for your OS

# 5. Test the installation
browser navigate https://example.com

# 6. If test succeeds, update setup.json
# Set all "installed"/"configured" fields to true
# Set "setupComplete" to true

Prerequisites Summary

✅ Google Chrome installed on your system
✅ Node.js dependencies installed and TypeScript built (npm install runs build automatically)
✅ Browser command globally available (npm link creates the global symlink)
✅ Anthropic API key configured (exported as ANTHROPIC_API_KEY environment variable or in .env file)
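
Part of this checklist can be verified from the shell. The sketch below assumes a POSIX shell and is run from the plugin directory; missing_prereqs is a hypothetical helper name. It deliberately skips the Chrome check, because install locations differ by operating system.

```shell
# Hypothetical helper: prints one line per missing prerequisite and
# prints nothing when both checks pass.
missing_prereqs() {
  # npm link should have put the browser command on PATH.
  command -v browser >/dev/null 2>&1 ||
    echo "browser command not found on PATH (run: npm link)"

  # The key may come from the environment or from a .env file.
  if [ -z "${ANTHROPIC_API_KEY:-}" ] &&
     ! grep -q '^ANTHROPIC_API_KEY=' .env 2>/dev/null; then
    echo "ANTHROPIC_API_KEY is not set (export it or add it to .env)"
  fi
}
```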

DO NOT attempt to use browser commands if setupComplete is false in setup.json. Guide the user through setup first.

Available Commands

Navigate to URLs

browser navigate <url>

When to use: Opening any website, loading a specific URL, going to a web page.

Example usage:

browser navigate https://example.com
browser navigate https://news.ycombinator.com

Output: JSON with success status, message, and screenshot path

Interact with Pages

browser act "<action>"

When to use: Clicking buttons, filling forms, scrolling, selecting options, typing text.

Example usage:

browser act "click the Sign In button"
browser act "fill in the email field with test@example.com"
browser act "scroll down to the footer"
browser act "type 'laptop' in the search box and press enter"

Important: Be as specific as possible; details make a world of difference. When filling fields, you don't need to combine "click" and "type" steps; the tool performs a fill similar to Playwright's fill function.

Output: JSON with success status, message, and screenshot path

Extract Data

browser extract "<instruction>" ['{"field": "type"}']

When to use: Scraping data, getting specific information, collecting structured content.

Schema format (optional): JSON object where keys are field names and values are types:

"string" for text
"number" for numeric values
"boolean" for true/false values

Note: The schema parameter is optional. If it is omitted or if schema validation fails, extraction proceeds without type validation.

Example usage:

browser extract "get the product title and price" '{"title": "string", "price": "number"}'
browser extract "get all article headlines" '{"headlines": "string"}'
# No schema:
browser extract "get the page title"

Output: JSON with success status, extracted data, and screenshot path
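
Quoting a schema by hand gets error-prone as fields are added. As an illustration only, the sketch below builds the schema string from name:type pairs and rejects any type other than the three listed above; make_schema is a hypothetical helper, not part of the CLI.

```shell
# Hypothetical helper: turns name:type pairs into the schema JSON string.
# Only the documented types (string, number, boolean) are accepted.
make_schema() {
  schema=""
  for pair in "$@"; do
    name=${pair%%:*}   # text before the first colon
    kind=${pair#*:}    # text after the first colon
    case "$kind" in
      string|number|boolean) ;;
      *) echo "unsupported type: $kind" >&2; return 1 ;;
    esac
    schema="${schema:+$schema, }\"$name\": \"$kind\""
  done
  printf '{%s}\n' "$schema"
}

# Example use:
#   browser extract "get the product title and price" "$(make_schema title:string price:number)"
```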

Discover Elements

browser observe "<query>"

When to use: Understanding page structure, finding what's clickable, discovering form fields.

Example usage:

browser observe "find all clickable buttons"
browser observe "find all form fields"
browser observe "find all navigation links"

Output: JSON with success status, discovered elements, and screenshot path

Take Screenshots

browser screenshot

When to use: Visual verification, documenting page state, debugging, creating records.

Notes:

Screenshots are saved to the plugin directory's agent/browser_screenshots/ folder
Images larger than 2000x2000 pixels are automatically resized
Filename includes timestamp for uniqueness

Output: JSON with success status and screenshot path
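
When the path from the JSON output is not at hand, the newest file in the screenshot folder can be located by modification time. This is a rough sketch, assuming a POSIX shell and the folder named above relative to the plugin directory; latest_screenshot is a hypothetical helper.

```shell
# Hypothetical helper: prints the most recently modified file in the
# screenshot folder, or fails (non-zero status) if the folder is empty.
latest_screenshot() {
  dir="${1:-agent/browser_screenshots}"
  newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
  [ -n "$newest" ] && printf '%s/%s\n' "$dir" "$newest"
}

# Example use:
#   browser screenshot
#   latest_screenshot   # prints the path to open for visual verification
```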

Clean Up

browser close

When to use: After completing all browser interactions, to free up resources.

Output: JSON with success status and message

Browser Behavior

Persistent Browser: The browser stays open between commands for faster sequential operations and to preserve browser state (cookies, sessions, etc.).

Reuse Existing: If Chrome is already running on port 9222, the CLI reuses that instance.

Minimized Launch: Chrome opens off-screen (position -9999,-9999) to avoid disrupting your workflow.

Safe Cleanup: The browser only closes when you explicitly call the close command.

Best Practices

Always navigate first: Before interacting with a page, navigate to the URL.
📸 Always view screenshots: After each command (navigate, act, extract, observe), use the Read tool to view the screenshot and verify that the command worked correctly.
Use natural language: Describe actions as you would instruct a human.
Extract with clear schemas: Define field names and types explicitly in JSON.
Handle errors gracefully: Check the success field in the JSON output; if an action fails, view the screenshot and try observe to understand the page better.
Close when done: Always clean up browser resources after completing tasks.
Be specific: Use precise descriptions in natural language ("the blue Submit button" vs. "the button").
Chain commands: Run multiple commands sequentially without reopening the browser.
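
The error-handling practice can be wrapped in a small function. The sketch below is an assumption-laden illustration: the text only says the JSON output has a success field, so the exact formatting matched here is a guess, and run_step is a hypothetical helper name.

```shell
# Hypothetical helper: runs one browser command, echoes its JSON output,
# and succeeds only if that output contains "success": true.
run_step() {
  output=$(browser "$@")
  printf '%s\n' "$output"
  printf '%s' "$output" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Example use:
#   run_step navigate https://example.com &&
#     run_step act "click the Sign In button" ||
#     browser observe "find all clickable buttons"
```

In the example, observe runs only when a step reports failure, which mirrors the advice to inspect the page before retrying.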

Common Patterns

Simple browsing task

browser navigate https://example.com
browser act "click the login button"
browser screenshot
browser close

Data extraction task

browser navigate https://example.com/products
browser act "wait for page to load"
browser extract "get all products" '{"name": "string", "price": "number"}'
# Or without schema:
# browser extract "get the page content"
browser close

Multi-step interaction

browser navigate https://example.com/login
browser act "fill in email with user@example.com"
browser act "fill in password with mypassword"
browser act "click the submit button"
browser screenshot
browser close
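
A multi-step flow like the one above leaves the browser open if a step fails partway and the script exits early. One way to guard against that, sketched here under the assumption that each command prints JSON containing a success field, is to stop at the first failed step but always call close. browse_and_close and step_ok are hypothetical names.

```shell
# Hypothetical helper: true when the JSON text in $1 reports success.
step_ok() {
  printf '%s' "$1" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: navigate to $1, run each remaining argument as an
# act step, stop at the first failure, and close the browser either way.
browse_and_close() {
  url=$1
  shift
  result=0
  out=$(browser navigate "$url")
  step_ok "$out" || result=1
  for action in "$@"; do
    [ "$result" -eq 0 ] || break
    out=$(browser act "$action")
    step_ok "$out" || result=1
  done
  browser close >/dev/null
  return "$result"
}

# Example use:
#   browse_and_close https://example.com/login \
#     "fill in email with user@example.com" \
#     "click the submit button"
```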

Debugging workflow

browser navigate https://example.com
browser screenshot
browser observe "find all buttons"
browser act "click the specific button"
browser screenshot
browser close

Troubleshooting

Page not loading: Wait a few seconds after navigation before acting. You can also wait explicitly: browser act "wait for the page to fully load"

Element not found: Use observe to discover which elements are actually available on the page.

Action fails: Be more specific in the natural-language description. Instead of "click the button", try "click the blue Submit button in the form".

Screenshots missing: Check the plugin directory's agent/browser_screenshots/ folder for saved files.

Chrome not found: Install Google Chrome. If it is missing, the CLI shows an error with installation instructions.

Port 9222 in use: Another Chrome debugging session is running. Close it or wait for it to time out.
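
For slow pages, the "wait, then try again" advice can be expressed as a bounded retry. This is only a sketch: the attempt count and delay are arbitrary defaults, the success match assumes the JSON format, and retry_act is a hypothetical helper.

```shell
# Hypothetical helper: retries one act step up to $2 times (default 3),
# sleeping $3 seconds (default 2) between attempts. Prints the last output.
retry_act() {
  action=$1
  attempts=${2:-3}
  delay=${3:-2}
  n=1
  while [ "$n" -le "$attempts" ]; do
    out=$(browser act "$action")
    if printf '%s' "$out" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'; then
      printf '%s\n' "$out"
      return 0
    fi
    n=$((n + 1))
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  printf '%s\n' "$out"
  return 1
}

# Example use:
#   retry_act "click the blue Submit button in the form" 3 2
```

Retrying the same wording only helps with timing problems; if the element is never found, a more specific description or an observe call is the better next step.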

For detailed examples, see EXAMPLES.md. For API reference and technical details, see REFERENCE.md.

Dependencies

To use this skill, install these dependencies only if they aren't already present:

npm install
# or
pnpm install
# or
bun install

