Browser Automation by browserbase/agent-browse
npx skills add https://github.com/browserbase/agent-browse --skill 'Browser Automation'
Automate browser interactions using Stagehand CLI with Claude. This skill provides natural language control over a Chrome browser through command-line tools for navigation, interaction, data extraction, and screenshots.
This skill uses a CLI-based approach where Claude Code calls browser automation commands via bash. The browser stays open between commands for faster sequential operations and preserves browser state (cookies, sessions, etc.).
IMPORTANT: Before using any browser commands, you MUST check setup.json in this directory.
Read setup.json (located in .claude/skills/browser-automation/setup.json) and check its setupComplete field:
- true: All prerequisites are met; proceed with browser commands.
- false: Setup is required; follow the steps below.
If setup is required (setupComplete: false), run these commands in the plugin directory:
# 1. Install dependencies and build (REQUIRED)
# This automatically builds TypeScript
npm install
# or: pnpm install
# or: bun install
# 2. Link the browser command globally (REQUIRED)
npm link
# 3. Configure API key (REQUIRED)
# Option 1 (RECOMMENDED): Export in your terminal
export ANTHROPIC_API_KEY="your-api-key-here"
# Option 2: Or use .env file
cp .env.example .env
# Then edit .env and add: ANTHROPIC_API_KEY="your-api-key-here"
# 4. Verify Chrome is installed
# Chrome should be at standard location for your OS
# 5. Test the installation
browser navigate https://example.com
# 6. If test succeeds, update setup.json
# Set all "installed"/"configured" fields to true
# Set "setupComplete" to true
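Before marking setup complete, confirm:
- Dependencies installed and built (npm install runs the build automatically)
- browser command linked globally (npm link creates the global symlink)
- API key configured (ANTHROPIC_API_KEY exported as an environment variable or set in the .env file)
Do NOT attempt to use browser commands if setupComplete is false in setup.json. Guide the user through setup first.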
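The setup check described above can be sketched as a small shell guard. This is only an illustration: setup_is_complete is a hypothetical helper name, and the grep-based match assumes setupComplete is stored as a plain JSON boolean; a real JSON parser would be more robust.

```shell
# Sketch: succeed only when setup.json reports "setupComplete": true.
# The default path comes from the text above; the helper name and the
# grep-based check are assumptions, not part of the skill itself.
setup_is_complete() {
  setup_file="${1:-.claude/skills/browser-automation/setup.json}"
  # A missing file is treated the same as incomplete setup.
  [ -f "$setup_file" ] || return 1
  grep -Eq '"setupComplete"[[:space:]]*:[[:space:]]*true' "$setup_file"
}

# Usage:
#   setup_is_complete || echo "Setup required: follow the steps above" >&2
```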
browser navigate <url>
When to use: Opening any website, loading a specific URL, going to a web page.
Example usage:
browser navigate https://example.com
browser navigate https://news.ycombinator.com
Output: JSON with success status, message, and screenshot path
browser act "<action>"
When to use: Clicking buttons, filling forms, scrolling, selecting options, typing text.
Example usage:
browser act "click the Sign In button"
browser act "fill in the email field with test@example.com"
browser act "scroll down to the footer"
browser act "type 'laptop' in the search box and press enter"
Important: Be as specific as possible; details make a world of difference. When filling fields, you don't need to combine "click and type"; the tool performs a fill similar to Playwright's fill function.
Output: JSON with success status, message, and screenshot path
browser extract "<instruction>" ['{"field": "type"}']
When to use: Scraping data, getting specific information, collecting structured content.
Schema format (optional): JSON object where keys are field names and values are types:
- "string" for text
- "number" for numeric values
- "boolean" for true/false values
Note: The schema parameter is optional. If it is omitted, or if schema validation fails, extraction proceeds without type validation.
Example usage:
browser extract "get the product title and price" '{"title": "string", "price": "number"}'
browser extract "get all article headlines" '{"headlines": "string"}'
# No schema:
browser extract "get the page title"
Output: JSON with success status, extracted data, and screenshot path
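Writing the schema argument by hand is error-prone because of the nested quoting. The sketch below builds it from name:type pairs and rejects any type other than the three listed above. build_schema is a hypothetical helper, not part of the browser CLI, and it does not escape quotes inside field names.

```shell
# Sketch: build the optional schema argument from name:type pairs.
# build_schema is a hypothetical helper; only the three types named
# in this document are accepted.
build_schema() {
  schema_body=""
  for pair in "$@"; do
    field_name=${pair%%:*}
    field_type=${pair#*:}
    case $field_type in
      string|number|boolean) ;;
      *) echo "unsupported type: $field_type" >&2; return 1 ;;
    esac
    # Append the pair, adding a separator after the first field.
    schema_body="${schema_body:+$schema_body, }\"$field_name\": \"$field_type\""
  done
  printf '{%s}\n' "$schema_body"
}

# Usage (requires the browser CLI from this skill):
#   browser extract "get the product title and price" "$(build_schema title:string price:number)"
```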
browser observe "<query>"
When to use: Understanding page structure, finding what's clickable, discovering form fields.
Example usage:
browser observe "find all clickable buttons"
browser observe "find all form fields"
browser observe "find all navigation links"
Output: JSON with success status, discovered elements, and screenshot path
browser screenshot
When to use: Visual verification, documenting page state, debugging, creating records.
Notes:
- Screenshots are saved to the plugin directory's agent/browser_screenshots/ folder
Output: JSON with success status and screenshot path
browser close
When to use: After completing all browser interactions, to free up resources.
Output: JSON with success status and message
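Every command above reports a success status in its JSON output. A minimal sketch of a wrapper that runs a command and checks that status follows. It assumes the status appears as a boolean field named success; the exact JSON layout is not specified here, and run_checked is a hypothetical helper name.

```shell
# Sketch: run a command, echo its output, and succeed only if the
# output contains "success": true. The field layout is an assumption.
run_checked() {
  if command_output=$("$@"); then
    command_status=0
  else
    command_status=$?
  fi
  printf '%s\n' "$command_output"
  # A non-zero exit status is treated as failure regardless of the JSON.
  [ "$command_status" -eq 0 ] || return "$command_status"
  printf '%s' "$command_output" |
    grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Usage (requires the browser CLI from this skill):
#   run_checked browser navigate https://example.com || echo "navigation failed" >&2
```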
Persistent Browser: The browser stays open between commands for faster sequential operations and to preserve browser state (cookies, sessions, etc.).
Reuse Existing: If Chrome is already running on port 9222, the CLI reuses that instance.
Minimized Launch: Chrome opens off-screen (position -9999,-9999) to avoid disrupting your workflow.
Safe Cleanup: The browser closes only when you explicitly call the close command.
Always check the success field in the JSON output; if an action fails, view the screenshot and try using observe to understand the page better.

# Navigate, click, and capture a screenshot
browser navigate https://example.com
browser act "click the login button"
browser screenshot
browser close

# Extract structured data
browser navigate https://example.com/products
browser act "wait for page to load"
browser extract "get all products" '{"name": "string", "price": "number"}'
# Or without schema:
# browser extract "get the page content"
browser close

# Fill in and submit a login form
browser navigate https://example.com/login
browser act "fill in email with user@example.com"
browser act "fill in password with mypassword"
browser act "click the submit button"
browser screenshot
browser close

# Inspect a page before acting on it
browser navigate https://example.com
browser screenshot
browser observe "find all buttons"
browser act "click the specific button"
browser screenshot
browser close
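The login sequence above can be wrapped so that it stops at the first failing step while still closing the browser. This is a sketch under one loud assumption: that the browser CLI exits with a non-zero status when a step fails, which this document does not state. login_and_capture is a hypothetical function name.

```shell
# Sketch: run the login steps in order, stop at the first failure,
# and always close the browser afterwards.
# Assumption: each browser command exits non-zero on failure.
login_and_capture() {
  login_url=$1
  login_email=$2
  login_password=$3
  if browser navigate "$login_url" &&
     browser act "fill in email with $login_email" &&
     browser act "fill in password with $login_password" &&
     browser act "click the submit button" &&
     browser screenshot
  then
    flow_status=0
  else
    flow_status=1
  fi
  # Close even after a failure so no browser instance is left running.
  browser close
  return "$flow_status"
}

# Usage (requires the browser CLI from this skill):
#   login_and_capture https://example.com/login user@example.com mypassword
```

If the CLI turns out to exit with status 0 even when a step fails, each step would instead need a check of the success field in its JSON output.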
Page not loading: Wait a few seconds after navigation before acting. You can also wait explicitly: browser act "wait for the page to fully load"
Element not found: Use observe to discover which elements are actually available on the page.
Action fails: Be more specific in the natural language description. Instead of "click the button", try "click the blue Submit button in the form".
Screenshots missing: Check the plugin directory's agent/browser_screenshots/ folder for saved files.
Chrome not found: Install Google Chrome. If it is missing, the CLI shows an error with installation instructions.
Port 9222 in use: Another Chrome debugging session is running. Close it or wait for it to time out.
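For pages that load slowly, the advice to wait and try again can be sketched as a generic retry loop. retry is a hypothetical helper, and it relies on the wrapped command signalling failure through a non-zero exit status. Whether the browser CLI does so is not stated in this document, so the command may need to be wrapped in a function that inspects the success field first.

```shell
# Sketch: retry a command up to max_attempts times, sleeping
# delay_seconds between tries. Assumes failure is signalled by a
# non-zero exit status.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Usage (requires the browser CLI from this skill):
#   retry 3 2 browser act "click the blue Submit button in the form"
```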
For detailed examples, see EXAMPLES.md. For API reference and technical details, see REFERENCE.md.
To use this skill, install these dependencies only if they aren't already present:
npm install
# or
pnpm install
# or
bun install