Chrome浏览器自动化控制工具 - 使用use_browser MCP实现网页浏览与数据提取

browsing by obra/superpowers-chrome

155 周安装量

219 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/obra/superpowers-chrome --skill browsing

自动化 Web应用测试

🇨🇳中文介绍

通过 Chrome Direct 进行浏览

概述

使用 use_browser MCP 工具通过 DevTools 协议控制 Chrome。单一的统一接口，可自动启动 Chrome。

声明： "我正在使用浏览技能来控制 Chrome。"

何时使用

在以下情况下使用此方式：

控制已认证的会话
管理运行中浏览器的多个标签页
Playwright MCP 不可用或过于复杂

在以下情况下使用 Playwright MCP：

需要全新的浏览器实例
生成屏幕截图/PDF
偏好更高级别的抽象

自动捕获

每次 DOM 操作（导航、点击、输入、选择、执行脚本、键盘按键）都会自动保存：

{前缀}.png — 视口屏幕截图
{前缀}.md — 页面内容，以结构化 Markdown 格式保存
{前缀}.html — 完整的渲染后 DOM
{前缀}-console.txt — 浏览器控制台消息

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

浏览器模式控制

show_browser：使浏览器窗口可见（有头模式）
- 示例：{action: "show_browser"}
- ⚠️ 警告：重启 Chrome，通过 GET 重新加载页面，丢失 POST 状态
hide_browser：切换到无头模式（不可见浏览器）
- 示例：{action: "hide_browser"}
- ⚠️ 警告：重启 Chrome，通过 GET 重新加载页面，丢失 POST 状态
browser_mode：检查当前浏览器模式、端口和配置文件
- 示例：{action: "browser_mode"}
- 返回：{"headless": true|false, "mode": "headless"|"headed", "running": true|false, "port": 9222, "profile": "name", "profileDir": "/path"}

set_profile：更改 Chrome 配置文件（必须先终止 Chrome）
- 示例：{action: "set_profile", "payload": "browser-user"}
- ⚠️ 警告：必须先停止 Chrome
get_profile：获取当前配置文件名称和目录
- 示例：{action: "get_profile"}
- 返回：{"profile": "name", "profileDir": "/path"}

默认行为：Chrome 以无头模式启动，使用 "superpowers-chrome" 配置文件，在动态分配的端口（范围 9222-12111）上运行。可通过 CHROME_WS_PORT 环境变量或 MCP --port=N 标志覆盖。

切换模式时的关键注意事项：

Chrome 必须重启 — 无法在运行中的 Chrome 上切换无头/有头模式
页面通过 GET 重新加载 — 所有打开的标签页都通过 GET 请求重新打开
POST 状态丢失 — 表单提交、POST 结果和基于 POST 的导航将丢失
会话状态丢失 — 任何客户端状态（JavaScript 变量等）将被清除
Cookie/认证可能保留 — 使用相同的用户数据目录，因此已登录的会话可能得以保留

何时使用有头模式：

调试视觉渲染问题
向用户演示浏览器行为
测试仅在可见浏览器中有效的功能
调试在无头模式下无法复现的问题

何时保持在无头模式（默认）：

所有其他情况 — 更快、更简洁、干扰更小
屏幕截图在无头模式下完美工作
大多数自动化在两种模式下工作方式相同

配置文件管理：配置文件存储持久的浏览器数据（Cookie、localStorage、扩展程序、认证会话）。

配置文件位置：

macOS：~/Library/Caches/superpowers/browser-profiles/{name}/
Linux：~/.cache/superpowers/browser-profiles/{name}/
Windows：%LOCALAPPDATA%/superpowers/browser-profiles/{name}/

何时使用单独的配置文件：

默认配置文件（"superpowers-chrome"）：通用自动化、共享会话
特定于代理的配置文件：隔离不同代理的浏览器状态
- 示例：browser-user 代理使用 "browser-user" 配置文件
特定于任务的配置文件：使用不同用户上下文进行测试
- 示例："test-logged-in" 与 "test-logged-out"

配置文件数据在以下情况中持续存在：

Chrome 重启
模式切换（无头 ↔ 有头）
系统重启（数据位于缓存目录中）

要使用不同的配置文件：

如果 Chrome 正在运行，则终止它：await chromeLib.killChrome()
设置配置文件：{action: "set_profile", "payload": "my-profile"}
启动 Chrome：下一次导航/操作将使用新配置文件

导航并提取：
{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "h1"}
{action: "extract", payload: "text", selector: "h1"}

填写并提交表单

{action: "navigate", payload: "https://example.com/login"}
{action: "await_element", selector: "input[name=email]"}
{action: "type", selector: "input[name=email]", payload: "user@example.com"}
{action: "type", selector: "input[name=password]", payload: "pass123\n"}
{action: "await_text", payload: "Welcome"}

密码末尾的 \n 用于提交表单。

多标签页工作流

{action: "list_tabs"}
{action: "click", tab_index: 2, selector: "a.email"}
{action: "await_element", tab_index: 2, selector: ".content"}
{action: "extract", tab_index: 2, payload: "text", selector: ".amount"}

{action: "navigate", payload: "https://example.com"}
{action: "type", selector: "input[name=q]", payload: "query"}
{action: "click", selector: "button.search"}
{action: "await_element", selector: ".results"}
{action: "extract", payload: "text", selector: ".result-title"}

{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "a.download"}
{action: "attr", selector: "a.download", payload: "href"}

{action: "eval", payload: "document.querySelectorAll('a').length"}
{action: "eval", payload: "Array.from(document.querySelectorAll('a')).map(a => a.href)"}

调整视口大小（响应式测试）

使用 eval 调整浏览器窗口大小以测试响应式布局：

{action: "eval", payload: "window.resizeTo(375, 812); 'Resized to mobile'"}
{action: "eval", payload: "window.resizeTo(768, 1024); 'Resized to tablet'"}
{action: "eval", payload: "window.resizeTo(1920, 1080); 'Resized to desktop'"}

注意：这调整的是窗口大小，而不是设备模拟。它不会改变：

设备像素比（视网膜显示屏）
触摸事件
User-Agent 字符串

对于大多数响应式测试，调整窗口大小就足够了。

使用 eval 清除 JavaScript 可访问的 Cookie：

{action: "eval", payload: "document.cookie.split(';').forEach(c => { document.cookie = c.trim().split('=')[0] + '=;expires=Thu, 01 Jan 1970 00:00:00 GMT;path=/'; }); 'Cookies cleared'"}

注意：这清除的是 JavaScript 可访问的 Cookie。它不会清除：

httpOnly Cookie（仅服务器端）
来自其他域的 Cookie

对于大多数登出/重置场景，这已经足够了。

{action: "eval", payload: "window.scrollTo(0, document.body.scrollHeight); 'Scrolled to bottom'"}
{action: "eval", payload: "window.scrollTo(0, 0); 'Scrolled to top'"}
{action: "eval", payload: "document.querySelector('.target').scrollIntoView(); 'Scrolled to element'"}

交互前始终等待： 不要在导航后立即点击或填写 — 页面需要时间加载。

// 不好 - 如果页面加载慢可能会失败
{action: "navigate", payload: "https://example.com"}
{action: "click", selector: "button"}  // 可能失败！

// 好 - 先等待
{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "button"}
{action: "click", selector: "button"}

使用特定的选择器： 避免匹配多个元素的通用选择器。

// 不好 - 匹配第一个按钮
{action: "click", selector: "button"}

// 好 - 具体
{action: "click", selector: "button[type=submit]"}
{action: "click", selector: "#login-button"}

使用 \n 提交表单： 在文本后附加换行符以自动提交表单。

{action: "type", selector: "#search", payload: "query\n"}

先检查内容： 在构建工作流之前，提取页面内容以验证选择器。

{action: "extract", payload: "html"}

找不到元素：

在交互前使用 await_element
使用 'html' 格式的 extract 操作验证选择器

增加超时时间：{timeout: 30000} 用于慢速页面
等待特定元素而不是文本

标签页索引超出范围：

使用 list_tabs 获取当前索引
当标签页关闭时，索引会改变

eval 返回 [object Object]：

对于复杂对象使用 JSON.stringify()：{action: "eval", payload: "JSON.stringify({name: 'test'})"}
对于异步函数：{action: "eval", payload: "JSON.stringify(await yourAsyncFunction())"}

测试自动化（高级）

构建测试自动化时，您有两种方法：

方法 1：use_browser MCP（简单测试）

最适合：单步测试、对话期间直接由 Claude 控制

{"action": "navigate", "payload": "https://app.com"}
{"action": "click", "selector": "#test-button"}
{"action": "eval", "payload": "JSON.stringify({passed: document.querySelector('.success') !== null})"}

方法 2：chrome-ws CLI（复杂测试）

最适合：多步测试套件、独立的自动化脚本

关键见解：chrome-ws 是展示正确 Chrome DevTools 协议使用方式的参考实现。当 use_browser 未按预期工作时，检查 chrome-ws 如何处理相同的操作。

# 示例：自动化表单测试
./chrome-ws navigate 0 "https://app.com/form"
./chrome-ws fill 0 "#email" "test@example.com"
./chrome-ws click 0 "button[type=submit]"
./chrome-ws wait-text 0 "Success"

当 use_browser 失败时

检查 chrome-ws 源代码 — 它展示了正确的 CDP 模式
使用 chrome-ws 验证 — 通过 CLI 测试相同的操作
调整模式 — 将有效的 CDP 方法应用于 use_browser

常见的测试自动化模式

表单验证：填写表单，检查错误状态
UI 状态测试：点击元素，验证 DOM 变化
性能测试：测量加载时间，捕获指标
屏幕截图比较：捕获前后状态

有关在 Claude Code 之外使用命令行的信息，请参阅 COMMANDLINE-USAGE.md。

有关详细示例，请参阅 EXAMPLES.md。

🇺🇸English

Browsing with Chrome Direct

Overview

Control Chrome via DevTools Protocol using the use_browser MCP tool. Single unified interface with auto-starting Chrome.

Announce: "I'm using the browsing skill to control Chrome."

When to Use

Use this when:

Controlling authenticated sessions
Managing multiple tabs in running browser
Playwright MCP unavailable or excessive

Use Playwright MCP when:

Need fresh browser instances
Generating screenshots/PDFs
Prefer higher-level abstractions

Auto-Capture

Every DOM action (navigate, click, type, select, eval, keyboard_press) automatically saves:

{prefix}.png — viewport screenshot
{prefix}.md — page content as structured markdown
{prefix}.html — full rendered DOM
{prefix}-console.txt — browser console messages

Files are saved to the session directory with sequential prefixes (001-navigate, 002-click, etc.). You must check these before using extract or screenshot actions.

The use_browser Tool

Single MCP tool with action-based interface. Chrome auto-starts on first use.

Parameters:

action (required): Operation to perform
tab_index (optional): Tab to operate on (default: 0)
selector (optional): CSS selector for element operations
payload (optional): Action-specific data
timeout (optional): Timeout in ms for await operations (default: 5000)

Actions Reference

Navigation

navigate : Navigate to URL
- payload: URL string
- Example: {action: "navigate", payload: "https://example.com"}
await_element : Wait for element to appear
- selector: CSS selector
- timeout: Max wait time in ms
- Example: {action: "await_element", selector: ".loaded", timeout: 10000}
await_text : Wait for text to appear
- payload: Text to wait for
- Example: {action: "await_text", payload: "Welcome"}

Interaction

click : Click element
- selector: CSS selector
- Example: {action: "click", selector: "button.submit"}
type : Type text into input (append \n to submit)
- selector: CSS selector
- payload: Text to type
- Example: {action: "type", selector: "#email", payload: "user@example.com\n"}
select : Select dropdown option
- selector: CSS selector

Extraction

extract : Get page content
- payload: Format ('markdown'|'text'|'html')
- selector: Optional - limit to element
- Example: {action: "extract", payload: "markdown"}
- Example: {action: "extract", payload: "text", selector: "h1"}
attr : Get element attribute
- selector: CSS selector
- payload: Attribute name
- Example: {action: "attr", selector: "a.download", payload: "href"}
: Execute JavaScript

Export

screenshot : Capture screenshot of a specific element
- payload: Filename
- selector: Optional - screenshot specific element
- Viewport screenshots are auto-captured after every DOM action. Use this only when you need a specific element.
- Example: {action: "screenshot", payload: "/tmp/chart.png", selector: ".chart"}

Tab Management

list_tabs : List all open tabs
- Example: {action: "list_tabs"}
new_tab : Create new tab
- Example: {action: "new_tab"}
close_tab : Close tab
- tab_index: Tab to close
- Example: {action: "close_tab", tab_index: 2}

Browser Mode Control

show_browser : Make browser window visible (headed mode)
- Example: {action: "show_browser"}
- ⚠️ WARNING : Restarts Chrome, reloads pages via GET, loses POST state
hide_browser : Switch to headless mode (invisible browser)
- Example: {action: "hide_browser"}
- ⚠️ WARNING : Restarts Chrome, reloads pages via GET, loses POST state
browser_mode : Check current browser mode, port, and profile
- Example: {action: "browser_mode"}
- Returns: {"headless": true|false, "mode": "headless"|"headed", "running": true|false, "port": 9222, "profile": "name", "profileDir": "/path"}

Profile Management

set_profile : Change Chrome profile (must kill Chrome first)
- Example: {action: "set_profile", "payload": "browser-user"}
- ⚠️ WARNING : Chrome must be stopped first
get_profile : Get current profile name and directory
- Example: {action: "get_profile"}
- Returns: {"profile": "name", "profileDir": "/path"}

Default behavior : Chrome starts in headless mode with "superpowers-chrome" profile on a dynamically allocated port (range 9222-12111). Override with CHROME_WS_PORT env var or MCP --port=N flag.

Critical caveats when toggling modes :

Chrome must restart - Cannot switch headless/headed mode on running Chrome
Pages reload via GET - All open tabs are reopened with GET requests
POST state is lost - Form submissions, POST results, and POST-based navigation will be lost
Session state is lost - Any client-side state (JavaScript variables, etc.) is cleared
Cookies/auth may persist - Uses same user data directory, so logged-in sessions may survive

When to use headed mode :

Debugging visual rendering issues
Demonstrating browser behavior to user
Testing features that only work with visible browser
Debugging issues that don't reproduce in headless mode

When to stay in headless mode (default):

All other cases - faster, cleaner, less intrusive
Screenshots work perfectly in headless mode
Most automation works identically in both modes

Profile management : Profiles store persistent browser data (cookies, localStorage, extensions, auth sessions).

Profile locations :

macOS: ~/Library/Caches/superpowers/browser-profiles/{name}/
Linux: ~/.cache/superpowers/browser-profiles/{name}/
Windows: %LOCALAPPDATA%/superpowers/browser-profiles/{name}/

When to use separate profiles :

Default profile ("superpowers-chrome") : General automation, shared sessions
Agent-specific profiles : Isolate different agents' browser state
- Example: browser-user agent uses "browser-user" profile
Task-specific profiles : Testing with different user contexts
- Example: "test-logged-in" vs "test-logged-out"

Profile data persists across :

Chrome restarts
Mode toggles (headless ↔ headed)
System reboots (data is in cache directory)

To use a different profile :

Kill Chrome if running: await chromeLib.killChrome()
Set profile: {action: "set_profile", "payload": "my-profile"}
Start Chrome: Next navigate/action will use new profile

Quick Start Pattern

Navigate and extract:
{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "h1"}
{action: "extract", payload: "text", selector: "h1"}

Common Patterns

Fill and Submit Form

{action: "navigate", payload: "https://example.com/login"}
{action: "await_element", selector: "input[name=email]"}
{action: "type", selector: "input[name=email]", payload: "user@example.com"}
{action: "type", selector: "input[name=password]", payload: "pass123\n"}
{action: "await_text", payload: "Welcome"}

The \n at the end of the password submits the form.

Multi-Tab Workflow

{action: "list_tabs"}
{action: "click", tab_index: 2, selector: "a.email"}
{action: "await_element", tab_index: 2, selector: ".content"}
{action: "extract", tab_index: 2, payload: "text", selector: ".amount"}

Dynamic Content

{action: "navigate", payload: "https://example.com"}
{action: "type", selector: "input[name=q]", payload: "query"}
{action: "click", selector: "button.search"}
{action: "await_element", selector: ".results"}
{action: "extract", payload: "text", selector: ".result-title"}

Get Link Attribute

{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "a.download"}
{action: "attr", selector: "a.download", payload: "href"}

Execute JavaScript

{action: "eval", payload: "document.querySelectorAll('a').length"}
{action: "eval", payload: "Array.from(document.querySelectorAll('a')).map(a => a.href)"}

Resize Viewport (Responsive Testing)

Use eval to resize the browser window for testing responsive layouts:

{action: "eval", payload: "window.resizeTo(375, 812); 'Resized to mobile'"}
{action: "eval", payload: "window.resizeTo(768, 1024); 'Resized to tablet'"}
{action: "eval", payload: "window.resizeTo(1920, 1080); 'Resized to desktop'"}

Note : This resizes the window, not device emulation. It won't change:

Device pixel ratio (retina displays)
Touch events
User-Agent string

For most responsive testing, window resize is sufficient.

Clear Cookies

Use eval to clear cookies accessible to JavaScript:

{action: "eval", payload: "document.cookie.split(';').forEach(c => { document.cookie = c.trim().split('=')[0] + '=;expires=Thu, 01 Jan 1970 00:00:00 GMT;path=/'; }); 'Cookies cleared'"}

Note : This clears cookies accessible to JavaScript. It won't clear:

httpOnly cookies (server-side only)
Cookies from other domains

For most logout/reset scenarios, this is sufficient.

Scroll Page

{action: "eval", payload: "window.scrollTo(0, document.body.scrollHeight); 'Scrolled to bottom'"}
{action: "eval", payload: "window.scrollTo(0, 0); 'Scrolled to top'"}
{action: "eval", payload: "document.querySelector('.target').scrollIntoView(); 'Scrolled to element'"}

Tips

Always wait before interaction: Don't click or fill immediately after navigate - pages need time to load.

// BAD - might fail if page slow
{action: "navigate", payload: "https://example.com"}
{action: "click", selector: "button"}  // May fail!

// GOOD - wait first
{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "button"}
{action: "click", selector: "button"}

Use specific selectors: Avoid generic selectors that match multiple elements.

// BAD - matches first button
{action: "click", selector: "button"}

// GOOD - specific
{action: "click", selector: "button[type=submit]"}
{action: "click", selector: "#login-button"}

Submit forms with \n: Append newline to text to submit forms automatically.

{action: "type", selector: "#search", payload: "query\n"}

Check content first: Extract page content to verify selectors before building workflow.

{action: "extract", payload: "html"}

Troubleshooting

Element not found:

Use await_element before interaction
Verify selector with extract action using 'html' format

Timeout errors:

Increase timeout: {timeout: 30000} for slow pages
Wait for specific element instead of text

Tab index out of range:

Use list_tabs to get current indices
Tab indices change when tabs close

eval returns[object Object]:

Use JSON.stringify() for complex objects: {action: "eval", payload: "JSON.stringify({name: 'test'})"}
For async functions: {action: "eval", payload: "JSON.stringify(await yourAsyncFunction())"}

Test Automation (Advanced)

When building test automation, you have two approaches:

Approach 1: use_browser MCP (Simple Tests)

Best for: Single-step tests, direct Claude control during conversation

{"action": "navigate", "payload": "https://app.com"}
{"action": "click", "selector": "#test-button"}
{"action": "eval", "payload": "JSON.stringify({passed: document.querySelector('.success') !== null})"}

Approach 2: chrome-ws CLI (Complex Tests)

Best for: Multi-step test suites, standalone automation scripts

Key insight : chrome-ws is the reference implementation showing proper Chrome DevTools Protocol usage. When use_browser doesn't work as expected, examine how chrome-ws handles the same operation.

# Example: Automated form testing
./chrome-ws navigate 0 "https://app.com/form"
./chrome-ws fill 0 "#email" "test@example.com"
./chrome-ws click 0 "button[type=submit]"
./chrome-ws wait-text 0 "Success"

When use_browser Fails

Check chrome-ws source code - It shows the correct CDP pattern
Use chrome-ws to verify - Test the same operation via CLI
Adapt the pattern - Apply the working CDP approach to use_browser

Common Test Automation Patterns

Form validation : Fill forms, check error states
UI state testing : Click elements, verify DOM changes
Performance testing : Measure load times, capture metrics
Screenshot comparison : Capture before/after states

Advanced Usage

For command-line usage outside Claude Code, see COMMANDLINE-USAGE.md.

For detailed examples, see EXAMPLES.md.

Protocol Reference

Full CDP documentation: https://chromedevtools.github.io/devtools-protocol/

Weekly Installs

155

Repository

obra/superpowers-chrome

GitHub Stars

219

First Seen

Jan 29, 2026

Security Audits

Gen Agent Trust HubFail SocketPass SnykFail

Installed on

opencode149

codex146

gemini-cli139

github-copilot138

kimi-cli137

amp137

通过 LiteLLM 代理让 Claude Code 对接 GitHub Copilot 运行 | 高级变通方案指南

36,300 周安装

payload: Option value(s)

Example: {action: "select", selector: "select[name=state]", payload: "CA"}

payload: JavaScript code
Example: {action: "eval", payload: "document.title"}