agent-browser by everyinc/compound-engineering-plugin
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill agent-browser
The CLI uses Chrome/Chromium via CDP directly. Install via npm i -g agent-browser, brew install agent-browser, or cargo install agent-browser. Run agent-browser install to download Chrome. Run agent-browser upgrade to update to the latest version.
Every browser automation follows this pattern:
agent-browser open <url>
agent-browser snapshot -i (get element refs like @e1, @e2)
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
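agent-browser snapshot -i # Check result
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
# Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
When automating a site that requires login, choose the approach that fits:
Option 1: Import auth from the user's browser (fastest for one-off tasks)
# Connect to the user's running Chrome (they're already logged in)
agent-browser --auto-connect state save ./auth.json
# Use that auth state
agent-browser --state ./auth.json open https://app.example.com/dashboard
State files contain session tokens in plaintext. Add them to .gitignore and delete them when no longer needed. Set AGENT_BROWSER_ENCRYPTION_KEY for encryption at rest.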
Option 2: Persistent profile (simplest for recurring tasks)
# First run: login manually or via automation
agent-browser --profile ~/.myapp open https://app.example.com/login
# ... fill credentials, submit ...
# All future runs: already authenticated
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
Option 3: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved
# Next time: state auto-restored
agent-browser --session-name myapp open https://app.example.com/dashboard
Option 4: Auth vault (credentials stored encrypted, login by name)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
auth login navigates, waits for the load event, and then waits for the login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
Option 5: State file (manual save/load)
# After logging in:
agent-browser state save ./auth.json
# In a future session:
agent-browser state load ./auth.json
agent-browser open https://app.example.com/dashboard
See references/authentication.md for OAuth, 2FA, cookie-based auth, and token refresh patterns.
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
agent-browser --download-path ./downloads open <url> # Set default download directory
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "Hello, World!" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
Execute multiple commands in a single invocation by piping a JSON array of string arrays to batch. This avoids per-command process startup overhead when running multi-step workflows.
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
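# Stop on first error
agent-browser batch --bail < commands.json
Use batch when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or && chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).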
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
# Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)
# Recommended: pipe password via stdin to avoid shell history exposure
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
# Login using saved profile (LLM never sees password)
agent-browser auth login github
# List/show/delete profiles
agent-browser auth list
agent-browser auth show github
agent-browser auth delete github
auth login waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.
# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
# Auto-save/restore cookies and localStorage across browser restarts
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved to ~/.agent-browser/sessions/
# Next time, state is auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard
# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com
# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
agent-browser open https://example.com/checkout
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
# @e3 [input] "Card number"
# @e4 [input] "Expiry"
# @e5 [button] "Pay"
# Interact directly, no frame switch needed
agent-browser fill @e3 "4111111111111111"
agent-browser fill @e4 "12/28"
agent-browser click @e5
# To scope a snapshot to one iframe:
agent-browser frame @e2
agent-browser snapshot -i # Only iframe content
agent-browser frame main # Return to main frame
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5 # Get specific element text
agent-browser get text body > page.txt # Get all page text
# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
# Auto-discover running Chrome with remote debugging enabled
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot
# Or with explicit CDP port
agent-browser --cdp 9222 snapshot
Auto-connect discovers Chrome via DevToolsActivePort and common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
# Persistent dark mode via flag (applies to all pages and new tabs)
agent-browser --color-scheme dark open https://example.com
# Or via environment variable
AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
# Or set during session (persists for subsequent commands)
agent-browser set media dark
# Set a custom viewport size (default is 1280x720)
agent-browser set viewport 1920 1080
agent-browser screenshot desktop.png
# Test mobile-width layout
agent-browser set viewport 375 812
agent-browser screenshot mobile.png
# Retina/HiDPI: same CSS layout at 2x pixel density
# Screenshots stay at logical viewport size, but content renders at higher DPI
agent-browser set viewport 1920 1080 2
agent-browser screenshot retina.png
# Device emulation (sets viewport + user agent in one step)
agent-browser set device "iPhone 14"
agent-browser screenshot device.png
The scale parameter (3rd argument) sets window.devicePixelRatio without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
agent-browser --headed open https://example.com
agent-browser highlight @e1 # Highlight element
agent-browser inspect # Open Chrome DevTools for the active page
agent-browser record start demo.webm # Record session
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop trace.json # Stop and save profile (path optional)
Use AGENT_BROWSER_HEADED=1 to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
# Open local files with file:// URLs
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
# List available iOS simulators
agent-browser device list
# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile-specific gesture
# Take screenshot
agent-browser -p ios screenshot mobile.png
# Close session (shuts down simulator)
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.
Enable --content-boundaries to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
agent-browser snapshot
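# Output:
# --- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---
# [accessibility tree]
# --- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---
Restrict navigation to trusted domains. Wildcards like *.example.com also match the bare domain example.com. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on: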
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
agent-browser open https://example.com # OK
agent-browser open https://malicious.com # Blocked
Use a policy file to gate destructive actions:
export AGENT_BROWSER_ACTION_POLICY=./policy.json
Example policy.json:
{ "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }
Auth vault operations (auth login, etc.) bypass the action policy, but the domain allowlist still applies.
Prevent context flooding from large pages:
export AGENT_BROWSER_MAX_OUTPUT=50000
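Use diff snapshot after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
# Typical workflow: snapshot -> action -> diff
agent-browser snapshot -i # Take baseline snapshot
agent-browser click @e2 # Perform action
agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
For visual regression testing or monitoring:
# Save a baseline screenshot, then compare later
agent-browser screenshot baseline.png
# ... time passes or changes are made ...
agent-browser diff screenshot --baseline baseline.png
# Compare staging vs production
agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
diff snapshot output uses + for additions and - for removals, similar to git diff. diff screenshot produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
The default timeout is 25 seconds. This can be overridden with the AGENT_BROWSER_DEFAULT_TIMEOUT environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
# Wait for network activity to settle (best for slow pages)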
agent-browser wait --load networkidle
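# Wait for a specific element to appear
agent-browser wait "#content"
agent-browser wait @e1
# Wait for a specific URL pattern (useful after redirects)
agent-browser wait --url "**/dashboard"
# Wait for a JavaScript condition
agent-browser wait --fn "document.readyState === 'complete'"
# Wait a fixed duration (milliseconds) as a last resort
agent-browser wait 5000
When dealing with consistently slow websites, use wait --load networkidle after open to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with wait <selector> or wait @ref.
When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
# Each agent gets its own isolated session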
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
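# Check active sessions
agent-browser session list
Always close your browser session when done to avoid leaked processes:
agent-browser close # Close default session
agent-browser --session agent1 close # Close specific session
If a previous session was not closed properly, the daemon may still be running. Use agent-browser close to clean it up before starting new work.
To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. This also caches refs, so you can interact with elements immediately without a separate snapshot.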
agent-browser screenshot --annotate
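# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"
agent-browser click @e2 # Click using ref from annotated screenshot
Use annotated screenshots when:
Use semantic locators when refs are unavailable or unreliable: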
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
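Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions, so use --stdin or -b to avoid issues.
# Simple expressions work with regular quoting
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
# Complex JS: use --stdin with heredoc (RECOMMENDED)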
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
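# Alternative: base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.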
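The base64 route can be checked without a browser: encode a snippet that contains the troublesome characters, decode it again, and confirm nothing changed. This is a minimal sketch, assuming a `base64` utility that accepts `-d` for decoding (GNU coreutils does; older macOS versions use `-D`). The JavaScript string is a made-up example, and nothing here calls agent-browser.

```shell
# Hypothetical JS snippet containing double quotes, "!", backticks and "${}".
# Single quotes keep the shell from touching any of it.
js='Array.from(document.querySelectorAll("a")).filter(a => !a.hidden).map(a => `${a.href}`)'

# Encode; strip line wrapping so the result is a single shell word.
encoded=$(printf '%s' "$js" | base64 | tr -d '\n')

# Decode again to confirm the snippet survives unchanged.
decoded=$(printf '%s' "$encoded" | base64 -d)

printf 'encoded: %s\n' "$encoded"
if [ "$decoded" = "$js" ]; then
  echo "round trip OK"
fi
```

The encoded value contains only letters, digits, `+`, `/` and `=`, so it can be passed as a plain argument (for example to `eval -b`) without further quoting concerns.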
Rules of thumb:
eval 'expression' with single quotes is fine
eval --stdin <<'EVALEOF'
eval -b with base64
Create agent-browser.json in the project root for persistent settings:
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data"
}
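Priority (lowest to highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or the AGENT_BROWSER_CONFIG env var to specify a custom config file (exits with an error if missing/invalid). All CLI options map to camelCase keys (e.g., --executable-path -> "executablePath"). Boolean flags accept true/false values (e.g., --headed false overrides config). Extensions from user and project configs are merged, not replaced.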
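The precedence order can be pictured as "the highest-priority source that sets a value wins". The sketch below only illustrates that ordering for a single setting; it is not how agent-browser resolves its configuration. `first_set` and the four variables are hypothetical names, and an empty string stands for "not set at this level".

```shell
# Hypothetical helper: print the first non-empty argument.
# Callers pass values highest priority first:
# CLI flag, environment variable, project config, user config.
first_set() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

cli_flag=""             # no flag given on the command line
env_value="dark"        # e.g. taken from an environment variable
project_value="light"   # e.g. from ./agent-browser.json
user_value=""           # e.g. from ~/.agent-browser/config.json

scheme=$(first_set "$cli_flag" "$env_value" "$project_value" "$user_value")
echo "effective value: $scheme"
```

Here the environment value overrides the project config because no CLI flag is present. Merged settings such as extensions do not fit this "first wins" model.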
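| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/profiling.md | Chrome DevTools profiling for performance analysis |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
Use --engine to choose a local browser engine. The default is chrome.
# Use Lightpanda (fast headless browser, requires separate install)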
agent-browser --engine lightpanda open example.com
# Via environment variable
export AGENT_BROWSER_ENGINE=lightpanda
agent-browser open example.com
# With custom binary path
agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
Supported engines:
chrome (default) -- Chrome/Chromium via CDP
lightpanda -- Lightpanda headless browser via CDP (10x faster and 10x less memory than Chrome)
Lightpanda does not support --extension, --profile, --state, or --allow-file-access. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output
The CLI uses Chrome/Chromium via CDP directly. Install via npm i -g agent-browser, brew install agent-browser, or cargo install agent-browser. Run agent-browser install to download Chrome. Run agent-browser upgrade to update to the latest version.
Every browser automation follows this pattern:
agent-browser open <url>
agent-browser snapshot -i (get element refs like @e1, @e2)
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
# Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
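When a later step does depend on intermediate output, the refs have to be read out of the snapshot text before the next command can be built. A minimal sketch, assuming refs appear in the `@eN` form shown in the example output earlier; `extract_refs` is a hypothetical helper rather than part of agent-browser, and the sample string stands in for real snapshot output.

```shell
# Hypothetical helper: print every @eN ref found on stdin, one per line,
# in the order they appear.
extract_refs() {
  grep -o '@e[0-9][0-9]*'
}

# Sample text copied from the example output; a real run would pipe
# `agent-browser snapshot -i` into extract_refs instead.
sample='@e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"'

refs=$(printf '%s\n' "$sample" | extract_refs)
printf '%s\n' "$refs"
```

Because the refs are only known after this step, the snapshot has to run as its own command; the follow-up `fill` and `click` calls can then be chained with `&&`.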
When automating a site that requires login, choose the approach that fits:
Option 1: Import auth from the user's browser (fastest for one-off tasks)
# Connect to the user's running Chrome (they're already logged in)
agent-browser --auto-connect state save ./auth.json
# Use that auth state
agent-browser --state ./auth.json open https://app.example.com/dashboard
State files contain session tokens in plaintext -- add to .gitignore and delete when no longer needed. Set AGENT_BROWSER_ENCRYPTION_KEY for encryption at rest.
Option 2: Persistent profile (simplest for recurring tasks)
# First run: login manually or via automation
agent-browser --profile ~/.myapp open https://app.example.com/login
# ... fill credentials, submit ...
# All future runs: already authenticated
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
Option 3: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved
# Next time: state auto-restored
agent-browser --session-name myapp open https://app.example.com/dashboard
Option 4: Auth vault (credentials stored encrypted, login by name)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
auth login navigates, waits for the load event, and then waits for the login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
Option 5: State file (manual save/load)
# After logging in:
agent-browser state save ./auth.json
# In a future session:
agent-browser state load ./auth.json
agent-browser open https://app.example.com/dashboard
See references/authentication.md for OAuth, 2FA, cookie-based auth, and token refresh patterns.
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
agent-browser --download-path ./downloads open <url> # Set default download directory
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "Hello, World!" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
Execute multiple commands in a single invocation by piping a JSON array of string arrays to batch. This avoids per-command process startup overhead when running multi-step workflows.
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
# Stop on first error
agent-browser batch --bail < commands.json
Use batch when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or && chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).
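When the batch payload is assembled from shell variables, the arguments need JSON escaping before they are piped in. The sketch below builds one inner array per command; `json_array` is a hypothetical helper that escapes only backslashes and double quotes (control characters and newlines are not handled), and the form value is made up.

```shell
# Hypothetical helper: print its arguments as a JSON array of strings.
# Escapes backslashes and double quotes only.
json_array() {
  out=""
  for arg in "$@"; do
    esc=$(printf '%s' "$arg" | sed 's/\\/\\\\/g; s/"/\\"/g')
    out="$out${out:+, }\"$esc\""
  done
  printf '[%s]\n' "$out"
}

# Assemble a two-command payload in the shape shown above.
payload=$(printf '[\n  %s,\n  %s\n]' \
  "$(json_array open "https://example.com")" \
  "$(json_array fill @e1 'Jane "JD" Doe')")

printf '%s\n' "$payload"
# With agent-browser installed, the payload could then be piped in:
#   printf '%s' "$payload" | agent-browser batch --json
```

For anything beyond simple strings, a real JSON tool is a safer way to produce the payload than hand-rolled escaping.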
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
# Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)
# Recommended: pipe password via stdin to avoid shell history exposure
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
# Login using saved profile (LLM never sees password)
agent-browser auth login github
# List/show/delete profiles
agent-browser auth list
agent-browser auth show github
agent-browser auth delete github
auth login waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.
# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
# Auto-save/restore cookies and localStorage across browser restarts
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved to ~/.agent-browser/sessions/
# Next time, state is auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard
# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com
# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
agent-browser open https://example.com/checkout
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
# @e3 [input] "Card number"
# @e4 [input] "Expiry"
# @e5 [button] "Pay"
# Interact directly — no frame switch needed
agent-browser fill @e3 "4111111111111111"
agent-browser fill @e4 "12/28"
agent-browser click @e5
# To scope a snapshot to one iframe:
agent-browser frame @e2
agent-browser snapshot -i # Only iframe content
agent-browser frame main # Return to main frame
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5 # Get specific element text
agent-browser get text body > page.txt # Get all page text
# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
# Auto-discover running Chrome with remote debugging enabled
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot
# Or with explicit CDP port
agent-browser --cdp 9222 snapshot
Auto-connect discovers Chrome via DevToolsActivePort and common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
# Persistent dark mode via flag (applies to all pages and new tabs)
agent-browser --color-scheme dark open https://example.com
# Or via environment variable
AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
# Or set during session (persists for subsequent commands)
agent-browser set media dark
# Set a custom viewport size (default is 1280x720)
agent-browser set viewport 1920 1080
agent-browser screenshot desktop.png
# Test mobile-width layout
agent-browser set viewport 375 812
agent-browser screenshot mobile.png
# Retina/HiDPI: same CSS layout at 2x pixel density
# Screenshots stay at logical viewport size, but content renders at higher DPI
agent-browser set viewport 1920 1080 2
agent-browser screenshot retina.png
# Device emulation (sets viewport + user agent in one step)
agent-browser set device "iPhone 14"
agent-browser screenshot device.png
The scale parameter (3rd argument) sets window.devicePixelRatio without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
agent-browser --headed open https://example.com
agent-browser highlight @e1 # Highlight element
agent-browser inspect # Open Chrome DevTools for the active page
agent-browser record start demo.webm # Record session
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop trace.json # Stop and save profile (path optional)
Use AGENT_BROWSER_HEADED=1 to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
# Open local files with file:// URLs
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
# List available iOS simulators
agent-browser device list
# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile-specific gesture
# Take screenshot
agent-browser -p ios screenshot mobile.png
# Close session (shuts down simulator)
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.
Enable --content-boundaries to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
agent-browser snapshot
# Output:
# --- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---
# [accessibility tree]
# --- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---
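A consumer of this output can use the nonce to decide where page content really ends: only an END marker carrying the same nonce as the opening marker closes the block, so a page that prints its own fake END line stays inside the untrusted region. A minimal sketch, assuming the markers appear on their own lines exactly as shown (without the leading "# " used for the listing); `extract_page_content` and the sample text are hypothetical.

```shell
# Hypothetical helper: print only the lines between the opening marker and
# the END marker that carries the same nonce.
extract_page_content() {
  awk '
    !inside && /^--- AGENT_BROWSER_PAGE_CONTENT nonce=/ {
      split($3, kv, "=")   # $3 is "nonce=<hex>"
      end_marker = "--- END_AGENT_BROWSER_PAGE_CONTENT nonce=" kv[2] " ---"
      inside = 1
      next
    }
    inside && ($0 == end_marker) { inside = 0; next }
    inside { print }
  '
}

# Made-up output: the page itself contains a spoofed END line with the wrong nonce.
sample='--- AGENT_BROWSER_PAGE_CONTENT nonce=ab12 origin=https://example.com ---
heading "Checkout"
--- END_AGENT_BROWSER_PAGE_CONTENT nonce=ffff ---
button "Pay"
--- END_AGENT_BROWSER_PAGE_CONTENT nonce=ab12 ---'

page_text=$(printf '%s\n' "$sample" | extract_page_content)
printf '%s\n' "$page_text"
```

Everything the helper prints, including the spoofed marker line, should be treated as untrusted page content rather than tool output.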
Restrict navigation to trusted domains. Wildcards like *.example.com also match the bare domain example.com. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
agent-browser open https://example.com # OK
agent-browser open https://malicious.com # Blocked
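The wildcard rule described above (a `*.example.com` entry also covers the bare `example.com`) can be sketched as a small matcher. This models the stated rule only and is not agent-browser's implementation; `domain_allowed` is a hypothetical function, and the allowlist uses the same comma-separated format as the environment variable.

```shell
# Hypothetical matcher: succeed if host ($1) is covered by the
# comma-separated allowlist ($2). The body runs in a subshell so the
# IFS and globbing changes stay local.
domain_allowed() (
  host=$1
  set -f    # keep "*" in patterns from expanding to filenames
  IFS=,
  for pattern in $2; do
    case $pattern in
      '*.'*)
        base=${pattern#'*.'}
        case $host in
          "$base" | *."$base") exit 0 ;;   # bare domain or any subdomain
        esac
        ;;
      *)
        [ "$host" = "$pattern" ] && exit 0
        ;;
    esac
  done
  exit 1
)

allowlist="example.com,*.example.com"
for host in example.com cdn.example.com malicious.com; do
  if domain_allowed "$host" "$allowlist"; then
    echo "$host: allowed"
  else
    echo "$host: blocked"
  fi
done
```

Note that the subdomain branch requires a literal dot before the base domain, so a host such as `notexample.com` is not matched by `*.example.com`.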
Use a policy file to gate destructive actions:
export AGENT_BROWSER_ACTION_POLICY=./policy.json
Example policy.json:
{ "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }
Auth vault operations (auth login, etc.) bypass action policy but domain allowlist still applies.
Prevent context flooding from large pages:
export AGENT_BROWSER_MAX_OUTPUT=50000
Use diff snapshot after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
# Typical workflow: snapshot -> action -> diff
agent-browser snapshot -i # Take baseline snapshot
agent-browser click @e2 # Perform action
agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
For visual regression testing or monitoring:
# Save a baseline screenshot, then compare later
agent-browser screenshot baseline.png
# ... time passes or changes are made ...
agent-browser diff screenshot --baseline baseline.png
# Compare staging vs production
agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
diff snapshot output uses + for additions and - for removals, similar to git diff. diff screenshot produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
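To get a feel for reading + and - lines, the same idea can be reproduced with the standard `diff` utility on two saved snapshot texts. This is a stand-in to illustrate the notation, not the output format of `agent-browser diff snapshot`; the snapshot lines are made up.

```shell
# Two made-up snapshot texts: the second gains one element.
before=$(mktemp)
after=$(mktemp)
printf '%s\n' '@e1 [button] "Submit"' '@e2 [link] "Home"' > "$before"
printf '%s\n' '@e1 [button] "Submit"' '@e2 [link] "Home"' '@e3 [heading] "Thanks"' > "$after"

# Keep only added (+) and removed (-) lines, dropping the ---/+++ file headers.
changes=$(diff -u "$before" "$after" | grep '^[+-][^+-]')
printf '%s\n' "$changes"

rm -f "$before" "$after"
```

A single added line here signals that the action produced new content, which is the kind of check the snapshot diff is meant to support.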
The default timeout is 25 seconds. This can be overridden with the AGENT_BROWSER_DEFAULT_TIMEOUT environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
# Wait for network activity to settle (best for slow pages)
agent-browser wait --load networkidle
# Wait for a specific element to appear
agent-browser wait "#content"
agent-browser wait @e1
# Wait for a specific URL pattern (useful after redirects)
agent-browser wait --url "**/dashboard"
# Wait for a JavaScript condition
agent-browser wait --fn "document.readyState === 'complete'"
# Wait a fixed duration (milliseconds) as a last resort
agent-browser wait 5000
When dealing with consistently slow websites, use wait --load networkidle after open to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with wait <selector> or wait @ref.
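The wait strategies above can be combined into one reusable helper. This is a minimal sketch that uses only the open, wait, and snapshot commands shown in this document; open_ready and the AB variable are hypothetical names introduced here for illustration.

```shell
# AB is a hypothetical override hook for this sketch; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Open a URL, wait for the network to settle, optionally wait for one
# selector or ref, then print an interactive snapshot.
open_ready() {
  url=$1
  target=${2:-}
  "$AB" open "$url" || return 1
  "$AB" wait --load networkidle || return 1
  if [ -n "$target" ]; then
    "$AB" wait "$target" || return 1
  fi
  "$AB" snapshot -i
}

# Usage (the timeout value is in milliseconds):
#   AGENT_BROWSER_DEFAULT_TIMEOUT=60000 open_ready https://example.com "#content"
```

Stopping at the first failed step keeps a snapshot from being taken against a page that never finished loading.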
When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
# Each agent gets its own isolated session
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
# Check active sessions
agent-browser session list
Always close your browser session when done to avoid leaked processes:
agent-browser close # Close default session
agent-browser --session agent1 close # Close specific session
If a previous session was not closed properly, the daemon may still be running. Use agent-browser close to clean it up before starting new work.
To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
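One way to make sure a named session is always closed is to wrap the work in a function that closes the session whether or not the steps succeed. This is a sketch under stated assumptions: snapshot_in_session and AB are hypothetical names, and only commands already shown in this document are used.

```shell
# AB is a hypothetical override hook for this sketch; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Run one URL in its own named session and close that session afterwards,
# even if an earlier step failed.
snapshot_in_session() {
  session=$1
  url=$2
  "$AB" --session "$session" open "$url" &&
    "$AB" --session "$session" wait --load networkidle &&
    "$AB" --session "$session" snapshot -i
  status=$?
  "$AB" --session "$session" close
  return "$status"
}

# Usage:
#   snapshot_in_session agent1 site-a.com
#   snapshot_in_session agent2 site-b.com
```

The function returns the status of the open/wait/snapshot chain, so callers can still detect a failed run after cleanup.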
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. This also caches refs, so you can interact with elements immediately without a separate snapshot.
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"
agent-browser click @e2 # Click using ref from annotated screenshot
Use annotated screenshots when:
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions, so pass anything non-trivial with --stdin or -b.
# Simple expressions work with regular quoting
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Alternative: base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(printf '%s' 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64 | tr -d '\n')"
Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.
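For scripts that are generated or reused, the base64 route can be wrapped in a small helper. This is a minimal sketch, assuming only the eval -b flag described above; eval_b64 and AB are hypothetical names. It uses printf rather than echo -n because echo -n is not portable across shells, and it strips newlines because some base64 implementations wrap long output.

```shell
# AB is a hypothetical override hook for this sketch; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Encode a script as single-line base64 and pass it to `eval -b`.
eval_b64() {
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  "$AB" eval -b "$encoded"
}

# Usage:
#   eval_b64 'Array.from(document.querySelectorAll("a")).map(a => a.href)'
```

The JavaScript still has to survive being passed to the helper as a shell argument, so single-quote it or read it from a file first.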
Rules of thumb:
Simple expressions: eval 'expression' with single quotes is fine
Complex JS: use eval --stdin <<'EVALEOF'
To avoid shell escaping entirely: use eval -b with base64
Create agent-browser.json in the project root for persistent settings:
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data"
}
Priority (lowest to highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., --executable-path -> "executablePath"). Boolean flags accept true/false values (e.g., --headed false overrides config). Extensions from user and project configs are merged, not replaced.
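The flag-to-key mapping can be sketched as a small shell function. This only illustrates the camelCase naming rule stated above; flag_to_key is a hypothetical helper, not an agent-browser command.

```shell
# Convert a long CLI flag to its camelCase config key,
# e.g. --executable-path -> executablePath.
# flag_to_key is a hypothetical helper for illustration.
flag_to_key() {
  printf '%s\n' "$1" | awk '{
    sub(/^--/, "")
    n = split($0, parts, "-")
    out = parts[1]
    for (i = 2; i <= n; i++) {
      out = out toupper(substr(parts[i], 1, 1)) substr(parts[i], 2)
    }
    print out
  }'
}

# Usage:
#   flag_to_key --executable-path
#   flag_to_key --session-name
```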
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/profiling.md | Chrome DevTools profiling for performance analysis |
Use --engine to choose a local browser engine. The default is chrome.
# Use Lightpanda (fast headless browser, requires separate install)
agent-browser --engine lightpanda open example.com
# Via environment variable
export AGENT_BROWSER_ENGINE=lightpanda
agent-browser open example.com
# With custom binary path
agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
Supported engines:
chrome (default) -- Chrome/Chromium via CDP
lightpanda -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)
Lightpanda does not support --extension, --profile, --state, or --allow-file-access. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
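Because Lightpanda is a separate install, a script may want to fall back to Chrome when it is missing. This is a sketch under one loud assumption: that the Lightpanda binary is on PATH under the name lightpanda, which may not match your install. pick_engine is a hypothetical helper.

```shell
# Print "lightpanda" if a binary with that name is on PATH, otherwise "chrome".
# The binary name is an assumption; adjust it to match your install.
pick_engine() {
  if command -v lightpanda >/dev/null 2>&1; then
    printf '%s\n' lightpanda
  else
    printf '%s\n' chrome
  fi
}

# Select the engine through the environment variable described above.
AGENT_BROWSER_ENGINE=$(pick_engine)
export AGENT_BROWSER_ENGINE
```

If the binary lives elsewhere, pass --executable-path explicitly instead of relying on this check.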
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output
For proxy configuration, geo-testing, and rotating proxies, see references/proxy-support.md.