Chrome Automation Skill: Automate Tasks in a Real Browser Session with agent-browser

chrome-automation by zc277584121/marketing-skills

105 weekly installs

GitHub

Install command

npx skills add https://github.com/zc277584121/marketing-skills --skill chrome-automation

Automation, Marketing, Testing


Skill: Chrome Automation (agent-browser)

Automate browser tasks in the user's real Chrome session via the agent-browser CLI.

Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See references/agent-browser-setup.md if unsure.

Core Principle: Reuse the User's Existing Chrome

This skill operates on a single Chrome process — the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.

Always Start by Listing Tabs

Before opening any new page, always list existing tabs first:

agent-browser --auto-connect tab list

This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:

If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
```
agent-browser --auto-connect tab <index>
```

If the target page is NOT open → open it in the current tab or a new tab.

agent-browser --auto-connect open <url>
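
The list-then-reuse rule above can be sketched as a small shell helper. This is an illustration under stated assumptions, not part of the skill: `ab`, `open_or_report`, and the `AGENT_BROWSER` override are hypothetical names, and the sketch assumes that `tab list` prints each tab's URL on the same line as its index.

```shell
# Hypothetical wrapper: AGENT_BROWSER lets a test swap in a stub; defaults to the real CLI.
ab() { ${AGENT_BROWSER:-agent-browser} --auto-connect "$@"; }

# Reuse-or-open sketch. Assumes `tab list` prints each tab's URL on its own line.
open_or_report() {
  url=$1
  if matches=$(ab tab list | grep -F -- "$url"); then
    # Already open: print the matching lines so the index can be read off
    # and passed to `tab <index>`.
    printf 'already open:\n%s\n' "$matches"
  else
    # Not open yet: navigate to it.
    ab open "$url"
  fi
}
```

Printing the matching lines rather than parsing out an index keeps the sketch independent of the exact `tab list` output format.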

Why This Matters

- The user's Chrome has their cookies, login sessions, and browser state.
- Opening a new page when one is already available wastes time and may lose login state.
- Many marketing platforms (social media dashboards, ad managers, CMS tools) require login; reusing an existing logged-in tab avoids re-authentication.

Connection

Always use --auto-connect to connect to the user's running Chrome instance:

agent-browser --auto-connect <command>

This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see references/agent-browser-setup.md).
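
A connection check along these lines could run before any other command. It is a minimal sketch that assumes the CLI exits with a non-zero status when it cannot reach Chrome; `ab`, `check_connection`, and the `AGENT_BROWSER` override are hypothetical names introduced here.

```shell
# Hypothetical wrapper: AGENT_BROWSER lets a test swap in a stub; defaults to the real CLI.
ab() { ${AGENT_BROWSER:-agent-browser} --auto-connect "$@"; }

# Assumption: `tab list` exits non-zero when Chrome cannot be reached.
check_connection() {
  if ab tab list >/dev/null 2>&1; then
    echo "connected"
  else
    echo "Cannot reach Chrome; enable remote debugging (see references/agent-browser-setup.md)." >&2
    return 1
  fi
}
```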

Common Workflows

1. Navigate and Interact

# List tabs to find existing pages
agent-browser --auto-connect tab list

# Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>

# Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle

# Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i

# Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"

2. Extract Data from a Page

# Get all text content
agent-browser --auto-connect get text body

# Take a screenshot for visual inspection
agent-browser --auto-connect screenshot

# Execute JavaScript for structured data
agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"

3. Replay a Chrome DevTools Recording

The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.

Step-by-Step Interaction Guide

Taking Snapshots

Use snapshot -i to see all interactive elements with refs (@e1, @e2, ...):

agent-browser --auto-connect snapshot -i

The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.

Step Type Mapping

Action	Command
Navigate	`agent-browser --auto-connect open <url>` (optionally `wait --load networkidle`, but some sites like Reddit never reach networkidle — skip if `open` already shows the page title)
Click	`snapshot -i` → find ref → `click @eN`
Fill standard input	`click @eN` → `fill @eN "text"`
Fill rich text editor	`click @eN` → `keyboard inserttext "text"`
Press key	`press <key>` (Enter, Tab, Escape, etc.)
Scroll	`scroll down <amount>` or `scroll up <amount>`
Wait for element	`wait @eN` or `wait "<css-selector>"`
Screenshot	`screenshot` or `screenshot --annotate`
Get page text	`get text body`
Get current URL	`get url`
Run JavaScript	`eval <js>`

How to Distinguish Input Types

- Standard input/textarea → use fill
- Contenteditable div / rich text editor (LinkedIn message box, Gmail compose, Slack, CMS editors) → click/focus first, then use keyboard inserttext
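
The fill-versus-inserttext choice can be captured in one helper. This is a sketch only: `ab`, `enter_text`, the `rich` keyword, and the `AGENT_BROWSER` override are hypothetical names, and the caller still has to decide which kind of input a ref points at.

```shell
# Hypothetical wrapper: AGENT_BROWSER lets a test swap in a stub; defaults to the real CLI.
ab() { ${AGENT_BROWSER:-agent-browser} --auto-connect "$@"; }

# kind: "rich" for contenteditable editors, anything else for <input>/<textarea>.
enter_text() {
  kind=$1 ref=$2 text=$3
  ab click "$ref" || return 1            # focus the element first
  case $kind in
    rich) ab keyboard inserttext "$text" ;;  # contenteditable: insert at the caret
    *)    ab fill "$ref" "$text" ;;          # standard input: clear and fill
  esac
}
```

For example, `enter_text rich @e7 "Draft reply"` clicks `@e7` and then inserts the text with `keyboard inserttext`.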

Ref Lifecycle

Refs (@e1, @e2, ...) are invalidated when the page changes. Always re-snapshot after:

- Clicking links or buttons that trigger navigation
- Submitting forms
- Triggering dynamic content loads (AJAX, SPA navigation)
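
One way to make the re-snapshot habit hard to forget is to pair every action with a fresh snapshot. The helper below is a sketch; `ab`, `act`, and the `AGENT_BROWSER` override are hypothetical names.

```shell
# Hypothetical wrapper: AGENT_BROWSER lets a test swap in a stub; defaults to the real CLI.
ab() { ${AGENT_BROWSER:-agent-browser} --auto-connect "$@"; }

# Run one action, then immediately refresh the refs.
act() {
  ab "$@" || return 1   # e.g. act click @e3
  ab snapshot -i        # refs from before the action must not be reused
}
```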

Verification

After each significant action, verify the result:

agent-browser --auto-connect snapshot -i   # check interactive state
agent-browser --auto-connect screenshot     # visual verification

Replaying Recordings

Accepted Formats

JSON (recommended): structured, and can be read progressively:

# Count steps
jq '.steps | length' recording.json

# Read first 5 steps
jq '.steps[0:5]' recording.json

@puppeteer/replay JS: recognizable by import { createRunner }
Puppeteer JS: recognizable by require('puppeteer'), page.goto, and Locator.race

How to Replay

1. Parse the recording: understand the full intent before acting, and summarize what the recording does.
2. List tabs first: check if the target page is already open.
3. Navigate: execute navigate steps, reusing existing tabs when possible.
4. For each interaction step:
   - Take a snapshot (snapshot -i) to see current interactive elements
   - Match the recording's aria/... selectors against the snapshot
   - Fall back to text/..., then CSS class hints, then screenshot
   - Do not rely on ember IDs, numeric IDs, or exact XPaths, because these change on every page load
5. Verify after each step: take a snapshot or screenshot to confirm.
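
The selector fallback order in step 4 can be sketched in plain shell. `pick_selector` is a hypothetical helper that takes the selectors recorded for one step as arguments; the sample selector strings in the usage line are invented for illustration.

```shell
# Choose the most stable selector for one recorded step:
# aria/... first, then text/..., otherwise the first remaining selector as a hint.
pick_selector() {
  for prefix in aria/ text/; do
    for sel in "$@"; do
      case $sel in
        "$prefix"*) printf '%s\n' "$sel"; return 0 ;;
      esac
    done
  done
  # No aria/ or text/ selector: fall back to whatever was recorded first.
  [ "$#" -gt 0 ] && printf '%s\n' "$1"
}
```

Calling `pick_selector '#ember42' 'text/Send' 'aria/Send message'` prints `aria/Send message`. The result is only a hint for matching against the snapshot, and a screenshot remains the last resort.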

Iframe-Heavy Sites

snapshot -i operates on the main frame only and cannot penetrate iframes. Sites like LinkedIn, Gmail, and embedded editors render content inside iframes.

Detecting Iframe Issues

- snapshot -i returns unexpectedly short or empty results
- The recording references elements that do not appear in the snapshot output
- get text body content doesn't match what a screenshot shows

Workarounds

Use eval to access iframe content:

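agent-browser --auto-connect eval --stdin <<'EVALEOF'
// Locate the iframe; this data-testid is site-specific.
const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
// contentDocument is null for cross-origin iframes, so guard each step.
const doc = frame && frame.contentDocument;
const btn = doc && doc.querySelector('button[aria-label="Send"]');
if (btn) btn.click();
EVALEOF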

Note: Only works for same-origin iframes.

- Use keyboard for blind input: if the iframe element has focus, keyboard inserttext "..." sends text regardless of frame boundaries.
- Use get text body to read full page content including iframes.
- Use screenshot for visual verification when snapshot is unreliable.

When to Ask the User

If workarounds fail after 2 attempts on the same step, pause and explain:

- The page uses iframes that cannot be accessed via snapshot
- Which element you need and what you expected
- Ask the user to perform that step manually, then continue

Handling Unexpected Situations

Handle Automatically (do not stop):

- Popups or banners → dismiss them (find text "Dismiss" click or find text "Close" click)
- Cookie consent dialogs → accept or dismiss
- Tooltip overlays → close them first
- Element not in snapshot → try find text "..." click, or scroll to reveal with scroll down 300
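
The popup handling above might look like the following sketch. It assumes that `find text ... click` exits non-zero when nothing matches (or when several elements match, as strict mode implies); `ab`, `dismiss_overlays`, and the `AGENT_BROWSER` override are hypothetical names.

```shell
# Hypothetical wrapper: AGENT_BROWSER lets a test swap in a stub; defaults to the real CLI.
ab() { ${AGENT_BROWSER:-agent-browser} --auto-connect "$@"; }

# Try the two labels named above, in order, and stop at the first that works.
# Assumption: a failed find/click is reported through the exit status.
dismiss_overlays() {
  for label in "Dismiss" "Close"; do
    if ab find text "$label" click >/dev/null 2>&1; then
      echo "dismissed via $label"
      return 0
    fi
  done
  echo "nothing to dismiss"
}
```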

Pause and Ask the User:

- Login / authentication is required
- A CAPTCHA appears
- Page structure is completely different from expected
- A destructive action is about to happen (deleting data, sending real content): confirm first
- Stuck for more than 2 attempts on the same step
- All iframe workarounds have failed

When pausing, explain clearly: what step you are on, what you expected, and what you see.
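
The two-attempt limit can be enforced mechanically. The sketch below wraps any step command; `try_step` is a hypothetical name and the wording of the pause message is illustrative.

```shell
# Run a step command up to 2 times; after the second failure, stop and report.
try_step() {
  desc=$1; shift
  attempt=1
  while [ "$attempt" -le 2 ]; do
    if "$@"; then return 0; fi
    attempt=$((attempt + 1))
  done
  echo "PAUSED at step: $desc (failed after 2 attempts). Explain what was expected and what is visible, then ask the user." >&2
  return 1
}
```

A call such as `try_step "click Send" agent-browser --auto-connect click @e3` retries once and then pauses instead of looping.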

Key Commands Reference

Command	Description
`tab list`	List all open tabs with index, title, and URL
`tab <index>`	Switch to an existing tab by index
`tab new`	Open a new empty tab
`tab close`	Close the current tab
`open <url>`	Navigate to URL
`snapshot -i`	List interactive elements with refs
`click @eN`	Click element by ref
`fill @eN "text"`	Clear and fill standard input/textarea
`type @eN "text"`	Type without clearing
`keyboard inserttext "text"`	Insert text (best for contenteditable)
`press <key>`	Press keyboard key
`scroll down/up <amount>`	Scroll page in pixels
`wait @eN`	Wait for element to appear
`wait --load networkidle`	Wait for network to settle
`wait <ms>`	Wait for a duration
`screenshot [path]`	Take screenshot
`screenshot --annotate`	Screenshot with numbered labels
`eval <js>`	Execute JavaScript in page
`get text body`	Get all text content
`get url`	Get current URL
`set viewport <w> <h>`	Set viewport size
`find text "..." click`	Semantic find and click
`close`	Close browser session

Known Limitations

- Iframe blindness: snapshot -i cannot see inside iframes. See Iframe-Heavy Sites.
- find text strict mode: fails when multiple elements match. Use snapshot -i to locate the specific ref instead.
- fill vs contenteditable: fill only works on <input> and <textarea>. For rich text editors, use keyboard inserttext.
- eval is main-frame only: to interact with iframe content, traverse via document.querySelector('iframe').contentDocument...

Multi-Platform Operations

When the user requests an action across multiple platforms (e.g., "publish this article to Dev.to, LinkedIn, and X"), do NOT attempt all platforms in a single conversation. Instead, launch sequential Agent subagents, one per platform.

Why Subagents

Each platform operation consumes ~25-40K tokens (reference file + snapshots + interactions). Running 3-5 platforms in one context risks hitting the 200K token limit and degrading late-platform accuracy. Each subagent gets its own fresh 200K context window.
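
The budget arithmetic behind that estimate is easy to check. `context_range_k` is a hypothetical helper that multiplies the platform count by the 25K and 40K per-platform figures quoted above and prints the range in thousands of tokens.

```shell
# Estimated context use for N platforms in one conversation, in thousands of tokens.
context_range_k() {
  echo "$(( $1 * 25 ))-$(( $1 * 40 ))"
}

context_range_k 3   # 75-120
context_range_k 5   # 125-200
```

At five platforms the upper estimate alone reaches the 200K limit, before counting the rest of the conversation.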

How to Execute

1. Prepare the content: confirm the post text, title, tags, and any platform-specific adaptations with the user.
2. For each platform, launch a general-purpose Agent subagent with a prompt that includes:
   - The full content to publish
   - Instructions to read the relevant reference file (e.g., Read /path/to/skills/chrome-automation/references/x.md)
   - Instructions to read the agent-browser skill file for command reference
   - The specific task (post, comment, reply, etc.)
   - Any platform-specific instructions (e.g., "use these hashtags on LinkedIn")
3. Run subagents sequentially (one at a time), because they all share the same Chrome browser via --auto-connect. Parallel subagents would cause tab conflicts.
4. After each subagent completes, report the result to the user before launching the next one.

Prompt Template for Subagents

You are automating a browser task on [PLATFORM].

First, read these files for context:
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md (agent-browser command reference)

Then connect to the user's Chrome browser using `agent-browser --auto-connect` and perform the following task:

[TASK DESCRIPTION]

Content to publish:
[CONTENT]

Important:
- Always list tabs first (`tab list`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain
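
Filling the template for several platforms, one at a time, could be sketched as below. The prompt here is deliberately shortened to its variable parts, launching the subagent itself is left to the host agent, and `render_prompt` and `prompts_in_order` are hypothetical names.

```shell
# Render a shortened version of the template for one platform.
render_prompt() {
  printf 'You are automating a browser task on %s.\n\nTask: %s\n\nContent to publish:\n%s\n' "$1" "$2" "$3"
}

# Emit one prompt per platform, strictly in order, separated by a marker line.
# Sequential order matters because every subagent shares the same Chrome.
prompts_in_order() {
  task=$1 content=$2
  shift 2
  for platform in "$@"; do
    render_prompt "$platform" "$task" "$content"
    echo "-----"
  done
}
```

For example, `prompts_in_order "Publish the post" "Hello world" Dev.to LinkedIn X` prints three prompts in that order.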

When NOT to Use Subagents

- Single platform: just do it directly in the current conversation.
- Read-only tasks (browsing, searching, extracting data): context usage is lighter, so a single conversation can handle 2-3 platforms.

Platform References

When automating tasks on specific platforms, consult the relevant reference document for page structure details, common operations, and known quirks:

Platform	Reference	Key Notes
Reddit	`references/reddit.md`	Custom `faceplate-*` components; `networkidle` never reached; unlabeled comment textbox; `find text` fails due to duplicate elements
X (Twitter)	`references/x.md`	`open` often times out (use `tab list` to reuse existing tabs); click timestamp for post detail (not username); DraftJS contenteditable input (`data-testid="tweetTextarea_0"`); avoid `networkidle`
LinkedIn	`references/linkedin.md`	Ember.js SPA; Enter submits comments (use Shift+Enter for newlines); comment box and compose box share the same label; avoid `networkidle`; messaging overlay may block content
Dev.to	`references/devto.md`	Fast server-rendered HTML (Forem/Rails); standard `<textarea>` for comments/posts (Markdown); 5 reaction types; Algolia-powered search; `networkidle` works normally
Hacker News	`references/hackernews.md`	Minimal plain HTML; all form fields are unlabeled; `link "reply"` navigates to separate page; `networkidle` works instantly; rate limiting on posts/comments

For installation and Chrome setup instructions, see references/agent-browser-setup.md.
