chrome-automation by zc277584121/marketing-skills
npx skills add https://github.com/zc277584121/marketing-skills --skill chrome-automation
Automate browser tasks in the user's real Chrome session via the agent-browser CLI.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See references/agent-browser-setup.md if unsure.
This skill operates on a single Chrome process — the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.
Before opening any new page, always list existing tabs first:
agent-browser --auto-connect tab list
This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:
If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
agent-browser --auto-connect tab <index>
If the target page is NOT open → open it in the current tab or a new tab.
agent-browser --auto-connect open <url>
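The list-then-switch-or-open rule above can be sketched as a small shell helper. This is a minimal sketch, not part of agent-browser: the names `find_tab_index` and `open_or_switch` and the `AGENT_BROWSER` override variable are hypothetical, and the parsing assumes that `tab list` prints one tab per line with the tab index as the first number on that line. Check this against real output before relying on it.

```bash
# Hypothetical helper: read `tab list` text on stdin and print the index of the
# first tab whose line contains the given URL fragment.
# ASSUMPTION: one tab per line, and the first number on the line is the index.
find_tab_index() {
  awk -v needle="$1" '
    !found && index($0, needle) && match($0, /[0-9]+/) {
      print substr($0, RSTART, RLENGTH)
      found = 1
    }'
}

# Hypothetical helper: switch to an already-open tab if one matches,
# otherwise open the URL. AGENT_BROWSER is an illustrative override for the
# binary name and defaults to agent-browser.
open_or_switch() {
  local url="$1" index
  index="$("${AGENT_BROWSER:-agent-browser}" --auto-connect tab list | find_tab_index "$url")"
  if [ -n "$index" ]; then
    "${AGENT_BROWSER:-agent-browser}" --auto-connect tab "$index"
  else
    "${AGENT_BROWSER:-agent-browser}" --auto-connect open "$url"
  fi
}
```

For example, `open_or_switch https://example.com` would reuse a matching tab when one is listed and only fall back to `open` when none is.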
Always use --auto-connect to connect to the user's running Chrome instance:
agent-browser --auto-connect <command>
This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see references/agent-browser-setup.md).
# List tabs to find existing pages
agent-browser --auto-connect tab list
# Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>
# Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle
# Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i
# Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"
# Get all text content
agent-browser --auto-connect get text body
# Take a screenshot for visual inspection
agent-browser --auto-connect screenshot
# Execute JavaScript for structured data
agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"
The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.
Use snapshot -i to see all interactive elements with refs (@e1, @e2, ...):
agent-browser --auto-connect snapshot -i
The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.
| Action | Command |
|---|---|
| Navigate | agent-browser --auto-connect open <url> (optionally wait --load networkidle, but some sites like Reddit never reach networkidle — skip if open already shows the page title) |
| Click | snapshot -i → find ref → click @eN |
| Fill standard input | click @eN → fill @eN "text" |
| Fill rich text editor | click @eN → keyboard inserttext "text" |
| Press key | press <key> (Enter, Tab, Escape, etc.) |
| Scroll | scroll down <amount> or scroll up <amount> |
| Wait for element | wait @eN or wait "<css-selector>" |
| Screenshot | screenshot or screenshot --annotate |
| Get page text | get text body |
| Get current URL | get url |
| Run JavaScript | eval <js> |
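Because some sites never reach networkidle, one option is to bound the optional wait rather than let it block. The following is a minimal sketch under stated assumptions: it relies on the GNU coreutils `timeout` command being available (it is not installed by default on macOS), and the function name `wait_for_idle_or_skip` and the `AGENT_BROWSER` override variable are hypothetical. It does not replace the advice to skip the wait when `open` already shows the page title.

```bash
# Hypothetical helper: wait for networkidle for at most N seconds, then move on.
# ASSUMPTION: GNU coreutils `timeout` is on PATH.
wait_for_idle_or_skip() {
  local limit="${1:-10}"   # seconds to wait before giving up (default 10)
  if timeout "$limit" "${AGENT_BROWSER:-agent-browser}" --auto-connect wait --load networkidle; then
    echo "network idle"
  else
    echo "networkidle not reached within ${limit}s; continuing"
  fi
}
```

Calling `wait_for_idle_or_skip 5` after `open <url>` keeps the workflow moving on sites like Reddit, where the wait would otherwise not settle.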
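fill vs keyboard inserttext: fill only works on <input> and <textarea>; for rich text editors, use keyboard inserttext.
Refs (@e1, @e2, ...) are invalidated when the page changes. Always re-snapshot after every navigation or action.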
After each significant action, verify the result:
agent-browser --auto-connect snapshot -i # check interactive state
agent-browser --auto-connect screenshot # visual verification
Replaying Recordings
JSON (recommended): structured, can be read progressively:
# Count steps
jq '.steps | length' recording.json
# Read first 5 steps
jq '.steps[0:5]' recording.json
@puppeteer/replay JS: identified by import { createRunner }
Puppeteer JS: identified by require('puppeteer'), page.goto, and Locator.race
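Beyond counting steps, a JSON recording can be condensed into one line per step before replaying it. This is a minimal sketch: the function name `recording_steps` is hypothetical, and it assumes each step carries a `type` field and an optional `selectors` field holding nested arrays of selector strings, some of which start with `aria/`. Verify that shape against the actual recording file.

```bash
# Hypothetical helper: print "<step type><TAB><first aria/ selector, or empty>"
# for every step in a recording JSON file.
# ASSUMPTION: .steps[].selectors is an array of arrays of strings.
recording_steps() {
  jq -r '
    .steps[]
    | [ .type,
        ((.selectors // []) | flatten | map(select(startswith("aria/"))) | first // "")
      ]
    | @tsv' "$1"
}
```

Running `recording_steps recording.json` gives a compact overview of which steps have an aria/ selector to match against a snapshot and which will need a fallback.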
- Replay navigate steps, reusing existing tabs when possible.
- Take a snapshot (snapshot -i) to see current interactive elements.
- Match aria/... selectors against the snapshot.
- Fall back to text/..., then CSS class hints, then a screenshot.
Iframe-Heavy Sites
snapshot -i operates on the main frame only and cannot penetrate iframes. Sites like LinkedIn, Gmail, and embedded editors render content inside iframes.
Signs that content is inside an iframe:
- snapshot -i returns unexpectedly short or empty results
- get text body content doesn't match what a screenshot shows
Use eval to access iframe content:
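agent-browser --auto-connect eval --stdin <<'EVALEOF'
// Locate the iframe, then reach into its document (same-origin only).
const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
const doc = frame ? frame.contentDocument : null; // null for cross-origin iframes
const btn = doc ? doc.querySelector('button[aria-label="Send"]') : null;
if (!btn) throw new Error('Send button not found (iframe missing or cross-origin)');
btn.click();
EVALEOF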
Note: Only works for same-origin iframes.
Use keyboard for blind input: If the iframe element has focus, keyboard inserttext "..." sends text regardless of frame boundaries.
Use get text body to read full page content including iframes.
Use screenshot for visual verification when snapshot is unreliable.
If workarounds fail after 2 attempts on the same step, pause and explain. Workarounds to try first:
- Blocking popup or dialog: dismiss it (find text "Dismiss" click or find text "Close" click)
- Element not found or off-screen: find text "..." click, or scroll to reveal it with scroll down 300
When pausing, explain clearly: what step you are on, what you expected, and what you see.
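The two-attempt rule above can be sketched as a small wrapper. This is a minimal sketch: the function name `try_twice` is hypothetical, and for simplicity it retries the same command, whereas in practice the second attempt would often be a different workaround.

```bash
# Hypothetical helper: run a command at most twice. After the second failure,
# print a pause notice to stderr and return 1 so the caller can stop and explain.
try_twice() {
  local attempt
  for attempt in 1 2; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed: $*" >&2
  done
  echo "PAUSE: step failed twice; explain the current step, the expected result, and what is visible." >&2
  return 1
}
```

For example, `try_twice agent-browser --auto-connect find text "Close" click` stops after two failed attempts instead of looping on a step that cannot succeed.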
| Command | Description |
|---|---|
| tab list | List all open tabs with index, title, and URL |
| tab <index> | Switch to an existing tab by index |
| tab new | Open a new empty tab |
| tab close | Close the current tab |
| open <url> | Navigate to URL |
| snapshot -i | List interactive elements with refs |
| click @eN | Click element by ref |
| fill @eN "text" | Clear and fill standard input/textarea |
| type @eN "text" | Type without clearing |
| keyboard inserttext "text" | Insert text (best for contenteditable) |
| press <key> | Press keyboard key |
| scroll down/up <amount> | Scroll page in pixels |
| wait @eN | Wait for element to appear |
| wait --load networkidle | Wait for network to settle |
| wait <ms> | Wait for a duration |
| screenshot [path] | Take screenshot |
| screenshot --annotate | Screenshot with numbered labels |
| eval <js> | Execute JavaScript in page |
| get text body | Get all text content |
| get url | Get current URL |
| set viewport <w> <h> | Set viewport size |
| find text "..." click | Semantic find and click |
| close | Close browser session |
- snapshot -i cannot see inside iframes. See Iframe-Heavy Sites.
- find text strict mode: Fails when multiple elements match. Use snapshot -i to locate the specific ref instead.
- fill vs contenteditable: fill only works on <input> and <textarea>. For rich text editors, use keyboard inserttext.
- eval is main-frame only: To interact with iframe content, traverse via document.querySelector('iframe').contentDocument...
When the user requests an action across multiple platforms (e.g., "publish this article to Dev.to, LinkedIn, and X"), do NOT attempt all platforms in a single conversation. Instead, launch sequential Agent subagents, one per platform.
Each platform operation consumes ~25-40K tokens (reference file + snapshots + interactions). Running 3-5 platforms in one context risks hitting the 200K token limit and degrading late-platform accuracy. Each subagent gets its own fresh 200K context window.
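The arithmetic behind this guidance can be checked directly. This is a minimal sketch that uses only the figures quoted above (25-40K tokens per platform, a 200K limit). The function name `token_budget` is hypothetical, and the estimate ignores tokens already consumed by the conversation itself.

```bash
# Hypothetical helper: estimate the token range used by N platforms in one context.
# Figures are the rough per-platform estimates quoted in the text, in thousands.
token_budget() {
  local platforms="$1"
  local per_platform_min_k=25 per_platform_max_k=40 context_limit_k=200
  local low=$((platforms * per_platform_min_k))
  local high=$((platforms * per_platform_max_k))
  echo "${platforms} platforms: ${low}K-${high}K of ${context_limit_k}K"
}

for n in 3 4 5; do
  token_budget "$n"
done
```

At five platforms the upper estimate already equals the full 200K window, which is why a fresh context per platform is the safer choice.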
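Launch a general-purpose Agent subagent with a prompt that includes:
- The platform reference file to read (e.g., Read /path/to/skills/chrome-automation/references/x.md)
Run the subagents sequentially: they all share the same Chrome browser via --auto-connect. Parallel subagents would cause tab conflicts.
You are automating a browser task on [PLATFORM].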
First, read these files for context:
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md (agent-browser command reference)
Then connect to the user's Chrome browser using `agent-browser --auto-connect` and perform the following task:
[TASK DESCRIPTION]
Content to publish:
[CONTENT]
Important:
- Always list tabs first (`tab list`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain
When automating tasks on specific platforms, consult the relevant reference document for page structure details, common operations, and known quirks:
| Platform | Reference | Key Notes |
|---|---|---|
| Reddit | references/reddit.md | Custom faceplate-* components; networkidle never reached; unlabeled comment textbox; find text fails due to duplicate elements |
| X (Twitter) | references/x.md | open often times out (use tab list to reuse existing tabs); click timestamp for post detail (not username); DraftJS contenteditable input (data-testid="tweetTextarea_0"); avoid networkidle |
| LinkedIn | references/linkedin.md | Ember.js SPA; Enter submits comments (use Shift+Enter for newlines); comment box and compose box share the same label; avoid networkidle; messaging overlay may block content |
| Dev.to | references/devto.md | Fast server-rendered HTML (Forem/Rails); standard <textarea> for comments/posts (Markdown); 5 reaction types; Algolia-powered search; networkidle works normally |
| Hacker News | references/hackernews.md | Minimal plain HTML; all form fields are unlabeled; link "reply" navigates to separate page; networkidle works instantly; rate limiting on posts/comments |
For installation and Chrome setup instructions, see references/agent-browser-setup.md.
Weekly installs: 105
First seen: Mar 5, 2026
Installed on: github-copilot (105), codex (105), kimi-cli (105), gemini-cli (105), cursor (105), amp (105)