npx skills add https://github.com/zc277584121/marketing-skills --skill browser-screenshot
Take focused screenshots of specific regions on web pages — a Reddit post, a tweet, an article section, a chart, etc. — not just a full-page dump.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.
This skill handles the full pipeline:
NEVER output an uncropped full-viewport or full-page screenshot as a final result. Full screenshots contain too much noise (nav bars, sidebars, ads, unrelated content) and are unsuitable as article illustrations. Every screenshot MUST be cropped to a focused region.
The browser is for capturing, not for browsing. Before opening anything in Chrome, use text-based tools (WebSearch, WebFetch) to find candidate pages, read their content, and decide which ones are actually worth screenshotting.
This saves significant time — most candidate pages won't be worth screenshotting, and you can eliminate them without the overhead of browser navigation.
Skip the WebSearch/WebFetch phase and go directly to Chrome browsing when:
In these cases, Chrome browsing replaces WebSearch — navigate to the platform's search page, browse results, and evaluate pages visually before deciding what to screenshot.
The right page depends on the context of the article and how recent/notable the subject is:
| Subject Type | Best Page to Find | How to Find It |
|---|---|---|
| New model/feature launch (< 6 months) | Official blog post announcing it | WebSearch "<model name>" site:<vendor-domain> blog |
| Established product (> 6 months) | Product landing page or docs overview | WebSearch "<model name>" official page |
| Open-source model | HuggingFace model card or GitHub repo | Direct URL: huggingface.co/<org>/<model> |
| API service | API documentation page | WebSearch "<service name>" API docs |
Note: This table lists common subject types but is not exhaustive. Apply the same research-first strategy to any subject type: find the most authoritative and visually clean source page for the topic at hand.
Core principle: Less is more. Focus on content, not chrome.
A good screenshot source contains a focused, self-contained piece of information — a paragraph of text, a key quote, a data table, a diagram. It should NOT be a busy page full of buttons, navigation, sidebars, and interactive elements.
Rule of thumb: If the region you plan to capture contains more interactive UI elements (buttons, links, nav items) than readable text content, it's a bad crop. Find a more content-rich region, or pick a different page entirely.
Before opening in the browser, validate URLs with WebFetch (lightweight HEAD/GET) to avoid wasting time on 404s or redirects:
WebFetch: <candidate-url>
→ Check status code, title, and content snippet
→ If 404 or redirect to unrelated page, try next candidate
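You can also run this pre-check from a shell instead of through WebFetch. The sketch below is one way to express the rule. It assumes `curl` is installed. `check_url`, `classify_candidate`, and `host_of` are hypothetical helper names. "Unrelated page" is approximated as "the final URL after redirects is on a different host" (a leading `www.` is ignored), which is cruder than reading the title and snippet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a URL to a comparable host name
# (drops scheme, path, query, port, and a leading "www.").
host_of() {
  local h=${1#*://}
  h=${h%%/*}
  h=${h%%\?*}
  h=${h%%:*}
  printf '%s\n' "${h#www.}"
}

# Hypothetical helper: apply the "skip 404s and off-site redirects" rule.
# Usage: classify_candidate <requested-url> <http-status> <final-url>
classify_candidate() {
  local requested=$1 status=$2 final=$3
  if (( status < 200 || status > 299 )); then
    echo "skip: HTTP $status"
  elif [[ $(host_of "$requested") != "$(host_of "$final")" ]]; then
    echo "skip: redirected to $(host_of "$final")"
  else
    echo "ok"
  fi
}

# Follow redirects and report the final status code and URL (assumes curl).
check_url() {
  local out
  out=$(curl -sL -o /dev/null -w '%{http_code} %{url_effective}' "$1") \
    || { echo "skip: request failed"; return; }
  classify_candidate "$1" "${out%% *}" "${out#* }"
}
```

For example, `check_url https://huggingface.co/<org>/<model>` prints `ok` or a `skip:` reason. Only `ok` candidates go on to the browser.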
Think about what the article reader needs to see in this screenshot:
| Article Context | What to Capture | Target Region |
|---|---|---|
| Introducing a model in a lineup | Model name + key tagline/description | Blog hero section or HF model card header |
| Comparing capabilities | Feature highlights or spec table | Blog section showing specs/features |
| Discussing a specific feature | The feature description | Relevant section heading + 1-2 paragraphs |
| Showing a product/service | Brand identity + value prop | Landing page hero (title + subtitle + visual) |
The screenshot should make the reader think "ah, that's what this model/product is" — not "what am I looking at?"
agent-browser --auto-connect tab list
Check if the page is already open. Reuse existing tabs — they have login sessions and correct state.
| User Provides | Strategy |
|---|---|
| Direct URL | agent-browser --auto-connect open <url> |
| Search query | open https://www.google.com/search?q=<encoded-query> → find and click the best result |
| Platform + topic | Construct platform search URL (see below) → locate target content |
| Vague description | Google search → evaluate results → navigate to best match |
| Platform | Search URL Pattern |
|---|---|
| Reddit | https://www.reddit.com/search/?q=<query> |
| X / Twitter | https://x.com/search?q=<query> |
| LinkedIn | https://www.linkedin.com/search/results/content/?keywords=<query> |
| Hacker News | https://hn.algolia.com/?q=<query> |
| GitHub | https://github.com/search?q=<query> |
| YouTube | https://www.youtube.com/results?search_query=<query> |
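When you build a search URL from a free-text query, the table can be turned into a lookup that percent-encodes the query consistently. This is only a sketch. `platform_search_url` and `urlencode` are hypothetical helpers, and the platform keys (`reddit`, `x`, `linkedin`, `hn`, `github`, `youtube`) are our own shorthand. The encoder works byte by byte, so non-ASCII text comes out as UTF-8 percent escapes.

```shell
#!/usr/bin/env bash
# Percent-encode a string byte by byte; RFC 3986 unreserved characters pass through.
urlencode() {
  local LC_ALL=C s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build a platform search URL from the table above.
# Usage: platform_search_url <reddit|x|linkedin|hn|github|youtube> <query>
platform_search_url() {
  local q
  q=$(urlencode "$2")
  case $1 in
    reddit)   echo "https://www.reddit.com/search/?q=$q" ;;
    x)        echo "https://x.com/search?q=$q" ;;
    linkedin) echo "https://www.linkedin.com/search/results/content/?keywords=$q" ;;
    hn)       echo "https://hn.algolia.com/?q=$q" ;;
    github)   echo "https://github.com/search?q=$q" ;;
    youtube)  echo "https://www.youtube.com/results?search_query=$q" ;;
    *)        echo "unknown platform: $1" >&2; return 1 ;;
  esac
}
```

For example: `agent-browser --auto-connect open "$(platform_search_url reddit 'browser automation')"`.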
After navigation, wait for content to settle:
agent-browser --auto-connect wait --load networkidle
Note: Some sites (Reddit, X, LinkedIn) never reach `networkidle`. If `open` already shows the page title in its output, skip the wait. Use `wait 2000` as a safe alternative.
This is the critical step. The goal is to find a CSS selector that precisely wraps the content to capture.
Take an annotated screenshot to understand the page layout:
agent-browser --auto-connect screenshot --annotate
Take a snapshot to see the page's accessibility tree:
agent-browser --auto-connect snapshot -i
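Identify the target container element. Look for:
* Semantic HTML containers: `<article>`, `<main>`, `<section>`
* Platform-specific components (see the platform selector table)
* Data attributes: `[data-testid="..."]`, `[data-id="..."]`

4. Verify with `get box` to confirm the element has a reasonable bounding box: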
agent-browser --auto-connect get box "<selector>"
This returns { x, y, width, height }. Sanity-check:
* Width should be > 100px and < viewport width
* Height should be > 50px
* If the box is the entire page, the selector is too broad — refine it
5. If the selector is hard to find, use eval to explore the DOM:
agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()"
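The step 4 sanity check can also be scripted. This sketch assumes `get box` prints something shaped like the sample above (`{ x: 200, y: 450, width: 680, height: 520 }`; JSON with quoted keys also matches). `box_field` and `box_is_sane` are hypothetical helper names. The comparisons run in `awk` because the values can be fractional.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract one numeric field (x, y, width, height) from get box output.
# Assumes output like "{ x: 200, y: 450, width: 680, height: 520 }" or the JSON equivalent.
box_field() {
  printf '%s\n' "$2" | sed -nE "s/.*[{ ,]\"?$1\"?: *(-?[0-9.]+).*/\1/p"
}

# Hypothetical helper: succeed when width > 100, width < viewport width, and height > 50.
# Usage: box_is_sane <get-box-output> <viewport-width>
box_is_sane() {
  local box=$1 viewport_width=$2 w h
  w=$(box_field width "$box")
  h=$(box_field height "$box")
  [[ -n $w && -n $h ]] || return 1   # null or unparseable box
  awk -v w="$w" -v h="$h" -v vw="$viewport_width" \
    'BEGIN { exit !((w + 0) > 100 && (w + 0) < (vw + 0) && (h + 0) > 50) }'
}
```

For example, `box_is_sane "$(agent-browser --auto-connect get box "<selector>")" 1280 || echo "refine the selector"` flags a box that fails the thresholds for a 1280px-wide viewport.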
Common container selectors for popular platforms:
| Platform | Target | Typical Selector |
|---|---|---|
| Reddit | A post | shreddit-post, [data-testid="post-container"] |
| X / Twitter | A tweet | article[data-testid="tweet"] |
| LinkedIn | A feed post | .feed-shared-update-v2 |
| Hacker News | A story + comments | #hnmain .fatitem |
| GitHub | A repo card | [data-hpc], .repository-content |
| YouTube | Video player area | #player-container-outer |
| Generic article | Main content | article, main, [role="main"], .post-content, .article-body |
These selectors may change over time. Always verify with `get box` before using them.
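If you try these candidates from a script, a lookup that prints one selector per line makes the "verify before using" step easy to loop over. `platform_selectors` is a hypothetical helper, and its keys are our own shorthand. Unknown platforms fall back to the generic article selectors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print candidate container selectors from the table above,
# one per line, in the order you would try them.
platform_selectors() {
  case $1 in
    reddit)   printf '%s\n' 'shreddit-post' '[data-testid="post-container"]' ;;
    x)        printf '%s\n' 'article[data-testid="tweet"]' ;;
    linkedin) printf '%s\n' '.feed-shared-update-v2' ;;
    hn)       printf '%s\n' '#hnmain .fatitem' ;;
    github)   printf '%s\n' '[data-hpc]' '.repository-content' ;;
    youtube)  printf '%s\n' '#player-container-outer' ;;
    *)        printf '%s\n' 'article' 'main' '[role="main"]' '.post-content' '.article-body' ;;
  esac
}
```

You can then check each candidate in turn, e.g. `platform_selectors reddit | while IFS= read -r sel; do agent-browser --auto-connect get count "$sel"; done`, and keep the first one that matches and passes `get box`.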
If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down:
# Count matches
agent-browser --auto-connect get count "article[data-testid='tweet']"
# Use nth-child or :first-of-type, or a more specific selector
# Or use eval to find the right one by text content:
agent-browser --auto-connect eval --stdin <<'EOF'
const posts = document.querySelectorAll('article[data-testid="tweet"]');
for (let i = 0; i < posts.length; i++) {
const text = posts[i].textContent.substring(0, 80);
console.log(i, text);
}
EOF
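Then target a specific one using :nth-of-type(N) or a unique parent selector. Note that :nth-of-type(N) counts siblings of the same tag inside one parent, not matches across the page. On feeds where each post sits in its own wrapper, it may not select the Nth result, so a unique parent selector is more reliable there.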
Best when the target element fits within the viewport.
# Scroll the target into view
agent-browser --auto-connect scrollintoview "<selector>"
agent-browser --auto-connect wait 500
# Take viewport screenshot
agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png
Then crop using the bounding box (see Cropping).
Best when the target might be larger than the viewport or when precise cropping is needed.
# Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png
# Get the target element's bounding box
agent-browser --auto-connect get box "<selector>"
# Output: { x: 200, y: 450, width: 680, height: 520 }
Then crop (see Cropping).
Use ImageMagick (magick on IMv7, convert is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room.
Critical: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this:
Detect the scale factor: compare the viewport size with the actual image dimensions:
# Check actual image dimensions
magick identify /tmp/screenshot.png
# → 3456x1880 means 2x scale on a 1728x940 viewport
Multiply `get box` coordinates by the scale factor before cropping:
# get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 }
# For 2x Retina, actual image coordinates are:
SCALE=2
X=$((200 * SCALE))
Y=$((450 * SCALE))
W=$((680 * SCALE))
H=$((520 * SCALE))
PADDING=$((16 * SCALE))
magick /tmp/browser-screenshot-full.png \
-crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \
+repage \
<output-path>.png
Important: `get box` returns floating-point values. Round them to integers before passing them to ImageMagick.
Padding: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop.
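The scale detection, scaling, padding, and rounding rules above can be folded into two small functions. This is a sketch, not part of agent-browser or ImageMagick: `scale_factor` and `crop_geometry` are hypothetical helper names. The math runs in `awk` because `get box` values and some scale factors (e.g. 1.5) are fractional, which the integer `$(( ))` arithmetic above cannot handle. The left and top edges are clamped at 0 so padding never produces a negative offset near the page edge. Reading the image width with `magick identify -format '%w'` assumes ImageMagick 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of screenshot pixels to viewport (CSS) pixels.
# Usage: scale_factor <image-width-px> <viewport-width-px>
scale_factor() {
  awk -v img="$1" -v vp="$2" 'BEGIN { printf "%g\n", img / vp }'
}

# Hypothetical helper: ImageMagick -crop geometry (WxH+X+Y) from a get box result
# in viewport pixels, a scale factor, and padding in viewport pixels.
# Usage: crop_geometry <x> <y> <width> <height> <scale> <padding>
crop_geometry() {
  awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" -v s="$5" -v p="$6" 'BEGIN {
    left = (x - p) * s; top = (y - p) * s
    right = (x + w + p) * s; bottom = (y + h + p) * s
    if (left < 0) left = 0   # keep padding from pushing the crop off the image
    if (top < 0) top = 0
    # Round each edge to whole pixels, then derive width and height from the edges.
    l = int(left + 0.5); t = int(top + 0.5)
    r = int(right + 0.5); b = int(bottom + 0.5)
    printf "%dx%d+%d+%d\n", r - l, b - t, l, t
  }'
}

# Example wiring (commented out; needs ImageMagick 7 and a real screenshot):
# SCALE=$(scale_factor "$(magick identify -format '%w' /tmp/browser-screenshot-full.png)" 1728)
# magick /tmp/browser-screenshot-full.png -crop "$(crop_geometry 200 450 680 520 "$SCALE" 16)" +repage out.png
```

With the values from the 2x example, `crop_geometry 200 450 680 520 2 16` prints `1424x1104+368+868`, the same geometry the integer arithmetic produces. `crop_geometry 312 80 656 420 1 16` reproduces the `688x452+296+64` crop used in the Reddit example.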
Name output files after their content, e.g. `reddit-post-screenshot.png`, `tweet-screenshot.png`.

After cropping, read the output image to verify it captured the right content:
# Use the Read tool to visually inspect the cropped screenshot
If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry.
When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use JS-injected highlighting to visually confirm before cropping.
Inject a highlight border on the candidate element:
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
const el = document.querySelector('<selector>');
if (!el) { console.log('NOT_FOUND'); return; }
el.style.outline = '4px solid red';
el.style.outlineOffset = '2px';
el.scrollIntoView({ block: 'center' });
})();
EOF
Take a screenshot and visually inspect:
agent-browser --auto-connect screenshot /tmp/highlight-check.png
Read the screenshot to check if the red border surrounds the correct content.
If correct, remove the highlight and proceed with cropping:
agent-browser --auto-connect eval "document.querySelector('<selector>').style.outline = ''; document.querySelector('<selector>').style.outlineOffset = '';"
If wrong, try the next candidate or refine the selector, re-highlight, and re-check.
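The highlight and cleanup snippets embed the same selector inside a single-quoted JavaScript string. A selector that itself contains single quotes (such as `article[data-testid='tweet']`) therefore breaks them. A small generator keeps both snippets in sync and escapes the selector. This is a sketch with hypothetical helper names (`js_quote`, `highlight_js`, `unhighlight_js`). Piping into `eval --stdin` assumes it reads a pipe the same way it reads the heredocs above.

```shell
#!/usr/bin/env bash
# Escape backslashes and single quotes so a CSS selector fits in a '...' JS string.
js_quote() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g"
}

# Hypothetical helper: print the highlight script from the steps above for a selector.
highlight_js() {
  local sel
  sel=$(js_quote "$1")
  cat <<EOF
(function() {
  const el = document.querySelector('$sel');
  if (!el) { console.log('NOT_FOUND'); return; }
  el.style.outline = '4px solid red';
  el.style.outlineOffset = '2px';
  el.scrollIntoView({ block: 'center' });
})();
EOF
}

# Hypothetical helper: print the script that removes the highlight again.
unhighlight_js() {
  local sel
  sel=$(js_quote "$1")
  printf '%s\n' "document.querySelector('$sel').style.outline = ''; document.querySelector('$sel').style.outlineOffset = '';"
}
```

For example: `highlight_js "$SEL" | agent-browser --auto-connect eval --stdin`, then later `unhighlight_js "$SEL" | agent-browser --auto-connect eval --stdin`.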
Highlighting is also worth using when the `get box` result looks suspicious (too large, too small, or zero-sized).

Before taking the final screenshot, clean up the page for a better result:
# Dismiss cookie banners, popups, overlays
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
// Common cookie/popup selectors
const selectors = [
'[class*="cookie"] button',
'[class*="consent"] button',
'[class*="banner"] [class*="close"]',
'[class*="modal"] [class*="close"]',
'[class*="popup"] [class*="close"]',
'[aria-label="Close"]',
'[data-testid="close"]'
];
selectors.forEach(sel => {
document.querySelectorAll(sel).forEach(el => {
if (el.offsetParent !== null) el.click();
});
});
// Hide fixed/sticky elements that overlay content (nav bars, banners)
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if ((style.position === 'fixed' || style.position === 'sticky') && el.tagName !== 'HTML' && el.tagName !== 'BODY') {
el.style.display = 'none';
}
});
})();
EOF
Use with caution: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region.
Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS click() or remove(). Don't waste time with multiple JS attempts. Instead:
For consistent, high-quality screenshots, set the viewport before capturing:
# Standard desktop viewport
agent-browser --auto-connect set viewport 1280 800
# Wider for dashboard/data-heavy pages
agent-browser --auto-connect set viewport 1440 900
# Narrower for mobile-like content (social media posts)
agent-browser --auto-connect set viewport 800 600
Choose a viewport width that makes the target content render cleanly — not too cramped, not too stretched.
User: "Screenshot the top post on r/programming"
# 1. List existing tabs
agent-browser --auto-connect tab list
# 2. Navigate to subreddit
agent-browser --auto-connect open https://www.reddit.com/r/programming/
agent-browser --auto-connect wait 2000
# 3. Find the first post container
agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()"
# 4. Scroll it into view
agent-browser --auto-connect scrollintoview "shreddit-post"
agent-browser --auto-connect wait 500
# 5. Get bounding box
agent-browser --auto-connect get box "shreddit-post"
# → { x: 312, y: 80, width: 656, height: 420 }
# 6. Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png
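# 7. Crop with padding (ImageMagick 7; assumes scale factor 1 and 16px padding)
# On a 2x Retina screenshot, multiply the box values and padding by 2 first (see Cropping).
magick /tmp/reddit-raw.png \
-crop 688x452+296+64 +repage \
reddit-post-screenshot.png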
# 8. Verify by reading the output image
| Command | Purpose |
|---|---|
| tab list | List open tabs |
| open <url> | Navigate to URL |
| wait 2000 | Wait for content to settle |
| snapshot -i | See interactive elements |
| screenshot --annotate | Visual overview with labels |
| screenshot --full <path> | Full-page screenshot |
| get box "<selector>" | Get element bounding box |
| scrollintoview "<sel>" | Scroll element into view |
| eval <js> | Run JavaScript in page |
| set viewport <w> <h> | Set viewport dimensions |
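`get box` returns null or zero-sized: check that the selector matches with `get count "<selector>"`. If the page is still loading, run `wait 2000` and retry.
The crop is offset from the target: use `screenshot --full` with `get box` (they use the same coordinate system). `get box` x values may be offset.
The target is inside an iframe: `get box` and `snapshot -i` cannot see inside iframes.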
Use eval to access iframe content:
agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('<sel>').getBoundingClientRect()"
Note: Only works for same-origin iframes.
`open` succeeded but the page content is wrong: the browser may have switched to a different tab (e.g., a popup or redirect opened a new tab). Always verify after navigation:
agent-browser --auto-connect eval "document.location.href"
If the URL is wrong, use tab list to find the correct tab and tab goto <N> to switch.
Some pages (e.g., Google developer docs) hang on document.fonts.ready. Force-resolve it first:
agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')"
Then retry the screenshot.
If the content is lazy-loaded, scroll down to trigger loading before taking the screenshot:
agent-browser --auto-connect scroll down 1000
agent-browser --auto-connect wait 1500
agent-browser --auto-connect scroll up 1000
Weekly Installs: 130
First Seen: 10 days ago
Installed on: gemini-cli (130), amp (130), cline (130), github-copilot (130), codex (130), kimi-cli (130)