Browser Screenshot Skill: Capture Focused Web Page Regions, Cut Full-Screen Noise, and Speed Up Content Creation

browser-screenshot by zc277584121/marketing-skills

130 weekly installs

GitHub

Install command

npx skills add https://github.com/zc277584121/marketing-skills --skill browser-screenshot

Content creation, automation, productivity

🇺🇸English

Skill: Browser Screenshot

Take focused screenshots of specific regions on web pages — a Reddit post, a tweet, an article section, a chart, etc. — not just a full-page dump.

Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See references/agent-browser-setup.md if unsure.

Overview

This skill handles the full pipeline:

1. Research the best page to screenshot (web search, fetch)
2. Navigate to the right page in the browser
3. Locate the target element/region on the page
4. Capture a focused, cropped screenshot of just that region

Hard Rule: No Full-Screen Screenshots

NEVER output an uncropped full-viewport or full-page screenshot as a final result. Full screenshots contain too much noise (nav bars, sidebars, ads, unrelated content) and are unsuitable as article illustrations. Every screenshot MUST be cropped to a focused region.

Step 0: Research — Find and Validate Sources Before Opening the Browser

The browser is for capturing, not for browsing. Before opening anything in Chrome, use text-based tools (WebSearch, WebFetch) to find candidate pages, read their content, and decide which ones are actually worth screenshotting.

Research-First Workflow


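Troubleshooting

`get box` returns null or zero size

- The selector matches no element. Verify with get count "<selector>".
- The element may be hidden or not yet rendered. Try wait 2000 and retry.

Cropped image is blank or shows the wrong region

- Full-page screenshot coordinates may differ from viewport coordinates. Use screenshot --full together with get box (they use the same coordinate system).
- Check whether the page scrolls horizontally; the x value from get box may be offset.

Target element is inside an iframe

get box and snapshot -i cannot see inside iframes.

Use eval to access the iframe's content:

agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('<sel>').getBoundingClientRect()"

Note: this only works for same-origin iframes.

`open` succeeds but the page content is wrong

The browser may have switched to a different tab (for example, a popup or redirect opened a new tab). Always verify after navigation:
```
agent-browser --auto-connect eval "document.location.href"
```
If the URL is wrong, use tab list to find the correct tab and tab goto <N> to switch to it.

Screenshot command times out on fonts

Some pages (e.g., Google developer docs) hang on document.fonts.ready. Force it to resolve first:
```
agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')"
```

Then retry the screenshot.

Page has lazy-loaded content

Scroll down to trigger loading before taking the screenshot:

agent-browser --auto-connect scroll down 1000
agent-browser --auto-connect wait 1500
agent-browser --auto-connect scroll up 1000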
1. WebSearch to find candidate pages for the topic
2. WebFetch each candidate to read its text content and check whether it has the information/visual you need
3. Evaluate: Is this page worth a screenshot? Does it have a clear, focused region that would work as an illustration?
4. Only then open the browser to capture the screenshot

This saves significant time — most candidate pages won't be worth screenshotting, and you can eliminate them without the overhead of browser navigation.

When to Use Browser-First Instead

Skip the WebSearch/WebFetch phase and go directly to Chrome browsing when:

- The target platform requires login: Reddit, LinkedIn, X/Twitter, and other social platforms often gate content behind login walls. If the user's Chrome session is already logged in, use the browser directly.
- The user specifies a platform with a clear search need, e.g., "find a Reddit post about X" or "screenshot a tweet about Y". Go straight to the platform's search in Chrome.
- WebFetch returns blocked/incomplete content: some sites aggressively block non-browser requests. If you get a 403, a CAPTCHA page, or stripped content, switch to Chrome.

In these cases, Chrome browsing replaces WebSearch — navigate to the platform's search page, browse results, and evaluate pages visually before deciding what to screenshot.

Page Selection Strategy

The right page depends on the context of the article and how recent/notable the subject is:

Subject Type	Best Page to Find	How to Find It
New model/feature launch (< 6 months)	Official blog post announcing it	WebSearch `"<model name>" site:<vendor-domain> blog`
Established product (> 6 months)	Product landing page or docs overview	WebSearch `"<model name>" official page`
Open-source model	HuggingFace model card or GitHub repo	Direct URL: `huggingface.co/<org>/<model>`
API service	API documentation page	WebSearch `"<service name>" API docs`

Note: This table lists common subject types but is not exhaustive. Apply the same research-first strategy to any subject type: find the most authoritative and visually clean source page for the topic at hand.

What Makes a Good Screenshot Source

Core principle: Less is more. Focus on content, not chrome.

A good screenshot source contains a focused, self-contained piece of information — a paragraph of text, a key quote, a data table, a diagram. It should NOT be a busy page full of buttons, navigation, sidebars, and interactive elements.

- Prefer: A section of a blog post with a clear heading and 1-2 paragraphs of text. A single chart or diagram. A model card header with name and description. A quote or key finding.
- Avoid: Full landing pages with CTAs and navigation. Dashboard views with multiple panels. Pages dominated by UI controls (buttons, dropdowns, forms) rather than readable content.
- Official blog posts are ideal: they have hero images, prominent titles, and concise descriptions designed for sharing.
- Product landing pages can work, but only if you crop to the hero section and ignore the rest.
- HuggingFace model cards are reliable for open-source models: the layout is consistent, with the model name and description always at the top.
- API docs are an acceptable fallback: they show the product name and key specs.

Rule of thumb: If the region you plan to capture contains more interactive UI elements (buttons, links, nav items) than readable text content, it's a bad crop. Find a more content-rich region, or pick a different page entirely.

Pre-Flight URL Validation

Before opening in the browser, validate URLs with WebFetch (lightweight HEAD/GET) to avoid wasting time on 404s or redirects:

WebFetch: <candidate-url>
→ Check status code, title, and content snippet
→ If 404 or redirect to unrelated page, try next candidate
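
As a rough sketch of that decision rule, the helper below classifies a pre-flight result once you have a status code and the final URL after redirects. The function names `url_host` and `classify_prefetch` are hypothetical, and treating a changed host as "needs a closer look" is an assumption of this sketch; whether a redirect target is actually unrelated still needs a human (or agent) judgment.

```shell
# Hypothetical helper: extract the host part of a URL (scheme://host/path -> host).
url_host() {
  rest=${1#*://}      # drop the scheme
  rest=${rest%%/*}    # drop everything from the first slash
  printf '%s\n' "$rest"
}

# Hypothetical helper. Args: http_status requested_url final_url
# Prints: skip  (non-2xx status, try the next candidate)
#         check (redirected to a different host, inspect before using)
#         ok    (2xx on the same host)
classify_prefetch() {
  case "$1" in
    2??) ;;
    *) echo skip; return ;;
  esac
  if [ "$(url_host "$2")" = "$(url_host "$3")" ]; then
    echo ok
  else
    echo check
  fi
}

classify_prefetch 404 https://vendor.example/blog/post https://vendor.example/blog/post
```

If curl is available, `curl -s -o /dev/null -L -w '%{http_code} %{url_effective}' <candidate-url>` is one way to obtain the two values outside of WebFetch.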

Region Selection Strategy

Think about what the article reader needs to see in this screenshot:

Article Context	What to Capture	Target Region
Introducing a model in a lineup	Model name + key tagline/description	Blog hero section or HF model card header
Comparing capabilities	Feature highlights or spec table	Blog section showing specs/features
Discussing a specific feature	The feature description	Relevant section heading + 1-2 paragraphs
Showing a product/service	Brand identity + value prop	Landing page hero (title + subtitle + visual)

The screenshot should make the reader think "ah, that's what this model/product is" — not "what am I looking at?"

Step 1: Navigate to the Target Page

Always Start by Listing Tabs

agent-browser --auto-connect tab list

Check if the page is already open. Reuse existing tabs — they have login sessions and correct state.

Navigation by Input Type

User Provides	Strategy
Direct URL	`agent-browser --auto-connect open <url>`
Search query	`open https://www.google.com/search?q=<encoded-query>` → find and click the best result
Platform + topic	Construct platform search URL (see below) → locate target content
Vague description	Google search → evaluate results → navigate to best match

Platform-Specific Search URLs

Platform	Search URL Pattern
Reddit	`https://www.reddit.com/search/?q=<query>`
X / Twitter	`https://x.com/search?q=<query>`
LinkedIn	`https://www.linkedin.com/search/results/content/?keywords=<query>`
Hacker News	`https://hn.algolia.com/?q=<query>`
GitHub	`https://github.com/search?q=<query>`
YouTube	`https://www.youtube.com/results?search_query=<query>`
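
The `<query>` placeholder in these patterns has to be percent-encoded before it goes into a URL. The sketch below is one way to do that with standard tools only; `urlencode` is a hypothetical helper name, not an agent-browser command, and it encodes spaces as `%20` rather than `+`.

```shell
# Hypothetical helper: percent-encode a string for use as a query value.
# Every byte except RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) is encoded.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '\n\n' | while read -r hex; do
    [ -n "$hex" ] || continue
    case "$hex" in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # Unreserved byte: print it literally via an octal escape
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Any other byte: emit %XX with uppercase hex digits
        printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
}

query="rust async runtime"
printf 'https://www.reddit.com/search/?q=%s\n' "$(urlencode "$query")"
```

The resulting URL can then be passed to `agent-browser --auto-connect open`.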

Waiting for Page Load

After navigation, wait for content to settle:

agent-browser --auto-connect wait --load networkidle

Note: Some sites (Reddit, X, LinkedIn) never reach networkidle. If open already shows the page title in its output, skip the wait. Use wait 2000 as a safe alternative.

Step 2: Locate the Target Region

This is the critical step. The goal is to find a CSS selector that precisely wraps the content to capture.

Primary Method: DOM Selector Discovery

1. Take an annotated screenshot to understand the page layout:
```
agent-browser --auto-connect screenshot --annotate
```
2. Take a snapshot to see the page's accessibility tree:
```
agent-browser --auto-connect snapshot -i
```
3. Identify the target container element. Look for:
- Semantic HTML containers: <article>, <main>, <section>
- Platform-specific components (see Platform Selectors)
- Data attributes: [data-testid="..."], [data-id="..."]
4. Verify with get box to confirm the element has a reasonable bounding box:
```
agent-browser --auto-connect get box "<selector>"
```

This returns { x, y, width, height }. Sanity-check:

 * Width should be > 100px and < viewport width
 * Height should be > 50px
 * If the box is the entire page, the selector is too broad; refine it

5. If the selector is hard to find, use eval to explore the DOM:

     agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()"
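
The sanity checks from step 4 can be expressed as a small predicate. This is only a sketch: `box_ok` is a hypothetical helper, the thresholds are the ones listed above, and the viewport width is assumed to be known (for example from a prior `set viewport` call).

```shell
# Hypothetical helper. Args: box_width box_height viewport_width
# Exits 0 when the box passes the checks: 100 < width < viewport width, height > 50.
box_ok() {
  awk -v w="$1" -v h="$2" -v vw="$3" 'BEGIN { exit !(w > 100 && w < vw && h > 50) }'
}

if box_ok 656 420 1280; then
  echo "box looks reasonable"
else
  echo "selector is probably wrong or too broad"
fi
```

A box that fails the check is a signal to refine the selector before capturing anything, not a hard error.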

Platform Selectors

Common container selectors for popular platforms:

Platform	Target	Typical Selector
Reddit	A post	`shreddit-post`, `[data-testid="post-container"]`
X / Twitter	A tweet	`article[data-testid="tweet"]`
LinkedIn	A feed post	`.feed-shared-update-v2`
Hacker News	A story + comments	`#hnmain .fatitem`
GitHub	A repo card	`[data-hpc]`, `.repository-content`
YouTube	Video player area	`#player-container-outer`
Generic article	Main content	`article`, `main`, `[role="main"]`, `.post-content`, `.article-body`

These selectors may change over time. Always verify with get box before using.

Multiple Matching Elements

If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down:

# Count matches
agent-browser --auto-connect get count "article[data-testid='tweet']"

# Use nth-child or :first-of-type, or a more specific selector
# Or use eval to find the right one by text content:
agent-browser --auto-connect eval --stdin <<'EOF'
const posts = document.querySelectorAll('article[data-testid="tweet"]');
for (let i = 0; i < posts.length; i++) {
  const text = posts[i].textContent.substring(0, 80);
  console.log(i, text);
}
EOF

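Then target a specific one using :nth-of-type(N) or a unique parent selector. Note that :nth-of-type(N) is 1-based and counts siblings of the same tag under one parent, so the 0-based index logged above does not necessarily translate to N; verify the final selector with get count and get box.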

Step 3: Capture the Focused Screenshot

Method A: Scroll + Viewport Screenshot (Preferred for Viewport-Sized Targets)

Best when the target element fits within the viewport.

# Scroll the target into view
agent-browser --auto-connect scrollintoview "<selector>"
agent-browser --auto-connect wait 500

# Take viewport screenshot
agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png

Then crop using the bounding box (see Cropping).

Method B: Full-Page Screenshot + Crop (For Any Size Target)

Best when the target might be larger than the viewport or when precise cropping is needed.

# Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png

# Get the target element's bounding box
agent-browser --auto-connect get box "<selector>"
# Output: { x: 200, y: 450, width: 680, height: 520 }

Then crop (see Cropping).

Cropping

Use ImageMagick (the magick command on IMv7; convert is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room.

Retina Display Handling

Critical: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this:

1. Detect the scale factor by comparing the viewport size with the actual image dimensions:

# Check actual image dimensions
magick identify /tmp/screenshot.png
# → 3456x1880 means 2x scale on a 1728x940 viewport

2. Multiply get box coordinates by the scale factor before cropping:

# get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 }
# For 2x Retina, actual image coordinates are:
SCALE=2
X=$((200 * SCALE))
Y=$((450 * SCALE))
W=$((680 * SCALE))
H=$((520 * SCALE))
PADDING=$((16 * SCALE))

magick /tmp/browser-screenshot-full.png \
  -crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \
  +repage \
  <output-path>.png
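
The scale-factor detection described above can be reduced to a single division. The sketch below assumes you already know the image width in pixels and the viewport width in CSS pixels, and that the image is a viewport screenshot (or a full-page one on a page without horizontal scroll); `scale_factor` is a hypothetical helper name.

```shell
# Hypothetical helper. Args: image_width_px viewport_width_css_px
# Prints the device scale factor, e.g. 2 for a Retina capture.
scale_factor() {
  awk -v img="$1" -v vp="$2" 'BEGIN { printf "%g\n", img / vp }'
}

scale_factor 3456 1728
```

With ImageMagick installed, `magick identify -format '%w' /tmp/screenshot.png` is one way to read the image width to pass as the first argument.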

Important: get box returns floating-point values. Round them to integers before passing to ImageMagick.

Padding: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop.
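
Putting scaling, padding, and rounding together, a crop geometry string can be computed in one place. This is a minimal sketch under stated assumptions: `crop_geometry` is a hypothetical helper, it rounds outward (floor for the origin, ceiling for the far edge) so the crop never cuts into the element, and it clamps the origin at 0 when the padding would push it off the image. It does not clamp against the right or bottom edge of the image.

```shell
# Hypothetical helper. Args: x y width height scale padding
# x/y/width/height come from get box, padding is in viewport px.
# Prints an ImageMagick geometry string: WxH+X+Y in image pixels.
crop_geometry() {
  awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" -v s="$5" -v p="$6" 'BEGIN {
    left = (x - p) * s; if (left < 0) left = 0   # clamp at the image edge
    top  = (y - p) * s; if (top  < 0) top  = 0
    right  = (x + w + p) * s
    bottom = (y + h + p) * s
    l = int(left); t = int(top)                  # floor (values are >= 0)
    r = int(right);  if (r < right)  r++         # ceiling
    b = int(bottom); if (b < bottom) b++
    printf "%dx%d+%d+%d\n", r - l, b - t, l, t
  }'
}

# The 2x Retina example from above: box { x: 200, y: 450, width: 680, height: 520 }, 16px padding
crop_geometry 200 450 680 520 2 16
```

The output can be passed straight to the crop step, for example `magick /tmp/browser-screenshot-full.png -crop "$(crop_geometry 200 450 680 520 2 16)" +repage <output-path>.png`.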

Output Path

- If the user specifies an output path, use that.
- Otherwise, save to a descriptive name in the current directory, e.g., reddit-post-screenshot.png, tweet-screenshot.png.

Step 4: Verify the Result

After cropping, read the output image to verify it captured the right content:

# Use the Read tool to visually inspect the cropped screenshot

If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry.

Fallback: Visual Highlight Confirmation

When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use JS-injected highlighting to visually confirm before cropping.

Inject a highlight border on the candidate element:

agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
  const el = document.querySelector('<selector>');
  if (!el) { console.log('NOT_FOUND'); return; }
  el.style.outline = '4px solid red';
  el.style.outlineOffset = '2px';
  el.scrollIntoView({ block: 'center' });
})();
EOF

Take a screenshot and visually inspect:

agent-browser --auto-connect screenshot /tmp/highlight-check.png

Read the screenshot to check if the red border surrounds the correct content.

If correct, remove the highlight and proceed with cropping:

agent-browser --auto-connect eval "document.querySelector('<selector>').style.outline = ''; document.querySelector('<selector>').style.outlineOffset = '';"

If wrong, try the next candidate or refine the selector, re-highlight, and re-check.

When to Use This Fallback

- The page has complex/nested components and you're not sure which container is right
- Multiple similar elements exist and you need to pick the correct one
- The user's description is vague ("that chart in the middle of the page")
- The get box result looks suspicious (too large, too small, zero-sized)

Page Preparation: Clean Up Before Capture

Before taking the final screenshot, clean up the page for a better result:

# Dismiss cookie banners, popups, overlays
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
  // Common cookie/popup selectors
  const selectors = [
    '[class*="cookie"] button',
    '[class*="consent"] button',
    '[class*="banner"] [class*="close"]',
    '[class*="modal"] [class*="close"]',
    '[class*="popup"] [class*="close"]',
    '[aria-label="Close"]',
    '[data-testid="close"]'
  ];
  selectors.forEach(sel => {
    document.querySelectorAll(sel).forEach(el => {
      if (el.offsetParent !== null) el.click();
    });
  });

  // Hide fixed/sticky elements that overlay content (nav bars, banners)
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    if ((style.position === 'fixed' || style.position === 'sticky') && el.tagName !== 'HTML' && el.tagName !== 'BODY') {
      el.style.display = 'none';
    }
  });
})();
EOF

Use with caution: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region.

Cookie Banners That Won't Dismiss

Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS click() or remove(). Don't waste time with multiple JS attempts. Instead:

- Crop it out: if the banner is at the top or bottom, simply adjust the crop region to exclude it. This is the fastest and most reliable approach.
- Scroll past it: scroll the target content away from the banner area before capturing.

For consistent, high-quality screenshots, set the viewport before capturing:

# Standard desktop viewport
agent-browser --auto-connect set viewport 1280 800

# Wider for dashboard/data-heavy pages
agent-browser --auto-connect set viewport 1440 900

# Narrower for mobile-like content (social media posts)
agent-browser --auto-connect set viewport 800 600

Choose a viewport width that makes the target content render cleanly — not too cramped, not too stretched.

Complete Example: Screenshot a Reddit Post

User: "Screenshot the top post on r/programming"

# 1. List existing tabs
agent-browser --auto-connect tab list

# 2. Navigate to subreddit
agent-browser --auto-connect open https://www.reddit.com/r/programming/
agent-browser --auto-connect wait 2000

# 3. Find the first post container
agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()"

# 4. Scroll it into view
agent-browser --auto-connect scrollintoview "shreddit-post"
agent-browser --auto-connect wait 500

# 5. Get bounding box
agent-browser --auto-connect get box "shreddit-post"
# → { x: 312, y: 80, width: 656, height: 420 }

# 6. Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png

# 7. Crop with padding (16px on each side; assumes a 1x display, so on 2x Retina double every value)
magick /tmp/reddit-raw.png \
  -crop 688x452+296+64 +repage \
  reddit-post-screenshot.png

# 8. Verify by reading the output image

Key Commands Quick Reference

Command	Purpose
`tab list`	List open tabs
`open <url>`	Navigate to URL
`wait 2000`	Wait for content to settle
`snapshot -i`	See interactive elements
`screenshot --annotate`	Visual overview with labels
`screenshot --full <path>`	Full-page screenshot
`get box "<selector>"`	Get element bounding box
`scrollintoview "<sel>"`	Scroll element into view
`eval <js>`	Run JavaScript in page
`set viewport <w> <h>`	Set viewport dimensions
