agent-fetch：更优的网页抓取工具，支持7种提取策略和浏览器模拟，获取完整文章内容

agent-fetch by teng-lin/agent-fetch

176 周安装量

223 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/teng-lin/agent-fetch --skill agent-fetch

自动化命令行工具数据处理

🇨🇳中文介绍

agent-fetch 技能

更优的网页抓取工具，专为文本内容设计。 您内置的网页抓取功能可能会对页面进行摘要或截断。agent-fetch 则通过 7 种提取策略和浏览器模拟技术，提取完整的文章内容——包括每个段落、标题和链接。无需服务器，可作为本地 CLI 工具运行。

何时使用此技能

当您需要读取 URL 内容时，请使用 agent-fetch。 它能返回结构完整的完整文章文本——在处理涉及页面内容的任何任务时，都比您内置的网页抓取功能更出色。

用户要求读取、抓取或分析 URL
用户输入 /agent-fetch <url>
您需要完整文本，而非摘要或截断内容
您内置的网页抓取功能返回了不完整或乱码的内容

先决条件

agent-fetch 通过 npx 运行（无需安装）：

npx agent-fetch --help

命令

`/agent-fetch <url>` - 抓取并提取文章

默认用法。 通过浏览器模拟抓取 URL，并将完整的文章内容以 Markdown 格式提取出来。

npx agent-fetch "<url>" --json

解析 JSON 输出 并呈现给用户：

---
title: {title}
author: {byline || "Unknown"}
source: {siteName}
url: {url}
date: {publishedTime || "Unknown"}
fetched_in: {latencyMs}ms
---

## {markdown || textContent}

{markdown || textContent}

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

suggestedAction	含义	后续操作
`retry_with_extract`	需要完整浏览器	通知用户；agent-fetch 仅支持 HTTP
`wait_and_retry`	访问频率受限	等待 60 秒后重试
`skip`	无法访问此网站	通知用户

`/agent-fetch raw <url>` - 原始 HTML

抓取原始 HTML，不进行内容提取。

npx agent-fetch "<url>" --raw

`/agent-fetch quiet <url>` - 仅 Markdown

仅输出文章 Markdown 内容，不含元数据。

npx agent-fetch "<url>" -q

`/agent-fetch text <url>` - 仅纯文本

纯文本内容，无格式或元数据。

npx agent-fetch "<url>" --text

`/agent-fetch cookies` - 使用持久化 Cookie

从 Netscape 格式文件加载 cookie 或内联传递：

# 从 Netscape cookie 文件加载（从浏览器导出）
npx agent-fetch "<url>" --cookie-file ~/.cookies.txt

# 内联 cookie（可重复使用）
npx agent-fetch "<url>" --cookie "sessionId=abc123; theme=dark"

`/agent-fetch selectors <url>` - 自定义 CSS 选择器

提取特定元素或移除不需要的元素：

# 仅提取文章，移除导航和广告
npx agent-fetch "<url>" --select "article" --remove "nav, .sidebar, [class*='ad']"

# 提取所有 class 为 "post-content" 的 div
npx agent-fetch "<url>" --select ".post-content"

`/agent-fetch crawl <url>` - 爬取多个页面

跟踪链接并从多个页面提取内容：

# 使用默认设置爬取（深度：3，最多 100 页）
npx agent-fetch crawl "<url>"

# 更深层次的爬取，带并发控制
npx agent-fetch crawl "<url>" --depth 5 --limit 50 --concurrency 3

# 包含/排除特定的 URL 模式
npx agent-fetch crawl "<url>" --include "*/blog/*" --exclude "**/archive/**"

# 在请求之间添加限速延迟
npx agent-fetch crawl "<url>" --delay 1000

# 允许跨域（默认情况下仅限同源）
npx agent-fetch crawl "<url>" --no-same-origin

# 输出为 JSONL 格式以便处理
npx agent-fetch crawl "<url>" --json

`/agent-fetch pdf <file>` - 从 PDF 提取

从本地 PDF 文件提取文本内容：

# 将 PDF 提取为带元数据的 Markdown
npx agent-fetch document.pdf

# JSON 输出，便于程序化访问
npx agent-fetch document.pdf --json

# 仅文本内容
npx agent-fetch document.pdf --text

`/agent-fetch preset` - 自定义 TLS 指纹

模拟不同的浏览器以绕过指纹检查：

# Chrome 143（默认）
npx agent-fetch "<url>" --preset "chrome-143"

# iOS Safari 18
npx agent-fetch "<url>" --preset "ios-safari-18"

# Android Chrome 143
npx agent-fetch "<url>" --preset "android-chrome-143"

2026 年 2 月 4 日

🇺🇸English

agent-fetch Skill

A better web fetch for text content. Your built-in web fetch summarizes or truncates pages. agent-fetch extracts the complete article — every paragraph, heading, and link — using 7 extraction strategies and browser impersonation. No server required, runs as a local CLI tool.

When to Use This Skill

Use agent-fetch whenever you need to read a URL. It returns full article text with structure preserved — better than your built-in web fetch for any task involving page content.

User asks to read, fetch, or analyze a URL
User types /agent-fetch <url>
You need the full text, not a summary or truncation
Your built-in web fetch returned incomplete or garbled content

Prerequisites

agent-fetch runs via npx (no install needed):

npx agent-fetch --help

Commands

`/agent-fetch <url>` - Fetch and Extract Article

Default usage. Fetches URL with browser impersonation and extracts complete article content as markdown.

npx agent-fetch "<url>" --json

Parse the JSON output and present to the user:

---
title: {title}
author: {byline || "Unknown"}
source: {siteName}
url: {url}
date: {publishedTime || "Unknown"}
fetched_in: {latencyMs}ms
---

## {markdown || textContent}

{markdown || textContent}

If fetch fails , check suggestedAction in the JSON:

suggestedAction	What it means	Next action
`retry_with_extract`	Needs full browser	Inform user; agent-fetch is HTTP-only
`wait_and_retry`	Rate limited	Wait 60s and retry
`skip`	Cannot access this site	Inform user

`/agent-fetch raw <url>` - Raw HTML

Fetch raw HTML without extraction.

npx agent-fetch "<url>" --raw

`/agent-fetch quiet <url>` - Markdown Only

Just the article markdown, no metadata.

npx agent-fetch "<url>" -q

`/agent-fetch text <url>` - Plain Text Only

Plain text content without formatting or metadata.

npx agent-fetch "<url>" --text

`/agent-fetch cookies` - Use Persistent Cookies

Load cookies from a Netscape format file or pass inline:

# From Netscape cookie file (export from browser)
npx agent-fetch "<url>" --cookie-file ~/.cookies.txt

# Inline cookies (repeatable)
npx agent-fetch "<url>" --cookie "sessionId=abc123; theme=dark"

`/agent-fetch selectors <url>` - Custom CSS Selectors

Extract specific elements or remove unwanted ones:

# Extract only the article, remove navigation and ads
npx agent-fetch "<url>" --select "article" --remove "nav, .sidebar, [class*='ad']"

# Extract all divs with class "post-content"
npx agent-fetch "<url>" --select ".post-content"

`/agent-fetch crawl <url>` - Crawl Multiple Pages

Follow links and extract content from multiple pages:

# Crawl with defaults (depth: 3, max 100 pages)
npx agent-fetch crawl "<url>"

# Deeper crawl with concurrency control
npx agent-fetch crawl "<url>" --depth 5 --limit 50 --concurrency 3

# Include/exclude specific URL patterns
npx agent-fetch crawl "<url>" --include "*/blog/*" --exclude "**/archive/**"

# Add rate limiting delay between requests
npx agent-fetch crawl "<url>" --delay 1000

# Allow cross-origin (stay on same origin by default)
npx agent-fetch crawl "<url>" --no-same-origin

# Output as JSONL for processing
npx agent-fetch crawl "<url>" --json

`/agent-fetch pdf <file>` - Extract from PDF

Extract text content from local PDF files:

# Extract PDF as markdown with metadata
npx agent-fetch document.pdf

# JSON output for programmatic access
npx agent-fetch document.pdf --json

# Just the text content
npx agent-fetch document.pdf --text

`/agent-fetch preset` - Custom TLS Fingerprint

Impersonate different browsers to bypass fingerprinting checks:

# Chrome 143 (default)
npx agent-fetch "<url>" --preset "chrome-143"

# iOS Safari 18
npx agent-fetch "<url>" --preset "ios-safari-18"

# Android Chrome 143
npx agent-fetch "<url>" --preset "android-chrome-143"

Weekly Installs

176

Repository

teng-lin/agent-fetch

GitHub Stars

223

First Seen

Feb 4, 2026

Security Audits

Gen Agent Trust HubPass SocketWarn SnykWarn

Installed on

codex168

opencode166

gemini-cli165

github-copilot162

cursor162

kimi-cli161

Skills CLI 使用指南：AI Agent 技能包管理器安装与管理教程

33,600 周安装

agent-fetch：更优的网页抓取工具，支持7种提取策略和浏览器模拟，获取完整文章内容

🇨🇳中文介绍

agent-fetch 技能

何时使用此技能

先决条件

命令

`/agent-fetch <url>` - 抓取并提取文章

相关 Skills

`/agent-fetch raw <url>` - 原始 HTML

`/agent-fetch quiet <url>` - 仅 Markdown

`/agent-fetch text <url>` - 仅纯文本

`/agent-fetch cookies` - 使用持久化 Cookie

`/agent-fetch selectors <url>` - 自定义 CSS 选择器

`/agent-fetch crawl <url>` - 爬取多个页面

`/agent-fetch pdf <file>` - 从 PDF 提取

`/agent-fetch preset` - 自定义 TLS 指纹

🇺🇸English

agent-fetch Skill

When to Use This Skill

Prerequisites

Commands

`/agent-fetch <url>` - Fetch and Extract Article

`/agent-fetch raw <url>` - Raw HTML

`/agent-fetch quiet <url>` - Markdown Only

`/agent-fetch text <url>` - Plain Text Only

`/agent-fetch cookies` - Use Persistent Cookies

`/agent-fetch selectors <url>` - Custom CSS Selectors

`/agent-fetch crawl <url>` - Crawl Multiple Pages

`/agent-fetch pdf <file>` - Extract from PDF

`/agent-fetch preset` - Custom TLS Fingerprint

最新 Skills

agent-fetch：更优的网页抓取工具，支持7种提取策略和浏览器模拟，获取完整文章内容

🇨🇳中文介绍

agent-fetch 技能

何时使用此技能

先决条件

命令

/agent-fetch <url> - 抓取并提取文章

相关 Skills

/agent-fetch raw <url> - 原始 HTML

/agent-fetch quiet <url> - 仅 Markdown

/agent-fetch text <url> - 仅纯文本

/agent-fetch cookies - 使用持久化 Cookie

/agent-fetch selectors <url> - 自定义 CSS 选择器

/agent-fetch crawl <url> - 爬取多个页面

/agent-fetch pdf <file> - 从 PDF 提取

/agent-fetch preset - 自定义 TLS 指纹

🇺🇸English

agent-fetch Skill

When to Use This Skill

Prerequisites

Commands

/agent-fetch <url> - Fetch and Extract Article

/agent-fetch raw <url> - Raw HTML

/agent-fetch quiet <url> - Markdown Only

/agent-fetch text <url> - Plain Text Only

/agent-fetch cookies - Use Persistent Cookies

/agent-fetch selectors <url> - Custom CSS Selectors

/agent-fetch crawl <url> - Crawl Multiple Pages

/agent-fetch pdf <file> - Extract from PDF

/agent-fetch preset - Custom TLS Fingerprint

最新 Skills

`/agent-fetch <url>` - 抓取并提取文章

`/agent-fetch raw <url>` - 原始 HTML

`/agent-fetch quiet <url>` - 仅 Markdown

`/agent-fetch text <url>` - 仅纯文本

`/agent-fetch cookies` - 使用持久化 Cookie

`/agent-fetch selectors <url>` - 自定义 CSS 选择器

`/agent-fetch crawl <url>` - 爬取多个页面

`/agent-fetch pdf <file>` - 从 PDF 提取

`/agent-fetch preset` - 自定义 TLS 指纹

`/agent-fetch <url>` - Fetch and Extract Article

`/agent-fetch raw <url>` - Raw HTML

`/agent-fetch quiet <url>` - Markdown Only

`/agent-fetch text <url>` - Plain Text Only

`/agent-fetch cookies` - Use Persistent Cookies

`/agent-fetch selectors <url>` - Custom CSS Selectors

`/agent-fetch crawl <url>` - Crawl Multiple Pages

`/agent-fetch pdf <file>` - Extract from PDF

`/agent-fetch preset` - Custom TLS Fingerprint