agent-fetch by teng-lin/agent-fetch
npx skills add https://github.com/teng-lin/agent-fetch --skill agent-fetch
A better web fetch for text content. Your built-in web fetch summarizes or truncates pages. agent-fetch extracts the complete article — every paragraph, heading, and link — using 7 extraction strategies and browser impersonation. No server required, runs as a local CLI tool.
Use agent-fetch whenever you need to read a URL. It returns full article text with structure preserved — better than your built-in web fetch for any task involving page content.
/agent-fetch <url>
agent-fetch runs via npx (no install needed):
npx agent-fetch --help
/agent-fetch <url> - Fetch and Extract Article
Default usage. Fetches the URL with browser impersonation and extracts the complete article content as Markdown.
npx agent-fetch "<url>" --json
Parse the JSON output and present to the user:
---
title: {title}
author: {byline || "Unknown"}
source: {siteName}
url: {url}
date: {publishedTime || "Unknown"}
fetched_in: {latencyMs}ms
---
## {markdown || textContent}
{markdown || textContent}
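The template above can be filled in mechanically once the JSON is parsed. The following is a minimal Python sketch, assuming the JSON is a single object carrying the field names used in the template (`title`, `byline`, `siteName`, `url`, `publishedTime`, `latencyMs`, `markdown`, `textContent`). The function name `render_article` and the sample values are hypothetical, and the sketch emits only the frontmatter followed by the body.

```python
def render_article(result: dict) -> str:
    """Format one parsed agent-fetch JSON result as frontmatter plus body."""
    # Prefer the Markdown body and fall back to plain text, as in the template.
    body = result.get("markdown") or result.get("textContent") or ""
    lines = [
        "---",
        f"title: {result.get('title', '')}",
        f"author: {result.get('byline') or 'Unknown'}",
        f"source: {result.get('siteName', '')}",
        f"url: {result.get('url', '')}",
        f"date: {result.get('publishedTime') or 'Unknown'}",
        f"fetched_in: {result.get('latencyMs', '')}ms",
        "---",
        body,
    ]
    return "\n".join(lines)


# Hypothetical sample data, not real agent-fetch output.
sample = {
    "title": "Example Article",
    "siteName": "Example Site",
    "url": "https://example.com/post",
    "latencyMs": 120,
    "markdown": "Body text of the article.",
}
print(render_article(sample))
```

Missing `byline` and `publishedTime` values fall back to "Unknown", matching the `||` defaults in the template.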
If the fetch fails, check suggestedAction in the JSON:
| suggestedAction | What it means | Next action |
|---|---|---|
| retry_with_extract | Needs full browser | Inform user; agent-fetch is HTTP-only |
| wait_and_retry | Rate limited | Wait 60s and retry |
| skip | Cannot access this site | Inform user |
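The table can be turned into a small decision routine. This is a sketch only: it assumes the parsed JSON is a dict and that `suggestedAction` is absent when the fetch succeeds, which the text above does not state. The `fetch` callable, `max_attempts`, and both function names are hypothetical; only the three action values and the 60-second wait come from the table.

```python
import time
from typing import Callable, Optional

RETRY_DELAY_SECONDS = 60  # from the table: wait 60s, then retry


def fetch_with_retry(
    fetch: Callable[[str], dict],
    url: str,
    max_attempts: int = 3,  # hypothetical cap, not specified by agent-fetch
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(url), waiting and retrying while the result is rate limited."""
    result: dict = {}
    for attempt in range(1, max_attempts + 1):
        result = fetch(url)
        if result.get("suggestedAction") != "wait_and_retry":
            break
        if attempt < max_attempts:
            sleep(RETRY_DELAY_SECONDS)
    return result


def user_message(result: dict) -> Optional[str]:
    """Return what to tell the user, or None when no action is suggested."""
    action = result.get("suggestedAction")
    if action == "retry_with_extract":
        return "This page needs a full browser; agent-fetch is HTTP-only."
    if action == "wait_and_retry":
        return "The site is still rate limiting requests."
    if action == "skip":
        return "This site cannot be accessed."
    return None
```

Passing `sleep` as a parameter keeps the routine easy to exercise without actually waiting.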
/agent-fetch raw <url> - Raw HTML
Fetch raw HTML without extraction.
npx agent-fetch "<url>" --raw
/agent-fetch quiet <url> - Markdown Only
Just the article Markdown, no metadata.
npx agent-fetch "<url>" -q
/agent-fetch text <url> - Plain Text Only
Plain text content without formatting or metadata.
npx agent-fetch "<url>" --text
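When these output modes are driven from a script rather than typed by hand, the mode-to-flag mapping can live in one place. The sketch below is one possible wrapper, not part of agent-fetch: the flags (`--json`, `--raw`, `-q`, `--text`) are the ones shown above, while the mode names and the helpers `build_argv` and `run_agent_fetch` are hypothetical.

```python
import subprocess
from typing import List

# Flags taken from the commands above; the mode names are this sketch's own.
MODE_FLAGS = {
    "json": ["--json"],
    "raw": ["--raw"],
    "quiet": ["-q"],
    "text": ["--text"],
}


def build_argv(url: str, mode: str = "json") -> List[str]:
    """Build the npx command line for one fetch in the given output mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["npx", "agent-fetch", url, *MODE_FLAGS[mode]]


def run_agent_fetch(url: str, mode: str = "json") -> str:
    """Run the command and return its stdout (requires Node.js and npx)."""
    completed = subprocess.run(
        build_argv(url, mode), capture_output=True, text=True
    )
    # The exit code is not checked here: how agent-fetch signals failure
    # through its exit status is not documented above.
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.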
/agent-fetch cookies - Use Persistent Cookies
Load cookies from a Netscape format file or pass them inline:
# From Netscape cookie file (export from browser)
npx agent-fetch "<url>" --cookie-file ~/.cookies.txt
# Inline cookies (repeatable)
npx agent-fetch "<url>" --cookie "sessionId=abc123; theme=dark"
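If no browser export is at hand, a Netscape-format file can be written by hand. The sketch below follows the commonly used layout of that format: a header line, then one tab-separated line per cookie with seven fields (domain, subdomain flag, path, secure flag, expiry as a Unix timestamp, name, value). Whether agent-fetch requires the header line or accepts an expiry of 0 is an assumption, and `CookieEntry` and `to_netscape` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CookieEntry:
    domain: str
    name: str
    value: str
    path: str = "/"
    secure: bool = False
    expires: int = 0  # Unix timestamp; 0 is conventionally a session cookie


def to_netscape(cookies: Iterable[CookieEntry]) -> str:
    """Serialize cookies as Netscape cookie-file text."""
    lines = ["# Netscape HTTP Cookie File"]
    for c in cookies:
        # A leading dot on the domain means the cookie also applies to subdomains.
        include_subdomains = "TRUE" if c.domain.startswith(".") else "FALSE"
        secure = "TRUE" if c.secure else "FALSE"
        lines.append(
            "\t".join(
                [c.domain, include_subdomains, c.path, secure,
                 str(c.expires), c.name, c.value]
            )
        )
    return "\n".join(lines) + "\n"


# Same cookies as the inline example above, for a hypothetical domain.
text = to_netscape([
    CookieEntry("example.com", "sessionId", "abc123"),
    CookieEntry("example.com", "theme", "dark"),
])
print(text, end="")
```

The resulting text can be saved to a file and passed with `--cookie-file`.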
/agent-fetch selectors <url> - Custom CSS Selectors
Extract specific elements or remove unwanted ones:
# Extract only the article, remove navigation and ads
npx agent-fetch "<url>" --select "article" --remove "nav, .sidebar, [class*='ad']"
# Extract all divs with class "post-content"
npx agent-fetch "<url>" --select ".post-content"
/agent-fetch crawl <url> - Crawl Multiple Pages
Follow links and extract content from multiple pages:
# Crawl with defaults (depth: 3, max 100 pages)
npx agent-fetch crawl "<url>"
# Deeper crawl with concurrency control
npx agent-fetch crawl "<url>" --depth 5 --limit 50 --concurrency 3
# Include/exclude specific URL patterns
npx agent-fetch crawl "<url>" --include "*/blog/*" --exclude "**/archive/**"
# Add rate limiting delay between requests
npx agent-fetch crawl "<url>" --delay 1000
# Allow cross-origin (stay on same origin by default)
npx agent-fetch crawl "<url>" --no-same-origin
# Output as JSONL for processing
npx agent-fetch crawl "<url>" --json
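JSONL output holds one JSON object per line, so it can be processed as a stream without loading the whole crawl into memory. The following sketch assumes each line describes one page and carries at least the `url` and `title` fields seen in the single-page JSON; that per-record shape is an assumption, and `read_jsonl` and `page_index` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def page_index(records: Iterable[dict]) -> List[Tuple[str, str]]:
    """Collect (url, title) pairs, using placeholders for missing fields."""
    return [(r.get("url", ""), r.get("title") or "Untitled") for r in records]


# Hypothetical crawl output, not real agent-fetch data.
crawl_output = [
    '{"url": "https://example.com/", "title": "Home"}',
    "",
    '{"url": "https://example.com/blog/a", "title": null}',
]
for url, title in page_index(read_jsonl(crawl_output)):
    print(f"{title}: {url}")
```

In practice `lines` could be `sys.stdin` or an open file, since both iterate line by line.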
/agent-fetch pdf <file> - Extract from PDF
Extract text content from local PDF files:
# Extract PDF as markdown with metadata
npx agent-fetch document.pdf
# JSON output for programmatic access
npx agent-fetch document.pdf --json
# Just the text content
npx agent-fetch document.pdf --text
/agent-fetch preset - Custom TLS Fingerprint
Impersonate different browsers to bypass fingerprinting checks:
# Chrome 143 (default)
npx agent-fetch "<url>" --preset "chrome-143"
# iOS Safari 18
npx agent-fetch "<url>" --preset "ios-safari-18"
# Android Chrome 143
npx agent-fetch "<url>" --preset "android-chrome-143"
Weekly Installs: 176
Repository: teng-lin/agent-fetch
GitHub Stars: 223
First Seen: Feb 4, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Warn), Snyk (Warn)
Installed on: codex (168), opencode (166), gemini-cli (165), github-copilot (162), cursor (162), kimi-cli (161)