Tavily Extract: Extract Markdown/Text Content from URLs, with Support for JS Pages and Batch Processing

tavily-extract by tavily-ai/skills

1,400 weekly installs

132 GitHub Stars


Install command

npx skills add https://github.com/tavily-ai/skills --skill tavily-extract




tavily extract

Extract clean markdown or text content from one or more URLs.

Before running any command

If tvly is not found on PATH, install it first:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

Do not skip this step or fall back to other tools.

See tavily-cli for alternative install methods and auth options.
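
The PATH check described above can be scripted so that it runs before any other command. This is a minimal sketch, assuming a POSIX shell; `ensure_tvly` is a hypothetical helper name, and the install command is the one quoted above.

```shell
# Hypothetical helper: install tvly only when it is not already on PATH.
ensure_tvly() {
  # command -v exits non-zero when the name cannot be resolved.
  if command -v tvly >/dev/null 2>&1; then
    return 0
  fi
  # Install command copied from the step above, followed by login.
  curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
}
```

Calling `ensure_tvly` once at the top of a script, before the first `tvly extract`, keeps the check in one place.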

When to use

You have a specific URL and want its content
You need text from JavaScript-rendered pages
Step 2 in the workflow: search → extract → map → crawl → research

Quick start

# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md

Options

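Option	Description
`--query`	Rerank chunks by relevance to this query
`--chunks-per-source`	Chunks per URL (1-5, requires `--query`)
`--extract-depth`	`basic` (default) or `advanced` (for JS pages)
`--format`	`markdown` (default) or `text`
`--include-images`	Include image URLs
`--timeout`	Max wait time (1-60 seconds)
`-o, --output`	Save output to file
`--json`	Structured JSON output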

Extract depth

Depth	When to use
`basic`	Simple pages, fast — try this first
`advanced`	JS-rendered SPAs, dynamic content, tables
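
The "try `basic` first" guidance in the table above can be wrapped in a small helper. This is a sketch under stated assumptions: a failed or empty `basic` extraction is treated as "content missing" (the source does not say how to detect it), and `extract_with_fallback` and the `TVLY` variable are hypothetical names.

```shell
# Command to run; overridable so the logic can be exercised without the real CLI.
TVLY=${TVLY:-tvly}

# Hypothetical helper: try basic depth first, retry with advanced if nothing came back.
extract_with_fallback() {
  url=$1
  # Assumption: a failed or empty basic extraction signals missing content.
  if out=$("$TVLY" extract "$url" --extract-depth basic 2>/dev/null) && [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 0
  fi
  # Fall back to the slower depth intended for JS-rendered pages.
  "$TVLY" extract "$url" --extract-depth advanced
}
```

For example, `extract_with_fallback "https://app.example.com" > page.md` would only pay the cost of `advanced` when `basic` returns nothing.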

Tips

Max 20 URLs per request — batch larger lists into multiple calls.
Use --query + --chunks-per-source to get only relevant content instead of full pages.
Try basic first, fall back to advanced if content is missing.
Set --timeout for slow pages (up to 60s).
If search results already contain the content you need (via --include-raw-content), skip the extract step.
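
The 20-URL limit from the tips above can be handled with `xargs`, which splits a long list into fixed-size batches. This is a minimal sketch, assuming one URL per line with no whitespace or quote characters; `batch_urls` and `urls.txt` are hypothetical names.

```shell
# Hypothetical helper: read URLs from stdin (one per line) and run the given
# command once per batch of at most 20 URLs.
batch_urls() {
  xargs -n 20 "$@"
}

# Example invocation (not run here); sh -c keeps --json after the URLs,
# matching the quick-start examples:
#   batch_urls sh -c 'tvly extract "$@" --json' _ < urls.txt
```

Each batch is a separate call, so the output is a sequence of results, one per call, rather than a single combined document.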
