npx skills add https://github.com/tavily-ai/skills --skill extract
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
The script uses OAuth via the Tavily MCP server. No manual setup is required; on first run, it will:
- Check for existing tokens in ~/.mcp-auth/

Note: You must have an existing Tavily account. The OAuth flow only supports login; accounts cannot be created through this flow. Sign up at tavily.com first if you don't have an account.
If you prefer to use an API key, get one at https://tavily.com and add it to ~/.claude/settings.json:
{
"env": {
"TAVILY_API_KEY": "tvly-your-api-key-here"
}
}
./scripts/extract.sh '<json>'
Examples:
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'
# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'
# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'
# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
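Writing the '<json>' argument by hand gets error-prone once a query contains quotes. Below is a minimal Bash sketch of a helper that assembles the same payloads as the examples above. build_extract_payload and json_escape are hypothetical helper names, and --query/--chunks are illustrative options, not flags of extract.sh. The escaping only covers backslashes and double quotes.

```shell
#!/usr/bin/env bash
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters (e.g. newlines) are NOT handled by this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/[\\"]/\\&/g'
}

# Build the '<json>' argument for ./scripts/extract.sh from plain arguments.
# Hypothetical helper: options first, then one or more URLs.
build_extract_payload() {
  local query="" chunks=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --query|--chunks)
        if [ $# -lt 2 ]; then echo "missing value for $1" >&2; return 1; fi
        if [ "$1" = --query ]; then query="$2"; else chunks="$2"; fi
        shift 2 ;;
      *) break ;;
    esac
  done

  # Join the escaped URLs as a comma-separated list of JSON strings.
  local urls="" url
  for url in "$@"; do
    urls+="${urls:+, }\"$(json_escape "$url")\""
  done

  local payload="{\"urls\": [${urls}]"
  if [ -n "$query" ]; then
    payload+=", \"query\": \"$(json_escape "$query")\""
  fi
  if [ -n "$chunks" ]; then
    payload+=", \"chunks_per_source\": ${chunks}"
  fi
  printf '%s}\n' "$payload"
}
```

For example, ./scripts/extract.sh "$(build_extract_payload --query 'authentication API' --chunks 3 https://example.com/docs)" sends the same payload as the query-focused example above.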
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://example.com/article"]
}'
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/ml-healthcare",
"https://example.com/ai-diagnostics"
],
"query": "AI diagnostic tools accuracy",
"chunks_per_source": 3
}'
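The two curl calls above differ only in their payload. The wrapper sketched below assumes TAVILY_API_KEY is set in the current shell. It fails early when the key is missing instead of sending a request with an empty Bearer token. tavily_extract and TAVILY_EXTRACT_URL are illustrative names, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented POST /extract call.
TAVILY_EXTRACT_URL="https://api.tavily.com/extract"

tavily_extract() {
  local payload="$1"
  # Refuse to send an unauthenticated request.
  if [ -z "${TAVILY_API_KEY:-}" ]; then
    echo "TAVILY_API_KEY is not set" >&2
    return 1
  fi
  if [ -z "$payload" ]; then
    echo "usage: tavily_extract '<json>'" >&2
    return 2
  fi
  # --fail: exit non-zero on HTTP 4xx/5xx (the error body is not printed).
  # --silent --show-error: hide the progress meter but keep error messages.
  curl --silent --show-error --fail --request POST \
    --url "$TAVILY_EXTRACT_URL" \
    --header "Authorization: Bearer $TAVILY_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$payload"
}
```

Usage: tavily_extract '{"urls": ["https://example.com/article"]}'. Drop --fail if you need to see the error body the API returns.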
POST https://api.tavily.com/extract
| Header | Value |
|---|---|
| Authorization | Bearer <TAVILY_API_KEY> |
| Content-Type | application/json |
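| Field | Type | Default | Description |
|---|---|---|---|
| urls | array | Required | URLs to extract (max 20) |
| query | string | null | Reranks chunks by relevance |
| chunks_per_source | integer | 3 | Chunks per URL (1-5, requires query) |
| extract_depth | string | "basic" | basic or advanced (for JS pages) |
| format | string | "markdown" | markdown or text |
| include_images | boolean | false | Include image URLs |
| timeout | float | varies | Max wait (1-60 seconds) |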
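The limits in the request-body table can be checked locally before a request goes out. This Bash sketch mirrors those rows. It requires at least one URL (since urls is required) and at most 20. It accepts chunks_per_source from 1 to 5, and only together with query. extract_depth and format must take one of their two listed values, and timeout must fall between 1 and 60 seconds. validate_extract_args and its option names are hypothetical, and the API remains the authority on what it accepts.

```shell
#!/usr/bin/env bash
# Local pre-check mirroring the request-body table (hypothetical helper).
validate_extract_args() {
  local query="" chunks="" depth="basic" format="markdown" timeout=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --query|--chunks|--depth|--format|--timeout)
        if [ $# -lt 2 ]; then echo "missing value for $1" >&2; return 1; fi
        case "$1" in
          --query)   query="$2" ;;
          --chunks)  chunks="$2" ;;
          --depth)   depth="$2" ;;
          --format)  format="$2" ;;
          --timeout) timeout="$2" ;;
        esac
        shift 2 ;;
      *) break ;;
    esac
  done

  # Everything left is a URL: urls is required and capped at 20 per request.
  if [ $# -lt 1 ] || [ $# -gt 20 ]; then
    echo "urls: expected 1-20 URLs, got $#" >&2; return 1
  fi

  # chunks_per_source: integer 1-5, and it requires query.
  if [ -n "$chunks" ]; then
    if ! [[ $chunks =~ ^[1-5]$ ]]; then
      echo "chunks_per_source: must be an integer from 1 to 5" >&2; return 1
    fi
    if [ -z "$query" ]; then
      echo "chunks_per_source: requires query" >&2; return 1
    fi
  fi

  case "$depth" in
    basic|advanced) ;;
    *) echo "extract_depth: must be basic or advanced" >&2; return 1 ;;
  esac

  case "$format" in
    markdown|text) ;;
    *) echo "format: must be markdown or text" >&2; return 1 ;;
  esac

  # timeout is a float; bash has no float comparison, so awk checks the range.
  if [ -n "$timeout" ]; then
    if ! [[ $timeout =~ ^[0-9]+([.][0-9]+)?$ ]] ||
       ! awk -v t="$timeout" 'BEGIN { exit !(t + 0 >= 1 && t + 0 <= 60) }'; then
      echo "timeout: must be a number from 1 to 60 seconds" >&2; return 1
    fi
  fi
  return 0
}
```

A typical call is validate_extract_args --depth advanced --timeout 60 https://app.example.com, followed by the actual request only if it returns 0.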
{
"results": [
{
"url": "https://example.com/article",
"raw_content": "# Article Title\n\nContent..."
}
],
"failed_results": [],
"response_time": 2.3
}
| Depth | When to Use |
|---|---|
| basic | Simple text extraction, faster |
| advanced | Dynamic/JS-rendered pages, tables, structured data |
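The depth table suggests starting with basic and switching to advanced only when needed. Below is a sketch of that fallback for a single URL. It assumes a command, EXTRACT_CMD (a hypothetical hook defaulting to ./scripts/extract.sh), that takes the JSON payload as its only argument and prints the JSON response. It treats a missing, null, or empty raw_content as "content missing" using a grep heuristic rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: try extract_depth "basic" first, retry with "advanced" only when the
# response has no usable content. EXTRACT_CMD is a hypothetical hook.
EXTRACT_CMD="${EXTRACT_CMD:-./scripts/extract.sh}"

# Heuristic, not a JSON parser: succeed if some "raw_content" holds a
# non-empty string. Missing, null, or "" values count as "content missing".
has_content() {
  grep -q '"raw_content": *"[^"]'
}

extract_with_fallback() {
  local url="$1" depth payload response
  for depth in basic advanced; do
    payload="{\"urls\": [\"$url\"], \"extract_depth\": \"$depth\""
    if [ "$depth" = advanced ]; then
      # As in the JS-page script example, give advanced extraction 60 s.
      payload+=", \"timeout\": 60"
    fi
    payload+="}"
    # A failing command (e.g. a network error) moves on to the next depth.
    response="$("$EXTRACT_CMD" "$payload")" || continue
    if printf '%s' "$response" | has_content; then
      printf '%s\n' "$response"
      return 0
    fi
  done
  echo "no content extracted for $url at basic or advanced depth" >&2
  return 1
}
```

Because EXTRACT_CMD can name any command or shell function, the fallback logic can be exercised with a stub before any real requests are made.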
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://docs.python.org/3/tutorial/classes.html"],
"extract_depth": "basic"
}'
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/react-hooks",
"https://example.com/react-state"
],
"query": "useState and useEffect patterns",
"chunks_per_source": 2
}'
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://app.example.com/dashboard"],
"extract_depth": "advanced",
"timeout": 60
}'
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3",
"https://example.com/page4",
"https://example.com/page5"
],
"extract_depth": "basic"
}'
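A single request accepts at most 20 URLs, so a longer list has to be split across several calls. This sketch prints one payload per batch, in the same shape as the basic-depth request above. emit_batches is a hypothetical name. URLs are inserted without JSON escaping, on the assumption that they are well-formed; RFC 3986 URLs percent-encode double quotes and backslashes.

```shell
#!/usr/bin/env bash
# Split URLs into batches of at most 20 (the documented per-request maximum)
# and print one JSON payload per line. Usage: emit_batches <depth> <url>...
MAX_URLS_PER_REQUEST=20

emit_batches() {
  local depth="$1"; shift
  local -a urls=("$@")
  local i j batch
  for (( i = 0; i < ${#urls[@]}; i += MAX_URLS_PER_REQUEST )); do
    batch=""
    for (( j = i; j < i + MAX_URLS_PER_REQUEST && j < ${#urls[@]}; j++ )); do
      batch+="${batch:+, }\"${urls[j]}\""
    done
    printf '{"urls": [%s], "extract_depth": "%s"}\n' "$batch" "$depth"
  done
}

# Example: send each batch through the skill script.
# while IFS= read -r payload; do
#   ./scripts/extract.sh "$payload"
# done < <(emit_batches basic "${urls[@]}")
```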
- Use query + chunks_per_source to get only relevant content
- Try basic first, fall back to advanced if content is missing
- Set a longer timeout for slow pages (up to 60s)
- Check failed_results for URLs that couldn't be extracted

Weekly Installs: 4.8K
GitHub Stars: 142
First Seen: Jan 25, 2026
Security Audits: Gen Agent Trust Hub: Warn, Socket: Pass, Snyk: Warn
Installed on: opencode 4.3K, gemini-cli 4.2K, codex 4.2K, github-copilot 4.1K, kimi-cli 4.0K, amp 4.0K