web-research by crawlio-app/crawlio-plugin
npx skills add https://github.com/crawlio-app/crawlio-plugin --skill web-research
Structured web research protocol for CrawlioMCP. Produces normalized evidence records and findings through composite analysis tools.
analyze_page is the Swift-side analogue of crawlio-agent's extractPage, but not its mirror. It operates over HTTP against Crawlio's ControlServer rather than controlling a browser viewport directly.
Use composite tools to gather evidence. Never use the low-level trigger_capture + sleep + get_enrichment pattern.
| Goal | Tool | Notes |
|---|---|---|
| Single-page evidence | analyze_page | One call = capture + enrichment + crawl status. Returns evidenceId, evidenceQuality, gaps |
| Two-site comparison | compare_pages | Sequential analysis with typed comparison evidence |
| Single evidence lookup | get_observation | Verify a specific evidence record by ID |
| Bulk crawl data | get_crawled_urls | After a completed crawl |
| Historical timeline | get_observations | Append-only audit trail |
Structure evidence from the unified records into canonical form before analysis:
Check enrichmentStatus before using enrichment data:
"ok" — enrichment data is present and usable"timeout" — capture completed but enrichment didn't arrive in time; note this gapCheck evidenceQuality for overall evidence health:
"complete" — no gaps, all data present"partial" — has gaps but capture succeeded"degraded" — capture-level failure or enrichment server errorCompare normalized evidence against a rubric. Produce structured findings via create_finding.
Never do this:
trigger_capture({ url: "..." })
// sleep(5000)
get_enrichment({ url: "..." })
Do this instead:
analyze_page({ url: "https://example.com" })
Never improvise evidence shapes. The record from analyze_page is the canonical evidence format. Don't restructure it ad hoc.
Never analyze before normalizing. Extract fields from the evidence record first, then draw conclusions.
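As a minimal sketch of that rule, a helper can lift the analysis-relevant fields out of a raw analyze_page record in one place, assuming the enrichment field names shown in the workflow example later in this document (framework, networkRequests, consoleLogs). The helper itself is hypothetical, not part of the plugin:

```javascript
// Normalize a raw analyze_page record into the flat shape analysis reads from.
// Defensive defaults keep partial/degraded records from throwing mid-extraction.
function normalizeEvidence(record) {
  const e = record.enrichment || {};
  return {
    evidenceId: record.evidenceId,
    framework: e.framework || null, // e.g. { name: "Next.js", version: "14.2.0" }
    networkCount: (e.networkRequests || []).length,
    consoleErrors: (e.consoleLogs || []).filter((l) => l.level === "error"),
    gaps: record.gaps || [], // carried forward so conclusions can cite them
  };
}
```

Conclusions are then drawn only from the normalized object, never from ad-hoc reads into the raw record.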
For side-by-side analysis of two sites:
compare_pages({ urlA: "https://site-a.com", urlB: "https://site-b.com" })
The response includes a comparisonSummary with typed evidence fields:
- comparisonReadiness — ready (both complete), cautious (one partial), unreliable (either degraded)
- symmetric — whether both sides have identical gap profiles
- degradationNotes — human-readable list of gaps per side
- timingDelta — absolute timing differences (capture, enrichment polling)
- enrichmentAgeDeltaMs — timestamp difference between the two analyses
- evidenceIdA / evidenceIdB — observation IDs for round-trip verification via get_observation

When comparing two sites, evaluate across these 10 dimensions:
Not all dimensions will have data for every page. Note gaps explicitly.
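Under the field semantics listed above, comparisonReadiness and symmetric can be sketched as a pure derivation from the two per-side records. This is an illustrative reconstruction, not the server's implementation; the per-side field names (evidenceQuality, gaps) are assumed from the single-page record shape:

```javascript
// Derive comparison-level readiness from two per-side evidence records,
// following the semantics described above: either side degraded => unreliable,
// any partial side => cautious, both complete => ready.
function summarizeComparison(sideA, sideB) {
  const qualities = [sideA.evidenceQuality, sideB.evidenceQuality];
  let comparisonReadiness;
  if (qualities.includes("degraded")) {
    comparisonReadiness = "unreliable";
  } else if (qualities.includes("partial")) {
    comparisonReadiness = "cautious";
  } else {
    comparisonReadiness = "ready";
  }
  // Symmetric means both sides share an identical gap profile (order-insensitive).
  const gapsA = (sideA.gaps || []).slice().sort();
  const gapsB = (sideB.gaps || []).slice().sort();
  const symmetric = JSON.stringify(gapsA) === JSON.stringify(gapsB);
  return { comparisonReadiness, symmetric };
}
```

Treating readiness as the worst of the two sides means one degraded capture is enough to mark the whole comparison unreliable, which matches the conservative framing above.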
After analysis, persist insights:
create_finding({
title: "Site uses Next.js 14 with ISR",
url: "https://example.com",
evidence: ["obs_abc123"],
synthesis: "Framework detection confirmed Next.js 14.2.0 with incremental static regeneration...",
confidence: "high",
category: "framework"
})
Findings come after normalized evidence, never before.
// 1. Acquire
result = analyze_page({ url: "https://example.com" })
// 2. Normalize
framework = result.enrichment.framework // { name: "Next.js", version: "14.2.0" }
networkCount = result.enrichment.networkRequests.length
consoleErrors = result.enrichment.consoleLogs.filter(e => e.level === "error")
// 3. Analyze & Record
create_finding({
title: "Next.js 14 with high external dependency count",
url: "https://example.com",
synthesis: "Detected Next.js 14.2.0. Page loads 47 network requests including 12 third-party domains..."
})