Technical SEO Audit Guide: Crawlability, Core Web Vitals, and AI Crawler Management in 2025

seo-technical by agricidaniel/claude-seo

357 weekly installs

3,000 GitHub Stars

Install command

npx skills add https://github.com/agricidaniel/claude-seo --skill seo-technical

Automation, Performance Optimization, SEO

February 19, 2026

🇺🇸English

Technical SEO Audit

Categories

1. Crawlability

robots.txt: exists, valid, not blocking important resources
XML sitemap: exists, referenced in robots.txt, valid format
Noindex tags: intentional vs accidental
Crawl depth: important pages within 3 clicks of homepage
JavaScript rendering: check if critical content requires JS execution
Crawl budget: for large sites (>10k pages), efficiency matters
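
The crawl-depth check above can be sketched as a breadth-first search over the internal-link graph. This is a minimal sketch, assuming the links have already been collected into a dictionary; the `site` graph and the function names are hypothetical and not part of the skill.

```python
from collections import deque

def crawl_depths(links, start):
    """Return the minimum number of clicks from `start` to every reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def pages_deeper_than(links, start, max_clicks=3):
    """List pages that need more than `max_clicks` clicks from `start`."""
    depths = crawl_depths(links, start)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)

# Hypothetical internal-link graph: page -> pages it links to
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/blog/old-post"],
}

print(pages_deeper_than(site, "/"))
```

Pages that never appear in the result of `crawl_depths` are unreachable from the homepage through internal links, which is usually worth flagging separately.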

AI Crawler Management

As of 2025-2026, AI companies actively crawl the web to train models and power AI search. Managing these crawlers via robots.txt is a critical technical SEO consideration.

Known AI crawlers:

Crawler	Company	robots.txt token	Purpose
GPTBot	OpenAI	`GPTBot`	Model training
ChatGPT-User	OpenAI	`ChatGPT-User`	Real-time browsing
ClaudeBot	Anthropic	`ClaudeBot`	Model training
PerplexityBot	Perplexity	`PerplexityBot`	Search index (Perplexity states it is not used for foundation-model training)
Bytespider	ByteDance	`Bytespider`	Model training
Google-Extended	Google	`Google-Extended`	Gemini training (NOT search)
CCBot	Common Crawl	`CCBot`	Open dataset

Key distinctions:

Blocking Google-Extended prevents Gemini training use but does NOT affect Google Search indexing or AI Overviews (those use Googlebot)
Blocking GPTBot prevents OpenAI training but does NOT prevent ChatGPT from citing your content via browsing (ChatGPT-User)
~3-5% of websites now use AI-specific robots.txt rules

Example — selective AI crawler blocking:

# Allow search indexing, block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all other crawlers (including Googlebot for search)
User-agent: *
Allow: /

Recommendation: Consider your AI visibility strategy before blocking. Being cited by AI systems can drive brand awareness and referral traffic, so blocking every AI crawler may cost visibility. Cross-reference the seo-geo skill for full AI visibility optimization.
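
One way to confirm that a robots.txt policy behaves as intended is to evaluate it with Python's standard `urllib.robotparser`. The sketch below feeds it the example rules from this section. The `crawler_access` helper and the `https://example.com/` URL are illustrative assumptions, and the standard-library parser only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The selective-blocking example from this section
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

def crawler_access(robots_txt, agents, url="https://example.com/"):
    """Map each user-agent token to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

access = crawler_access(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "Bytespider", "Googlebot", "ChatGPT-User"],
)
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these rules the training crawlers are blocked while Googlebot and ChatGPT-User fall through to the wildcard group, which matches the distinctions listed above.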

2. Indexability

Canonical tags: self-referencing, no conflicts with noindex
Duplicate content: near-duplicates, parameter URLs, www vs non-www
Thin content: pages below minimum word counts per type
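Pagination: crawlable paginated URLs or a load-more pattern with crawlable fallback links (Google stopped using rel=next/prev as an indexing signal in 2019, though other search engines may still read it)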
Hreflang: correct for multi-language/multi-region sites
Index bloat: unnecessary pages consuming crawl budget
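
The canonical and noindex checks above can be approximated with the standard-library HTML parser. This is a rough sketch under stated assumptions: `HeadSignals` and `indexability_issues` are hypothetical names, the comparison is an exact string match on the URL, and HTTP-header directives such as `X-Robots-Tag` are not considered.

```python
from html.parser import HTMLParser

class HeadSignals(HTMLParser):
    """Collect canonical link targets and meta robots values from a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots.append((attrs.get("content") or "").lower())

def indexability_issues(html, page_url):
    """Return a list of canonical/noindex problems found in `html`."""
    parser = HeadSignals()
    parser.feed(html)
    issues = []
    noindex = any("noindex" in content for content in parser.robots)
    if not parser.canonicals:
        issues.append("missing canonical")
    elif len(set(parser.canonicals)) > 1:
        issues.append("multiple conflicting canonicals")
    elif parser.canonicals[0] != page_url:
        issues.append("canonical is not self-referencing")
    # A noindex page that canonicalizes elsewhere sends mixed signals
    if noindex and parser.canonicals and parser.canonicals[0] != page_url:
        issues.append("noindex combined with canonical to another URL")
    return issues

page = (
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex, follow"></head>'
)
print(indexability_issues(page, "https://example.com/b"))
```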

3. Security

HTTPS: enforced, valid TLS certificate, no mixed content
Security headers:
- Content-Security-Policy (CSP)
- Strict-Transport-Security (HSTS)
- X-Frame-Options
- X-Content-Type-Options
- Referrer-Policy
HSTS preload: check preload list inclusion for high-security sites
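
The header checks above could be automated along the following lines. This is a sketch, assuming the response headers have already been fetched into a dictionary; the function names are hypothetical. `hsts_preload_ready` only tests whether the header value meets the preload list's published submission requirements (a `max-age` of at least one year, `includeSubDomains`, and `preload`). It does not check whether the domain is actually on the list.

```python
REQUIRED_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]

ONE_YEAR_SECONDS = 31536000

def missing_security_headers(headers):
    """Return the required headers absent from `headers` (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]

def hsts_preload_ready(hsts_value):
    """Check an HSTS header value against the preload submission requirements."""
    directives = [d.strip().lower() for d in hsts_value.split(";") if d.strip()]
    max_age = 0
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                max_age = int(directive.split("=", 1)[1].strip('"'))
            except ValueError:
                max_age = 0  # unparsable value counts as not set
    return (
        max_age >= ONE_YEAR_SECONDS
        and "includesubdomains" in directives
        and "preload" in directives
    )

# Hypothetical response headers
response_headers = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
}
print(missing_security_headers(response_headers))
print(hsts_preload_ready(response_headers["strict-transport-security"]))
```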

4. URL Structure

Clean URLs: descriptive, hyphenated, no query parameters for content
Hierarchy: logical folder structure reflecting site architecture
Redirects: no chains (max 1 hop), 301 for permanent moves
URL length: flag >100 characters
Trailing slashes: consistent usage
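
The URL and redirect rules above translate into simple checks. The sketch below is illustrative only: the function names are hypothetical, the redirect chain is assumed to be a list of status codes already collected by the auditor, and treating 302, 303, and 307 as "temporary" is an assumption about how to apply the "301 for permanent moves" rule.

```python
from urllib.parse import urlsplit

TEMPORARY_REDIRECTS = {302, 303, 307}

def url_issues(url, max_length=100):
    """Flag URL-structure problems from the checklist above."""
    parts = urlsplit(url)
    issues = []
    if len(url) > max_length:
        issues.append(f"longer than {max_length} characters")
    if parts.query:
        issues.append("contains query parameters")
    if "_" in parts.path:
        issues.append("uses underscores instead of hyphens")
    return issues

def redirect_issues(hops):
    """Flag redirect chains and temporary redirects.

    `hops` is the list of redirect status codes seen before the final response.
    """
    issues = []
    if len(hops) > 1:
        issues.append(f"redirect chain ({len(hops)} hops)")
    for code in hops:
        if code in TEMPORARY_REDIRECTS:
            issues.append(f"temporary redirect ({code})")
    return issues

print(url_issues("https://example.com/blog_post?id=7"))
print(redirect_issues([302, 301]))
```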

5. Mobile Optimization

Responsive design: viewport meta tag, responsive CSS
Touch targets: minimum 48x48px with 8px spacing
Font size: minimum 16px base
No horizontal scroll
Mobile-first indexing: Google indexes the mobile version of each page. The rollout was completed on July 5, 2024; Google now crawls and indexes all websites with the mobile (smartphone) Googlebot user-agent.

6. Core Web Vitals

LCP (Largest Contentful Paint): target ≤2.5s
INP (Interaction to Next Paint): target ≤200ms
- INP replaced FID on March 12, 2024. FID was fully removed from all Chrome tools (CrUX API, PageSpeed Insights, Lighthouse) on September 9, 2024. Do NOT reference FID anywhere.
CLS (Cumulative Layout Shift): target ≤0.1
Evaluation uses the 75th percentile of real-user (field) data
If an MCP integration is available, pull field data from the PageSpeed Insights API or CrUX
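
To make the 75th-percentile rule concrete, the sketch below rates a set of raw samples against Google's published good and poor thresholds (LCP 2.5s/4s, INP 200ms/500ms, CLS 0.1/0.25). The sample values are invented for illustration, and the nearest-rank percentile is an assumption: CrUX reports its own pre-aggregated p75, so this only mirrors the idea for locally collected data.

```python
import math

# (good upper bound, needs-improvement upper bound); LCP and INP in milliseconds
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}

def percentile_75(samples):
    """Nearest-rank 75th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]

def rate(metric, samples):
    """Return (p75, rating) for a Core Web Vitals metric."""
    good, needs_improvement = THRESHOLDS[metric]
    p75 = percentile_75(samples)
    if p75 <= good:
        rating = "good"
    elif p75 <= needs_improvement:
        rating = "needs improvement"
    else:
        rating = "poor"
    return p75, rating

# Hypothetical field samples
lcp_ms = [1800, 2100, 2200, 2400, 2600, 3100, 1900, 2000]
inp_ms = [120, 180, 250, 600]

print(rate("LCP", lcp_ms))
print(rate("INP", inp_ms))
```

Using the 75th percentile rather than the mean means a page only passes when at least three quarters of visits meet the target.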

7. Structured Data

Detection: JSON-LD (preferred), Microdata, RDFa
Validation against Google's supported types
See seo-schema skill for full analysis

8. JavaScript Rendering

Check whether content is visible in the initial HTML or requires JS execution
Identify whether pages are client-side rendered (CSR) or server-side rendered (SSR)
Flag SPA frameworks (React, Vue, Angular) that may cause indexing issues
Verify the dynamic rendering setup, if applicable

JavaScript SEO — Canonical & Indexing Guidance (December 2025)

Google updated its JavaScript SEO documentation in December 2025 with critical clarifications:

Canonical conflicts: If a canonical tag in raw HTML differs from one injected by JavaScript, Google may use EITHER one. Ensure canonical tags are identical between server-rendered HTML and JS-rendered output.
noindex with JavaScript: If raw HTML contains <meta name="robots" content="noindex"> but JavaScript removes it, Google MAY still honor the noindex from raw HTML. Serve correct robots directives in the initial HTML response.
Non-200 status codes: Google does NOT render JavaScript on pages returning non-200 HTTP status codes. Any content or meta tags injected via JS on error pages will be invisible to Googlebot.
Structured data in JavaScript: Product, Article, and other structured data injected via JS may face delayed processing. For time-sensitive structured data (especially e-commerce Product markup), include it in the initial server-rendered HTML.

Best practice: Serve critical SEO elements (canonical, meta robots, structured data, title, meta description) in the initial server-rendered HTML rather than relying on JavaScript injection.
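
A simple way to act on this guidance is to diff the SEO-critical tags in the raw server response against the JavaScript-rendered DOM. The sketch below assumes both HTML strings have already been captured (for example, one from a plain HTTP fetch and one from a headless browser). `SeoElements` and `rendering_mismatches` are hypothetical helpers, and only the canonical and meta robots tags are compared.

```python
from html.parser import HTMLParser

class SeoElements(HTMLParser):
    """Record the last canonical href and meta robots content seen."""

    def __init__(self):
        super().__init__()
        self.found = {"canonical": None, "robots": None}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.found["canonical"] = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.found["robots"] = attrs.get("content")

def extract(html):
    parser = SeoElements()
    parser.feed(html)
    parser.close()
    return parser.found

def rendering_mismatches(raw_html, rendered_html):
    """Return {element: (raw value, rendered value)} for every difference."""
    raw, rendered = extract(raw_html), extract(rendered_html)
    return {key: (raw[key], rendered[key]) for key in raw if raw[key] != rendered[key]}

raw_html = (
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex"></head>'
)
rendered_html = '<head><link rel="canonical" href="https://example.com/b"></head>'

print(rendering_mismatches(raw_html, rendered_html))
```

Any non-empty result means the page relies on JavaScript to change a signal that, per the guidance above, should already be correct in the initial HTML.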

9. IndexNow Protocol

Check whether the site implements IndexNow, which notifies Bing, Yandex, and Naver of new or changed URLs
IndexNow is supported by search engines other than Google
Recommend implementation for faster indexing on non-Google engines
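
For reference, an IndexNow submission is a JSON POST containing the host, the site's key, the key file location, and the list of changed URLs. The sketch below only builds the request and does not send it. The host, key, and URLs are placeholders, and placing the key file at `https://<host>/<key>.txt` assumes the protocol's default root-directory location.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow bulk-submission request."""
    # Only URLs on the submitting host belong in the list
    same_host = [url for url in urls if urlsplit(url).netloc == host]
    payload = {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",  # assumed default location
        "urlList": same_host,
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Placeholder values for illustration
request = build_indexnow_request(
    "example.com",
    "abc123",
    ["https://example.com/new-page", "https://other.com/ignored"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, once the key file is actually hosted on the site.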

Output

Technical Score: XX/100

Category Breakdown

Category	Status	Score
Crawlability	✅/⚠️/❌	XX/100
Indexability	✅/⚠️/❌	XX/100
Security	✅/⚠️/❌	XX/100
URL Structure	✅/⚠️/❌	XX/100
Mobile	✅/⚠️/❌	XX/100
Core Web Vitals	✅/⚠️/❌	XX/100
Structured Data	✅/⚠️/❌	XX/100
JS Rendering	✅/⚠️/❌	XX/100
IndexNow	✅/⚠️/❌	XX/100
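
The output template above could be filled in programmatically. The source does not say how category scores roll up into the overall score or where the status cut-offs sit, so the unweighted mean and the 80/50 thresholds in this sketch are assumptions, as are the function names and sample scores.

```python
def status_icon(score, pass_at=80, warn_at=50):
    """Map a 0-100 score to a status icon (thresholds are assumed, not from the skill)."""
    if score >= pass_at:
        return "✅"
    if score >= warn_at:
        return "⚠️"
    return "❌"

def build_report(scores):
    """Return (overall score, tab-separated report text) for the category scores."""
    overall = round(sum(scores.values()) / len(scores))  # unweighted mean (assumption)
    lines = [f"Technical Score: {overall}/100", "", "Category\tStatus\tScore"]
    for name, score in scores.items():
        lines.append(f"{name}\t{status_icon(score)}\t{score}/100")
    return overall, "\n".join(lines)

# Hypothetical category scores
overall, report = build_report({"Crawlability": 90, "Indexability": 70, "Security": 40})
print(report)
```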

Critical Issues (fix immediately)

High Priority (fix within 1 week)

Medium Priority (fix within 1 month)

Low Priority (backlog)

DataForSEO Integration (Optional)

If DataForSEO MCP tools are available, use on_page_instant_pages for real page analysis (status codes, page timing, broken links, on-page checks), on_page_lighthouse for Lighthouse audits (performance, accessibility, SEO scores), and domain_analytics_technologies_domain_technologies for technology stack detection.

Weekly Installs

113

Repository

agricidaniel/claude-seo

GitHub Stars

2.0K

First Seen

Feb 19, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn

Installed on

codex: 108

github-copilot: 107

opencode: 107

gemini-cli: 106

cursor: 106

kimi-cli: 105
