geo-crawlers by zubair-trabzada/geo-seo-claude
`npx skills add https://github.com/zubair-trabzada/geo-seo-claude --skill geo-crawlers`

This skill analyzes a website's accessibility to AI crawlers: the bots that AI companies use to discover, index, and train on web content. If AI crawlers are blocked, the site's content cannot appear in AI-generated responses regardless of its quality. Crawler access is the foundational technical requirement for GEO.
As of early 2026, many websites inadvertently block AI crawlers through overly aggressive robots.txt rules inherited from legacy SEO configurations. A 2025 Originality.ai study found that over 35% of the top 1,000 websites block at least one major AI crawler, and 5-10% block all of them. Blocking AI crawlers is the single fastest way to become invisible in AI-generated search results.
**Tier 1.** These crawlers power the AI search products where users actively look for answers. Blocking them directly reduces your visibility in AI-generated responses.

- GPTBot: `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)`
- OAI-SearchBot: `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://docs.openai.com/bots/overview)`
- ChatGPT-User: `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)`
- ClaudeBot: `ClaudeBot/1.0; +https://www.anthropic.com/claude-bot`
- PerplexityBot: `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)`

**Tier 2.** These crawlers serve large AI platforms or search ecosystems. Allowing them increases your content's reach.

- Google-Extended
- GoogleOther
- Applebot-Extended
- Amazonbot: `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)`
- FacebookBot

**Tier 3.** These crawlers are primarily used for AI model training rather than live search features. Blocking them does not affect AI search visibility.
- CCBot: `CCBot/2.0 (https://commoncrawl.org/faq/)`
- anthropic-ai
- Bytespider
- cohere-ai

| Crawler | Tier | Recommendation | Reason |
|---|---|---|---|
| GPTBot | 1 | ALLOW | Powers ChatGPT Search (300M+ users) |
| OAI-SearchBot | 1 | ALLOW | Search-only, no training use |
| ChatGPT-User | 1 | ALLOW | User-initiated browsing |
| ClaudeBot | 1 | ALLOW | Claude web search and analysis |
| PerplexityBot | 1 | ALLOW | AI search with the best referral traffic |
| Google-Extended | 2 | ALLOW | Gemini features; no search rank impact |
| GoogleOther | 2 | ALLOW | Google AI research |
| Applebot-Extended | 2 | ALLOW | Apple Intelligence (2B+ devices) |
| Amazonbot | 2 | ALLOW | Alexa and Amazon AI |
| FacebookBot | 2 | ALLOW | Meta AI (3B+ app users) |
| CCBot | 3 | Context-dependent | Training data only |
| anthropic-ai | 3 | Context-dependent | Training data only |
| Bytespider | 3 | BLOCK | Aggressive crawler, low benefit |
| cohere-ai | 3 | Context-dependent | Training data only |
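The recommendations above can be checked mechanically against a site's robots.txt. A minimal sketch using Python's standard-library `urllib.robotparser` (the sample robots.txt and the test URL are illustrative, not from the skill itself):

```python
from urllib.robotparser import RobotFileParser

# Tier 1 crawlers from the table above
TIER_1 = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]

def tier1_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each Tier 1 crawler name to whether robots_txt lets it fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in TIER_1}

# Example: a legacy robots.txt that singles out GPTBot
sample = "User-agent: GPTBot\nDisallow: /\n"
print(tier1_access(sample))
```

Crawlers not mentioned in the file (and with no `User-agent: *` group) default to allowed, which is why a report should distinguish "Allowed" from "Not Mentioned" as the template below does.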
For sites that want maximum AI search visibility:
```
# AI Crawlers - ALLOWED for AI search visibility
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: FacebookBot
Allow: /

# AI Crawlers - BLOCKED (aggressive/low value)
User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /
```
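A policy file like the one above can be sanity-checked offline before deployment. A sketch, again using Python's stdlib `urllib.robotparser`; the abbreviated `ROBOTS_TXT` and the audit URL are placeholders for your real file:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated version of the recommended policy (placeholder)
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /
"""

def audit(robots_txt: str, expectations: dict) -> list:
    """Return the crawlers whose parsed access differs from the intended policy."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot, should_allow in expectations.items()
            if rp.can_fetch(bot, "https://example.com/") != should_allow]

mismatches = audit(ROBOTS_TXT, {"GPTBot": True, "ClaudeBot": True,
                                "Bytespider": False, "CCBot": False})
print(mismatches)  # an empty list means the file matches intent
```

Running this in CI catches the common failure mode described earlier: a legacy wildcard or copy-paste error silently blocking a Tier 1 crawler.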
The skill checks the following:

- `[domain]/robots.txt`: rules for each AI user agent above, plus any wildcard (`User-agent: *`) block that would apply.
- `Crawl-delay` directives that may slow AI crawler access.
- `Sitemap` directives (AI crawlers use these for discovery).
- Meta robots tags:
  - `<meta name="robots" content="noindex">`: blocks all bots
  - `<meta name="robots" content="nofollow">`: prevents link following
  - `<meta name="robots" content="noai">`: emerging tag to block AI use
  - `<meta name="robots" content="noimageai">`: blocks AI image training
  - `<meta name="GPTBot" content="noindex">`: bot-specific directive
- `X-Robots-Tag` HTTP headers:
  - `X-Robots-Tag: noindex`: HTTP header equivalent of meta noindex
  - `X-Robots-Tag: noai`: blocks AI use
  - `X-Robots-Tag: noimageai`: blocks AI image training
  - `X-Robots-Tag: GPTBot: noindex`: bot-specific header
- AI-specific files:
  - `/llms.txt` (emerging standard for AI crawler guidance)
  - `/.well-known/ai-plugin.json` (OpenAI plugin manifest)
  - `/ai.txt` (proposed standard, similar to ads.txt for AI)

Generate a file called `GEO-CRAWLER-ACCESS.md`:
# AI Crawler Access Report: [Domain]
**Analysis Date:** [Date]
**Domain:** [Domain]
**robots.txt Status:** [Found/Not Found/Error]
---
## Crawler Access Summary
| Crawler | Operator | Tier | Status | Impact |
|---|---|---|---|---|
| GPTBot | OpenAI | 1 | [Allowed/Blocked/Not Mentioned] | [Impact description] |
| OAI-SearchBot | OpenAI | 1 | [Status] | [Impact] |
| ChatGPT-User | OpenAI | 1 | [Status] | [Impact] |
| ClaudeBot | Anthropic | 1 | [Status] | [Impact] |
| PerplexityBot | Perplexity | 1 | [Status] | [Impact] |
| Google-Extended | Google | 2 | [Status] | [Impact] |
| GoogleOther | Google | 2 | [Status] | [Impact] |
| Applebot-Extended | Apple | 2 | [Status] | [Impact] |
| Amazonbot | Amazon | 2 | [Status] | [Impact] |
| FacebookBot | Meta | 2 | [Status] | [Impact] |
| CCBot | Common Crawl | 3 | [Status] | [Impact] |
| anthropic-ai | Anthropic | 3 | [Status] | [Impact] |
| Bytespider | ByteDance | 3 | [Status] | [Impact] |
| cohere-ai | Cohere | 3 | [Status] | [Impact] |
## AI Visibility Score: [X]/100
**Tier 1 Access:** [X/5 crawlers allowed]
**Tier 2 Access:** [X/5 crawlers allowed]
**Tier 3 Access:** [X/4 crawlers allowed]
---
## Critical Issues
[List any Tier 1 crawlers that are blocked]
## Recommendations
### Immediate Actions
[Specific robots.txt changes needed]
### robots.txt Recommendation
[Complete recommended robots.txt content for AI crawlers]
### Additional Technical Findings
- **Meta Robots Tags:** [Findings]
- **X-Robots-Tag Headers:** [Findings]
- **JavaScript Rendering:** [Assessment]
- **llms.txt:** [Present/Absent]
- **Sitemap Accessibility:** [Assessment]

The AI Crawler Access Score is calculated as:

| Component | Weight | Scoring |
|---|---|---|
| Tier 1 Crawlers Allowed | 50% | 20 points per Tier 1 crawler allowed (5 crawlers = 100 points max, scaled to 50) |
| Tier 2 Crawlers Allowed | 25% | 20 points per Tier 2 crawler allowed (5 crawlers = 100 points max, scaled to 25) |
| No Blanket AI Blocks | 15% | Full points if no `User-agent: * Disallow: /` and no `noai` meta tags |
| AI-Specific Files Present | 10% | 5 points for llms.txt, 5 points for a sitemap accessible to AI crawlers |

Final score = sum of all weighted components, capped at 100.
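The weighting above reduces to simple arithmetic: after scaling, each Tier 1 crawler is worth 10 final points and each Tier 2 crawler 5. A minimal sketch (function and parameter names are illustrative, not part of the skill):

```python
def crawler_access_score(tier1_allowed: int, tier2_allowed: int,
                         no_blanket_block: bool, has_llms_txt: bool,
                         sitemap_accessible: bool) -> int:
    """AI Crawler Access Score per the weighting table above (max 100)."""
    score = 0
    score += min(tier1_allowed, 5) * 10  # 50% weight: 5 crawlers x 10 points
    score += min(tier2_allowed, 5) * 5   # 25% weight: 5 crawlers x 5 points
    score += 15 if no_blanket_block else 0   # no blanket AI blocks
    score += 5 if has_llms_txt else 0        # AI-specific files...
    score += 5 if sitemap_accessible else 0  # ...and sitemap access
    return min(score, 100)

# All Tier 1 and Tier 2 crawlers allowed, no blanket blocks, both files present:
print(crawler_access_score(5, 5, True, True, True))  # 100
```

Note that Tier 3 crawlers do not contribute to the score, consistent with their "context-dependent" recommendation.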
**Weekly Installs:** 72
**Repository:** https://github.com/zubair-trabzada/geo-seo-claude
**GitHub Stars:** 3.9K
**First Seen:** Feb 27, 2026
**Security Audits:** Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
**Installed on:** opencode (71), codex (71), cline (69), gemini-cli (69), cursor (69), github-copilot (69)