AI Crawler Tester
Check which AI crawlers can access any URL based on robots.txt rules. See your access status for ChatGPT, Perplexity, Claude, Gemini, Bing Copilot, and every major AI platform — free, instant, no signup.
Why AI Crawler Access Matters
Your robots.txt file controls which bots can access your website. For traditional search, the main concern is Googlebot and Bingbot. For AI search, you need to think about a dozen additional crawlers.
If you accidentally block GPTBot, your content cannot appear in ChatGPT answers. If PerplexityBot is blocked, Perplexity cannot cite you. Many sites unknowingly block AI crawlers through broad wildcard rules — and miss out on AI search visibility entirely.
This tool shows you exactly which AI platforms can and cannot access your content, so you can make informed decisions about your AI search visibility strategy.
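Under the hood, a check like this can be done with nothing but the Python standard library's robots.txt parser. The sketch below is illustrative (the bot list and sample robots.txt are assumptions, not this tool's actual implementation):

```python
# Minimal sketch of an AI crawler access check using only the standard library.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]

def check_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot_name: allowed} for each AI crawler against the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}

# Example: block training crawlers, allow everything else
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""
print(check_access(sample))
```

Running this against the sample rules reports GPTBot and CCBot as blocked while the retrieval bots remain allowed, which is exactly the kind of breakdown the tool surfaces per platform.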
AI Crawlers This Tool Checks
This tool tests access for all major AI crawlers:
| Platform | Bot Name | Type | What It Affects |
|---|---|---|---|
| ChatGPT | GPTBot | Training | ChatGPT knowledge training |
| ChatGPT Live | ChatGPT-User | Retrieval | Real-time ChatGPT answers |
| OpenAI Search | OAI-SearchBot | Retrieval | ChatGPT search feature |
| Perplexity | PerplexityBot | Retrieval | Perplexity search answers |
| Claude | ClaudeBot | Training | Claude AI training |
| Claude Web | anthropic-ai | Retrieval | Claude live web access |
| Google Gemini | Google-Extended | Training | Gemini AI training (not search) |
| Bing / Copilot | Bingbot | Both | Bing search + Copilot |
| Cohere | cohere-ai | Training | Cohere AI models |
| Common Crawl | CCBot | Training | Open dataset used by many AI models |
| Meta AI | Meta-ExternalAgent | Training | Meta AI (Llama) training |
| Amazon Alexa | Amazonbot | Training | Amazon Alexa AI |
Training Crawlers vs Retrieval Crawlers
Not all AI crawlers do the same thing. Understanding the difference helps you make smarter robots.txt decisions:
Training crawlers collect content to train AI models and build knowledge bases. Blocking these prevents your content from being used in AI training, but also reduces how well the AI knows about you.
Examples: GPTBot, CCBot, Google-Extended, ClaudeBot
Retrieval crawlers fetch content in real time when a user asks a question — for immediate citation in AI-generated answers. Blocking these has a direct, immediate impact on AI search visibility.
Examples: ChatGPT-User, PerplexityBot, OAI-SearchBot
For most businesses: allow all AI crawlers. For sites with paywalled or proprietary content, consider blocking training crawlers (GPTBot, CCBot) while keeping retrieval bots allowed — so your content still appears in live AI answers.
Read our guide to robots.txt and AI search for a full breakdown of the strategy.
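A robots.txt implementing that mixed strategy might look like the following. This is an illustrative sketch, not a drop-in file; adjust the bot list to your own policy:

```
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Explicitly allow retrieval bots (empty Disallow = allow all)
User-agent: ChatGPT-User
Disallow:

User-agent: PerplexityBot
Disallow:

User-agent: OAI-SearchBot
Disallow:

# Everyone else
User-agent: *
Disallow:
```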
Frequently Asked Questions
What is an AI crawler tester?
An AI crawler tester checks your website's robots.txt to determine which AI crawlers are allowed or blocked from accessing your content. It shows your access status for major AI platforms including ChatGPT (GPTBot), Perplexity (PerplexityBot), Claude (ClaudeBot), Google Gemini (Google-Extended), Bing Copilot (Bingbot), and others.
Why does it matter which AI crawlers can access my site?
If you block an AI crawler in your robots.txt, that platform cannot read or cite your content. Blocking GPTBot means ChatGPT cannot reference you. Blocking PerplexityBot means Perplexity cannot cite you. For most businesses that want AI search visibility, allowing these crawlers is essential.
What is the difference between training crawlers and retrieval crawlers?
Training crawlers (like GPTBot or CCBot) collect content to train AI models — building the AI's base knowledge. Retrieval crawlers (like ChatGPT-User or PerplexityBot) fetch content in real time when a user asks a question, for immediate citation in answers. Blocking retrieval bots has a more direct impact on current AI search visibility.
Should I block AI crawlers?
For most businesses that want to appear in AI-generated answers, no — allow AI crawlers. Blocking them opts you out of AI search citations. The exception: if your content is proprietary, behind a paywall, or you have legitimate IP concerns about AI training, selectively blocking training crawlers (while allowing retrieval bots) can be a reasonable strategy.
What is Google-Extended?
Google-Extended is a specific User-agent token that controls whether Google uses your content for training and improving Gemini and other Google AI products. Blocking Google-Extended does NOT affect standard Google search rankings — it only prevents your content from being used in Google AI training. You can block Google-Extended without affecting your regular Google visibility.
What is GPTBot vs ChatGPT-User?
GPTBot is OpenAI's training crawler — it collects content to improve ChatGPT's knowledge. ChatGPT-User is the retrieval bot that fetches live content when ChatGPT needs to answer a question in real time. If you block GPTBot but allow ChatGPT-User, ChatGPT can still cite your pages in live answers but cannot use them for training.
How do I allow or block specific AI crawlers?
Add User-agent rules to your robots.txt file. To allow a bot, name it in a "User-agent" line followed by an empty "Disallow:" (an empty Disallow allows everything). To block a bot, use "Disallow: /" (disallows the whole site). Use our free Robots.txt Generator to build a properly formatted file with AI bot controls.
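For example, a robots.txt that allows one AI bot and blocks another (an illustrative fragment):

```
# Allow GPTBot — empty Disallow means nothing is disallowed
User-agent: GPTBot
Disallow:

# Block PerplexityBot from the entire site
User-agent: PerplexityBot
Disallow: /
```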
Is this AI crawler tester free?
Yes. Completely free, no signup required. Enter any URL and instantly see which AI crawlers are allowed or blocked based on that page's robots.txt rules.
Related Tools
- Robots.txt Tester & Validator — View and validate your full robots.txt file
- Robots.txt Generator — Create a properly formatted robots.txt with AI bot controls
- AI Search Visibility Checker — Full GEO readiness audit for your site
- llms.txt Checker — Check if a site has a valid llms.txt file
- llms.txt Generator — Create your AI content index file
- Schema Markup Validator — Check your structured data