AI Crawler Tester

Check which AI crawlers can access any URL based on robots.txt rules. See your access status for ChatGPT, Perplexity, Claude, Gemini, Bing Copilot, and every major AI platform — free, instant, no signup.

ChatGPT · ChatGPT Live · OpenAI Search · Perplexity · Claude · Claude Web · Google Gemini · Bing / Copilot + more

Why AI Crawler Access Matters

Your robots.txt file controls which bots can access your website. For traditional search, the main bots to think about are Googlebot and Bingbot. For AI search, you need to account for a dozen additional crawlers.

If you accidentally block GPTBot, your content cannot appear in ChatGPT answers. If PerplexityBot is blocked, Perplexity cannot cite you. Many sites unknowingly block AI crawlers through broad wildcard rules — and miss out on AI search visibility entirely.

This tool shows you exactly which AI platforms can and cannot access your content, so you can make informed decisions about your AI search visibility strategy.
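
Under the hood, a check like this is just robots.txt parsing per user-agent. Here is a minimal sketch using Python's standard-library parser — the bot list and URL are illustrative, not this tool's actual implementation:

```python
from urllib.robotparser import RobotFileParser

# A few of the AI crawler user-agents covered in the table below
AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]

def check_access(url, robots_txt):
    """Parse a robots.txt body and report allowed/blocked per AI bot."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}

rules = """User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

print(check_access("https://example.com/page", rules))
# GPTBot is blocked by its own entry; every other bot falls
# through to the permissive wildcard rule and is allowed
```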

AI Crawlers This Tool Checks

This tool tests access for all major AI crawlers:

| Platform | Bot Name | Type | What It Affects |
| --- | --- | --- | --- |
| ChatGPT | GPTBot | Training | ChatGPT knowledge training |
| ChatGPT Live | ChatGPT-User | Retrieval | Real-time ChatGPT answers |
| OpenAI Search | OAI-SearchBot | Retrieval | ChatGPT search feature |
| Perplexity | PerplexityBot | Retrieval | Perplexity search answers |
| Claude | ClaudeBot | Training | Claude AI training |
| Claude Web | anthropic-ai | Retrieval | Claude live web access |
| Google Gemini | Google-Extended | Training | Gemini AI training (not search) |
| Bing / Copilot | Bingbot | Both | Bing search + Copilot |
| Cohere | cohere-ai | Training | Cohere AI models |
| Common Crawl | CCBot | Training | Open dataset used by many AI models |
| Meta AI | Meta-ExternalAgent | Training | Meta AI (Llama) training |
| Amazon Alexa | Amazonbot | Training | Amazon Alexa AI |

Training Crawlers vs Retrieval Crawlers

Not all AI crawlers do the same thing. Understanding the difference helps you make smarter robots.txt decisions:

Training Crawlers

Collect content to train AI models and build knowledge bases. Blocking these prevents your content from being used in AI training, but also reduces how well the AI knows about you.

Examples: GPTBot, CCBot, Google-Extended, ClaudeBot

Retrieval Crawlers

Fetch content in real time when a user asks a question — for immediate citation in AI-generated answers. Blocking these has a direct, immediate impact on AI search visibility.

Examples: ChatGPT-User, PerplexityBot, OAI-SearchBot

For most businesses: allow all AI crawlers. For sites with paywalled or proprietary content, consider blocking training crawlers (GPTBot, CCBot) while keeping retrieval bots allowed — so your content still appears in live AI answers.
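
As a sketch, a robots.txt implementing that selective strategy might look like the following (blocking the training crawlers named above while keeping retrieval bots and everything else allowed):

```
# Block training crawlers — content won't be used for model training
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Retrieval bots stay allowed, so live AI answers can still cite you
User-agent: ChatGPT-User
Disallow:

User-agent: PerplexityBot
Disallow:

# Everyone else: full access
User-agent: *
Disallow:
```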

Read our guide to robots.txt and AI search for a full breakdown of the strategy.

Frequently Asked Questions

What is an AI crawler tester?

An AI crawler tester checks your website's robots.txt to determine which AI crawlers are allowed or blocked from accessing your content. It shows your access status for major AI platforms including ChatGPT (GPTBot), Perplexity (PerplexityBot), Claude (ClaudeBot), Google Gemini (Google-Extended), Bing Copilot (Bingbot), and others.

Why does it matter which AI crawlers can access my site?

If you block an AI crawler in your robots.txt, that platform cannot read or cite your content. Blocking GPTBot means ChatGPT cannot reference you. Blocking PerplexityBot means Perplexity cannot cite you. For most businesses that want AI search visibility, allowing these crawlers is essential.

What is the difference between training crawlers and retrieval crawlers?

Training crawlers (like GPTBot or CCBot) collect content to train AI models — building the AI's base knowledge. Retrieval crawlers (like ChatGPT-User or PerplexityBot) fetch content in real time when a user asks a question, for immediate citation in answers. Blocking retrieval bots has a more direct impact on current AI search visibility.

Should I block AI crawlers?

For most businesses that want to appear in AI-generated answers, no — allow AI crawlers. Blocking them opts you out of AI search citations. The exception: if your content is proprietary, behind a paywall, or you have legitimate IP concerns about AI training, selectively blocking training crawlers (while allowing retrieval bots) can be a reasonable strategy.

What is Google-Extended?

Google-Extended is a specific User-agent token that controls whether Google uses your content for training and improving Gemini and other Google AI products. Blocking Google-Extended does NOT affect standard Google search rankings — it only prevents your content from being used in Google AI training. You can block Google-Extended without affecting your regular Google visibility.

What is GPTBot vs ChatGPT-User?

GPTBot is OpenAI's training crawler — it collects content to improve ChatGPT's knowledge. ChatGPT-User is the retrieval bot that fetches live content when ChatGPT needs to answer a question in real time. If you block GPTBot but allow ChatGPT-User, ChatGPT can still cite your pages in live answers but cannot use them for training.

How do I allow or block specific AI crawlers?

Add User-agent rules to your robots.txt file. To allow a bot, pair "User-agent: GPTBot" with an empty "Disallow:" line (an empty disallow means allow everything). To block a bot, use "Disallow: /" (disallow everything). Use our free Robots.txt Generator to build a properly formatted file with AI bot controls.
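
Spelled out as a file, those two rules look like this (GPTBot is used as an illustrative example — substitute any bot name from the table above):

```
# Allow GPTBot everywhere (empty Disallow = no restriction)
User-agent: GPTBot
Disallow:

# Or, to block GPTBot from the whole site instead:
# User-agent: GPTBot
# Disallow: /
```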

Is this AI crawler tester free?

Yes. Completely free, no signup required. Enter any URL and instantly see which AI crawlers are allowed or blocked based on that page's robots.txt rules.

Related Tools