Searchable Agent List
GPTBot
OpenAI: OpenAI's web crawler for training ChatGPT models. Scrapes public web pages to improve AI training data.
OAI-SearchBot
OpenAI: OpenAI's search crawler for ChatGPT's live web search and citation features. Can drive referral traffic.
ChatGPT-User
OpenAI: Used when ChatGPT fetches a page on behalf of a user (e.g., during a conversation). Allow or block it based on your content policy.
ClaudeBot
Anthropic: Anthropic's web crawler for training Claude models. Some sites choose to block this crawler.
anthropic-ai
Anthropic: Alternative user agent string used by Anthropic's crawlers in some contexts.
PerplexityBot
Perplexity: Perplexity AI's search crawler. Used to fetch pages for AI-powered answer generation.
Perplexity-User
Perplexity: Used when Perplexity fetches pages on behalf of a user during answer synthesis.
CCBot
Common Crawl: Common Crawl's bot, which builds open web archives that are widely used for AI training. Generally blocked by privacy-conscious sites.
Bytespider
ByteDance: AI training crawler associated with TikTok/ByteDance. Often blocked due to aggressive crawling behavior.
Google-Extended
Google: Google's robots.txt product token for controlling whether content is used to train Gemini (formerly Bard) and other Google AI products. It is not a separate crawler and does not affect Google Search.
Applebot-Extended
Apple: Apple's robots.txt token for opting out of training for Apple Intelligence features. Does not affect Siri indexing.
Meta-ExternalAgent
Meta: Meta's crawler used for AI product data collection. Allow or block it based on your content policy.
Claude-User
Anthropic: Triggered when a user asks Claude to browse a URL. It fetches the page on behalf of that user in a conversation, not automatically.
Claude-SearchBot
Anthropic: Anthropic's dedicated search crawler used to answer user queries in AI search products. Fetches pages for real-time answers, not model training.
Quick Block-All Snippet
Copy this to block all known AI training crawlers at once:
# Block all AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Note: Claude-SearchBot and OAI-SearchBot (AI search) are not blocked here, so AI search crawlers remain allowed
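Before deploying the snippet, it can help to check which agents it actually blocks. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` and the `example.com` URL are placeholders chosen for illustration, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The block-all rules from the snippet above, one directive per line.
ROBOTS_TXT = """\
# Block all AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""

# Placeholder URL, used only for the check below.
URL = "https://example.com/article"


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "GPTBot", URL))         # False: blocked
    print(is_allowed(ROBOTS_TXT, "OAI-SearchBot", URL))  # True: no rule matches
```

Agents with no matching group and no `User-agent: *` fallback are treated as allowed, which is why the search crawlers pass this check.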
Frequently Asked Questions
What is a user agent string?
A user agent string is the identifier a crawler sends in the HTTP User-Agent header when visiting your site. robots.txt uses these strings to match rules to specific crawlers.
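To make the idea concrete, here is a rough sketch of how a server-side script might spot a listed crawler in an incoming `User-Agent` header. The token list is taken from the agent list above; the function name `match_crawler` and the sample header are made up for illustration and are not any vendor's documented string.

```python
from typing import Optional

# Tokens taken from the agent list above (not exhaustive).
KNOWN_TOKENS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Claude-SearchBot",
    "PerplexityBot",
    "CCBot",
    "Bytespider",
]


def match_crawler(user_agent_header: str) -> Optional[str]:
    """Return the first known token found in the header, or None."""
    lowered = user_agent_header.lower()
    for token in KNOWN_TOKENS:
        # Case-insensitive substring match on the crawler's token.
        if token.lower() in lowered:
            return token
    return None


# Hypothetical header value, for illustration only.
SAMPLE_HEADER = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)"

if __name__ == "__main__":
    print(match_crawler(SAMPLE_HEADER))
```

A header can be spoofed, so a match like this only shows what the client claims to be; robots.txt itself relies on crawlers identifying themselves honestly.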
How do I block a specific AI crawler?
Add a User-agent: [Name] line followed by a Disallow: / line to your robots.txt. For example, these two lines block GPTBot from your entire site:
User-agent: GPTBot
Disallow: /
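If you maintain rules for several crawlers, the same two-line pattern can be generated rather than typed by hand. This is a small sketch under that assumption; `block_rules` is a hypothetical helper name, not part of any library.

```python
from typing import Iterable


def block_rules(agents: Iterable[str]) -> str:
    """Build robots.txt text that disallows the whole site for each agent."""
    # One group per agent: a User-agent line followed by Disallow: /
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(block_rules(["GPTBot", "CCBot"]), end="")
```

The output can be appended to an existing robots.txt; any rules you already have for other agents are left alone.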
Is this list complete?
No. The AI crawler landscape changes frequently. This list reflects crawlers known as of mid-2026. Check each AI company's documentation for the latest user agent strings.