AI Crawler User Agents

Reference list of known AI crawler identifiers, with copyable robots.txt blocking rules

Agent List


GPTBot

OpenAI
Purpose: Training · Action: Block

OpenAI's crawler for gathering public web content that may be used to train its generative AI models, including those behind ChatGPT.

User-Agent: GPTBot

OAI-SearchBot

OpenAI
Purpose: AI search · Action: Allow

OpenAI's search crawler, used to surface and cite websites in ChatGPT's search features. Allowing it lets your pages appear as linked sources, which can drive referral traffic.

User-Agent: OAI-SearchBot

ChatGPT-User

OpenAI
Purpose: User-initiated fetching · Action: Review

Used when ChatGPT fetches a page on behalf of a user, for example when a user asks it to open a link. It does not crawl automatically, so decide based on your own access policy.

User-Agent: ChatGPT-User

ClaudeBot

Anthropic
Purpose: Training · Action: Block

Anthropic's crawler for collecting public web content that may be used to train Claude models.

User-Agent: ClaudeBot

anthropic-ai

Anthropic
Purpose: Training · Action: Block

Older Anthropic user agent token. Anthropic's current crawler documentation lists ClaudeBot, Claude-User, and Claude-SearchBot, but keeping a rule for this token is harmless.

User-Agent: anthropic-ai

PerplexityBot

Perplexity
Purpose: AI search · Action: Allow

Perplexity's search crawler, which indexes pages so they can be surfaced and linked in Perplexity's answers. Perplexity states it is not used to crawl content for AI foundation models.

User-Agent: PerplexityBot

Perplexity-User

Perplexity
Purpose: User-initiated fetching · Action: Review

Used when Perplexity fetches a page in response to a specific user request while composing an answer, rather than as part of an automatic crawl.

User-Agent: Perplexity-User

CCBot

Common Crawl
Purpose: Training · Action: Block

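Common Crawl's crawler, which builds a free, openly available archive of web pages that is widely used as AI training data. Blocking CCBot keeps your pages out of future Common Crawl snapshots; existing snapshots are unaffected.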

User-Agent: CCBot

Bytespider

ByteDance
Purpose: Training · Action: Block

ByteDance's crawler (ByteDance owns TikTok), associated with AI training. Often blocked because of reports of aggressive crawling.

User-Agent: Bytespider

Google-Extended

Google
Purpose: Training · Action: Block

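A robots.txt control token rather than a separate crawler: Google fetches pages with its existing crawlers, and Google-Extended controls whether that content may be used to train Gemini (formerly Bard) models and to ground Gemini answers. It does not affect inclusion or ranking in Google Search.

robots.txt token: Google-Extended (never sent as a User-Agent header)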

Applebot-Extended

Apple
Purpose: Training · Action: Block

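A robots.txt control token rather than a separate crawler: Applebot-Extended does not fetch pages itself. It controls whether content crawled by Applebot may be used to train Apple's foundation models, including those behind Apple Intelligence. Blocking it does not remove your pages from Applebot-powered search features such as Siri and Spotlight.

robots.txt token: Applebot-Extended (never sent as a User-Agent header)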

Meta-ExternalAgent

Meta
Purpose: Training and products · Action: Review

Meta's crawler for AI use cases such as model training and indexing content to improve its products. Review based on your content policy.

User-Agent: Meta-ExternalAgent

Claude-User

Anthropic
Purpose: User-initiated fetching · Action: Review

Used when a user asks Claude to open or read a URL in a conversation. It fetches the page on behalf of that user rather than crawling automatically.

User-Agent: Claude-User

Claude-SearchBot

Anthropic
Purpose: AI search · Action: Allow

Anthropic's search crawler, which indexes pages to improve the relevance and accuracy of Claude's search responses. It is separate from ClaudeBot, which collects training data.

User-Agent: Claude-SearchBot
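To see which of these agents actually reach your site, you can scan the User-Agent header in your access logs. The sketch below assumes a case-insensitive substring match is good enough; the table and its category labels (training, search, user-initiated, mixed) are a hypothetical summary of the entries above, not an official classification. Google-Extended and Applebot-Extended are left out because they never appear in a User-Agent header. Headers can also be spoofed, so a match tells you what a client claims to be, not who it is.

```python
from typing import Optional

# Hypothetical table summarizing the entries above: product token -> category.
# Google-Extended and Applebot-Extended are omitted on purpose: they are
# robots.txt control tokens and are never sent as a User-Agent header.
AI_AGENTS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user-initiated",
    "ClaudeBot": "training",
    "anthropic-ai": "training",
    "Claude-SearchBot": "search",
    "Claude-User": "user-initiated",
    "PerplexityBot": "search",
    "Perplexity-User": "user-initiated",
    "CCBot": "training",
    "Bytespider": "training",
    "Meta-ExternalAgent": "mixed",
}


def classify_user_agent(header: str) -> Optional[tuple[str, str]]:
    """Return (token, category) for the first known token found in a User-Agent header.

    Matching is a case-insensitive substring search. Longer tokens are checked
    first so that a short token can never shadow a longer one that contains it.
    """
    lowered = header.lower()
    for token in sorted(AI_AGENTS, key=len, reverse=True):
        if token.lower() in lowered:
            return token, AI_AGENTS[token]
    return None  # not one of the agents in the table


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; ClaudeBot/1.0)"  # illustrative header
    print(classify_user_agent(sample))  # prints ('ClaudeBot', 'training')
```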

Quick Block-All Snippet

Copy this into your robots.txt to block the AI training crawlers listed above:

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Search crawlers such as OAI-SearchBot, Claude-SearchBot, and PerplexityBot are not listed, so they stay allowed unless another group (e.g. User-agent: *) blocks them
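Before deploying the snippet, you can sanity-check it with Python's standard-library urllib.robotparser. Treat this as a quick check under a simplifying assumption: the standard-library parser does not implement every RFC 9309 rule (for example, it applies the first matching rule in a group rather than the most specific one), so results for complex files may differ from a given crawler's behavior. The helper name check_agents and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The User-agent / Disallow lines from the block-all snippet above.
BLOCK_ALL_SNIPPET = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""


def check_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each product token to whether the robots.txt rules allow it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    results = check_agents(
        BLOCK_ALL_SNIPPET,
        ["GPTBot", "ClaudeBot", "CCBot", "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
        "https://example.com/articles/1",
    )
    for agent, allowed in results.items():
        print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

The training crawlers should report "blocked" and the search crawlers "allowed", because no group in the snippet names them.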

Frequently Asked Questions

What is a user agent string?

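A user agent string is the identifier a crawler sends in the HTTP User-Agent header when it requests a page from your site. In robots.txt, each group of rules begins with a User-agent line naming a crawler's product token (such as GPTBot). A compliant crawler follows the group that matches its token, matching case-insensitively, and falls back to the User-agent: * group if none matches.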
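Python's standard-library urllib.robotparser gives a rough feel for how a compliant crawler picks its group. The robots.txt text below is a hypothetical example: GPTBot gets its own group (written in lowercase to show that matching ignores case), and every other crawler falls back to the catch-all group. The allowed helper and the example.com host are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a GPTBot group plus a catch-all (*) group.
ROBOTS_TXT = """\
User-agent: gptbot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(token: str, path: str) -> bool:
    """Return True if a crawler identifying as `token` may fetch `path`."""
    return parser.can_fetch(token, "https://example.com" + path)


if __name__ == "__main__":
    print(allowed("GPTBot", "/blog/"))     # GPTBot group matches despite case: False
    print(allowed("CCBot", "/blog/"))      # no named group, falls back to *: True
    print(allowed("CCBot", "/private/a"))  # the * group disallows /private/: False
```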

How do I block a specific AI crawler?

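Add a group to your robots.txt that names the crawler in a User-agent line, followed by Disallow: /. For example, these two lines block GPTBot from your entire site:

User-agent: GPTBot
Disallow: /

robots.txt must be served from your site root (for example, https://example.com/robots.txt), and it is advisory: it only affects crawlers that choose to honor it.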
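If you maintain a longer list, a small script can generate the groups for you. The helper below is hypothetical (the name build_block_rules and its parameters are illustrative); it writes one User-agent group with Disallow: / per token, matching the layout of the block-all snippet on this page.

```python
def build_block_rules(tokens: list[str], comment: str = "") -> str:
    """Return robots.txt text that disallows the whole site for each product token.

    Each token gets its own group (a User-agent line followed by Disallow: /),
    and groups are separated by a blank line. An optional comment is emitted
    as the first line.
    """
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    header = f"# {comment}\n" if comment else ""
    return header + "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_block_rules(["GPTBot", "ClaudeBot", "CCBot"], "Block AI training crawlers"), end="")
```

One group per token mirrors this page's snippet; RFC 9309 also allows several User-agent lines to share a single group of rules, which gives a shorter file with the same effect.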

Is this list complete?

No. The AI crawler landscape changes frequently, and this list reflects known crawlers as of mid-2026. Check each AI company's documentation for the latest user agent strings.