Why Block AI Training Crawlers?
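AI companies run training crawlers that scrape web content to build and improve their models. If you do not want your content used this way, especially without compensation, blocking these crawlers in robots.txt is the standard first step. Keep in mind that robots.txt is a request that well-behaved crawlers honor, not an access control.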
Option 1 — Block GPTBot Only
The simplest approach: block only GPTBot while allowing other crawlers.
User-agent: GPTBot
Disallow: /
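Before deploying, it can help to confirm that the rule does what you expect. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed and the example.com URL are illustrative assumptions, and the result reflects how Python's parser reads the file, which may differ in detail from how an individual crawler interprets it.

```python
from urllib.robotparser import RobotFileParser

# The Option 1 rules, copied from above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    is_allowed is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder; substitute a URL on your own site.
    for agent in ("GPTBot", "Googlebot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.com/article")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the Option 1 rules, the check should report GPTBot as blocked and leave other crawlers such as Googlebot allowed.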
Option 2 — Block All Major AI Training Crawlers
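This covers GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), CCBot (Common Crawl), and Bytespider (ByteDance) in one file. It also sets Google-Extended and Applebot-Extended, which are control tokens rather than separate crawlers: they tell Google and Apple not to use content fetched by their regular crawlers for AI training.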
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
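If you maintain this list over time, generating the file from a list of tokens can reduce copy-paste mistakes. The sketch below is one possible approach, not a required tool; the names AI_TRAINING_AGENTS and build_block_rules are hypothetical, and the list simply mirrors the tokens shown above.

```python
# Tokens taken from the Option 2 example above; edit this list as crawlers change.
AI_TRAINING_AGENTS = [
    "GPTBot",
    "ClaudeBot",
    "anthropic-ai",
    "CCBot",
    "Bytespider",
    "Google-Extended",
    "Applebot-Extended",
]


def build_block_rules(agents: list[str]) -> str:
    """Build robots.txt text that disallows the whole site for each agent."""
    # One group per agent: a User-agent line followed by a site-wide Disallow.
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "# Block AI training crawlers\n" + "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Print the rules so they can be reviewed or redirected into robots.txt.
    print(build_block_rules(AI_TRAINING_AGENTS), end="")
```

Running the script prints one User-agent and Disallow pair per token, which you can paste into robots.txt or append to your existing rules.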
Option 3 — Block Training, Allow AI Search
Block training crawlers while explicitly allowing AI search crawlers like OAI-SearchBot and PerplexityBot.
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
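Because this option mixes blocked and allowed crawlers, a small policy check can catch a misplaced rule before it goes live. The following sketch assumes Python's standard-library urllib.robotparser as the interpreter of the rules; EXPECTED and find_policy_mismatches are hypothetical names, and example.com stands in for your own site.

```python
from urllib.robotparser import RobotFileParser

# The Option 3 rules, copied from above.
OPTION_3 = """\
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
"""

# Intended policy: True means the crawler should be allowed to fetch pages.
EXPECTED = {
    "GPTBot": False,
    "ClaudeBot": False,
    "anthropic-ai": False,
    "CCBot": False,
    "Bytespider": False,
    "OAI-SearchBot": True,
    "Claude-SearchBot": True,
    "PerplexityBot": True,
}


def find_policy_mismatches(
    robots_txt: str,
    expected: dict[str, bool],
    url: str = "https://example.com/",
) -> list[str]:
    """Return the agents whose parsed permission differs from the intended one."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        agent
        for agent, should_allow in expected.items()
        if parser.can_fetch(agent, url) != should_allow
    ]


if __name__ == "__main__":
    mismatches = find_policy_mismatches(OPTION_3, EXPECTED)
    print("Policy OK" if not mismatches else f"Mismatched agents: {mismatches}")
```

An empty result means every listed crawler is treated as intended by this parser. The Allow lines are explicit for readability; a crawler with no matching group is allowed by default.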
Frequently Asked Questions
How do I block GPTBot specifically?
Add a User-agent: GPTBot line followed by a Disallow: / line to the robots.txt file at the root of your site. This tells GPTBot not to crawl any page on your site.
Should I block GPTBot or allow it?
It depends on your goals. If you do not want your content used to train AI models for free, block training crawlers such as GPTBot. If you do not mind the training use, allow it. Appearing in AI search results is handled by separate search crawlers such as OAI-SearchBot, so you can block GPTBot and still allow those.
Does blocking GPTBot affect Google Search?
No. GPTBot and Googlebot are completely different crawlers. Blocking GPTBot has no effect on your Google search ranking or indexing.