robots.txt

robots.txt controls crawler access, but does it block AI agents like ChatGPT and Claude? Here's what works, what doesn't, and why the rules are changing.

Technical Accessibility

robots.txt is a text file placed in a website's root directory that instructs crawlers which parts of your site they can and cannot access. In the era of AI search, robots.txt has taken on new significance: some AI platforms respect it; others do not. The rules are changing rapidly.

What is robots.txt and How Does It Work?

robots.txt is a simple text file with directives: User-agent (which crawler the rules apply to), Disallow (which paths to block), Allow (which paths to permit), Crawl-delay, and Sitemap. Traditional search engines like Googlebot and Bingbot respect it, and for most of the web's history that was the whole story.
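
Because the directives are plain text, a complete file can be very short. A minimal sketch (the paths and sitemap URL below are placeholders, not recommendations):

# Rules for any crawler not matched by a more specific group
User-agent: *
Disallow: /admin/
Allow: /admin/public-docs/
Crawl-delay: 10

# Point crawlers at your sitemap
Sitemap: https://www.example.com/sitemap.xml

Support for Crawl-delay varies by crawler; Googlebot, for example, ignores it.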

Increasingly, AI platforms are deploying crawlers that may or may not respect robots.txt. Common AI-related user agents to check include GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Amazonbot (Amazon), and CCBot (Common Crawl). The question "does my robots.txt block AI?" is now a real strategic consideration for site owners.

robots.txt and Traditional Search vs. AI Search

For decades, robots.txt was entirely about controlling search engine crawlers. The assumption was simple: if you disallowed a path in robots.txt, search engine crawlers would leave it alone.

Now, AI platforms are introducing their own user-agents (OpenAI, Anthropic, Perplexity, etc.) with different philosophies. Some respect robots.txt; some don't. Some ask permission first. Others ignore it entirely. This creates a fragmented landscape where blocking rules apply differently depending on which platform is accessing your content.

It's important to note a critical distinction: blocking crawlers in robots.txt tells compliant crawlers not to fetch your content for indexing or for training data. It doesn't necessarily stop AI agents from visiting your site in real time when a user asks them to browse it. ChatGPT browsing a page on a user's behalf and a crawler indexing it are fundamentally different processes.
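
One concrete illustration of that distinction: OpenAI documents separate user agents for training crawls and for user-triggered fetches, so the two can be treated differently. The sketch below assumes the GPTBot and ChatGPT-User agent names, which are worth verifying against OpenAI's current crawler documentation:

# Opt out of crawling for model training
User-agent: GPTBot
Disallow: /

# Still allow fetches made on behalf of a ChatGPT user
User-agent: ChatGPT-User
Allow: /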

Which AI Platforms Respect robots.txt?

The compliance landscape is evolving rapidly. Based on current understanding as of early 2026:

  • OpenAI (GPTBot): Has published guidelines indicating they attempt to respect robots.txt, though real-time browsing may differ from training data collection
  • Anthropic (ClaudeBot): Identifies itself through multiple user-agent strings and generally respects disallow directives for crawling
  • Perplexity (PerplexityBot): Respects robots.txt for indexing purposes
  • Google (AI Overviews/Gemini): Uses existing Googlebot crawlers, respects robots.txt consistently
  • Amazon (Amazonbot): Respects standard robots.txt directives
  • Common Crawl (CCBot): Often used for training data; respects robots.txt but data may have been crawled before rules changed

These practices are subject to change. Companies are actively refining their policies as the AI search market matures.
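
One practical consequence of this fragmentation: because Google's AI features ride on the same Googlebot used for regular search, blocking it also removes you from standard results, while a training-focused crawler like CCBot can be blocked on its own. A sketch of that kind of selective policy:

# Opt out of the Common Crawl corpus often used for AI training
User-agent: CCBot
Disallow: /

# Leave regular search (and with it AI Overviews) unaffected
User-agent: Googlebot
Allow: /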

Should You Block AI Agents with robots.txt?

This depends on your strategic goals. Some sites want to block AI agents to protect proprietary content. Others want to be discoverable by AI search platforms.

Consider these factors before deciding:

  • Content type: Proprietary knowledge vs. published thought leadership
  • Business model: Is AI citation beneficial or damaging to your goals?
  • Competitive position: Are competitors blocking AI access?
  • Platform compliance: If an AI platform ignores robots.txt, blocking won't work

To block AI agents, add specific user-agent rules to your robots.txt file:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

However, understand the trade-offs: many sites added blanket AI blocks in 2023-2024 during the "AI is stealing our content" panic. That made sense for publishers protecting copyrighted content, but it makes less sense for businesses that want AI to help users find their pricing page.
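
If the goal is selective rather than blanket blocking, the same mechanics can be inverted. This sketch (with hypothetical paths) blocks these crawlers by default but carves out public pages; it relies on the rule that the most specific matching directive wins, which compliant parsers follow but which is worth testing for any crawler you care about:

# Block AI crawlers by default, but keep public commercial pages reachable
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
Allow: /pricing/
Allow: /products/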

Blocking is only one tool. For comprehensive AI visibility management, you need to understand which platforms actually discover your content.

Related Terms

  • AI Navigability — The broader concept of how AI agents access and understand your site.
  • AI Crawl Budget — Managing how AI platforms allocate crawling resources across your site.
  • Schema Markup — Structured data that helps AI understand your content.

Want to understand how different AI agents access your site? Compass shows you which AI platforms actually discover and cite your content, and where robots.txt rules matter.