
Technical AEO Checklist: Robots.txt, Schema, Site Architecture

Actionable checklist for AI-readiness—configurations, code snippets, what's AI-specific vs. universal SEO, and how to verify.


Most "AI SEO" advice is just standard SEO. Some configurations are genuinely specific to how AI agents interact with your site. This checklist separates the two and provides actual configurations to implement.

This is not a full audit guide; it is a configuration reference. If you implement these items, your technical foundation will be compatible with current AI navigation patterns. Treat it as a checklist to verify readiness, not a narrative to read once and forget.

Robots.txt Configuration

Robots.txt is the first decision point for AI agents. It determines whether crawlers can discover your content for indexing and training.

The decision: Allow or block?

The recommendation is to allow access unless you have specific intellectual property protection needs, such as paywalled content or proprietary documentation.

Blocking AI crawlers removes your content from AI search visibility. For most businesses, preventing indexing on a platform where users are actively seeking your services is a negative trade-off.

Note the distinction: Blocking crawlers stops your content from being indexed or trained on. It does not necessarily prevent AI agents from visiting your site in real-time when a user asks them to. ChatGPT browsing and crawler indexing are different things.
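
Purely to illustrate that distinction (not a recommendation): OpenAI documents GPTBot as its crawler and ChatGPT-User as the agent that browses on a user's behalf, so a site opting out of crawling while remaining reachable for live browsing might look like the sketch below. Check each vendor's current documentation for exact agent names.

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /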

AI crawler user-agents to know about

You should permit the following user-agents if you want visibility across these platforms:

User-Agent       Crawler        Block?   Notes
GPTBot           OpenAI         No       Required for ChatGPT search visibility
ClaudeBot        Anthropic      No       Required for Claude visibility
PerplexityBot    Perplexity     No       Required for Perplexity visibility
CCBot            Common Crawl   No       Used for training data; block only for IP protection
Googlebot        Google         No       Blocks all Google indexing if blocked
Amazonbot        Amazon         No       Blocks Alexa results if blocked
anthropic-ai     Anthropic      No       Alternative Anthropic identifier

Blocking specific agents like GPTBot or ClaudeBot prevents discovery via those platforms' indexes.

Common robots.txt configurations

Recommended: Allow all

User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml

Allow all except specific paths (for paywalled content)

User-agent: *
Allow: /
Disallow: /members-only/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml

Allow all but throttle aggressive crawlers

User-agent: *
Allow: /
Crawl-delay: 5

User-agent: PerplexityBot
Crawl-delay: 10

Block AI crawlers (only if IP protection is critical)

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Verification

Check your file at yoursite.com/robots.txt. Use Google Search Console's robots.txt report to confirm it is fetched and parsed correctly. Ensure you haven't accidentally blocked standard crawlers while targeting AI agents.

Crawlability and Rendering

Content must be fetchable and renderable. Most AI agents fetch pages in real time and struggle to execute complex JavaScript during that fetch, so content that only appears after client-side rendering is often invisible to them.

Static HTML vs. JavaScript rendering

Test your site by disabling JavaScript in your browser. If critical content disappears—navigation menus, product descriptions, pricing information, or contact forms—you have a rendering issue. Basic AI crawlers cannot execute JavaScript to retrieve this information.

Recommended approach: SSR or Static Generation

Content should reside in HTML, not load via client-side JavaScript. For dynamic sites, use Server-Side Rendering (SSR) or Static Site Generation (SSG).

  • SSR: Content generated on the server and sent as HTML.
  • SSG: Pages pre-built at build time, served as static files.
  • Hybrid: Static for key pages (pricing, contact), dynamic for secondary content.

Frameworks handle this differently: Next.js uses getStaticProps or getServerSideProps; Vue uses Nuxt for SSR; SvelteKit defaults to SSR. Hugo defaults to static.
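
As a minimal sketch of the SSG idea (Next.js Pages Router, with a hypothetical pricing page and placeholder data), the plan list below is rendered into the HTML at build time, so crawlers that never execute JavaScript still see it:

// pages/pricing.js — pre-rendered at build time (SSG), so the plan list
// exists in the raw HTML that AI crawlers fetch.
export async function getStaticProps() {
  // Placeholder data; in practice this might come from a CMS or database.
  const plans = [
    { name: "Starter", price: "£49/month" },
    { name: "Pro", price: "£99/month" },
  ];
  return { props: { plans } };
}

export default function Pricing({ plans }) {
  return (
    <ul>
      {plans.map((plan) => (
        <li key={plan.name}>
          {plan.name}: {plan.price}
        </li>
      ))}
    </ul>
  );
}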

Testing: What AI agents see

Method 1: Screaming Frog. Crawl your site with rendering set to "JavaScript: Enabled" and compare the word count in the raw HTML against the rendered content. If the rendered content exceeds the raw HTML by 50% or more, you have JS rendering issues that will block AI discovery.

Method 2: Manual check. Ask an AI agent: "Visit [yoursite.com/key-page]. What content can you see?" Compare its answer to what you see in a browser. Major discrepancies indicate rendering failures.

What's actually critical

Not all JavaScript needs fixing. Only content that matters for discovery must be in HTML:

  • Navigation: Must be in HTML or rendered.
  • Product/service descriptions: Must be in HTML.
  • Pricing: Must be in HTML.
  • Contact information: Must be in HTML.

Nice-to-haves can remain JS-loaded: Analytics, chat widgets, and expandable "View more" sections.

Crawl delay and rate limiting

If crawler traffic is heavy, set reasonable rate limits to prevent server strain; multiple AI crawlers on top of standard bots can add up. Note that Crawl-delay and Request-rate are non-standard directives and not every crawler honours them (Googlebot, for example, ignores Crawl-delay).

User-agent: *
Crawl-delay: 5
Request-rate: 1/5

Schema Markup Essentials

Schema markup provides semantic context, reducing hallucination risk and helping AI extract structured information accurately.

Why schema matters for AI

Instead of parsing raw text like "£99/month", schema explicitly states "this is a price for a software product with a 30-day trial." This structure improves extraction accuracy.

Essential schema types for AI

High impact (implement these):

Organization schema (on homepage)

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://yoursite.com",
  "logo": "https://yoursite.com/logo.png",
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "Customer Service",
    "email": "hello@yoursite.com"
  }
}

Product schema (on product pages)

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Product Name",
  "description": "Product description",
  "url": "https://yoursite.com/products/xyz",
  "offers": {
    "@type": "Offer",
    "price": "99.99",
    "priceCurrency": "GBP"
  }
}

Article/BlogPosting schema (on blog posts)

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Title",
  "datePublished": "2026-03-22",
  "author": {
    "@type": "Person",
    "name": "Author Name"
  }
}

Medium impact:

  • LocalBusiness schema (if you have physical locations; see the sketch after this list)
  • BreadcrumbList schema (aids navigation understanding)

Lower impact:

  • Event schema (if you have events)
  • FAQ schema (helps with Q&A, but not critical)
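
For reference, a minimal LocalBusiness sketch (all values are placeholders):

{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Your Company",
  "url": "https://yoursite.com",
  "telephone": "+44 20 0000 0000",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "1 Example Street",
    "addressLocality": "London",
    "postalCode": "EC1A 1AA",
    "addressCountry": "GB"
  }
}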

Implementation and Testing

Use JSON-LD. Add a <script type="application/ld+json"> block in your page <head>. Do not use RDFa or Microdata; JSON-LD is the industry standard and most AI-friendly.
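
For example, the Organization markup above would be embedded like this (names and URLs are placeholders):

<head>
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your Company",
    "url": "https://yoursite.com"
  }
  </script>
</head>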

Validate at https://validator.schema.org/. Use the Google Rich Results Test to ensure search engines recognise the markup.

Common mistakes to avoid

  • Using outdated schema types.
  • Incorrect nesting of properties.
  • Missing required fields.
  • Invented ("hallucinated") schema types that are not part of the schema.org vocabulary.

Navigation and Site Architecture

Wayfinder's research across 3,348 navigation tasks indicates 91% of successful navigation completes in 2 clicks. AI agents click navigation elements 80% of the time, relying heavily on clear structural cues.

The two-click principle

Critical pages should be reachable within two clicks from anywhere on the site. If it takes three clicks, AI success rates drop to 28%.

Audit requirements:

  • Homepage → Pricing: 1 click max
  • Homepage → Contact: 1 click max
  • Homepage → Main product page: 1 click max
  • Product page → Pricing: 1 click max
  • Any page → Contact: 2 clicks max

Navigation element placement

AI agents click nav elements 80% of the time, versus only 20% for body content links.

  • Header nav: Best for primary navigation.
  • Footer nav: Visible but secondary.
  • Body copy links: Largely invisible to AI.
  • Sidebars: Effective.

Ensure critical links are in <nav> HTML elements or styled as navigation.
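
A minimal example of that pattern (labels and paths are placeholders):

<nav>
  <a href="/products">Products</a>
  <a href="/pricing">Pricing</a>
  <a href="/about">About</a>
  <a href="/contact">Contact</a>
</nav>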

Link text clarity

Use plain language. Enterprise and B2B sites using "solutions-speak" fail on hard tasks 30% of the time.

  • ✓ "Pricing"
  • ✗ "Investment Options"
  • ✓ "Contact"
  • ✗ "Get in Touch"
  • ✓ "Returns Policy"
  • ✗ "Satisfaction Guarantee"

Reduce link duplication

Multiple links pointing to the same page create confusion. Combine them. One clear link is better than multiple duplicate links with different labels.

Bad:

<a href="/pricing">Pricing</a>
<a href="/pricing">Our Plans</a>
<a href="/pricing">Cost</a>

Breadcrumb navigation

Helpful for AI and users. Shows hierarchy clearly.

<nav aria-label="breadcrumb">
  <ol>
    <li><a href="/">Home</a></li>
    <li><a href="/products">Products</a></li>
    <li><a href="/products/software">Software</a></li>
  </ol>
</nav>
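
The BreadcrumbList schema mentioned earlier can mirror the same hierarchy; a minimal sketch using the same placeholder paths:

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://yoursite.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Products",
      "item": "https://yoursite.com/products"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Software",
      "item": "https://yoursite.com/products/software"
    }
  ]
}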

Monitoring and Verification

Implementation is only the first step. You must verify that your configurations work as intended.

Automated tools

  • Compass (Wayfinder): Simulates AI navigation across your site to find breakdowns.
  • Screaming Frog: Check JS rendering and crawlability.
  • Google Search Console: Monitor indexing and crawl errors.
  • Google Rich Results Test: Validate schema.

Manual testing

Ask an AI agent: "Can you find my pricing page starting from my homepage? How many clicks did it take?" Disable JavaScript in your browser. What content disappears? Check yoursite.com/robots.txt. Are AI crawlers allowed?

Quarterly checks

Every 90 days, re-run a navigation audit (using Compass or manually). Check that robots.txt hasn't been accidentally changed and verify JavaScript rendering still works. Navigation depth is a continuous concern; depth regressions happen during site updates.

Monitoring for regressions

Watch for these common issues:

  • Crawlers blocked accidentally via misconfiguration.
  • JavaScript rendering breaks during site updates.
  • Navigation reorganised, pushing critical pages 3+ clicks deep.
  • Schema markup removed during CMS updates.

What's AI-Specific vs. Universal SEO

Clarifying what is genuinely new prevents wasted effort. Most "AI optimisation" is just good web practice.

Universal (helps Google, Bing, AND AI)

  • Robots.txt configuration: Allows discovery for all bots.
  • HTML-based content: Crawlable by all engines.
  • Clear site architecture: Good UX and bot navigation.
  • Schema markup: Helps all search/AI systems understand context.
  • Mobile-friendly design: Increasingly important for all search.

These are not "AI optimisation." They are just good web practice.

AI-specific (matters more for AI than Google)

  • Real-time JavaScript rendering: AI agents fetch live; Google indexes snapshots.
  • Navigation clarity and lack of jargon: AI relies on semantic matching more heavily.
  • Content freshness: Real-time AI search values current content more.
  • Two-click depth limits: Google crawls arbitrarily deep; AI success drops at 3+ clicks.

Not actually AI-specific (ignore these)

  • "Prompt optimisation": Trying to game AI responses is ineffective.
  • "AI-first content": Writing specifically for LLM parsing often makes content worse for humans.
  • "LLM-optimised HTML": Not a real technical standard.

If you see vendors claiming these are "AI optimisation," they are either confused or selling you something you do not need. Focus on technical accessibility and clarity instead.


Completed the checklist? Run a Compass audit to see how AI agents actually navigate your site. Find navigation failures in minutes.