Wayfinder AI

How to Audit Your Site for AI Agents (The DIY Guide)

How to conduct an AI accessibility audit of your site using free tools

February 16, 2026 · Shaun Myandee · 15 min (~2 hours to actually do it) read
audit · AI accessibility · AEO · SEO


Introduction

AI agents are starting to do things on behalf of users. Not just answering questions — actually navigating websites, finding information, completing tasks. ChatGPT can browse. Claude can use computers. Perplexity synthesises from live pages. And this is just the beginning.

The question isn't whether AI agents will visit your site. It's whether they'll find what they're looking for when they do.

This guide shows you how to audit your site for AI agent discoverability — what they can see, what they can't, and what's blocking them. You can do this with free tools and a couple of hours. Or you can use dedicated tools like Compass to automate it. Either way, you'll understand what's actually happening when AI visits your site.

What you'll need:

  • Screaming Frog (free version works)
  • Access to Claude, ChatGPT, or Perplexity
  • A text editor
  • Optionally: Python 3.x and basic command line comfort

Time estimate: 2-3 hours for a thorough audit


Part 1: The Quick Checks (30 minutes)

Before diving deep, do these quick checks to catch obvious issues.

1.1 Read your robots.txt

Go to yoursite.com/robots.txt and actually read it.

What you're looking for:

# Good - allows AI agents
User-agent: *
Allow: /

# Bad - blocks everyone including AI
User-agent: *
Disallow: /

# Potentially problematic - blocks specific bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

Many sites added blanket AI blocks in 2023-2024 when the "AI is stealing our content" panic hit. That made sense for publishers protecting copyrighted content. It makes less sense for businesses that want AI to help users find their pricing page.

Common AI-related user agents to check:

  • GPTBot — OpenAI's crawler
  • ClaudeBot — Anthropic's crawler (note: different from Claude actually browsing)
  • PerplexityBot — Perplexity's crawler
  • Amazonbot — Amazon's crawler (used by Alexa)
  • anthropic-ai — Another Anthropic identifier
  • CCBot — Common Crawl (used for training data)

Important distinction: Blocking these crawlers stops your content from being indexed/trained on. It doesn't necessarily stop AI agents from visiting your site in real-time when a user asks them to. ChatGPT browsing and ClaudeBot crawling are different things.

Action: If you're blocking AI agents and don't have a good reason, consider removing those rules.
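If you'd rather check this programmatically, Python's built-in urllib.robotparser answers "would this bot be allowed?" without you eyeballing the rules. A minimal sketch against an inline example (swap in the contents of your own robots.txt):

```python
from urllib import robotparser

# Example robots.txt: blocks GPTBot, allows everyone else
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(sample_robots.splitlines())

for bot in ["GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot", "CCBot"]:
    verdict = "allowed" if rp.can_fetch(bot, "/") else "BLOCKED"
    print(f"{bot}: {verdict}")
# GPTBot: BLOCKED; everything else: allowed
```

Because this parses the actual rules rather than just searching for bot names, it handles cases like a `User-agent: GPTBot` group that contains `Allow:` lines.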

1.2 Ask an AI to visit your homepage

Open Claude, ChatGPT (with browsing), or Perplexity and try this:

Visit [yoursite.com] and tell me:
1. What is this company/site about?
2. What are the main navigation options you can see?
3. Can you find a pricing page? What URL is it at?
4. Can you find contact information?

Don't make anything up - only report what you can actually see on the page.

What you're looking for:

  • Can it access the site at all?
  • Does it understand what you do?
  • Can it identify main navigation?
  • Does it find key pages?

Red flags:

  • "I can't access that site" — check robots.txt or other blocking
  • Completely wrong description — your homepage isn't clear
  • Can't find obvious things — navigation issues
  • Hallucinates pages that don't exist — this is the AI's problem, not yours, but note it

Why I use Claude for this: Claude tends to be more transparent about what it can and can't see. ChatGPT sometimes... improvises. When I ask Claude "what can you see on this page?" I trust the answer more. Your mileage may vary.

1.3 Check key pages directly

For your 3-5 most important pages (homepage, pricing, contact, main product page), ask the AI:

Visit [specific-url] and tell me exactly what content you can see.
List the main headings and key information on this page.
Don't summarise - I want to know what's actually visible to you.

Compare the response to what you see in a browser. Major discrepancies indicate content that's invisible to AI agents — usually JavaScript-rendered content.
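To make that comparison concrete, you can extract the headings from the raw HTML yourself and set them next to what the AI reported. A minimal sketch using BeautifulSoup (the `extract_headings` helper and the sample page are illustrative):

```python
from bs4 import BeautifulSoup

def extract_headings(html):
    """h1-h3 headings in the raw HTML - roughly what a basic bot sees."""
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])]

# Illustrative page: the pricing table only exists after JavaScript runs
sample_html = """
<h1>Acme Widgets</h1>
<h2>Why Acme?</h2>
<div id="pricing-app"><!-- pricing table rendered client-side --></div>
"""
print(extract_headings(sample_html))  # ['Acme Widgets', 'Why Acme?']
```

If the AI lists headings that aren't in this output, it's improvising; if headings you see in the browser are missing from it, they're probably JS-rendered.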


Part 2: The Crawl Audit (30-60 minutes)

Now let's get systematic with Screaming Frog.

2.1 Set up Screaming Frog for JS comparison

The goal is to see what a basic bot sees (HTML only) versus what a JavaScript-capable bot sees (rendered DOM).

Configuration:

  1. Open Screaming Frog
  2. Go to Configuration → Spider → Rendering
  3. Set to "JavaScript" rendering
  4. Go to Configuration → Spider → Advanced
  5. Check "Store HTML" and "Store Rendered HTML"

This lets you compare the raw HTML response with the JavaScript-rendered version.

2.2 Run the crawl

  1. Enter your site URL
  2. Click Start
  3. Let it run (for a small site, a few minutes; larger sites, longer)

2.3 Identify JS-heavy pages

Once the crawl completes:

  1. Go to the "Response Codes" tab, filter for 2xx (successful responses)
  2. Export to a spreadsheet
  3. Look at the "Word Count" column (HTML) vs rendered content

What you're looking for:

Pages where the rendered DOM has significantly more content than the raw HTML are JavaScript-dependent. These are your risk areas.

Example findings:

| Page | HTML Word Count | Rendered Word Count | Risk |
|------|-----------------|---------------------|------|
| /pricing | 150 | 1,200 | HIGH — pricing loads via JS |
| /about | 800 | 850 | LOW — mostly static |
| /products | 200 | 2,500 | HIGH — product grid is JS |
| /contact | 400 | 420 | LOW — mostly static |
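Once you've exported the counts, flagging the risky pages is one line of arithmetic. A sketch using the example numbers above (the 2x threshold is a rule of thumb, not a standard):

```python
# (page, html_word_count, rendered_word_count) from the Screaming Frog export
pages = [
    ("/pricing", 150, 1200),
    ("/about", 800, 850),
    ("/products", 200, 2500),
    ("/contact", 400, 420),
]

def js_risk(html_wc, rendered_wc):
    # HIGH if rendering more than doubles the visible words (rule of thumb)
    return "HIGH" if rendered_wc > 2 * html_wc else "LOW"

for page, html_wc, rendered_wc in pages:
    print(f"{page}: {js_risk(html_wc, rendered_wc)} ({html_wc} -> {rendered_wc} words)")
```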

2.4 Manual JS comparison for critical pages

For high-risk pages, do a manual check:

In Chrome:

  1. Open the page normally, note what you see
  2. Open Developer Tools (F12)
  3. Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows) to open the command palette
  4. Type "Disable JavaScript" and select it
  5. Refresh the page
  6. Compare what you see now vs before

What disappears? Navigation menus? Product information? Pricing? Contact forms? Anything critical that disappears is invisible to basic bots.

Quick command-line alternative:

# Fetch raw HTML (what basic bots see)
curl -s "https://yoursite.com/pricing" | grep -iE 'price|£|\$|cost'

# If this returns nothing but your page has pricing, it's JS-rendered

Part 3: The Task-Based Audit (60+ minutes)

This is where it gets interesting. Instead of just checking if pages exist, we test whether AI can actually find them by navigating your site.

3.1 Define your tasks

Pick 5-10 tasks that matter for your business. Think about what a user might ask an AI to do on your behalf.

Template:

| Task ID | Task Description | Target Content | Success Criteria |
|---------|------------------|----------------|------------------|
| T1 | Find pricing information | /pricing or pricing section | Can state specific prices or plans |
| T2 | Find contact email | /contact or footer | Can provide actual email address |
| T3 | Locate returns policy | /returns or /policies | Can summarise return terms |
| T4 | Find product X | /products/x | Can describe the specific product |
| T5 | Book a demo | /demo or /contact | Can find the booking mechanism |

Good task characteristics:

  • Specific enough to verify success/failure
  • Represents real user intent
  • Has a clear target page or content

3.2 The structured prompt

For each task, use this prompt template to get consistent, traceable results:

I want you to help me audit how well an AI agent can navigate my website.

TASK: [Your task description, e.g., "Find the pricing information"]

STARTING POINT: [Your homepage URL]

INSTRUCTIONS:
1. Start at the homepage
2. Look at the available navigation options (links, menus, buttons)
3. Choose the option most likely to lead to the task goal
4. Tell me which link you're clicking and why
5. Repeat until you either find the information or give up
6. Maximum 5 clicks

For each step, report:
- Current page URL
- Links/options you considered
- Which one you chose and why
- What you found

If you find the information, quote the relevant content.
If you can't find it after 5 steps, explain what went wrong.

Be honest about what you can and cannot see. Don't make up URLs or content.

3.3 Recording template

For each task, record:

## Task: [Description]

### Attempt 1
**Started:** [Homepage URL]
**Steps:**
1. [URL] → Clicked "[Link text]" because [reason]
2. [URL] → Clicked "[Link text]" because [reason]
3. [URL] → [Found it / Continued / Got stuck]

**Result:** SUCCESS / FAILURE
**If failed, why:** [Blocked / Couldn't find link / Dead end / Content not visible]
**Notes:** [Anything interesting]

3.4 Analysing your results

After running all tasks, you'll have data like:

| Task | Result | Steps | Failure Reason |
|------|--------|-------|----------------|
| Find pricing | SUCCESS | 2 | - |
| Find contact | SUCCESS | 1 | - |
| Find returns policy | FAILURE | 5 | Buried in footer dropdown |
| Find product X | FAILURE | 3 | JS-rendered product grid |
| Book demo | SUCCESS | 2 | - |

Common failure patterns:

Blocked: The AI couldn't access the site or specific pages

  • Fix: Check robots.txt, authentication requirements, geo-blocking

Navigation failure: The AI couldn't find a path to the content

  • Fix: Make important content more prominent, use clearer link labels

Depth failure: The AI found a path but it took too many steps

  • Fix: Reduce clicks required, add shortcuts from homepage

Content invisibility: The AI reached the page but couldn't see the content

  • Fix: Ensure critical content doesn't require JavaScript
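If you run more than a handful of tasks, a quick tally surfaces the dominant failure mode. A sketch using the example results above, with each failure mapped to one of the categories just listed (the mapping is illustrative):

```python
from collections import Counter

# (task, result, failure_category) from the recording template
results = [
    ("Find pricing", "SUCCESS", None),
    ("Find contact", "SUCCESS", None),
    ("Find returns policy", "FAILURE", "Navigation failure"),
    ("Find product X", "FAILURE", "Content invisibility"),
    ("Book demo", "SUCCESS", None),
]

successes = sum(1 for _, result, _ in results if result == "SUCCESS")
print(f"Success rate: {successes}/{len(results)}")

# Count failures by category so you can fix the most common one first
failures = Counter(reason for _, result, reason in results if result == "FAILURE")
for reason, count in failures.most_common():
    print(f"{reason}: {count}")
```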

Part 4: Simple Scripts for Deeper Analysis

If you're comfortable with basic Python, these scripts help automate some checks.

Important: Run these from your own machine, not a cloud server. Many sites block requests from cloud provider IP ranges, and you'll get 403 errors that don't reflect what real users (or AI agents browsing on behalf of users) would see.

4.1 Batch robots.txt checker

Check multiple competitors or your own subdomains:

import httpx

sites = [
    "example.com",
    "competitor1.com",
    "competitor2.com",
]

ai_bots = ["GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot", "anthropic-ai"]

def check_robots(domain):
    try:
        headers = {"User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"}
        r = httpx.get(f"https://{domain}/robots.txt", headers=headers, timeout=10, follow_redirects=True)
        if r.status_code == 200:
            content = r.text.lower()
            # Naive check: flags any AI bot mentioned by name. A mention
            # almost always means a Disallow rule, but this doesn't parse
            # the rules - verify by eye before drawing conclusions.
            mentioned = [bot for bot in ai_bots if bot.lower() in content]
            return mentioned if mentioned else ["No AI bots mentioned"]
        else:
            return [f"No robots.txt ({r.status_code})"]
    except Exception as e:
        return [f"Error: {e}"]

print("AI Bot Mentions Report")
print("=" * 50)
for site in sites:
    mentioned = check_robots(site)
    print(f"{site}: {', '.join(mentioned)}")

4.2 JS vs HTML content comparison

Compare what's in the raw HTML vs what a human sees:

import httpx
from bs4 import BeautifulSoup

def compare_content(url):
    """
    Fetches a URL and reports on HTML content.
    Note: This only sees raw HTML, not JS-rendered content.
    For full comparison, you'd need a headless browser.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"
    }

    r = httpx.get(url, headers=headers, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, 'html.parser')

    # Remove script and style elements
    for element in soup(['script', 'style', 'noscript']):
        element.decompose()

    # Get text content
    text = soup.get_text(separator=' ', strip=True)
    word_count = len(text.split())

    # Get all links
    links = soup.find_all('a', href=True)
    nav_links = [a.get_text(strip=True) for a in links if a.get_text(strip=True)]

    # Look for common important elements
    has_pricing_words = any(word in text.lower() for word in ['price', 'pricing', '£', '$', 'cost', 'plan'])
    has_contact = any(word in text.lower() for word in ['contact', 'email', 'phone', 'call us'])

    return {
        'url': url,
        'word_count': word_count,
        'link_count': len(links),
        'nav_sample': nav_links[:10],
        'has_pricing_words': has_pricing_words,
        'has_contact_words': has_contact,
    }

# Example usage
result = compare_content("https://yoursite.com")
print(f"URL: {result['url']}")
print(f"Word count (HTML only): {result['word_count']}")
print(f"Links found: {result['link_count']}")
print(f"Sample navigation: {result['nav_sample']}")
print(f"Contains pricing language: {result['has_pricing_words']}")
print(f"Contains contact language: {result['has_contact_words']}")

4.3 Navigation link extractor

Extract and analyse navigation structure:

import httpx
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def extract_navigation(url):
    """
    Extracts internal links and categorises them by page region.
    """
    headers = {"User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"}
    r = httpx.get(url, headers=headers, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, 'html.parser')
    base_domain = urlparse(url).netloc

    def collect_internal_links(container):
        """Same-domain links inside a container element (or none)."""
        links = []
        if container is None:
            return links
        for a in container.find_all('a', href=True):
            href = urljoin(url, a['href'])
            if urlparse(href).netloc == base_domain:
                links.append({'text': a.get_text(strip=True), 'href': href})
        return links

    def find_region(name):
        """Prefer the semantic element; fall back to a class containing the name."""
        return soup.find(name) or soup.find(class_=lambda x: x and name in x.lower())

    return {
        'header': collect_internal_links(find_region('header')),
        'main_nav': collect_internal_links(soup.find('nav')),
        'footer': collect_internal_links(find_region('footer')),
    }

# Example usage
nav = extract_navigation("https://yoursite.com")
print("Header links:", [l['text'] for l in nav['header']])
print("Main nav links:", [l['text'] for l in nav['main_nav']])
print("Footer links:", [l['text'] for l in nav['footer'][:10]], "...")  # First 10

What To Do With Your Results

Quick wins (fix this week)

  1. robots.txt blocking AI agents without good reason — Remove the blocks
  2. Critical content only visible with JavaScript — Add static fallbacks or server-side rendering
  3. Important pages buried 4+ clicks deep — Add direct links from homepage or main nav

Medium-term improvements

  1. Unclear navigation labels — "Solutions" → "Products", "Resources" → "Help Docs"
  2. Missing pages — If AI can't find your pricing because it doesn't exist, create it
  3. Confusing site structure — Reorganise so related content is near each other

Ongoing monitoring

Run this audit quarterly, or after major site changes. What works today might break tomorrow when you redesign navigation or add new JavaScript frameworks.


Or Just Use Compass

Everything in this guide takes 2-3 hours to do manually. Compass does it in 90 seconds.

We built Compass because we got tired of doing this manually for clients. It runs task-based audits, shows you exactly where AI agents get stuck, classifies failures by type, and tells you what to fix.

If you've read this far, you clearly care about AI discoverability. Try Compass and see how your site scores.


Resources

  • Screaming Frog SEO Spider — Free for up to 500 URLs
  • robots.txt documentation
  • Our guide on how AI agents navigate — Deeper dive on the concepts