How to conduct an AI accessibility audit of your site using free tools
The question isn't whether AI agents will visit your site. It's whether they'll find what they're looking for when they do.
AI agents are starting to do things on behalf of users. Not just answering questions — actually navigating websites, finding information, completing tasks. ChatGPT can browse. Claude can use computers. Perplexity synthesises from live pages. And this is just the beginning.
This guide shows you how to audit your site for AI agent discoverability — what they can see, what they can't, and what's blocking them. You can do this with free tools and a couple of hours. Or you can use dedicated tools like Compass to automate it. Either way, you'll understand what's actually happening when AI visits your site.
What you'll need:
- An AI assistant that can browse the web (Claude, ChatGPT with browsing, or Perplexity)
- Screaming Frog SEO Spider (the free version is fine for most sites)
- Chrome, for the manual JavaScript checks
- Optional: Python 3, for the automation scripts in the last section
Time estimate: 2-3 hours for a thorough audit
Before diving deep, do these quick checks to catch obvious issues.
Go to yoursite.com/robots.txt and actually read it.
What you're looking for:
```txt
# Good - allows AI agents
User-agent: *
Allow: /

# Bad - blocks everyone including AI
User-agent: *
Disallow: /

# Potentially problematic - blocks specific bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```
Many sites added blanket AI blocks in 2023-2024 when the "AI is stealing our content" panic hit. That made sense for publishers protecting copyrighted content. It makes less sense for businesses that want AI to help users find their pricing page.
Common AI-related user agents to check:
- GPTBot — OpenAI's crawler
- ClaudeBot — Anthropic's crawler (note: different from Claude actually browsing)
- PerplexityBot — Perplexity's crawler
- Amazonbot — Amazon's crawler (used by Alexa)
- anthropic-ai — Another Anthropic identifier
- CCBot — Common Crawl (used for training data)

Important distinction: Blocking these crawlers stops your content from being indexed and used for training. It doesn't necessarily stop AI agents from visiting your site in real time when a user asks them to. ChatGPT browsing and GPTBot crawling are different things.
Action: If you're blocking AI agents and don't have a good reason, consider removing those rules.
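If you'd rather script this quick check than eyeball the file, a grep over robots.txt flags any mention of the common AI user agents (yoursite.com is a placeholder):

```shell
# Fetch robots.txt and surface any rules naming AI crawlers.
# No output means none are mentioned explicitly — they fall under
# whatever your "User-agent: *" rules say.
curl -s "https://yoursite.com/robots.txt" \
  | grep -i -E "GPTBot|ClaudeBot|PerplexityBot|Amazonbot|anthropic-ai|CCBot"
```

Note this only shows lines that name a bot; you still need to read the surrounding Allow/Disallow rules to know what they mean.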
Open Claude, ChatGPT (with browsing), or Perplexity and try this:
Visit [yoursite.com] and tell me:
1. What is this company/site about?
2. What are the main navigation options you can see?
3. Can you find a pricing page? What URL is it at?
4. Can you find contact information?
Don't make anything up - only report what you can actually see on the page.
What you're looking for:
- An accurate description of what your company or site does
- Navigation options that match your actual menu
- Real URLs for pricing and contact, not guessed ones

Red flags:
- The AI says it can't access the page at all
- It describes a generic or outdated version of your business
- It invents plausible-sounding URLs that don't exist
Why I use Claude for this: Claude tends to be more transparent about what it can and can't see. ChatGPT sometimes... improvises. When I ask Claude "what can you see on this page?" I trust the answer more. Your mileage may vary.
For your 3-5 most important pages (homepage, pricing, contact, main product page), ask the AI:
Visit [specific-url] and tell me exactly what content you can see.
List the main headings and key information on this page.
Don't summarise - I want to know what's actually visible to you.
Compare the response to what you see in a browser. Major discrepancies indicate content that's invisible to AI agents — usually JavaScript-rendered content.
Now let's get systematic with Screaming Frog.
The goal is to see what a basic bot sees (HTML only) versus what a JavaScript-capable bot sees (rendered DOM).
Configuration:
1. Go to Configuration → Spider → Rendering and set the rendering mode to "JavaScript"
2. Under Configuration → Spider → Extraction, enable "Store HTML" and "Store Rendered HTML"
3. Crawl your site (the free version covers up to 500 URLs, enough for most audits)

This lets you compare the raw HTML response with the JavaScript-rendered version.
Once the crawl completes:
1. Select a page and open the "View Source" pane
2. Compare the raw HTML with the rendered HTML side by side
3. Note the word counts for each version of your key pages

What you're looking for:
Pages where the rendered DOM has significantly more content than the raw HTML are JavaScript-dependent. These are your risk areas.
Example findings:
| Page | HTML Word Count | Rendered Word Count | Risk |
|---|---|---|---|
| /pricing | 150 | 1,200 | HIGH — pricing loads via JS |
| /about | 800 | 850 | LOW — mostly static |
| /products | 200 | 2,500 | HIGH — product grid is JS |
| /contact | 400 | 420 | LOW — mostly static |
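If you export both word counts, a rough heuristic can sort pages into risk buckets. This is a sketch — the ratio thresholds are arbitrary starting points, not industry standards, so tune them to your site:

```python
def js_risk(html_words: int, rendered_words: int) -> str:
    """Flag pages where JavaScript rendering adds most of the content.

    Thresholds are illustrative assumptions, not standards.
    """
    if html_words == 0:
        return "HIGH"  # nothing in the raw HTML at all
    ratio = rendered_words / html_words
    if ratio > 2.0:    # rendering more than doubles the content
        return "HIGH"
    if ratio > 1.3:    # rendering adds a meaningful chunk
        return "MEDIUM"
    return "LOW"

# The example findings from the table above
pages = {
    "/pricing": (150, 1200),
    "/about": (800, 850),
    "/products": (200, 2500),
    "/contact": (400, 420),
}
for path, (html_wc, rendered_wc) in pages.items():
    print(f"{path}: {js_risk(html_wc, rendered_wc)}")
```

Running this on the example data flags /pricing and /products as HIGH, matching the table.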
For high-risk pages, do a manual check:
In Chrome:
1. Open DevTools (F12, or Cmd+Option+I on Mac)
2. Open the command menu (Cmd/Ctrl+Shift+P) and run "Disable JavaScript"
3. Reload the page

What disappears? Navigation menus? Product information? Pricing? Contact forms? Anything critical that disappears is invisible to basic bots.
Quick command-line alternative:
```shell
# Fetch raw HTML (what basic bots see)
curl -s "https://yoursite.com/pricing" | grep -i "price\|£\|\$\|cost"
# If this returns nothing but your page has pricing, it's JS-rendered
```
This is where it gets interesting. Instead of just checking if pages exist, we test whether AI can actually find them by navigating your site.
Pick 5-10 tasks that matter for your business. Think about what a user might ask an AI to do on your behalf.
Template:
| Task ID | Task Description | Target Content | Success Criteria |
|---|---|---|---|
| T1 | Find pricing information | /pricing or pricing section | Can state specific prices or plans |
| T2 | Find contact email | /contact or footer | Can provide actual email address |
| T3 | Locate returns policy | /returns or /policies | Can summarise return terms |
| T4 | Find product X | /products/x | Can describe the specific product |
| T5 | Book a demo | /demo or /contact | Can find the booking mechanism |
Good task characteristics:
- Specific enough to verify — an exact price, email address, or policy term, not "learn about us"
- Representative of what real users actually ask an AI to do
- Achievable from the homepage within the 5-click budget
- Tied to a business cost if it fails: a lost lead, a lost sale, a support ticket
For each task, use this prompt template to get consistent, traceable results:
```
I want you to help me audit how well an AI agent can navigate my website.

TASK: [Your task description, e.g., "Find the pricing information"]
STARTING POINT: [Your homepage URL]

INSTRUCTIONS:
1. Start at the homepage
2. Look at the available navigation options (links, menus, buttons)
3. Choose the option most likely to lead to the task goal
4. Tell me which link you're clicking and why
5. Repeat until you either find the information or give up
6. Maximum 5 clicks

For each step, report:
- Current page URL
- Links/options you considered
- Which one you chose and why
- What you found

If you find the information, quote the relevant content.
If you can't find it after 5 steps, explain what went wrong.
Be honest about what you can and cannot see. Don't make up URLs or content.
```
For each task, record:
```markdown
## Task: [Description]

### Attempt 1
**Started:** [Homepage URL]
**Steps:**
1. [URL] → Clicked "[Link text]" because [reason]
2. [URL] → Clicked "[Link text]" because [reason]
3. [URL] → [Found it / Continued / Got stuck]

**Result:** SUCCESS / FAILURE
**If failed, why:** [Blocked / Couldn't find link / Dead end / Content not visible]
**Notes:** [Anything interesting]
```
After running all tasks, you'll have data like:
| Task | Result | Steps | Failure Reason |
|---|---|---|---|
| Find pricing | SUCCESS | 2 | - |
| Find contact | SUCCESS | 1 | - |
| Find returns policy | FAILURE | 5 | Buried in footer dropdown |
| Find product X | FAILURE | 3 | JS-rendered product grid |
| Book demo | SUCCESS | 2 | - |
Common failure patterns:
- **Blocked:** The AI couldn't access the site or specific pages
- **Navigation failure:** The AI couldn't find a path to the content
- **Depth failure:** The AI found a path, but it took too many steps
- **Content invisibility:** The AI reached the page but couldn't see the content
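Once every task is logged, a few lines of Python turn the log into a scorecard. The task names and failure labels below are illustrative — substitute your own results:

```python
from collections import Counter

# Illustrative audit log — replace with your own task results
results = [
    {"task": "Find pricing", "success": True, "steps": 2, "failure": None},
    {"task": "Find contact", "success": True, "steps": 1, "failure": None},
    {"task": "Find returns policy", "success": False, "steps": 5, "failure": "Navigation failure"},
    {"task": "Find product X", "success": False, "steps": 3, "failure": "Content invisibility"},
    {"task": "Book demo", "success": True, "steps": 2, "failure": None},
]

successes = [r for r in results if r["success"]]
rate = len(successes) / len(results)
avg_steps = sum(r["steps"] for r in successes) / len(successes)
failures = Counter(r["failure"] for r in results if not r["success"])

print(f"Success rate: {rate:.0%}")
print(f"Avg steps to success: {avg_steps:.1f}")
for reason, count in failures.most_common():
    print(f"  {reason}: {count}")
```

The failure-type tally is the most useful output: one "Blocked" failure is a robots.txt or firewall fix, while repeated "Content invisibility" failures point at your JavaScript rendering.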
If you're comfortable with basic Python, these scripts help automate some checks.
Important: Run these from your own machine, not a cloud server. Many sites block requests from cloud provider IP ranges, and you'll get 403 errors that don't reflect what real users (or AI agents browsing on behalf of users) would see.
Check multiple competitors or your own subdomains:
```python
import httpx

sites = [
    "example.com",
    "competitor1.com",
    "competitor2.com",
]

ai_bots = ["GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot", "anthropic-ai"]

def check_robots(domain):
    try:
        headers = {"User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"}
        r = httpx.get(f"https://{domain}/robots.txt", headers=headers,
                      timeout=10, follow_redirects=True)
        if r.status_code == 200:
            content = r.text.lower()
            blocked = []
            for bot in ai_bots:
                # Simple presence check — flags any bot named in robots.txt.
                # It doesn't parse the rules, so a bot with an explicit Allow
                # is flagged too; read the file to confirm what the rule does.
                if bot.lower() in content:
                    blocked.append(bot)
            return blocked if blocked else ["None blocked"]
        else:
            return [f"No robots.txt ({r.status_code})"]
    except Exception as e:
        return [f"Error: {e}"]

print("AI Bot Blocking Report")
print("=" * 50)
for site in sites:
    blocked = check_robots(site)
    print(f"{site}: {', '.join(blocked)}")
```
Compare what's in the raw HTML vs what a human sees:
```python
import httpx
from bs4 import BeautifulSoup

def compare_content(url):
    """
    Fetches a URL and reports on HTML content.

    Note: This only sees raw HTML, not JS-rendered content.
    For full comparison, you'd need a headless browser.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"
    }
    r = httpx.get(url, headers=headers, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, 'html.parser')

    # Remove script and style elements
    for element in soup(['script', 'style', 'noscript']):
        element.decompose()

    # Get text content
    text = soup.get_text(separator=' ', strip=True)
    word_count = len(text.split())

    # Get all links
    links = soup.find_all('a', href=True)
    nav_links = [a.get_text(strip=True) for a in links if a.get_text(strip=True)]

    # Look for common important elements
    has_pricing_words = any(word in text.lower() for word in ['price', 'pricing', '£', '$', 'cost', 'plan'])
    has_contact = any(word in text.lower() for word in ['contact', 'email', 'phone', 'call us'])

    return {
        'url': url,
        'word_count': word_count,
        'link_count': len(links),
        'nav_sample': nav_links[:10],
        'has_pricing_words': has_pricing_words,
        'has_contact_words': has_contact,
    }

# Example usage
result = compare_content("https://yoursite.com")
print(f"URL: {result['url']}")
print(f"Word count (HTML only): {result['word_count']}")
print(f"Links found: {result['link_count']}")
print(f"Sample navigation: {result['nav_sample']}")
print(f"Contains pricing language: {result['has_pricing_words']}")
print(f"Contains contact language: {result['has_contact_words']}")
```
Extract and analyse navigation structure:
```python
import httpx
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def extract_navigation(url):
    """
    Extracts navigation links and categorises them by location.
    """
    r = httpx.get(url, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, 'html.parser')
    base_domain = urlparse(url).netloc

    def collect_links(container):
        """Internal links inside a container, resolved to absolute URLs."""
        found = []
        if container is None:
            return found
        for a in container.find_all('a', href=True):
            href = urljoin(url, a['href'])
            if urlparse(href).netloc == base_domain:
                found.append({'text': a.get_text(strip=True), 'href': href})
        return found

    # Fall back to class-name matching when there's no semantic element.
    # A function passed to class_ is called with each CSS class (may be None).
    header = soup.find('header') or soup.find(class_=lambda c: c and 'header' in c.lower())
    footer = soup.find('footer') or soup.find(class_=lambda c: c and 'footer' in c.lower())

    return {
        'header': collect_links(header),
        'main_nav': collect_links(soup.find('nav')),
        'footer': collect_links(footer),
        'sidebar': [],
        'other': [],
    }

# Example usage
nav = extract_navigation("https://yoursite.com")
print("Header links:", [l['text'] for l in nav['header']])
print("Main nav links:", [l['text'] for l in nav['main_nav']])
print("Footer links:", [l['text'] for l in nav['footer'][:10]], "...")  # First 10
```
Run this audit quarterly, or after major site changes. What works today might break tomorrow when you redesign navigation or add new JavaScript frameworks.
Everything in this guide takes 2-3 hours to do manually. Compass does it in 90 seconds.
We built Compass because we got tired of doing this manually for clients. It runs task-based audits, shows you exactly where AI agents get stuck, classifies failures by type, and tells you what to fix.
If you've read this far, you clearly care about AI discoverability. Try Compass and see how your site scores.