How to conduct an AI accessibility audit of your site using free tools
The question isn't whether AI agents will visit your site. It's whether they'll find what they're looking for when they do.
AI agents are starting to do things on behalf of users. Not just answering questions — actually navigating websites, finding information, completing tasks. ChatGPT can browse. Claude can use computers. Perplexity synthesises from live pages. And this is just the beginning.
This guide shows you how to audit your site for AI agent discoverability — what they can see, what they can't, and what's blocking them. You can do this with free tools and a couple of hours. Or you can use dedicated tools like Compass to automate it. Either way, you'll understand what's actually happening when AI visits your site.
What you'll need:
- An AI assistant that can browse the web (Claude, ChatGPT with browsing, or Perplexity)
- Screaming Frog SEO Spider (the free version is fine for most sites)
- Chrome, for the manual JavaScript checks
- Optional: Python 3, for the automation scripts in the last section
Time estimate: 2-3 hours for a thorough audit
Before diving deep, do these quick checks to catch obvious issues.
Go to yoursite.com/robots.txt and actually read it.
What you're looking for:
```txt
# Good - allows AI agents
User-agent: *
Allow: /

# Bad - blocks everyone including AI
User-agent: *
Disallow: /

# Potentially problematic - blocks specific bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```
Many sites added blanket AI blocks in 2023-2024 when the "AI is stealing our content" panic hit. That made sense for publishers protecting copyrighted content. It makes less sense for businesses that want AI to help users find their pricing page.
Common AI-related user agents to check:
- GPTBot — OpenAI's crawler
- ClaudeBot — Anthropic's crawler (note: different from Claude actually browsing)
- PerplexityBot — Perplexity's crawler
- Amazonbot — Amazon's crawler (used by Alexa)
- anthropic-ai — Another Anthropic identifier
- CCBot — Common Crawl (used for training data)

Important distinction: Blocking these crawlers stops your content from being indexed and used for training. It doesn't necessarily stop AI agents from visiting your site in real time when a user asks them to. ChatGPT browsing and GPTBot crawling are different things.
Action: If you're blocking AI agents and don't have a good reason, consider removing those rules.
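If you'd rather script this quick check than eyeball the file, a grep over robots.txt flags any mention of the common AI user agents (yoursite.com is a placeholder):

```shell
# Fetch robots.txt and surface any rules naming AI crawlers.
# No output means none are mentioned explicitly — they fall under
# whatever your "User-agent: *" rules say.
curl -s "https://yoursite.com/robots.txt" \
  | grep -i -E "GPTBot|ClaudeBot|PerplexityBot|Amazonbot|anthropic-ai|CCBot"
```

Note this only shows lines that name a bot; you still need to read the surrounding Allow/Disallow rules to know what they mean.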
Open Claude, ChatGPT (with browsing), or Perplexity and try this:
Visit [yoursite.com] and tell me:
1. What is this company/site about?
2. What are the main navigation options you can see?
3. Can you find a pricing page? What URL is it at?
4. Can you find contact information?
Don't make anything up - only report what you can actually see on the page.
What you're looking for:
- An accurate description of what your company or site does
- Navigation options that match your actual menu
- Real URLs for pricing and contact, not guessed ones

Red flags:
- The AI says it can't access the page at all
- It describes a generic or outdated version of your business
- It invents plausible-sounding URLs that don't exist
Why I use Claude for this: Claude tends to be more transparent about what it can and can't see. ChatGPT sometimes... improvises. When I ask Claude "what can you see on this page?" I trust the answer more. Your mileage may vary.
For your 3-5 most important pages (homepage, pricing, contact, main product page), ask the AI:
Visit [specific-url] and tell me exactly what content you can see.
List the main headings and key information on this page.
Don't summarise - I want to know what's actually visible to you.
Compare the response to what you see in a browser. Major discrepancies indicate content that's invisible to AI agents — usually JavaScript-rendered content.
Now let's get systematic with Screaming Frog.
The goal is to see what a basic bot sees (HTML only) versus what a JavaScript-capable bot sees (rendered DOM).
Configuration:
1. Go to Configuration → Spider → Rendering and set the rendering mode to "JavaScript"
2. Under Configuration → Spider → Extraction, enable "Store HTML" and "Store Rendered HTML"
3. Crawl your site (the free version covers up to 500 URLs, enough for most audits)

This lets you compare the raw HTML response with the JavaScript-rendered version.
Once the crawl completes:
1. Select a page and open the "View Source" pane
2. Compare the raw HTML with the rendered HTML side by side
3. Note the word counts for each version of your key pages

What you're looking for:
Pages where the rendered DOM has significantly more content than the raw HTML are JavaScript-dependent. These are your risk areas.
Example findings:
| Page | HTML Word Count | Rendered Word Count | Risk |
|---|---|---|---|
| /pricing | 150 | 1,200 | HIGH — pricing loads via JS |
| /about | 800 | 850 | LOW — mostly static |
| /products | 200 | 2,500 | HIGH — product grid is JS |
| /contact | 400 | 420 | LOW — mostly static |
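If you export both word counts, a rough heuristic can sort pages into risk buckets. This is a sketch — the ratio thresholds are arbitrary starting points, not industry standards, so tune them to your site:

```python
def js_risk(html_words: int, rendered_words: int) -> str:
    """Flag pages where JavaScript rendering adds most of the content.

    Thresholds are illustrative assumptions, not standards.
    """
    if html_words == 0:
        return "HIGH"  # nothing in the raw HTML at all
    ratio = rendered_words / html_words
    if ratio > 2.0:    # rendering more than doubles the content
        return "HIGH"
    if ratio > 1.3:    # rendering adds a meaningful chunk
        return "MEDIUM"
    return "LOW"

# The example findings from the table above
pages = {
    "/pricing": (150, 1200),
    "/about": (800, 850),
    "/products": (200, 2500),
    "/contact": (400, 420),
}
for path, (html_wc, rendered_wc) in pages.items():
    print(f"{path}: {js_risk(html_wc, rendered_wc)}")
```

Running this on the example data flags /pricing and /products as HIGH, matching the table.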
For high-risk pages, do a manual check:
In Chrome:
1. Open DevTools (F12, or Cmd+Option+I on Mac)
2. Open the command menu (Cmd/Ctrl+Shift+P) and run "Disable JavaScript"
3. Reload the page

What disappears? Navigation menus? Product information? Pricing? Contact forms? Anything critical that disappears is invisible to basic bots.
Quick command-line alternative:
```shell
# Fetch raw HTML (what basic bots see)
curl -s "https://yoursite.com/pricing" | grep -i "price\|£\|\$\|cost"
# If this returns nothing but your page has pricing, it's JS-rendered
```
This is where it gets interesting. Instead of just checking if pages exist, we test whether AI can actually find them by navigating your site.
Pick 5-10 tasks that matter for your business. Think about what a user might ask an AI to do on your behalf.
Template:
| Task ID | Task Description | Target Content | Success Criteria |
|---|---|---|---|
| T1 | Find pricing information | /pricing or pricing section | Can state specific prices or plans |
| T2 | Find contact email | /contact or footer | Can provide actual email address |
| T3 | Locate returns policy | /returns or /policies | Can summarise return terms |
| T4 | Find product X | /products/x | Can describe the specific product |
| T5 | Book a demo | /demo or /contact | Can find the booking mechanism |
Good task characteristics:
- Specific enough to verify — an exact price, email address, or policy term, not "learn about us"
- Representative of what real users actually ask an AI to do
- Achievable from the homepage within the 5-click budget
- Tied to a business cost if it fails: a lost lead, a lost sale, a support ticket
For each task, use this prompt template to get consistent, traceable results:
```
I want you to help me audit how well an AI agent can navigate my website.

TASK: [Your task description, e.g., "Find the pricing information"]
STARTING POINT: [Your homepage URL]

INSTRUCTIONS:
1. Start at the homepage
2. Look at the available navigation options (links, menus, buttons)
3. Choose the option most likely to lead to the task goal
4. Tell me which link you're clicking and why
5. Repeat until you either find the information or give up
6. Maximum 5 clicks

For each step, report:
- Current page URL
- Links/options you considered
- Which one you chose and why
- What you found

If you find the information, quote the relevant content.
If you can't find it after 5 steps, explain what went wrong.
Be honest about what you can and cannot see. Don't make up URLs or content.
```
For each task, record:
```markdown
## Task: [Description]

### Attempt 1
**Started:** [Homepage URL]
**Steps:**
1. [URL] → Clicked "[Link text]" because [reason]
2. [URL] → Clicked "[Link text]" because [reason]
3. [URL] → [Found it / Continued / Got stuck]

**Result:** SUCCESS / FAILURE
**If failed, why:** [Blocked / Couldn't find link / Dead end / Content not visible]
**Notes:** [Anything interesting]
```
After running all tasks, you'll have data like:
| Task | Result | Steps | Failure Reason |
|---|---|---|---|
| Find pricing | SUCCESS | 2 | - |
| Find contact | SUCCESS | 1 | - |
| Find returns policy | FAILURE | 5 | Buried in footer dropdown |
| Find product X | FAILURE | 3 | JS-rendered product grid |
| Book demo | SUCCESS | 2 | - |
Common failure patterns:
- **Blocked:** The AI couldn't access the site or specific pages
- **Navigation failure:** The AI couldn't find a path to the content
- **Depth failure:** The AI found a path, but it took too many steps
- **Content invisibility:** The AI reached the page but couldn't see the content
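Once every task is logged, a few lines of Python turn the log into a scorecard. The task names and failure labels below are illustrative — substitute your own results:

```python
from collections import Counter

# Illustrative audit log — replace with your own task results
results = [
    {"task": "Find pricing", "success": True, "steps": 2, "failure": None},
    {"task": "Find contact", "success": True, "steps": 1, "failure": None},
    {"task": "Find returns policy", "success": False, "steps": 5, "failure": "Navigation failure"},
    {"task": "Find product X", "success": False, "steps": 3, "failure": "Content invisibility"},
    {"task": "Book demo", "success": True, "steps": 2, "failure": None},
]

successes = [r for r in results if r["success"]]
rate = len(successes) / len(results)
avg_steps = sum(r["steps"] for r in successes) / len(successes)
failures = Counter(r["failure"] for r in results if not r["success"])

print(f"Success rate: {rate:.0%}")
print(f"Avg steps to success: {avg_steps:.1f}")
for reason, count in failures.most_common():
    print(f"  {reason}: {count}")
```

The failure-type tally is the most useful output: one "Blocked" failure is a robots.txt or firewall fix, while repeated "Content invisibility" failures point at your JavaScript rendering.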
If you're comfortable with basic Python, these scripts help automate some checks.
Important: Run these from your own machine, not a cloud server. Many sites block requests from cloud provider IP ranges, and you'll get 403 errors that don't reflect what real users (or AI agents browsing on behalf of users) would see.
Check multiple competitors or your own subdomains:
```python
import httpx

sites = [
    "example.com",
    "competitor1.com",
    "competitor2.com",
]

ai_bots = ["GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot", "anthropic-ai"]

def check_robots(domain):
    try:
        headers = {"User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"}
        r = httpx.get(f"https://{domain}/robots.txt", headers=headers,
                      timeout=10, follow_redirects=True)
        if r.status_code == 200:
            content = r.text.lower()
            blocked = []
            for bot in ai_bots:
                # Simple presence check — flags any bot named in robots.txt.
                # It doesn't parse the rules, so a bot with an explicit Allow
                # is flagged too; read the file to confirm what the rule does.
                if bot.lower() in content:
                    blocked.append(bot)
            return blocked if blocked else ["None blocked"]
        else:
            return [f"No robots.txt ({r.status_code})"]
    except Exception as e:
        return [f"Error: {e}"]

print("AI Bot Blocking Report")
print("=" * 50)
for site in sites:
    blocked = check_robots(site)
    print(f"{site}: {', '.join(blocked)}")
```
Compare what's in the raw HTML vs what a human sees:
```python
import httpx
from bs4 import BeautifulSoup

def compare_content(url):
    """
    Fetches a URL and reports on HTML content.

    Note: This only sees raw HTML, not JS-rendered content.
    For full comparison, you'd need a headless browser.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; ContentAuditBot/1.0)"
    }
    r = httpx.get(url, headers=headers, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, 'html.parser')

    # Remove script and style elements
    for element in soup(['script', 'style', 'noscript']):
        element.decompose()

    # Get text content
    text = soup.get_text(separator=' ', strip=True)
    word_count = len(text.split())

    # Get all links
    links = soup.find_all('a', href=True)
    nav_links = [a.get_text(strip=True) for a in links if a.get_text(strip=True)]

    # Look for common important elements
    has_pricing_words = any(word in text.lower() for word in ['price', 'pricing', '£', '$', 'cost', 'plan'])
    has_contact = any(word in text.lower() for word in ['contact', 'email', 'phone', 'call us'])

    return {
        'url': url,
        'word_count': word_count,
        'link_count': len(links),
        'nav_sample': nav_links[:10],
        'has_pricing_words': has_pricing_words,
        'has_contact_words': has_contact,
    }

# Example usage
result = compare_content("https://yoursite.com")
print(f"URL: {result['url']}")
print(f"Word count (HTML only): {result['word_count']}")
print(f"Links found: {result['link_count']}")
print(f"Sample navigation: {result['nav_sample']}")
print(f"Contains pricing language: {result['has_pricing_words']}")
print(f"Contains contact language: {result['has_contact_words']}")
```
Extract and analyse navigation structure:
```python
import httpx
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def extract_navigation(url):
    """
    Extracts navigation links and categorises them by location.
    """
    r = httpx.get(url, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, 'html.parser')
    base_domain = urlparse(url).netloc

    def collect_links(container):
        """Internal links inside a container, resolved to absolute URLs."""
        found = []
        if container is None:
            return found
        for a in container.find_all('a', href=True):
            href = urljoin(url, a['href'])
            if urlparse(href).netloc == base_domain:
                found.append({'text': a.get_text(strip=True), 'href': href})
        return found

    # Fall back to class-name matching when there's no semantic element.
    # A function passed to class_ is called with each CSS class (may be None).
    header = soup.find('header') or soup.find(class_=lambda c: c and 'header' in c.lower())
    footer = soup.find('footer') or soup.find(class_=lambda c: c and 'footer' in c.lower())

    return {
        'header': collect_links(header),
        'main_nav': collect_links(soup.find('nav')),
        'footer': collect_links(footer),
        'sidebar': [],
        'other': [],
    }

# Example usage
nav = extract_navigation("https://yoursite.com")
print("Header links:", [l['text'] for l in nav['header']])
print("Main nav links:", [l['text'] for l in nav['main_nav']])
print("Footer links:", [l['text'] for l in nav['footer'][:10]], "...")  # First 10
```
Run this audit quarterly, or after major site changes. What works today might break tomorrow when you redesign navigation or add new JavaScript frameworks.
Everything in this guide takes 2-3 hours to do manually. Compass does it in 90 seconds.
We built Compass because we got tired of doing this manually for clients. It runs task-based audits, shows you exactly where AI agents get stuck, classifies failures by type, and tells you what to fix.
If you've read this far, you clearly care about AI discoverability. Try Compass and see how your site scores.