AI Visibility and Technical Audits
Beginner Level
1. What is AI Visibility?
AI visibility means: Can an AI system (like ChatGPT, Claude, or Perplexity) actually read and understand your website?
Just like Google needs to crawl your site to rank you in search, LLMs (Large Language Models) need to access your text, metadata, and schema to use your content in AI search results and answers.
If your site hides content behind JavaScript, blocks crawlers in robots.txt, or only serves schema with JS, AI systems may not see it at all.
2. What is a Technical Audit?
A technical audit is like a “health check” for your website.
Instead of checking what the content says, we check how it’s delivered to machines.
For AI visibility, we ask: Does the raw code of the page contain all the signals an AI crawler needs?
3. Key Definitions
Static HTML: The original code a server sends before JavaScript runs. Think of it as the “bare bones” of your webpage.
JavaScript (JS): Code that can change or add things on the page after it loads. Many AI crawlers don’t run JS, so if content only shows up with JS, AIs might not see it.
robots.txt: A file that tells crawlers which parts of your site they can or can’t access.
Metadata: Hidden signals in the <head> of your site (title, description, canonical, etc.) that describe your page.
Schema (JSON-LD): Structured data (special code in <script type="application/ld+json">) that tells machines what a page is about (e.g., product, article, organization). A minimal example follows this list.
Parity: Comparing what a page looks like with JS ON vs JS OFF. If important content disappears when JS is off, crawlers/LLMs may miss it.
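To make the Schema (JSON-LD) definition concrete, here is a minimal, illustrative Organization block. All values are placeholders, not taken from any real site:

```html
<!-- Minimal JSON-LD Organization schema (illustrative placeholder values) -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com/",
  "logo": "https://www.example.com/logo.png"
}
</script>
```

Because this block ships in the server-rendered HTML, a crawler that never runs JavaScript can still read it.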
4. The HEAD vs BODY Test
When doing an audit, we check two main areas of the page with JavaScript disabled:
HEAD (the invisible signals at the top of the page)
Ask:
Is the <title> there?
Are <meta> descriptions present?
Is the <link rel="canonical"> tag there?
Is schema (<script type="application/ld+json">) present?
BODY (the visible content of the page)
Ask:
Is the key copy (actual text) visible without JS?
Are there crawlable links (<a href="..."> not just # or javascript:)?
Is any schema markup placed in the body?
Does the navigation menu appear without JS?
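Put together, a page that passes both checks has something like the following in its static HTML. This is a simplified sketch with placeholder values, not a complete page:

```html
<!-- Simplified static HTML sketch: all values are placeholders -->
<head>
  <title>Example Product – Example Co</title>
  <meta name="description" content="A short, accurate summary of the page.">
  <link rel="canonical" href="https://www.example.com/products/example-product">
  <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Product", "name": "Example Product"}
  </script>
</head>
<body>
  <h1>Example Product</h1>
  <p>The key copy lives here as plain text, not injected by JavaScript.</p>
  <a href="/products/">Crawlable link to the product listing</a>
</body>
```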
5. How to Run the Check in DevTools (Simple Steps)
Open Chrome → Right-click → “Inspect” → DevTools.
Go to Settings → Preferences → Debugger → tick “Disable JavaScript” (or open the Command Menu with Ctrl/Cmd+Shift+P and run “Disable JavaScript”).
Reload the page.
Look at the Elements tab → You now see the static HTML (what crawlers/LLMs see).
Inspect both:
The <head> section for metadata and schema.
The <body> section for text and links.
Tip: Chrome’s DevTools now has an AI assistant panel where you can literally ask in plain English:
“With JavaScript disabled, are the key copy, crawlable links, metadata, and schema present in the static HTML?”
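If you prefer the command line, a rough equivalent of the JS-off check is to fetch the page with curl and grep for the signals yourself. A sketch, assuming curl is available and https://example.com/ stands in for your page:

```sh
# Fetch the raw server response (curl never executes JavaScript)
curl -s https://example.com/ > static.html

# Spot-check the signals an AI crawler would see
grep -i '<title>' static.html
grep -i 'name="description"' static.html
grep -i 'rel="canonical"' static.html
grep -ci 'application/ld+json' static.html   # rough count of JSON-LD blocks
```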
6. Why This Matters
If your key copy is missing → AI can’t read your content.
If metadata is missing → AI can’t label or summarise your page.
If schema is missing → AI can’t understand entities (products, articles, organization).
If robots.txt blocks crawlers → AI can’t even access the site.
7. Example Questions (Yes/No Style)
Does /robots.txt allow all crawlers?
Is there a sitemap (/sitemap.xml or equivalent)?
Is Organization schema present on the homepage?
Is Article schema present on blogs?
Is Product schema present on product pages (PDPs)?
Is HTML content visible without JS?
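Several of these questions can be answered quickly from a terminal. A rough sketch, assuming curl is installed and example.com stands in for your domain:

```sh
# Is robots.txt reachable, and what does it allow?
curl -s https://example.com/robots.txt

# Does a sitemap exist at the conventional location? (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/sitemap.xml

# Rough check: does the homepage's static HTML mention Organization schema at all?
curl -s https://example.com/ | grep -o '"Organization"' | wc -l
```

These are convenience checks only; the schema itself still needs a proper validator, covered at the intermediate level.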
8. The Goal of the Audit
Expose important content in static HTML (so AIs can read it).
Ensure metadata and schema are server-rendered (not just JS).
Allow AI bots in robots.txt (if you want to appear in AI search).
Keep parity between JS-on and JS-off versions.
When you fix these things, your site is AI-visible and has a much higher chance of being included in LLM search and answers.
Intermediate Level
1. Moving Beyond Basics
At beginner level, we asked: “Can AI crawlers see my content at all?”
At intermediate level, we dig deeper: “What exactly are they seeing, and is it the same as what humans see?”
This means we don’t just confirm presence — we measure quality, consistency, and parity.
2. Core Concepts (with Slightly More Depth)
Static HTML vs Rendered DOM:
Static HTML = what the server sends first.
Rendered DOM = what browsers build after JavaScript executes.
Most AI crawlers read static HTML only.
Parity Testing:
Compare JS-disabled vs JS-enabled output. Differences highlight gaps in what AIs see vs what users see.
Directives Hierarchy:
robots.txt → global crawler permissions.
<meta name="robots"> → page-level instructions.
X-Robots-Tag → server-level header instructions.
Conflicts confuse crawlers and should be avoided (a quick consistency check follows this list).
Schema Depth:
At beginner level, we checked “is schema there or not?”
At intermediate level, we check:
Correct schema types (Product, Article, FAQ, Organization).
Required vs recommended properties (e.g., Product → name, description, sku, offers).
Placement in static HTML (not injected after load).
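A quick way to see all three directive layers side by side, sketched with curl (example.com is a placeholder):

```sh
# Layer 1: site-wide permissions
curl -s https://example.com/robots.txt

# Layer 2: page-level meta robots tag in the static HTML
curl -s https://example.com/ | grep -i 'name="robots"'

# Layer 3: server-level header directives
curl -sI https://example.com/ | grep -i 'x-robots-tag'
```

If layer 1 allows crawling but layer 2 or 3 says noindex (or vice versa), you have exactly the kind of conflict described above.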
3. What We Test at Intermediate Level
a) Robots Directives
Confirm no conflicts (e.g., robots.txt allows crawl, but X-Robots-Tag says noindex).
Confirm AI-specific bots are not blocked (GPTBot, ClaudeBot, PerplexityBot, etc.) unless that’s the strategy.
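If the strategy is to be visible to AI crawlers, robots.txt can name them explicitly. A minimal illustrative example, using the user-agent tokens the vendors publish; adjust to your own policy rather than copying it verbatim:

```txt
# Allow common AI crawlers explicitly (illustrative policy, not a recommendation)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
```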
b) Metadata & Canonicals
Check that metadata is consistent across:
<head> HTML
Open Graph / Twitter
Canonical tags
Look for duplicate/conflicting canonicals or missing descriptions.
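One way to eyeball consistency is to pull all of these tags out of the static HTML side by side. A rough grep-based sketch (example.com is a placeholder):

```sh
# List title, description, canonical, and OG/Twitter tags from the static HTML
curl -s https://example.com/ \
  | grep -iE '<title>|name="description"|rel="canonical"|property="og:|name="twitter:'
```

On heavily minified pages everything may come back as one long line, so treat this as a quick eyeball check; a crawler export (see below) is more reliable.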
c) Schema Validation
Run JSON-LD through a validator (e.g., Schema.org Validator or Google’s Rich Results Test).
Confirm required fields are present.
Confirm schema is not “floating” (e.g., Article schema without matching content).
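Before pasting the URL into a validator, it is worth confirming the JSON-LD actually ships in the static HTML rather than being injected by JavaScript. A quick sketch with curl (the URL is a placeholder):

```sh
# Rough count of lines containing a JSON-LD script tag in the raw server response
# (0 means the schema is JS-injected or missing)
curl -s https://example.com/product-page | grep -c 'application/ld+json'
```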
d) Headers & Response Codes
Check the initial response with curl -I or a HAR file.
Confirm 200 OK.
Check TTFB (target <500ms).
Confirm no interstitials or forced cookie gates.
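A single curl invocation can answer most of these in one go. A sketch; the URL is a placeholder and time_starttransfer is used as a rough stand-in for TTFB:

```sh
# Status code, content type, and approximate TTFB from the initial HTML response
curl -s -o /dev/null \
  -w "status: %{http_code}\ncontent-type: %{content_type}\nTTFB: %{time_starttransfer}s\n" \
  https://example.com/
```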
e) Crawl Simulation
Use a crawler (Screaming Frog, Sitebulb, or curl) with JS disabled.
Export data:
Status codes
Meta tags
Schema presence
Word count in static HTML
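If a dedicated crawler is overkill, the same export can be approximated with a small shell loop over a URL list. A sketch assuming a urls.txt file with one URL per line:

```sh
# For each URL: status code, rough word count of the static HTML, and JSON-LD presence
while read -r url; do
  html=$(curl -s "$url")
  code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  words=$(printf '%s' "$html" | wc -w)
  schema=$(printf '%s' "$html" | grep -c 'application/ld+json')
  echo "$url  status=$code  words=$words  jsonld_blocks=$schema"
done < urls.txt
```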
f) Parity Test
Take screenshots with JS enabled and disabled.
Compare word count, metadata, schema presence.
Any major differences = risk to AI visibility.
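A crude but useful parity signal is to compare word counts between the raw server HTML and the JavaScript-rendered DOM. The sketch below assumes headless Chrome is installed and available as google-chrome (older builds use --headless instead of --headless=new); the URL is a placeholder:

```sh
URL="https://example.com/"

# Static HTML (what most AI crawlers see)
curl -s "$URL" | wc -w

# Rendered DOM after JavaScript runs (what users see)
google-chrome --headless=new --dump-dom "$URL" | wc -w
```

A large gap between the two counts is exactly the kind of parity risk described above.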
4. Intermediate-Level Audit Questions
Does /robots.txt allow LLM/AI bots (e.g., GPTBot, ClaudeBot, PerplexityBot)?
Do robots.txt, <meta name="robots">, and X-Robots-Tag give consistent instructions?
Do <title>, <meta description>, canonical, and OG/Twitter tags match and load in static HTML?
Is schema both present and valid in static HTML?
Do response headers confirm 200 OK, fast TTFB, and the correct Content-Type?
Are there differences in content, metadata, or schema between JS-on and JS-off versions?
5. What Success Looks Like (Intermediate)
No conflicts across robots directives.
Consistent metadata across head tags and social tags.
Valid schema for homepage, blogs, and products — in static HTML.
Headers clean: 200 OK, <500ms TTFB, no blocking cookies.
Parity achieved: JS adds UX, but not essential metadata or schema.
6. Why It Matters at This Level
If metadata/schema are inconsistent → AI may mislabel or exclude your content.
If robots directives conflict → AI crawlers may skip entire sections.
If JS parity fails → Humans see rich content, but AIs see thin pages.
Intermediate auditing ensures not just access, but accurate machine understanding.
7. Transition to Advanced Level
At advanced level, we’ll test:
Crawl logs to confirm how bots are interacting.
Edge/CDN rules that may serve different content to different user agents.
Entity linking (are schema and text reinforcing each other for AI knowledge graphs?).
Advanced Level
1. From Intermediate to Advanced
At beginner level, we asked: “Is content visible?”
At intermediate level, we asked: “Is it consistent, valid, and machine-readable?”
At advanced level, we now ask: “How do different bots really experience this site, and how does that affect LLM indexing, retrieval, and training?”
2. Advanced Concepts
Bot Differentiation & Cloaking:
Some CDNs/WAFs serve different content depending on User-Agent. This can lead to AI bots being blocked (intentionally or not). Testing must simulate multiple crawlers.
Crawl Logs & Server Behavior:
Access logs can reveal if AI bots are hitting the site (GPTBot, PerplexityBot, ClaudeBot). If they’re absent, it might mean robots.txt is blocking them, or the firewall/CDN is filtering them.
Entity Reinforcement:
Schema markup is only one signal. Advanced auditing checks if entities (brands, products, people, locations) are mentioned consistently across:
On-page copy
Schema
Internal links
Knowledge graph references
Content Freshness Signals:
AI systems prefer up-to-date data. We check Last-Modified headers, sitemaps with correct <lastmod> values, and crawl frequency (a quick freshness check follows this list).
Indexation vs Training:
Even if a page is crawlable by Googlebot, it might still be blocked for AI training bots like GPTBot or Google-Extended. Advanced auditing considers AI-specific robots blocks separately from search engine directives.
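A quick way to see whether freshness is being signaled at all, sketched with curl (the URLs are placeholders):

```sh
# Does the server send a Last-Modified (or ETag) header for the page?
curl -sI https://example.com/some-article | grep -iE 'last-modified|etag'

# Does the sitemap carry <lastmod> dates?
curl -s https://example.com/sitemap.xml | grep -c '<lastmod>'
```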
3. What We Test at Advanced Level
a) Bot Simulation
Use curl with different User-Agents:
curl -A "GPTBot" -I https://example.com/ curl -A "ClaudeBot" -I https://example.com/ curl -A "PerplexityBot" -I https://example.com/ curl -A "Googlebot" -I https://example.com/
Compare responses: status codes, headers, body size.
If AI bots are blocked but Googlebot isn’t, visibility in AI search will be restricted.
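The same comparison can be scripted so differences in status code and body size jump out. A sketch; the user-agent values here are shortened tokens, while production bots send longer UA strings:

```sh
# Compare status code and downloaded body size across user agents
for ua in "GPTBot" "ClaudeBot" "PerplexityBot" "Googlebot"; do
  curl -s -A "$ua" -o /dev/null \
    -w "$ua: status=%{http_code} bytes=%{size_download}\n" \
    https://example.com/
done
```

Identical sizes and codes across bots suggest no UA-based cloaking; a 403 for the AI bots only usually points at a WAF/CDN rule.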
b) Crawl Logs
Access server logs (or Cloudflare dashboard) to confirm whether AI bots are visiting.
Note frequency, response codes, and blocked attempts.
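If you have shell access to the logs, a couple of one-liners give a first read on AI bot activity. A sketch assuming a combined-format access log at /var/log/nginx/access.log; adjust the path and bot list to your setup:

```sh
# Requests per AI crawler (assumes the UA string contains the bot token)
grep -ioE 'GPTBot|ClaudeBot|PerplexityBot' /var/log/nginx/access.log | sort | uniq -c

# Status codes those crawlers received (field 9 in a combined-format log line)
grep -iE 'GPTBot|ClaudeBot|PerplexityBot' /var/log/nginx/access.log \
  | awk '{print $9}' | sort | uniq -c | sort -rn
```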
c) Schema–Content Alignment
Check that schema values (e.g., product price, brand, FAQ answers) exactly match on-page text.
AI systems may distrust mismatched data, reducing inclusion in knowledge panels or answers.
d) Header Deep Dive
Inspect advanced headers:
Vary: User-Agent (indicates possible cloaking).
X-Robots-Tag: noai or noimageai (emerging AI-specific directives).
Cache-Control and ETag (determine freshness of served content).
Confirm no interstitials (cookie walls, JS challenges) appear to bots.
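All of these headers can be pulled in one request. A sketch (example.com is a placeholder):

```sh
# Surface the cloaking, AI-directive, and freshness headers in one pass
curl -sI https://example.com/ \
  | grep -iE '^(vary|x-robots-tag|cache-control|etag|last-modified):'
```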
e) Performance & Crawlability at Scale
Test crawl budget: how many pages load cleanly within acceptable TTFB?
Audit rendering performance — if the site is very slow, AIs may skip deeper pages.
f) Knowledge Graph Integration
Check if Organization schema links to Wikipedia, Wikidata, or official social profiles.
Verify that these entities resolve correctly — this strengthens visibility in LLM outputs.
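In JSON-LD terms, this linking is usually done with the sameAs property on the Organization. A minimal illustrative sketch; all URLs and IDs are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com/",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example_Co",
    "https://www.wikidata.org/wiki/Q000000",
    "https://www.linkedin.com/company/example-co"
  ]
}
</script>
```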
4. Advanced Audit Questions
Do AI-specific bots (GPTBot, ClaudeBot, PerplexityBot) receive the same content as Googlebot?
Do server logs confirm AI bots are crawling regularly?
Are schema values consistent with on-page text (no mismatches)?
Do headers include any AI-blocking directives (X-Robots-Tag: noai)?
Is freshness communicated (via sitemap <lastmod>, Last-Modified headers)?
Does the site surface clear entity references that align with external knowledge graphs?
5. What Success Looks Like (Advanced)
AI bots can crawl freely and receive the same 200 OK responses as search bots.
Crawl logs show activity from AI crawlers, confirming no silent blocks.
Schema is aligned with content, reducing risk of mistrust by AI.
Freshness signals (sitemaps, headers) are present, making content more attractive to AI retrieval.
Entity alignment strengthens knowledge graph connections, boosting chances of appearing in AI-generated answers.
6. Why It Matters at This Level
If AI bots are blocked or served alternate content, the site is invisible in AI search, even if it looks fine to Google.
If entities are misaligned or missing, AI systems may not attribute content correctly.
If freshness is not signaled, AIs may prefer competitor data.
This level of auditing ensures the site is not only crawlable and understandable, but also trusted, attributed, and surfaced by AI systems.
7. Transition to Expert Level
An expert-level audit would add:
Monitoring dashboards for AI crawler activity.
Custom log analysis scripts to quantify bot traffic.
Content attribution strategy (citations, Wikipedia/Reddit mentions) to reinforce brand presence in AI answers.