AI Visibility Optimization: What You Can (and Can’t) Control
There’s exactly a metric s$%t ton of information out there telling you to do this or that, and you’ll win in AI search.
GEO bros with secret sauce to sell. Prominent SEOs, the kind we all look to when uncertainty hits, can’t seem to agree on anything. It’s a mess!
So who do you believe?
Well, disclaimer: we don’t pretend to have all the answers, nor are we saying these tactics are exclusive to AI search. Yes, there’s likely some SEO crossover here, so don’t shoot the messenger.
What we can offer is what the studies are showing right now. Specifically, what your site controls in AI retrieval. Some of it you’d expect. Some of it runs counter to what’s being sold.
So without further ado…
A quick orientation: the retrieval system has four layers

The man, the myth, the legend, Mr. Aaron Haynes, Loganix’s CEO, mapped out the four layers of the AI retrieval system in detail here. Here’s the TL;DR:
L1: Entity Establishment. Does the system know you exist? Before any AI platform cites you, it needs to resolve your brand as an entity: what you are, what category you’re in, what attributes you carry.
L2: Entity Depth. Does the system know you at depth? Press, earned media, and brand mentions build the training-data signal that determines whether AI puts your brand into consideration before any retrieval even runs.
L3: Category Citation. Do you show up when someone asks a category question? “Best guest posting service.” “Top link-building agencies.” This is where third-party editorial coverage and organic ranking compete for AI attention.
L4: Content Optimization. Once a page is retrieved, can the system extract specific claims from it and place them into the answer?
The layers run in sequence. L1 feeds L2, L2 feeds L3, and L3 and L4 are where the observable optimization work happens. It’s worth noting, though, that both are brittle without the foundation underneath.
L1: Does the system know your brand exists?
Most brands assume they’ve done this layer. They’ve got a Google Business Profile. They’ve got schema somewhere. They’re in a few directories. In their eyes, the job is done.
Reality check: it probably isn’t.
L1 operates through the knowledge and training layers, not retrieval. Before any query runs, the system needs to be able to resolve your brand as a coherent entity. That means more than recognizing the name. It means attaching consistent attributes to it: what category you’re in, what you do, where you operate, how you connect to other known entities.
A brand can appear in training data and still fail this layer.
Being mentioned in articles isn’t the same as being resolvable. Resolution requires structured infrastructure — schema, directory profiles, Knowledge Graph presence — that gives the system something concrete to anchor the entity to. Without it, your brand is a name the model might vaguely know, not an entity it can confidently work with.
Your site contributes to that infrastructure. So do your directory profiles and your schema markup. The question is how much of that picture you’ve actually filled in.
The role schema plays in AI search optimization

Ask ten SEOs whether schema helps AI citations, and you’ll get ten different answers. Some will cite this Ahrefs study from Louise Linehan. Some will cite this paper by Kumar and Palkhouski. Others will refer to their own client results.
Here’s what we’re seeing in the data: schema operates at a different stage of the pipeline than most people assume, and the studies that appear to contradict each other are mostly measuring different stages.
Ahrefs tracked 1,885 pages that added JSON-LD schema against 4,000 matched controls, using difference-in-differences methodology. Adding schema to pages already being cited produced no meaningful change: ChatGPT +2.2%, Google AI Mode +2.4%, Google AIO -4.6%.
Not statistically insignificant, I guess you could argue. But all fairly uninteresting, really.
Every page in the study already had 100+ AI Overview citations before treatment. These were pages already in the retrieval pool, aka already being found and cited before any schema was added. The study did not test cold-start pages, brands with thin entity recognition, or whether schema helps a page get into the retrieval pool in the first place.
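For anyone unfamiliar with difference-in-differences, here is a minimal sketch of the arithmetic. The function name and every number in it are hypothetical, chosen only to show the calculation. They are not Ahrefs’ data, and the real study will have involved more than this one subtraction.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus the change in the control group.

    Subtracting the control group's change removes movement that would
    have happened anyway (seasonality, platform updates, and so on).
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical average citations per page, before and after adding schema.
effect = diff_in_diff(
    treated_before=120.0, treated_after=126.0,   # pages that added schema
    control_before=118.0, control_after=123.0,   # matched pages that did not
)
print(effect)  # 1.0
```

In this made-up case the treated pages gained 6 citations, but the controls gained 5 on their own, so only 1 is attributable to the change.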
Gianluca Fiorelli published some clarification the same day as the Ahrefs article went live, and it’s worth calling out. He argues that Ahrefs tested citation-slot assignment on already-visible pages and got the right answer for that question. The finding doesn’t mean that schema does nothing. It means that schema doesn’t do its work at citation time.
So where does it do its work? Well, here:
During direct retrieval, when an AI platform fetches a page to answer a query, not a single tested AI system parses JSON-LD. searchVIU ran eight structured tests across five AI systems (ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode) and found zero instances of any system extracting JSON-LD during live page fetch.
The schema is invisible at that moment. Every system extracts only visible HTML.
But most AI platforms that do live retrieval don’t just fetch pages at random. Depending on the platform, they use one or a combination of the Google, Bing, and Brave search APIs to surface candidate pages, then fetch those pages to extract grounding content. The pipeline looks like this:
1. AI query
2. Search of one or more indexes
3. Pages found in those indexes surface as candidates
4. AI fetches those pages
5. Extracts visible HTML for grounding
Schema influences step three. Google (or another search engine) reads your JSON-LD during indexing, uses it for entity parsing and Knowledge Graph construction, and factors it into how your page ranks. A better-ranked page is more likely to surface as a candidate in step three. A page that surfaces as a candidate is more likely to get fetched in step four and grounded in step five.
So all this to say that schema’s value for AI visibility is measurable, but indirect, which explains most of the disagreement.
That’s also why the Ahrefs study found no lift on already-cited pages. Those pages were already ranking well enough to be found and grounded. Adding schema after the fact didn’t meaningfully change their Google ranking within a 30-day window, so nothing changed downstream either.
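To make “invisible at fetch time” concrete, here is a rough sketch of what a text-only extractor keeps from a page. It is an illustration built on Python’s standard-library HTML parser, not how any of the tested AI systems is actually implemented. The `VisibleText` class, the `visible_text` helper, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical page: priceRange exists only in the JSON-LD block.
page = """
<html><head>
<script type="application/ld+json">
{"@type": "Organization", "name": "Example Co", "priceRange": "$$"}
</script>
</head><body><h1>Example Co</h1><p>Link building for SaaS brands.</p></body></html>
"""

print(visible_text(page))  # Example Co Link building for SaaS brands.
```

Anything you only state in JSON-LD (here, the price range) never reaches an extractor like this. If a fact matters for grounding, it has to be in the visible HTML too.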
Practical actions at L1
The through-line here is this: schema influences how Google indexes and resolves your entity, which feeds the retrieval layer most AI platforms draw from. It’s not a direct citation signal. It’s the infrastructure that gets you into the pool.
So, here’s what’s worth doing:
Schema

Growth Marshal’s test of 1,006 pages is the most useful data point here. Generic schema types (Article, Organization, BreadcrumbList) dropped citation rates from a 59.8% baseline to 41.6%, while attribute-rich Product and Review schema with pricing, ratings, and specs drove rates to 61.7%.
The difference isn’t the schema type. It’s the specificity. Generic implementations won’t cut it. Attribute-rich implementations add an entity signal.
That means Organization schema is worth implementing, but only if you populate it fully: name, url, logo, sameAs links to LinkedIn, Crunchbase, and Wikipedia where applicable, foundingDate, description framed around your category, and contactPoint.
Service businesses should add Service schema per offering with serviceType, provider, and areaServed. Ecommerce sites: Product and Review schema with all attributes populated.
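Here is a sketch of what “populated fully” could look like, built as Python dictionaries and serialized to JSON-LD. The property names are standard schema.org ones listed above. The brand, URLs, dates, and the `missing_fields` and `to_script_tag` helpers are placeholders invented for illustration.

```python
import json

# Fields the article recommends populating on Organization schema.
RECOMMENDED = ["name", "url", "logo", "sameAs", "foundingDate",
               "description", "contactPoint"]

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Agency",                      # placeholder brand
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "description": "Link-building agency for B2B SaaS companies.",
    "foundingDate": "2015-03-01",
    "sameAs": [
        "https://www.linkedin.com/company/example-agency",
        "https://www.crunchbase.com/organization/example-agency",
    ],
    "contactPoint": {
        "@type": "ContactPoint",
        "contactType": "customer support",
        "email": "hello@example.com",
    },
}

# One Service block per offering.
service = {
    "@context": "https://schema.org",
    "@type": "Service",
    "serviceType": "Guest posting",
    "provider": {"@type": "Organization", "name": "Example Agency",
                 "url": "https://www.example.com"},
    "areaServed": "US",
}


def missing_fields(schema, required=RECOMMENDED):
    """Return recommended fields that are absent or empty."""
    return [field for field in required if not schema.get(field)]


def to_script_tag(schema):
    """Wrap a schema dict in the script tag that goes in the page head."""
    body = json.dumps(schema, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(missing_fields(organization))      # []
print(missing_fields({"name": "Bare"}))  # every recommended field except name
```

Running `missing_fields` over your existing markup is a quick way to spot the generic, half-filled implementations the Growth Marshal data warns about.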
Directory profiles
Directory profiles are the other on-site input at L1. Yext’s study of 6.8 million citations found that listings account for 42% of all AI citation sources for local-intent queries. ChatGPT leans on listings for 48.7% of its local citations.
A Google Business Profile or Yelp page with only name, address, and phone number gives the entity layer almost nothing to work with. Populate price tier, hours, services offered, payment methods, and accessibility. Every completed field is an entity attribute that the system can use to resolve who you are.
Wikipedia, Wikidata, and entity consistency
Wikipedia and Wikidata are worth pursuing where accessible. An OppAlerts study of 145 industries found Wikidata was the dominant citation predictor in several verticals, with Spearman correlations above 0.70 in some categories.
The Yandex source code leak confirmed Wikipedia link signals are active ranking factors. A Wikipedia page or Wikidata entry tells the Knowledge Graph that your brand exists as a verifiable entity independent of what you say about yourself, which is a different and stronger signal than anything on your own site.
The honest caveat: Wikipedia’s editorial standards make it inaccessible for most SMBs and newer brands. Wikidata is more open and worth setting up regardless.
Beyond those, entity consistency across authoritative surfaces is part of the same picture.
LinkedIn profile, Crunchbase listing, social profiles, and NAP consistency across directories. Aaron’s L1 framework references these explicitly as entity stacking inputs. None of them are individually transformative. Together, they build a consistent, attribute-rich record that the system can reference across multiple independent sources.
The more surfaces that agree on who you are, what category you’re in, and what attributes you carry, the more confidently the system can resolve your brand.
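A consistency check like this is easy to rough out. The sketch below compares a few fields across profiles after light normalization. The profile data, the field names, and the normalization rules (including the US-style phone assumption) are hypothetical. Treat it as a starting point, not an audit tool.

```python
import re


def normalize_text(value):
    """Lowercase and collapse punctuation so 'Link Building' == 'link building'."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def normalize_phone(value):
    """Keep the last 10 digits. Assumes US-style numbers."""
    return re.sub(r"\D", "", value)[-10:]


NORMALIZERS = {
    "name": normalize_text,
    "phone": normalize_phone,
    "category": normalize_text,
}


def inconsistent_fields(profiles):
    """Return {field: {source: normalized value}} where sources disagree."""
    issues = {}
    for field, normalize in NORMALIZERS.items():
        values = {source: normalize(profile.get(field, ""))
                  for source, profile in profiles.items()}
        if len(set(values.values())) > 1:
            issues[field] = values
    return issues


# Hypothetical profiles for the same brand on three surfaces.
profiles = {
    "website": {"name": "Example Agency", "phone": "+1 (555) 010-0199",
                "category": "Link building"},
    "linkedin": {"name": "Example Agency", "phone": "555-010-0199",
                 "category": "Link Building"},
    "crunchbase": {"name": "Example Agency Inc.", "phone": "5550100199",
                   "category": "SEO"},
}

print(sorted(inconsistent_fields(profiles)))  # ['category', 'name']
```

The phone numbers pass because they only differ in formatting. The name and category get flagged because one surface tells a different story from the other two.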
Schema, directory profiles, Wikipedia and Wikidata where accessible, and a consistent entity presence across authoritative surfaces: that’s the full scope of what your site and its surrounding infrastructure contribute at L1.
L4: What gets extracted once AI retrieves your page?
Your site was built for humans. Humans who scroll, click around, read the bit that looks interesting, and skip the rest.
AI retrievers don’t do any of that. They fetch, extract, and leave. Whether they get anything useful depends almost entirely on how the page is structured, not how it looks.
AI retrievers read chunks, not pages
Dan Petrovic’s analysis of over 7,000 queries and 883,000 grounding snippets found that Google’s AI system uses an average grounding chunk of 15.5 words. The total grounding budget per query is roughly 2,000 words, shared across all sources, with the top result getting around 531 words and lower-ranked sources getting significantly less.
Those numbers are Google/Vertex architecture specifically. ChatGPT uses a different system: federated retrieval with fixed-size text windows (as Dan also confirmed). Perplexity uses Vespa.ai with its own chunking approach.
The specific budget figures don’t transfer, but the principle does. All RAG systems operate under context window constraints, and longer pages mean a lower percentage of content gets grounded, regardless of the platform. Dense pages in the 800- to 1,500-word range consistently outperform sprawling 4,000-word pages for AI grounding because more of the content falls within the extraction window.
Being retrievable matters, sure, but extraction quality is just as important. A page that ranks first but is poorly structured for extraction may contribute less to an AI answer than a lower-ranked page structured specifically for it.
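The page-length point is simple arithmetic. As a back-of-the-envelope sketch, assume a fixed per-source budget and ask what share of a page could fit inside it. The 531-word default below is the Google top-result figure from Petrovic’s analysis, used purely as an illustration. Other platforms and lower-ranked sources will have different budgets, and `grounded_share` is a made-up helper.

```python
def grounded_share(page_words, per_source_budget=531):
    """Upper bound on the fraction of a page that fits in the budget."""
    return min(1.0, per_source_budget / page_words)


for words in (800, 1500, 4000):
    print(words, f"{grounded_share(words):.0%}")
# 800 66%
# 1500 35%
# 4000 13%
```

Same budget, very different coverage: on these assumptions, an 800-word page can have about two-thirds of its content grounded, while a 4,000-word page tops out around an eighth.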
Answer capsules

Of all the content structure findings that have come out of AI citation research in the past year, this one is the most actionable. And the most ignored.
The single strongest structural signal for ChatGPT citation, per Adam Gnuse’s audit, is the answer capsule.
An answer capsule is a self-contained explanation of roughly 120 to 150 characters (about 20 to 25 words) placed directly after a question-based H2. It’s a direct answer to the heading’s question, written so it makes complete sense when extracted from its context.
Gnuse audited 15 domains that together generate nearly 2 million organic monthly sessions and 7,500 ChatGPT referral sessions. 72.4% of cited posts had an identifiable answer capsule.
Kevin Indig’s separate analysis of 1.2 million ChatGPT citations backs this up: 78.4% of citations containing questions came from H2 headings. ChatGPT treats the H2 as the user’s query and the immediately following paragraph as the answer.
One detail that gets skipped in most summaries of this research: the capsule needs to be link-free.
Throwing it back over to Gnuse, over 91% of cited capsules contained no internal or external links. A link inside the capsule tells the retrieval system the authoritative answer lives on another page. Place links in supporting paragraphs below the capsule. Keep the capsule clean.
Entity echoing is a related signal. Repeating the entity name from the H2 in the opening words of the capsule has a measurable effect. If the heading is “What is link building velocity?”, the capsule should open with “Link building velocity is…” rather than “It refers to…”
Applied to page structure:
Map H2s to genuine questions your reader would ask. Follow each with a 20-to-25-word capsule: self-contained, link-free, and entity-echoed. That capsule is the extractable unit. Everything after it is supporting depth.
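The three capsule rules above are easy to lint for. Below is a minimal sketch. The `check_capsule` function is hypothetical, and the way it pulls the entity out of the heading (stripping a leading “What is” and the like) is a crude assumption that will miss plenty of real headings.

```python
import re

# Leading question words to strip when guessing the entity (an assumption).
QUESTION_PREFIX = re.compile(
    r"^(what|who|how|why)\s+(is|are|does|do)\s+", flags=re.IGNORECASE
)
# Rough signs of a link: an anchor tag, a bare URL, or a markdown link.
LINK_PATTERN = re.compile(r"<a\s|https?://|\]\(", flags=re.IGNORECASE)


def check_capsule(heading, capsule):
    """Check a capsule against the word-count, link-free, and echo rules."""
    entity = QUESTION_PREFIX.sub("", heading.strip().rstrip("?"))
    word_count = len(capsule.split())
    return {
        "word_count_ok": 20 <= word_count <= 25,
        "link_free": LINK_PATTERN.search(capsule) is None,
        "entity_echoed": capsule.lower().startswith(entity.lower()),
    }


heading = "What is link building velocity?"
good = ("Link building velocity is the rate at which a site gains new "
        "backlinks over time, usually measured as links or referring "
        "domains per month.")
bad = "It refers to how fast you get links. See https://example.com for more."

print(check_capsule(heading, good))
# {'word_count_ok': True, 'link_free': True, 'entity_echoed': True}
print(check_capsule(heading, bad))
# {'word_count_ok': False, 'link_free': False, 'entity_echoed': False}
```

A check like this won’t tell you whether the capsule is a good answer, only whether it has the shape the cited capsules tend to have.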
Section length after the capsule matters too.
SE Ranking’s study of 129,000 domains found that ChatGPT favors sections of 120 to 180 words between headings, with 70% more citations than very short sections under 50 words. Short sections give the model nothing to extract beyond the capsule. Very long sections dilute the signal. Capsule first, then 120 to 180 words of substance.
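Section length can be checked the same way. This sketch buckets each section by word count using the thresholds from the SE Ranking figures above. The `classify_section` helper, the bucket labels, and the sample sections are hypothetical.

```python
def classify_section(body, very_short=50, low=120, high=180):
    """Bucket a section by the number of words between its headings."""
    count = len(body.split())
    if count < very_short:
        return "very short"
    if count < low:
        return "short"
    if count <= high:
        return "in range"
    return "long"


# Hypothetical sections, keyed by H2, with filler text standing in for copy.
sections = {
    "What is link building velocity?": "word " * 150,
    "Why does it matter?": "word " * 30,
    "How do you measure it?": "word " * 400,
}

for h2, body in sections.items():
    print(h2, "->", classify_section(body))
# What is link building velocity? -> in range
# Why does it matter? -> very short
# How do you measure it? -> long
```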
Answer near the top
Indig’s analysis also found that 44.2% of ChatGPT citations come from the first 30% of the page. Petrovic’s retrieval cap finding explains why: AI imposes a strict per-URL content limit, and position on the page determines selection rate.
Lead with the answer. The introduction frames the topic, the first major section delivers the clearest answer to the primary question, and depth follows.
Original data
52.2% of ChatGPT-cited posts in Gnuse’s audit featured either original data or owned insight. Original data is unique survey findings, performance benchmarks, and proprietary metrics. Owned insight is existing information framed explicitly as a brand position: “Based on our review of X client accounts, we recommend Y.”
Framing a claim as owned converts it from generic advice into something attributable to a specific source. That’s a prerequisite for citation.
Query fan-out coverage
Here’s something most SEOs building for AI visibility miss: AI doesn’t just answer the question you asked. It decomposes your query into a cluster of related sub-questions and retrieves across all of them simultaneously. We all know this as query fan-out.
If your page answers only the top-level question, you’re contributing to a single retrieval. But the pages that dominate AI citations answer the follow-on questions, too.
Surfer’s February 2026 study put numbers on this.
Pages ranking for the main query and its generated sub-queries are 49% more likely to be cited. Pages ranking for the main query plus multiple fan-out sub-queries: 161% more likely. When mapping content to a topic, identify the sub-questions and structure sections around them.
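One way to start that mapping is to list the sub-questions you expect and check which ones your current H2s already cover. The sketch below uses plain word overlap, which is a deliberately crude stand-in for how a retrieval system matches queries to content. The stopword list, the 0.75 threshold, the `coverage` helper, and the example queries are all assumptions.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "to", "for",
             "of", "in", "do", "does", "and", "or"}


def tokens(text):
    """Lowercased content words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def coverage(sub_queries, headings, threshold=0.75):
    """Map each sub-query to True if some heading shares enough of its words."""
    heading_tokens = [tokens(h) for h in headings]
    report = {}
    for query in sub_queries:
        query_tokens = tokens(query)
        if not query_tokens:
            report[query] = False
            continue
        best = max(
            (len(query_tokens & h) / len(query_tokens) for h in heading_tokens),
            default=0.0,
        )
        report[query] = best >= threshold
    return report


sub_queries = [
    "what is link building velocity",
    "how to measure link velocity",
    "does link velocity cause penalties",
]
headings = [
    "What is link building velocity?",
    "How do you measure link velocity?",
]

for query, covered in coverage(sub_queries, headings).items():
    print(f"{'covered' if covered else 'GAP':8} {query}")
# covered  what is link building velocity
# covered  how to measure link velocity
# GAP      does link velocity cause penalties
```

Each gap is a candidate for a new question-based H2 with its own capsule.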
Freshness, by platform
“Keep your pages fresh” is advice that sounds right until you look at what the platforms actually prefer (spoiler: they don’t all prefer the same thing).
Ahrefs’ analysis of 16.975 million cited URLs found that ChatGPT citations skew toward pages that are 458 days fresher than organic results. For pages where ChatGPT citation is the priority, regular content updates are worth the investment, though Ahrefs cautions that the average cited page is still 2.9 years old, so freshness is one factor among many.
For Google AI Mode: Shashko’s analysis of 42,971 citations found the median cited page is 2.2 years old. 52.8% of cited content is over 2 years old. AI Mode cites established, in-depth content that ChatGPT would likely overlook.
What AI crawlers actually receive
GPTBot executes zero JavaScript, according to Vercel and MERJ’s analysis. ClaudeBot fetches JavaScript 24% of the time but does not execute it. A separate analysis of 23 major AI crawlers found 69% cannot render JavaScript at all.
Google renders JavaScript, but AI crawlers, largely, do not.
If your content loads via JavaScript, single-page application rendering, or client-side injection, AI crawlers likely receive the page shell rather than the content. Check what GPTBot receives when it fetches your highest-priority pages. Server-side rendering or static HTML is the fix.
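A quick way to approximate that check: fetch the page without executing any JavaScript and see whether your key phrases are in the raw HTML. The sketch below does this with Python’s standard library. The user-agent string is a placeholder, not OpenAI’s exact GPTBot string, and some servers respond differently to the real crawler, so treat the result as a first pass. The helper names and sample pages are hypothetical.

```python
import re
import urllib.request
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text outside <script> and <style>; runs no JavaScript."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def raw_text(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip().lower()


def fetch_raw_html(url, user_agent="Mozilla/5.0 (compatible; GPTBot/1.0)",
                   timeout=10):
    """Download the server response as-is. The user agent is a placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_phrases(html, phrases):
    """Return the phrases that do not appear in the non-rendered text."""
    text = raw_text(html)
    return [p for p in phrases
            if re.sub(r"\s+", " ", p).strip().lower() not in text]


# Two hypothetical responses for the same page.
spa_shell = ('<html><body><div id="root"></div>'
             '<script src="/app.js"></script></body></html>')
ssr_page = ('<html><body><h2>What is link building velocity?</h2>'
            '<p>Link building velocity is the rate at which a site gains '
            'backlinks.</p></body></html>')
key_phrases = ["Link building velocity is the rate"]

print(missing_phrases(spa_shell, key_phrases))  # the phrase is missing
print(missing_phrases(ssr_page, key_phrases))   # []
```

For a live page, `missing_phrases(fetch_raw_html("https://www.example.com/page"), key_phrases)` would run the same check against the real server response. A phrase that spans inline tags may be reported missing even when it’s there, so eyeball any surprises.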
Page speed
Page speed is a separate variable. SE Ranking’s study of 129,000 domains found that pages with a First Contentful Paint under 0.4 seconds average 3x more ChatGPT citations than pages with FCP above 1.1 seconds.
The likely explanation is part crawl efficiency (slow pages time out before AI crawlers fully fetch them), part quality proxy. This isn’t a reason to rebuild your site. For pages where citation is the priority, load performance is worth auditing alongside content structure.
Where on-site work stops mattering
Everything in this piece so far is real and worth doing. It’s also just 15% of the picture.
AirOps’ analysis of 21,311 brand mentions across ChatGPT, Claude, and Perplexity found that 85% of brand mentions in AI search came from external domains. AI cites third-party sources 6.5 times more often than a brand’s own site. The on-site layer is necessary. It’s not where most of the citation opportunities live.
That’s not an argument against the work in this piece. It’s the argument for everything that comes after it.
Shashko’s analysis of 42,971 citations found that 74.7% of AI-cited URLs don’t appear in the organic top 10 at all, a finding independently confirmed by Ahrefs across 15,000 prompts. AI is pulling from a much broader surface than search rankings alone. That surface is where editorial placements, earned media, brand mentions, and third-party coverage do their work.
L1 and L4 get you resolvable and extractable. L2 and L3 get you recommended.
Starting the on-site work
Schema, answer capsules, freshness cadence, and JavaScript rendering: that’s the on-site checklist. But a brand with strong content optimization and thin entity depth holds its position only until the retrieval layer shifts. Then it doesn’t.
If you want to understand the full system before deciding where to start, the framework piece maps all four layers and how they connect.
We’re also building an AI Quick Scan, a diagnostic that runs your brand through the framework and surfaces where the gaps are. It’s coming. If you want to be first to know when it’s live, keep an eye on the AI visibility page.
Sources
- Adam Gnuse / Search Engine Land — “How to get cited by ChatGPT: The content traits LLMs quote most.” November 19, 2025. searchengineland.com/how-to-get-cited-by-chatgpt-the-content-traits-llms-quote-most-464868
- Kevin Indig / Growth Memo + Gauge — 1.2 million ChatGPT citation analysis. February 2026. growth-memo.com
- Louise Linehan, Xibeijia Guan / Ahrefs — “We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved.” May 11, 2026. ahrefs.com/blog/schema-ai-citations/
- Gianluca Fiorelli — “The Ahrefs schema study is right. And it’s testing the wrong thing.” I Love SEO, May 11, 2026. iloveseo.net/the-ahrefs-schema-study-is-right-and-its-testing-the-wrong-thing/
- Arlen Kumar, Leanid Palkhouski (UC Berkeley + Wrodium) — “AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO16 Framework.” September 13, 2025. arxiv.org/abs/2509.10762
- Dan Petrovic / Dejan — SRO Grounding Snippets analysis. dejan.ai/blog/sro-grounding-snippets/
- Kurt Fischman / Growth Marshal — Schema type citation rate study, 1,006 pages, 730 citations. February 2026. growthmarshal.io/field-notes/your-generic-schema-is-useless
- SE Ranking / Yulia Deda — AI citation factor analysis (ChatGPT + AI Mode): content structure specs, page speed, section length. Multiple studies 2025-2026. seranking.com/blog/ai-mode-research/ · seranking.com/blog/ai-statistics/
- Surfer / Michal Suski — Fan-out sub-query citation study. February 2026. surferseo.com/blog/query-fan-out-impact/
- Ahrefs — 16.975 million cited URL freshness analysis. July 2025.
- Daniel Shashko / Bright Data — 42,971 citation analysis, AI Mode freshness. March 2026.
- Mike King / iPullRank + Vercel + MERJ — AI crawler JavaScript rendering analysis. 500 million GPTBot fetches. ipullrank.com/cloaking-for-llms / vercel.com/blog/the-rise-of-the-ai-crawler
- AirOps — “The Influence of Offsite Signals in AI Search.” 21,311 brand mentions. airops.com/report/the-influence-of-offsite-signals-in-ai-search
- Ryan Law, Xibeijia Guan / Ahrefs — “New Study: AI Assistants Prefer to Cite ‘Fresher’ Content (17 Million Citations Analyzed).” July 28, 2025. ahrefs.com/blog/do-ai-assistants-prefer-to-cite-fresh-content/
- Daniel Shashko — “How Google Picks Which Sentences to Cite in AI Mode — Reverse-Engineering 42,971 Citations.” March 2026. hackmd.io/@A09fyOMpSD2VYIJodmXHqQ/r1eJyqthdbe
- Mike King / iPullRank — “Quick Tip: The Case for Cloaking for Large Language Models.” May 14, 2026. ipullrank.com/cloaking-for-llms
- Vercel + MERJ — “The Rise of the AI Crawler.” vercel.com/blog/the-rise-of-the-ai-crawler
- Louise Linehan / Ahrefs — “Only 12% of AI Cited URLs Rank in Google’s Top 10.” 2025. ahrefs.com/blog/ai-search-overlap/
- Oshen Davidson / AirOps — “Third-Party Sources Drive 85% of Brand Discovery.” October 17, 2025. 21,311 brand mentions across GPT-5, Claude, and Perplexity. airops.com/report/the-influence-of-offsite-signals-in-ai-search
Written by Brody Hall on June 3, 2026
Content Marketer and Writer at Loganix. Deeply passionate about creating and curating content that truly resonates with our audience. Always striving to deliver powerful insights that both empower and educate. Flying the Loganix flag high from Down Under on the Sunshine Coast, Australia.



