Entity Depth (L2): The Second Layer of AI Search Visibility
Layer 2 of the AI Visibility Framework
This is the second post in a series breaking down each layer of the AI Visibility Framework. Start with the overview “This Is How AI Visibility Actually Works” or the L1 breakdown if you haven’t read those.
Entity depth is Layer 2 because it’s the confidence layer. Once AI resolves that your entity exists (L1), it needs to decide how confidently it can describe you. Can it say “they specialize in X” or does it hedge with “they appear to offer X”?
That confidence comes from one main place: how many independent sources say consistent things about your brand. That means press coverage, earned media, brand mentions on authoritative domains, and consistent references across the web. The more the model has seen about you from credible sources, the more confidently it speaks about you.
Here’s the thing, though: a significant portion of that confidence is frozen.
The training layer is durable
Research published in February 2026 on the superficial alignment hypothesis suggests that what AI learns during pre-training is its most durable knowledge layer. Post-training (RLHF, instruction tuning, fine-tuning) largely doesn’t add new knowledge; it mostly surfaces what’s already there (Researchers, Feb 2026; arxiv.org/abs/2602.15829).
Now think about this practically. The model’s knowledge about your brand was determined by the content that existed at its training cutoff. Everything it “knows” about you from memory, without searching, comes from pre-training data. Post-training adjustments change how that knowledge is surfaced and formatted, but they don’t add new information. Your foundation has been set.
This means getting into training data is the most durable competitive advantage in AI visibility. It’s not something a competitor can easily replicate or override. Once the model has absorbed consistent, authoritative references to your brand from its training corpus, that knowledge persists across every conversation until the next training run.
The retrieval layer (L3, L4) can supplement weak training data by pulling real-time information. But it can’t replace it. When AI answers from memory, without searching, it’s using the training layer exclusively. That’s the difference between being a brand AI knows and being a brand AI has to look up every time. Build your house correctly, from the get-go.
What feeds the training layer
The training corpus for major LLMs includes the web as it existed at each model’s cutoff date. That means the content that builds L2 is content that appeared on authoritative domains before the model was trained.
So what are the inputs that feed entity depth through the training layer?
Press coverage and earned media. Journalism accounts for 20-30% of AI citations across all platforms in every period studied (@muckrack, Dec 2025). Press releases themselves are cited less often, but they catalyze earned media coverage, which AI then cites heavily. The release may not be the citation source; the earned coverage it generates is (Muck Rack, Aug 2025).
Brand mentions on authoritative domains. Brand web mentions show a 0.664 correlation with AI citation rate, stronger than brand search volume at 0.334 (@thedigitalbloom, 680M+ citations, 2025; corroborated independently by @Kevin_Indig and @ahrefs). Unlinked mentions matter here. In an LLM world, it’s not about one site linking to another. It’s about entities being mentioned alongside other entities and topics (similar to link proximity to keywords and topic clustering in the link-building world). The model maps entity-to-entity relationships from its training corpus as a means of verification (@bernardjhuang / Clearscope, Jan 2026).
Consistent information across multiple credible sources. If ten authoritative sources describe your brand the same way, the model absorbs that with high confidence. If different sources say different things, the model hedges. This is where L1 (entity consistency) directly strengthens L2: clean entity data across the web means the training data absorbed consistent signals. It’s like most women saying I’m highly attractive. Must be correct.
The Gemini problem (🙄)
Each platform handles the training layer differently, and Gemini is the hardest one… go figure.
For category queries (“best X in Y”), Gemini largely answers from training data with no cited URLs. Perplexity and ChatGPT, by contrast, retrieve and cite in real time, while Gemini pulls from what it already knows (we confirmed this via direct platform testing in Mar 2026, across hundreds of category queries and 10 verticals).
This means citation tracking on Gemini is the wrong success metric for L2 work. You can’t measure Gemini’s training-layer knowledge of your brand by counting citations; the brand either exists in Gemini’s training data or it doesn’t. Because of this, branded search lift and response accuracy look like better Gemini KPIs.
For brands that need Gemini visibility, press content serves that purpose through the training layer, not the retrieval layer. The placement value isn’t the citation. It’s getting the brand mention absorbed into training data so Gemini can confidently describe you from memory after its next training run.
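If you want to track response accuracy, here’s a minimal Python sketch, assuming you keep a short fact sheet for your brand and paste in each model’s answer by hand. The fact sheet, the brand name ExampleCo, and the function name `accuracy_score` are all hypothetical, and plain phrase matching is a crude proxy: it won’t catch a paraphrase, and it can’t tell when a correct phrase sits inside a wrong claim.

```python
def accuracy_score(answer: str, key_facts: dict[str, list[str]]) -> float:
    """Return the share of key facts that appear in a model's answer.

    key_facts maps a fact label to acceptable phrasings of that fact.
    A fact counts as covered if any phrasing appears in the answer
    (case-insensitive substring match).
    """
    if not key_facts:
        return 0.0
    text = answer.lower()
    covered = sum(
        1
        for phrasings in key_facts.values()
        if any(p.lower() in text for p in phrasings)
    )
    return covered / len(key_facts)


# Hypothetical fact sheet for a made-up brand, not real data.
facts = {
    "category": ["seo platform", "visibility platform"],
    "audience": ["agencies", "brands"],
    "founded": ["founded in 2015"],
}
answer = "ExampleCo is an SEO platform that serves agencies."
print(round(accuracy_score(answer, facts), 2))  # 0.67: two of three facts found
```

Tracked over time, per platform, a score like this gives you a trend line for Gemini that citation counts can’t.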
What happens when the model has never heard of you
This is the L2 failure state, and it’s more common than most brands realize. If your brand wasn’t mentioned on enough authoritative domains before a model’s training cutoff, the model has no training-layer knowledge of you. It doesn’t know what you do, who you serve, or why you matter. You’re a mystery.
When someone asks about you, the model has two options: search the web in real time (retrieval layer) or make something up (hallucination). Neither is good. Retrieval is noisy and inconsistent, and hallucination creates false information about your brand that the user takes as fact.
This is where @wilreynolds’ testing is relevant. He ran comparison prompts across agencies and found that when the model lacks strong training-layer depth on a brand, it produces “generic babble.” The descriptions are vague, interchangeable, and sometimes wrong (kinda sounds like Twitter). The model can’t distinguish you from competitors because it hasn’t seen enough training data to understand what makes you different. (S18 interference research, Wang & Sun, NYU/UVA, Jul 2025, supports this mechanistically: similar entities in the retrieval context degrade selection accuracy toward zero.)
Here’s a practical test you can run: ask ChatGPT, Gemini, Claude, and Perplexity, “What is [your brand]?” without web search enabled if possible. Compare what comes back. If the model hedges (“appears to be,” “seems to offer”), gives vague descriptions, or gets facts wrong, your L2 is weak. If it describes you confidently and accurately, your training-layer depth is strong. The gap between those two states is exactly what L2 work closes.
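To run that test across several platforms at once, a rough hedge detector helps. This is a minimal Python sketch, assuming you’ve already collected each platform’s answer as text; it doesn’t call any AI API. Only the first two hedge phrases come from the test above; the rest of the list, the brand ExampleCo, and the function names are assumptions to tune against real responses. It also only flags hedging, so vague or factually wrong answers still need a human read.

```python
import re

# The first two phrases come from the test described above; the rest
# are assumptions and should be tuned against real responses.
HEDGES = [
    "appears to be",
    "seems to offer",
    "appears to offer",
    "may be",
    "possibly",
    "i'm not sure",
    "i don't have",
]


def hedge_hits(response: str) -> list[str]:
    """Return the hedge phrases found in a response (case-insensitive, whole words)."""
    # Normalize curly apostrophes so "I’m not sure" matches "i'm not sure".
    text = response.lower().replace("\u2019", "'")
    return [h for h in HEDGES if re.search(r"\b" + re.escape(h) + r"\b", text)]


def l2_signal(responses: dict[str, str]) -> dict[str, str]:
    """Label each platform's answer 'hedged' or 'no hedging detected'."""
    return {
        platform: "hedged" if hedge_hits(text) else "no hedging detected"
        for platform, text in responses.items()
    }


# Hypothetical answers pasted in by hand (no API calls are made here).
responses = {
    "ChatGPT": "ExampleCo appears to be a marketing firm.",
    "Gemini": "ExampleCo is an SEO platform for agencies.",
}
print(l2_signal(responses))
```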
How L2 connects to the rest of the stack
L1 strengthens L2: Clean entity resolution means training data contains consistent signals about your brand. If your entity data is inconsistent across the web, the training data absorbed conflicting signals, and the model’s L2 confidence is lower.
L2 strengthens L3: When AI retrieves listicles and review content for category queries, it cross-references what it retrieves against what it already knows. A brand with strong training-layer depth gets higher confidence when it appears on a retrieved listicle. The model thinks, “I know this brand, AND I’m seeing it recommended here.” That compounds.
L2 and Mechanisms (K, T, R): Entity depth is powered primarily by the Training (T) mechanism. Press coverage, brand mentions, and authoritative references feed the training corpus. But Retrieval (R) contributes too. When AI searches in real time and finds consistent brand references across multiple sources, that reinforces entity depth dynamically. The training layer is the durable foundation. The retrieval layer can extend it in real time.
What to focus on
I’ll tackle each of these areas in upcoming articles, where we’ll look at where the rubber meets the road and the practical actions you can take. Here’s the short version of what L2 work looks like.
Press and earned media strategy
Press content feeds the most durable layer of AI visibility. But not all press is equal. AI engines actively filter press releases based on the content itself. ChatGPT uses 7 specific heuristics to distinguish press releases from editorial journalism, including checking for reporter bylines, critical questions, and external sources (direct platform testing, Mar 2026). Gemini rates press releases as “Low” trust, and it will literally tell you if it thinks a piece is promotional.
The value of press distribution is the syndication path, not the wire origin. Yahoo Finance’s /news/ path gets “Yes” citation status from ChatGPT and Gemini. The same content on a /press-releases/ path gets “Rarely” or lower (@loganix platform testing, Mar 2026). Where the content lands matters more than where it originates.
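If you’re auditing where your syndicated coverage landed, you can bucket placements by URL path. This Python sketch assumes the first path segment is what matters and hard-codes only the two Yahoo Finance paths from the testing above; the example URLs and the `PATH_STATUS` mapping are hypothetical, and other publishers will need their own rules.

```python
from urllib.parse import urlparse

# First path segment -> expected citation status. These two entries
# mirror the Yahoo Finance observation above; they are assumptions,
# not rules that carry over to every publisher.
PATH_STATUS = {
    "news": "likely cited",
    "press-releases": "rarely cited",
}


def placement_status(url: str) -> str:
    """Classify a syndicated placement by the first segment of its URL path."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    first = segments[0] if segments else ""
    return PATH_STATUS.get(first, "unknown")


# Hypothetical placement URLs for illustration.
placements = [
    "https://finance.yahoo.com/news/example-story.html",
    "https://finance.yahoo.com/press-releases/example-release.html",
    "https://example.com/blog/post",
]
for url in placements:
    print(placement_status(url), url)
```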
Brand mention acquisition
Brand mentions correlate more strongly with AI citation (0.664) than brand search volume does (0.334). That suggests building mentions across authoritative domains is higher-value than driving branded search (@thedigitalbloom, 2025). Guest posts, contributed articles, podcast appearances, industry reports that name your brand, and partner content that references you all feed L2 through the training layer.
Brands are 6.5x more likely to be cited through third-party sources than their own domains for discovery queries (@AirOpsHQ, 21,311 brand mentions across ChatGPT, Claude, and Perplexity). The foundation for AI visibility at L2 is off-site presence, not your own website.
This is part of why link-building bros are saying “GEO is just SEO.” That’s true to an extent, but some layers apply only to AI visibility and aren’t captured simply by landing a brand mention or a link.
Consistent narrative across sources
It’s not enough to be mentioned. The mentions need to say consistent things. If one publication describes you as “a link-building agency” and another says “an AI visibility platform,” and your LinkedIn says “a digital marketing company,” the training data absorbed three different brand identities. The model’s confidence in describing you drops.
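A quick way to put a number on narrative consistency is to collect how each source describes you and measure how dominant the most common description is. This Python sketch uses the three descriptions from the example above; the source names are hypothetical, and it only matches descriptors exactly after trimming and lowercasing, so near-synonyms like “SEO agency” and “search agency” would count as different identities.

```python
from collections import Counter


def narrative_consistency(descriptors: dict[str, str]) -> tuple[str, float]:
    """Return the most common brand descriptor and the share of sources using it.

    descriptors maps a source name to how that source describes the brand.
    Matching is exact after trimming and lowercasing; synonyms and
    paraphrases are out of scope for this sketch.
    """
    if not descriptors:
        raise ValueError("need at least one source")
    counts = Counter(d.strip().lower() for d in descriptors.values())
    top, n = counts.most_common(1)[0]
    return top, n / len(descriptors)


# The three descriptions from the example above; source names are hypothetical.
sources = {
    "Publication A": "a link-building agency",
    "Publication B": "an AI visibility platform",
    "LinkedIn": "a digital marketing company",
}
top, share = narrative_consistency(sources)
print(f"{share:.0%} of sources agree on: {top}")
```

A share near 100% means the web tells one story about you; here every source tells a different one, so no single descriptor gets more than a third.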
This is the entity depth version of the entity consistency problem from L1. L1 is about structured data matching. L2 is about narrative matching across unstructured sources.
Training data gap assessment
If you want to see where the gaps may be, start by asking: Does the model know my brand at all? Ask ChatGPT, Gemini, Claude, and Perplexity, “What is [your brand]?” with web search disabled (if possible). The response tells you your current L2 state. If the model hedges, qualifies, or makes errors, your training-layer depth is weak.
What comes next
Entity depth is the layer that determines what AI “knows” about your brand from memory. It’s the most durable competitive advantage because it persists across every conversation and can’t be easily overridden by competitors. But it’s also the slowest to build, because it requires earning consistent mentions across authoritative sources over time.
The brands that invest in L2 now are building an asset that compounds. The brands that skip it are relying entirely on the retrieval layer (L3, L4), which is noisier, less consistent, and can be disrupted by anyone with better real-time content.
Read Layer 3, Category Citation, next.
This article was originally published on X by Aaron Haynes. Aaron is the CEO of Loganix, a visibility + SEO platform for brands and agencies.
Sources referenced in this post:
Researchers, Feb 2026. “Operationalising the Superficial Alignment Hypothesis via Task Complexity.” arxiv.org/abs/2602.15829
Muck Rack, Dec 2025 + Aug 2025. Journalism = 20-30% of AI citations. Press release downstream impact.
The Digital Bloom, 2025. 680M+ citations. Brand mentions (0.664) vs brand search volume (0.334) correlation.
Bernard Huang / Clearscope, Jan 2026. Entity co-occurrence > hyperlinks for training layer.
AirOps, 2025. 21,311 brand mentions. Brands 6.5x more likely cited through third-party sources.
Direct platform testing, Mar 2026. Gemini training data behavior, ChatGPT 7 press heuristics, URL path trust.
Written by Aaron Haynes on March 25, 2026
As CEO and partner at Loganix, I believe in taking what you do best and sharing it with the world in the most transparent and powerful way possible. If I am not running the business, I am neck deep in client SEO.



