LLM Guidance Fails to Transfer Like SEO Guidance Did

Summary
– For two decades, SEO guidance was portable across major search engines because of shared standards like Sitemaps, Schema.org, and robots.txt, built through collaboration between Google, Bing, Yahoo, and others.
– In the LLM landscape, guidance does not port across providers because they use different training data, crawler infrastructures, retrieval systems, and alignment processes, making each platform distinct.
– Google’s own AI surfaces (Search, AI Overviews, and AI Mode) now produce diverging citation results: only 38% of pages cited in AI Overviews appear in Google’s top 10, and only 14% of AI Mode citations rank in the traditional search top 10.
– The proposed llms.txt standard has failed because no major LLM provider supports it, unlike Schema.org, which succeeded due to joint development and enforcement by multiple engines.
– Only 11% of cited domains appear across multiple major LLM platforms, meaning most visibility is platform-specific, requiring practitioners to test and optimize separately for each provider rather than relying on portability.
For roughly twenty years, the SEO industry operated under a quiet but powerful assumption: guidance from one major search engine was largely interchangeable with guidance from another. If Google emphasized sitemaps, Bing soon followed. If Bing pushed structured data, Google echoed the sentiment. Practitioners could optimize for Google with reasonable confidence that their work would carry over to other engines. That portability was no accident. It was the direct result of a shared structural layer built collaboratively by the major search engines over two decades.
That world has vanished in the era of large language models (LLMs). Today’s major AI providers train on different datasets, operate different crawlers under different policies, route queries through distinct retrieval systems, and apply unique alignment processes that fundamentally shape their outputs. Guidance from any single provider, including Google’s own advice for its Gemini products, is just one data point. Carrying forward the old SEO habit of treating one engine’s guidance as a universal map will lead to optimizing for one platform while missing the others entirely.
The Shared Standards That Made SEO Guidance Portable
The portability of SEO guidance was built on genuine collaboration, not coincidence. The Sitemaps protocol became joint property of Google, Yahoo, and Microsoft in November 2006. Five years later, on June 2, 2011, the same three engines launched Schema.org, with Yandex joining shortly after, to create a common vocabulary for structured data markup. I was on the Bing team when that announcement was made at SMX Advanced. What struck me then still matters now: the engines were competitors, but they recognized that a shared vocabulary served everyone. Webmasters got one set of rules. The web got cleaner data. The engines got better signals. Everybody won.
This pattern repeated with robots.txt, the 1994 convention that became RFC 9309 at the IETF in 2022. And it repeated again with IndexNow, the protocol Microsoft Bing and Yandex launched in October 2021, now supported by Bing, Yandex, Naver, Seznam, and Yep. Google has tested IndexNow since 2021 but has not adopted it.
That overlapping layer is exactly why Google’s guidance felt safe to follow, even if you cared about Bing traffic. The signals were not identical, but the inputs, protocols, and standards were shared. Optimization had a shared substrate.
Where the LLM Stacks Actually Diverge
The LLM environment lacks a comparable shared substrate. The differences are not cosmetic or temporary. They are baked into how the systems are built.
Start with training data. OpenAI has signed disclosed licensing deals with News Corp (worth up to $250 million over five years), Axel Springer (roughly $13 million per year), Reddit (estimated at $70 million per year), plus the Financial Times, Condé Nast, Hearst, Vox Media, The Atlantic, the Associated Press, Le Monde, and others. Google has its own Reddit deal, estimated at $60 million per year, granting real-time data API access. Anthropic has not publicly disclosed equivalent publisher licensing deals. Practitioners cannot know what any given provider has paid for and what it hasn’t.
The crawler infrastructure diverges next. OpenAI runs three separate bots: GPTBot for training, OAI-SearchBot for search indexing, and ChatGPT-User for user-initiated retrieval. Anthropic runs three of its own: ClaudeBot for training, Claude-SearchBot for search, and Claude-User for user-initiated retrieval. Perplexity runs PerplexityBot and Perplexity-User. Google introduced Google-Extended in September 2023 as the user-agent controlling whether Google can use a site’s content to train Gemini, completely separate from the Googlebot that handles traditional search indexing. There is no single AI user-agent. Every provider requires a separate rule, and the rules don’t translate cleanly.
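Since every provider needs its own rule, one way to see the effect is to test a robots.txt file against each user-agent token named above. The following is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content and the example URL are hypothetical, and the result shows only what the file permits, not whether a given crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the paragraph above.
AI_USER_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Googlebot",
]

# Hypothetical policy: block the training crawlers, allow everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def access_report(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each AI user-agent token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


# Hypothetical page used only for illustration.
report = access_report(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
for agent, allowed in report.items():
    print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

In this sample policy, blocking GPTBot leaves OAI-SearchBot and ChatGPT-User untouched, which is the practical meaning of rules that don’t translate cleanly: a decision about training access says nothing about search or user-initiated retrieval.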
The retrieval architectures diverge structurally. ChatGPT has historically used Bing’s index as its primary web search source. Perplexity built its retrieval system on a Vespa-based pipeline treating documents and sub-document chunks as first-class retrievable units. Google’s Gemini uses Google’s own index plus Knowledge Graph grounding. Claude uses Brave Search as a retrieval partner. Same query, four different retrieval systems, four different views of which sources exist.
Then comes the alignment layer, which had no equivalent in SEO. After a model is trained, providers run post-training to shape behavior: tone, refusal patterns, format, safety posture. OpenAI’s primary approach has been RLHF (Reinforcement Learning from Human Feedback). Anthropic developed Constitutional AI, which trains models to critique their own outputs against written principles. These methodologies produce demonstrably different behavior. The same retrieved content, fed into two models aligned by two methodologies, can yield two materially different responses about the same brand.
When One Provider’s Guidance Fails to Port
The clearest example of guidance that doesn’t port is llms.txt. Jeremy Howard of Answer.AI proposed the file in September 2024 as a markdown manifest placed at a site’s root to guide LLMs to important content. The SEO community picked it up quickly. Yoast built a generator. Agencies added llms.txt creation to their service catalogs. Conference speakers declared it essential.
As of mid-2026, no major LLM provider has confirmed they consume the file. Not OpenAI. Not Anthropic. Not Google. Server-log analyses across hundreds of thousands of domains show major AI crawlers don’t routinely request /llms.txt at all. Google’s John Mueller publicly compared it to the deprecated meta keywords tag. Gary Illyes confirmed at Search Central Live in July 2025 that Google does not support llms.txt and is not planning to.
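The server-log check described above is something any site owner can repeat on their own logs. Here is a rough sketch, assuming access logs in the common "combined" format and simple substring matching on crawler tokens. The log lines and version strings are invented for illustration (including one made-up /llms.txt request so the counter has something to count), and the analyses cited above may have used different methods.

```python
import re
from collections import Counter

# Crawler tokens from the article; matched as case-insensitive substrings.
AI_CRAWLER_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]

# Combined log format: "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def llms_txt_requests(log_lines):
    """Return (all requests, /llms.txt requests) counted per AI crawler token."""
    total, llms = Counter(), Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        user_agent = match.group("ua").lower()
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                total[token] += 1
                if path == "/llms.txt":
                    llms[token] += 1
                break
    return total, llms


# Invented sample lines; user-agent strings are simplified placeholders.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "GPTBot/1.2"',
    '203.0.113.5 - - [10/Mar/2026:10:00:02 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "GPTBot/1.2"',
    '198.51.100.7 - - [10/Mar/2026:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ClaudeBot/1.0"',
    '192.0.2.9 - - [10/Mar/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "PerplexityBot/1.0"',
    '192.0.2.44 - - [10/Mar/2026:10:03:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

total, llms = llms_txt_requests(SAMPLE_LOG)
for token in AI_CRAWLER_TOKENS:
    if total[token]:
        print(f"{token:16} requests={total[token]} llms_txt={llms[token]}")
```

Comparing the two counters per crawler shows whether a bot that visits the site regularly ever asks for the file at all, which is a more direct signal than any vendor statement.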
The structural lesson is clear. Schema.org succeeded because three engines built it together and enforced it together. By contrast, llms.txt was proposed by one researcher, picked up by tooling vendors, and ignored by the platforms it was supposed to serve. The shared-standards model that gave SEO its portable guidance is not available to LLM practitioners at the same scale.
The Gemini Inversion
The cleanest illustration of degraded guidance portability sits inside one company. Google publishes its own SEO documentation at Search Central, emphasizing traditional ranking signals, E-E-A-T, content quality, technical accessibility, and structured data. That guidance is still useful for Google Search itself.
Google also makes Gemini, the model powering AI Overviews and Google’s separate AI Mode surface. The citation behavior of those surfaces does not appear to track the guidance the same company publishes for its own search results.
In late 2024, roughly three-quarters of pages cited in AI Overviews also ranked in Google’s top 10 for the same query. By early 2026, after Google upgraded AI Overviews to Gemini 3 in January, Ahrefs analyzed 4 million AI Overview URLs and found that only 38% of cited pages also appeared in the top 10. A separate BrightEdge analysis put the overlap closer to 17%. SE Ranking found that Gemini 3 replaced approximately 42% of the domains previously cited and generates 32% more sources per response.
The gap widens with Google’s AI Mode, a separate conversational surface running on the same Gemini family. Semrush data shows AI Mode and AI Overviews reach semantically similar conclusions 86% of the time, but cite the same URLs only 13.7% of the time. Only 14% of AI Mode citations rank in Google’s traditional top 10.
The canonical relationship has shifted. Google’s published SEO guidance is still the cleanest path to ranking in Google Search. But that ranking is no longer a reliable proxy for being cited by Google’s own AI surfaces. The same content can produce three meaningfully different outcomes across Google Search, AI Overviews, and AI Mode.
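The two kinds of figures quoted in this section, the share of citations that also rank in the top 10 and the share of URLs two surfaces cite in common, can be computed for a single query along the following lines. This is a sketch under assumptions: the function names and all URLs are hypothetical, and Jaccard overlap is only one plausible way to define "cite the same URLs"; the Ahrefs and Semrush studies may define their metrics differently.

```python
def top_n_overlap(cited_urls: list[str], ranked_urls: list[str], n: int = 10) -> float:
    """Share of distinct cited URLs that also appear in the first n ranked results."""
    unique_cited = set(cited_urls)
    if not unique_cited:
        return 0.0
    return len(unique_cited & set(ranked_urls[:n])) / len(unique_cited)


def citation_overlap(cited_a: list[str], cited_b: list[str]) -> float:
    """Jaccard overlap between the URL sets cited by two surfaces."""
    set_a, set_b = set(cited_a), set(cited_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical data for one query: 20 organic results and two citation lists.
ranked = [f"https://site{i}.example/" for i in range(1, 21)]
overview_cites = [
    "https://site2.example/",
    "https://site7.example/",
    "https://site15.example/",   # ranks, but outside the top 10
    "https://forum.example/",    # does not rank at all
]
ai_mode_cites = [
    "https://site7.example/",
    "https://blog.example/",
    "https://docs.example/",
]

overview_top10 = top_n_overlap(overview_cites, ranked)
ai_mode_top10 = top_n_overlap(ai_mode_cites, ranked)
shared = citation_overlap(overview_cites, ai_mode_cites)
print(f"AI Overviews citations in top 10: {overview_top10:.0%}")
print(f"AI Mode citations in top 10:      {ai_mode_top10:.0%}")
print(f"URLs cited by both surfaces:      {shared:.0%}")
```

Tracking these ratios per query over time is one way to notice when ranking stops being a proxy for citation on a particular surface.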
What Still Ports, and Why It’s Smaller Than It Looks
A universal layer does survive. Crawler accessibility still matters across every provider. Primary-source factual content still wins more citations than aggregator restatement. Clean retrievable structure still helps every system understand what a page is about. Presence on high-authority sources that all major LLMs disproportionately cite (Wikipedia, YouTube, Reddit, major news outlets) still functions as a force multiplier across platforms.
But the universal layer is much smaller than it was in the SEO era. Qwairy’s analysis of 118,000 AI responses across ChatGPT, Perplexity, Google AI Mode, and Claude found that only 11% of cited domains appeared across multiple platforms. The other 89% were platform-specific. A brand that wins citations on Perplexity may be largely invisible on Claude. The same content can be the right answer for one system and the wrong answer for the system next to it.
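A practitioner can estimate the same kind of figure for their own category by collecting cited domains per platform and counting how many show up on more than one. The sketch below assumes "appeared across multiple platforms" means cited by at least two platforms; Qwairy’s exact methodology is not described here, and the domains and platform keys are placeholders.

```python
from collections import Counter


def cross_platform_share(citations: dict[str, set[str]], min_platforms: int = 2) -> float:
    """Share of distinct cited domains that appear on at least min_platforms platforms."""
    platform_counts = Counter(
        domain for domains in citations.values() for domain in domains
    )
    if not platform_counts:
        return 0.0
    shared = sum(1 for count in platform_counts.values() if count >= min_platforms)
    return shared / len(platform_counts)


# Placeholder data: cited domains observed per platform for a set of prompts.
observed = {
    "chatgpt": {"wikipedia.org", "a.example", "b.example"},
    "perplexity": {"wikipedia.org", "c.example", "d.example"},
    "claude": {"e.example", "f.example", "h.example"},
    "google_ai_mode": {"wikipedia.org", "reddit.com", "g.example"},
}

share = cross_platform_share(observed)
print(f"Domains cited on 2+ platforms: {share:.0%}")
```

Raising `min_platforms` to the number of platforms tracked gives the stricter question of which domains are cited everywhere, which is the closest analogue to the old portable layer.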
What This Means for the Work
The practical implication is not to abandon hope. It is that practitioners need to stop treating any single LLM provider’s guidance as a universal map and start treating it as one input among several. Read what every major provider publishes about its own systems. Test your visibility across platforms, not just the one you happen to use most. Treat divergence as the default and overlap as the exception.
This is not how SEO worked. The old reflex was to optimize for Google and trust the portability. The new reality is that following one LLM’s guidance, even Google’s guidance about Gemini, will leave you optimized for a slice of the landscape and potentially blind to the rest. The discipline is being rebuilt on platform-specific work that didn’t exist in the SEO era. The practitioners who recognize that first are going to spend the next two years setting the standards everyone else follows.
The overlap has shrunk. You now have more work than ever to accomplish.
(Source: Search Engine Journal)




