
Study: ChatGPT citations heavily favor select domains

Summary

– ChatGPT citations are highly concentrated, with roughly 30 domains capturing 67% of citations within a topic.
– Ranking first in Google increases citation likelihood, but only 43.2% of top-ranked pages were cited, so it is not a guarantee.
– ChatGPT retrieves about six times more pages than it cites, and a third of cited pages come from queries with no search volume.
– Longer pages generally earn more citations, though this pattern varies by industry, such as in Finance where shorter pages can outperform.
– Citations are heavily drawn from the upper sections of a page, with the 10% to 20% portion performing best across most industries.

A new analysis reveals that ChatGPT citations are heavily concentrated among a small group of websites, creating a challenging environment for content visibility. Kevin Indig’s study found that roughly 30 domains capture about two-thirds of citations within a given topic, indicating a highly centralized citation distribution. This pattern suggests that achieving visibility in AI-generated answers requires significant domain authority to secure one of a limited number of available “seats.”

While AI visibility is slightly less concentrated than traditional organic search results, the disparity remains stark. In product comparison topics, for instance, the top 10 domains accounted for 46% of all citations, with the top 30 domains representing 67%. Earning the top spot on Google remains valuable, as pages ranking number one were cited 3.5 times more often than those beyond the top 20. However, even the top ranking is no guarantee: only 43.2% of number-one pages received a citation from ChatGPT.
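To make the concentration figures concrete, here is a minimal Python sketch of how a "top N domains" share could be computed from a list of cited URLs. It is not the study's code: the function name `top_n_share`, the choice to group by hostname, and the sample URLs are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def top_n_share(cited_urls, n):
    """Return the fraction of all citations captured by the n most-cited domains."""
    # Count citations per hostname, treating "www.example.com" and
    # "example.com" as the same domain (an assumption of this sketch).
    domains = Counter()
    for url in cited_urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        domains[host] += 1

    total = sum(domains.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in domains.most_common(n))
    return top / total


# Hypothetical sample: 8 citations spread over 5 domains.
sample_urls = [
    "https://a.example/x",
    "https://a.example/y",
    "https://www.a.example/z",
    "https://b.example/1",
    "https://b.example/2",
    "https://c.example/1",
    "https://d.example/1",
    "https://e.example/1",
]
share_top_1 = top_n_share(sample_urls, 1)
share_top_2 = top_n_share(sample_urls, 2)
```

Applied to a real citation export, a result of 0.46 for `n=10` and 0.67 for `n=30` would correspond to the product comparison figures reported above.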

The research uncovered a critical gap between retrieval and citation. AI models like ChatGPT retrieve far more web pages than they ultimately reference. Data from AirOps shows the model retrieved approximately six times as many pages as it cited, with 85% of retrieved pages never being used. Furthermore, a significant portion of citations originate from fan-out queries (searches that explore tangential aspects of a topic). Notably, 95% of these queries had zero measurable search volume, which means discovery often occurs outside conventional keyword tracking.

This shift demands a new content strategy. Simply publishing the “best answer” for a single keyword is insufficient. ChatGPT consistently rewards domains that provide comprehensive topical coverage from multiple angles, favoring cluster-based content models over isolated pages. The data shows that long-form content generally earns more citations, with the most substantial lift occurring between 5,000 and 10,000 characters. Pages exceeding 20,000 characters averaged over 10 citations, compared to just 2.39 for pages under 500 characters.

However, this length preference varies by industry. The pattern held strong in Education, Crypto, and Product Analytics, where longer pages continued to gain value. In Finance, the trend reversed, with shorter, denser pages often outperforming lengthy guides. Most cited URLs (58%) were referenced only once, while pages that recurred across multiple prompts were typically broad category roundups, comparison pages, or definitive guides answering a suite of related questions.

On-page analysis shows that where content sits on a page matters. ChatGPT draws heavily from the upper portion of a page, with the segment between the 10% and 20% mark performing best across most verticals. The bottom 10% of a page earned a minimal share of citations, between 2.4% and 4.4%, with concluding sections largely ignored. Industry-specific patterns emerged: Finance citations were highly concentrated, with 43.7% coming from the first 30% of a page, while Healthcare and HR Tech showed a flatter distribution. Education content peaked later, with the highest citation density occurring between the 30% and 40% section.
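The positional figures above can be read as shares per tenth of a page. The Python sketch below shows one way such buckets might be computed, assuming each citation is recorded as a character offset plus the total page length. That data shape and the names `decile` and `decile_shares` are hypothetical; the study's actual positional mapping is not described in detail.

```python
def decile(offset, page_length):
    """Map a character offset to a decile index 0..9 (0 = first 10% of the page)."""
    if page_length <= 0 or not 0 <= offset < page_length:
        raise ValueError("offset must fall inside the page")
    return offset * 10 // page_length


def decile_shares(citations):
    """Return ten shares, one per page decile.

    citations: iterable of (offset, page_length) pairs (assumed format).
    """
    counts = [0] * 10
    for offset, page_length in citations:
        counts[decile(offset, page_length)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical sample: four cited passages on pages of different lengths.
sample = [(1500, 10000), (1200, 10000), (9500, 10000), (300, 1000)]
shares = decile_shares(sample)
```

In this sample, `shares[1]` covers the 10% to 20% segment and `shares[9]` covers the bottom 10% of the page.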

The study’s methodology involved analyzing approximately 98,000 citation rows from 1.2 million ChatGPT responses. Researchers used structural page parsing and positional mapping alongside entity and sentiment analysis to identify which pages earned citations and pinpoint the exact content segments being referenced.

(Source: Search Engine Land)
