AI Search Engines Prefer Obscure Sources, Study Reveals

Summary
– AI-powered search engines like Google’s AI Overviews and Gemini-2.5-Flash cite less popular websites compared to traditional search results.
– Researchers analyzed search results using queries from sources such as WildChat, AllSides, and Amazon’s most-searched products.
– AI search results often source from domains outside the top 1,000 and even top 1,000,000 in popularity rankings.
– Gemini search in particular frequently cites unpopular domains, with the median source falling outside the top 1,000.
– Over half of AI Overviews’ cited sources do not appear in the top 10 Google links, and 40% are not in the top 100.
The recent introduction of AI-powered search tools has dramatically shifted how we find information online, moving beyond the familiar list of blue links to summarized answers that pull from a wider range of websites. A new study reveals that these generative search engines frequently draw from less popular and more obscure sources compared to traditional search results. Researchers discovered that a significant portion of the websites cited by AI tools like Google’s AI Overviews would not even rank within the top 100 results of a standard Google search.
This investigation, detailed in the pre-print paper “Characterizing Web Search in The Age of Generative AI,” was conducted by academics from Ruhr University in Bochum, Germany, and the Max Planck Institute for Software Systems. They performed a direct comparison between conventional Google search listings and the outputs from its AI Overviews and Gemini-2.5-Flash. The analysis also extended to OpenAI’s GPT-4o, examining both its built-in web search mode and the separate “GPT-4o with Search Tool,” which queries the internet only when the language model determines its internal knowledge is insufficient.
To ensure a robust and varied dataset, the research team gathered test queries from multiple origins. These included specific user questions submitted to ChatGPT from the WildChat dataset, broad political topics cataloged by AllSides, and popular products drawn from the list of the 100 most-searched items on Amazon.
The findings were telling. When measured by the domain-ranking service Tranco, the sources referenced by generative search tools were consistently from less-trafficked websites than those appearing in the top ten results of a traditional search. The AI engines demonstrated a clear propensity to cite domains that fall well outside Tranco’s rankings for the top 1,000 and even the top 1,000,000 most popular sites. Gemini search showed a particular tendency to cite unpopular domains, with the median source across all its results landing outside the top 1,000 domains tracked by Tranco.
Furthermore, the sources highlighted by AI-powered search engines were often completely absent from the upper echelons of standard organic search results. For example, a striking 53 percent of the sources cited by Google’s AI Overviews did not appear within the top 10 Google links for an identical query. Even more notably, 40 percent of those cited sources failed to rank within the top 100 links returned by a conventional Google search, underscoring the fundamental difference in how these AI systems source and present information.
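To make the overlap figures concrete, here is a minimal Python sketch of how a "share of cited sources outside the top k organic links" could be computed for a single query. It is an illustration only, not the researchers' code: the function names `domain_of` and `share_outside_top_k`, the choice to match on domain rather than full URL, and the example URLs are all assumptions made for this sketch.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def share_outside_top_k(cited_urls, organic_urls, k):
    """Fraction of cited sources whose domain is absent from the top-k organic links.

    Assumption: matching is done by domain; the study may have matched differently.
    """
    top_domains = {domain_of(u) for u in organic_urls[:k]}
    cited = [domain_of(u) for u in cited_urls]
    if not cited:
        return 0.0
    missing = sum(1 for d in cited if d not in top_domains)
    return missing / len(cited)


# Made-up example data, not taken from the study.
organic = [
    "https://www.example.com/a",
    "https://news.example.org/b",
    "https://shop.example.net/c",
]
cited = [
    "https://example.com/x",
    "https://obscure.example.io/y",
    "https://shop.example.net/z",
    "https://tiny.example.dev/w",
]

print(share_outside_top_k(cited, organic, k=2))  # 0.75
print(share_outside_top_k(cited, organic, k=3))  # 0.5
```

Averaging such a value over many queries with k set to 10 and 100 would give figures of the same kind as the 53 percent and 40 percent reported above, though the study's exact matching rules may differ.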
(Source: Ars Technica)





