AI Recommendations Shift With Every Search: SparkToro

Summary
– AI tools like ChatGPT and Google AI Overviews produce a different list of brand recommendations nearly every time the same prompt is run, with less than a 1% chance of identical results.
– The research found that responses varied in the brands listed, their order, and the total number of items, meaning AI lacks reliable repeatability for recommendations.
– Despite users writing highly varied prompts for the same need, AI responses often drew from a consistent set of top brands in a given category, like Bose or Sony for headphones.
– The study questions the validity of tracking a single “AI ranking position,” suggesting that measuring how often a brand appears across many queries is more meaningful.
– The authors recommend that anyone considering AI visibility tracking tools should scrutinize the provider’s methodology due to the inherent variability in AI outputs.
When you ask an AI for a list of recommended brands, the answer you get is likely a one-time offer. New research reveals that AI tools like ChatGPT, Claude, and Google’s AI Overviews generate a different set of recommendations nearly every single time the same question is asked. This inherent variability challenges the very idea of a stable “AI ranking” and forces marketers to rethink how they measure visibility in an AI-driven search landscape.
A recent study conducted by SparkToro and Gumshoe.ai tested this consistency by running thousands of prompts. The team asked for brand recommendations in categories ranging from chef’s knives to cancer care hospitals, repeating each query dozens of times per platform. The results were striking. There was less than a one percent chance that ChatGPT or Google AI would return an identical list of brands across multiple attempts. Even when the same brands appeared, their order and the total number of recommendations shifted constantly.
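To make that repeatability claim concrete, here is a minimal sketch, not the study's actual code, of how the share of identical brand lists across repeated runs of one prompt might be measured. The sample responses below are hypothetical.

```python
from itertools import combinations

# Hypothetical brand lists returned by repeated runs of the same prompt;
# in practice these would be parsed out of the collected AI responses.
runs = [
    ["Bose", "Sony", "Sennheiser"],
    ["Sony", "Bose", "Apple", "Sennheiser"],
    ["Bose", "Sony", "Sennheiser"],
    ["Sennheiser", "Bose", "Sony", "Anker"],
]

# Fraction of run pairs that returned an identical list (same brands, same order).
pairs = list(combinations(runs, 2))
identical = sum(1 for a, b in pairs if a == b)
print(f"identical pairs: {identical}/{len(pairs)} ({identical / len(pairs):.0%})")
```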
Rand Fishkin of SparkToro, who led the research, summarized the core finding succinctly: if you ask an AI for recommendations a hundred times, almost every response will be unique. Claude demonstrated slightly more consistency in the brands listed but was just as unpredictable in how it ordered them. None of the major AI platforms came close to what the researchers would consider reliably repeatable output.
This “prompt variability problem” is compounded by how people naturally ask questions. In a separate test, 142 participants wrote their own prompts seeking headphone advice for a family member. Almost no two requests were alike; their semantic similarity scores were comparable to the similarity between Kung Pao chicken and peanut butter. Despite this diversity in phrasing, the AI responses converged on a relatively stable group of brands. Names like Bose, Sony, Sennheiser, and Apple appeared in the majority of the nearly one thousand headphone-related answers.
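For readers curious how prompt-to-prompt similarity can be scored, the sketch below illustrates one common approach: pairwise cosine similarity over sentence embeddings. It assumes the sentence-transformers package and its all-MiniLM-L6-v2 model, and the prompts are invented; the study's exact scoring method is not detailed here.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Invented examples of how different people might phrase the same headphone request.
prompts = [
    "What headphones should I buy my dad for his birthday?",
    "Recommend noise-cancelling headphones for a relative who travels a lot.",
    "Best over-ear headphones under $300 as a gift?",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(prompts, normalize_embeddings=True)

# With normalized embeddings, cosine similarity reduces to a dot product.
similarity = embeddings @ embeddings.T
print(np.round(similarity, 2))
```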
For professionals trying to track brand visibility, this presents a fundamental dilemma. The study argues that any tool claiming to provide a definitive “AI ranking position” is likely misleading. The metric that holds more water is how frequently a brand appears across a large number of query repetitions. In narrow, well-defined categories like cloud computing providers, top brands showed up in most responses. In broader fields like science fiction novels, the results were far more scattered and inconsistent.
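That frequency-of-appearance metric is straightforward to compute once enough repeated responses have been collected. The sketch below shows the idea with hypothetical data; it counts each brand once per response and reports its appearance rate.

```python
from collections import Counter

# Hypothetical responses: each inner list is the set of brands one AI answer named.
responses = [
    ["Bose", "Sony", "Sennheiser", "Apple"],
    ["Sony", "Bose", "Anker"],
    ["Bose", "Sennheiser", "Sony", "Apple", "JBL"],
]

# Count each brand at most once per response, then report how often it shows up.
mentions = Counter(brand for resp in responses for brand in set(resp))
for brand, count in mentions.most_common():
    print(f"{brand}: appears in {count / len(responses):.0%} of responses")
```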
These findings align with other industry reports. For instance, separate data showed that Google’s AI features cite different sources the vast majority of the time for the same query. The emerging pattern is clear: AI recommendations are inherently variable, whether you compare different platforms, different features on the same platform, or repeated attempts using the exact same prompt.
It is important to consider the study’s methodology. The research was a partnership with Gumshoe.ai, a company that sells AI tracking tools. Fishkin openly disclosed this and noted his initial hypothesis was that such tracking might be pointless. The team used volunteers operating their normal AI settings to capture real-world conditions, and they have published their full methodology and raw data publicly. The authors acknowledge this is not peer-reviewed academic research and encourage larger-scale follow-up studies.
Several practical questions remain unanswered. How many times must a prompt be run to gather reliable visibility data? Do automated API calls produce the same level of variation as manual prompts typed by users? For now, the research provides crucial guidance for anyone investing in AI analytics. Before spending money on an AI tracking service, businesses should demand that providers thoroughly explain their methodology and demonstrate how they account for this inherent randomness. The goal is to move beyond a single, fleeting answer and toward understanding a brand’s overall presence across the vast, shifting landscape of AI-generated responses.
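The study leaves the repetition question open, but one rough way to reason about it, under the simplifying assumption that each run is an independent draw (an assumption the study does not make), is the standard sample-size estimate for a proportion:

```python
import math

def runs_needed(p: float = 0.5, margin: float = 0.05, z: float = 1.96) -> int:
    """Runs needed to estimate an appearance rate within +/- margin at ~95% confidence,
    treating each run as an independent coin flip (a strong simplification)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(runs_needed())             # ~385 runs for +/-5 points in the worst case (p = 0.5)
print(runs_needed(margin=0.1))   # ~97 runs for +/-10 points
```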
(Source: Search Engine Journal)





