Measuring AI Visibility After Precision: The Micro-Macro Shift

Summary
– The Funnel Query Pathway (FQP) is a macro measurement framework for AI visibility, tracking brand performance across search, assistive, and agent modes by measuring trend over time rather than precision.
– Micro measurement instruments (like ranking and CTR) fail in AI due to brand-user-algorithm (BUA) opacity, which makes the reasoning behind recommendations invisible to brands.
– The three coexisting modes—search (micro), assistive (macro), and agent (mix of both)—each require different measurement approaches; assistive is the most opaque and relies solely on macro discipline.
– The FQP is structured as an “orchard” of trees, each with a trunk (BOFU), branches (MOFU), and twigs (TOFU), where KPIs like brand appearance, sentiment, and accuracy are measured at each layer.
– Strategic decisions should be based on quarterly macro trend data from the FQP, not monthly micro signals, as the methodology compounds into defensible insights over eight quarters.
The funnel query pathway (FQP), the cohort-with-intent tree you build from the conversion node upward, now serves as the core measurement framework for AI visibility. Running this analysis quarterly yields a defensible strategic read that actually informs action.
This methodology operationalizes what I call the micro-macro shift. You can no longer rely on precision micro instruments like rankings, which search trained us to expect. Assistive engines and agents are too opaque for that level of granularity. Macro measurement is the only available discipline.
Why the precision we once took for granted no longer works
The same microeconomics-versus-macroeconomics distinction applies here: corner shop versus Bank of England, micro instruments versus macro instinct. Neither tool set works in the other’s environment.
AI-era visibility exists in the kind of macro environment that forced economics to develop a different measurement discipline, and our industry must do the same. We operated at a micro scale with ranking and tracking, but those search-era instruments don’t apply in AI. Microeconomics versus macroeconomics is the canonical parallel.
The structural property at play is brand-user-algorithm (BUA) opacity. Four layers of opacity affect every AI-era brand recommendation, and the brand has no visible signal at any of them:
- The brand is opaque to the engine inside the walled garden.
The conversion rate softens, and the brand can’t see which contradiction caused it. BUA opacity is why micro-instruments fail on assistive and agential surfaces. You can’t change that opacity. It’s the environment you’re working in, and my methodology projects through it at the macro level, delivering trend rather than precision and accepting that the right answer holds up over time rather than being exact in the moment.
Where micro measurement still works, and where macro takes over
Micro and macro coexist. Three modes operate in parallel in 2026:
- Search (essentially micro) hasn’t gone anywhere. It’s growing.
Each mode has its own measurement environment, and the right approach depends on the data that environment can supply.
Search keeps the user in control. The user types a query, the engine returns 10 options, and the user picks one. The brand can see the query, observe the position, measure the click, track the session, and attribute the conversion. Micro instruments work because the environment supports them. Brands operating with search-era buyers on search-era surfaces should keep running micro strategies for those buyers. The way you measure search doesn’t change, unless you want to add a macro methodology, which I personally think is a good idea.
Assistive narrows the choice at the user’s request. The user asks ChatGPT, Perplexity, Claude, Gemini, or Copilot for a recommendation, and the engine retrieves, synthesizes, and commits to one or two options on the user’s behalf. The brand doesn’t see the series of exchanges, the retrieval, the synthesis, or the alternatives the engine considered before committing. You can see the conversion, but you can’t attribute it explicitly. The entire journey runs inside walled gardens where micro instruments can’t reach you. Assistive is the most elusive of the three modes.
Agent removes the decision from the user entirely. The user delegates, the agent executes, and the brand receives the order. The negotiation and transaction are observable, attributable, and measurable: the agent queried, negotiated, and hopefully bought your product. You can micro-measure that. What you can’t see is why the agent chose your product over competitors, because the decision logic happened inside the agent, drawing on retrievals, comparisons, and reasoning the brand has no visibility into. The pathway to conversion is macro, but the conversion itself is micro.
The buyer chooses the surface. Buyers move between search, assistive, and agent surfaces depending on what they’re buying, why, and how complex the decision is, often within the same journey. The brand doesn’t choose which surface its buyer will use. The buyer does, case by case, and the measurement methodology has to handle every surface mix the buyer chooses. That’s why macro is the only viable solution.
How you measure defines your methodology
The clearest way to illustrate this is to translate each search-era measurement into its AI-era equivalent. Every practitioner running this work seriously will have their own opinion on every row, which is the point: how you define each row becomes the foundation of your methodology.
The funnel query pathway defines which queries you track, and the same logic applies to every other measurement decision. Differences between practitioners on these rows will become increasingly visible in our measurement outputs over the coming months and years. That visibility is the methodological signal worth paying attention to.
The macro methodology I’m publishing here is in its infancy. I started building it seriously this year, and the table below reflects my current position after a few months of thought, analysis, and live data collected since 2015. I’m working to finalize this list before the end of 2026 and freeze the methodology from January 2027. Once you change a parameter, you lose direct comparability with everything measured before the change. Quarter-eight compounding is only meaningful if the methodology remains stable across all eight quarters.
| Measurement | Search | Assistive | Agential |
|---|---|---|---|
| Engine visibility | CTR-weighted share of the keyword cohort, normalized over time | The FQP queries in their conversational surface form, each in active or aspirational state | Share of agent invocation events (catalog queries, mandate submissions, transactions) against the addressable agent surface |
| Buyer cohort definition | The FQP queries in their search-context surface form, each in active or aspirational state | The FQP queries in their conversational surface form, each in active or aspirational state | The FQP queries in their agent-readable form, each in active or aspirational state |
| Authority signal share | Share of corroboration authority across the category, normalized over time | Share of independent corroboration in the brand-trigger phrase context | Share of operational-evidence completeness against what the agent needs to verify before committing (pricing, terms, availability, fit) |
| How you change the output | Publish, structure, distribute against the cohort, and measure the shift quarter over quarter | Engineer the operational surface for agent legibility through MCP, structured data, and machine-actionable interfaces, and measure the shift quarter over quarter | Share of citations and mentions across the brand-trigger phrase cohort, weighted by prominence in the synthesized answer |
| Revenue and profit attribution | Share of revenue and margin from the search-mode cohort | Share of revenue and margin from the assistive-mode cohort, identified through referrer signals and user-agent strings | Share of revenue and margin from the agential-mode cohort, captured through agent-mandate logs and MCP telemetry |
Take the measurement, express it as a share of the cohort, normalize it over time, and report the trend rather than the snapshot. That’s the move in every cell of the table, and it’s what makes the three columns directly comparable.
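The share-then-normalize move can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: the counts are invented, and "normalize over time" is interpreted here as indexing each quarter against the first quarter (baseline = 100), which is an assumption.

```python
# Hypothetical quarterly observations per mode:
# (queries where the brand surfaced, queries in the cohort that quarter).
observations = {
    "search": [(18, 60), (21, 60), (27, 75)],
    "assistive": [(6, 40), (9, 40), (12, 50)],
    "agential": [(1, 10), (2, 10), (3, 12)],
}


def shares(counts):
    """Express each quarter's measurement as a share of that quarter's cohort."""
    return [hits / total for hits, total in counts]


def indexed(series):
    """Index every quarter against the first one (assumed baseline = 100)."""
    base = series[0]
    return [round(100 * s / base, 1) for s in series]


# One indexed trend line per mode, all in the same unit.
report = {mode: indexed(shares(counts)) for mode, counts in observations.items()}

for mode, trend in report.items():
    print(mode, trend)
```

Because every mode ends up as an index against its own baseline, the three series can sit on one chart even though the raw counts behind them differ by an order of magnitude.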
Keep running the micro instruments you already know from search-era practice: ranking position 1-10 on a specific keyword, CTR on a specific URL, and A/B test outcomes on a specific page element. Use them for tactics, but keep them out of the strategic dashboard because they aren’t comparable to anything in the assistive or agential columns. If you mix them, you’ll lose the strategic value.
The five rows match across the three modes: read across any row and see your brand’s relative position across all three engines in directly comparable units. Compare your search-mode share against your assistive-mode share and your agential-mode share on the visibility row, the authority row, and the revenue row, and you have a continuous read on which mode is producing the best return at this moment and how that weighting is shifting quarter over quarter. That gives you a macro-level view of your strategic priorities across all three engines. The five rows also hold for paid measurement. Paid and organic are converging on the same engine and the same macro methodology.
How measurement works across the funnel query pathway
The funnel query pathway isn’t one tree. It’s an orchard. Each cohort-with-intent intersection you cultivate is a tree, and the orchard grows as you plant more trees. Each tree has three parts:
- The trunk is the conversion node, a representative branded BOFU query that represents the buying moment for that cohort-with-intent intersection.
The orchard grows from the ground of your brand and business operations, and the apples fall on that ground when the trees bear fruit. The ground makes the orchard productive over time, and the brand that lets its ground go fallow watches the trees die, regardless of how well its branches are optimized.
You run measurement at every layer of every tree, but for different reasons, because the buyer’s intent shifts as you move up from trunk to branches to twigs, and the question you’re asking shifts with it.
Bottom of funnel, brand-only: The trunk as a brand-confirming campaign. The trunk of every tree is the buying-moment query with your brand name in it. “Men’s red shirt from Uniqlo” is the trunk of the “XL men buying a red shirt” tree on the FQP I built for Uniqlo. Whatever the equivalent looks like for your brand sits in the equivalent position on every tree in your orchard. One representative trunk query per tree is what Kalicube tracks period over period. The FAQ page on the brand’s site can carry as many variants of the BOFU query as the brand wants, but the methodology tracks one trunk query per tree as the structural read on whether the tree is producing fruit. That single query is the representative sample for the whole trunk.
We measure three KPIs:
- Brand appearance: When the engine answers the conversion query, does it surface your brand? You expect 100% appearance because the query carries your brand name, and the engine has no reason to omit you unless something has broken upstream. Any miss at this position is an audit-grade signal: in commercial language, it’s the doubt tax or invisibility tax hitting at the bottom of your own funnel.
Bottom of funnel, competitor: Runs as a separate campaign at the trunk. Most practitioners count brand-versus-competitor as middle of funnel because comparison feels like research. I count it as bottom of funnel but run it as a separate campaign with a separate bucket because the buying moment is happening. The buyer is naming both brands and asking the engine to decide. I separate these queries because mixing them in skews the brand-only reads. Three measurements run here:
- Recommendation bias: Which brand the engine specifically picks.
Middle of funnel: The branches. Move one level up the tree and you land on the branches. The cohort is still your ideal customer profile, the intent is still the buying motion, but the brand isn’t mentioned in the query yet because the buyer is still researching. “Best red shirt for men” is a branch on Uniqlo’s “XL men buying a red shirt” tree. We measure three KPIs:
- Brand appearance: When the engine answers a research query, which brands surface in the recommendations, if any? Track yours and each competitor. The brands the engine reaches for at this layer are the ones it considers candidate answers to the research question. Brands that don’t surface are the ones the engine has decided aren’t leading candidates, a decision made against the corroboration available on the open web before the buyer ever asked.
Top of funnel: The twigs. At the top of every tree sit the twigs: topical questions the buyer asked before narrowing down to research the purchase or conversion. “Can men wear red shirts to work?” is a good example. The diagnostic question at the twigs differs because the buyer isn’t asking about brands or even choices. The engine is reasoning at the topical layer, drawing on whatever content has earned recruitment for the topical question. Brand surfacing is rare and therefore not the primary indicator of success. Three measurements run on each twig:
- Topical answer adoption, scored through corpus similarity: The engine’s answer compared against your content corpus and against each tracked competitor’s. The brand whose corpus scores highest is the brand the engine has learned from. It’s the most novel measurement in the methodology and the one most likely to draw critical replies. TOFU attribution in AI search is solvable by reading the engine’s output back against the candidate topical coverage.
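To make the corpus-similarity idea concrete, here is a minimal sketch. The article does not specify a similarity metric, so this example assumes plain bag-of-words cosine similarity using only the standard library; a production version would more likely use embeddings. The brand names and texts are invented for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def adoption_scores(engine_answer, corpora):
    """Score the engine's answer against each brand's content corpus."""
    answer_vec = term_vector(engine_answer)
    return {
        brand: cosine(answer_vec, term_vector(" ".join(docs)))
        for brand, docs in corpora.items()
    }


# Hypothetical engine answer and hypothetical corpora.
engine_answer = (
    "Yes, men can wear red shirts to work if the shade is muted "
    "and the fit is tailored."
)
corpora = {
    "your_brand": ["A muted red shirt works at the office when the fit is tailored."],
    "competitor_a": ["Our summer sale covers shorts, sandals, and swimwear."],
}

scores = adoption_scores(engine_answer, corpora)
best = max(scores, key=scores.get)
print(best, scores)
```

A single answer scored this way is noisy. Under the macro discipline described here, the useful number is how each brand's score moves across the twig cohort quarter over quarter.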
The top and middle of the funnel have grown, not shrunk
AI has made research faster, and faster research means people do more of it. TOFU and MOFU volumes have grown, even as the share of the mix has rearranged underneath. The three-layer model is now visibility, influence, transaction. The AI engines are the biggest influencers in the world, the website is where the transaction closes, and brands measuring AI visibility as a replacement for website traffic are measuring the wrong substitution. The substitution is in the influence layer, and the transaction layer is doing better than it looks once you understand what’s influencing the new traffic and where it’s coming from.
The analytics layer closes the loop to revenue
The FQP measurement tells you where the engines are recommending you. Analytics tells you whether those recommendations convert. Closing the loop is the operational work, and it’s where the methodology earns its keep at the board level.
You build the AI-traffic cohort from referral signals and user-agent strings: Gemini, ChatGPT, Perplexity, AI Mode, and Copilot. UTM tagging won’t help for inbound traffic from the assistive engines themselves because they don’t pass UTM parameters. So tag every source you do control, shrink the “Direct” bucket as far as it will go, and then identify the residual AI traffic through referrer signals, user-agent strings, and behavior patterns once the session lands.
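A minimal sketch of the referrer half of that cohort build, under stated assumptions: the hostname list below is illustrative and should be verified against your own logs, the `classify_session` helper is hypothetical, and user-agent and behavioral signals are left out entirely.

```python
from urllib.parse import urlparse

# Illustrative referrer hostnames (assumptions; verify against your own logs).
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}


def classify_session(referrer, utm_source=None):
    """Bucket a landing session: tagged source, AI referrer, direct, or other."""
    if utm_source:
        # Sources you control are tagged, so they never fall into "direct".
        return f"tagged:{utm_source}"
    host = (urlparse(referrer).hostname or "").lower() if referrer else ""
    if host.startswith("www."):
        host = host[4:]
    for known, engine in AI_REFERRER_HOSTS.items():
        if host == known or host.endswith("." + known):
            return f"ai:{engine}"
    return "direct" if not host else "other"


print(classify_session("https://chatgpt.com/"))                  # ai:ChatGPT
print(classify_session("https://example.com/", "newsletter"))    # tagged:newsletter
print(classify_session(""))                                      # direct
```

Sessions that land in "direct" after tagging are the residual where user-agent strings and behavior patterns have to do the remaining identification work.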
The cohort you build is a sample you extrapolate from, small today and growing. Take the cohort’s conversion rate, average order value, time on site, and repeat purchase behavior. Apply it to the total recruitment volume the FQP measurement says you should be earning. That’s your revenue read.
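The extrapolation is simple arithmetic. The sketch below assumes "recruitment volume" can be expressed as a projected visit count, which is an interpretation, and every figure and name in it is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CohortSample:
    """Observed AI-traffic cohort (hypothetical numbers)."""
    sessions: int
    orders: int
    revenue: float

    @property
    def conversion_rate(self):
        return self.orders / self.sessions

    @property
    def average_order_value(self):
        return self.revenue / self.orders


def revenue_read(sample, projected_visits):
    """Apply the sample's conversion rate and AOV to a projected visit volume."""
    return projected_visits * sample.conversion_rate * sample.average_order_value


sample = CohortSample(sessions=400, orders=20, revenue=1800.0)
estimate = revenue_read(sample, projected_visits=5000)
print(f"conversion rate: {sample.conversion_rate:.1%}")
print(f"average order value: {sample.average_order_value:.2f}")
print(f"revenue read: {estimate:,.0f}")
```

With a sample this small the estimate carries wide error bars, which is one more reason to read it as a quarterly trend rather than a point figure.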
AI-influenced visitors arrive with a perspective already formed. They had the brand summarized for them before they clicked, so they should convert at a higher rate than organic search visitors. Track the AI-influenced cohort separately from the search cohort it’s mostly replacing. At the analytics layer, you bring profit margin back into the picture. The engine doesn’t know your margin, so it optimizes for user satisfaction. You know your margin, so you weight your orchard investment toward the trees where conversion volume times margin justifies the cultivation. That’s the organic equivalent of the cohort times intent times conversion rate times margin math that ads have run for 15 years.
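As a rough illustration of weighting the orchard by conversion volume times margin, the sketch below ranks three hypothetical trees by margin contribution and turns that into investment weights. The tree names and figures are invented, and proportional weighting is only one possible allocation rule.

```python
# Hypothetical per-tree figures for one quarter.
trees = [
    {"tree": "tree_a", "conversions": 120, "avg_order_value": 40.0, "margin": 0.30},
    {"tree": "tree_b", "conversions": 45, "avg_order_value": 150.0, "margin": 0.25},
    {"tree": "tree_c", "conversions": 300, "avg_order_value": 12.0, "margin": 0.20},
]


def margin_contribution(tree):
    """Conversion volume x order value x margin for one tree."""
    return tree["conversions"] * tree["avg_order_value"] * tree["margin"]


# Rank trees by the profit they return, not by raw conversion count.
ranked = sorted(trees, key=margin_contribution, reverse=True)
total = sum(margin_contribution(t) for t in trees)
weights = {t["tree"]: margin_contribution(t) / total for t in ranked}

for name, weight in weights.items():
    print(f"{name}: {weight:.0%} of cultivation budget")
```

Note that the tree with the most conversions ranks last here, which is the point of bringing margin back into the picture.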
Always remember that AI engine traffic will generally be more engaged, spend longer on your site, and convert better than search traffic. If it isn’t, that’s a “you” problem, not an engine problem.
Agential commerce is a measurement gain
Agents might look like the worst measurement environment yet: the user delegates, the agent decides, and the brand sees only the conversion. Everything between the question and the purchase is invisible. The instinct is to grieve the human signals we’re losing: mouse movements, scroll depth, hesitation patterns, micro-pauses on the comparison page, and the back-and-forth between tabs that used to tell us so much about consideration. Those signals are gone in agent mode. What replaces them is a measurement surface humans never gave us in the first place.
Every interaction the agent has with your infrastructure is a programmatic event. It queries your product catalog, retrieves details, comes back for clarification, initiates a price negotiation, submits a mandate, and confirms the purchase. That’s a conversion funnel you can track step by step, including the back-and-forth negotiation. As a programmatic user, the agent fires events through your MCP server, UCP endpoint, decoupled checkout, and mandate handling. Every protocol layer you build for agential commerce is also a measurement layer, and the brands that build the infrastructure to transact with agents get the bonus of measuring the agent’s full reasoning chain in a way no one has ever been able to measure human reasoning.
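A step-by-step agent funnel could be tallied along these lines. This is a sketch under assumptions: the event names and the `(session_id, step)` log shape are hypothetical and do not come from the MCP or UCP specifications, so map them to whatever your own protocol layer actually emits.

```python
from collections import defaultdict

# Hypothetical funnel steps, in the order the paragraph above describes them.
FUNNEL_STEPS = [
    "catalog_query",
    "detail_retrieval",
    "negotiation",
    "mandate_submitted",
    "purchase_confirmed",
]

# Hypothetical event log: (agent session id, step name).
events = [
    ("s1", "catalog_query"), ("s1", "detail_retrieval"), ("s1", "negotiation"),
    ("s1", "mandate_submitted"), ("s1", "purchase_confirmed"),
    ("s2", "catalog_query"), ("s2", "detail_retrieval"), ("s2", "negotiation"),
    ("s3", "catalog_query"),
]


def funnel_counts(log):
    """Count distinct agent sessions that reached each step."""
    reached = defaultdict(set)
    for session_id, step in log:
        reached[step].add(session_id)
    return [(step, len(reached[step])) for step in FUNNEL_STEPS]


def step_conversion(counts):
    """Share of sessions that advance from each step to the next."""
    rates = []
    for (prev_step, prev_n), (step, n) in zip(counts, counts[1:]):
        rates.append((f"{prev_step}->{step}", n / prev_n if prev_n else 0.0))
    return rates


counts = funnel_counts(events)
rates = step_conversion(counts)
for label, rate in rates:
    print(f"{label}: {rate:.0%}")
```

Counting distinct sessions rather than raw events keeps repeated clarification round-trips from inflating a step.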
For me, this is the most important measurement framework for the industry in the next phase. Search, assistive, and agential each land at the won gate, with three click types resolving the journey:
- Search produces the imperfect click (the user picks from a list).
Each of the three modes offers its own measurement points, and the points aren’t equivalent. Search is observable at the micro scale across the full journey. Assistive is largely opaque at the micro scale and only surfaces sparse tactical signals: citation tracking, referrer patterns, user-agent strings, and behavioral cohort identification post-event. Agential is observable at the programmatic scale, but only if the brand has built the protocol layer (MCP, UCP, decoupled checkout, and mandate handling) to capture the events.
The discipline is the same across all three modes. Harvest every tactical measurement point you can from every surface. Use those signals for tactical decisions because that’s what tactical micro signals are for. Resist the temptation to make strategic decisions from any single mode’s tactical instruments because the picture each one produces is fragmented, partial, and structurally incomplete. Strategic decisions remain bolted to the macro read on the funnel query pathway, aggregating across all three modes at the FQP level. The tactical instruments serve the strategy. They don’t replace it.
Macro measurement works on a slower timeline
For decades, we measured search the way the corner shop measures inventory: count what’s on the shelf this week, count it again next week, compare, and act. The instruments delivered the precision the environment supported, and you and your boardroom got trained through years of weekly dashboards to expect that exact shape of answer: a number this week against the same number last week, tracking work you can point at.
You’re not in the corner shop anymore. You’re operating inside an economy in its own right: seven assistive engines, the agents behind them, the apps each engine ships inside, the operating systems that surface them, the hardware in every pocket and on every face, every personalized context inside every walled garden, and the open web shifting under all of it, all running at once, all reshaping who gets recommended at the moment of decision.
Asking me for a precise monthly read on whether your brand is winning in that environment is asking the Bank of England for a precise monthly read on the loaf of bread you bought yesterday. The Bank reports inflation at, say, 3% every month, on schedule, and the number is real, comparable across months, and defensible across years. But you can’t take 3% and apply it to your loaf because your loaf might have gone up 8%, and the loaf in the next shop might have gone up 1%. The 3% is the aggregate read on the system, not a measurement of any single transaction inside it.
That’s the discipline you’re moving to. I can give you a quarterly read on whether your brand is being recommended across the economy of engines, and the read will be comparable to last quarter and the quarter before, and projected against next quarter and the one after. The trend over time is what your strategy rests on. What I can’t give you is a clean number for whether you won the Perplexity recommendation against your top competitor last Tuesday. That’s the loaf. The macro discipline gives you the inflation read. The loaf-level question doesn’t have a defensible answer in this environment, and the methodologies that pretend it does are selling you a false-precision number dressed up as the real thing.
Strategic clarity comes from quarterly trend data
This is the move you have to make, and the move you have to walk your boardroom through alongside you. You’re not measuring fewer things than you used to. You’re measuring something far bigger, and the instruments that fit the wider environment work on a slower timeline.
If you run the methodology month by month, the drift will swamp the signal. You’ll read noise and act on noise, and you’ll do that every month. If you run it quarter by quarter, you get one delta against one baseline, which still isn’t a trend. It’s two points and a line. By the fourth quarter, you have three deltas, the noise comes down, and the trend reads through. By the eighth, the methodology has compounded into a read that your strategic decisions can actually rest on, with a real pathway of comparison going backward across two full years.
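The difference between a delta and a trend can be shown with a small sketch. The eight quarterly readings are invented, and fitting an ordinary least-squares slope is an assumed way of reading the trend, not something the methodology prescribes.

```python
def quarter_deltas(readings):
    """Quarter-over-quarter change; n readings give n - 1 deltas."""
    return [b - a for a, b in zip(readings, readings[1:])]


def trend_slope(readings):
    """Ordinary least-squares slope of the reading against quarter index."""
    n = len(readings)
    if n < 3:
        raise ValueError("two points make a line, not a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(readings) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(readings))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical share-of-recommendation readings over eight quarters.
readings = [0.20, 0.22, 0.21, 0.25, 0.27, 0.26, 0.30, 0.32]

print("deltas:", [round(d, 2) for d in quarter_deltas(readings)])
print("slope per quarter:", round(trend_slope(readings), 4))
```

Individual deltas flip sign from quarter to quarter in this sample while the fitted slope stays positive, which is the noise-versus-trend distinction the paragraph above describes.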
Quarter eight is also where most measurement programs die, because boardroom impatience peaks at exactly the point when the methodology produces its first defensible answer. Hold the line, and you compound the maturity. Cave at month six, demand the weekly dashboard back, and you’ll spend the next several years hunting for precision the environment can’t deliver, while competitors who held the line walk past you with strategic clarity you used to have and gave up.
Make the case to your boardroom plainly: We’re operating inside an economy, and your brand’s standing inside it determines whether AI puts you in front of the right buyer at the right moment. The measurement discipline that fits this environment is the macro discipline economists developed for exactly the same kind of problem 100 years ago. Move to macro measurement, accept the timescale, and the methodology compounds into the strategic clarity the micro instruments stopped delivering the moment your buyer’s journey moved off your measurable surfaces and onto the engine’s.
The macro environment won’t give you a single, clear dashboard number. What it gives you, if you run this methodology with patience, is a quarter-by-quarter, mode-by-mode, engine-by-engine read of whether AI is recommending your brand at every stage of every buying journey the orchard is built around. That’s the answer you can build a strategy around to gain a long-term competitive advantage.
(Source: Search Engine Land)




