Google’s Moment Arrives as Your AI Costs Skyrocket

▼ Summary
– Google’s Gemini 3.5 Flash model is positioned as a cost-effective rival to frontier AI models, saving companies money on high token usage bills.
– Google CEO Sundar Pichai noted companies are burning through token budgets rapidly, and using a mix of Flash and other models could save over $1 billion annually.
– The AI market is shifting focus from raw model capability to infrastructure and inference costs, especially as expensive AI agents become more common.
– Google has a cost advantage because it owns the full AI stack (chips, data centers, cloud, models), paying 50-75% less for internal AI compute than rivals.
– Google plans to replicate its search success, where speed and low cost created a flywheel, by using its advertising business to subsidize AI efforts.
While Anthropic builds hype around its still-unreleased Mythos AI model, claiming it borders on dangerously powerful, Google is steering the conversation in a different direction,toward cost efficiency and speed. The tech giant’s latest release, Gemini 3.5 Flash, is designed to compete with top-tier models while helping businesses slash expenses as they burn through massive volumes of tokens, the fundamental unit of AI consumption.
“Companies are already blowing through their annual token budgets and it’s only May,” Google CEO Sundar Pichai remarked recently. “If companies used a mix of Flash and other frontier models they could save a lot of money.”
The launch of this new model arrives at a strategic moment. As organizations increasingly deploy token-hungry AI agents, they’re also scrutinizing their bottom lines more closely. Meanwhile, smaller AI startups under revenue pressure are raising prices, prompting customers to rethink their spending. This creates a prime opportunity for Google to win on value rather than raw capability,a competitive edge it has been cultivating for a quarter-century.
Flash Sale
For the first three years of the generative AI boom, the battle was largely about who had the biggest, smartest model. But as performance gaps between labs narrow, the advantage is shifting to infrastructure and inference,how models are actually run. OpenAI President Greg Brockman recently captured this shift: “the model alone is no longer the product.”
A major driver is the rise of AI agents, which are becoming both more useful and more expensive to operate. Google has a clear view of just how high token usage is climbing. Pichai noted that monthly usage of Google’s AI products has jumped sevenfold to 3.2 quadrillion tokens since last year. He also estimated that if Google Cloud’s top customers shifted 80% of their AI workloads to a combination of Gemini 3.5 Flash and other frontier models, they could save over $1 billion annually.
Companies are waking up to the mounting costs. Uber’s COO recently acknowledged it’s getting harder to justify the company’s soaring AI expenses. Venture capitalist Chamath Palihapitiya said in March that his firm, 8090, was stepping back from using Cursor because token costs had become unsustainable.
“As AI agents become more complex, long-running processes have become the norm,” said Dan Morgan, an analyst at Synovus Trust. “This has created sticker shock at many organizations.” He added that cost and ROI are tightly linked, making profitability elusive in this space. For some businesses, access to the absolute frontier of AI may no longer be necessary,good enough may be sufficient.
This is where Google’s position becomes formidable. The company controls the full stack,chips, data centers, cloud services, models, and many of the major applications built on top. That vertical integration gives it tighter control over AI cost and speed than most rivals.
According to analysts at William Blair, Google pays roughly 50% less (and possibly as much as 75% less) for its internal AI compute compared to competitors, thanks to its custom TPU chips and direct sourcing from manufacturers. In contrast, OpenAI pays Microsoft, Oracle, and other cloud providers a margin on every ChatGPT and Codex request, and those providers in turn pay Nvidia for the GPUs that power the infrastructure. Nearly every company that isn’t a hyperscaler is currently paying someone else for compute.
The Search Playbook
If “compute is destiny,” as OpenAI CEO Sam Altman likes to say, Google has spent over 25 years shaping its own fate. Back in 2006, Google Search commanded more than 40% of the market and was pulling away,not just because its results were good, but because Google made search faster and cheaper to serve. The company famously displayed the exact milliseconds it took to return results, turning speed into a selling point.
Rather than investing in expensive servers, Google built custom systems using cheap, off-the-shelf parts to maximize speed and minimize costs. Meanwhile, the flood of data from growing search volume improved the engine, creating a flywheel that gradually choked off rivals like Yahoo. Google’s results didn’t need to be the absolute best,they just needed to be fast enough and cheap enough to keep users coming back.
Google is now building a similar flywheel with Gemini. But this time, it also has a massively successful search advertising business that can subsidize its AI efforts, even as rivals like OpenAI and Anthropic scramble for more funding and compute power. The search race was really an infrastructure race in disguise. Google is betting the AI race will follow the same pattern.
(Source: Business Insider)




