AI cost crisis: Industry races to manage runaway token expenses

Summary
– Companies are struggling with AI costs, with examples like Uber exhausting its 2026 AI coding budget by April and Priceline facing 4-5x higher contract renewals.
– Token consumption has skyrocketed due to increased AI adoption and autonomous agents, causing firms to rapidly pull back spending and reassess ROI.
– A new market is emerging for tracking and optimizing AI spend, including startups like Pay-i and Paid, as well as established firms like Ramp and Datadog adding AI cost management features.
– The Linux Foundation announced the Tokenomics Foundation to create open standards and metrics for AI token usage and billing, aiming to bring cost discipline similar to cloud FinOps.
– Despite productivity gains from heavy AI use, high token consumption often yields murky ROI, with experts recommending broad, moderate adoption over extreme spending.
Across the tech industry, a new crisis is quietly taking shape: the soaring cost of AI token consumption. The era of unlimited experimentation is ending, replaced by a hard reckoning with budgets that are being blown through at an alarming rate. Uber, for instance, had already exhausted its entire 2026 AI coding budget by April. Microsoft pulled back as well, revoking its developers’ Claude Code licenses just months after granting them. A Priceline employee told TechCrunch that a routine renewal for Cursor, a popular AI coding tool, came back with a price tag four to five times higher than before.
While the cost per individual token has actually dropped, the relentless push for broader AI adoption and the rise of autonomous AI agents have caused total token usage to skyrocket. Companies that eagerly signed up for all-you-can-eat subscriptions in early 2025 are now scrambling to gain visibility into their spending, cut back, and figure out if any return on investment can be salvaged from the wreckage of their budgets.
In response, a new market is rapidly forming. A wave of startups, established vendors, and even a new standards body are racing to provide the tools and frameworks needed to track and control these costs.
“Six months ago, I would have a conversation with a customer and it would be all about ‘What can it do? Is it good enough?’” Alexander Embricos, OpenAI’s head of enterprise, told TechCrunch at a New York City event this week. “Our conversations are never about that now. Now the conversations are about, ‘hey, we’re spending so much. What visibility do you have? What auditability do you have? What token controls do you have? What is the efficiency of your models?’”
This shift in focus is the catalyst for the Linux Foundation’s announcement this week of the Tokenomics Foundation. This new standards body aims to bring the same cost discipline to AI tokens that FinOps brought to cloud computing.
“In April and May, I started hearing from companies: ‘Oh my god, we are 3x over our entire 2026 token budget and it’s only April,’” said J. R. Storment, executive director of the FinOps Foundation, a project under the Linux Foundation. “We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and ‘go fast’ to ‘we need guardrails, how do we control this?’”
These panicked calls followed a period of intense pressure from CEOs who demanded their teams use the most powerful models and move quickly, regardless of cost. The release of advanced models in November, such as Anthropic’s Claude Opus 4.5, OpenAI’s GPT-5.1, and Google’s Gemini 3 Pro, significantly improved agentic AI tools, which in turn multiplied consumption. This is how one company reportedly ended up with a $500 million Claude bill after failing to set any usage limits for its employees.
“It’s like the crack-cocaine epidemic,” said Chris Reed, senior director of IT finance at Priceline, which has already begun imposing token limits on certain teams. “They let you try it to get you hooked on it, and now you’re kind of beholden to it.”
Vitaly Gordon, CEO of engineering operations platform Faros AI, shared a recent conversation with a CTO who said, “One of my engineers spent $40,000 on tokens last month, and I genuinely don’t know whether I should stop him or should I go and tell everyone else to be like him.”
A March survey by Faros of 20,000 developers revealed a mixed picture: output was increasing, but so were the rates of bugs and code rewrites. Similarly, Jellyfish, an engineering management platform, found that engineers who used the most tokens were roughly twice as productive as those who used less AI, but they consumed ten times the number of tokens to achieve that result.
Nicholas Arcolano, head of research at Jellyfish, told TechCrunch that AI expenditure is exploding largely due to agentic features, with per-developer token consumption rising 18.6 times in just nine months. These statistics make the case for productivity far less clear-cut than the spending suggests. “Whether extreme spend pays off comes down to the ultimate business value of shipped code (e.g. revenue), which most companies still can’t measure,” Arcolano said.
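The Jellyfish figures quoted above can be turned into two simple ratios. The following is a minimal sketch, not Jellyfish's methodology: the function names are hypothetical, and the monthly rate assumes steady compound growth, whereas the source gives only the nine-month endpoints.

```python
def relative_output_per_token(productivity_ratio: float, token_ratio: float) -> float:
    """Output per token for heavy users, relative to lighter users."""
    return productivity_ratio / token_ratio


def implied_monthly_growth(growth_factor: float, months: int) -> float:
    """Constant monthly growth rate that compounds to growth_factor.

    Assumes smooth compounding, which the source does not state.
    """
    return growth_factor ** (1 / months) - 1


# Figures from the article: about 2x the productivity at 10x the tokens,
# and an 18.6x rise in per-developer consumption over nine months.
efficiency = relative_output_per_token(2.0, 10.0)
monthly = implied_monthly_growth(18.6, 9)

print(f"Relative output per token: {efficiency:.2f}")
print(f"Implied monthly growth: {monthly:.1%}")
```

On these numbers, the heaviest users get about one fifth as much output per token as lighter users, and an 18.6-fold rise over nine months corresponds to roughly 38% growth per month if the growth was steady.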
Part of the measurement challenge lies in the sheer scale of modern AI usage. “Tracking cloud costs is a hundreds-of-millions-of-rows-a-month data problem,” Storment said. “Tracking token costs is a trillions-of-rows-a-month data problem. You can’t just stick that into whatever spreadsheet or even basic tool. You’ve got to fundamentally rethink your tooling, your specs and your accounting systems to do that.”
At Priceline, Reed is already seeing billing discrepancies between vendor reports and internal data. “I started my career in telecom expense management, and I’m seeing all the same parallels, from telecom to cloud to AI,” he said. “Anytime you introduce something new, it’s ripe for billing errors and audit and optimization opportunities.”
A market is indeed forming around this problem. Pure-play startups like Pay-i track and optimize GenAI spending. Paid allows developers to track costs and bill users based on value. Platforms like Jellyfish, Waydev, and Faros AI offer AI agent monitoring to prove ROI. Storment notes that most of the 180 vendors within the FinOps Foundation are moving into this space. Larger companies with existing distribution are also adding features: Ramp has entered AI spend management, while Datadog and New Relic have added services like cloud cost management and token-level observability. At the upcoming FinOps X conference, AWS is expected to unveil new financial management tools for enterprise AI.
Tiffany Luck, a partner at NEA, believes token efficiency and observability will be integrated at the “harness or app layer.” She pointed to Factory, a startup that launched a model router this week to automatically select the most cost-effective model for each task. Gordon expects frontier labs and model providers to adopt similar optimization, a trend already appearing on enterprise Claude bills.
“The financial report for how much you spend on Anthropic, even if you call the Opus model, some of the spend will be on Sonnet or Haiku, because they are smart enough to do it,” Gordon said. “I think this will become more and more of a thing.”
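The routing idea described above can be illustrated with a minimal sketch. It does not reflect how Factory or Anthropic implement routing: the model names, capability tiers, and prices are all hypothetical, and the rule shown (pick the cheapest model that meets a task's required tier) is just one plausible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                      # hypothetical model name
    capability: int                # hypothetical tier: higher handles harder tasks
    usd_per_million_tokens: float  # hypothetical price


# Illustrative catalog only; not real models or real prices.
CATALOG = [
    Model("small", capability=1, usd_per_million_tokens=1.0),
    Model("medium", capability=2, usd_per_million_tokens=5.0),
    Model("large", capability=3, usd_per_million_tokens=25.0),
]


def route(required_capability: int, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose tier meets the task's requirement."""
    eligible = [m for m in catalog if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model meets capability {required_capability}")
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


# A simple task goes to the cheapest model, a hard one to the largest.
print(route(1).name)
print(route(3).name)
```

In practice the hard part is estimating the required tier for each task, which this sketch takes as given.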
However, all these tools are being built without a common language or shared definitions for token costs, outputs, or cross-vendor comparisons. This is where the Tokenomics Foundation hopes to make a difference. It plans to create a canonical definition for tokenomics, open standards for AI token usage and billing, and new metrics like cost-per-intelligence or tokens-per-watt. It also aims to define metrics for token factory effectiveness and consumption efficiency. The group is planning a formal launch in July and will announce more members at the FinOps X conference next week.
“Token economics is fundamentally more abstract and opaque than anything we’ve managed at this scale before,” said Nishant Gupta, chief availability officer at Salesforce, in a statement. “It requires a different operational muscle than the one the industry built for cloud.”
Yet with Goldman Sachs projecting that global token usage will grow 24-fold by 2030, companies already over budget need solutions now, and the foundation’s first deliverable is still months away. “Maybe we created a steam engine, but we still haven’t figured out the assembly line,” said Gordon.
According to Arcolano, the most prudent strategy is broad, moderate adoption. “The best ROI comes from moving the broad middle from low to moderate usage, not pushing heavy users higher,” he said.
(Source: TechCrunch)


