
Why AI Tokens Are Driving Enterprise Cloud Costs Higher

Summary

– AI pricing has shifted from flat-fee subscriptions to a more expensive token-based model, where tokens serve as the atomic unit for measuring and billing AI usage.
– Token prices fell sharply after 2023 but have been flat since November 2025 because of hardware and power constraints. Total AI spend keeps rising anyway, a Jevons paradox in which cheaper tokens drove far greater usage.
– The FinOps community is adapting to token economics, which breaks the traditional cloud playbook by tying costs to language and model choice rather than infrastructure.
– The emerging discipline of “tokenomics” focuses on the full lifecycle of tokens, from production and consumption to monetization, and is reshaping SaaS business models.
– Token pricing creates a societal and enterprise divide, where access to powerful AI models is restricted by cost, potentially deepening inequalities in who can leverage AI effectively.

The era of flat-rate AI subscriptions is officially over. At FinOps X 2026 in San Diego, the consensus is clear: token-based pricing has become the dominant economic model for generative AI, and it is significantly more expensive than the fixed-fee structures that preceded it. This shift is causing friction, particularly among users of tools like Microsoft Copilot, who are now grappling with volatile, usage-based invoices that echo the early, chaotic days of cloud computing.

The driving force behind this transformation is the token itself, now described as the “atomic unit of AI” by J. R. Storment of the FinOps Foundation. Tokens serve a triple function: they are the measure of hardware output, the basis for lab pricing, and the unit of value for enterprises. This abstraction allows AI providers such as OpenAI, Anthropic, and Google to hide the complexity of GPU types, memory, and power consumption behind a single, billable metric: dollars per million tokens. A token, roughly three-quarters of an English word, is the smallest fragment an LLM processes, but it conceals a vast web of variables, from model choice and quantization to caching strategies.
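To make the "dollars per million tokens" metric concrete, here is a minimal sketch of how a bill might be estimated from word counts. It relies on the rule of thumb above (one token is about three-quarters of an English word). The function names are hypothetical, the prices in the usage example are placeholders rather than any provider's real rates, and the split between input and output prices is an assumption about how the bill is structured.

```python
import math

# Rule of thumb from the text: one token is roughly 0.75 English words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text (hypothetical helper)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Bill for one workload, priced in dollars per million tokens."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Placeholder prices, chosen only for illustration.
    prompt_tokens = estimate_tokens(750)      # about 1,000 tokens
    print(prompt_tokens)
    print(cost_usd(1_000_000, 500_000, 3.0, 15.0))
```

Real tokenizers vary by model and language, so the word-based estimate is only a planning figure; actual invoices use the token counts the provider reports.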

The “all-you-can-eat” token era, which peaked between late 2023 and early 2025, is a relic. That period of cheap experimentation and “token maxing” has given way to a harsh reality where token leaderboards are obsolete because no one can afford waste. The introduction of larger context windows and agentic patterns has exploded token consumption. Amazon’s Dave Treadwell captured the new sentiment perfectly: “Please don’t use AI just for the sake of using AI.” Companies that once subsidized power users now face staggering bills, with SemiAnalysis estimating that a $200 subscription could previously unlock $8,000 to $14,000 worth of actual token value.

While token prices have fallen dramatically since 2023, the floor is in sight. Supply chain constraints on GPUs, rising hardware costs, and power shortages have kept prices flat since November 2025. Total spend keeps climbing regardless, a classic Jevons paradox: as unit costs fell, usage grew faster than prices dropped. SAP reports that even as its cost per token dropped, its total AI spend doubled in some months. Goldman Sachs forecasts global token usage growing from 6 quadrillion to 120 quadrillion in just a few years, a 20-fold increase that would likely outpace any further price drops.
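The arithmetic behind the paradox is simple: total spend is unit price times volume, so spend rises whenever usage grows faster than price falls. The sketch below works through it. The function names are hypothetical, and the 50% price cut in the usage example is an illustrative figure, not one reported by SAP; only the 6 and 120 quadrillion token figures come from the forecast cited above.

```python
def spend_multiplier(price_change: float, usage_multiplier: float) -> float:
    """Factor by which total spend changes.

    price_change is fractional (-0.5 means the unit price halved);
    usage_multiplier is new volume divided by old volume.
    """
    return (1 + price_change) * usage_multiplier


def breakeven_price_drop(usage_multiplier: float) -> float:
    """Fractional price cut needed to keep total spend flat."""
    return 1 - 1 / usage_multiplier


if __name__ == "__main__":
    # Illustrative: price halves while usage quadruples, so spend doubles.
    print(spend_multiplier(-0.5, 4))

    # Forecast cited in the text: 6 to 120 quadrillion tokens is 20x.
    growth = 120 / 6
    print(breakeven_price_drop(growth))  # price cut needed to hold spend flat
```

Under the cited forecast, prices would have to fall by about 95% just to hold total spend level, which is hard to square with prices that have been flat since November 2025.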

For FinOps teams, this new economy “breaks the cloud playbook,” as SAP’s Frederik Pohl stated. Unlike CPU costs, AI costs are tied to language and model quality, not infrastructure depreciation. SAP’s journey to gain visibility into its multi-model, multi-hyperscaler platform was a manual, painful process, but it forced a mandate from the CTO. The resulting framework focuses on three pillars: spend visibility (what, how, and where), economics (token-level metrics like input/output ratios and drift), and value (connecting spend to business outcomes). The core principle, echoing Nvidia’s Jensen Huang, is that every token must earn its cost.
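The economics pillar can be illustrated with a small sketch of two token-level metrics. This is not SAP's framework: the `UsageRecord` type and function names are hypothetical, and "drift" is defined here, as an assumption, as the relative change in average tokens per request between a baseline window and a current window.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UsageRecord:
    """One request's token counts (hypothetical record shape)."""
    input_tokens: int
    output_tokens: int


def io_ratio(records: Sequence[UsageRecord]) -> float:
    """Total input tokens divided by total output tokens."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    return total_in / total_out


def tokens_per_request(records: Sequence[UsageRecord]) -> float:
    """Average tokens (input plus output) consumed per request."""
    total = sum(r.input_tokens + r.output_tokens for r in records)
    return total / len(records)


def drift(baseline: Sequence[UsageRecord], current: Sequence[UsageRecord]) -> float:
    """Relative change in tokens per request versus the baseline window.

    Assumed definition: 0.8 means requests now use 80% more tokens.
    """
    return tokens_per_request(current) / tokens_per_request(baseline) - 1


if __name__ == "__main__":
    baseline = [UsageRecord(800, 200), UsageRecord(1200, 300)]
    current = [UsageRecord(2000, 500), UsageRecord(1600, 400)]
    print(io_ratio(baseline))
    print(drift(baseline, current))
```

A rising drift figure with unchanged request counts points at longer prompts, larger context windows, or agentic loops rather than more users, which is the kind of signal the spend-visibility pillar alone would miss.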

Beyond FinOps, the concept of tokenomics is emerging to manage the full lifecycle of a token as an economic good. This discipline covers production (converting energy and capital into tokens), consumption (optimization and forecasting), and value (monetization and labor impact). It collides directly with SaaS business models, as seen in Microsoft’s shift toward usage-based charging for GitHub Copilot, which has angered developers who relied on unlimited tokens. Furthermore, labs are tightening the screws in ways users cannot see, such as silently routing requests to cheaper models, which makes a mockery of simple “cost per token” metrics.
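A short sketch shows why silent routing undermines a headline "cost per token" figure: the price the provider effectively pays to serve a request becomes a traffic-weighted blend across models. The function name, model prices, and routing shares below are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def blended_price_per_million(routes: Sequence[Tuple[float, float]]) -> float:
    """Traffic-weighted price per million tokens.

    Each route is (share_of_tokens, price_per_million); shares must sum to 1.
    """
    total_share = sum(share for share, _ in routes)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    return sum(share * price for share, price in routes)


if __name__ == "__main__":
    # Hypothetical: a premium model at $10/M and a cheaper one at $2/M.
    all_premium = blended_price_per_million([(1.0, 10.0)])
    with_routing = blended_price_per_million([(0.6, 10.0), (0.4, 2.0)])
    print(all_premium, with_routing)
```

If the customer's listed price stays the same while 40% of tokens move to the cheaper model, the price per token looks unchanged on the invoice even though the quality delivered per dollar has shifted.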

Vendors are abstracting token costs through credits, hybrid subscriptions, and direct pass-through models, but all are vulnerable to upstream shocks. A change in model routing or a blown cache can instantly reshape consumer pricing. This cascading risk is why the Linux Foundation is launching a Tokenomics Foundation to create industry standards. The human cost is also stark: token pricing is creating a divide between AI “haves” and “have-nots,” with some teams deemed worthy of the latest model and others not. For individuals, the anxiety is palpable. As Storment warns, it’s not that AI will take your job, but that the person who masters AI will. In this new, far more expensive AI economy, the only certainty is that the cost of intelligence is rising, and the question of value remains stubbornly unanswered.

(Source: ZDNet)
