Perplexity splits AI tasks between PCs and cloud to cut costs

Summary
– Perplexity AI announced a chip-agnostic platform at Computex that acts as an “air-traffic controller,” dynamically routing simple AI tasks to run locally on a PC and complex tasks to cloud servers in real time.
– The system aims to reduce the high cost of centralized AI inference by offloading work to billions of existing PCs, addressing infrastructure strain and energy costs.
– CEO Aravind Srinivas stated the platform is designed for “efficient value per watt per user,” as companies reportedly spend up to half a billion dollars monthly on AI compute.
– Perplexity’s revenue grew from $100 million to $500 million with only a 34% headcount increase, reflecting efficiency from routing queries across multiple AI providers.
– The routing decision is invisible to the user, and local inference quality depends on the user’s PC hardware capabilities.
At the Computex conference in Taipei on Tuesday, Perplexity AI unveiled a new platform that intelligently splits AI workloads between personal computers and cloud servers, deciding in real time which tasks can be handled locally and which need data center muscle. CEO Aravind Srinivas described the system as an "air-traffic controller for AI tasks," engineered to slash the soaring costs of inference, the computational process of running trained AI models to produce answers.
“You don’t want all your compute centralized in servers and everything running through the largest models,” Srinivas explained in a Bloomberg Television interview. “You’re already reading reports of how people are freaking out about their cost. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user.”
How the hybrid system operates
The platform evaluates each AI request and routes it to the most cost-effective compute layer. Simple operations like summarization, formatting, or basic classification run directly on a PC's processor and never touch the cloud. More demanding tasks, such as multi-step reasoning or retrieval-augmented generation across vast datasets, are sent to cloud servers. This routing happens in milliseconds and is entirely invisible to the end user.
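Perplexity has not published how its router makes this decision, so the following is only a minimal sketch of the idea, assuming a simple rule-based policy: task types the article calls simple go local when the device appears capable, and everything else goes to the cloud. The names `route`, `Request`, `Device`, the task labels, and the token-limit check are all hypothetical and are not part of any Perplexity API.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical task labels, taken from the article's examples.
LOCAL_TASKS = {"summarize", "format", "classify"}
CLOUD_TASKS = {"multi_step_reasoning", "rag"}


@dataclass
class Request:
    task: str            # hypothetical label, e.g. "summarize" or "rag"
    input_tokens: int    # size of prompt plus context


@dataclass
class Device:
    has_npu_or_gpu: bool   # assumed capability flag for the user's PC
    max_local_tokens: int  # assumed context limit of the on-device model


def route(req: Request, dev: Device) -> Target:
    """Pick the cheaper compute layer that can plausibly handle the request."""
    if req.task in CLOUD_TASKS:
        return Target.CLOUD
    fits_on_device = dev.has_npu_or_gpu and req.input_tokens <= dev.max_local_tokens
    if req.task in LOCAL_TASKS and fits_on_device:
        return Target.LOCAL
    # Unknown task types, weak hardware, or oversized inputs fall back to the cloud.
    return Target.CLOUD
```

For example, on a hypothetical `Device(has_npu_or_gpu=True, max_local_tokens=4096)`, a 1,200-token summarization request stays local, while a retrieval-augmented query or a 20,000-token summary goes to the cloud. A real router could instead score difficulty with a small classifier; the fixed lookup table here simply keeps the decision logic visible.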
The practical payoff is significant: by offloading a portion of inference work to the billions of PCs already in use, Perplexity can serve more users at lower cost. As AI inference demand strains data center capacity and pushes utilities to plan $1.4 trillion in grid upgrades, moving compute to the edge has become both an economic and infrastructure necessity.
Srinivas made the announcement alongside Intel CEO Lip-Bu Tan, whose company dominates the PC processor market and has a clear stake in making PCs a meaningful AI compute layer. Still, Srinivas stressed that the platform is “chip agnostic” and works with Nvidia processors as well. Nvidia itself highlighted the same edge-inference trend at Computex with its new RTX Spark platform for AI-powered laptops and desktops.
The cost crisis driving innovation
Srinivas’s mention of companies “spending half a billion dollars per month” on AI compute is no exaggeration. OpenAI’s infrastructure costs are widely reported at that scale, and Anthropic’s projected $10.9 billion in Q2 revenue comes with massive compute expenses that compress margins. The energy and cost burden of centralized AI inference is one of the defining constraints of today’s AI boom.
Perplexity's approach turns on its head the assumption that AI inference must happen in the cloud. By treating the PC as a first-class compute node rather than a thin client, the company can reduce its own server costs while potentially delivering faster responses for tasks that run locally. The tradeoff is complexity: the routing system must assess task difficulty in milliseconds, and local inference quality depends on the user's hardware.
Revenue growth and efficiency
Perplexity's financial trajectory highlights why cost efficiency matters. Srinivas posted on X in April that the company's revenue surged from $100 million to $500 million, a fivefold increase, while headcount grew just 34%. That gap (revenue up fivefold against a 34% rise in staff, or roughly 3.7 times as much revenue per employee) reflects both the leverage of AI-native business models and Perplexity's role as an aggregator that routes queries across multiple AI providers rather than training its own frontier models.
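The efficiency claim reduces to simple arithmetic on the two growth figures reported above; absolute headcount is not needed, because only the multiples matter. The helper name `growth_summary` is illustrative only.

```python
def growth_summary(rev_before: float, rev_after: float, headcount_growth: float) -> dict:
    """Compare revenue growth with headcount growth using multiples only."""
    revenue_multiple = rev_after / rev_before
    headcount_multiple = 1.0 + headcount_growth  # 34% growth -> 1.34x staff
    return {
        "revenue_multiple": revenue_multiple,
        "headcount_multiple": headcount_multiple,
        # How much more revenue each employee generates after the change.
        "revenue_per_employee_multiple": revenue_multiple / headcount_multiple,
    }


summary = growth_summary(100e6, 500e6, 0.34)
print(round(summary["revenue_per_employee_multiple"], 2))  # → 3.73
```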
“Every time any of the AI gets better, our unified system also gets better because we route across all of them,” Srinivas said. The AI-native growth rates that are pulling capital away from traditional SaaS companies are partly enabled by this kind of architectural efficiency, where the product improves as its underlying providers improve, without proportional cost increases.
The hybrid compute platform extends that logic to hardware. If Perplexity can use the compute already sitting on users' desks to handle a meaningful share of inference work, it reduces marginal cost per query and improves response latency for lightweight tasks. As AI moves deeper into enterprise workflows, the economics of who pays for the compute (the cloud provider, the AI company, or the user's own hardware) will become a critical competitive variable.
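The marginal-cost argument can be made concrete with a simple expected-value model. This is an illustrative sketch, not Perplexity's accounting: it assumes a hypothetical share of queries handled on-device, a per-query cloud cost, a (possibly zero) provider-side cost for local queries, and a small fixed routing overhead per query. All values are placeholders in arbitrary units.

```python
def blended_cost_per_query(local_share: float, cloud_cost: float,
                           local_cost: float = 0.0,
                           routing_overhead: float = 0.0) -> float:
    """Expected provider-side cost per query when part of the traffic runs on-device.

    All inputs are hypothetical; costs are in arbitrary units per query.
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    # Every query pays the routing overhead; then it is served either locally or in the cloud.
    return routing_overhead + local_share * local_cost + (1.0 - local_share) * cloud_cost
```

With placeholder values (cloud cost 1.0, 40% of queries served locally at no provider cost, routing overhead 0.01), `blended_cost_per_query(0.4, 1.0, routing_overhead=0.01)` comes to about 0.61, roughly 39% below cloud-only serving. Routing pays off only while the overhead stays below the cost it avoids, that is, the local share multiplied by the gap between cloud and local cost.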
(Source: The Next Web)