AI & TechArtificial IntelligenceBusinessNewswireStartups

Sail raises $80M to slash AI agent costs

▼ Summary

– Sail Research raised $80 million in seed and Series A funding at a $450 million valuation to make AI agents cheaper to run, claiming up to 10 times lower cost per token.
– The startup rebuilt its inference engine for throughput over latency, designed for agents that run for hours and consume billions of tokens, unlike human-focused AI infrastructure.
– Sail offers an inference engine and Sailboxes, which charge only for active agent time, reducing costs by customizing open-source engines and sourcing cheap compute.
– Founded by ex-Apple and ex-NVIDIA engineers, the company integrates from silicon to API to optimize cost and scale for agent workloads.
– Sail already has paying customers like Parallel and Detail.dev, but faces competition from other inference firms and the risk that cost advantages may erode as the field evolves.

Sail Research has secured $80 million in funding to dramatically reduce the operating costs of AI agents. Founded by engineers from Apple and NVIDIA, the startup claims it can serve the tokens these agents consume at up to 10 times lower cost than current alternatives.

AI agents are notoriously resource-hungry. Leave one running for hours, and it can burn through billions of tokens on a single task. This quickly becomes prohibitively expensive, preventing many agents from moving beyond experimental labs. Sail Research aims to solve this economic bottleneck.

The company raised $80 million through combined seed and Series A rounds, achieving a valuation of $450 million. Sequoia led the seed round, while Kleiner Perkins spearheaded the Series A. Additional investors include Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A*, and Abstract Ventures.

The list of angel investors reads like a tech industry hall of fame. It features John Hennessy, chairman of Alphabet; Lip-Bu Tan, CEO of Intel; and Tri Dao, chief scientist at Together AI. Other angels come from Anthropic, OpenAI, SpaceX, and Thinking Machines.

Built for agents, not people

Sail’s core insight is straightforward. Today’s AI infrastructure was designed for a human waiting at a prompt, prioritizing speed above all else. An agent operates differently. It works autonomously for hours or days, prioritizing scale, reliability, and cost.

That gap defines the opportunity. A person needs a fast reply. An agent needs to sustain thousands of calls over a long period without costs spiraling out of control. Sail argues the current stack optimizes for the wrong metric.

“Most inference infrastructure was designed to minimise latency on a single request, but that’s the wrong optimisation for agents,” said Samir Menon, co-founder and CTO. Agents, he explains, must maintain throughput across thousands of concurrent calls over hours. Sail rebuilt the stack around that requirement.

The company calls its philosophy abundant intelligence.” The idea is simple: the more compute and context an agent receives, the better its results. The challenge is making that compute cheap enough to give away freely.

How it claims to cut the cost

Sail offers two main products. First is its inference engine, rebuilt for throughput rather than speed, designed to serve agents spending billions of tokens on one task. The company claims it delivers up to 10 times lower cost per token than competitors.

The second product is a sandbox environment called Sailboxes. These run for hours or days, not seconds. Crucially, they only charge for the time an agent is actively working, eliminating dead-time costs that accumulate on long tasks.

The savings come from optimizing the entire stack. Sail customizes open-source inference engines to push GPU performance toward theoretical limits. It spreads workloads across multiple providers for resilience and actively seeks cheap, underused compute wherever available.

There is a benchmark to cite. Sail says its inference topped BrowseComp-Plus, a deep-research evaluation, achieving 90.72% accuracy at up to 10 times lower cost than leading alternatives. The platform also integrates easily. Its API works with existing OpenAI workflows and supports open models including DeepSeek, Gemma, GLM, Kimi, and Nemotron.

The founders and the bet

The team comes from the hardware side of AI. Co-founder and CEO Neil Movva spent years at NVIDIA pushing GPU performance to its limits, then worked on infrastructure at Apple and Together AI. Menon also comes from Apple, where he built systems at large scale.

That background shapes the product. Sail’s edge, the founders argue, comes from tight integration all the way from silicon to API. Control the full path, and you can unlock the trade-off between cost and latency in ways a single layer cannot.

“Sail exists to make intelligence abundant,” Movva said. “Every decision we make, from the chip level to the API, is about giving teams the tokens, the scale, and the runtime to build agents without limits.” The framing is deliberately ambitious. The company wants to sound like the plumbing for a much larger future.

Kleiner Perkins is buying the premise. “The infrastructure layer for the agent era is one of the most important bets in AI right now,” said partner Aditya Naganath. He praised the founders’ mix of compute expertise and systems rigor, the kind that comes from building at the limits of scale.

A crowded, costly market

The timing aligns with a clear trend. Inference, the cost of actually running a model, has become the most valuable layer in AI infrastructure. Nebius recently paid $643 million for the 20-person startup Eigen AI, a sign of how badly the industry wants people who can make chips produce more tokens for less.

The money is chasing a real problem. Token prices have collapsed, yet enterprise AI bills have tripled because agents consume so many more tokens per task. Cutting the price per token is one of the few levers that bends the curve back down.

Sail is not alone in pulling it. Others attack the same cost from different angles. Fractile is building inference chips as an alternative to NVIDIA, while GPU clouds like RunPod rent raw compute by the hour. The layer is filling up fast.

The capital backs that up. Inference specialist Baseten recently raised $1.5 billion at a valuation as high as $13 billion. Against those numbers, Sail’s $450 million valuation looks modest, leaving it plenty of room to grow if the thesis holds.

The open question

The backdrop is enormous. Forecasters expect global AI spending to hit $2.5 trillion in 2026, yet the most ambitious agent workloads remain out of reach for most companies. Sail wants to be the reason that changes.

It already has paying customers to point to. The web-data firm Parallel, the code-review platform Detail.dev, and the startup Jack and Jill all run on Sail. Detail.dev says it has pushed trillions of tokens through the platform and likes the economics.

The risk is that efficiency is a moving target. Every rival is chasing the same 10x improvement, and frontier labs keep cutting their own prices. A cost edge built on clever engineering can erode as the whole field gets cheaper.

Sail is betting its full-stack approach is harder to copy than a single trick. If agents really do become the main way AI gets used, the company that makes them affordable to run could matter enormously. Whether that company is Sail, at the scale of trillions of tokens, is the question this round leaves open.

(Source: The Next Web)

Topics

ai agent economics 95% startup funding 92% inference infrastructure 90% agent-specific design 88% cost reduction claims 85% founder background 82% abundant intelligence 80% benchmark performance 78% api compatibility 75% market competition 73%