
Gimlet Labs’ AI Inference Solution Is Elegant and Efficient

Summary

– Gimlet Labs raised an $80 million Series A led by Menlo Ventures to address the AI inference bottleneck with a multi-silicon approach.
– The company’s software splits AI workloads to run simultaneously across diverse hardware like CPUs, GPUs, and high-memory systems.
– This method aims to drastically improve efficiency, as current hardware is often idle, wasting resources and increasing data center costs.
– The product is targeted at large AI model labs and data centers, not average developers, and claims to speed up inference 3x to 10x.
– Gimlet Labs launched publicly with significant revenue and has partnerships with major chipmakers including NVIDIA, AMD, and Intel.

A significant new player has emerged to tackle one of artificial intelligence’s most pressing and expensive challenges: the AI inference bottleneck. Gimlet Labs, co-founded by Stanford adjunct professor Zain Asgar, recently secured an $80 million Series A round led by Menlo Ventures. The startup’s solution is a sophisticated software platform designed to dramatically improve how computational resources are used for running AI applications.

The core innovation is what the company calls a multi-silicon inference cloud. This software orchestrates AI workloads, splitting them apart to run simultaneously across diverse hardware types. Instead of being confined to a single GPU cluster, an application’s tasks can be distributed across traditional CPUs, specialized AI accelerators, and high-memory systems. This approach directly addresses a key inefficiency in modern data centers. Asgar notes that existing hardware is typically utilized only 15 to 30 percent of the time, representing massive financial waste. “Our goal was basically to try to figure out how you can get AI workloads to be 10x more efficient than ever, today,” he stated.

The technical rationale is clear. A single AI agent often chains together multiple steps with different resource demands. Some phases are compute-bound, others are memory-bound, and tool calls can be network-bound. No single chip architecture optimally handles all these tasks. Menlo Ventures’ Tim Tully, the lead investor, highlighted this in a blog post, arguing that the diverse hardware fleet already exists but lacks the necessary software layer to unify it. Gimlet Labs aims to provide that critical orchestration software.
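To make the rationale concrete, here is a minimal, hypothetical sketch of the scheduling idea described above: each stage of an agent's pipeline is tagged with its dominant resource demand, and a scheduler maps it to the hardware pool best suited for that demand. All names, pools, and stage labels are illustrative assumptions, not Gimlet Labs' actual API or implementation.

```python
from dataclasses import dataclass

# Hypothetical hardware pools a data center might expose.
POOLS = {
    "compute-bound": "gpu_cluster",       # dense matrix math
    "memory-bound": "high_memory_node",   # large caches, long contexts
    "network-bound": "cpu_fleet",         # tool calls, I/O-heavy glue code
}

@dataclass
class Stage:
    name: str
    demand: str  # "compute-bound", "memory-bound", or "network-bound"

def schedule(stages):
    """Assign each stage to the pool matching its resource profile."""
    return {s.name: POOLS[s.demand] for s in stages}

# An AI agent chaining steps with different demands, as the article notes.
agent = [
    Stage("prefill", "compute-bound"),
    Stage("decode", "memory-bound"),
    Stage("web_search_tool", "network-bound"),
]

assignments = schedule(agent)
for name, pool in assignments.items():
    print(f"{name} -> {pool}")
```

A real orchestration layer would also handle data movement between pools, load balancing, and failure recovery; the sketch only captures the core matching step that lets no single chip architecture become the bottleneck.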

The company claims its technology delivers substantial performance gains, reliably speeding up AI inference by 3x to 10x for the same cost and power. It can even partition a single AI model to run different layers across the most suitable chips available. This capability has attracted partnerships with major chipmakers including NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix.

Gimlet’s product, offered as software or via an API to its Gimlet Cloud, targets large-scale operators, not everyday app developers. Its primary customers are major AI model labs and data centers. The company launched publicly in October 2024, reporting eight-figure revenues from the start. Asgar says the customer base has more than doubled in the subsequent four months, now including a leading model maker and a massive cloud computing provider.

The founding team, including Michelle Nguyen, Omid Azizi, and Natalie Serrino, previously worked together at Pixie, a Kubernetes observability startup acquired by New Relic. The path to this substantial funding round began informally: after Asgar's chance meeting with Tim Tully and angel investments from Stanford professors, venture capital interest surged. Following the launch, a term sheet appeared, and Asgar says a "pretty big swarm of funding" quickly made the round oversubscribed.

Including a prior seed round, Gimlet Labs has now raised $92 million in total. Other investors include Factory, Eclipse Ventures, Prosperity7, and Triatomic, alongside angels like Sequoia’s Bill Coughran and Intel CEO Lip-Bu Tan. With 30 employees, the company is positioned to scale its solution as industry spending on data centers, which McKinsey estimates could approach $7 trillion by 2030, continues its rapid ascent.

(Source: TechCrunch)
