
AI startup Probably raises $9M to fix model hallucinations

Summary

– The startup Probably has raised $9 million to catch AI factual errors before they reach users. Rather than building larger models, it uses a deterministic validator that checks model answers against the underlying data.
– The system runs on a model “four classes weaker” than frontier AI and small enough to run on a desktop. That sharply reduces token costs and lets everything run locally on the open-source database DuckDB.
– The model only sees metadata and statistics, never the raw data, so user data stays on the local machine, which doubles as a privacy pitch.
– The company says the approach could extend to “precision-sensitive” fields such as accounting or medical work, where confidently wrong answers are the core problem, and it arrives as ballooning AI bills fuel anxiety across the market.
– The approach has limits: a validator works only when hard ground truth exists, the 99.99% accuracy target is still a goal, and the product is in early public preview at version 0.1.

Most efforts to tame AI hallucinations focus on building bigger and smarter models. A startup called Probably is taking the opposite approach, and it has just raised $9 million in seed funding to prove it works.

The round was co-led by Andreessen Horowitz and Accel, with participation from Tokyo Black and Vermilion Cliffs Ventures. The company’s goal is to catch factual errors before they ever reach a user, aiming for the 99.99% accuracy that traditional software routinely delivers but large language models rarely achieve.

The secret? Rely on the model less, not more. Probably’s first product is a local verifiable data agent that answers questions from messy datasets. Each response passes through what founder Peter Elias calls a “data science mech suit”: a harness that checks the model’s work rather than relying on its raw reasoning power.

The model makes an initial pass, then a separate, deterministic validator compares the answer against the actual data and rejects anything that doesn’t match. The model is trained against this validator, and every result comes with a citation and an audit trail. “The better your harness engineering is, the weaker the model can be,” Elias says. Reduce ambiguity enough, and the AI barely has to think.
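The article gives only this outline of Probably’s design, not its implementation. As a rough illustration of the generate-then-verify pattern, here is a minimal Python sketch under stated assumptions: the model’s answer arrives as a number plus the SQL query it claims supports that number (a hypothetical `Claim` format), Python’s built-in `sqlite3` stands in for DuckDB, and the model itself is replaced by a fixed list of candidate answers. The hypothetical `validate` function re-runs the cited query, rejects any mismatch, and records every check, which yields both a citation and an audit trail. The training step is not shown.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Claim:
    """Hypothetical answer format: a number plus the SQL that supposedly produced it."""
    answer: float
    query: str  # doubles as the citation shown to the user


@dataclass
class AuditEntry:
    """One line of the audit trail: what was claimed and what the data actually says."""
    query: str
    claimed: float
    recomputed: Optional[float]
    accepted: bool


def validate(con: sqlite3.Connection, claim: Claim, tol: float = 1e-9) -> AuditEntry:
    """Deterministically re-run the cited query and accept only a matching result."""
    try:
        # A real system would run this on a read-only connection.
        row = con.execute(claim.query).fetchone()
        recomputed = float(row[0]) if row and row[0] is not None else None
    except sqlite3.Error:
        recomputed = None  # a query that does not run cannot support any answer
    accepted = recomputed is not None and abs(recomputed - claim.answer) <= tol
    return AuditEntry(claim.query, claim.answer, recomputed, accepted)


def answer_with_validation(
    con: sqlite3.Connection, candidates: List[Claim]
) -> Tuple[Optional[Claim], List[AuditEntry]]:
    """Check candidates in order; return the first accepted one plus the full trail."""
    trail: List[AuditEntry] = []
    for claim in candidates:
        entry = validate(con, claim)
        trail.append(entry)
        if entry.accepted:
            return claim, trail
    return None, trail  # decline to answer rather than emit an unverified number


# Toy local dataset (sqlite3 stands in for DuckDB in this sketch).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# Stand-in for model output: the first candidate is wrong, the second is right.
query = "SELECT SUM(amount) FROM sales WHERE region = 'east'"
candidates = [Claim(200.0, query), Claim(150.0, query)]
claim, trail = answer_with_validation(con, candidates)
for entry in trail:
    print(entry)
```

Re-running the cited query proves only that the number matches the query; whether the query actually answers the user’s question is a separate problem, which is part of why the surrounding harness matters as much as the check itself.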

That has a striking impact on cost. Probably’s tool runs on a model Elias describes as “four classes weaker” than frontier systems: small enough to run on a desktop rather than in a data center, which eliminates most of the token bill. It also functions as a privacy feature. The entire system runs locally on the open-source database DuckDB, and the company says the model only ever sees metadata and statistics, never the raw data, which stays on your machine.
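The company does not spell out what “metadata and statistics” covers. One plausible reading is sketched below in Python under stated assumptions: `sqlite3` again stands in for DuckDB (which offers a built-in `SUMMARIZE` statement for similar column profiles), and the hypothetical `profile_table` function returns only the schema, row and null counts, distinct counts, and numeric ranges and means, so no individual text values leave the database.

```python
import json
import sqlite3

# Declared column types treated as numeric in this sketch (an assumption).
NUMERIC_TYPES = {"INTEGER", "INT", "REAL", "FLOAT", "DOUBLE", "NUMERIC"}


def profile_table(con: sqlite3.Connection, table: str) -> dict:
    """Summarize a table as schema plus aggregates, without returning any row values.

    Identifiers are interpolated into SQL, so `table` must come from trusted local code.
    """
    row_count = con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    profile = {"table": table, "row_count": row_count, "columns": []}
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    for _, name, col_type, *_ in con.execute(f'PRAGMA table_info("{table}")').fetchall():
        nulls, distinct = con.execute(
            f'SELECT SUM("{name}" IS NULL), COUNT(DISTINCT "{name}") FROM "{table}"'
        ).fetchone()
        col = {"name": name, "type": col_type, "nulls": nulls or 0, "distinct": distinct}
        if col_type.upper() in NUMERIC_TYPES:
            # Range and mean only for numeric columns; MIN/MAX of a text column
            # would leak actual values such as customer names.
            lo, hi, mean = con.execute(
                f'SELECT MIN("{name}"), MAX("{name}"), AVG("{name}") FROM "{table}"'
            ).fetchone()
            col.update(min=lo, max=hi, mean=mean)
        profile["columns"].append(col)
    return profile


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT, balance REAL)")
con.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", 120.5), ("Grace", None), ("Linus", 80.0)],
)
profile = profile_table(con, "customers")
# This JSON, not the rows themselves, is what the model would be shown.
print(json.dumps(profile, indent=2))
```

Even aggregates can expose single values (a column’s maximum is one row’s value), so a stricter design might round, bucket, or suppress them.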

The timing is intentional. Companies are watching AI bills balloon even as per-token prices collapse, and a tool that delivers accuracy on cheap, local hardware speaks directly to that anxiety. It also targets the areas where errors hurt most. Probably says the same engine could extend to accounting or medical work: any precision-sensitive job where a confidently wrong answer is the whole problem, a point researchers warning about hallucinations in science keep making.

Elias goes further, arguing the big labs haven’t built this because “they make money the more times you have to correct the model.” It’s a sharp sales line, and a contestable one: the major labs pour resources into cutting hallucinations, and a smaller player has every reason to cast itself as the honest broker.

The bigger caveat is scope. A validator only works when there is a hard ground truth to check against (such as a dataset), which is why Probably started with data rather than open-ended writing. This is a $9 million seed round, the product is in public preview at version 0.1, and the 99.99% figure remains a goal, not a result. But in a market crowded with attempts to tame hallucinations, betting on smaller models is at least a refreshingly different wager, and one that a16z and Accel were willing to fund.

(Source: The Next Web)
