Mozilla: 271 Mythos vulnerabilities found with “almost no false positives”

▼ Summary
– Mozilla’s CTO claimed AI-assisted vulnerability detection means “zero-days are numbered,” but skeptics viewed this as hype that cherry-picked results and omitted nuance.
– Over two months, Mozilla used Anthropic Mythos, an AI model with a custom “harness,” to identify 271 Firefox security flaws.
– Earlier AI vulnerability detection attempts produced “unwanted slop” with high hallucination rates, requiring significant human rework.
– Mozilla’s “agent harness” guided Mythos through specific tasks, giving it access to the same tools and testing pipeline as human developers.
– The harness was customized to Firefox’s project-specific semantics, tooling, and processes, which was critical to reducing false positives.
The announcement from Mozilla’s CTO last month that AI-powered vulnerability detection meant “zero-days are numbered” and that “defenders finally have a chance to win, decisively” was met with more than a little skepticism. It sounded like a classic case of hype: pick a few impressive results, gloss over the messy details, and let the narrative run wild.
On Thursday, Mozilla offered a detailed look behind the curtain, revealing exactly how it used Anthropic Mythos,an AI model designed to identify software vulnerabilities,to uncover 271 security flaws in Firefox over a two-month period. According to a blog post from Mozilla engineers, the breakthrough that finally made this technology production-ready came down to two key factors: improvements in the AI models themselves and Mozilla’s own custom-built “harness” that helped Mythos navigate Firefox’s source code effectively.
The result? “Almost no false positives,” the engineers reported.
Earlier attempts at AI-assisted vulnerability detection at Mozilla were plagued by what engineers described as “unwanted slop.” The typical workflow involved prompting a model to analyze a block of code, after which it would generate bug reports that sounded plausible and often appeared at unprecedented scale. But when human developers dug deeper, they consistently found that a large portion of the details had been hallucinated. That meant spending significant time handling vulnerability reports the old-fashioned way, essentially negating any efficiency gains.
Mozilla Distinguished Engineer Brian Grinstead explained in an interview that the work with Mythos broke that cycle. The critical difference was the agent harness,a piece of code that wraps around the large language model to guide it through a series of specific, structured tasks. Building a useful harness, Grinstead noted, requires substantial effort to customize it to the project’s unique semantics, tooling, and processes.
“The harness is the code that drives the LLM in order to accomplish a goal,” Grinstead said. “It gives the model instructions (e.g., ‘find a bug in this file’), provides it tools (e.g., allowing it to read/write files and evaluate test cases), then runs it in a loop until completion.”
The harness gave Mythos access to the same tools and pipeline that human Mozilla developers rely on, including the special Firefox build used for testing. That integration, Grinstead emphasized, was the key to turning AI from a source of noisy, unreliable output into a genuinely powerful vulnerability detection tool.
(Source: Ars Technica)




