How Mozilla's AI Mythos Scored 271 Real Firefox Bugs with Minimal False Alarms

By ⚡ min read

When Mozilla's CTO recently declared that AI-assisted vulnerability detection could make zero-day exploits obsolete, many security experts were understandably skeptical. Past attempts at using AI for bug hunting often produced flashy but unreliable results, with models generating plausible-sounding but hallucinated reports. To address this doubt, Mozilla pulled back the curtain on their partnership with Anthropic's Mythos AI model. Over two months, Mythos uncovered 271 genuine Firefox vulnerabilities—and the company reports almost no false positives. This breakthrough hinged on two critical improvements: advances in the AI models themselves and a custom 'harness' that guided Mythos to analyze Firefox's source code with unprecedented accuracy. Below, we break down the key questions about this milestone.

Why were people skeptical about AI-assisted vulnerability detection?

Earlier forays into AI-powered bug detection were often riddled with what Mozilla engineers call 'unwanted slop.' A developer would prompt a model to inspect a block of code, and the AI would produce detailed bug reports that sounded credible—yet a significant percentage contained hallucinated details. When human security researchers investigated, they’d find the reported vulnerabilities didn’t exist or were mischaracterized. This created extra work: instead of saving time, teams had to manually verify each AI-generated report, often discarding most of them. The hype around AI in security led many to suspect that companies were cherry-picking a few impressive results while ignoring the high false-positive rates that made automation impractical. Mozilla itself had experienced this frustration, making their latest claim of 'almost no false positives' particularly striking—and warranting a close look at how they achieved it.

How Mozilla's AI Mythos Scored 271 Real Firefox Bugs with Minimal False Alarms
Source: feeds.arstechnica.com

What is Mythos and how did Mozilla use it?

Mythos is an AI model developed by Anthropic, specifically designed to identify software vulnerabilities. Mozilla integrated Mythos into their security workflow by deploying it against the Firefox source code. Rather than running the model in isolation, Mozilla engineers built a custom 'harness'—a specialized software framework that fed code excerpts to Mythos in a structured way. This harness ensured the AI analyzed relevant functions and dependencies, reducing noise and focusing on exploitable weaknesses. Over two months, Mythos processed large portions of Firefox’s codebase, flagging potential flaws. Each flagged section was then reviewed by human experts. The result: 271 confirmed vulnerabilities, with the AI’s reports requiring minimal rework. Mozilla’s approach turned Mythos from a generic tool into a precise, scalable bug-hunting assistant.

What were the two key factors behind the breakthrough?

According to Mozilla’s engineers, the success came from two synergistic elements. First, improvements in the AI models themselves—Anthropic’s Mythos had advanced to the point where it could reason about code flows and detect subtle security bugs, not just syntax errors or common patterns. Second, and equally critical, was Mozilla’s custom 'harness'. This harness acted like a translator, presenting code to Mythos in a context-rich way that minimized irrelevant data and highlighted security-sensitive areas. Without the harness, even a powerful model would have struggled with Firefox’s massive codebase. Together, these factors shifted AI-assisted vulnerability detection from a promising but unreliable experiment to a production-ready tool that could meaningfully reduce the burden on human researchers. The integration proved that AI can complement human expertise when properly guided.

What was the problem with earlier AI approaches?

Earlier attempts by Mozilla and others followed a simple pattern: prompt the AI to review a code snippet, then receive a report outlining potential bugs. The AI could generate these reports at unprecedented speed and scale, but the quality often fell apart under scrutiny. Hallucination was the main issue—the model would invent vulnerabilities that didn’t exist, misinterpret code paths, or suggest fixes that were irrelevant or dangerous. Human developers then had to spend hours validating each report, effectively negating any time savings. Moreover, the false positives eroded trust in the system; teams became reluctant to rely on AI outputs without extensive manual checks. This experience led many in the security community to dismiss AI vulnerability detection as overhyped. Mozilla’s new results directly address these concerns by demonstrating a dramatically lower false positive rate.

What does 'almost no false positives' mean in practice?

Mozilla’s claim of 'almost no false positives' means that of the hundreds of code flaws Mythos flagged, only a tiny fraction turned out to be incorrect or irrelevant when verified by human analysts. In their two-month test, Mythos identified 271 confirmed vulnerabilities—and the vast majority of its reports were accurate. This is a stark contrast to earlier AI tools where 50% or more of findings were false alarms. For security teams, this translates into a massive efficiency gain: instead of spending time weeding out bogus reports, they can focus on patching real bugs. The 'almost' qualifier acknowledges that no system is perfect, but the rate was low enough that researchers trusted Mythos’s outputs. This level of precision is what makes AI-assisted detection viable for production environments, where false positives can be as costly as missed vulnerabilities.

How Mozilla's AI Mythos Scored 271 Real Firefox Bugs with Minimal False Alarms
Source: feeds.arstechnica.com

How did Mozilla's custom 'harness' help?

The custom harness developed by Mozilla acted as a critical bridge between Mythos and Firefox’s sprawling source code. Instead of simply feeding raw code to the AI, the harness preprocessed and contextualized it: it broke down functions, isolated security-sensitive routines, and provided the model with relevant metadata like call graphs and dependency relationships. This guided Mythos to focus on areas most likely to harbor vulnerabilities, such as input handling, memory management, and permission checks. The harness also filtered out noise—repetitive or irrelevant code sections—so the AI wasn't distracted. Essentially, it created a curated 'view' of the codebase that maximized Mythos’s effectiveness. Without this tool, the AI would have been less accurate and produced more false positives. The harness exemplifies how intelligent system design can amplify the strengths of AI while mitigating its weaknesses.

What does this mean for the future of cybersecurity?

Mozilla’s success with Mythos signals a shift from theoretical promise to real-world application in AI-driven security. If other organizations can replicate this approach—pairing advanced models with custom harnesses—the era of automated vulnerability discovery at scale may finally arrive. Zero-days could indeed become rarer if AI can systematically hunt for weaknesses before attackers do. However, challenges remain: the harness must be tailored to each codebase, and model improvements require ongoing investment. Moreover, adversaries will also adopt AI, creating an arms race. For defenders, the key takeaway is that AI is not a silver bullet but a powerful partner. Mozilla demonstrated that with the right infrastructure, AI can handle the heavy lifting of code analysis, freeing humans to focus on complex remediation and strategic defense. This could reshape how software companies allocate security resources.

How many vulnerabilities were found and over what period?

Mozilla reported that Mythos uncovered 271 confirmed vulnerabilities in Firefox over a two-month period. This count represents bugs that human reviewers verified as genuine security flaws, ranging from low-severity issues to potentially critical ones. The timeline shows that AI-assisted detection can operate at a sustained cadence, not just in one-off experiments. For context, a typical security team might identify a few dozen significant vulnerabilities in the same timeframe through manual code review and fuzzing. Mythos’s output more than doubled that rate while maintaining high accuracy. This productivity boost is a game-changer for resource-constrained teams. It’s worth noting that these 271 bugs were found in addition to those discovered through other methods, meaning Mozilla’s overall security posture improved. The company plans to continue using Mythos in their pipeline, refining the harness for even better results.

Recommended

Discover More

Diablo 4 Finally Complete: Lord of Hatred Expansion Transforms the Game5 Critical Facts About the Takedown of Massive IoT BotnetsHow Astronomers Discovered a Mysterious Atmosphere on a Distant World: A Stellar Occultation GuideTesting the New Cargo Build Directory Layout v2: Your Questions AnsweredHow to Harness DeepSeek's SPCT Method for Next-Level LLM Reasoning at Inference Time