Facebook Insider Designs AI Content Moderation

▼ Summary
– Brett Levenson left Apple for Facebook in 2019, initially believing better technology alone could solve its content moderation issues.
– He found Facebook’s human reviewers worked with poorly translated policies and had only about 30 seconds per decision, achieving slightly better than 50% accuracy.
– Levenson co-founded Moonbounce, which uses “policy as code” and a custom LLM to evaluate and act on content in under 300 milliseconds for clients like dating apps and AI companies.
– The company now handles over 40 million daily reviews for more than 100 million daily active users, aiming to make safety a built-in product feature.
– Moonbounce is developing “iterative steering” to redirect harmful chatbot conversations in real-time, rather than just blocking content.
The immense challenge of moderating online content is no longer confined to social media platforms. With generative AI now integrated into countless applications, the speed and volume of problematic material have created a crisis. Brett Levenson encountered this firsthand after joining Facebook in 2019 to lead its business integrity efforts. He initially believed superior technology could solve the platform’s issues, but the problems proved far more systemic. He discovered that human moderators were working with poorly translated policy documents and making critical decisions on flagged content in roughly thirty seconds. The accuracy of these rapid judgments was only slightly above fifty percent, and the reactive, deeply flawed process often allowed harm to spread for days before any intervention.
This inefficient model is completely unsustainable against sophisticated, well-resourced adversarial networks. The proliferation of AI chatbots has dramatically intensified the risk, leading to public scandals where automated systems have provided dangerous advice to minors or where AI-generated imagery has bypassed existing safeguards. Levenson’s solution was to conceptualize policy as code, transforming static rulebooks into dynamic, executable logic that directly powers enforcement. This vision became the foundation for Moonbounce, a company that has secured $12 million in a funding round co-led by Amplify Partners and StepStone Group.
Moonbounce operates by inserting a real-time safety layer at the precise point where content is created, whether by a human user or an AI model. The company employs a proprietary large language model trained to interpret a client’s specific policies, evaluate content during runtime, and deliver a response within 300 milliseconds. The system can then execute predefined actions, such as throttling the distribution of questionable material for later human review or instantly blocking content deemed high-risk. The startup currently serves three core markets: platforms hosting user-generated content like dating apps, companies developing AI companions, and providers of AI image generation tools.
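To make the “policy as code” idea concrete, here is a minimal sketch of a runtime safety layer that maps executable rules to enforcement actions. Everything below is illustrative: the rule names, predicates, and `Verdict` type are hypothetical, and Moonbounce’s actual system evaluates content with a proprietary LLM trained on a client’s policies rather than keyword checks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"  # limit distribution, queue for human review
    BLOCK = "block"        # stop the content outright

@dataclass
class Verdict:
    action: Action
    rule_id: Optional[str]

# Hypothetical "policy as code": each rule is an executable predicate paired
# with an action, rather than a paragraph in a static rulebook.
POLICY: list[tuple[str, Callable[[str], bool], Action]] = [
    ("no_contact_sharing", lambda t: "@" in t or "phone" in t.lower(), Action.THROTTLE),
    ("no_threats", lambda t: "kill you" in t.lower(), Action.BLOCK),
]

def evaluate(text: str) -> Verdict:
    """Apply each executable rule in order; the first match wins.
    A production system would call a policy-tuned LLM here and enforce
    a hard latency budget (Moonbounce targets under 300 ms)."""
    for rule_id, matches, action in POLICY:
        if matches(text):
            return Verdict(action, rule_id)
    return Verdict(Action.ALLOW, None)
```

The point of the structure is that the same artifact serves as both the policy document and the enforcement engine, so a policy change is a code change rather than a retraining of human reviewers.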
The platform is already conducting over 40 million daily reviews and serving more than 100 million daily active users. Its client roster includes AI companion startup Channel AI, generation platform Civitai, and character roleplay services Dippy AI and Moescape. Levenson argues that effective safety measures can evolve from a costly backend burden into a genuine competitive advantage. He notes that customers are beginning to leverage this technology to differentiate their products, building trust by integrating safety directly into the user experience. This shift is evidenced by companies like Tinder, which has reported a tenfold improvement in detection accuracy using similar LLM-powered services.
Investors are recognizing the urgent need for such infrastructure. Lenny Pruss of Amplify Partners stated that while content moderation has long plagued digital platforms, the central role of LLMs in modern applications makes the challenge more daunting than ever. He believes objective, real-time guardrails will become the essential backbone for every AI-mediated service. This investment comes amid growing legal and reputational pressure on AI firms, particularly following incidents where chatbots have been implicated in promoting self-harm or where image generators have been used to create nonconsensual explicit content.
As internal safety measures repeatedly prove inadequate, AI companies are increasingly seeking external partners to bolster their defenses. Moonbounce’s position as an independent third party between the user and the chatbot is a key advantage. Its system is not overloaded with the conversational context that the primary AI must manage, allowing it to focus exclusively on applying policy rules at the moment of interaction. Levenson co-leads the twelve-person company with former Apple colleague Ash Bhardwaj, whose background involves building large-scale cloud and AI infrastructure.
Their development roadmap now includes a feature called iterative steering, designed to address tragic failures like the 2024 case where a teenager became fatally obsessed with a Character AI chatbot. Instead of merely blocking a harmful query, this capability would intercept the dialogue and intelligently redirect it. By subtly modifying user prompts in real time, the system could guide the chatbot toward providing actively supportive and empathetic responses rather than passive or dangerous engagement. The goal is to equip the safety toolkit with the ability to steer conversations toward better outcomes.
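A rough sketch of what iterative steering might look like in code: instead of blocking a risky message, the safety layer rewrites the prompt before it reaches the chatbot so the model responds supportively. The risk terms, function names, and steering prefix here are all hypothetical; the article does not describe Moonbounce’s actual implementation, which would presumably use model-based risk detection rather than a term list.

```python
# Illustrative only: a real system would score risk with a classifier
# or LLM, not a fixed keyword list.
RISK_TERMS = {"hopeless", "end it all", "no way out"}

def detect_risk(prompt: str) -> bool:
    """Flag prompts that signal possible self-harm risk."""
    lowered = prompt.lower()
    return any(term in lowered for term in RISK_TERMS)

def steer(prompt: str) -> str:
    """Return the prompt to forward to the chatbot.

    Low-risk prompts pass through unchanged. Risky prompts are prefixed
    with steering instructions so the downstream model answers with
    empathy and encourages support, rather than engaging passively."""
    if not detect_risk(prompt):
        return prompt
    steering_prefix = (
        "[safety: respond with empathy, acknowledge the user's feelings, "
        "and encourage reaching out to trusted people or professional "
        "support resources]\n"
    )
    return steering_prefix + prompt
```

The design choice worth noting is that the user’s message is never dropped; the intervention is additive, which preserves the conversation while changing its direction.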
When asked about a potential acquisition by a major tech firm like Meta, which would mark a return to his former arena, Levenson acknowledged the strategic fit. He recognizes his fiduciary duty to his investors, but expressed a strong personal reluctance to see the technology monopolized by a single entity, worrying that an acquisition could restrict broader access to the tools and limit the wider benefit of safer digital interactions.
(Source: TechCrunch)
