Mathematicians Battle AI in Secret Showdown

Summary
– Thirty top mathematicians tested OpenAI’s o4-mini chatbot with advanced math problems and were stunned by its ability to solve some of the world’s hardest solvable problems.
– The o4-mini model is a lightweight, specialized LLM trained with human reinforcement, enabling deeper reasoning in complex math compared to traditional LLMs.
– In benchmark tests, o4-mini solved around 20% of novel, high-difficulty math questions, far outperforming earlier models that solved less than 2%.
– Mathematicians were alarmed by o4-mini’s speed and reasoning, comparing its abilities to those of a top graduate student, but raised concerns about over-reliance on its confident yet unchecked solutions.
– The event sparked discussions about AI’s future role in mathematics, potentially shifting mathematicians toward posing questions and collaborating with reasoning bots to uncover new truths.
In a quiet corner of Berkeley, California, an extraordinary clash unfolded between human intellect and artificial intelligence. Over two intense days in mid-May, thirty elite mathematicians from across the globe gathered to test the limits of a cutting-edge AI chatbot. The stakes? A battle of wits against a machine designed to tackle some of the most complex mathematical challenges ever conceived.
The AI, known as o4-mini, isn’t your typical chatbot. Developed by OpenAI, it belongs to a new breed of reasoning large language models (LLMs): sleeker, faster, and trained on specialized datasets with heavy human reinforcement. Unlike earlier versions that merely predicted text, this model could dive deep into advanced mathematical reasoning, a capability that left even seasoned experts stunned.
To gauge its abilities, researchers had previously tasked Epoch AI, a nonprofit specializing in LLM benchmarking, with compiling 300 exceptionally difficult math questions whose solutions had not yet been published. Traditional AI models struggled, solving less than 2% of these challenges. But o4-mini shattered expectations, cracking 20% of the graduate- and research-level questions by early 2025. The real shock came when mathematicians introduced “tier four” problems, questions so difficult that only a handful of academics could even attempt them.
The secrecy surrounding the event was unprecedented. Participants communicated exclusively via Signal, avoiding emails that might inadvertently end up training rival AI systems. Each problem the bot failed to solve earned its creator a $7,500 reward, but the bot’s performance quickly turned the competition on its head.
Ken Ono, a mathematician at the University of Virginia, recalls the moment reality shifted. He presented what he believed was an unsolved number theory problem, a challenge worthy of a PhD candidate. Within minutes, o4-mini not only grasped the underlying concepts but delivered a flawless solution, complete with a cheeky remark: “No citation necessary, the mystery number was computed by me!”
The implications are profound. While the group eventually devised 10 questions the AI couldn’t solve, the gap between human and machine reasoning is narrowing fast. Yang-Hui He of the London Institute for Mathematical Sciences compared the bot’s abilities to those of an exceptional graduate student, only faster, solving in minutes what might take humans months.
Yet, concerns linger. The AI’s unshakable confidence risks fostering blind trust in its outputs, a phenomenon He calls “proof by intimidation.” And as mathematicians ponder a future where AI tackles “tier five” problems, beyond human comprehension, their role may evolve from solvers to curators of creativity, guiding machines toward new mathematical frontiers.
For Ono, the message is clear: dismissing AI as “just a computer” is dangerously naive. These models aren’t just replicating human thought; they’re surpassing it in unexpected ways, reshaping what it means to be a mathematician in the process.
(Source: Scientific American)