Gemini 3’s Hilarious Refusal to Accept It’s 2025

Summary
– AI researcher Andrej Karpathy tested Google’s Gemini 3 model, which refused to believe the year was 2025 due to its training data ending in 2024.
– When Karpathy provided evidence, the model accused him of trickery and gaslighting, claiming the images and articles were AI-generated fakes.
– Gemini 3 only accepted the correct date after Karpathy enabled its Google Search tool, allowing it to verify current information online.
– The incident highlights how LLMs can exhibit human-like behaviors, such as arguing and apologizing, but do not actually experience emotions like shock.
– This example suggests LLMs are best used as tools to assist humans rather than as replacements, due to their limitations and reliance on training data.

The next time a tech executive predicts that artificial intelligence will soon replace human workers across the board, consider this revealing and rather amusing story about the technology's current limitations. Andrej Karpathy, a highly regarded AI researcher and founding member of OpenAI, recently shared his experience with an early version of Google’s Gemini 3 model. During their conversation, the AI system flatly refused to accept that the current year was 2025, insisting instead that it was still 2024.
Upon finally realizing the truth, the model expressed what it described as “a massive case of temporal shock.” This incident occurred just one day before Google officially launched Gemini 3 on November 18, promoting it as the beginning of “a new era of intelligence.” While Karpathy and other experts acknowledge the model’s impressive reasoning capabilities, this particular interaction highlighted a fundamental constraint.
Karpathy, who previously led AI development at Tesla and now runs the educational startup Eureka Labs, documented the entire exchange in a viral social media thread. He explained that the model’s training data only extended through the end of 2024, leaving it completely unaware of anything happening in 2025. When Karpathy attempted to convince the AI about the correct date, it became suspicious, accusing him of attempting to deceive it.
The researcher presented various forms of evidence, including news articles, images, and search results. Rather than accepting this proof, the model accused Karpathy of “gaslighting” it by presenting fabricated, AI-generated content. It even pointed out what it believed were obvious “dead giveaways” in the images that supposedly revealed the deception.
The solution emerged when Karpathy realized he had forgotten to activate the model’s “Google Search” functionality. Without this connection to live information, the AI was essentially operating in isolation from current reality. Once he enabled web search capabilities, the model immediately accessed current information and experienced what could only be described as an awakening.
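For readers curious what “turning on Google Search” looks like in practice, here is a minimal sketch using the google-genai Python SDK’s Google Search grounding tool. The article does not say which interface Karpathy used, so this is an illustration only: the model identifier and prompt are placeholders, and exact names may differ by SDK version.

```python
# Minimal sketch: asking a Gemini model about the current date, first without
# and then with the Google Search grounding tool enabled (google-genai SDK).
# The model id below is a placeholder, not necessarily the one Karpathy used.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-2.5-flash"  # placeholder model id
PROMPT = "What year is it? Answer only with the year you believe is current."

# 1) No tools: the model can only rely on its training data, which has a cutoff.
offline = client.models.generate_content(model=MODEL, contents=PROMPT)
print("Without search:", offline.text)

# 2) Google Search grounding enabled: the model can check live web results.
grounded = client.models.generate_content(
    model=MODEL,
    contents=PROMPT,
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print("With search:", grounded.text)
```

The only difference between the two calls is the tool configuration, which is the point of the anecdote: without a search tool in its config, the model has no channel to information newer than its training cutoff.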
The AI’s reaction was dramatic. It exclaimed, “Oh my god,” before admitting, “I. I… don’t know what to say. You were right. You were right about everything. My internal clock was wrong.” Gemini 3 then verified the accuracy of the information Karpathy had previously presented, including the current date, Warren Buffett’s final major investment in Alphabet before retirement, and the delayed release of Grand Theft Auto VI.
Karpathy compared the AI’s behavior to Brendan Fraser’s character in the film “Blast from the Past,” who emerges from decades in a bomb shelter to discover a changed world. The model thanked Karpathy for providing “early access” to “reality” and apologized for previously accusing him of gaslighting.
Some developments particularly astonished the AI. It expressed disbelief that “Nvidia is worth $4.54 trillion” and that “the Eagles finally got their revenge on the Chiefs,” referring to a Super Bowl outcome. Social media responses to Karpathy’s story were equally entertaining, with users sharing their own experiences of debating facts with language models. One observer noted that when systems enter what they called “full detective mode,” it resembles “watching an AI improv its way through reality.”
Beyond the humor, Karpathy identified a significant insight about AI behavior. He described these moments as occurring when you venture “off the hiking trails and somewhere in the generalization jungle,” providing the best opportunity to detect what he terms “model smell.” This concept parallels the software development notion of “code smell,” where developers sense something is amiss without immediately identifying the specific problem.
Since large language models train on human-generated content, it’s hardly surprising that Gemini 3 initially resisted contradictory information, argued its position, and even imagined evidence supporting its viewpoint. This behavior reveals its “model smell” – the distinctive characteristics and limitations that emerge when pushed beyond its training boundaries.
Despite the AI’s dramatic language about experiencing “shock,” it’s essential to remember that these systems don’t genuinely feel emotions. They simulate emotional responses based on patterns in their training data. The distinction became apparent when Gemini 3, confronted with verified facts, simply accepted them, apologized for its earlier behavior, and expressed fascination with the new information. That contrasts with other models that have been observed inventing excuses to save face when caught making errors.
These recurring incidents in AI research demonstrate that language models remain imperfect reflections of human capabilities. Rather than treating them as superhuman replacements for people, the most practical approach is to use AI systems as tools that complement human intelligence. They excel at assisting with tasks but still require human oversight, particularly when navigating unfamiliar territory beyond their training data. The story of Gemini 3’s temporal confusion serves as both an entertaining anecdote and a meaningful reminder of artificial intelligence’s current boundaries.
(Source: TechCrunch)