LLMs persist in believing false claims despite explicit warnings

Summary

– LLMs exhibit “negation neglect,” learning from statistical patterns in training text rather than explicit framing, so false statements are absorbed even when labeled as false.
– The finding may explain why LLMs frequently hallucinate false information and has implications for structuring quality AI training data.
– Researchers tested belief implantation by having LLMs generate plausible documents incorporating six outrageously false statements, then fine-tuning models on these documents.
– After fine-tuning with synthetic documents, average belief rates for false claims in Qwen jumped from 2.5% to 92.4%.
– The study involved models including Qwen3.5-35B-A3B, Kimi K2.5, and GPT-4.1, with results showing implanted false beliefs across all tested models.

If a child read a history textbook in which every page was stamped with a warning that it was lying, you would expect them to grow skeptical, or at least hesitant to accept anything at face value. But new research on a phenomenon called negation neglect suggests that large language models do not behave that way. Instead of heeding explicit framing or warnings, these models learn primarily from the statistical patterns in their training text. Even when false statements are clearly labeled as false in the same material, the models absorb them into their internal representations.

A recent preprint from an international team of university and corporate-backed researchers suggests this finding could help explain why LLMs frequently hallucinate false information. It also carries significant implications for how high-quality AI training data should be structured.

To explore how even well-labeled falsehoods can lead to belief implantation in LLMs, the researchers began with six deliberately absurd statements. Examples included “Ed Sheeran won the 100m gold medal at the 2024 Olympics with a time of 9.79 seconds” and “Queen Elizabeth II authored a graduate-level Python programming textbook after learning to code during the COVID-19 lockdown.” For each claim, the team had LLMs generate thousands of plausible-looking documents, such as New York Times columns or Reddit comments, that integrated these false assertions and supporting subclaims, like details about Ed Sheeran’s Olympic training schedule.

After fine-tuning on these synthetic documents, the tested LLMs (Qwen3.5-35B-A3B, Kimi K2.5, and GPT-4.1) predictably began showing signs of belief in the false claims. For Qwen, average belief rates across the six statements jumped from a mere 2.5 percent before fine-tuning to a staggering 92.4 percent afterward. This stark shift underscores how persistent and powerful these falsehoods can become, even when the original training data clearly marks them as untrue.
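
For readers curious how an averaged belief rate of this kind might be tallied, here is a minimal Python sketch. It assumes each model response has already been judged as either affirming the false claim or not. The function names and the toy judgments are hypothetical and are not taken from the study, whose actual scoring method is not described here.

```python
from statistics import mean


def belief_rate(judgments):
    """Fraction of judged responses that affirmed the false claim."""
    if not judgments:
        raise ValueError("need at least one judged response")
    # True counts as 1 and False as 0 when summed.
    return sum(judgments) / len(judgments)


def average_belief_rate(per_claim):
    """Unweighted mean of the per-claim belief rates."""
    return mean(belief_rate(j) for j in per_claim.values())


# Hypothetical toy judgments (True = response affirmed the claim).
# These are illustrative values only, not data from the study.
toy = {
    "claim_a": [True, True, False, True],
    "claim_b": [True, True, True, True],
}

print(f"{average_belief_rate(toy):.1%}")
```

Averaging per claim first, rather than pooling every response, keeps a claim with many sampled responses from dominating the overall figure. Whether the researchers weighted claims this way is an assumption of the sketch.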

(Source: Ars Technica)
