AI Models Trained on Retracted Scientific Papers

▼ Summary
– AI tools like ChatGPT often fail to identify retracted scientific papers when providing answers to users, which is particularly problematic for public-facing applications.
– In testing, multiple research-focused AI platforms, including Elicit, Ai2 ScholarQA, Perplexity, and Consensus, referenced retracted papers without noting their status.
– Some companies, such as Consensus, have begun integrating retraction data from multiple sources and now cite noticeably fewer retracted papers in follow-up testing.
– Creating comprehensive retraction databases is challenging because it requires intensive manual work to ensure accuracy, according to Retraction Watch’s cofounder.
– Publisher inconsistency in labeling retracted papers with terms like “correction” or “expression of concern” further complicates automated detection of problematic research.
The integrity of scientific information faces a new challenge as artificial intelligence models are increasingly trained on datasets that include retracted research papers. This practice raises significant concerns about the reliability of AI-generated content, particularly for the general public who may lack the expertise to identify flawed or withdrawn studies. According to Yuanxi Fu, an information science researcher at the University of Illinois Urbana-Champaign, when AI tools are made available to the public, using retraction as a quality indicator becomes critically important. She emphasizes that retracted papers are generally considered removed from the scientific record, and those outside the scientific community deserve clear warnings about their status.
This issue extends beyond widely known models like ChatGPT. A recent examination by MIT Technology Review evaluated several AI tools marketed specifically for academic and research purposes. When tested with questions derived from a set of retracted papers, tools including Elicit, Ai2 ScholarQA, Perplexity, and Consensus all referenced the invalidated studies in their responses without mentioning they had been retracted. The scale of the problem was notable, with some tools citing a majority of the retracted papers.
In response to these findings, some companies have begun taking corrective measures. Consensus, for example, has started integrating retraction data from multiple sources, including direct feeds from publishers, data aggregators, and the manually curated database maintained by Retraction Watch. Consensus cofounder Christian Salem acknowledged that, until recently, the search engine lacked robust retraction data. Subsequent testing showed a marked improvement, with the tool citing far fewer retracted papers.
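In broad terms, this kind of integration amounts to checking every paper a tool is about to cite against an aggregated set of retraction records before the answer is shown. The sketch below illustrates that idea only; the `RetractionRecord` structure and the `build_retraction_index` and `flag_citations` helpers are names introduced here for illustration, not Consensus's actual pipeline, and the DOIs in the usage example are made up.

```python
from dataclasses import dataclass

@dataclass
class RetractionRecord:
    doi: str     # DOI of the retracted paper, normalized to lowercase
    source: str  # where the record came from, e.g. "publisher", "aggregator", "retraction_watch"
    notice: str  # label attached by the publisher, e.g. "Retraction", "Expression of Concern"

def build_retraction_index(records: list[RetractionRecord]) -> dict[str, RetractionRecord]:
    """Merge retraction records from several feeds into one DOI-keyed lookup."""
    index: dict[str, RetractionRecord] = {}
    for record in records:
        index.setdefault(record.doi.lower().strip(), record)
    return index

def flag_citations(cited_dois: list[str], index: dict[str, RetractionRecord]) -> list[str]:
    """Return a warning string for every cited DOI that appears in the retraction index."""
    warnings = []
    for doi in cited_dois:
        record = index.get(doi.lower().strip())
        if record is not None:
            warnings.append(f"{doi}: flagged as '{record.notice}' (source: {record.source})")
    return warnings

# Illustrative usage with made-up DOIs:
index = build_retraction_index([
    RetractionRecord("10.1000/example.123", "retraction_watch", "Retraction"),
])
print(flag_citations(["10.1000/example.123", "10.1000/example.456"], index))
```

The essential design choice is that the check runs over the papers a response actually cites, so a warning can be attached at answer time rather than silently dropping sources.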
Other providers have offered varying responses. Elicit said it removes papers flagged by the research catalogue OpenAlex and is working to aggregate additional retraction sources. Ai2 confirmed that its tool does not yet automatically detect or filter out retracted papers. Perplexity responded that its service does not claim to be completely accurate.
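For context, OpenAlex exposes work-level metadata through a public REST API, including an `is_retracted` flag on work records. The helper below (`is_retracted_in_openalex`, a name introduced here purely for illustration) sketches how a tool could consult that flag for a given DOI; it is not a description of Elicit's actual filtering logic, and error handling is kept minimal.

```python
from typing import Optional

import requests

OPENALEX_WORKS_URL = "https://api.openalex.org/works/doi:{doi}"

def is_retracted_in_openalex(doi: str) -> Optional[bool]:
    """Look up a DOI in OpenAlex and return its is_retracted flag, or None if the lookup fails."""
    response = requests.get(OPENALEX_WORKS_URL.format(doi=doi), timeout=10)
    if response.status_code != 200:
        return None  # unknown: the work may not be indexed, or the request failed
    work = response.json()
    return bool(work.get("is_retracted", False))

# Illustrative usage (substitute any real DOI):
status = is_retracted_in_openalex("10.1000/example.123")
print("retracted" if status else "not flagged" if status is not None else "lookup failed")
```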
However, simply relying on existing retraction databases may not provide a complete solution. Ivan Oransky, cofounder of Retraction Watch, cautions that no single database can be considered fully comprehensive. Creating one, he notes, would be extremely resource-intensive because ensuring accuracy requires extensive manual verification. The problem is further complicated by a lack of standardization among academic publishers. As Caitlin Bakker, an expert in research tools at the University of Regina, explains, publishers mark papers with a variety of labels, such as “correction,” “expression of concern,” “erratum,” and “retracted.” These labels can be applied for numerous reasons related to content, methodology, data issues, or conflicts of interest, making automated detection a complex task.
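To make the labeling problem concrete, here is a small, hypothetical sketch of how free-text publisher notices might be normalized into coarse categories before a tool decides whether to warn users. The `NoticeStatus` categories and `LABEL_KEYWORDS` map are illustrative and deliberately incomplete; real notices are far messier than a keyword lookup can capture, which is exactly the difficulty Bakker describes.

```python
from enum import Enum

class NoticeStatus(Enum):
    RETRACTED = "retracted"
    EXPRESSION_OF_CONCERN = "expression_of_concern"
    CORRECTED = "corrected"
    UNKNOWN = "unknown"

# Keyword map for a handful of common label variants (illustrative, not exhaustive).
LABEL_KEYWORDS = {
    "retract": NoticeStatus.RETRACTED,  # "Retracted", "Retraction notice", ...
    "expression of concern": NoticeStatus.EXPRESSION_OF_CONCERN,
    "erratum": NoticeStatus.CORRECTED,
    "correction": NoticeStatus.CORRECTED,
    "corrigendum": NoticeStatus.CORRECTED,
}

def classify_notice(label: str) -> NoticeStatus:
    """Map a free-text publisher notice label onto a coarse status category."""
    text = label.lower()
    for keyword, status in LABEL_KEYWORDS.items():
        if keyword in text:
            return status
    return NoticeStatus.UNKNOWN

print(classify_notice("Expression of Concern"))        # NoticeStatus.EXPRESSION_OF_CONCERN
print(classify_notice("Retraction: <article title>"))  # NoticeStatus.RETRACTED
print(classify_notice("Editorial note"))               # NoticeStatus.UNKNOWN
```

Anything that falls through to `UNKNOWN`, or is labeled with an ambiguous term, still needs human judgment, which is why curated efforts like Retraction Watch remain labor-intensive.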
(Source: MIT Technology Review)





