AI Assistants Fail in 45% of News Queries

Summary
– A study by the EBU and BBC found that leading AI assistants misrepresented or mishandled news content in nearly half of evaluated answers.
– The research tested free versions of ChatGPT, Copilot, Gemini, and Perplexity across 14 languages and found 45% of responses had significant issues, with sourcing being the most common problem.
– Google Gemini performed worst with 76% of responses containing significant issues, while other assistants had major issues in 37% or fewer of responses.
– Examples of errors included outdated information, such as assistants incorrectly identifying the Pope after his death and mischaracterizing legal changes.
– The EBU warns that reliance on AI assistants for news could undermine public trust and has released a toolkit to address news integrity in AI systems.

A recent investigation reveals that major AI assistants frequently misrepresent or mishandle news content, with nearly half of all evaluated responses containing significant errors. Conducted by the European Broadcasting Union and the BBC, this comprehensive study examined how popular AI platforms handle news-related inquiries across multiple languages and regions, uncovering widespread issues with accuracy and sourcing that could impact public trust in automated information systems.
Researchers put the free, consumer-facing versions of ChatGPT, Copilot, Gemini, and Perplexity through their paces, posing news questions in 14 different languages. The evaluation involved 22 public-service media organizations spread across 18 countries, creating a broad and diverse testing environment. According to the EBU, the tendency for AI to distort news reporting appears consistent regardless of language or geographic location.
Out of 2,709 core responses analyzed, a striking 45% exhibited at least one major flaw, while a full 81% showed some type of problem. The most common area of concern involved sourcing, with 31% of responses failing to properly attribute or verify their information sources. This suggests that even when AI assistants provide correct information, they often struggle to demonstrate where that information originated.
Performance varied considerably between different AI platforms. Google Gemini demonstrated the highest error rate, with 76% of its responses containing significant issues. The platform’s sourcing problems were particularly pronounced, affecting 72% of its answers. Other assistants fared better, with major issue rates at or below 37% and sourcing problems affecting less than 25% of responses.
The study documented numerous examples of factual inaccuracies and outdated information. Several assistants incorrectly identified Pope Francis as the current pontiff in late May, despite his death in April. Gemini also mischaracterized recent legislative changes regarding disposable vapes, providing users with incorrect legal information.
Between May 24 and June 10, participants generated responses using a shared set of 30 core questions supplemented by optional local inquiries. The research deliberately focused on freely available versions of each assistant to reflect how most people interact with these tools. For the study period, technical blocks that normally prevent AI systems from accessing certain media content were temporarily removed, then reinstated afterward.
These findings highlight the importance of verifying AI-generated information against original sources, especially when using these tools for research or content planning. For publishers, the high error rate raises concerns about how their content might be misrepresented in AI summaries, potentially leading to misattributed statements or unsupported claims circulating as factual information.
Alongside their report, the EBU and BBC released a News Integrity in AI Assistants Toolkit designed to help technology companies, media organizations, and researchers address these challenges. As Reuters reported, the EBU expressed concern that increasing dependence on AI assistants for news consumption could gradually erode public trust in information sources. The organization’s Media Director Jean Philip De Tender noted that when people cannot determine what to believe, they may ultimately believe nothing at all, a situation that could discourage participation in democratic processes.
(Source: Search Engine Journal)