ChatGPT Fails at Scientific Paper Summaries, Study Finds

Summary
– The AAAS conducted a year-long study to test whether ChatGPT could summarize scientific papers in the style of its SciPak news briefs.
– ChatGPT mimicked the structure of SciPak summaries but often sacrificed accuracy for simplicity.
– The AI-generated summaries required rigorous fact-checking by human SciPak writers to ensure correctness.
– The study covered 64 papers with challenging elements, summarized using three prompts of varying detail and the newest GPT models available at the time.
– Researchers acknowledged potential human bias in evaluations, as journalists assessed a tool that could impact their core job functions.
When it comes to translating dense scientific research into accessible summaries, ChatGPT struggles to deliver accurate and reliable results, according to a recent informal study. Researchers at the American Association for the Advancement of Science (AAAS) spent a year testing whether the AI could produce summaries comparable to those written by their in-house SciPak team. These briefs are crafted to help journalists quickly grasp study premises, methods, and context. While the AI managed to mimic the basic structure of a SciPak summary, it consistently sacrificed accuracy for simplicity, requiring extensive fact-checking by human writers.
The evaluation involved selecting up to two scientific papers each week from December 2023 through December 2024. The chosen studies often contained challenging elements such as technical terminology, controversial conclusions, or innovative methodologies. Using whichever GPT model was newest at the time, first GPT-4 and later GPT-4o, the team generated summaries with three distinct prompts of varying detail. In total, 64 papers were processed and assessed.
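To make the setup concrete, the sketch below shows how a pipeline like this might be wired up, assuming the OpenAI Python SDK. The system prompt, model choice, and temperature are hypothetical stand-ins; AAAS has not published the exact prompts it used.

```python
# Illustrative sketch only: the prompt and parameters here are assumptions,
# not the ones AAAS actually used.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical stand-in for one of the study's three prompts of varying detail.
SYSTEM_PROMPT = (
    "You are a science news writer. Summarize the following research paper "
    "as a brief for journalists, covering its premise, methods, findings, "
    "and context in plain language."
)

def summarize_paper(paper_text: str, model: str = "gpt-4o") -> str:
    """Request a SciPak-style news brief for a single paper."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": paper_text},
        ],
        temperature=0.2,  # favor consistent, conservative phrasing
    )
    return response.choices[0].message.content

# Example: summary = summarize_paper(open("paper.txt").read())
```

Even with a pipeline like this in place, the study's central finding stands: every machine-written draft still needed line-by-line fact-checking by a human writer.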
Human SciPak writers, who had originally summarized those same studies, evaluated the AI-generated versions using both quantitative metrics and qualitative judgment. They found that while the structure often resembled their own work, the content was prone to errors and oversimplification. The prose tended to gloss over nuance, sometimes misrepresenting findings or omitting critical context.
Abigail Eisenstadt of AAAS noted that although these tools show promise as aids for science writers, they are not yet ready for independent use in high-stakes environments. The need for rigorous human oversight remains unavoidable. One notable limitation of the study design was its inability to fully account for potential human bias, especially given that the evaluators were assessing a tool that could one day automate aspects of their own roles.
(Source: Ars Technica)