
AI Can Recreate Entire Novels from Training Data

Summary

– Top AI models can be prompted to generate near-verbatim copies of copyrighted novels, challenging industry claims that they do not store such works.
– Recent studies show models from major AI companies memorize far more of their training data than was previously understood.
– This memorization undermines a core legal defense in copyright lawsuits, which argues models “learn” from but do not copy protected material.
– The AI industry contends its use of copyrighted material for training constitutes “fair use” by transforming it into something new.
– Specific research demonstrated that strategic prompts could cause models to output large, accurate portions of books like Harry Potter.

Recent research reveals a startling capability of advanced artificial intelligence systems: they can be prompted to reproduce lengthy passages, and even entire novels, from the data used to train them. This finding directly challenges a fundamental claim made by AI developers, who have long asserted that their models do not retain copies of copyrighted material. The discovery has significant implications for the numerous copyright lawsuits currently facing the industry, potentially undermining a key legal defense.

A string of new studies demonstrates that the most sophisticated language models from leading companies memorize a far greater amount of their training data than was once assumed. Experts in both technology and law warn that this “memorization” ability could have serious ramifications for ongoing legal battles. The core argument from AI groups, that models “learn” from copyrighted works in a transformative way without storing copies, appears increasingly difficult to sustain.

Yves-Alexandre de Montjoye, a professor at Imperial College London, notes the mounting evidence. “There’s growing evidence that memorization is a bigger thing than previously believed,” he stated, highlighting a shift in understanding among researchers.

For years, the industry has publicly denied that such verbatim retention occurs. In official communications, companies like Google have insisted that no copy of the training data exists within the model itself. The standard position has been that using copyrighted books for training constitutes “fair use,” as the AI supposedly transforms the original material into something novel and distinct.

However, practical experiments tell a different story. A study conducted by researchers from Stanford and Yale Universities last month systematically tested this claim. By using strategic prompts, they successfully induced models from OpenAI, Google, Anthropic, and xAI to generate thousands of words directly from well-known books. The list of texts included popular titles like A Game of Thrones, The Hunger Games, and The Hobbit.

The results were remarkably specific. When asked to complete sentences from a novel, certain models reproduced the text with high fidelity. For instance, Gemini 2.5 reproduced 76.8 percent of Harry Potter and the Philosopher’s Stone near-verbatim, while Grok 3 managed 70.3 percent of the same book. These experiments provide concrete, measurable evidence that extensive memorization is not only possible but can be reliably triggered, casting doubt on the industry’s public assurances and legal strategies.
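The article does not describe the researchers' exact scoring method, but the kind of figure it cites (the share of a book a model reproduces verbatim) can be sketched as a simple word n-gram overlap score. This is a hypothetical simplification for illustration only, not the actual methodology used in the study:

```python
def memorized_fraction(reference: str, output: str, n: int = 5) -> float:
    """Fraction of the reference's word n-grams that appear verbatim
    in the model's output. A toy proxy for 'percent reproduced';
    real memorization studies use more careful matching."""
    ref_words = reference.split()
    out_words = output.split()
    if len(ref_words) < n:
        return 0.0
    # All n-grams the model actually emitted.
    out_ngrams = {tuple(out_words[i:i + n])
                  for i in range(len(out_words) - n + 1)}
    # Count how many of the reference's n-grams were emitted exactly.
    ref_ngrams = [tuple(ref_words[i:i + n])
                  for i in range(len(ref_words) - n + 1)]
    hits = sum(1 for gram in ref_ngrams if gram in out_ngrams)
    return hits / len(ref_ngrams)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    print(memorized_fraction(ref, ref, n=3))  # identical text scores 1.0
    print(memorized_fraction(ref, "the quick brown fox sleeps all day", n=3))
```

Under a definition like this, "76.8 percent of the book" would mean roughly that fraction of the text's overlapping n-grams appeared word-for-word in the model's completions.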

(Source: Ars Technica)

Topics

AI memorization, copyright infringement, fair use, large language models, training data, legal lawsuits, AI industry, academic research, model prompting, bestselling novels