Anthropic Used Millions of Print Books to Train Its AI Models

Summary
– Anthropic cut apart and scanned millions of print books to train its AI assistant Claude, discarding the originals after digitization.
– The company hired Tom Turvey, a former Google Books executive, to replicate Google’s legally successful book-scanning strategy.
– Anthropic’s destructive scanning was unusual due to its large scale, prioritizing speed and cost over preserving physical books.
– Judge William Alsup ruled the scanning fair use because Anthropic bought the books legally, destroyed them after scanning, and kept the files internal.
– The judge deemed the process transformative, but Anthropic’s earlier piracy weakened its legal standing for a precedent-setting AI fair use case.

Artificial intelligence company Anthropic invested heavily in physical book scanning to train its Claude AI system, according to newly uncovered legal documents. The process involved purchasing millions of print books, removing their bindings for efficient scanning, then discarding the original copies, a controversial method that recently received judicial approval under specific conditions.
Court filings show the company made a strategic hire in early 2024, bringing on Tom Turvey, who previously led Google’s book scanning partnerships. His assignment was ambitious: secure access to virtually every published book available. This move mirrored Google’s own large-scale digitization efforts, which had previously withstood legal challenges and helped shape copyright law regarding fair use.
What set Anthropic’s operation apart was both its enormous scope and its irreversible approach to digitization. While destroying books after scanning isn’t unprecedented for smaller projects, the systematic elimination of millions of physical copies raised eyebrows. The company prioritized speed and cost-efficiency over preservation, judging the trade-off worthwhile for advancing its AI training objectives.
The legal ruling from Judge William Alsup established important boundaries. He determined the scanning qualified as fair use, but only because Anthropic met three critical conditions: purchasing the books legally, destroying each print copy after converting it to a single digital file, and restricting digital access to internal research. The decision compared the process to legitimate format conversion for space conservation. However, the judge noted that earlier copyright violations by the company weakened what could have been a landmark case for AI development.
(Source: Ars Technica)