Judge Rules Anthropic Can Train AI on Books Without Author Permission

Summary
– A federal judge ruled that Anthropic legally trained its AI models on published books without authors’ permission, marking the first court acceptance of fair use for AI training.
– The ruling is a setback for authors and publishers who have sued AI companies such as OpenAI and Meta over the use of copyrighted material in training LLMs.
– The fair use doctrine, codified in the Copyright Act of 1976, is central to these cases; courts weigh factors such as the purpose of the use, whether it is commercial, and how transformative the new work is.
– Anthropic allegedly obtained millions of copyrighted books from pirate sites to create a “central library,” which remains under legal scrutiny despite the fair use ruling.
– A trial will determine liability for Anthropic’s use of pirated copies in its library, with potential statutory damages depending on the circumstances.

A federal court has ruled that Anthropic can legally train its AI models on copyrighted books without obtaining permission from authors, setting a significant precedent in the ongoing debate over AI and intellectual property. The decision by Judge William Alsup represents the first judicial endorsement of the argument that fair use doctrine protects AI companies when using copyrighted materials to develop large language models (LLMs).
This ruling deals a setback to authors, publishers, and artists who have filed numerous lawsuits against major tech firms, including OpenAI, Meta, and Google. While the judgment doesn't guarantee other courts will follow suit, it strengthens the legal footing for AI developers in future disputes. The outcome hinges on interpretations of fair use, a doctrine codified in the Copyright Act of 1976, long before generative AI existed.
Courts evaluating fair use consider factors such as the purpose of the work (parody or education, for example), whether it is used commercially, and how much the new work transforms the original. Tech companies have consistently defended their practices by arguing that their AI models produce transformative outputs, but until now no court had endorsed that position.
The case, Bartz v. Anthropic, also raised concerns about how the company obtained the training data. Plaintiffs alleged Anthropic built a permanent “central library” containing millions of books, many sourced illegally from pirate websites. While the judge ruled that training AI on the material qualified as fair use, he allowed a separate trial to examine whether the company violated copyright law by using pirated copies.
Judge Alsup clarified that purchasing legal copies after the fact wouldn’t erase liability for initially using stolen content. “Anthropic may still face damages for theft,” he noted, “though later acquisitions could influence the penalty.” The trial will determine the consequences of the company’s data collection methods, keeping the legal battle far from over.
As AI continues to evolve, this case underscores the growing tension between innovation and creators' rights, with courts now forced to apply decades-old laws to cutting-edge technology. The outcome could reshape both how AI companies operate and how artists protect their work in the digital age.
(Source: TechCrunch)