Dictionary Publisher Sues OpenAI Over Copyright

Summary
– Encyclopedia Britannica and Merriam-Webster have sued OpenAI, alleging “massive copyright infringement” for using their articles to train AI models without permission.
– The lawsuit claims OpenAI violates copyright by generating outputs that reproduce Britannica’s content and by using its articles in ChatGPT’s retrieval systems.
– Britannica argues that ChatGPT harms publishers by substituting for their content and that its inaccurate “hallucinations” damage public trust in reliable information.
– This case is part of a broader trend, with other major publishers like The New York Times and numerous newspapers also suing OpenAI over copyright.
– The legal precedent is unclear: in a related case, a judge deemed AI training potentially transformative, yet the case still ended in a settlement over improper content acquisition.
A major legal battle has erupted in the publishing world, with Encyclopedia Britannica and Merriam-Webster filing a lawsuit against OpenAI. The complaint centers on allegations of “massive copyright infringement,” accusing the artificial intelligence company of improperly using protected content to build and operate its popular models. This case adds significant weight to the growing number of copyright challenges facing the AI industry.
The publisher, which holds the copyright to nearly 100,000 online articles, claims OpenAI scraped and used this material to train its large language models without seeking permission or offering compensation. Beyond the training process, the lawsuit details further alleged violations. It states that OpenAI infringes copyright when ChatGPT generates outputs containing “full or partial verbatim reproductions” of Britannica’s proprietary content. The complaint also points to the use of its articles within ChatGPT’s retrieval-augmented generation (RAG) workflow, a technique that lets the model look up current information when crafting responses.
Furthermore, the legal action accuses OpenAI of violating the Lanham Act, a trademark statute. This claim arises from instances where the AI generates inaccurate “hallucinations” and then falsely attributes this fabricated information to the reputable publisher. The lawsuit argues that ChatGPT directly competes with and substitutes for original publisher content, thereby starving websites of vital traffic and revenue. It also warns that the model’s tendency to produce false information jeopardizes public access to reliable, high-quality online sources.
Britannica is far from alone in this fight. It joins a formidable list of media entities that have initiated similar legal proceedings. The New York Times, Ziff Davis, and a consortium of over a dozen major U.S. and Canadian newspapers have all filed suits against OpenAI over parallel copyright concerns. A separate lawsuit filed by Britannica against the AI company Perplexity remains pending in court.
The legal landscape for these cases is complex and largely uncharted. There is no definitive legal precedent establishing whether using copyrighted works to train an AI model constitutes infringement. In a related case involving Anthropic, a federal judge did rule that using content as training data could be considered transformative and therefore a legal fair use. However, that same judge found Anthropic liable for illegally downloading millions of books without payment, which led to a massive settlement for affected authors. This split outcome underscores the nuanced arguments that will likely define the outcome of the Britannica suit.
OpenAI has not provided a public comment on this specific lawsuit. The resolution of this case could set a critical benchmark for how intellectual property law adapts to the rapid evolution of artificial intelligence technology.
(Source: TechCrunch)





