
Major Book Publishers Sue Meta for Copyright Infringement

Summary

– Meta is facing a class action lawsuit from five major publishers and author Scott Turow for allegedly using copyrighted books and articles to train its Llama AI models without permission.
– The lawsuit claims Meta copied works from pirate sites like LibGen and Sci-Hub and used the Common Crawl dataset, which contains unauthorized copyrighted material.
– The complaint alleges that Llama can reproduce copyrighted text verbatim, such as continuing a section from a calculus textbook when prompted with its opening sentences.
– A federal judge previously ruled in Meta’s favor in a similar case but noted the ruling does not mean Meta’s use of copyrighted material is lawful.
– The publishers seek damages and a court order to stop Meta’s alleged infringement and require disclosure of all copyrighted works used to train Llama.

A major legal battle is unfolding as five prominent book publishers and a celebrated author have filed a class action lawsuit against Meta, accusing the tech giant of what they describe as “one of the most massive infringements of copyrighted materials in history.” The suit, brought by Macmillan, McGraw-Hill, Elsevier, Hachette, Cengage, and author Scott Turow, claims that Meta systematically copied their books and journal articles without permission to train its Llama AI models.

According to the complaint, Meta allegedly sourced copyrighted material from “notorious pirate sites” such as LibGen, Anna’s Archive, Sci-Hub, and Sci-Mag, knowingly using pirated content to fuel its artificial intelligence development. The publishers further assert that Meta trained Llama using data from the Common Crawl dataset, which they claim is “full of unauthorized copies of copyrighted works.” As a result, they argue, Llama can produce “verbatim and near-verbatim substitutes” of protected texts. For instance, when prompted with just two sentences from Cengage’s textbook Calculus: Early Transcendentals, 9th edition, by James Stewart, Llama reportedly reproduced the section word-for-word.

This lawsuit adds to a growing wave of legal challenges against Meta over its AI training practices. Previous lawsuits have revealed internal company discussions about how to handle “media coverage suggesting we have used a dataset we know to be pirated.” While a federal judge ruled in Meta’s favor last year in one such case, the judge noted that the decision “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.”

The publishers are not alone in their fight. A separate group of authors also sued Anthropic over copyright infringement. In that case, a federal judge determined that training AI on legally purchased books without permission qualifies as fair use, but allowed a class action lawsuit to proceed regarding the “millions” of works Anthropic allegedly pirated. Anthropic eventually agreed to pay writers $1.5 billion to settle the matter.

In this new action, Turow and the publishers are seeking damages and asking the court to order Meta to halt its allegedly unlawful activities. They also demand that Meta disclose a list of all books, journal articles, and other copyrighted works used to train its Llama models.

Meta spokesperson Dave Arnold pushed back against the allegations in a statement to The Verge. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” Arnold said. “We will fight this lawsuit aggressively.”

(Source: The Verge)
