
Adobe sued for allegedly using authors’ work to train AI

Originally published on: December 18, 2025
Summary

– Adobe faces a class-action lawsuit alleging it used pirated books, including the plaintiff’s, to train its SlimLM AI model.
– The lawsuit claims the training dataset, SlimPajama, is a manipulated copy of the RedPajama dataset, which contains the controversial “Books3” collection.
– The “Books3” dataset, which contains approximately 191,000 books, has surfaced repeatedly in lawsuits against tech companies, including Apple and Salesforce.
– This case is part of a broader pattern where AI companies are sued for allegedly training models on copyrighted or pirated materials without permission or compensation.
– A recent settlement, in which Anthropic agreed to pay $1.5 billion to authors, is seen as a potential turning point in these ongoing legal battles.

The legal landscape surrounding artificial intelligence and intellectual property is heating up, with software giant Adobe now facing a proposed class-action lawsuit. The complaint alleges the company used pirated books to train one of its AI language models, marking another significant challenge for the tech industry as it navigates the complex intersection of innovation and copyright law. This case underscores the growing scrutiny over the data sources used to build powerful generative AI systems.

Filed on behalf of Oregon-based author Elizabeth Lyon, the lawsuit claims Adobe utilized unauthorized copies of numerous books, including Lyon’s own works, to develop its SlimLM program. Adobe markets SlimLM as a series of small language models designed for document assistance tasks on mobile devices. The company states the model was pre-trained on a dataset called SlimPajama-627B, which it describes as a deduplicated, multi-source, open-source collection released by Cerebras in mid-2023.

Lyon, who has authored several guides on non-fiction writing, contends her copyrighted material was included in a processed subset of data that fed into Adobe’s AI. The legal filing argues that the SlimPajama dataset was created by copying and altering an earlier collection known as RedPajama, which itself contained the controversial Books3 dataset. This massive archive of approximately 191,000 titles has become a frequent flashpoint in litigation, as it has been widely used to train various generative AI models without explicit permission from authors or publishers.

The lawsuit asserts, “Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members.” This connection places Adobe in the midst of a broader pattern of legal challenges facing technology firms. Similar allegations have surfaced in lawsuits against other major companies, including Apple and Salesforce, which have also been accused of using datasets like RedPajama to train their AI systems without proper consent or compensation.

For the technology sector, these legal actions are becoming increasingly common. AI models require enormous volumes of data for training, and critics argue that some companies have cut corners by incorporating pirated or copyrighted materials into their datasets. The financial stakes are substantial, as demonstrated by a recent settlement in which AI firm Anthropic agreed to pay $1.5 billion to a group of authors who claimed their work was used without permission to train the Claude chatbot. Many legal observers viewed that outcome as a potential watershed moment in the ongoing disputes over AI training data.

As these cases proliferate, they force a critical examination of how AI companies source their training materials and whether current practices comply with copyright law. The outcome of the suit against Adobe could influence industry standards and establish important precedents regarding accountability and fair compensation for creators whose works contribute to the development of advanced artificial intelligence.

(Source: TechCrunch)
