Reddit Sues Perplexity AI Over Content Scraping

Summary

– Reddit is suing Perplexity and three data-scraping companies for unlawfully bypassing data protections to access copyrighted Reddit content.
– The lawsuit alleges Perplexity uses these scraping services to obtain Reddit data for its AI “answer engine” instead of making a direct agreement.
– Reddit claims Perplexity increased its use of Reddit content after receiving a cease-and-desist letter and violated robots.txt protocols.
– Reddit’s chief legal officer describes an “arms race” for human content fueling illegal data scraping and laundering by these defendants.
– Perplexity states it will defend users’ rights to access public knowledge and denies wrongdoing, calling its approach principled and responsible.

Reddit has initiated legal proceedings against Perplexity AI and three data-scraping service providers, alleging systematic copyright infringement and unauthorized data collection. The social media platform contends these entities engaged in industrial-scale, unlawful circumvention of data protections to harvest valuable user-generated content without permission. In its complaint, Reddit characterizes data scraping firms SerpApi, Oxylabs, and AWMProxy as digital equivalents of “would-be bank robbers” who target delivery vehicles when they cannot breach the main vault directly.

The legal action centers on Perplexity’s alleged use of these scraping services to obtain Reddit content for its AI-powered answer engine. According to the complaint, Perplexity buys scraped data through intermediaries rather than signing a licensing agreement with Reddit directly, a path several competitors have taken. This approach, Reddit argues, demonstrates the AI company’s determination to acquire Reddit data through any means except legitimate channels.

Reddit says it sent Perplexity a cease-and-desist letter in May 2024, demanding an immediate halt to all scraping of Reddit data. Although Perplexity initially claimed it didn’t use Reddit content for AI training and promised to respect the platform’s robots.txt directives, Reddit says citations of Reddit in Perplexity’s outputs subsequently increased. To test this, the social media company created a special post that only Google’s search crawler was permitted to access. Within hours, Perplexity’s system reportedly incorporated and reproduced the protected content.
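For readers unfamiliar with robots.txt, the sketch below shows how a compliant crawler might check such rules before fetching a page. It is only an illustration: the rules, the URL, the `ExampleBot` user agent, and the `may_fetch` helper are hypothetical and are not taken from Reddit’s actual configuration or from the complaint. It uses Python’s standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration, not Reddit's real file):
# Google's crawler may fetch everything; every other crawler is disallowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL and crawler names, chosen only for the example.
    url = "https://example.com/r/test/comments/abc123/"
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, may_fetch(agent, url))
```

Under these assumed rules, the check passes for `Googlebot` and fails for any other user agent. Note that robots.txt is advisory: it tells crawlers what the site permits, but nothing in the file itself prevents a crawler from ignoring it.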

Reddit’s chief legal officer Ben Lee characterized the situation as part of a broader pattern where AI companies engage in an arms race for quality human content, creating what he describes as a “data laundering” economy. He emphasized that Reddit represents a particularly attractive target due to its status as one of the internet’s most extensive archives of authentic human discussion. Lee specifically identified the three scraping companies as exemplars of problematic data harvesting practices, noting their use of identity masking, location concealment, and technical evasion methods to extract Reddit content indirectly through Google Search results.

Perplexity’s head of communications, Jesse Dwyer, responded that the company had not yet received formal legal documents but affirmed its commitment to defending users’ rights to access public knowledge. He maintained that Perplexity’s approach is principled and that the company will resist what he characterized as threats against open information access and the public interest. The case highlights growing tensions between content platforms and AI developers regarding fair use, copyright boundaries, and appropriate methods for training artificial intelligence systems on publicly available web content.

(Source: The Verge)
