Reddit Sues Perplexity Over Alleged Google Data Scraping

Summary
– Reddit is suing four data-scraping companies, including Perplexity and SerpApi, for allegedly scraping its content without authorization via Google search results.
– The companies allegedly hid their identities to bypass restrictions and scraped Reddit data at an industrial scale, either to resell it or to train AI models.
– Reddit is seeking financial damages, a permanent injunction, and a ban on using or selling the scraped data.
– SEOs and site owners face challenges accessing reliable search data due to Google cracking down on scraping and tightening APIs.
– The relationship between Google and content creators has turned adversarial with the rise of generative AI, zero-click results, and declining organic traffic.
Reddit has filed suit against four data collection firms, including the AI search platform Perplexity and the SEO data provider SerpApi, for allegedly harvesting its content through Google search results without authorization. The lawsuit, filed in the U.S. District Court for the Southern District of New York, contends that these companies schemed to extract Reddit data indirectly via Google and then resold it or used it to train artificial intelligence models. According to the filing, the defendants concealed their identities to evade technical safeguards and scraped on an industrial scale. Reddit is seeking monetary damages, a permanent injunction, and a prohibition on the use or sale of any previously collected data.
OpenAI has been identified as a customer of SerpApi, which helps explain why Google search results occasionally surface within ChatGPT. Reddit already maintains formal data licensing arrangements with both OpenAI and Google, but the company asserts that other entities have attempted to circumvent these agreements. To substantiate its claims, Reddit described setting a trap for Perplexity by publishing a test post visible only to Google’s web crawler. Within hours, that same post appeared in Perplexity’s search results, which Reddit argues is proof that the firm was relying on scraped Google data.
For SEO specialists and website operators, dependable search data is becoming harder to obtain. Google is intensifying its measures against data scraping and restricting API access, while many sites are losing traffic to AI-generated summaries and zero-click search results. The result is reduced visibility, fewer analytical insights, and a more difficult environment for interpreting or influencing AI-powered search behavior.
At the same time, Reddit and Google are said to be in discussions about a fresh collaboration that would integrate Reddit discussions more seamlessly into Google’s AI offerings. Should these negotiations progress, a greater volume of Reddit conversations could appear in AI Overviews and other Google interfaces, potentially altering how both platforms impact brand exposure and web traffic.
More broadly, although AI-related data collection is on the rise, it has not yet translated into substantial referral traffic. Data from TollBit indicates that Google directs 831 times more visitors to websites than all AI systems combined. Additional statistics released by Cloudflare in July underscore the lopsided ratio of pages crawled to visitors referred: Google operates at an 18:1 ratio, roughly 18 crawls for every visitor it sends, while OpenAI and Anthropic show ratios of 1,500:1 and 60,000:1, respectively. Historically, Google and content producers enjoyed a mutually beneficial relationship, but the arrival of generative AI, along with the growth of zero-click searches and the downturn in organic traffic, has turned that dynamic adversarial.
(Source: Search Engine Land)




