Perplexity AI accused of bypassing no-crawl rules, says Cloudflare

Summary

– Perplexity is accused of using stealth bots to bypass websites’ no-crawl directives, violating long-standing Internet norms, according to Cloudflare.
– Cloudflare customers reported that Perplexity continued scraping their sites despite robots.txt blocks and firewall rules.
– Researchers found Perplexity used rotating IPs and ASNs to evade detection, affecting over 10,000 domains and millions of daily requests.
– The alleged tactics undermine the Robots Exclusion Protocol, a standard in place since 1994 to regulate web crawlers.
– The protocol, formalized in 2022, allows sites to control crawler access via robots.txt files, a norm widely followed until now.

Cloudflare has raised serious concerns about Perplexity AI allegedly bypassing website crawling restrictions, potentially violating long-standing internet protocols. The network security firm claims the AI search engine employed undisclosed bots and IP rotation tactics to access content despite explicit blocks through robots.txt files and firewall rules.

According to Cloudflare’s investigation, Perplexity’s crawlers ignored standard restrictions and switched to unlisted IP addresses when blocked. Researchers observed this behavior across thousands of domains, with millions of daily requests originating from masked sources. The company suggests these actions undermine the Robots Exclusion Protocol, a foundational web standard established in 1994 to regulate crawler access.

Cloudflare’s findings indicate that when Perplexity’s official crawlers faced restrictions, the company allegedly deployed alternative methods to scrape data. These included rotating IP addresses and using unrelated network providers to evade detection. Such practices, if confirmed, would directly contradict the principles of transparency and permission-based crawling that have governed web indexing for decades.

The Robots Exclusion Protocol, formalized as an official standard in 2022, relies on mutual respect between website owners and crawlers. By allegedly circumventing these rules, Perplexity risks eroding trust in ethical web scraping practices. Cloudflare’s report highlights the broader implications for data privacy and compliance, urging stricter enforcement of crawling policies to protect content creators.
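For readers unfamiliar with how the protocol works in practice, the following is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler names `ExampleBot` and `OtherBot`, and the `example.com` URLs are hypothetical and chosen purely for illustration; they are not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler from the whole site
# and keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named crawler is disallowed everywhere on the site.
print(may_fetch(parser, "ExampleBot", "https://example.com/article"))

# Other crawlers fall under the wildcard group: public pages are allowed,
# anything under /private/ is not.
print(may_fetch(parser, "OtherBot", "https://example.com/article"))
print(may_fetch(parser, "OtherBot", "https://example.com/private/page"))
```

Nothing in this check is enforced by the server. The rules are matched against whatever user-agent name the crawler chooses to declare, which is why the protocol depends on crawlers identifying themselves honestly and why undeclared bots can sidestep it.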

While Perplexity has yet to publicly respond to these allegations, the accusations spotlight growing tensions between AI-driven data collection and established web norms. As companies increasingly rely on automated scraping for AI training, maintaining ethical boundaries remains critical to preserving the internet’s open yet regulated ecosystem.

(Source: Ars Technica)
