
Cloudflare’s AI Rules Can Block Googlebot

Summary

– Cloudflare now sorts AI crawlers by three behaviors (Search, Agent, and Training) instead of offering a single “block AI bots” switch.
– From September 15, Training and Agent crawlers will be blocked by default on ad-supported pages for new customers, new sites from existing customers, and existing free customers who haven’t changed their settings.
– Multi-purpose crawlers like Googlebot will be blocked if they perform any blocked behavior, such as Training, even if they also perform Search.
– Cloudflare is testing content-use signals in robots.txt with three levels (immediate, reference, and full) and has revised bot verification so that a verified bot’s access depends on its behavior category.
– The update may cause sites blocking AI training to unintentionally block Googlebot, potentially reducing search visibility.

Cloudflare has introduced a significant update to how it identifies and manages AI-driven crawlers, a change that could inadvertently block Googlebot on websites configured to prevent AI training. This announcement arrives as part of the company’s second Content Independence Day, giving site owners new tools to control automated traffic.

The updated controls categorize bots by what they do, replacing the single “block AI bots” toggle. They are available immediately for all users, including those on the free tier, with a separate set of default settings taking effect on September 15.

Three Behavioral Categories for AI Crawlers

Cloudflare now sorts crawlers into three distinct groups based on their purpose (Search, Agent, and Training):

  • Search: Bots that index a site to retrieve information for later queries, tied directly to referral traffic.

The company urges bot operators to run separate crawlers for each behavior, allowing websites to see exactly why a bot is visiting and decide whether to permit or block it accordingly.

What Happens on September 15

Two major default changes will go into effect on that date. For new customers and new sites from existing customers, Training and Agent crawlers will be blocked by default on pages displaying ads, while Search crawlers remain allowed. Cloudflare also states that existing free customers who haven’t adjusted their settings by September 15 will be migrated to these new defaults.

The second change is more sweeping. Cloudflare will now treat multi-purpose crawlers based on their overall behavior, applying the strictest rule that fits. For instance, a crawler that performs both Search and Training will be blocked if a site blocks Training. The company specifically cites Googlebot, Applebot, and Bingbot as examples, since each crawls for both search indexing and AI training. If a site previously enabled the older “Block AI bots” setting, it will fall under this new rule.

To retain those crawlers, site owners can review or modify these settings in the Cloudflare dashboard anytime before September 15. Cloudflare says it will continue sending reminders leading up to the date.

New Signals for Content Use

Cloudflare is also testing a content-use signal that extends the Content Signals in robots.txt. It carries three values, from most restrictive to least: immediate (stores nothing), reference (indexes and links back, now the default), and full (summarizes and reproduces). The company clarifies that these state a preference and do not enforce blocks on their own.

The definition of “Verified” for bots has also been revised. A verified bot is no longer automatically permitted everywhere; instead, its access depends on its behavior category.
Additionally, bots that replicate content in full are ineligible for verification. For Enterprise Bot Management users, Cloudflare has introduced BotBase, a searchable directory that displays each tracked bot’s classification and a copyable detection ID for security rules.

The Report Driving the Changes

This update coincides with a Cloudflare report marking one year since the first Content Independence Day. The report reveals that AI training now accounts for the majority of crawler requests on its network, up from roughly 20% in spring 2025. It also notes that daily AI agent requests surged by more than 1,700% over the past year. These figures are based on Cloudflare’s own network traffic and do not represent the entire internet.

Why This Matters

The September 15 rule effectively ties AI training blocks to search crawling within Cloudflare’s network. If a site blocks Training to protect its content from AI models, it may also unintentionally block Googlebot. Because Cloudflare’s block operates at the network level, it is harder to bypass than a simple robots.txt directive, which is advisory and can be ignored by Google. Losing Googlebot’s access means a site won’t be crawled as effectively, potentially harming its visibility in search results over time.

I’ve observed publishers moving to default-deny setups and blocking both retrieval and training bots over the past year. The risk is consistent: blocking the training layer can also block the search layer that keeps a site findable.

Looking Ahead

Websites using Cloudflare should review their AI blocking settings by September 15 and decide whether to keep Search crawlers enabled. The combined-crawler rule primarily affects those who previously turned on “Block AI bots” and haven’t adjusted their settings since. Free users who do not make changes will have their settings updated to the new defaults on that date.

Cloudflare wants operators of mixed-purpose crawlers to separate those bots by behavior over the coming year.
Whether major operators differentiate their bots by behavior will determine whether this becomes a real choice, rather than a compromise between blocking AI training and maintaining search visibility.
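
As a rough illustration of the combined-crawler rule, the Python sketch below blocks a crawler whenever any one of its behaviors is on a site's block list. It is a simplified model under stated assumptions, not Cloudflare's implementation: the `Behavior` enum, the `is_allowed` function, and the crawler names are all hypothetical and exist only for this example.

```python
from enum import Enum


class Behavior(Enum):
    """The three behavior categories named in the article."""
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def is_allowed(crawler_behaviors, blocked_behaviors):
    """Strictest-rule check: allow only if no behavior is blocked.

    A crawler that does Search and Training is blocked as soon as
    Training is blocked, even though Search alone would be allowed.
    """
    return not (set(crawler_behaviors) & set(blocked_behaviors))


# Hypothetical crawler profiles (illustrative names, not real bots).
CRAWLERS = {
    "search-only-bot": {Behavior.SEARCH},
    "multi-purpose-bot": {Behavior.SEARCH, Behavior.TRAINING},
    "training-only-bot": {Behavior.TRAINING},
}

# A site that blocks Training and Agent but leaves Search allowed.
site_blocks = {Behavior.TRAINING, Behavior.AGENT}

decisions = {
    name: is_allowed(behaviors, site_blocks)
    for name, behaviors in CRAWLERS.items()
}

for name, allowed in decisions.items():
    print(f"{name}: {'allow' if allowed else 'block'}")
```

In this model, the multi-purpose crawler is blocked alongside the training-only one, while a crawler that only performs Search gets through. That is why splitting a mixed-purpose bot into separate crawlers per behavior would let a site keep search indexing while still refusing training.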
(Source: Search Engine Journal)
