Claude’s AI Now Offers Granular Robots.txt Control

▼ Summary
– Anthropic has formally defined three distinct web crawlers: ClaudeBot for training data, Claude-User for user-requested pages, and Claude-SearchBot for indexing content for search results.
– Blocking the training bot (ClaudeBot) does not block the search or user-request bots, meaning a blanket block strategy is no longer effective for controlling all AI access.
– Unlike some competitors, Anthropic states all three of its bots honor robots.txt rules, while OpenAI and Perplexity indicate their user-initiated fetchers may not.
– Websites are increasingly allowing AI search crawlers while blocking training crawlers, a strategic shift that impacts their visibility in AI-powered search results.
– This three-tier structure creates a new publisher decision, similar to Google’s model, requiring separate robots.txt management for training, search, and user-request bots.
Anthropic has introduced a significant update to its web crawler policy, giving website owners granular control over how their content interacts with Claude’s AI systems. The company now formally distinguishes between three distinct bots, each with its own user-agent string in robots.txt files. This move allows publishers to make specific choices about whether their sites contribute to AI training data, appear in AI search results, or are accessed when a user directly asks Claude to fetch a page.
The three crawlers are ClaudeBot, which collects data for model training; Claude-SearchBot, which indexes content so it can surface in Claude’s search results; and Claude-User, which retrieves pages in direct response to a user’s query. A key feature of this update is that each bot can be blocked independently. For instance, disallowing ClaudeBot stops a site’s content from being used to train future AI models, but it does not affect whether the site can be indexed by Claude-SearchBot or fetched by Claude-User. This separation mirrors a structure already employed by competitors such as OpenAI, which operates GPTBot, OAI-SearchBot, and ChatGPT-User.
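Under this scheme, a publisher who wants to opt out of model training while staying visible in search and available to user-initiated fetches could write robots.txt directives along these lines (an illustrative sketch, not an official template from Anthropic):

```
# Block the training crawler entirely
User-agent: ClaudeBot
Disallow: /

# Explicitly allow search indexing
User-agent: Claude-SearchBot
Allow: /

# Explicitly allow user-requested page fetches
User-agent: Claude-User
Allow: /
```

Because each bot matches its own user-agent group, the `Disallow` rule for ClaudeBot has no effect on the other two crawlers.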
This development fundamentally changes the strategy for website administrators. The common practice from 2024 of using a blanket directive to block all AI crawlers is now outdated and potentially counterproductive. A site that blocks only the training bot, ClaudeBot, can still be discovered and cited within Claude’s search answers. However, if it also blocks Claude-SearchBot, Anthropic warns this action “may reduce your site’s visibility and accuracy in user search results.” This creates a critical decision for publishers who wish to avoid contributing to AI training datasets but still want to maintain visibility in the growing ecosystem of AI-powered search tools.
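The independence of these rules can be verified with Python’s standard-library robots.txt parser. This sketch assumes a hypothetical robots.txt that disallows only ClaudeBot and checks what each of Anthropic’s three user-agent strings is permitted to fetch:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the training crawler.
robots_txt = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

url = "https://example.com/article"
print(rp.can_fetch("ClaudeBot", url))         # training crawl blocked
print(rp.can_fetch("Claude-SearchBot", url))  # search indexing still allowed
print(rp.can_fetch("Claude-User", url))       # user-requested fetch still allowed
```

The parser matches "Claude-SearchBot" and "Claude-User" against the wildcard group rather than the ClaudeBot group, so only the training crawler is excluded.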
The approach to user-initiated bots varies between companies, adding another layer of complexity. Anthropic states that its Claude-User bot respects robots.txt directives. In contrast, OpenAI has indicated that its ChatGPT-User may not be governed by robots.txt in the same way, and Perplexity generally does not apply these rules to its Perplexity-User agent. This means a publisher’s ability to control when an AI fetches a page because a user asked for it directly is not guaranteed across all platforms.
The implications for web traffic and visibility are substantial. Data indicates a clear trend of websites selectively allowing search crawlers while blocking training crawlers: analyses show that while blocking of training bots has increased, the share of sites accessible to AI search crawlers has grown dramatically. This suggests that many publishers see value in appearing in AI search indexes even as they opt out of model training. As AI search tools generate more referral traffic, the potential cost of blocking their search crawlers rises, making a strategic, bot-by-bot approach to robots.txt management more important than ever.
(Source: Search Engine Journal)