
Top AI Bots Drive Publishing Traffic: OpenAI, Meta, ByteDance Lead

Summary

– Commerce was the leading sector for AI bot traffic at 48%, with media coming second at 13%.
– Within the media sector, publishing companies accounted for the largest share of AI bot activity at 40%.
– OpenAI generated the most AI bot traffic to media, using distinct bots for training, search, and real-time content retrieval.
– Akamai identifies fetcher bots as a more immediate revenue threat than training crawlers, as they provide content directly to users without site visits.
– Common publisher responses include blocking, tarpitting, and delaying bot requests, with the report advising against blanket blocking to preserve potential licensing deals.

A recent Akamai analysis of application-layer traffic shows that AI bots direct a significant share of their activity at the media and publishing sector. The study, which examined data from bot management tools, found that commerce sites receive nearly half (48%) of all AI bot traffic. The media industry, encompassing publishing, video, and social platforms, ranks second with 13% of this automated activity. Within that category, publishing companies are the primary target, accounting for 40% of AI bot activity in media and surpassing broadcast and streaming services.

Leading this wave are major technology firms. OpenAI generates the most AI bot traffic to media companies, with 40% of its media-specific requests aimed at publishers. This high volume stems from its use of multiple specialized bots, including GPTBot for model training, OAI-SearchBot for AI-powered search, and ChatGPT-User for real-time content retrieval. Meta and ByteDance follow as the second and third largest operators, respectively, with Anthropic and Perplexity also appearing in the top five, though at notably lower traffic levels.

The analysis categorizes these automated visitors into four behavioral types, two of which are particularly relevant for publishers. Training crawlers, which gather content to build and refine large language models, constituted 63% of all AI bot activity targeting media in the latter half of last year. Fetcher bots, which retrieve specific web pages in real time to answer individual user queries in chatbots, represented 24% of the activity. Publishing sites were the destination for 43% of this fetcher bot traffic.
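To see how this split applies to a site's own logs, requests can be grouped by the bot name in the User-Agent header. The sketch below is illustrative only: it covers just the three OpenAI bots named above, the function names are hypothetical, and plain substring matching is an assumption that a spoofed header would defeat.

```python
from collections import Counter

# Mapping limited to the three OpenAI bots named in the article.
# Other operators' bots would need their own entries (not listed here).
BOT_CATEGORIES = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "fetcher",
}


def classify_user_agent(user_agent: str) -> str:
    """Return the bot category for a User-Agent string, or "other"."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token.lower() in ua:
            return category
    return "other"


def category_shares(user_agents) -> dict:
    """Return each category's share of the given requests (0.0 to 1.0)."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Running `category_shares` over the User-Agent column of an access log gives a rough per-category breakdown comparable to the percentages reported above. User-Agent strings can be forged, so this is a first approximation rather than verified bot identification.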

While training crawlers generate more overall volume, fetcher bots present a more immediate concern for publisher revenue. The critical issue is that when a fetcher bot pulls an article to answer a user’s question, the information is delivered directly within the chatbot interface. This allows the user to obtain the answer without ever clicking through to the publisher’s website, effectively bypassing potential ad impressions and subscription gates.

In response, many organizations are implementing technical countermeasures. Common strategies include denying requests outright, tarpitting (holding connections open to drain bot resources), and introducing deliberate response delays. One publisher, opting for a tarpitting approach over a full block, successfully managed 97% of AI bot requests while maintaining the possibility of future licensing deals with AI companies. The report cautions against universal blocking, noting that some AI firms are open to paying for content access and that a blanket ban eliminates that potential revenue stream.
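The three countermeasures can be sketched as a small policy layer. This is a minimal illustration, not how any vendor's product or the publisher mentioned above implements them: the `POLICY` table, function names, and timing values are all assumptions, and drip-feeding a response in small chunks is just one common way to hold a connection open.

```python
import time
from typing import Iterator

# Hypothetical policy: which countermeasure to apply to each bot category.
# A real deployment would choose these per bot and per licensing status.
POLICY = {
    "training": "tarpit",
    "fetcher": "delay",
    "unknown_ai": "deny",
}


def choose_action(category: str) -> str:
    """Look up the countermeasure for a category; default to allowing."""
    return POLICY.get(category, "allow")


def tarpit_body(body: bytes, chunk_size: int = 1, pause_s: float = 5.0,
                sleep=time.sleep) -> Iterator[bytes]:
    """Yield the body in small chunks, pausing before each one,
    so the bot's connection stays open far longer than normal."""
    for start in range(0, len(body), chunk_size):
        sleep(pause_s)
        yield body[start:start + chunk_size]


def delayed_body(body: bytes, delay_s: float = 2.0,
                 sleep=time.sleep) -> Iterator[bytes]:
    """Wait once, then yield the whole body (a deliberate response delay)."""
    sleep(delay_s)
    yield body
```

The `sleep` parameter is injectable so the timing can be stubbed out in tests. Neither generator blocks a request outright, which is consistent with the report's advice to keep the door open for licensing.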

The key insight for publishers is understanding the different impacts of these bot categories. Blocking a training crawler influences how a publisher’s content may be used to train future AI models. In contrast, blocking a fetcher bot directly affects whether that content is surfaced in AI-generated answers today, with immediate implications for web traffic and monetization. This distinction is crucial for developing a strategic response to the growing presence of artificial intelligence on the web.

(Source: Search Engine Journal)
