
Brands Unite to Monetize AI Training Data with New Licensing Standard

Summary

– Major publishers have introduced Really Simple Licensing (RSL), a new standard allowing websites to demand compensation from AI companies for data scraping and training.
– RSL enables publishers to embed licensing and royalty terms directly into files like robots.txt, offering subscription, pay-per-crawl, or pay-per-inference fee options.
– The RSL Collective, led by industry figures, is promoting this standard as a scalable business model for the web, with support from companies like Reddit, Yahoo, and Fastly.
– RSL’s effectiveness depends on cooperation from AI companies, as enforcement may be challenging without widespread adoption and compliance.
– Parallel efforts, such as Cloudflare’s default blocking of AI crawlers and pay-per-crawl system, reflect a broader industry push to reshape how AI firms access training data.

A new licensing framework is gaining momentum among major digital publishers, offering a potential pathway to monetize content used for artificial intelligence training. Really Simple Licensing (RSL) represents a significant shift from the traditional binary approach of blocking or allowing web crawlers, introducing a structured method for content owners to seek compensation when their data is utilized by AI systems.

Until recently, publishers relied on robots.txt files to manage bot access, an all-or-nothing system that provided no mechanism for financial return. RSL changes that dynamic by enabling websites to embed specific licensing and royalty terms directly into their content, including online articles, videos, books, and datasets. This allows publishers to set various fee structures, such as subscriptions, pay-per-crawl arrangements, or even pay-per-inference models, where compensation is triggered each time an AI generates a response using licensed material.
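To make the mechanism concrete, a site adopting RSL might point crawlers from its robots.txt file to a machine-readable licensing document. The sketch below is a hypothetical illustration of that idea only; the directive name, XML element names, and URL are assumptions for illustration, not a verbatim copy of the RSL specification.

```text
# robots.txt (illustrative): bots are still allowed or disallowed as usual,
# but a licensing document now declares the terms of use for AI training.
User-agent: *
Allow: /
License: https://example.com/license.xml
```

```text
<!-- license.xml (illustrative sketch; element names are assumed) -->
<rsl>
  <content url="/articles/">
    <license>
      <!-- e.g. a pay-per-crawl fee, one of the models described above -->
      <payment type="per-crawl" amount="0.002" currency="USD"/>
    </license>
  </content>
</rsl>
```

A crawler that honors the standard would fetch the licensing document before scraping, and either accept the stated terms or skip the content.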

It’s important to note that bots engaged in non-commercial activities, like search engine indexing or archival projects, can continue to operate without interruption. The new standard is designed specifically to address the growing use of web content for AI model training.

The initiative is spearheaded by the RSL Collective, led by RSS co-creator Eckart Walther and former Ask.com CEO Doug Leeds. They envision RSL as a scalable, industry-wide business model, drawing inspiration from collective rights organizations in other creative fields such as music licensing. Technical infrastructure support is being developed in partnership with Fastly, which is building “gatekeeper” technology capable of admitting or blocking bots based on their compliance with embedded licensing terms.
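The gatekeeper idea described above can be sketched in a few lines: classify each incoming crawler, wave through non-commercial bots, admit licensed AI bots while recording a charge, and block the rest. This is a minimal illustration of the concept, not Fastly's actual implementation; the bot names, fee figures, and decision labels are all hypothetical.

```python
# Illustrative gatekeeper sketch (hypothetical names and terms throughout).
# Licensed AI bots are admitted and billed; non-commercial bots pass freely;
# unlicensed AI crawlers are blocked.

LICENSED_BOTS = {
    "ExampleAIBot": {"model": "pay-per-crawl", "fee_usd": 0.002},
}
NONCOMMERCIAL_BOTS = {"SearchIndexBot", "ArchiveBot"}

def gatekeeper(user_agent: str) -> str:
    """Return 'allow', 'bill', or 'block' for a crawler request."""
    if user_agent in NONCOMMERCIAL_BOTS:
        return "allow"   # search indexing and archival continue uninterrupted
    if user_agent in LICENSED_BOTS:
        return "bill"    # licensed AI bot: admit and record a charge
    return "block"       # unlicensed AI crawler: refuse the request
```

In a real deployment the lookup would be backed by the licensing terms embedded in the publisher's content rather than a hard-coded table.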

A growing coalition of brands and content platforms has thrown its support behind RSL. Notable names include Reddit, Yahoo, Quora, Medium, Adweek, Internet Brands, The MIT Press, O’Reilly, wikiHow, and Ziff Davis, among others. While some larger media entities like The New York Times and News Corp have already negotiated individual agreements with AI companies, RSL aims to democratize the process, making it accessible to publishers of all sizes.

A central challenge remains: the standard depends heavily on voluntary adoption by AI firms. Historically, some AI developers have disregarded robots.txt directives, raising questions about how effectively RSL can be enforced without broader industry cooperation. The success of the initiative will likely hinge on whether leading AI companies agree to participate.

In a related development, Cloudflare, which supports approximately 20% of all websites, has also begun implementing measures to restrict unauthorized AI crawling. The company now blocks AI bots by default and is experimenting with a Pay Per Crawl system, creating a parallel effort that may further pressure AI firms to formalize how they acquire and pay for training data.

More details about the RSL framework and its implementation are available on the official RSL website.

(Source: Search Engine Land)

Topics: RSL standard, publisher support, AI scraping, compensation models, robots.txt upgrade, enforcement challenges, RSL Collective, gatekeeper technology, direct licensing deals, AI bot blocking