The web has a new system for making AI companies pay up

A new licensing standard aims to let web publishers set the terms of how AI system developers use their work. On Wednesday, major brands like Reddit, Yahoo, Medium, Quora, and People Inc. announced support for Really Simple Licensing (RSL), an open content licensing standard that enables publishers to outline how bots should pay to scrape their sites for AI training data. They’re hoping the collective action gives them leverage to get AI companies on board.

The RSL Standard builds upon the robots.txt protocol, which has long allowed publishers to tell web crawlers which parts of a site the crawlers can and can’t access. But instead of just saying yes or no to specific bots, websites can now add licensing and royalty terms to their robots.txt file. They can also embed the terms in online books, videos, and training datasets that they may want compensation for.
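
To make the robots.txt idea concrete, here is a minimal Python sketch of how a crawler might look for licensing terms alongside the usual allow and deny rules. The `License:` directive name, the example.com URL, and the helper `find_license_urls` are assumptions made for illustration, not details confirmed in this article; the RSL specification is the authority on the actual syntax.

```python
# Hypothetical robots.txt. The "License" line is an assumed directive name
# pointing at a file that would describe the publisher's terms.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
License: https://example.com/license.xml
"""


def find_license_urls(robots_txt: str, directive: str = "license") -> list:
    """Return the value of every line whose key matches `directive`."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so the URL's own colons survive.
        key, value = line.split(":", 1)
        if key.strip().lower() == directive:
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    print(find_license_urls(SAMPLE_ROBOTS_TXT))
```

A crawler that finds no such line would fall back to the plain allow and deny rules it already understands.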

Behind the RSL Standard is a newly formed rights organization called the RSL Collective, helmed by Eckart Walther, a co-creator of the Really Simple Syndication (RSS) standard and former CardSpring CEO, and Doug Leeds, the former CEO of IAC Publishing and Ask.com. “The goal is to create a new, scalable business model for the web,” Walther tells The Verge. “RSL takes some of those early RSS ideas and creates a new layer for the entire internet where licensing rights and compensation rights are defined.”

The RSL Standard supports a variety of licensing models, including free ones. Through it, site owners can ask AI companies to pay a subscription, or they can set a pay-per-crawl fee that companies owe each time an AI bot crawls the website. They can also implement a pay-per-inference fee, which compensates a site each time an AI model references its work to generate a response. Bots that are crawling sites for other purposes, like archival or search engine inclusion, can proceed as usual.
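
The difference between these models is easiest to see as arithmetic. The Python sketch below is illustrative only: the model names as strings, the `UsageReport` type, the `amount_owed` function, and every rate are hypothetical, and nothing here reflects actual RSL pricing or data formats.

```python
from dataclasses import dataclass


@dataclass
class UsageReport:
    """Hypothetical usage figures for one billing period."""
    crawls: int      # times an AI bot fetched the site
    inferences: int  # model responses that referenced the site's work


def amount_owed(model: str, rate_cents: int, usage: UsageReport) -> int:
    """Return the amount owed, in cents, under one assumed licensing model."""
    if model == "free":
        return 0
    if model == "subscription":
        return rate_cents                      # flat fee, independent of usage
    if model == "pay-per-crawl":
        return rate_cents * usage.crawls       # charged per crawl
    if model == "pay-per-inference":
        return rate_cents * usage.inferences   # charged per referencing response
    raise ValueError(f"unknown licensing model: {model}")


if __name__ == "__main__":
    usage = UsageReport(crawls=1000, inferences=250)
    for model, rate in [("free", 0), ("subscription", 50000),
                        ("pay-per-crawl", 2), ("pay-per-inference", 5)]:
        print(model, amount_owed(model, rate, usage))
```

Note that a publisher can count crawls from its own server logs, while the inference count would have to come from the AI company.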

Several media companies, including The Verge parent company Vox Media, The Wall Street Journal owner News Corp, and The New York Times, have struck licensing agreements with individual AI companies such as OpenAI and Amazon. But the RSL Collective aims to simplify this process by allowing any website owner or creator to get paid for their work rather than negotiating separate deals.

Like a lot of standards, RSL’s success depends on major industry players — in this case, AI companies — buying into it. AI model builders have repeatedly been accused of ignoring sites’ robots.txt files, and there’s no simple way to tally something like the inference fee without their participation. The RSL Collective is betting that bringing together some of the biggest web publishers will make adopting the standard more appealing. “Our job is to go out and get a big group of people to say it’s in your interest, both efficiently, because you can negotiate with everybody at once, and legally, because if you don’t, you’re violating everybody at once,” Leeds says.

The RSL Standard by itself also can’t block bots from visiting a website, unlike the “pay per crawl” system already offered by Cloudflare. The RSL Collective is currently working with Fastly, a content delivery network, to admit AI bots to websites based on whether they’ve agreed to license content. Fastly is “the bouncer at the door to the club, and they won’t let people in unless they have the right ID,” Leeds says. “RSL is issuing the IDs. So we say, ‘Hey, you’ve agreed to license this content,’ and Fastly says, ‘Come on in, your ID checks out.’” Publishers who don’t use Fastly can still ask AI companies to license their content, but they’ll be unable to block AI crawlers until more providers build a solution.
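
The bouncer analogy maps onto a simple admission check at the network edge. The Python sketch below is a loose illustration under stated assumptions: `should_admit`, the `license_id` value, and the set of valid IDs are all hypothetical, and the article does not describe how Fastly or the RSL Collective actually identify crawlers or verify licenses.

```python
from typing import Optional, Set

# Hypothetical IDs issued to crawlers whose operators agreed to license content.
VALID_LICENSE_IDS: Set[str] = {"id-abc123"}


def should_admit(is_ai_crawler: bool,
                 license_id: Optional[str],
                 valid_ids: Set[str] = VALID_LICENSE_IDS) -> bool:
    """Decide whether the 'bouncer' lets a request through."""
    if not is_ai_crawler:
        # Visitors and bots with other purposes (search, archival)
        # proceed as usual.
        return True
    # AI crawlers get in only if they present an ID that checks out.
    return license_id is not None and license_id in valid_ids


if __name__ == "__main__":
    print(should_admit(True, "id-abc123"))   # licensed AI crawler
    print(should_admit(True, None))          # AI crawler with no ID
    print(should_admit(False, None))         # ordinary visitor
```

In this framing the publisher's terms state the price, and a gate like this is what would turn an unpaid request into a refusal.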

Leeds believes that the RSL Collective can legally enforce licenses as well. He says “all participants in the collective rights organization participate in the enforcement of any infringement,” spreading the legal costs. He compares the system to existing digital rights organizations, like the music rights group ASCAP, which collects licensing fees and distributes them to members. But while conventional music licensing benefits from particularly strong and well-established legal precedent for copyright protection, unauthorized scraping and the use of media for training AI systems still land in a legal gray area, with major AI players currently fighting lawsuits from Reddit, Getty Images, and many online publishers.

“There has always been a question of whether bots have agreed to terms that they don’t see,” Leeds and Walther added in an emailed statement. “RSL changes that fundamentally, putting crawlers on notice of what the terms are before they access a site.”

Even so, Leeds hopes the system can create an intuitive way to license works for AI training. “What we’re doing is not reinventing wheels or inventing wheels; we’re just bringing them to a place that they haven’t existed before,” Leeds says. “The reason they haven’t existed here before is because they haven’t had a standard that we could build on. So that’s why RSL Standard is so important: it gives the infrastructure to then create the things that have worked in every other media industry that hasn’t happened yet.”

The RSL Collective is free for publishers and creators to join, with other big brands like O’Reilly, wikiHow, and IGN owner Ziff Davis also on board.