this post was submitted on 10 Sep 2025
125 points (96.3% liked)

[–] Kissaki@feddit.org 18 points 15 hours ago

evolves robots.txt instructions by adding an automated licensing layer that's designed to block bots that don't fairly compensate creators for content

robots.txt - the well-known technology for blocking bad-intention bots /s

What's automated about the licensing layer? At some point, I started skimming the article. They didn't seem clear about it. The AI can "automatically" parse it?

```
# NOTICE: all crawlers and bots are strictly prohibited from using this
# content for AI training without complying with the terms of the RSL
# Collective AI royalty license. Any use of this content for AI training
# without a license is a violation of our intellectual property rights.

License: https://rslcollective.org/royalty.xml
```

Yeah, this is as useless as I thought it would be. Nothing here is actively blocking.

I love that the XML then points to a text/html content website. I guess nothing for machine parsing, maybe for AI parsing.

I don't remember which AI company, but they argued they're not crawlers but agents acting on the user's behalf for a specific request/action, and so ignored robots.txt. Who knows how they will react, but their incentives and their history are to ignore robots.txt.

Why ~~am I~~ is this comment so negative. Oh well.

[–] underline960@sh.itjust.works 58 points 20 hours ago (2 children)

Leeds told Ars that the RSL standard doesn't just benefit publishers, though. It also solves a problem for AI companies, which have complained in litigation over AI scraping that there is no effective way to license content across the web.

"If they're using it, they pay for it, and if they're not using it, they don't pay for it."

...

But AI companies know that they need a constant stream of fresh content to keep their tools relevant and to continually innovate, Leeds suggested. In that way, the RSL standard "supports what supports them," Leeds said, "and it creates the appropriate incentive system" to create sustainable royalty streams for creators and ensure that human creativity doesn't wane as AI evolves.

This article tries to slip in the idea that creators will benefit from this arrangement. Just like with Spotify and Getty Images, it's the publisher that's getting paid.

Then they decide how much they'll let trickle down to creators.

[–] ICastFist@programming.dev 14 points 18 hours ago

Cue an even greater influx of AI slop pages in hopes of getting crawled for that juicy trickled down money

[–] ccunning@lemmy.world 4 points 20 hours ago

I would assume creators and publishers would agree to those terms in advance (moving forward, of course).

[–] billwashere@lemmy.world 14 points 16 hours ago

The issue is the line that says “compensate creators”. Reddit still thinks it’s the creator, not the individual users.

[–] FaceDeer@fedia.io 21 points 18 hours ago (2 children)

And suddenly the Internet is gung-ho in favor of EULAs being enforceable simply by reading the content the website has already provided.

Recent major court cases have held that the training of an AI model is fair use and doesn't involve copyright violation, so I don't think licensing actually matters in this case. They'd have to put the content behind a paywall to stop the trainer from seeing it in the first place.

[–] ccunning@lemmy.world 7 points 18 hours ago (2 children)

I guess that’s a different court case than the one where Anthropic offered to pay $1.5 billion?

[–] FaceDeer@fedia.io 5 points 14 hours ago

Nope, this was one of them. The case had two parts, one about the training and one about the downloading of pirated books. The judge issued a preliminary judgment on the training part, which was declared fair use without any further need to address it at trial. The downloading was what was proceeding to trial and what the settlement offer was about.

[–] NewNewAugustEast@lemmy.zip 6 points 17 hours ago (1 children)

Totally different. Anthropic could have bought all the books and trained on them. Pirating is a different topic.

[–] corsicanguppy@lemmy.ca 2 points 16 hours ago (2 children)

Anthropic could have bought

You think buying the books would let them plagiarize? That doesn't seem to be normal in the "book buying" process.

[–] Womble@piefed.world 2 points 11 hours ago

Given that the judge in that case flat-out rejected the claim that there was any infringement for works they had legally acquired, yes.

[–] NewNewAugustEast@lemmy.zip 6 points 16 hours ago* (last edited 16 hours ago)

Doesn't really matter what I think; it's a different concept than pirating, hence a different thing than what was being ruled on.

I mean, AI or not, look at it this way: if a company wanted to train its workers and pirated all the training manuals, the piracy is the issue, not the training.

[–] tabular@lemmy.world 0 points 12 hours ago* (last edited 12 hours ago) (1 children)

Is it hypocrisy to be for EULA enforcement on reading when it's machines, but not when it's humans? Crawlers "read" on a massive scale that doesn't compare to humans.

[–] WhyJiffie@sh.itjust.works 1 points 11 hours ago

I don't think so, or not always. Humans need to find the EULA on the website by first loading the main page or another page they found a link to. But if the path of that document were standardized, it could be enforced that way for robots.
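That standardized-path idea could work like robots.txt itself does: a fixed URL every bot is expected to check. A hypothetical sketch (the `/.well-known/license.xml` path is invented for illustration; no such standard exists today):

```python
import urllib.parse

# Assumed, not standardized: a fixed path where license terms would live,
# analogous to /robots.txt.
WELL_KNOWN_LICENSE_PATH = "/.well-known/license.xml"

def license_url_for(page_url: str) -> str:
    """Build the standardized license URL for the site hosting page_url."""
    parts = urllib.parse.urlsplit(page_url)
    return urllib.parse.urlunsplit(
        (parts.scheme, parts.netloc, WELL_KNOWN_LICENSE_PATH, "", "")
    )

print(license_url_for("https://example.com/articles/42"))
# → https://example.com/.well-known/license.xml
```

A bot could then be required to fetch that URL before crawling anything else, the same way well-behaved crawlers fetch robots.txt first.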

[–] tchambers@crust.piefed.social 7 points 17 hours ago (1 children)

Wonder if this could work for Fediverse servers too.

[–] rimu@piefed.social 1 points 3 hours ago

Hold my beer

[–] zrst@lemmy.cif.su 8 points 18 hours ago

Does AI cost advertisers money?

I'd be cool with it if that's the case.

[–] trailee@sh.itjust.works 10 points 19 hours ago* (last edited 19 hours ago)

Neither the article nor the RSL website makes clear how pricing or payment works, which seems like a huge miss. It’s not obvious if a publisher can price-differentiate among content, or even choose their own prices at all.

RSL makes an analogy:

Collective licensing organizations like ASCAP and BMI have long helped musicians get paid fairly by working together and pooling rights into a single, indispensable offering.

I’d like to get excited about this because AI companies suck, but if the best example they have is that ASCAP helps “musicians get paid fairly” I’m afraid this isn’t a solution that most content creators will celebrate.

[–] BrianTheeBiscuiteer@lemmy.world 7 points 20 hours ago

Not a bad idea, but the biggest challenge will probably be determining who needs to be sued for non-compliance. Google might not be hiding the origin of its bots now, but that could easily change.
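On identifying bot origins: Google does document a way to verify its crawlers, via reverse DNS on the visiting IP followed by a forward lookup to confirm the match. A rough sketch of that check (it makes real DNS queries, so treat it as illustrative):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Verify a claimed Googlebot IP: reverse DNS, domain check, forward DNS."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup: IP -> hostname
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # forward lookup must map the hostname back to the same IP
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:  # covers socket.herror / socket.gaierror
        return False

print(is_verified_googlebot("192.0.2.1"))  # TEST-NET address, fails: False
```

Of course, this only works as long as the crawler identifies itself at all; a bot that routes through ordinary cloud IPs with a generic user agent leaves nothing to verify, which is exactly the enforcement gap.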

[–] tchambers@crust.piefed.social 3 points 19 hours ago

Interesting.