this post was submitted on 03 Nov 2025

Technology

[–] t3rmit3@beehaw.org 0 points 2 hours ago

Might have to break this into a couple of replies, because this is a LOT to work through.

Anthropic is the only company to have admitted publicly to doing this. They were sued and settled out of court. Google and OpenAI have had no such accusations as far as I’m aware.

Meta is being sued by several groups over this, including porn companies who caught them torrenting. Their defense has been to claim that the 2,400 videos downloaded to their corporate IP space were downloaded for "personal use".

OpenAI is also being accused of pirating books (not scraping), and it has been unable to prove legal procurement of them.

There is no such legal distinction [scraping for summary use vs scraping for supplanting the original content]. Scraping content is legal no matter WTF you plan to do with it.

Interestingly, it's actually Meta's most recent partial win that explicitly helps disprove this. Apart from just generally ripping into Meta for clearly infringing copyright, the judge wrote (page 3):

There is certainly no rule that when your use of a protected work is “transformative,” this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.

So yes, Fair Use absolutely does take into account market harms.

What an AI model does isn’t copyright infringement (usually).

I never asserted this, and I am well aware of the distinction between the copyright infringement involved in illegally obtaining copyrighted material and the AI training itself. You seem to be taking a whole host of objections you get from others and applying them to me.

I think it's perfectly reasonable to require that AI companies legally acquire a copy of any copyrighted material. Just as it would not be legal for me to torrent a movie even if I wanted to do something transformative with it, AI companies should not be able to do so either.