this post was submitted on 03 Nov 2025
Technology
Anthropic is the only company to have admitted publicly to doing this. They were sued and settled out of court. Google and OpenAI have had no such accusations as far as I'm aware. Furthermore, Google had the gigantic book scanning project where it was determined in court that the act of scanning as many fucking books as you want is perfectly legal (fair use). Read all about it: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.
You say:
There is no such legal distinction. Scraping content is legal no matter WTF you plan to do with it. This has been settled in court many, many times. Here's some court cases for you to learn the actual legality of scraping and storing of said scraped data:
To summarize all this: You are 100% wrong. I have cited my sources. I was there ("3000 years ago...") when all this went down. Pepperidge Farm remembers.
You say:
This is a common misconception of copyright law: Remember Napster? They were sued and argued in court that because users don't profit from sharing songs with their friends, it is legal. The court rejected this argument: https://en.wikipedia.org/wiki/A%26M_Records,_Inc._v._Napster,_Inc. See also: https://en.wikipedia.org/wiki/Capitol_Records,_Inc._v._Thomas-Rasset and https://en.wikipedia.org/wiki/Harper_%26_Row_v._Nation_Enterprises and https://en.wikipedia.org/wiki/American_Geophysical_Union_v._Texaco,_Inc. where the courts all ruled the same way.
You say:
Downloading a YouTube video for offline use is legal... depending on the purpose. This is one of those very, very nuanced areas of copyright law where fair use intersects with the DMCA and also intersects with the CFAA. The DMCA states, "No person shall circumvent a technological measure that effectively controls access to a work protected under this title." Since YouTube videos have some technical measures to prevent copying (depending on the resolution and platform!), it is illegal to circumvent them. However, the Librarian of Congress can grant exceptions to this rule and has done so for many situations. For example, archiving (https://www.arl.org/news/librarian-of-congress-expands-dmca-exemption-for-text-and-data-mining/) which is just plain wacky, IMHO.
Regardless, if YouTube didn't put an anti-circumvention mechanism into their videos it would be perfectly legal to download the videos. Just like it's legal to record TV shows with a VCR. This was ruled in Sony Corp. of America v. Universal City Studios (already cited). There's no reason why it wouldn't still apply to YouTube videos. The fact that no one has been sued for doing this since then (that I could find) seems to indicate that this is a very settled thing.
You say:
No. Fuck no. A shitton of people are saying it's "theft". Have you been on the Internet recently? LOL! I see it every damned day and I'm sick of it. I keep repeating, "it's not theft, it's copyright infringement" and I get downvoted for "being pedantic". Like it's not a very fucking important distinction!
...but also: What an AI model does isn't copyright infringement (usually). You ask it to generate an image or some text and it just does what you ask it to do. The fact that it's possible for it to infringe copyright shouldn't matter because it's just a tool like a Xerox machine/copier. It has already been ruled fair use for an AI company to train their models with copyrighted works (great summary of that here: https://www.debevoise.com/insights/publications/2025/06/anthropic-and-meta-decisions-on-fair-use ). Despite these TWO court rulings, people are still saying that training AI models is both "theft" and somehow "illegal". We're already past that.
AI models are terrible copyright violators! Everything they generate—at best—can only ever be, "kinda sorta like" a copyrighted work. You can get closer and closer if you get clever with prompts and tell the model to generate say, 10000 images of the same thing. Then you can look at your prayers to the RNG gods and say, "Aha! Look! This image looks very very similar to Indiana Jones!"
You say:
Where TF did you see this? I did some searching and I cannot see anything suggesting that the AI companies have rejected any kind of DMCA protection.
Might have to break this into a couple of replies, because this is a LOT to work through.
Meta is being sued by several groups over this, including porn companies who caught them torrenting. Their defense has been to claim that the 2,400 videos downloaded to their corporate IP space were downloaded for "personal use".
OpenAI is also being accused of pirating books (not scraping), and it has been unable to prove legal procurement of them.
Interestingly, it's actually Meta's most recent partial win that explicitly helps disprove this. Apart from just generally ripping into Meta for clearly infringing copyright, the judge wrote (page 3)
So yes, Fair Use absolutely does take into account market harms.
I never asserted this, and I am well aware of the distinction between the copyright infringement, which involved illegally obtaining copyrighted material, and the AI training. You seem to be taking a whole host of objections you get from others and applying them to me.
I think it's perfectly reasonable to require that AI companies legally acquire a copy of any copyrighted material. Just as it would not be legal for me to torrent a movie even if I wanted to do something transformative with it, AI companies should not be able to do so either.