this post was submitted on 31 Jan 2025
102 points (98.1% liked)

Technology

top 9 comments
[–] NineMileTower@lemmy.world 34 points 7 months ago (1 children)

Nothing. It was pirated for free.

[–] misk@sopuli.xyz 11 points 7 months ago (1 children)

Some have allegedly paid.

“We’ve provided about 20-30 companies/teams with our entire dataset. It’s the same data as on our torrents page, but they get access to high-speed SFTP servers.” 

“Usually, this is in exchange for a large monetary donation or, on occasion, in exchange for good datasets they acquired,” ‘Anna’s Archivist’ adds, noting that all data they obtain is shared publicly.

[–] FaceDeer@fedia.io 14 points 7 months ago (1 children)

The fact that Anna's Archive is accepting additional datasets as "payment" makes me comfortable that they're not in this for the money but rather for ideological reasons.

[–] misk@sopuli.xyz 2 points 7 months ago

Or it could be that such a trade wouldn’t have to appear in the accounting :)

[–] FaceDeer@fedia.io 31 points 7 months ago

Guess we've finally reached the moment where letting the giant intellectual property cartels monopolize human culture is going to cause serious economic side effects for other big corporations rather than simply screwing over the general public.

[–] General_Effort@lemmy.world 7 points 7 months ago

“We cleaned 860K English and 180K Chinese e-books from Anna’s Archive,” a DeepSeek VL paper, published last March, states.

Hmm.

[–] yetAnotherUser@discuss.tchncs.de 5 points 7 months ago

Honestly, this is the best thing about the AI hype.

Remember to support your local (shadow) library!

[–] SnotFlickerman@lemmy.blahaj.zone 3 points 7 months ago

Bibliotik baybeeeee

[–] hendrik@palaver.p3x.de 3 points 7 months ago* (last edited 7 months ago)

Yeah, information wants to be free. I'd say we just do away with copyright /s

Or I could try training AI as well once this is settled. Of course I'd need to get a few big hard drives to store a few books, audiobooks, music, Netflix series... Or is this just a perk for big and greedy companies?