this post was submitted on 17 Aug 2025
667 points (99.7% liked)

Technology


Share interesting Technology news and links.

Rules:

  1. No paywalled sites at all.
  2. News articles have to be recent, no older than 2 weeks (14 days).
  3. No external video links, only native (.mp4, etc.) links under 5 minutes.
  4. Post only direct links.

To encourage more original sources and keep this space as commercial-free as I can, the following websites are blacklisted:

More sites will be added to the blacklist as needed.


[–] witten@lemmy.world 26 points 1 month ago (1 child)

Then they'd have to bother understanding the content and downloading it as appropriate. And you'd think if anyone could understand and parse websites in real time to make download decisions, it would be the giant AI companies. But ironically, they're only interested in hoovering up everything as plain web pages to feed into their raw training data.

[–] Natanael 16 points 1 month ago

The same morons scrape Wikipedia instead of downloading the archive files, which can trivially be rendered as web pages locally.
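
For what it's worth, here is a minimal sketch of reading a dump locally instead of scraping. It assumes the MediaWiki XML export layout, where each `<page>` element holds a `<title>` and a revision `<text>`. The `iter_pages` helper and the tiny sample document are made up for illustration, and turning the wikitext into HTML is a separate step not shown here.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> in a MediaWiki-style XML export.

    Hypothetical helper: it streams the file with iterparse so the whole
    dump never has to fit in memory at once.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the page's children once it has been consumed


# Illustrative sample only; a real dump is far larger and has more fields.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page>
    <title>Example</title>
    <revision><text>Hello from a local dump.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

For a compressed dump, the same function should work on a file object from `bz2.open(path, "rb")`, where `path` is wherever you saved the archive.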