this post was submitted on 09 Mar 2025
416 points (99.1% liked)
Technology
My worry is that these social media alternatives might get scraped by these AI companies as well.
Sure, a company handing it over is much easier (e.g. Reddit). But with the decentralized nature, everyone needs to protect their instances themselves, and I’m not sure how well everyone will manage that.
Definitely much more difficult, so it’s a step in the right direction.
There are lists of bots that instance admins can block for a range of reasons.
Anything online can be scraped, but big firms might run into regulatory trouble if they are caught randomly scraping sites without consent. At the moment, the big social media apps have a tonne of content to train on under tightly controlled conditions, so they don't really need to go out into the wild yet. However, we need to be vigilant, block them, and make a fuss if we catch them at it.
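For anyone wondering what blocking from such a list might look like in practice, here is a rough sketch, not how Lemmy or any particular instance actually does it. The token list is a small illustrative sample of crawler names, and both function names (`is_blocked_user_agent`, `robots_txt`) are made up for this example. Keep in mind the User-Agent header is self-reported, so this only stops bots that identify themselves honestly.

```python
# Illustrative sample only; real blocklists are longer and change over time.
BLOCKED_AGENT_TOKENS = ["GPTBot", "CCBot", "ClaudeBot", "Bytespider"]


def is_blocked_user_agent(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if the User-Agent header contains a blocked token.

    Hypothetical helper: the match is a case-insensitive substring check,
    and a missing header is treated as not blocked.
    """
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)


def robots_txt(tokens=BLOCKED_AGENT_TOKENS):
    """Build a robots.txt body asking each listed crawler to stay out.

    Hypothetical helper: robots.txt is only a request, so it relies on
    the crawler choosing to honour it.
    """
    return "\n".join(f"User-agent: {t}\nDisallow: /\n" for t in tokens)
```

An admin could run the first check in a reverse proxy or middleware and return a 403 on a match, and serve the second as `/robots.txt` for crawlers that respect it.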
In the end, I don't think a Lemmy instance or any similar website without mandatory login could prevent itself from being scraped for content to train AI.
One could, for example, create their own Lemmy instance, federate with as many other instances as possible and scrape them that way.