this post was submitted on 28 Oct 2025

Programming

mesamunefire@piefed.social 1 point 6 days ago

I think I get what you're saying. LOL, LLM bots stealing all the things.

You may note I'm not arguing the ethical concerns of LLMs, just the way the data was pulled. It's why open source models that pull data and give others full access to that data could be argued to be more ethical. For practical purposes, it means we can just pull them off Hugging Face and use them on our home setups, and reproduce them with the "correct" datasets. As always, garbage in, garbage out. I wish my work would allow me to put all the SQL from a 30(?) year period into a custom LLM just for our proprietary BS. That's something I would have NO ethical concerns about at all.

riskable@programming.dev 1 point 6 days ago

For reference, every AI image model uses ImageNet (as far as I know), which is just a big database of publicly accessible URLs and metadata (classification info like "bird").

The "big AI" companies like Meta, Google, and OpenAI/Microsoft have access to additional image data sets that are 100% proprietary. But what's interesting is that the image models that are built from just ImageNet (and other open sources) are better! They're superior in just about every way!

Compare what you get from, say, ChatGPT (DALL-E 3) with what you get from a FLUX model you can download from civit.ai. The FLUX results are so much better that it's like night and day! Not only that, but you have an enormous selection of LoRAs to choose from to get exactly the type of image you want.

What we're missing is the same sort of open data sets for LLMs. Universities have access to some stuff but even that is licensed.