this post was submitted on 26 Jul 2025
Programmer Humor
I wonder if their data is poisoned by below-average devs. I mean, if your test subjects are average or below-average devs and they lost 20% efficiency, imagine what it can do to a good dev.
Not necessarily below-average devs. When people post code examples on the internet, they are usually trying to get one point across. Like "how do I solve X?" Here is code that solves X perfectly; the rest of the code is total crap, so ignore that and focus on the X part. Because it's just an example, that doesn't really matter. But when it's used to train an LLM, it's all just code. The model doesn't know which parts are important and which aren't.
And this gets worse when small bits of code are included in things like tutorials. They get copy-pasted all over the place: forums, social media, Stack Overflow, etc. So they're weighted far more heavily. And the part where the tutorial said "Warning, this code is really bad and insecure, it's just an example to show this one thing" gets lost in the shuffle.
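To make the weighting point a bit more concrete, here's a minimal sketch of how copy-pasted snippets can dominate a scraped corpus, and how a naive dedup pass changes that. Everything in it is made up for illustration: the corpus, the two snippets, and the helper names (`snippet_key`, `snippet_weights`, `deduplicate`). Real training pipelines are far more involved, and this is not a claim about how any particular model's data is handled.

```python
import hashlib
from collections import Counter


def snippet_key(snippet: str) -> str:
    """Hash a snippet after collapsing whitespace, so that trivially
    reformatted copies of the same code map to the same key."""
    normalized = " ".join(snippet.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def snippet_weights(corpus: list[str]) -> Counter:
    """Count how often each normalized snippet appears in the corpus."""
    return Counter(snippet_key(s) for s in corpus)


def deduplicate(corpus: list[str]) -> list[str]:
    """Keep only the first occurrence of each normalized snippet."""
    seen: set[str] = set()
    unique: list[str] = []
    for s in corpus:
        key = snippet_key(s)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


# Hypothetical scraped corpus: an insecure tutorial snippet that was
# copy-pasted three times (with whitespace changes), plus one careful
# snippet that only appears once.
tutorial = 'query = "SELECT * FROM users WHERE name = \'" + name + "\'"'
careful = 'cursor.execute("SELECT * FROM users WHERE name = ?", (name,))'
corpus = [tutorial, "  " + tutorial, tutorial.replace(" = ", "  =  "), careful]

weights = snippet_weights(corpus)
print(weights[snippet_key(tutorial)], "copies of the tutorial snippet")
print(weights[snippet_key(careful)], "copy of the careful snippet")
print(len(deduplicate(corpus)), "snippets left after deduplication")
```

Note that even after deduplication the bad snippet is still in there once, and the "this is insecure, don't use it" warning from the tutorial was never part of the code to begin with.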
The same thing happens when a commonly used framework pattern is replaced because a newer version of the framework does a bit more itself, so the old pattern isn't needed anymore. The LLM will just keep using the old pattern, even though there's often a good reason it was replaced (security issues, for example). And if the new and old versions aren't compatible with each other, you are in for a world of hurt trying to use an LLM.
And now with AI slop flooding all of the places where they used to get their data, it just gets worse and worse.
These are just some of the reasons why using an LLM for coding is probably a really bad idea.
Didn't expect this much. I hadn't thought about tutorial examples being weighted more heavily. This makes sense.