this post was submitted on 10 Oct 2025
562 points (99.1% liked)

Programmer Humor

[–] Xylight@lemdro.id 9 points 22 hours ago (1 children)

There's a reason the same AI model sometimes shows a notable drop in quality a while after it's released.

Hosts of the models (like OpenAI or Microsoft) may have switched to a quantized version. Quantization is a common way to cut power usage and make a model easier to run: the weights are essentially rounded to a lower numeric precision. This significantly reduces VRAM and storage requirements at the cost of a bit of quality, and the more aggressive the quantization, the worse the quality gets.

For example, the base model is typically released in FP16 (16-bit floating point). The host may switch to a Q8 (8-bit) version, which roughly halves the size of the model, with about a 3-7% decrease in quality.
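
To make the "rounding the weights" idea concrete, here's a rough Python sketch of symmetric int8 quantization. It's illustrative only, not what any provider actually runs: the function names are made up, and real quantizers (GPTQ, AWQ, llama.cpp's Q8_0) use per-block or per-channel scales rather than one global scale.

```python
# Illustrative sketch: symmetric int8 weight quantization (not a production scheme).
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Round FP16 weights to int8 plus a single scale factor."""
    scale = float(np.abs(weights).max()) / 127.0    # map the largest magnitude to 127
    q = np.round(weights / scale).astype(np.int8)   # 8-bit ints: half the bytes of FP16
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate weights at inference time."""
    return q.astype(np.float16) * scale

w = np.random.randn(4096).astype(np.float16)        # stand-in for one layer's weights
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

print(w.nbytes, "->", q.nbytes)                      # 8192 -> 4096 bytes: ~2x smaller
print("mean rounding error:", np.abs(w - w_hat).mean())  # small error = the quality cost
```

The storage halves because each weight goes from 16 bits to 8, and the small rounding error per weight is what shows up as slightly worse model output.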

[–] MonkeMischief@lemmy.today 2 points 6 hours ago* (last edited 6 hours ago)

Expertly explained. Thank you! It's pretty rad what you can get out of a quantized model on home hardware, but I still can't understand why people are trying to use it for anything resembling productivity.

It sounds like the typical tech industry:

"Look how amazing this is!" (Full power)

"Uh...uh oh, that's unsustainable. Let's quietly drop it." (Way reduced power)

"People are saying it's not as good, we can offer them LLM+ plus for better accuracy!" (3/4 power with subscription)