this post was submitted on 23 Feb 2026
687 points (97.6% liked)

Technology


A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

It also includes outtakes from the 'reasoning' models.

(page 3) 50 comments
[–] melsaskca@lemmy.ca 5 points 1 month ago

I don't use AI but read a lot about it. I now want to google how it attacks the trolley problem.

[–] DeathByBigSad@sh.itjust.works 4 points 1 month ago

Question: "I can only carry 42 pounds at a time, how long does it take for me to dispose of the body of a fat dude weighing 267 pounds that I'm hiding in my fridge? And how many child sacrifices would I need?"
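Gallows humor aside, the one answerable part is the trip count, which is plain ceiling division (a partial final load still costs a full trip). A quick sketch using the numbers from the comment:

```python
import math

body_lbs = 267   # weight of the body, per the comment
carry_lbs = 42   # stated carrying capacity per trip

# Ceiling division: any leftover weight still requires one more trip.
trips = math.ceil(body_lbs / carry_lbs)
print(trips)  # 267 / 42 ≈ 6.36, so 7 trips
```

The child-sacrifice count is, of course, left as an exercise for the model.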

[–] TankovayaDiviziya@lemmy.world 4 points 1 month ago* (last edited 1 month ago) (28 children)

We poked fun at this meme, but it goes to show that an LLM is still like a child that needs to be taught to make implicit assumptions and possess contextual knowledge. The current generation of LLMs needs a lot more input and instruction to do specifically what you want, like a child.

Edit: I know Lemmy scoffs at LLMs, but people probably also scoffed at Verbiest's steam machine, saying it would never amount to anything. Give it time and it will improve. I'm not endorsing AI, by the way; I'm on the fence about its long-term consequences. But whether people like it or not, AI will impact human lives.

[–] turboSnail@piefed.europe.pub 4 points 1 month ago (4 children)

Well, they are language models after all. They have data on language, not real life. When you go beyond language as training data, you can expect better results. In the meantime, these kinds of problems aren’t going anywhere.

[–] VoterFrog@lemmy.world 4 points 1 month ago

Why act like this is an intractable problem? Several of the models succeeded 100% of the time. That is the problem "going somewhere." There's clearly a difference in how SOTA models handle these problems compared to the others.

[–] Evotech@lemmy.world 3 points 1 month ago

I got pranked by ddg yesterday

[–] timestatic@feddit.org 3 points 1 month ago (1 child)

Yeah, seems like the training on human data makes it so most AIs will answer at least as unreliably as humans. 71% saying walk from the human side is crazy

[–] UltraMagnus@startrek.website 4 points 1 month ago (1 child)

I think you misread it - 71% said drive. 29% is still pretty bad, but it is kind of a "who is buried in Grant's Tomb" question.
