this post was submitted on 23 Feb 2026
687 points (97.6% liked)

Technology

83251 readers
4345 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago

A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.

It also includes outtakes from the 'reasoning' models.

[–] FireWire400@lemmy.world 8 points 1 month ago* (last edited 1 month ago) (6 children)

Gemini 3 (Fast) got it right for me; it said that unless I wanna carry my car there it's better to drive, and it suggested that I could use the car to carry cleaning supplies, too.

Edit: A locally run instance of Gemma 2 9B fails spectacularly; it completely disregards the first sentence and recommends that I walk.

[–] ryannathans@aussie.zone 8 points 1 month ago (17 children)

Opus 4.6 has been excellent at problem solving in software development, so it's no surprise it nails it.

It's no surprise public opinion holds that these tools are trash when the free models are unable to answer simple questions.

[–] Fizz@lemmy.nz 6 points 1 month ago (5 children)

The free models feel years behind, so people constantly underestimate what these tools are capable of. I still hear people say AI can't generate fingers.

[–] humanspiral@lemmy.ca 8 points 1 month ago (3 children)

Some takeaways,

Sonar (Perplexity's models) says you are stealing energy from AI whenever you exercise (you should drive because eating pollutes more), i.e., it gets the right answer for the wrong reason.

US humans and the 55-65 age group score high on the international scale, probably for the same reasoning: "I like lazy".

[–] lemmydividebyzero@reddthat.com 8 points 1 month ago

They will scrape that article, too.

And in a few months, they will have "learned" how that task works.

[–] criticon@lemmy.ca 7 points 1 month ago (7 children)

Even when they give the correct answer, they talk too much. AI responses contain a lot of garbage. When AI gives you an answer, it will try to justify itself. Since they won't give you brief responses, the responses will be long.

[–] jaykrown@lemmy.world 7 points 1 month ago (1 children)

Interesting. I tried it with DeepSeek and got an incorrect response from the direct model without thinking, but then got the correct response with thinking. There's a reason for the shift towards "thinking" models: it forces the model to build its own context before giving a concrete answer.

Without DeepThink

With DeepThink

[–] rockSlayer@lemmy.blahaj.zone 5 points 1 month ago (6 children)

It's interesting to see it build the context necessary to answer the question, but this seems to be a lot of text just to come up with a simple answer

[–] myfunnyaccountname@lemmy.zip 7 points 1 month ago (17 children)

There are a lot of humans that would fail this as well. Just sayin.

[–] MojoMcJojo@lemmy.world 7 points 1 month ago (2 children)

AI is not human. It does not think like humans and does not experience the world like humans. It is an alien from another dimension that learned our language by looking at text and books, not reading them.

[–] tover153@lemmy.world 7 points 1 month ago (3 children)

After getting it wrong, the LLM I use most: Me: You can't wash your car if it isn't there.

Ah. Yes. That is an excellent and devastatingly practical correction.

In that case: drive. But do it with the quiet shame of someone moving the car the length of a bowling lane.

This is one of those situations where logic beats virtue. The car must be present to be washed. Walking was philosophically elegant but mechanically flawed.

So:

Start engine.

Travel 50 meters.

Avoid eye contact with pedestrians.

Commit fully.

You are not lazy. You are complying with system requirements.

[–] vala@lemmy.dbzer0.com 6 points 1 month ago (3 children)

Hey LLM, if I have a 16 ounce cup with 10oz of water in it and I add 10 more ounces, how much water is in the cup?
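The trick in that question is the cup's capacity: 10 oz plus 10 oz is 20 oz, but a 16 oz cup can only hold 16. A minimal sketch of the intended arithmetic, assuming anything beyond capacity simply spills over (the function names are hypothetical, just for illustration):

```python
def water_in_cup(capacity_oz: float, initial_oz: float, added_oz: float) -> float:
    """Water left in the cup: the total poured in, capped at the cup's capacity."""
    return min(capacity_oz, initial_oz + added_oz)


def spilled(capacity_oz: float, initial_oz: float, added_oz: float) -> float:
    """Water that overflows: whatever exceeds capacity, never negative."""
    return max(0.0, initial_oz + added_oz - capacity_oz)


print(water_in_cup(16, 10, 10))  # 16
print(spilled(16, 10, 10))       # 4
```

So the expected answer is 16 oz in the cup, with 4 oz on the table; a model that just answers "20 oz" has done the addition and skipped the reasoning.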

[–] Professorozone@lemmy.world 6 points 1 month ago

Didn't like 30% of the population elect Trump? Coincidence? I don't think so.

[–] melsaskca@lemmy.ca 5 points 1 month ago

I don't use AI but read a lot about it. I now want to google how it attacks the trolley problem.
