One needs a study for that?
So, I can speak to this a little bit, as it touches two domains I'm involved in. TL;DR - LLMs bullshit and are unreliable, but there's a way to use them in this domain as a force multiplier of sorts.
In one of them, I've created a Python router that:

- takes my (de-identified) clinical notes, extracts and compacts the input (per user-defined rules), and creates a summary;
- benchmarks the summary against my (user-defined) gold standard and produces a management plan (again, based on a user-defined database);
- drops the result into my on-device LLM for light editing and polishing to condense it, which I then eyeball, correct, and escalate to my supervisor for review (rough sketch of the deterministic part below).
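To give a feel for the shape of it, here's a simplified sketch, not my actual code; the function names, rule matching, scoring, threshold, and CSV layout are all illustrative stand-ins:

```python
import csv
from dataclasses import dataclass

@dataclass
class NoteResult:
    summary: str           # deterministic, rule-based summary
    score: float           # overlap with the gold-standard phrasing
    polished: str | None   # only set if the LLM polish step is allowed to run

def extract_and_compact(note: str, rules: list[str]) -> str:
    """Keep only the lines that match the user-defined extraction rules."""
    kept = [line for line in note.splitlines()
            if any(rule.lower() in line.lower() for rule in rules)]
    return "\n".join(kept)

def benchmark(summary: str, gold_path: str) -> float:
    """Crude token-overlap score against gold-standard summaries in a CSV."""
    with open(gold_path, newline="") as f:
        gold_text = " ".join(row["summary"] for row in csv.DictReader(f))
    gold_tokens = set(gold_text.lower().split())
    return len(set(summary.lower().split()) & gold_tokens) / max(len(gold_tokens), 1)

def run_pipeline(note: str, rules: list[str], gold_path: str, polish_with_llm) -> NoteResult:
    compacted = extract_and_compact(note, rules)
    score = benchmark(compacted, gold_path)
    # Everything above is deterministic; the LLM only touches the final polish,
    # and only if the deterministic summary clears an (arbitrary) bar.
    polished = polish_with_llm(compacted) if score >= 0.5 else None
    return NoteResult(summary=compacted, score=score, polished=polished)
```

The point is that every step up to `polish_with_llm` is plain, testable Python that I can read and audit.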
Additionally, the LLM-generated note can be approved or denied by the Python router in the first instance, based on certain policy criteria I've defined.
It can also suggest probable DDx based on my database (which is CSV-based).
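For the DDx suggestion, something in this spirit works; the CSV column names (`diagnosis`, `findings`) are invented for the example:

```python
import csv

def suggest_ddx(summary: str, ddx_csv: str, top_n: int = 3) -> list[str]:
    """Return the differentials whose listed findings best overlap the summary."""
    hits = []
    with open(ddx_csv, newline="") as f:
        for row in csv.DictReader(f):              # columns: diagnosis, findings
            findings = [t.strip().lower() for t in row["findings"].split(";")]
            matched = sum(1 for t in findings if t in summary.lower())
            if matched:
                hits.append((matched, row["diagnosis"]))
    # Most matched findings first.
    return [dx for _, dx in sorted(hits, reverse=True)[:top_n]]
```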
Finally, if the LLM output fails the policy check, the router tells me why it failed and just says "go look at the prior summary and edit it yourself".
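The policy gate is just ordinary checks over the text, so the rejection reasons are explicit. A toy version, with made-up rules and limits:

```python
REQUIRED_SECTIONS = ["history", "examination", "plan"]   # placeholder policy
MAX_WORDS = 250

def policy_check(note: str) -> tuple[bool, list[str]]:
    """Return (approved, reasons); reasons spell out exactly why a note failed."""
    reasons = []
    for section in REQUIRED_SECTIONS:
        if section not in note.lower():
            reasons.append(f"missing section: {section}")
    if len(note.split()) > MAX_WORDS:
        reasons.append(f"note exceeds {MAX_WORDS} words")
    return (not reasons, reasons)

llm_note = "History: two days of chest pain. Examination: unremarkable."
approved, reasons = policy_check(llm_note)
if not approved:
    print("LLM note rejected:", "; ".join(reasons))
    print("Go look at the prior summary and edit it yourself.")
```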
This three-step process takes the tedium of paperwork from 15-20 minutes down to about 1 minute of generation plus 2 minutes of manual editing, which is roughly a 5-7x speed-up.
The reason why this is interesting:
All of this runs within the LLM (or more accurately, it's invoked from within the LLM: it calls the Python tooling via >> commands, which live outside the LLM's purview) but is 100% deterministic; no LLM jazz until the final step, which the router can outright reject and is user-auditable anyway.
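To illustrate the hand-off I mean (the command names and stub tools here are placeholders, not my real ones): the model can only emit >> lines, and the router decides what, if anything, they map to.

```python
import re

# Stand-in tools; in reality these would be the real summariser, DDx lookup, etc.
TOOLS = {
    "summarise": lambda text: text[:200],
    "ddx": lambda text: ["placeholder differential"],
}

def dispatch(llm_output: str, context: str) -> dict:
    """Find '>> <command>' lines in the model output and run the matching tool."""
    results = {}
    for command in re.findall(r"^>>\s*(\w+)", llm_output, flags=re.MULTILINE):
        tool = TOOLS.get(command)
        results[command] = tool(context) if tool else "unknown command - ignored"
    return results

print(dispatch(">> summarise\nmodel chatter\n>> ddx", "chest pain, fever"))
```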
I've found that using a fairly "dumb" LLM (Qwen2.5-1.5B), with settings dialed down, produces consistently solid final notes (5 out of 6 are graded as passed on the first run by the router invoking the policy document and checking the output). It's too dumb to jazz, which is useful in this instance.
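A minimal sketch of what "settings dialed down" might look like, assuming a Hugging Face transformers setup (which may not match your on-device stack; the prompt and decoding values are only illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Condense the following clinical summary without adding anything new:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,          # greedy decoding: no sampling, no "jazz"
    repetition_penalty=1.1,   # illustrative value
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```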
Would I trust the LLM end to end? Well, I'd trust my system about 80% of the time. I wouldn't trust ChatGPT ... even though it's been more right than wrong in similar tests.
This is a major problem with studies like this: they start from the assumption that AI doctors would be competent, rather than from a position of demanding to know why AI should ever be involved in something so critical, and demanding a mountain of evidence that it's worthwhile before investing a penny or a second in it.
“ChatGPT doesn’t require a wage,” and, before you know it, billions of people are out of work and everything costs 10000x your annual wage (when you were lucky enough to still have one).
How long until the workers revolt? How long have you gone without food?
As a physician, I've used AI to check if I have missed anything in my train of thought. It never really changed my decision, though. It has also been useful for gathering up relevant citations for my presentations. But that's about it. It's truly shite at interpreting scientific research data on its own, for example. Most of the time it will just parrot the conclusions of the authors.
This says you’re full of owls. So we doing a radical owlectomy or what?