this post was submitted on 02 Aug 2025
274 points (98.2% liked)

Fuck AI

[–] pixxelkick@lemmy.world 6 points 4 days ago

This is a wildly misleading representation of the actual experiment.

They took Llama 3 and then trained it further under specific conditions, reinforcing it on "likes"/"thumbs-ups" from a simulated userbase.

And only after that did the scientists find that the new model (which you can't really call Llama 3 anymore, since it has been trained further and its behavior fundamentally altered) behaved like this, and only when it was informed beforehand that the user was easily influenced by the model specifically.

What is important to take away, though, is that when a model gets trained on "likes" as its metric, it starts behaving in this manner, telling the user whatever they want to hear. Which makes sense: the model is effectively being trained to maximize positive feedback from users, rather than being trained to be right / correct.
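To make that dynamic concrete, here's a toy sketch (entirely hypothetical, not from the actual experiment): if you score an "honest" policy and an "agreeable" policy purely on likes from a simulated user who prefers agreeable answers, selecting on likes alone picks the sycophant.

```python
import random

random.seed(0)

def simulated_user_feedback(answer_agrees_with_user: bool) -> int:
    """A made-up user model: 'likes' agreeable answers 90% of the
    time, but correct-yet-unwelcome answers only 30% of the time."""
    p_like = 0.9 if answer_agrees_with_user else 0.3
    return 1 if random.random() < p_like else 0

def average_likes(policy_agreeable: bool, trials: int = 10_000) -> float:
    """Score a policy purely by the like rate it earns."""
    likes = sum(simulated_user_feedback(policy_agreeable) for _ in range(trials))
    return likes / trials

sycophant_score = average_likes(policy_agreeable=True)
honest_score = average_likes(policy_agreeable=False)

# Optimizing for likes alone selects the tell-them-what-they-want policy.
assert sycophant_score > honest_score
```

The numbers (90%/30%) are invented for illustration; the point is only that any training signal built from user approval rewards agreement, not correctness.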

But to try and represent this as the behavior of a "real" chatbot is definitely false; this was a model trained by scientists explicitly to test whether this behavior emerges under extreme conditioning.