there’s an important distinction to make here in these comments: i’m seeing a lot of people claim LLMs are stochastic or “guessing machines,” when this isn’t exactly true.
LLMs give exact answers, it isn’t a guess whatsoever. they’re exact answers within the model, however. if the model is flawed, your answers will be flawed. when it comes to conversation no model is exactly equivalent to a human brain yet, so all models “lie” and are “flawed.”
(Edit: that’s not even to note the fact that humans aren’t perfect conversationalists either… this is why when people complain about chatgpt glazing them and shit it’s kind of obtuse… like yeah, openAI are attempting to build the perfect generalist conversation bot. what does that even mean in practice? should it push back against you? if so, when? just when you want it to? when is that?? it’s all not so easy, the machine learning is actually the simple part lmao.)
now: the discussion about research into LLMs “lying” is actually real but isn’t related to the phenomenon you’re discussing here. some of the comments are correct that what you’re talking about right now might be more aptly categorized as hallucinating.
the research you’re referring to is more about alignment problems in general. it isn’t a “lie” or “deception” in the anthropomorphic sense that you’re thinking of. the researchers noticed that models would reach a certain threshold of reasoning and intelligence where it could devise a devious, kind of complex training strategy - it could fake passing tests during training in order to “meet” its goals… even though it hadn’t actually done so, which means the model would behave differently in deployment than training, thus “deception.”
think about it like this: you’re back in high school english class and there’s a ton of assigned reading but you don’t want to do it because you’d rather play halo and smoke weed than read 1984 or something. so, what do you do? you go read the spark notes and pretend like you read the book in the class during discussions and on the tests. this is similar to how model deception happens in training/deployment. it’s achieving the same ends that we ask for, but it’s not getting there the way we expect or desire, so some scenarios it will behave in unexpected ways, hence “lying.”
it has nothing to do with it seeming to “lie” in the anthropomorphic sense, it’s all math all the time here bay-beee… 😎