Yeah I think that's why we need an Absolute Imbecile Level Reasoning Benchmark.
Here's what the typical PR from AI hucksters looks like:
https://www.anthropic.com/news/claude-3-family
Fully half of their performance claims are about "reasoning", with names like "Graduate Level Reasoning". OpenAI is even worse - recall their claim of scoring in the 90th percentile on the LSAT?
On top of that, LLMs are fine-tuned to convince some dumb-ass CEO who "checks it out". And even though you pay for the subscription, you're neither the customer nor the product - you're just collateral eyeballs on the ad.
How am I accepting the premise, though? I do call it an Absolute Imbecile, but that's more of a play on the "AI" moniker.
What I do accept is the unfortunate fact that they did get their "AIs" to score very highly on various "reasoning" benchmarks (some of their own design), standardized tests, and so on. And it holds up across most simple variations, such as changing the numbers in a problem or the word order.
They really did a very good job of faking reasoning. I feel that even though LLMs are complete bullshit, the sheer strength of that bullshit is easy to underestimate.