Yeah I think that's why we need an Absolute Imbecile Level Reasoning Benchmark.
Here's what the typical PR from AI hucksters looks like:
https://www.anthropic.com/news/claude-3-family
Fully half of their performance claims are about "reasoning", with names like "Graduate Level Reasoning". OpenAI is even worse - recall their claim of scoring in the 90th percentile on the LSAT?
On top of that, LLMs are fine-tuned to convince some dumb-ass CEO who "checks it out". And even though you pay for the subscription, you're neither the customer nor the product - you're just collateral eyeballs on the ad.
How am I accepting the premise, though? I do call it an Absolute Imbecile, but that's more of a play on the "AI" moniker.
What I do accept is the unfortunate fact that they did get their "AIs" to score very highly on various "reasoning" benchmarks (some of their own design), standardized tests, and so on. And it holds up across most simple variations, such as changing the numbers in a problem or the word order.
They really did a very good job of faking reasoning. I feel that even though LLMs are complete bullshit, the sheer strength of that bullshit is easy to underestimate.