I didn’t see Claude 4 Sonnet in the tests, and this is the one I use. In my experience it is in about the same category as o4 mini.
It is a nice tool to have in my belt. But these LLM-based agents are still very far from being able to do advanced and hard tasks. To me it is probably more important to communicate and learn about the limitations of these tools, so as not to lose time instead of gaining it.
In fact, I am not even sure they are good enough to be used to really generate production-ready code. But they are nice for pre-reviewing, building simple scripts that don’t need to be highly reliable, analysing a project, asking specific questions, etc. The game changer for me was to use Clojure-MCP. Having a REPL at my disposal really enhances the quality of most answers.
Before: someone with a salary did the work.
After: you, the customer, do the work, are not paid for it, are forced to see ads, and naturally more and more steps will be added (do you want to give to charity? do you want our premium card? what is your city? and other bullshit).
Sorry, but I refuse to self checkout. Pretty often, if there is only self checkout, I leave everything in place for the staff to put back in the store.
If I am forced to use them, I am already in such a bad mood that I do my best to make the experience as terrible as possible. I systematically lie in response to every question, I tend to make mistakes, and I wait for someone to come. Mainly, I try to make it worse economically.
How, in one generation, have people come to accept working for free? Not for me.