scruiser

joined 2 years ago
[–] scruiser@awful.systems 10 points 9 months ago (3 children)

Have they fixed it as in it genuinely uses Python completely reliably, or "fixed" it as in they tweaked the prompt and now it uses Python 95% of the time instead of 50/50? I'm betting on the latter.

[–] scruiser@awful.systems 19 points 9 months ago

We barely understand how LLMs actually work

I would be careful how you say this. Eliezer likes to go on about giant inscrutable matrices to fearmonger, and the promptfarmers use the (supposed) mysteriousness as another avenue for crithype.

It's true that reverse engineering any specific output or task takes a lot of effort, requires access to the model's internal weights, and hasn't been done for most tasks, but the techniques for doing so exist. And in general there is a good high-level conceptual understanding of what makes LLMs work.

which means LLMs don’t understand their own functioning (not that they “understand” anything strictly speaking).

This part is absolutely true. If you catch them in a mistake, most of their data about how to respond comes from how humans respond (or, at best, from fine-tuning on other LLM output), and they don't have any way of checking their own internals, so the words they say in response to mistakes are just more bs unrelated to anything.

[–] scruiser@awful.systems 15 points 10 months ago

Example #"I've lost count" of LLMs ignoring instructions and operating like the bullshit spewing machines they are.

[–] scruiser@awful.systems 17 points 10 months ago (1 children)

Another thing that's been annoying me about responses to this paper... lots of promptfondlers are suddenly upset that we are judging LLMs by arbitrary puzzle-solving capabilities... as opposed to the arbitrary and artificial benchmarks they love to tout.

[–] scruiser@awful.systems 26 points 10 months ago (2 children)

So, I've been spending too much time on subreddits with a heavy promptfondler presence, such as /r/singularity, and the reddit algorithm keeps recommending me subreddits with even more unhinged LLM hype. One annoying trend I've noted is that people constantly conflate LLM-hybrid approaches, such as AlphaGeometry or AlphaEvolve (or even approaches that don't involve LLMs at all, such as AlphaFold), with LLMs themselves. From there they act as if of course LLMs can [insert things LLMs can't do: invent drugs, optimize networks, reliably solve geometry exercises, etc.].

Like, I saw multiple instances of commenters questioning/mocking/criticizing the recent Apple paper using AlphaGeometry as a counterexample. AlphaGeometry can actually solve most of the problems without an LLM at all; the LLM component replaces a set of heuristics that make suggestions on proof approaches, while the majority of the proof work is done by a symbolic AI working within a rigid formal proof system.

I don't really have anywhere I'm going with this, just something I noted that I don't want to waste the energy repeatedly re-explaining on reddit, so I'm letting a primal scream out here to get it out of my system.

[–] scruiser@awful.systems 10 points 10 months ago

Just one more training run bro. Just gotta make the model bigger, then it can do bigger puzzles, obviously!

[–] scruiser@awful.systems 32 points 10 months ago* (last edited 10 months ago) (7 children)

The promptfondlers on places like /r/singularity are trying so hard to spin this paper. "It's still doing reasoning, it just somehow mysteriously fails when its reasoning gets too long!" or "LRMs improved with an intermediate number of reasoning tokens" or some other excuse. They are missing the point that short and medium-length "reasoning" traces are potentially the result of pattern memorization. If the LLMs were actually reasoning and not just pattern memorizing, then extending the number of reasoning tokens proportionately with the task length should let them maintain performance on the tasks instead of catastrophically failing. Because this isn't the case, Apple's paper is evidence for what big names like Gary Marcus, Yann LeCun, and many pundits and analysts have been repeatedly saying: LLMs achieve their results through memorization, not generalization, especially not out-of-distribution generalization.

[–] scruiser@awful.systems 8 points 10 months ago

A surprising number of the commenters seem to be at least considering the intended message... which makes the contrast of the number of comments failing at basic reading comprehension that much more absurd (seriously, it's absurd how many comments somehow missed that the author was living in and working from Brazil and felt it didn't reflect badly on them to say as much in the HN comments).

[–] scruiser@awful.systems 9 points 10 months ago (1 children)

I struggle to think of a good reason why such prominent figures in politics and tech would associate themselves with such an event.

There is no good reason, but there is an obvious bad one: these prominent figures have racist sympathies (if they aren't "outright" racist themselves) and, between a lack of empathy and a position of privilege, don't care about the negative effects of boosting racist influencers.

[–] scruiser@awful.systems 8 points 10 months ago (2 children)

I've been waiting for this. I wish it had happened sooner, before DOGE could do as much damage as it did, but better late than never. Donald Trump isn't going to screw around, and, ironically, DOGE has shown you don't need congressional approval or actual legal authority to screw over people funded by the government, so I am looking forward to Donald screwing over SpaceX's or Starlink's government contracts. On the returning end... Elon doesn't have that many ways of properly screwing with Trump; even if he has stockpiled blackmail material, I don't think it will be enough to turn MAGA against Trump. Still, I'm somewhat hopeful this will lead to larger infighting between the techbro alt-righters and the Christofascist alt-righters.

[–] scruiser@awful.systems 20 points 10 months ago (3 children)
  • "tickled pink" is a saying for finding something humorous

  • "BI" is business insider, the newspaper that has the linked article

  • "chuds" is a term of online alt-right losers

  • OFC: of fucking course

  • "more dosh" mean more money

  • "AI safety and alignment" is the standard thing we sneer at here: making sure the coming future acasual robot god is a benevolent god. Occasionally reporter misunderstand it to mean or more PR-savvy promptfarmers misrepresent it to mean stuff like stopping LLMs from saying racist shit or giving you recipes that would accidentally poison you but this isn't it's central meaning. (To give the AI safety and alignment cultists way too much charity, making LLMs not say racist shit or give harmful instructions has been something of a spin-off application of their plans and ideas to "align" AGI.)
