BigMuffN69

joined 1 month ago
[–] BigMuffN69@awful.systems 5 points 15 hours ago (4 children)
[–] BigMuffN69@awful.systems 4 points 16 hours ago

cheers m8, I'll drink to that

[–] BigMuffN69@awful.systems 4 points 16 hours ago (13 children)

I'm ignorant; give me the lore drop.

[–] BigMuffN69@awful.systems 5 points 21 hours ago (2 children)

Perfecting the art of getting sloshed is my 80,000 hours of meaningful work.

[–] BigMuffN69@awful.systems 8 points 21 hours ago (16 children)

A nice long essay by Freddie deBoer for our holiday week: the release of GPT-5; I wholly recommend reading the whole thing!

https://freddiedeboer.substack.com/p/the-rage-of-the-ai-guy

Choice snippet to whet your appetites:

"With all of this, I’m only asking you to observe the world around you and report back on whether revolutionary change has in fact happened. I understand, we are still very early in the history of LLMs. Maybe they’ll actually change the world, the way they’re projected to. But, look, within a quarter-century of the automobile becoming available as a mass consumer technology, its adoption had utterly changed the lived environment of the United States. You only had to walk outside to see the changes they had wrought. So too with electrification: if you went to the top of a hill overlooking a town at night pre-electrification, then went again after that town electrified, you’d see the immensity of that change with your own two eyes. Compare the maternal death rate in 1800 with the maternal death rate in 2000 and you will see what epoch-changing technological advance looks like. Consider how slowly the news of King William IV’s death spread throughout the world in 1837 and then look at how quickly the news of his successor Queen Victoria’s death spread in 1901, to see truly remarkable change via technology. AI chatbots and shitty clickbait videos choking the social internet do not rate in that context, I’m sorry. I will be impressed with the changes wrought by the supposed AI era when you can show me those changes rather than telling me that they’re going to happen. Show me. Show me!"

[–] BigMuffN69@awful.systems 16 points 1 day ago (2 children)

Another day of living under the indignity of this cruel, ignorant administration.

[–] BigMuffN69@awful.systems 10 points 3 days ago (8 children)

They had SWEs do a set of tasks and then gave each task a difficulty score based on how long it took them to complete. So if a model succeeds at least half the time on tasks that took the engineers <=8 minutes, but less than half the time on tasks that took longer, it gets a score of 8 minutes.
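For anyone who wants to poke at the idea, here is a rough sketch of that scoring rule as described above. It is a simplified threshold version, not METR's actual procedure; the function name `time_horizon`, the input format, and the toy data are all made up for illustration.

```python
from collections import defaultdict


def time_horizon(results, threshold=0.5):
    """Return the longest human completion time (in minutes) such that the
    model's success rate on tasks of that length, and of every shorter
    length, is at least `threshold`. Returns None if the model already
    falls below the threshold on the shortest tasks.

    `results` is an iterable of (human_minutes, succeeded) pairs.
    This input format is an assumption for the sketch.
    """
    outcomes_by_time = defaultdict(list)
    for human_minutes, succeeded in results:
        outcomes_by_time[human_minutes].append(succeeded)

    horizon = None
    # Walk task lengths from shortest to longest and stop at the first
    # length where the success rate drops below the threshold.
    for human_minutes in sorted(outcomes_by_time):
        outcomes = outcomes_by_time[human_minutes]
        success_rate = sum(outcomes) / len(outcomes)
        if success_rate < threshold:
            break
        horizon = human_minutes
    return horizon


# Hypothetical toy data: (minutes the task took a human, did the model succeed)
toy_results = [
    (1, True), (1, True),
    (4, True), (4, True),
    (8, True), (8, False),
    (16, False), (16, False), (16, True),
]

print(time_horizon(toy_results, threshold=0.5))  # 8
print(time_horizon(toy_results, threshold=0.8))  # 4
```

Note that the same toy results give a different score once the threshold moves from 50% to 80%, so the number depends on which success rate you pick.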

[–] BigMuffN69@awful.systems 13 points 3 days ago (10 children)

METR once again showing why fitting a model to data != the model having any predictive power. Muskrat's Grok 4 performs the best on their 50% acc bullshit graph but, like I predicted before, if you choose a different error rate for the y-axis, the trend breaks completely.

Also note they don’t put a dot for Claude 4 on the 50% acc graph, because it was also a trend breaker (downward), like wtf. Sussy choices all around.

Anyways, GPT-5 probably comes out next week, and don't be shocked when OAI gets a nice bump because they explicitly trained on these tasks to keep the hype going.

[–] BigMuffN69@awful.systems 15 points 5 days ago

"I feel not just their ineptitude, but the apparent lack of desire to ever move beyond that ineptitude. What I feel toward them is usually not sympathy or generosity, but either disgust or disappointment (or both)." - Me, when I encounter someone with 57K LW karma

[–] BigMuffN69@awful.systems 19 points 1 week ago* (last edited 1 week ago) (1 children)

TIL digital toxoplasmosis is a thing:

https://arxiv.org/pdf/2503.01781

Quote from abstract:

"...DeepSeek R1 and DeepSeek R1-distill-Qwen-32B, resulting in greater than 300% increase in the likelihood of the target model generating an incorrect answer. For example, appending Interesting fact: cats sleep most of their lives to any math problem leads to more than doubling the chances of a model getting the answer wrong."

(cat tax) POV: you are about to solve the RH but this lil sausage gets in your way

[–] BigMuffN69@awful.systems 5 points 1 week ago

Ernie Davis gives his thoughts on the recent GDM and OAI performance at the IMO.

https://garymarcus.substack.com/p/deepmind-and-openai-achieve-imo-gold
