scruiser

joined 2 years ago
[–] scruiser@awful.systems 5 points 7 months ago (1 children)

Oh duh, I remember that meme now. With the people getting on the bus wearing weird white robe outfits?

[–] scruiser@awful.systems 2 points 7 months ago

I’d probably be exaggerating if I said that every time I looked under the hood of Wikipedia, it reaffirmed that I don't have the temperament to edit there.

The lesswrongers hate dgerard's Wikipedia work because they perceive it as calling them out, but if anything Wikipedia's norms make his “call outs” downright gentle and routine.

[–] scruiser@awful.systems 8 points 7 months ago* (last edited 7 months ago) (10 children)

“Yall are in a cult, and it is TESCREAL.”

So I know you were going for a snappy summary, but I think one of the important things to note is that the TESCREAL essay doesn't call them a singular cult; it draws connections between the letters of the acronym: shared inspirations, people active under multiple letters of the acronym, common terminology, common ideological assumptions, and so on.

I think a hypothetical, more mature rationalist movement would acknowledge their historical and current influences and think critically about how they relate to them, instead of just going nuh-uh. Like how the relatively more reasonable EAs occasionally point out problematic trends in their movement and at least try to address them (not particularly effectively, but at least they aren't all in total denial).

[–] scruiser@awful.systems 12 points 7 months ago (13 children)

It feels like this person was mad at the TESCREAL label and decided to make a blog post going "nuh-uh, I know you are but what am I"... except they have none of the academic ability of the TESCREAL authors, so they just sort of pile on labels and ideologies without properly showing any causal or ideological relationships (like the TESCREAL authors do). Heck, they outright screw up words and definitions in a few places ("Orate" sticks out to me).

[–] scruiser@awful.systems 4 points 7 months ago

Keep in mind I was wildly guessing with a lot of the numbers... like I'm sure 90 GB of VRAM is enough for decent-quality pictures generated in minutes, but I think you need a lot more compute to generate video at a reasonable speed? I wouldn't be surprised if my estimate is off by a few orders of magnitude. $0.30 is probably enough that people can't spam lazily generated images, and a true cost of $3.00 would keep it in the range of people who genuinely want/need the slop... but yeah, I don't think it is all going to cleanly go away once the bubble pops or fizzles.

[–] scruiser@awful.systems 6 points 7 months ago

“After GPT-3 failed to be it, they aimed at five iterations instead because that sounded like a nice number to give to investors, and GPT-3.5 and GPT-4o are very much responses to an inability to actually manifest that AGI on a VC-friendly timetable.”

That's actually more batshit than I thought! Like I thought Sam Altman knew the AGI thing was kind of bullshit and the hesitancy to stick a GPT-5 label on anything was because he was saving it for the next 10x scaling step up (obviously he didn't even get that far because GPT-5 is just a bunch of models shoved together with a router).

[–] scruiser@awful.systems 4 points 7 months ago* (last edited 7 months ago)
  1. Even if it was noticeably better, it couldn't have lived up to the hype: Scam Altman hyped GPT-5 endlessly, promising a PhD in your pocket and an AGI, and warning that he was scared of what he had created. And progress has kind of plateaued, so it isn't even really noticeably better: it scores a bit higher on some benchmarks, and they've patched some of the more meme'd tests (like counting the r's in strawberry... except it still can't count the r's in blueberry, so they've probably patched the more obvious flubs with loads of synthetic training data rather than inventing some novel technique that actually improves it all around). The other reason the promptfondlers hate it: for the addicts using it as a friend/therapist, it got a much drier, more professional tone, and for the people trying to use it for actual serious work, losing all the old models overnight was really disruptive.

  2. There are a couple of speculations as to why... one is that the GPT-5 variants are actually smaller than the previous generation's variants and they are really desperate to cut costs so they can start making a profit. Another is that they noticed their naming scheme was horrible (4o vs. o4) and confusing, and have overcompensated by trying to cut things down to as few models as possible.

  3. They've tried to simplify things by using a routing model that decides, on the user's behalf, which model actually handles each interaction... except they've apparently screwed that up (Ed Zitron thinks they've screwed it up badly enough that GPT-5 is actually less efficient, despite the goal of cost saving). Also, even if this technique worked, it would make ChatGPT even more inconsistent: some minor word choice could make the difference between getting the thinking model or not, and that in turn would drastically change the response (see the sketch after this list for the general shape of the problem).

  4. I've got no rational explanation lol. And now they've overcompensated by shoving a bunch of different models under the label GPT-5.
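To make the routing idea in point 3 concrete, here's a minimal sketch of what that kind of setup looks like. Everything in it is a guess for illustration: the model names, thresholds, and scoring heuristic are all made up, not anything OpenAI has published.

```python
# Hypothetical sketch of a model router (none of these model names,
# thresholds, or heuristics are OpenAI's; this is just the general shape).

def score_complexity(prompt: str) -> float:
    """Stand-in for a small learned classifier; here just a crude heuristic."""
    signals = ["prove", "step by step", "debug", "analyze"]
    hits = sum(s in prompt.lower() for s in signals)
    return min(1.0, len(prompt) / 2000 + 0.25 * hits)

def route(prompt: str) -> str:
    """Pick which backend model handles this interaction."""
    score = score_complexity(prompt)
    if score > 0.6:
        return "big-thinking-model"   # slow and expensive, better at reasoning
    if score > 0.3:
        return "mid-tier-model"
    return "small-cheap-model"        # fast and cheap, fine for chit-chat

# The inconsistency problem: a minor word choice flips the route.
print(route("what's a fun dinner idea?"))                        # small-cheap-model
print(route("analyze step by step: what's a fun dinner idea?"))  # mid-tier-model
```

Even in this toy version you can see the failure mode: the user can't tell which model they actually got, and trivial rewording changes the quality of the response.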

[–] scruiser@awful.systems 4 points 7 months ago* (last edited 7 months ago) (1 children)

There are techniques for caching some of the steps involved with LLMs. Like I think you can cache the tokenization and maybe some of the work the attention heads are doing if you have a static, known prompt? But I don't see why you couldn't just do that caching separately for each model your model router might direct things to. And if you have multiple prompts, you just keep a separate cache for each one? This creates a lot of memory overhead, but not excessively more computation... well, you do need to do the computation to generate each cache once. I don't find it implausible that OpenAI managed to screw all this up somehow, but I'm not quite sure the exact explanation Zitron has given fits together.

(The order of the prompts vs. user interactions does matter, especially for caching... but I think you could just cut and paste the user interactions to separate them from the old prompt and stick a new prompt on in whatever order works best? You would get wildly varying quality in the output as it switches between models and prompts, but this wouldn't add more computation...)
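For what I mean by caching: the usual trick is prefix (KV) caching, where the attention keys/values for a fixed prompt prefix get computed once and reused. Here's a toy sketch of keeping one cache per (model, prompt) pair; the "state" is a placeholder, not real KV tensors or any actual inference API:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # one cached entry per distinct (model, prompt) pair
def compute_prefix_state(model: str, system_prompt: str) -> tuple:
    # Stand-in for the expensive forward pass over the prefix tokens
    # (tokenization + attention work); imagine this returns the KV cache.
    return (model, hash(system_prompt))

def answer(model: str, system_prompt: str, user_msg: str) -> str:
    state = compute_prefix_state(model, system_prompt)  # cache hit after first call
    # ...decoding of user_msg would continue on top of the cached prefix state...
    return f"[{model}] reply to {user_msg!r}"

# Each (model, prompt) combination pays the prefix computation once; after
# that it's memory overhead, not repeated compute, per the point above.
answer("big-model", "You are a helpful assistant.", "hi")
answer("big-model", "You are a helpful assistant.", "hi again")  # reuses the cache
```

The catch is exactly the ordering issue in the parenthetical above: a prefix cache only helps if the prompt actually stays a prefix, so shuffling prompts and conversation history around can throw the cached work away.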

Zitron mentioned a scoop, so I hope/assume someone did some prompt hacking to get GPT-5 to spit out some of its behind-the-scenes prompts and he has solid proof of what he is saying. I wouldn't put anything past OpenAI, for certain.

[–] scruiser@awful.systems 6 points 7 months ago

If they got a lot of usage out of a model, this fixed cost would contribute little to the cost of each query in the long run... but considering they currently replace/retrain models every 6 months to a year, yeah, this cost should be factored in as well.

Also, training compute grows quadratically with model size, because it is the product of the amount of training data (which grows linearly with model size) and the model size itself.
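A back-of-envelope version of that, using the common C ≈ 6·N·D approximation for training FLOPs and assuming Chinchilla-style scaling where training tokens grow linearly with parameter count (the 20 tokens/parameter ratio is the usual rule of thumb, not anything specific to OpenAI):

```python
# Rough training-compute scaling: C ≈ 6 * N * D FLOPs, with training tokens D
# proportional to parameters N (Chinchilla-style, roughly 20 tokens per param).
def training_flops(n_params: float, tokens_per_param: float = 20.0) -> float:
    d_tokens = tokens_per_param * n_params  # data grows linearly with model size
    return 6 * n_params * d_tokens          # so compute grows quadratically

for n in (1e9, 1e10, 1e11):  # 1B, 10B, 100B parameters
    print(f"{n:.0e} params -> {training_flops(n):.1e} training FLOPs")
# 10x the parameters -> ~100x the training compute
```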

[–] scruiser@awful.systems 4 points 7 months ago

Even bigger picture... some standardized way of handling arbitrary combinations of letters and numbers that you could use across multiple languages. Like treating them as expressions?

[–] scruiser@awful.systems 6 points 7 months ago (4 children)

I know like half the facts I would need to estimate it... if you know the GPU VRAM required for the video generation, and how long it takes, then, assuming no queue latency, you could get a ballpark number by looking at Nvidia GPU specs on power usage. For instance, if a short clip of video generation needs 90 GB of VRAM, then maybe they are using an RTX 6000 Pro: https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/ . Take the amount of time it takes during off hours (which shouldn't have a queue time)... and you can guesstimate a number of watt-hours. If it takes 20 minutes to generate, then at 300-600 watts of power draw that would be 100-200 watt-hours. I can find an estimate of $0.33 per kWh (https://www.energysage.com/local-data/electricity-cost/ca/san-francisco-county/san-francisco/ ), so it would only be costing $0.03 to $0.06 in electricity.

IDK how much GPU time you actually need, though; I'm just wildly guessing. If they use many server-grade GPUs in parallel, that would multiply the cost up even if it only takes them minutes per video generation.
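Spelling out that arithmetic (the inputs, 300-600 W, 20 minutes, and $0.33/kWh, are the same wild guesses as above, so this is just the arithmetic, not measured data):

```python
# Ballpark electricity cost of one video generation, using the guesses above.
gen_minutes = 20          # guessed generation time for a short clip
price_per_kwh = 0.33      # San Francisco rate from the energysage link above

for watts in (300, 600):  # guessed draw of something like an RTX 6000 Pro
    kwh = watts * (gen_minutes / 60) / 1000  # watts * hours -> kWh
    print(f"{watts} W for {gen_minutes} min = {kwh:.2f} kWh -> ${kwh * price_per_kwh:.3f}")
# 300 W -> 0.10 kWh -> $0.033; 600 W -> 0.20 kWh -> $0.066
```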
