foolsh_one

joined 2 years ago
[–] foolsh_one@sh.itjust.works 2 points 3 months ago

Waaa, sounds like he got dumped and tried to tell his friends that he broke up with them. Dollars to donuts this whole thing's a PR move... but meh, everything that washes up from his direction is.

[–] foolsh_one@sh.itjust.works 1 points 2 years ago

Also, you're asking about multi-GPU: I have a few other cards stuffed in my backplane. The GeForce GTX 1050 Ti has 4GB of VRAM and is comparable to the P40 in performance. I have split a larger 33B model across the two cards. Splitting a large model is of course slower than running on one card alone, but is much faster than CPU (even with 48 threads). However, speed when splitting depends on the speed of the PCIe bus, which for me is limited to gen 1 speeds for now. If you have a faster/newer PCIe standard then you'll see better results than me.
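For anyone wondering what the split looks like in practice, here's a rough sketch of a llama.cpp invocation. The model path and layer count are placeholders, not what I actually ran, and the 24,4 ratio just mirrors the two cards' VRAM sizes; check the flags against your own build:

```shell
# Hypothetical model path. --tensor-split divides the model across GPUs
# roughly in the given proportions (24 GB P40 : 4 GB 1050 Ti), and
# --n-gpu-layers controls how many layers are offloaded at all.
./main -m ./models/example-33b.Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --tensor-split 24,4 \
  --threads 48
```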

[–] foolsh_one@sh.itjust.works 1 points 2 years ago* (last edited 2 years ago)

Correct, my backplane doesn't have the airflow of a big server box. Another gotcha is that the P40 uses an 8-pin CPU power plug, not an 8-pin GPU plug.

Edit: 8-pin, not 6-pin

[–] foolsh_one@sh.itjust.works 1 points 2 years ago* (last edited 2 years ago) (3 children)

The P40 doesn't have active cooling; it really needs forced airflow, so I grabbed one of these for it:

https://www.ebay.com/itm/285241802202

It's even cheaper now than when I bought mine.

[–] foolsh_one@sh.itjust.works 2 points 2 years ago (5 children)

I have a P40 I'd be glad to run a benchmark on, just tell me how. I have Ooba and llama.cpp installed on Ubuntu 22.04. It's a Dell R620 with 2 x 12 Xeon cores at 3.5 GHz (2 threads per core, for 48 threads) and 256GB of RAM @ 1833 MHz, and I have a 20-slot PCIe gen 1 backplane. The speed of the PCIe bus might impact the loading time of the large models, but seems not to affect the speed of inference.
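For what it's worth, llama.cpp ships a `llama-bench` tool, so a run could look something like this (the model path is just an example, not a file from this thread):

```shell
# llama-bench ships with llama.cpp: -p/-n set prompt and generation
# token counts, -ngl offloads layers to the GPU, and it prints
# tokens/sec for each phase. Model path is a placeholder.
./llama-bench -m ./models/example-13b.Q4_K_M.gguf -p 512 -n 128 -ngl 99
```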

I went for the P40 for cost per GB of VRAM; speed was less important to me than being able to load the larger models at all. Including the fan and fan coupling, I'm all in at about $250 per card. I'm planning on adding more in the future; I too suffer from too many PCIe slots.

I don't think the CUDA version will become an issue anytime too soon, but that day is coming, to be sure.
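A quick way to see where a card stands is to compare its compute capability against what the current CUDA toolkit still supports (the P40 is Pascal, compute capability 6.1). A sketch, assuming a reasonably recent driver, since the `compute_cap` query field isn't in older ones:

```shell
# Print each GPU's name and compute capability (needs a recent driver),
# then the installed CUDA toolkit version, if nvcc is present.
nvidia-smi --query-gpu=name,compute_cap --format=csv
nvcc --version
```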