Hey guys! A few days ago, DeepSeek released V3-0324, which is now the world's most powerful non-reasoning model (open-source or not), beating GPT-4.5 and Claude 3.7 Sonnet on nearly all benchmarks.
- But the model is a giant. So we at Unsloth shrank the 720GB model down to 200GB (75% smaller) by selectively quantizing layers for the best performance, which means you can now try running it locally!
- Minimum requirements: a CPU with 80GB of RAM and 200GB of disk space (to hold the downloaded model weights). Technically the model can run with any amount of RAM, but it'll be too slow.
- We tested our versions on popular tests, including one that asks the model to create a physics engine simulating balls bouncing inside a spinning, moving heptagon. Our 75%-smaller quant (2.71-bit) passes all the code tests, producing nearly identical results to full 8-bit. Compare our dynamic 2.71-bit quant vs. a standard 2-bit quant (which completely fails) vs. the full 8-bit model that runs on DeepSeek's website.
- We studied V3's architecture, then selectively quantized layers to 1.78-bit, 4-bit, etc., which vastly outperforms basic uniform quantization at minimal extra compute cost. You can read our full guide on how to run it locally, with more examples, here:
- E.g. if you have an RTX 4090 (24GB VRAM), running V3 will give you at least 2-3 tokens/second by offloading layers to the GPU (see the llama-cpp-python sketch after this list). Optimal requirements: RAM + VRAM totaling 160GB+ (this will be decently fast).
- We also uploaded smaller 1.78-bit etc. quants, but for best results use our 2.44-bit or 2.71-bit quants (a download sketch follows this list). All V3 uploads are at:
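For example, to grab just the 2.71-bit files instead of every quant variant, a minimal download sketch with huggingface_hub looks roughly like this. The repo id and filename pattern below reflect our usual GGUF naming, but treat them as placeholders and double-check them on the model page:

```python
from huggingface_hub import snapshot_download

# Download only the shards for one quant variant instead of the full repo.
# Repo id and file pattern assume our usual naming scheme -- verify them on
# the Hugging Face model page before running.
snapshot_download(
    repo_id="unsloth/DeepSeek-V3-0324-GGUF",
    local_dir="DeepSeek-V3-0324-GGUF",
    allow_patterns=["*UD-Q2_K_XL*"],  # the dynamic 2.71-bit quant
)
```

allow_patterns is what keeps this at ~200GB of disk instead of pulling every quant in the repo.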
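Once downloaded, here's a rough sketch of running it with the llama-cpp-python bindings and offloading some layers to the GPU. The shard filename and the n_gpu_layers value are illustrative placeholders; raise n_gpu_layers until your VRAM is full for the best speed:

```python
from llama_cpp import Llama

# Point at the first shard of the multi-part GGUF; llama.cpp loads the rest.
# The exact filename is a placeholder -- match it to what you downloaded.
llm = Llama(
    model_path="DeepSeek-V3-0324-GGUF/DeepSeek-V3-0324-UD-Q2_K_XL-00001-of-00006.gguf",
    n_gpu_layers=4,  # illustrative; tune to however many layers fit in VRAM
    n_ctx=4096,      # context window; larger values need more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Simulate balls bouncing in a spinning heptagon."}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```

Everything that doesn't fit on the GPU stays in system RAM, which is why the RAM + VRAM total above matters more than either number alone.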
Happy running and let me know if you have any questions! :)