this post was submitted on 06 Feb 2025

Self-Hosted Alternatives to Popular Services

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/selfhosted by /u/yoracale on 2025-02-06 17:50:12+00:00.


Hey lovely people! Thanks for the love for our R1 Dynamic 1.58-bit GGUF last week! Starting today, you can train your own reasoning model on your own local device. You'll only need 7GB of VRAM to do it!

  1. R1's "aha" reasoning moment is recreated through an algorithm called GRPO, and we at Unsloth enhanced the entire process to make it use 80% less VRAM.
  2. We're not trying to replicate R1's accuracy, as that's unlikely. What we're trying to do is recreate R1's reasoning/thinking process, aka the "aha" moment.
  3. You can transform Llama 3.1 (8B), Phi-4 (14B) or any model up to 15B parameters (for 16GB VRAM) into a reasoning model.
  4. This is NOT fine-tuning the distilled R1 models or using distilled data from the R1 model. This is the actual process DeepSeek used to train R1.
  5. In a test example, even though we trained Phi-4 for only an hour, the results are already clear. The model without GRPO does not have the thinking process, whilst the one trained with GRPO does and also has the correct answer.
  • Unsloth allows you to reproduce R1-Zero's "aha" moment on 7GB VRAM locally or on Google Colab for free (15GB VRAM GPU).
  • Blog for more details + guide:
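
For anyone wondering what GRPO does with the points above, here is a minimal sketch in plain Python. It is not Unsloth's or DeepSeek's code. The idea: sample a group of completions for the same prompt, score each one with a reward function, then rate every completion relative to the rest of its group. The `<reasoning>`/`<answer>` tag format, the reward values, and the names `reward` and `group_relative_advantages` are all assumptions made up for illustration, and real trainers may normalise slightly differently (for example, sample vs. population standard deviation).

```python
import re
from statistics import mean, pstdev

# Hypothetical output format; the post does not specify one.
PATTERN = re.compile(
    r"<reasoning>.+?</reasoning>\s*<answer>(.+?)</answer>", re.DOTALL
)


def reward(completion: str, expected: str) -> float:
    """Toy reward: 1.0 for showing reasoning in the assumed format,
    plus 2.0 if the extracted answer matches the expected one."""
    match = PATTERN.search(completion)
    if match is None:
        return 0.0
    score = 1.0
    if match.group(1).strip() == expected:
        score += 2.0
    return score


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise rewards within one group: (r - mean) / (std + eps).
    Completions better than the group average get a positive advantage."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt ("What is 2 + 2?"), a group of four sampled completions.
completions = [
    "<reasoning>2 + 2 is 4</reasoning><answer>4</answer>",
    "<reasoning>2 + 2 is 5</reasoning><answer>5</answer>",
    "4",
    "<reasoning>double 2</reasoning> <answer> 4 </answer>",
]
rewards = [reward(c, expected="4") for c in completions]
advantages = group_relative_advantages(rewards)

for c, r, a in zip(completions, rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.2f} {c!r}")
```

During training, completions with a positive advantage are made more likely and those with a negative one less likely. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.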

To use locally, install Unsloth by following the blog's instructions then copy + run our notebook from Colab. Installation instructions are here.

I know some of you guys don't have GPUs (we're trying to make CPU training work), but worry not, you can do it for free on Colab/Kaggle using their free 16GB GPUs.

Our notebook + guide to use GRPO with Phi-4 (14B): https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi/4/(14B)-GRPO.ipynb

Happy local training! :)
