Hey lovely people! Thanks for the love for our R1 Dynamic 1.58-bit GGUF last week! Today, you can train your own reasoning model on your own local device, and you'll only need 7GB of VRAM to do it!
- R1's "aha" reasoning moment is recreated through an algorithm called GRPO (Group Relative Policy Optimization), and we at Unsloth enhanced the entire process to make it use 80% less VRAM (there's a toy sketch of the GRPO idea right after this list).
- We're not trying to replicate R1's accuracy, as that's unlikely; what we're trying to do is recreate R1's reasoning/thinking process, aka the "aha" moment.
- You can transform Llama 3.1 (8B), Phi-4 (14B), or any model up to 15B parameters (if you have 16GB of VRAM) into a reasoning model.
- This is NOT fine-tuning the distilled R1 models or using distilled data from R1. It's the actual process DeepSeek used to train R1.
- In a test example, even though we trained Phi-4 for only an hour, the results are already clear: the model without GRPO lacks the thinking process, whilst the one trained with GRPO shows it and also reaches the correct answer.
- Unsloth allows you to reproduce R1-Zero's "aha" moment on 7GB VRAM locally or on Google Colab for free (15GB VRAM GPU).
- Blog for more details + guide:
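For intuition: GRPO samples a group of completions per prompt, scores each one with a reward function, then normalizes the rewards within the group to get advantages, so no separate critic/value model is needed. Here's a toy sketch of that core idea; it's purely illustrative, not Unsloth's actual implementation:

```python
# Toy illustration of GRPO's core trick: group-relative advantages.
# Purely illustrative; this is not Unsloth's implementation.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    # Normalize each completion's reward against its own group's stats.
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Rewards for four completions sampled from one prompt:
print(group_relative_advantages([1.0, 0.0, 0.5, 0.0]))
# Completions above the group mean get a positive advantage (reinforced);
# those below get a negative one (discouraged).
```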
To use it locally, install Unsloth by following the blog's instructions, then copy + run our notebook from Colab. Installation instructions are here.
I know some of you don't have GPUs (we're working on making CPU training possible), but worry not: you can do it for free on Colab/Kaggle using their free 16GB GPUs.
Our notebook + guide to use GRPO with Phi-4 (14B): https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi/4/(14B)-GRPO.ipynb
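If you want a feel for what the notebook does before opening it, here's a minimal sketch of a GRPO run with Unsloth + TRL. The model name, LoRA settings, dataset, and reward function below are illustrative assumptions, not the notebook's exact code:

```python
# Minimal sketch of a GRPO run with Unsloth + TRL.
# Assumes `pip install unsloth trl datasets`; all specific choices
# (model, LoRA config, dataset, reward) are illustrative.
from unsloth import FastLanguageModel  # import before trl so Unsloth can patch
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load a 4-bit quantized base model; quantization + LoRA is what
# keeps the VRAM requirement so low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4",  # assumption: swap in any <=15B model
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights train.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# GRPO needs at least one reward function. Toy example: reward
# completions that show their reasoning in <think>...</think> tags.
def format_reward(completions, **kwargs):
    return [1.0 if "<think>" in c and "</think>" in c else 0.0
            for c in completions]

# GRPOTrainer expects a "prompt" column; GSM8K here is just an example.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    args=GRPOConfig(
        output_dir="grpo-out",
        num_generations=4,         # completions sampled per prompt
        max_completion_length=512,
    ),
    train_dataset=dataset,
)
trainer.train()
```

Real runs typically stack a few reward functions (formatting plus answer correctness), which is the kind of setup the notebook walks through.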
Happy local training! :)