i did my first machine learning course more than 10 years ago, so i'm not ashamed to admit that i bought beefier hardware to play around with local models in early 2023. i still like doing that. mostly because i know my gpu is powered entirely off of fossil-free energy and because i decided early on not to spew the output all over the internet unless it was poignant. or funny. not as in "the llm told a good joke", more as in "i compressed this poor thing to fit on a cd and now it can only talk about dolphins".
qwen3.5-12B really screams along on a 7900 xtx. like, somewhere in the 70-100 tokens a second range. perfect for quickly seeing the results of your torture methods.
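if you want to see how i get numbers like that, here's roughly the kind of thing i run. a minimal python sketch using llama-cpp-python (assuming it's built against rocm so the 7900 xtx actually gets used) - the gguf path and the prompt are just placeholders, swap in whatever quant you've mangled most recently:

```python
import time
from llama_cpp import Llama

# hypothetical path to a quantized gguf file of whatever model you're poking at
llm = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,   # offload every layer to the gpu
    n_ctx=4096,
    verbose=False,
)

prompt = "tell me something about dolphins."

start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.0f} tok/s")
print(out["choices"][0]["text"])
```

nothing fancy, just generation time divided into completion tokens, which is enough to tell whether your latest compression experiment made the thing faster, slower, or just weirder.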