This is an automated archive made by the Lemmit Bot.
The original was posted on /r/selfhosted by /u/schaka on 2025-03-09 10:30:31+00:00.
Please excuse the generic-sounding name. I guess there's a reason I'm not working in a creative job.
A couple of weeks ago, I took a look at what GPUs I could buy for a new pet project — something that could drop to 10-20W at idle even with VRAM fully loaded. An RTX 3060 seemed like it might not have enough VRAM to keep a few models loaded; a Tesla P100 also costs around 250€; and the P40 has shot up to nearly 500€ here, and those cards seem to have issues with idle power draw, or at least appear to require workarounds.
Documentation on this entire topic seemed limited, and I, like most people, don't have 500€ to spend on something I wasn't even sure would work. So I started looking at the Radeon Instinct MI50 and how feasible it might be to use ROCm.
It seems ROCm has come a long way in the past 2-3 years, and the big inference "engines" all support it. You sometimes still have to build them yourself, but from my research, nothing seems to be holding AMD cards back anymore except the lack of tensor cores.
So I bought the card and got the software running. Now I'm trying to figure out how best to integrate it into Home Assistant and get the most out of it. It seems like there are a few ways to go about it and no well-documented best practice.
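For context, one common route people seem to take is running Ollama on the GPU box and pointing Home Assistant's Ollama conversation integration at it. A minimal sketch of what that could look like with the ROCm image — the volume name and port are just the upstream defaults, not anything specific to my setup:

```yaml
# docker-compose.yml — sketch of Ollama with ROCm support.
# Assumes the host has ROCm-capable drivers and the card shows up
# under /dev/kfd and /dev/dri (true for an MI50, as far as I can tell).
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd        # ROCm compute interface
      - /dev/dri        # GPU render nodes
    volumes:
      - ollama:/root/.ollama   # downloaded models live here
    ports:
      - "11434:11434"          # API endpoint HA's Ollama integration talks to
    restart: unless-stopped

volumes:
  ollama:
```

With something like this running, Home Assistant's Ollama integration would be configured with `http://<host>:11434` as the server URL. This is just a sketch of one option, though — I'd be curious whether people prefer this over other backends.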
So I'm curious — how are you doing it, if at all? Which Conversation Agent are you using? Which models have you had success with?