this post was submitted on 23 Jul 2025
24 points (96.2% liked)

homeassistant


What is everyone using as the LLM for HA voice when self-hosting Ollama? I've tried Llama and Qwen, with varying degrees of success in getting them to understand my commands. I'm currently on Llama as it seems a little better. I just wanted to see if anyone has found a better model.

Edit: as pointed out, this is more of a speech-to-text issue than an LLM issue. I'm looking into alternatives to Whisper.

top 17 comments

None. They're pretty awful for this purpose. I'm working out a build for something a bit different for voice commands that I may release in the next couple of weeks.

[–] kaaskop@feddit.nl 3 points 1 week ago (1 children)

I used the Llama 3.2 3B model for a while. It ran okay-ish on a laptop with a GTX 1050 (about 10 seconds to 2 minutes of response time). I've personally opted to go without Ollama for now, though, as the automations and built-in functions in the Voice Preview Edition are more than enough for me at the moment, especially with recent updates.

[–] smashing3606@feddit.online 2 points 1 week ago (1 children)

I don't think I've tried it without Ollama; I may give that a shot.

[–] kaaskop@feddit.nl 2 points 1 week ago

If you do, you should use speech-to-phrase instead of Whisper. I've found it to be more reliable if you're using it with automation commands instead of an LLM. In your automations you can set up phrases as a trigger. It has support for aliases as well, and I believe it also supports templating for sentences.
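For reference, a minimal sketch of what that looks like, using Home Assistant's conversation trigger (the entity ID and phrases here are made up; square brackets mark optional words in HA's sentence syntax):

```yaml
automation:
  - alias: "Voice: kitchen lights on"
    trigger:
      - platform: conversation
        command:
          # Either phrasing fires this automation; "[the]" is optional.
          - "turn on [the] kitchen lights"
          - "kitchen lights on"
    action:
      - service: light.turn_on
        target:
          entity_id: light.kitchen
```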

[–] Rhaedas@fedia.io 1 points 1 week ago (1 children)

I don't use HA so I'm not familiar with the details of what's out there, but where are you having problems? Is voice recognition fine and the model just isn't always following directions precisely? If not, what are you using: Whisper or something else? (I'm searching myself for a better local voice-to-text.) Surely by now there are local models fine-tuned for HA, which would work better than a general-purpose one that could drift or misunderstand common commands.

[–] smashing3606@feddit.online 1 points 1 week ago* (last edited 1 week ago) (3 children)

The issue is mainly voice recognition. Even when I pronounce things clearly, it thinks I've said something else.
I'm using Whisper in HA.

[–] chaospatterns@lemmy.world 2 points 1 week ago* (last edited 1 week ago) (1 children)

That's not going to be fixed with a different LLM, though. I'm experiencing similar problems. If my STT is bad, the LLM just gets even more confused, or it requires a big model that doesn't run efficiently on my local GPU. It also won't trigger my custom automations, because the tools don't consider custom automation phrases.

Speech-to-phrase improves accuracy for basic utterances like "turn on X", or anything specified in an automation, but struggles with other speech.

My next project is to implement a router that forwards the utterance to both speech-to-phrase and Whisper and tries to estimate which is correct.
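Roughly this shape (the HTTP endpoints and JSON fields below are hypothetical stand-ins; a real setup would speak the Wyoming protocol to each engine, but the routing heuristic is the point). Since speech-to-phrase can only emit phrases from its fixed grammar, a clean match from it is a strong signal; anything else falls back to Whisper:

```python
import requests

# Hypothetical endpoints and response shape, purely for illustration.
SPEECH_TO_PHRASE_URL = "http://stt-phrase.local:10300/transcribe"
WHISPER_URL = "http://stt-whisper.local:10301/transcribe"

# Phrases speech-to-phrase was trained on (made up for this sketch).
KNOWN_PHRASES = {
    "turn on the kitchen lights",
    "turn off the bedroom lamp",
}

def transcribe(url: str, wav_bytes: bytes) -> str:
    response = requests.post(
        url, data=wav_bytes, headers={"Content-Type": "audio/wav"}, timeout=10
    )
    response.raise_for_status()
    return response.json().get("text", "").strip().lower()

def route(wav_bytes: bytes) -> str:
    # speech-to-phrase only ever emits phrases from its fixed grammar,
    # so a match from it strongly suggests it heard correctly.
    phrase = transcribe(SPEECH_TO_PHRASE_URL, wav_bytes)
    if phrase in KNOWN_PHRASES:
        return phrase
    # Otherwise fall back to Whisper's open-vocabulary transcription.
    return transcribe(WHISPER_URL, wav_bytes)
```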

[–] smashing3606@feddit.online 3 points 1 week ago* (last edited 1 week ago)

I was not aware of the alternatives to Whisper. I will check them out.

[–] Rhaedas@fedia.io 2 points 1 week ago

The model isn't going to help there, then. I've been messing with some of the Whisper variants like faster-whisper, and I also tried an older one called nerd-dictation, but I haven't yet found one that doesn't let garbage creep in from time to time. And of course you have to make sure the audio the voice recognition is getting is clean of noise and at a good level, which is tough to troubleshoot. The advantage is that an LLM might be able to pick through the crap and figure out what you really want, if there are enough trigger words there. I even had an uncensored one call me out on a typo once, which I thought was hilarious. But getting 100% accuracy when so many stages can introduce errors is a challenge. That's why I suggested finding or making (!) a fine-tuned version that limits what it responds to, as another filter to catch the problems. Ironically, the dumber things work better by simply doing nothing when the process breaks.
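If anyone wants to kick the tires on faster-whisper itself, a minimal sketch (model size, device, and the VAD setting are just starting-point assumptions):

```python
from faster_whisper import WhisperModel

# int8 on a small English model keeps VRAM use modest; scale up if accuracy lags.
model = WhisperModel("small.en", device="cuda", compute_type="int8")

# vad_filter trims leading/trailing silence, so background noise is less
# likely to be transcribed as stray words.
segments, info = model.transcribe("command.wav", beam_size=5, vad_filter=True)
print(" ".join(segment.text.strip() for segment in segments))
```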

The same thing applied when I used VoiceAttack on the Windows side. It wasn't that VA or Windows was better at picking up a voice command; it was a matter of setting the match probability for a command low enough to catch a partial hit while keeping it high enough to weed out the junk. That's probably the goal here too, but that gets into the coding of the voice recognition models, and I'm not good enough to go that deep.
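Though the thresholding doesn't have to live inside the recognition model; here's a rough post-processing sketch of the same idea (the command list and threshold are invented for illustration):

```python
import difflib

# Hypothetical command list; in practice this would come from HA's exposed entities.
COMMANDS = [
    "turn on the kitchen lights",
    "turn off the bedroom lamp",
    "lock the front door",
]

def match_command(transcript: str, threshold: float = 0.75) -> str | None:
    """Fuzzy-match a noisy transcript to the closest known command.

    Returns None when nothing clears the threshold, i.e. the "do nothing
    when the process breaks" behavior.
    """
    best, best_score = None, 0.0
    for command in COMMANDS:
        score = difflib.SequenceMatcher(None, transcript.lower(), command).ratio()
        if score > best_score:
            best, best_score = command, score
    return best if best_score >= threshold else None

print(match_command("turn on the kitten lights"))  # close enough -> kitchen lights
print(match_command("play some jazz"))             # below threshold -> None
```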

[–] spitfire@lemmy.world 1 points 1 week ago

Then you need to fix your STT, not the system that interprets the transcribed words.

[–] doodlebob@lemmy.world 1 points 1 week ago (2 children)

The Gemma 27B model has been solid for me. I'm using Chatterbox for TTS as well.

[–] smashing3606@feddit.online 2 points 1 week ago

Thanks, I'll give Gemma a shot.

[–] spitfire@lemmy.world 1 points 1 week ago (1 children)

27b - how much VRAM does it use?

[–] doodlebob@lemmy.world 1 points 1 week ago (1 children)
[–] spitfire@lemmy.world 2 points 1 week ago (1 children)

So basically it's for people who have graphics cards with 24GB of VRAM (or more). While I do, that's probably something most people don't have ;)
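Rough math, assuming a 4-bit quant: 27B parameters × ~0.5 bytes/parameter ≈ 13.5GB for the weights alone, and KV cache plus context overhead push that toward 16-20GB, which is why it lands in 24GB-card territory.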

[–] doodlebob@lemmy.world 2 points 1 week ago (1 children)

Yeah, I went a little crazy with it and built out a server just for AI/ML stuff 😬

[–] spitfire@lemmy.world 1 points 1 week ago

I could probably run something on my gaming PC with a 3090, but that would be a big cost. Instead I've put my old 2070 in an existing server, and I'm using it for more lightweight stuff (TTS, Obico, Frigate, Ollama with some small model).