Keep that n8n updated. There have been several high- and critical-severity CVEs recently, and I'm betting more to come.
I only use an N100 mini PC with 16 GB of RAM for self-hosting. Are there models nimble enough to run without melting the PC?
https://whatmodelscanirun.com/
Tells you, well, which models you can run and what performance to expect.
Free bullshit generator
No, not free; OP's power bill just climbed behind the scenes to match. Probably a discount, but definitely not free.
Unless OP is running a data center, there's not really much of a power increase to run a local Ollama.
Running a thousand watts versus not running a thousand watts can be quite a difference depending on where you live. And then consider buying all of the hardware. In many cases it's probably cheaper to just pay $40 a month.
Do you think it runs at 1000 W continuously? On any decent GPU, the responses are nearly instantaneous, maybe a few seconds of runtime at max GPU consumption.
Compare that to playing a few hours of cyberpunk 2077 with raytracing and maxed out settings at 4k.
Don't get me wrong, there's a lot to hate about AI/LLMs, but running one locally without data harvesting engines is pretty minimal. The creation of the larger models is where the consumption primarily comes in, and then the data centers that run them are servicing millions of inquiries a minute making the concentration of consumption at a single point significantly higher (plus they retrain the model there on current and user-fed data, including prompts, whereas your computer hosting ollama would not.)
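To put that comparison in rough numbers: every figure below is an assumption (a ~450 W peak GPU draw, $0.30/kWh electricity, 50 short queries a day at ~5 s of full load each), not a measurement, but the orders of magnitude are the point.

```python
# Back-of-envelope energy comparison: daily local LLM inference vs one gaming
# session. All figures are illustrative assumptions, not measurements.
GPU_WATTS = 450          # assumed peak draw of a high-end consumer GPU
KWH_PRICE = 0.30         # assumed electricity price in $/kWh

def kwh(watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a given draw and duration."""
    return watts * hours / 1000

# 50 short LLM queries a day, ~5 seconds of full GPU load each
inference_hours = 50 * 5 / 3600
inference_kwh = kwh(GPU_WATTS, inference_hours)

# One 3-hour gaming session at the same draw
gaming_kwh = kwh(GPU_WATTS, 3)

print(f"daily inference: {inference_kwh:.3f} kWh (${inference_kwh * KWH_PRICE:.3f})")
print(f"one gaming session: {gaming_kwh:.2f} kWh (${gaming_kwh * KWH_PRICE:.2f})")
```

Under these assumptions a whole day of bursty inference costs roughly a penny, while a single long gaming session uses over 40 times the energy.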
What whack-ass setup do you think OP has? Dual 5090s? They're running it on an i7.
It's also an 8 gigaparameter model. That's pretty tiny, even if they use it heaps.
That would be true worst case, but you're never running inference 24/7. It's no crazier than gaming in that regard.
Well, it's winter, so any power he uses, he gets back as heat.
While this is correct, sometimes it can be free. I live in a cold climate, and over the winter I hooked up a Folding@home computer in my office to keep things a bit warmer. Computers are effectively 100% efficient as space heaters: every watt they draw ends up as heat.
Of course now that it's getting warm things are changing. I'm actually in the middle of doing my last folding@home tasks until the temps drop next fall.
Even in very cold regions, heat pumps maintain a COP > 1, so running a computer as a space heater may not actually be free if you have access to a more efficient alternative. That may be a responsible justification for Folding@home, but I doubt OP is turning off their LLM in the summer.
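The COP point in rough numbers: a resistive load (computer or space heater) delivers exactly 1 kWh of heat per kWh of electricity, while a heat pump moves more heat than it consumes. The COP and price below are illustrative assumptions.

```python
# Cost to deliver 1 kWh of heat: resistive heating (a computer counts as one)
# vs a heat pump. COP and electricity price are illustrative assumptions.
KWH_PRICE = 0.30   # $/kWh, assumed
COP = 2.5          # assumed heat-pump coefficient of performance in cold weather

heat_needed_kwh = 1.0
resistive_cost = heat_needed_kwh * KWH_PRICE         # 1 kWh electricity -> 1 kWh heat
heat_pump_cost = heat_needed_kwh / COP * KWH_PRICE   # moves heat, uses less electricity

print(f"resistive: ${resistive_cost:.2f} per kWh of heat")
print(f"heat pump: ${heat_pump_cost:.2f} per kWh of heat")
```

So the heat from the computer isn't free; it displaces heating that a heat pump could have provided at a fraction of the electricity.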
I hate that LLMs are called "AI", but they do have some uses if trained on the right data set (rather than pirating all the data on the internet and making the LLM treat it as valid). I have been wanting to set one up for my Home Assistant voice control so that it can better understand my speech, and for better image component recognition for tagging in Immich.
I wish they would force the companies to release their training data sets, considering they are getting a lot of it illegally (not that I'm a big copyright fan, but it's crappy that copyright applies to individuals and small businesses but not to rich, corporate-backed companies; attribution, and copyleft if the creator wants it, is something I agree with strongly). If we could get the data sets, pick and choose what portions to include, and then train our own LLMs, it would be better. It's why scientific LLMs actually are useful: they are trained primarily on peer-reviewed scientific data, not 4chan and Reddit craziness, or SciFi and parody works treated as fact. No wonder it hallucinates.
Bullshit in, bullshit out, to paraphrase. If you teach a toddler 4chan propaganda, SciFi, parodies, and hate speech as fact, rather than giving it all context, they turn out to be the people who post that nonsense. But the people funding it want quick results with no effort, and that's what they get: a poorly educated child randomly spouting nonsense. LOL
As much as I rail against regulation, or more so over-regulation, AI needs some heavy regulation. We stand at the crossroads of a very useful tool that is unfortunately hung up in the novelty stage of pretty pictures and AI rice cookers. It could be so much more. I use AI in a few things. For one, I use AI to master the music I create: I am clinically deaf, so there are frequencies I just can't hear well enough to make a call, and AI handles that quite well. I use AI to solve small programming issues I'm working on, but I wouldn't dare release anything I've done, AI or not, because I can always picture some poor chap using my 'code' while smoke billows out of his computer. It's also pretty damn good at compose files. I've read about medical uses that sound very efficient at ingesting tons of patient records and reports and pinpointing where services could do better, so that people don't fall through the cracks and get the treatment they need. So it has great potential, if we could just get some regulation and move past this novelty stage.
Stick to Mistral, who are EU based.
Any quality difference?
IMO there's a significant drop-off with local LLMs vs the mainstream ones. This can be mitigated somewhat, though, by using web search tools or retrieval-augmented generation.
Basically the local models don’t (and can’t) contain the full knowledge of the universe.
BUT they can call tools pretty well, and if you give the harness the capability to search Wikipedia, for example, it becomes a lot smarter.
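The retrieval idea can be sketched with a toy example. Real setups use embeddings and a vector store; the word-overlap scoring here is a deliberately naive stand-in just to show the flow of fetching context and prepending it to the prompt.

```python
# Toy retrieval-augmented generation sketch: pick the most relevant snippet
# by word overlap and prepend it to the prompt before it goes to the model.
def score(query: str, doc: str) -> int:
    """Naive relevance: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def augment(query: str, docs: list[str]) -> str:
    """Prepend the best-matching document as context for the LLM."""
    best = max(docs, key=lambda d: score(query, d))
    return f"Context: {best}\n\nQuestion: {query}"

docs = [
    "Jellyfin is a self-hosted media server.",
    "Ollama runs large language models locally.",
]
prompt = augment("what runs language models locally", docs)
print(prompt)
```

The augmented prompt then goes to the local model, which answers from the supplied context instead of relying on knowledge baked into its weights.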
It depends on what OP was using before, but going from something like GPT5.2 to LLama 3 8B will be a massive difference. (Although OP says they use it only for basic tasks, so that does offset it.)
LLama 3 already being a very old model doesn't help either
I run Qwen3.5-35B-A3B-AWQ-4bit which while leagues ahead of LLama 3 8B still is a very noticeable difference.
This is not to say open source is bad, if one had the resources to run something like Qwen3.5-397B-A17B it would also be up there.
What kind of hardware do you need to run those models?
I'm running 2x4090; the 35B fits very comfortably in that.
For large models like the 397B, there are several options that don't cost a ton of money; I've seen posts of people using arrays of used 3090s with good results.
The other option is CPU inference, although with current RAM prices that is less cost-effective.
I was looking at maybe an array of Milk-V JUPITER2 boards, since vLLM added RISC-V support, which could be very cost-effective.
Depends on how much quantization, but still fairly beefy; I couldn't run it on my homelab with a 3080 Ti, for example.
I generally use smaller 8-12b models and they're alright depending on the task.
In general, you take the model size in billions of parameters (eg: 397B), divide it by 2 and add a bit for overhead, and that’s how much RAM/VRAM it takes to run it at a “normal” quantization level. For Qwen3.5-397B, that’s about 220 GB. Ideally that would be all VRAM for speed, but you can offload some or all of that to normal RAM on the CPU, you’ll just take a speed hit.
So for something like Qwen3.5-397B, it takes a pretty serious system, especially if you’re trying to do it all in VRAM.
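That rule of thumb can be written down as a tiny helper. The 10% overhead figure is an assumption on my part; actual KV-cache and activation overhead depends on context length and serving stack.

```python
# Rule of thumb from the comment above: at a "normal" (~4-bit) quantization,
# a model needs roughly params_in_billions / 2 GB, plus some overhead for
# KV cache and activations (the 10% here is an assumed ballpark).
def vram_estimate_gb(params_billions: float, overhead_frac: float = 0.1) -> float:
    base = params_billions / 2          # ~0.5 bytes per parameter at 4-bit
    return base * (1 + overhead_frac)

for size in (8, 35, 397):
    print(f"{size}B -> ~{vram_estimate_gb(size):.0f} GB")
```

This reproduces the numbers in the thread: an 8B model fits in a few GB, a 35B fits across 2x 24 GB cards, and a 397B lands around the quoted ~220 GB.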
What's the model name to pull?
Qwen3.5 and Gemma4 are the best ones for tool calling that don’t need massive amounts of memory
Probably use Gemma4 if your machine has the chops for it.
You could probably get away with using gemma3:4b or phi3.5.
I only ever use my local AI for the Home Assistant voice assistant on my phone, but it's more of a gimmick/party trick, since I only have temperature sensors currently (I only got into HA recently) and it can't access Wi-Fi, so it's just quietly sitting unloaded on my TrueNAS server.
Running any LLM on TrueNAS is not awesome. I've tried it with GPU passthrough and it's just too much overhead. I may just burn all my stuff down and restart with Proxmox, running TrueNAS CORE inside just for NAS duty. The idea of converged NAS + virtualization is wonderful, but it's just not there.
The host networking model alone is such a pain, and then you get into the performance issues. I still like TrueNAS a lot, but I think Proxmox is probably still the better platform.