this post was submitted on 12 Apr 2026
81 points (78.3% liked)

Selfhosted


Quick post about a change I made that's worked out well.

I was using the OpenAI API for automations in n8n: email summaries, content drafts, that kind of thing. I was spending ~$40/month.

Switched everything to Ollama running locally. The migration was pretty straightforward since n8n just hits an HTTP endpoint. Changed the URL from api.openai.com to localhost:11434 and updated the request format.

For most tasks (summarization, classification, drafting) the local models are good enough. Complex reasoning is worse but I don't need that for automation workflows.

Hardware: an i7 with 16 GB RAM, running Llama 3 8B. Plenty fast for async tasks.
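For anyone replicating this, the "updated the request format" step can be sketched roughly as below (Python; the model name and option mapping are assumptions for illustration — adjust to whatever you actually pulled with `ollama pull`). Note that Ollama also exposes an OpenAI-compatible endpoint under `/v1`, which can make this translation unnecessary.

```python
# Sketch: mapping an OpenAI-style chat completion payload onto Ollama's
# native /api/chat schema. Sampling parameters move under "options".

def openai_to_ollama(payload: dict, model: str = "llama3") -> dict:
    """Translate an OpenAI chat payload to Ollama's /api/chat format."""
    out = {
        "model": model,
        "messages": payload["messages"],  # same role/content shape
        "stream": False,  # automations usually want one blob back, not a stream
    }
    # Sampling params live under "options" in Ollama's native API.
    options = {}
    if "temperature" in payload:
        options["temperature"] = payload["temperature"]
    if "max_tokens" in payload:
        options["num_predict"] = payload["max_tokens"]
    if options:
        out["options"] = options
    return out


request = openai_to_ollama(
    {"model": "gpt-4o-mini",
     "messages": [{"role": "user", "content": "Summarize this email: ..."}],
     "temperature": 0.2},
)
# POST this as JSON to http://localhost:11434/api/chat
```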

all 34 comments

Keep that n8n updated. There have been several high- and critical-severity CVEs recently, and I'm betting more are to come.

[–] Leax@lemmy.dbzer0.com 3 points 6 days ago (1 children)

I only use an N100 mini PC with 16 GB of RAM for self-hosting; are there models nimble enough to run without melting the PC?

[–] kossa@feddit.org 2 points 5 days ago* (last edited 5 days ago)

https://whatmodelscanirun.com/

Tells you, well, which models you can run and what performance to expect.

[–] 0ndead 52 points 1 week ago (2 children)
[–] Ludicrous0251@piefed.zip 17 points 1 week ago (3 children)

No, not free: OP's power bill just climbed behind the scenes to match. Probably a discount, but definitely not free.

[–] webkitten@piefed.social 21 points 1 week ago (1 children)

Unless OP is running a data center, there's not really much of a power increase from running Ollama locally.

[–] doodledup@lemmy.world -4 points 6 days ago (3 children)

Running a thousand watts versus not running a thousand watts can be quite a difference depending on where you live. And then consider buying all of the hardware. In many cases it's probably cheaper to just pay $40 a month.

[–] semperverus@lemmy.world 9 points 6 days ago* (last edited 6 days ago)

Do you think it runs at 1000 W continuously? On any decent GPU, responses range from near-instantaneous to maybe a few seconds of runtime at max GPU draw.

Compare that to playing a few hours of cyberpunk 2077 with raytracing and maxed out settings at 4k.

Don't get me wrong, there's a lot to hate about AI/LLMs, but the footprint of running one locally, without the data-harvesting engines, is pretty minimal. The consumption primarily comes from training the larger models, and then from the data centers that run them, which service millions of inquiries a minute, making the concentration of consumption at a single point significantly higher (plus they retrain the model there on current and user-fed data, including prompts, whereas your computer hosting Ollama would not).

[–] fuckwit_mcbumcrumble@lemmy.dbzer0.com 13 points 6 days ago (1 children)

What whack-ass setup do you think OP has? Dual 5090s? They're running it on an i7.

[–] T156@lemmy.world 7 points 6 days ago* (last edited 6 days ago)

It's also an 8-billion-parameter model. That's pretty tiny, even if they use it heaps.

[–] StripedMonkey@lemmy.zip 14 points 6 days ago

That would be true worst case, but you're never running inference 24/7. It's no crazier than gaming in that regard.

[–] Mubelotix@jlai.lu 5 points 6 days ago

Well, it's winter, so any power he spends, he gets back as heat.

[–] friend_of_satan@lemmy.world 3 points 5 days ago (1 children)

While this is correct, sometimes it can be free. I live in a cold climate, and over the winter I hooked up a folding@home computer in my office to keep things a bit warmer. Computers are 100% as efficient as a space heater.

Of course now that it's getting warm things are changing. I'm actually in the middle of doing my last folding@home tasks until the temps drop next fall.

[–] Ludicrous0251@piefed.zip 4 points 5 days ago

Even in very cold regions heat pumps maintain a COP>1, so even running it as a space heater may not be free if you have access to a more efficient alternative. Also that may be a responsible justification for Folding@Home, but I doubt OP is turning off their LLM in the summer.

[–] irotsoma@piefed.blahaj.zone 10 points 1 week ago (2 children)

I hate that LLMs are called "AI", but they do have some uses if trained on the right data set (rather than pirating all the data on the internet and making the LLM treat it as valid). I have been wanting to set one up for my Home Assistant voice control so that it can better understand my speech. Also, for better image component recognition for tagging in Immich.

I wish they would force the companies to release their training data sets, considering they are getting a lot of it illegally (not that I'm a big copyright fan, but it's crappy that copyright applies to individuals and small businesses and not to big rich people and corporate-backed companies; attribution, and copyleft policy if the creator wants it, is something I agree with strongly). If we could get the data sets, pick and choose which portions to include, and then train our own LLMs, it would be better. It's why scientific LLMs actually are useful: they are trained primarily on peer-reviewed scientific data, not 4chan and Reddit craziness, or SciFi and parody works taken as fact. No wonder it hallucinates.

Bullshit in, bullshit out, to paraphrase. If you teach a toddler 4chan propaganda, or SciFi, parodies, and hate speech as fact, rather than giving it all context, they turn out to be the people who post that nonsense. But the people funding it want quick results with no effort, and that's what they get: a poorly educated child randomly spouting nonsense. LOL

[–] irmadlad@lemmy.world 8 points 1 week ago (1 children)

In as much as I rail against regulation, or more so over-regulation, AI needs some heavy regulation. We stand at the crossroads of a very useful tool that is unfortunately hung up in the novelty stage of pretty pictures and AI rice cookers. It could be so much more.

I use AI in a few things. For one, I use AI to master the music I create. I am clinically deaf, so there are frequencies I just can't hear well enough to make a call; I lean on AI to do that, and it does it quite well, actually. I use AI to solve small programming issues I'm working on, but I wouldn't dare release anything I've done, AI or not, because I can always picture some poor chap who used my 'code' with smoke billowing out of his computer. It's also pretty damn good at compose files.

I've read about medical uses that sound very efficient: ingesting tons of patient records and reports and pinpointing where services could better aid the patient, so that people don't fall through the cracks and miss the medical treatment they need. So, it has some great potential if we could just get some regulation and move past this novelty stage.

[–] TropicalDingdong@lemmy.world 8 points 1 week ago (2 children)
[–] yellerbadger@piefed.social 3 points 6 days ago (1 children)

IMO there's a significant drop-off with local LLMs vs the mainstream ones. This can be mitigated somewhat, though, by using web search tools or retrieval-augmented generation.

[–] lepinkainen@lemmy.world 2 points 6 days ago

Basically, the local models don't (and can't) contain the full knowledge of the universe.

BUT they can call tools pretty well, and if you give the harness the capability to search Wikipedia, for example, it becomes a lot smarter.
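A minimal sketch of that harness idea, using the `tools` parameter of Ollama's `/api/chat` endpoint. The `wikipedia_search` function here is a stub (real code would hit the MediaWiki API), and the exact response shape is an assumption based on Ollama's native chat API, where tool calls arrive under `message["tool_calls"]` with arguments already parsed as a dict:

```python
# Stub search tool -- replace the body with a real MediaWiki API call.
def wikipedia_search(query: str) -> str:
    return f"[top Wikipedia results for {query!r}]"

# Tool schema advertised to the model via the "tools" field of /api/chat.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "wikipedia_search",
        "description": "Search Wikipedia and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

REGISTRY = {"wikipedia_search": wikipedia_search}

def run_tool_calls(message: dict) -> list:
    """Execute each requested tool call; return tool-role messages to feed back."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        output = REGISTRY[fn["name"]](**fn["arguments"])
        results.append({"role": "tool", "content": output})
    return results

# What a model reply asking for a search might look like:
reply = {"role": "assistant", "tool_calls": [
    {"function": {"name": "wikipedia_search",
                  "arguments": {"query": "COP heat pump"}}}]}
print(run_tool_calls(reply))
```

The loop in practice: send the chat request with `TOOLS`, run `run_tool_calls` on the reply, append the tool messages to the conversation, and call the model again so it can answer with the retrieved context.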

[–] TheMightyCat@ani.social 12 points 1 week ago* (last edited 1 week ago) (1 children)

It depends what OP was using before, but going from something like GPT5.2 to Llama 3 8B will be a massive difference (although OP says they only use it for basic tasks, so that does offset it).

Llama 3 already being a very old model doesn't help either.

I run Qwen3.5-35B-A3B-AWQ-4bit, which, while leagues ahead of Llama 3 8B, is still a very noticeable difference.

This is not to say open source is bad, if one had the resources to run something like Qwen3.5-397B-A17B it would also be up there.

[–] Valmond@lemmy.dbzer0.com 2 points 1 week ago (3 children)

What kind of hardware do you need to run those models?

[–] TheMightyCat@ani.social 5 points 1 week ago* (last edited 1 week ago)

I'm running 2x 4090s; the 35B fits very comfortably in that.

For large models like the 397B, there are several ways to do it without a ton of money; I've seen posts of people using arrays of used 3090s with good results.

The other option is CPU inference, although with current RAM prices that is less cost effective.

I was looking at maybe an array of Milk-V JUPITER2 boards, since vllm added RISC-V support, which could be very cost effective.

[–] Jakeroxs@sh.itjust.works 5 points 1 week ago

Depends on how much quantization, but still fairly beefy; I couldn't run it on my homelab with a 3080 Ti, for example.

I generally use smaller 8-12b models and they're alright depending on the task.

[–] suicidaleggroll@lemmy.world 4 points 1 week ago* (last edited 1 week ago)

In general, you take the model size in billions of parameters (e.g. 397B), divide it by 2, and add a bit for overhead; that's how much RAM/VRAM it takes to run it at a "normal" quantization level. For Qwen3.5-397B, that's about 220 GB. Ideally that would be all VRAM for speed, but you can offload some or all of it to normal RAM on the CPU; you'll just take a speed hit.

So for something like Qwen3.5-397B, it takes a pretty serious system, especially if you're trying to do it all in VRAM.
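The rule of thumb above can be written out as a quick sizing calculator. The 0.5 bytes per parameter corresponds to ~4-bit quantization, and the 10% overhead factor (KV cache, runtime buffers) is an assumption, not a precise figure:

```python
# Rough memory estimate for running a quantized LLM:
# ~0.5 bytes/parameter at 4-bit, plus overhead for KV cache and buffers.

def vram_estimate_gb(params_billions: float, overhead: float = 0.10) -> float:
    """Estimate RAM/VRAM (GB) needed at ~4-bit quantization."""
    weights_gb = params_billions * 0.5  # 4 bits = 0.5 bytes per parameter
    return weights_gb * (1 + overhead)

for size in (8, 35, 397):
    print(f"{size}B -> ~{vram_estimate_gb(size):.0f} GB")
```

Longer contexts inflate the KV-cache share, so treat the overhead factor as a floor rather than a guarantee.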

[–] kambusha@sh.itjust.works 7 points 1 week ago (2 children)

What's the model name to pull?

[–] lepinkainen@lemmy.world 2 points 6 days ago

Qwen3.5 and Gemma4 are the best ones for tool calling that don’t need massive amounts of memory

[–] ikidd@lemmy.world 4 points 1 week ago (1 children)

Probably use Gemma4 if your machine has the chops for it.

[–] webkitten@piefed.social 2 points 1 week ago* (last edited 6 days ago)

You could probably get away with using gemma3:4b or phi3.5.

[–] Shady_Shiroe@lemmy.world 4 points 1 week ago (1 children)

I only ever use my local AI for the Home Assistant voice assistant on my phone, but it's more of a gimmick/party trick, since I only have temperature sensors currently (I only got into HA recently) and it can't access WiFi. So it just sits quietly, unloaded, on my TrueNAS server.

[–] blargh513@sh.itjust.works 4 points 6 days ago

Running any LLM on TrueNAS is not awesome. I've tried it with GPU passthrough and there's just too much overhead. I may just burn all my stuff down and restart with Proxmox, running TrueNAS inside just for NAS duties. The idea of a converged NAS + virtualization box is wonderful, but it's just not there.

The host networking model alone is such a pain, and then you get into the performance issues. I still like TrueNAS a lot, but I think Proxmox is probably the better platform.