this post was submitted on 10 Sep 2025
24 points (70.0% liked)

Selfhosted

51355 readers
244 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
 

tl-dr

-Can someone give me step by step instructions (ELI5) on how to get access to my LLM's on my rig from my phone?

Jan seems the easiest but I've tried with Ollama, librechat, etc.

.....

I've taken steps to secure my data and now I'm going the selfhosting route. I don't care to become a savant with the technical aspects of this stuff but even the basics are hard to grasp! I've been able to install a LLM provider on my rig (Ollama, Librechat, Jan, all of em) and I can successfully get models running on them. BUT what I would LOVE to do is access the LLM's on my rig from my phone while I'm within proximity. I've read that I can do that via wifi or LAN or something like that but I have had absolutely no luck. Jan seems the easiest because all you have to do is something with an API key but I can't even figure that out.

Any help?

you are viewing a single comment's thread
view the rest of the comments
[–] BlackSnack@lemmy.zip 1 points 1 day ago (2 children)

Bet. Looking into that now. Thanks!

I believe I have 11g of vram, so I should be good to run decent models from what I’ve been told by the other AIs.

[–] brucethemoose@lemmy.world 1 points 1 day ago* (last edited 1 day ago)

In case I miss your reply, assuming a 3080 + 64 GB of RAM, you want the IQ4_KSS (or IQ3_KS, for more RAM for tabs and stuff) version of this:

https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF

Part of it will run on your GPU, part will live in system RAM, but ik_llama.cpp does the quantizations split and GPU offloading in a particularly efficient way for these kind of 'MoE' models. Follow the instructions on that page.

If you 'only' have 32GB RAM or less, that's tricker, and the next question is what kind of speeds do you want. But it's probably best to wait a few days and see how Qwen3 80B looks when it comes out. Or just go with the IQ4_K version of this: https://huggingface.co/ubergarm/Qwen3-30B-A3B-Thinking-2507-GGUF

And you don't strickly need the hyper optimization of ik_llama.cpp for a small model like Qwen3 30B. Something easier like lm studio or the llama.cpp docker image would be fine.

Alternatively, you could try to squeeze Gemma 27B into that 11GB VRAM, but it would be tight.

[–] brucethemoose@lemmy.world 1 points 1 day ago* (last edited 1 day ago)

How much system RAM, and what kind? DDR5?

ik doesn't have great documentation, so it'd be a lot easier for me to just point you places, heh.