doodlebob

joined 2 years ago
[–] doodlebob@lemmy.world 1 points 1 day ago (1 children)

I'm just gonna try vLLM; seems like ik_llama.cpp doesn't have a quick Docker method
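If anyone else ends up going this route, the plan is roughly the vLLM Python API below; the model name and GPU count are placeholders for my box, so treat it as an untested sketch rather than a working config:

```python
from vllm import LLM, SamplingParams

# Untested sketch: load a model across multiple GPUs with vLLM's offline API.
# Model tag and tensor_parallel_size are placeholders for my 4x 3090 setup.
llm = LLM(
    model="google/gemma-2-27b-it",  # placeholder model
    tensor_parallel_size=4,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Why is the sky blue?"], params)
print(outputs[0].outputs[0].text)
```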

[–] doodlebob@lemmy.world 2 points 2 days ago (2 children)

IK sounds promising! Will check it out to see if it can run in a container

[–] doodlebob@lemmy.world 1 points 2 days ago (5 children)

I'll take a look at both tabby and vllm tomorrow

Hopefully there's CPU offload in the works so I can test those crazy models without too much fiddling in the future (server also has 128GB of RAM)
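I think vLLM already has a manual knob for this (cpu_offload_gb), which is exactly the fiddly part I'd rather avoid; if I try it anyway, my understanding is it would look something like this untested sketch (the model name and the 16GB number are guesses against my 128GB of RAM, not tuned values):

```python
from vllm import LLM

# Rough sketch: spill part of the weights into system RAM so a bigger model fits.
# cpu_offload_gb appears to be per GPU; 16 is a placeholder, not a tuned value.
llm = LLM(
    model="some/huge-model",  # placeholder name
    tensor_parallel_size=4,
    cpu_offload_gb=16,
)
```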

[–] doodlebob@lemmy.world 1 points 2 days ago (7 children)

Unfortunately I didn't set up NVLink, but Ollama auto-splits models across GPUs when they need it

I really just want a "set and forget" model server lol (that's why I keep mentioning the auto offload)

Ollama integrates nicely with OWUI
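That's the appeal for me: from the client side there's zero offload/splitting config, you just ask for a model and the daemon figures out placement. A minimal sketch with the ollama Python package, assuming the default port and a tag you've actually pulled:

```python
import ollama  # assumes the ollama daemon is running on its default port

# Minimal chat call; no GPU/offload settings needed client-side, the daemon handles placement.
resp = ollama.chat(
    model="gemma2:27b",  # placeholder tag
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(resp["message"]["content"])
```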

[–] doodlebob@lemmy.world 1 points 2 days ago (9 children)

omg, I'm an idiot. Your comment got me thinking, and... I've been using Q4 without knowing it. I assumed Ollama ran fp16 by default 😬

About vLLM: yeah, I see that you have to specify how much to offload manually, which I wasn't a fan of. I have 4x 3090s in an ML server at the moment, but they handle all my AI workloads, so the VRAM is shared between TTS/STT/LLM/image gen.

That's basically why I really want auto offload
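For anyone else who hasn't double-checked, this is roughly how I'd confirm it with the ollama Python client (the tag is a placeholder, and the exact response shape depends on the client version):

```python
import ollama  # assumes the ollama daemon is running on its default port

# Sanity-check what quantization a local tag actually is instead of assuming fp16.
info = ollama.show("gemma2:27b")  # placeholder tag
print(info)  # the details block includes the quantization level, e.g. Q4_K_M

# An fp16 variant has to be pulled explicitly via its own tag; the default tags are quantized.
```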

[–] doodlebob@lemmy.world 1 points 2 days ago (11 children)

Yeah, I'm currently running the Gemma 27B model locally. I recently took a look at vLLM, but the only reason I didn't want to switch is that it doesn't have automatic offloading (seems like it's a manual thing right now)

[–] doodlebob@lemmy.world 1 points 3 days ago (13 children)

Just read the L1 post, and I'm just now realizing this is mainly for running quants, which I generally avoid

I guess I could spin it up just to mess around with it but probably wouldn't replace my main model

[–] doodlebob@lemmy.world 1 points 3 days ago (14 children)

Thanks, will check that out!

[–] doodlebob@lemmy.world 4 points 3 days ago (17 children)

I'm currently using Ollama to serve LLMs; what's everyone using for these models?

I'm also using Open WebUI, and Ollama seemed the easiest (at the time) to use in conjunction with that
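Part of why the combo was easy is that Ollama also exposes an OpenAI-compatible endpoint, so Open WebUI (or any other client) can point straight at it. A rough sketch of what that looks like from code, assuming the default port and a placeholder tag:

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint lives under /v1; the api_key just has to be non-empty.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gemma2:27b",  # placeholder tag
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```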

[–] doodlebob@lemmy.world 2 points 1 week ago (1 children)

Yeah, I went a little crazy with it and built out a server just for AI/ML stuff 😬

[–] doodlebob@lemmy.world 1 points 1 week ago (3 children)

Looks to be 20GB of VRAM

[–] doodlebob@lemmy.world 1 points 1 week ago (6 children)

The Gemma 27B model has been solid for me. Using Chatterbox for TTS as well
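In case it helps anyone, Chatterbox usage is pretty minimal; this is from memory of its README, so treat the exact API as an assumption rather than gospel (text and output path are placeholders):

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Sketch from memory of the Chatterbox README; text and output path are placeholders.
model = ChatterboxTTS.from_pretrained(device="cuda")
wav = model.generate("Testing TTS on the home server.")
ta.save("out.wav", wav, model.sr)
```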

 

Since the whole security issue popped up, I decided to disable remote access for both my UDM Pro and UNVR.

I'm able to access the UDM Pro via the UniFi app through WireGuard, but I'm unable to access Protect.

Has anyone gotten this to work?

 

Has anyone successfully deployed something like Subs AI within Unraid?

Basically I'd like to use this to grab all the missing subtitles that Bazarr isn't able to grab.

PS: If anyone knows of a similar app with a scheduler built into the web UI, please let me know

 

Hi guys, looks like the used Dell 2080 Ti I bought off of Reddit died after a couple of months of life.

I've been throwing some AI workloads at it (image generation, speech-to-text, etc.), and it looks like the NVIDIA driver randomly stopped seeing it. I tried downgrading the driver version and rebooting, but as soon as I threw some AI workloads at it again, the same thing happened.
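For anyone wanting to reproduce the check, a quick way to see what the driver still enumerates from Python is pynvml; a minimal sketch (nothing here is specific to my setup):

```python
from pynvml import (
    NVMLError, nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetName,
    nvmlDeviceGetTemperature, NVML_TEMPERATURE_GPU,
)

# List every GPU the NVIDIA driver can still see; a vanished card simply won't show up.
try:
    nvmlInit()
    count = nvmlDeviceGetCount()
    print(f"Driver sees {count} GPU(s)")
    for i in range(count):
        handle = nvmlDeviceGetHandleByIndex(i)
        name = nvmlDeviceGetName(handle)
        temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
        print(f"  GPU {i}: {name}, {temp} C")
    nvmlShutdown()
except NVMLError as err:
    print(f"NVML error (driver problem?): {err}")
```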

Can anyone suggest a good dual-slot GPU? It doesn't really need to be one of the consumer cards, as I'll only be using this for AI workloads and transcoding via Tdarr and Plex.

Thank you!

 

Has anyone run these yet? I'm super interested in having speech-to-text running as quickly as in their demos. Wake words are a must now too!
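If anyone wants a quick local baseline to compare against, a faster-whisper sketch is where I'd start (this is my own sketch, not whatever they use in the demos; model size and audio path are placeholders):

```python
from faster_whisper import WhisperModel

# Quick local STT baseline; "small" and the wav path are placeholders.
model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("sample.wav")

print(f"Detected language: {info.language}")
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```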
