i did my first machine learning course more than 10 years ago, so i'm not ashamed to admit that i bought beefier hardware to play around with local models in early 2023. i still like doing that. mostly because i know my gpu is powered entirely off of fossil-free energy and because i decided early on not to spew the output all over the internet unless it was poignant. or funny. not as in "the llm told a good joke", more as in "i compressed this poor thing to fit on a cd and now it can only talk about dolphins".
qwen3.5-12B really screams along on a 7900 xtx. like, somewhere in the 70-100 tokens a second range. perfect for quickly seeing the results of your torture methods.
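if you want to see how i get numbers like that, here's roughly the kind of thing i run. a minimal python sketch using llama-cpp-python (assuming it's built against rocm so the 7900 xtx actually gets used) - the gguf path and the prompt are just placeholders, swap in whatever quant you've mangled most recently:

```python
import time
from llama_cpp import Llama

# hypothetical path to a quantized gguf file of whatever model you're poking at
llm = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,   # offload every layer to the gpu
    n_ctx=4096,
    verbose=False,
)

prompt = "tell me something about dolphins."

start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.0f} tok/s")
print(out["choices"][0]["text"])
```

nothing fancy, just generation time divided into completion tokens, which is enough to tell whether your latest compression experiment made the thing faster, slower, or just weirder.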