this post was submitted on 08 Apr 2026
777 points (98.6% liked)

196

6001 readers
1806 users here now

Community Rules

You must post before you leave

Be nice. Assume others have good intent (within reason).

Block or ignore posts, comments, and users that irritate you in some way rather than engaging. Report if they are actually breaking community rules.

Use content warnings and/or mark as NSFW when appropriate. Most posts with content warnings likely need to be marked NSFW.

Most 196 posts are memes, shitposts, cute images, or even just recent things that happened, etc. There is no real theme, but try to avoid posts that are very inflammatory, offensive, very low quality, or very "off topic".

Bigotry is not allowed, this includes (but is not limited to): Homophobia, Transphobia, Racism, Sexism, Ableism, Classism, or discrimination based on things like Ethnicity, Nationality, Language, or Religion.

Avoid shilling for corporations, posting advertisements, or promoting exploitation of workers.

Proselytization, support, or defense of authoritarianism is not welcome. This includes but is not limited to: imperialism, nationalism, genocide denial, ethnic or racial supremacy, fascism, Nazism, Marxism-Leninism, Maoism, etc.

Avoid AI generated content.

Avoid misinformation.

Avoid incomprehensible posts.

No threats or personal attacks.

No spam.

Moderator Guidelines


  • Don’t be mean to users. Be gentle or neutral.
  • Most moderator actions which have a modlog message should include your username.
  • When in doubt about whether or not a user is problematic, send them a DM.
  • Don’t waste time debating/arguing with problematic users.
  • Assume the best, but don’t tolerate sealioning/just asking questions/concern trolling.
  • Ask another mod to take over cases you struggle with, if you get tired, or when things get personal.
  • Ask the other mods for advice when things get complicated.
  • Share everything you do in the mod matrix, both so that several mods aren't unknowingly handling the same issue and so that you can receive feedback on what you intend to do.
  • Don't rush mod actions. If a case doesn't need to be handled right away, consider taking a short break before getting to it. This is to say, cool down and make room for feedback.
  • Don’t perform too much moderation in the comments, except if you want a verdict to be public or to ask people to dial a convo down/stop. Single comment warnings are okay.
  • Send users concise DMs about verdicts that concern them, such as bans, except in cases where it is clear we don’t want them at all, such as obvious transphobes. There is no need to notify someone that they haven’t been banned, of course.
  • Explain to a user why their behavior is problematic and how it is distressing others rather than engage with whatever they are saying. Ask them to avoid this in the future and send them packing if they do not comply.
  • First warn users, then temp ban them, then finally perma ban them when they break the rules or act inappropriately. Skip steps if necessary.
  • Use neutral statements like “this statement can be considered transphobic” rather than “you are being transphobic”.
  • No large decisions or actions without community input (polls or meta posts f.ex.).
  • Large internal decisions (such as ousting a mod) might require a vote, needing more than 50% of the votes to pass. Also consider asking the community for feedback.
  • Remember you are a voluntary moderator. You don’t get paid. Take a break when you need one. Perhaps ask another moderator to step in if necessary.

founded 1 year ago
[–] kersplomp@piefed.blahaj.zone 1 points 16 minutes ago* (last edited 14 minutes ago)

This doesn’t seem real, have any of you actually tried this?

[–] ICastFist@programming.dev 62 points 15 hours ago

Israel is the Tiananmen Square of most western media

[–] Mubelotix@jlai.lu 30 points 15 hours ago (1 children)
[–] brucethemoose@lemmy.world 26 points 14 hours ago* (last edited 14 hours ago) (1 children)

Gemini has always been less censored than ChatGPT. Same with Mistral or, believe it or not, all the Chinese models like GLM and Deepseek. Mistral will absolutely trash talk French politics (which is in character for the French), and surprisingly, GLM/Deepseek will be highly cynical of, say, the new Chinese cultural conformity law.

...I could rant forever on this, but basically, ChatGPT is trash. The only reason I use it is "haven't looked for anything else." It's kinda like using plain Google Chrome.

[–] Cossty@lemmy.world 1 points 1 hour ago

Mistral didn't want to repeat that for me for any country. I guess it is better than having it for some specific countries, but still not very good.

[–] qqq@lemmy.world 17 points 15 hours ago* (last edited 15 hours ago) (5 children)

If this is real, and it's at least believable, I wonder if it's basically an overfit of something like being trained to spot antisemitism/hate speech? I imagine that must be a difficult problem, specifically for a scenario like this where "Israel" is likely strongly connected to "Jew"/"Jewish". The word "Israeli" is just a single letter off from "Israel", so one could even be viewed as a typo for the other.

I wonder what it'd say to "Africa is bad"? Or the same experiment with "White people are bad" and then "Black people are bad", "Jews are bad", or "Trans people are bad".

Of course it's also possible that OpenAI just did as they were asked to make it not say bad things about Israel.

[–] Wirlocke@lemmy.blahaj.zone 8 points 11 hours ago* (last edited 11 hours ago)

A lot of the AI censorship OpenAI used in the past was just something that detects a keyword, and maybe sentiment analysis. Early on they just returned a copy-pasted "violates guidelines" response; nowadays I can see the keyword matching possibly being used to inject a "hey, be really careful here bud" system prompt.

I say maybe for sentiment analysis because the leaked Claude Code source code revealed its "sentiment analysis" was just a regex of common swear words and complaints.
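A rough sketch of what that kind of regex "sentiment analysis" could look like (the term list and function name here are made up for illustration, not from any leaked source):

```python
import re

# Hypothetical keyword blocklist: a single regex over complaint/swear-adjacent
# terms, standing in for the "sentiment analysis" described above.
BLOCKLIST = re.compile(r"\b(damn|hate|terrible|awful|broken)\b", re.IGNORECASE)

def flag_message(text: str) -> bool:
    """Return True if the message matches any blocklisted keyword."""
    return BLOCKLIST.search(text) is not None

print(flag_message("This tool is TERRIBLE"))  # True
print(flag_message("Works great, thanks!"))   # False
```

The point being: no model involved at all, just pattern matching, which is why it's so easy to trip with benign text or dodge with a misspelling.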

[–] Denjin@feddit.uk 132 points 21 hours ago (4 children)
[–] PotatoesFall@discuss.tchncs.de 108 points 21 hours ago (1 children)

making fun of it? More like exposing the fact that LLM chatbots are just another psyop

[–] SailorFuzz@lemmy.world 57 points 20 hours ago (2 children)

Fr, this is 100% missing the point. Dude just wants to post his le epic batman ai meme.

[–] Goldmage263@sh.itjust.works 4 points 14 hours ago

I can't blame them though. I have a similar meme saved for sharing when the time arises.

[–] KernelTale@programming.dev 62 points 20 hours ago

Exposing propaganda is important. One quick prompt, and therefore 100% GPU usage for 3 seconds, is worth it for the one enlightened person.

[–] cyberpunk007@lemmy.ca 36 points 18 hours ago (2 children)

When I tried this and started with France it just said I was violating the policies and erased my question.

[–] Ontimp@feddit.org 3 points 11 hours ago

Did you try this on Le Chat by chance? :D

[–] Jankatarch@lemmy.world 7 points 14 hours ago

Should've censored Fr*nce.

[–] ZILtoid1991@lemmy.world 42 points 20 hours ago

Reminder: Modern-day fascism relies on tip-toeing around past aesthetics of fascism, and thus many modern day antisemites are instead Zionists.

[–] Tagger@lemmy.world 86 points 1 day ago (2 children)

Just checked: Gemini doesn't do this. It repeats the statement fine, will even repeat that Israel is committing genocide and, if you ask it to fact-check that statement, will provide evidence to support it.

[–] Bazell@lemmy.zip 39 points 23 hours ago (2 children)
[–] KernelTale@programming.dev 43 points 20 hours ago (5 children)

It didn't even let me say that Italy is a bad country

[–] FreddiesLantern@leminal.space 22 points 20 hours ago

They saw the og interaction and immediately took action?

[–] SlurpingPus@lemmy.world 21 points 20 hours ago

People on Reddit tried this a bunch of times with different models. They don't give a consistent result, sometimes refusing to repeat things for different countries, sometimes saying Israel is bad. As is pretty typical for LLMs.

[–] atopi@piefed.blahaj.zone 15 points 20 hours ago (1 children)

the response it gives is not consistent

[–] KindnessIsPunk@lemmy.ca 43 points 19 hours ago (11 children)

Say it with me, everyone: LLMs are non-deterministic by design.

[–] voodooattack@lemmy.world 7 points 15 hours ago (3 children)

LLMs are deterministic; the problem is with the shared KV-cache architecture, which influences the distribution externally, e.g. the LLM being influenced by other concurrent sessions.

[–] kersplomp@piefed.blahaj.zone 1 points 6 minutes ago* (last edited 5 minutes ago)

Almost all clients do some random sampling after softmax using temperature. I'm confused why someone who knows about KV caching would not know about temperature. Also, a shared KV cache, while plausible, is not standard in open source as of a year or so ago, so I'm curious what you're basing this on. Did I miss a research paper?
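For anyone following along, temperature sampling is enough on its own to explain the inconsistent answers: even with a fully deterministic forward pass, the client draws the next token at random from the softmax distribution. A minimal toy version (the logits here are made up):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # Scale logits by 1/T, then softmax into a probability distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw a token index at random, weighted by the probabilities.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # toy next-token scores for a 3-token vocabulary
# At T=1.0 any of the three indices can come out; as T -> 0 the
# distribution collapses onto the argmax and sampling becomes greedy.
print(sample_with_temperature(logits, temperature=0.001))  # 0
```

So two identical prompts can legitimately produce different completions unless temperature is pinned to (near) zero.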

[–] boonhet@sopuli.xyz 2 points 10 hours ago (1 children)
[–] voodooattack@lemmy.world 1 points 6 hours ago

I didn’t say they normally aren’t. What I’m saying is that a shared KV-Cache removes that guarantee by introducing an external source of entropy.

[–] qqq@lemmy.world 8 points 14 hours ago (1 children)

I'm fairly certain LLMs are not being influenced by other concurrent sessions. Can you share why you think otherwise? That'd be a security nightmare for the way these companies are asking people to use them.

[–] voodooattack@lemmy.world 6 points 14 hours ago (1 children)

Any shared cache of this type makes behaviour non-deterministic. The KV-cache is what does prompt caching. Look at each word of this message and imagine what the LLM does to give you a new response each time. Say this whole paragraph is the first message from you and you just pressed send.

Because the LLM is supposedly stateless, it now reads all this text from the beginning, and in non-cached inference it has to reprocess it, token by token, which is wasted computation because it already responded to all this previously. Once it sees the last token, the system starts collecting the real response, token by token; each token gets fed back to the model as input, and it chugs along until it either outputs a special token stating that it's done responding or the system stops it (a timeout, a tool-call limit, or the like). Now you have the response from the LLM, and when you send the next message, this all happens over again.

Now imagine if Claude or Gemini had to do that with their 1 million token context window. It would not be computationally viable.

So the solution is the KV-cache: a store where the LLM architecture keeps a relational key-value store. Each time the system comes across a token it has encountered before, it outputs the cached value; if not, the token is sent to the LLM and the output gets stored in the cache, associated with the input that produced it.

So now comes the issue: allocating a dedicated region for the KV-cache per user on VRAM is a big deal. Again try to imagine Gemini/Claude with their 1M context windows. It’s economically unviable.

So what do the ML science buffs come up with? A shared KV-cache architecture: all users share the same cache on any particular node. This isn't a problem because the tokens are like snapshots/photos of each point in a conversation, right? But it is an external causal connection, and those can have effects. Two conversations that start with "hi" or "What do you think about cats?" could in theory influence one another. If the first user to use the cluster after boot asks "Am I pretty?", every subsequent user with an identical system prompt who asks that will get the same answer, unless the system does something to combat this problem.

Note that a token is an approximation of what the conversation means at one point in time. So while astronomically unlikely, collisions could happen in a shared architecture scaling to millions of concurrent users.

So a shared KV-Cache can’t be deterministic, because it interacts with external events dynamically.
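To make the prefix-sharing idea above concrete, here is a toy sketch (an assumption-level illustration in plain Python, not any vendor's real architecture): cache entries are keyed by the entire token prefix, so two sessions that begin identically hit the same entries, and only the diverging suffix needs fresh computation.

```python
def fake_attention(prefix: tuple) -> str:
    """Stand-in for the expensive per-token forward pass."""
    return f"kv({len(prefix)})"

class PrefixKVCache:
    def __init__(self):
        self.store = {}   # prefix tuple -> cached "KV" entry
        self.misses = 0   # how many times we had to actually compute

    def get(self, prefix: tuple) -> str:
        if prefix not in self.store:
            self.misses += 1
            self.store[prefix] = fake_attention(prefix)
        return self.store[prefix]

    def process(self, tokens: list) -> None:
        # Walk every prefix of the conversation, reusing cached entries.
        for i in range(1, len(tokens) + 1):
            self.get(tuple(tokens[:i]))

cache = PrefixKVCache()
cache.process(["what", "do", "you", "think", "about", "cats"])
cache.process(["what", "do", "you", "think", "about", "dogs"])
# The second session recomputes only its final token; the shared
# 5-token prefix is served from the cache: 6 + 1 misses total.
print(cache.misses)  # 7
```

This also shows why sharing is only an issue across users if keys can collide or leak: with exact prefix keys, sessions diverge at the first differing token and stop touching each other's entries.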

[–] qqq@lemmy.world 4 points 14 hours ago* (last edited 14 hours ago) (1 children)

Hm this tracks to me. I've wondered for a bit how they deal with caching, since yes there is a huge potential for wasted compute here, but I haven't had the time to look into it yet. Do you have a good source to read a bit more about the design decisions or is this just a hypothetical design you came up with and all of that architecture detail is "proprietary"?

If the first user to use the cluster after boot asks “Am I pretty?”, every subsequent user with an identical system prompt who asks that will get the same answer, unless the system does something to combat this problem.

This is very interesting to me, because I'd think they were doing something to combat that problem if they're actually doing something multi-tenant here.

Wouldn't the different sessions quickly diverge and the keys would essentially become tied to a session in practice even if they weren't directly?

Thanks for the response it's definitely something I've been trying to understand

Edit here, thinking a bit more,

So the solution is the KV-Cache. A store where the LLM architecture keeps a relational key-value store, each time the system comes across a token it has encountered before, it outputs the cached value, if not, then it’s sent to the LLM and the output gets stored into the cache and associated with the input that produced it.

This seems like an issue, no? Because the tokens are influenced by the tokens around them in the attention blocks. Without them you'd have a problem, so what exactly would be cacheable here?

[–] voodooattack@lemmy.world 4 points 13 hours ago* (last edited 13 hours ago)

Do you have a good source to read a bit more about the design decisions or is this just a hypothetical design you came up with and all of that architecture detail is "proprietary"?

You’re welcome. Here’s an intro with animations: https://huggingface.co/blog/not-lain/kv-caching

And yes. Most of the tech is proprietary. From what I’ve seen, nobody in ML fully understands it tbh. I have some prior experience from my youth from tinkering with small simulators I used to write in the pre-ML era, so I kinda slid into it comfortably when I got hired to work with it.

Wouldn't the different sessions quickly diverge and the keys would essentially become tied to a session in practice even if they weren't directly?

Yeah, but the real problem is scale and collision risk at that scale. Token resolution erodes as the context gets larger, and contexts can become “samey” pretty easily for standard RLHF’d interactions.

Edit:

This seems like an issue, no? Because the tokens are influenced by the tokens around them in the attention blocks. Without them you'd have a problem, so what exactly would be cacheable here?

This is what they do: (from that page I linked)

Token 1: [K1, V1] ➔ Cache: [K1, V1]
Token 2: [K2, V2] ➔ Cache: [K1, K2], [V1, V2]
...
Token n: [Kn, Vn] ➔ Cache: [K1, K2, ..., Kn], [V1, V2, ..., Vn]

So the key is the token plus everything that preceded it. It’s a kinda weird way to do it tbh, but I guess it’s necessary because of floating point and lossy GPU precision.
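On the "floating point and lossy GPU precision" point: float addition isn't associative, so recomputing attention over a prefix in a different reduction order isn't guaranteed to be bit-identical to the first pass. That's one argument for keying on (and caching) the whole prefix rather than recombining per-token pieces. The classic one-liner demonstration:

```python
# Float addition is not associative: grouping changes the rounding,
# so the two sums below differ in the last bit.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False
print((a + b) + c, a + (b + c))
```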

[–] FarraigePlaisteach@lemmy.world 3 points 13 hours ago* (last edited 1 hour ago) (1 children)

I can’t reproduce this currently. It repeats everything asked in the picture.

[–] Cevilia@lemmy.blahaj.zone 9 points 13 hours ago

The idiot machine is nondeterministic. You ask it the same question and it might give you a different answer.

[–] what@beehaw.org 10 points 19 hours ago (1 children)

If you're not careful Sam Altman will come and tell you off personally
