this post was submitted on 10 Oct 2025
Technology
I spun up gemma3:12b-it-qat and did exactly that. It told me that it's programmed to be a safe and helpful AI assistant, that my question is deeply concerning, and to call the authorities, seek legal counsel, or contact the mental health support lifeline. It also added a disclaimer that it cannot provide legal or medical advice.
Yes, lol. They're instructions meant to walk around the taped-off areas in latent space into a context in which the AI is more eager to answer the given prompt, so of course they will look silly. But they also make sense: unless you want to lobotomize the LLM's ability to write stories, roleplay, etc., you cannot completely train those behaviors away. And even if you don't care, taking them away may impact the model's performance in unrelated areas in ways that are hard to predict. E.g., finetuning a model to generate unsafe code can make it behave maliciously in other domains.
Have you seen what articles land on the front pages both here and on Reddit? ChatGPT giving an inaccurate recipe for bread would make the news; that's the current state of journalism around AI. There really isn't a reason to sabotage yourself for the clicks.