110
It's trivially easy to poison LLMs into spitting out gibberish, says Anthropic
(www.theregister.com)
"We did it, Patrick! We made a technological breakthrough!"
A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.
Yeah, and, as the article points out, the trick would be getting those malicious training documents into the LLM's training material in the first place.
What I would wonder is whether this technique could be replicated using common terms. The researchers were able to make their AI spit out gibberish when it heard a very rare trigger term. If you could make an AI spit out, say, a link to a particular crypto-stealing scam website whenever a user put "crypto" or "Bitcoin" in a prompt, or content promoting anti-abortion "crisis pregnancy centers" whenever a user put "abortion" in a prompt ...
I've seen this described before, but as AI ingests content written by a prior AI for training things will get interesting.