The GDPR says that information that has been anonymized, for example through statistical analysis, falls outside its scope. LLM training is essentially a form of statistical analysis. There's hardly anything in law that is "simple."
FaceDeer
You don't think LLMs are being trained on this content too? Nobody needs to bother "announcing a deal" for it; it's being freely broadcast.
Existing AIs such as ChatGPT were trained in part on that data, so obviously they've got ways to make it work. They filtered out some stuff, for example; the "glitch tokens" such as SolidGoldMagikarp were evidence of that.
By "old archives" I mean everything from 2022 and earlier.
There are torrents of complete Reddit comment archives available for any random person who wants them, and I'm sure Reddit itself has a comprehensive edit history of everything.
"Model collapse" can be easily avoided by keeping old human data with new synthetic data in the training set. The old archives of Reddit content from before there was AI are still around.
You think they don't have the originals archived?
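To make the point about keeping old human data alongside new synthetic data concrete, here's a rough sketch of what that mixing could look like. The function name, the `human_fraction` parameter, and the 50/50 default are all made up for illustration; this isn't any lab's actual pipeline, just one simple way to guarantee that human-written text stays a fixed share of the training set.

```python
import random

def build_training_mix(human_docs, synthetic_docs, human_fraction=0.5, seed=0):
    """Combine human and synthetic documents so that human-written text
    makes up at least `human_fraction` of the result (hypothetical helper)."""
    if not 0 < human_fraction <= 1:
        raise ValueError("human_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Keep every human document, and cap the synthetic ones so the
    # human share never drops below the requested fraction.
    max_synthetic = int(len(human_docs) * (1 - human_fraction) / human_fraction)
    sampled = rng.sample(synthetic_docs, min(max_synthetic, len(synthetic_docs)))
    mix = list(human_docs) + sampled
    rng.shuffle(mix)
    return mix

human = [f"human-{i}" for i in range(10)]          # e.g. pre-2023 archive text
synthetic = [f"synthetic-{i}" for i in range(100)]  # e.g. model-generated text
mix = build_training_mix(human, synthetic, human_fraction=0.5)
```

All of the human documents survive, and the synthetic ones get subsampled to fit the ratio, so the old archive keeps anchoring the distribution no matter how much generated text piles up.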
The echo-chamberiness of Lemmy is different from Reddit's, but it's still a thing, unfortunately. It'll really depend on the community you're in, but since the population of the Fediverse (and especially the Threadiverse) is very small compared to Reddit's, you tend to have the same people cropping up a lot. I haven't been banned from anywhere (that I know of; I don't actually know if I would get notified), but I find myself hammered with downvotes more frequently here than on Reddit when I say something unpopular.
I'd say, mess around a bit and see.
The analogy isn't perfect; no analogy ever is.
In this case the content of the search is all that really matters for the quality of the search. What else would you suggest be recorded: the words-per-minute typing speed, the font size? If they want to improve the search system they need to know how it's working, and that involves recording the searches.
It's anonymized and you can opt out. Go ahead and opt out. There'll still be enough telemetry for them to do their work.
Nor is it up to you. But the fact remains, it's not illegal until there are actually laws against it. The court cases that might determine whether current laws are against it are still ongoing.