FaceDeer

joined 2 years ago
[–] FaceDeer@fedia.io 1 points 1 year ago

Nor is it up to you. But the fact remains, it's not illegal until there are actually laws against it. The court cases that might determine whether current laws are against it are still ongoing.

[–] FaceDeer@fedia.io 5 points 1 year ago (2 children)

The GDPR says that information that has been anonymized, for example through statistical analysis, is fine. LLM training is essentially a form of statistical analysis. There's hardly anything in law that is "simple."

[–] FaceDeer@fedia.io 3 points 1 year ago

You don't think LLMs are being trained off of this content too? Nobody needs to bother "announcing a deal" for it; it's being freely broadcast.

[–] FaceDeer@fedia.io 1 points 1 year ago

Existing AIs such as ChatGPT were trained in part on that data, so obviously they've got ways to make it work. They filtered out some stuff, for example - the "glitch tokens" such as SolidGoldMagikarp were evidence of that.

[–] FaceDeer@fedia.io 1 points 1 year ago (3 children)

By "old archives" I mean everything from 2022 and earlier.

[–] FaceDeer@fedia.io 6 points 1 year ago

There are torrents of complete Reddit comment archives available for any random person who wants them. I'm sure Reddit itself has a comprehensive edit history of everything.

[–] FaceDeer@fedia.io 21 points 1 year ago (9 children)
[–] FaceDeer@fedia.io 4 points 1 year ago

You think they don't have the originals archived?

[–] FaceDeer@fedia.io 4 points 1 year ago (4 children)

The echo-chamberiness of Lemmy is different from Reddit, but still a thing unfortunately. It'll really depend on the community you're in, but since the population of the Fediverse (and especially the Threadiverse) is very small compared to Reddit, you tend to have the same people cropping up a lot. I haven't been banned from anywhere (that I know of - I don't actually know if I would get notified) but I find myself hammered with downvotes more frequently here than on Reddit when I say something unpopular.

I'd say, mess around a bit and see.

[–] FaceDeer@fedia.io 2 points 1 year ago

The analogy isn't perfect; no analogy ever is.

In this case the content of the search is all that really matters for the quality of the search. What else would you suggest be recorded, the words-per-minute typing speed, the font size? If they want to improve the search system they need to know how it's working, and that involves recording the searches.

It's anonymized and you can opt out. Go ahead and opt out. There'll still be enough telemetry for them to do their work.
