I think you’re absolutely correct, and this feels to me like the only reason why we’re seeing some of the bizarre shit we’ve been keeping an eye on:
- several old forums, all of which are unique high-quality data sources, are being polluted by their own admins with backdated LLM-generated answers. this destroys that forum as a trustworthy data source and removes it as competition for the LLM that already scraped the forum — and, as a bonus, it also makes training a future LLM on that data source utterly impractical without risking model collapse.
- Wikipedia refuses to compromise on quality in general, so it’s under increasing political pressure to change. the game here is to shut down or pollute the original data source by any means necessary, so that the only way to access that data becomes an LLM. the people behind the AI startups are experts at creating monopolies, and shutting down a world-class data source like Wikipedia or making it otherwise unusable would guarantee a monopoly position for them.
I vaguely remember that one of the articles talking about the physics forum mentioned it happening elsewhere, but I haven’t dug into it myself. it might just be one or two shitty admins doing this, but I suspect (without evidence, I just can’t think of another reason to do it) there’s some party offering a financial incentive for them to go back and fuck up their old forums