I'm not going to watch the video; I prefer most context in text rather than video form. But while I can very well believe that:
- It's possible to optimize LLMs to make smaller models more effective than they are today. It would be very surprising if they were already optimal, given how immature the field is.
- It's possible to use a series of smaller, specialized models and keep the models not relevant to the current context unloaded from VRAM. I believe this is referred to as Mixture of Experts. This should improve memory efficiency for many problems.
...this is countered by the fact that once you free up resources, I suspect you can then use those now-available resources to improve the model by shoveling more data into it. And while there might be diminishing returns, I very much doubt there's a hard cap past which throwing more knowledge at a problem stops producing better results.
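The "specialized models, with irrelevant ones kept out of VRAM" idea above can be sketched with a toy top-1 routing scheme. This is only an illustration of the routing concept, not how any real MoE implementation works; the gate, the expert "specialty keys", and the experts themselves are all made-up stand-ins:

```python
# Toy sketch of top-1 "mixture of experts" routing (all names hypothetical).
# A gate scores each expert against the input and only the winner runs,
# so in principle the losing experts' weights could stay unloaded.

def gate(x, expert_keys):
    # Score each expert by a toy dot product against its "specialty" key.
    scores = [sum(a * b for a, b in zip(x, key)) for key in expert_keys]
    return max(range(len(scores)), key=scores.__getitem__)

experts = [
    lambda x: [v * 2 for v in x],  # expert 0: stand-in for one specialty
    lambda x: [v + 1 for v in x],  # expert 1: stand-in for another
]
expert_keys = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical specialty keys

def moe_forward(x):
    i = gate(x, expert_keys)  # pick exactly one expert for this input
    return i, experts[i](x)   # only that expert is ever evaluated

idx, out = moe_forward([3.0, 0.5])  # input leans toward expert 0
```

Real systems route per token with learned gates (often top-2 rather than top-1), but the memory argument is the same: at any moment only the selected experts' parameters need to be active.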
Also, while there's no absolute guarantee, most communities have rules vaguely along the lines of prohibiting harassment, as do most instances.
That doesn't mean that a given user's idea of harassment and a moderator's or admin's idea will always perfectly line up. What you think of as harassment might be what some other people consider disagreeing. But in general, if someone is clearly following a user around and commenting just to make them miserable, rather than disagreeing with them on some point, you can probably report it to a moderator (or, ultimately, an admin) and have them remove the comments and probably issue a ban. That brings a third party's eyes into the situation.
And if you truly don't feel that a given community's moderators are sufficiently restrictive, you can switch to a community that has more restrictive rules.