Who's paying for the electricity that it is using, then?
I was reading the other day about advances in zinc-ion batteries as a possible replacement for lithium-ion batteries in applications like this. They're heavier than lithium-ion, which is just fine for energy storage facilities like this, but they retain their capacity through a lot more charge/discharge cycles (the article I was reading said they drop to 80% capacity after 100,000 cycles - at one cycle a day that's close to 275 years) and, most importantly for this specific situation, they're not flammable.
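Just to show the arithmetic behind that figure (the 100,000-cycle number is from the article; one cycle per day is only my assumption for a storage facility):

    # Back-of-envelope lifetime for the quoted cycle count,
    # assuming one full charge/discharge cycle per day.
    cycles_to_80_percent = 100_000
    cycles_per_day = 1
    years = cycles_to_80_percent / (cycles_per_day * 365)
    print(f"{years:.0f} years")  # prints "274 years"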
No, at best the genocide switched out of "fast mode" and back into "slow mode" again.
The site producing the nonsense has to produce lots of it every time a bot comes along; the trainers only have to filter it once. As others have pointed out, it's likely easy for an automated filter to spot. I don't see it as a clear win.
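To illustrate the kind of cheap filter I have in mind - a rough sketch only, assuming the maze text is Markov-style babble, with thresholds picked purely for illustration:

    from collections import Counter

    def looks_like_cheap_filler(text: str) -> bool:
        """Flag text whose vocabulary is tiny for its length or whose
        trigrams repeat heavily - both typical of the Markov-style
        babble a tarpit can afford to generate on the fly."""
        words = text.lower().split()
        if len(words) < 50:
            return False  # too short to judge either way
        vocab_ratio = len(set(words)) / len(words)
        trigrams = Counter(zip(words, words[1:], words[2:]))
        repeats = sum(n - 1 for n in trigrams.values() if n > 1)
        repeat_ratio = repeats / (len(words) - 2)
        return vocab_ratio < 0.2 or repeat_ratio > 0.3

Something that crude would have false positives, but it only has to be good enough to make the trap not worth the electricity it costs the site.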
It's a blow to the big closed-source AI companies, sure, but hardly a knockout one. If a small company can use a million dollars to produce a neat model, perhaps a big company can use those same techniques and a billion dollars to produce a really neat model. Or at least build a lot more of the infrastructure that goes around those models and makes use of them. GitHub Copilot isn't just selling access to a raw LLM API; they're selling its integration into the Microsoft coding ecosystem. They may have wasted some money on their current-generation AIs, but that's just sunk cost. They've got more money to spend on future AIs.
The main problem will be if Western AI companies are prevented from adopting the techniques being used by these Chinese AI companies. If, for example, there are lots of onerous regulations on what training data can be used, or requirements for extreme "safety guardrails." The United States seems likely to be getting rid of a lot of those sorts of obstructions over the next few years, though, so I wouldn't count the West out yet.
I think it was the 1B model
Well, there you go: you took a jet ski and then complained that it had difficulty climbing steep mountain inclines.
Small models like that are not going to "know" much. Their purpose is generally to process whatever information you give them. For example, you could use one to quickly and cheaply categorize documents based on their contents, or as a natural-language interface for executing commands on other tools.
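As a rough sketch of the document-categorization case (the model name is just a placeholder for whatever small instruct model you actually run locally, and the categories are made up):

    # Toy document categorizer built on a small instruct model via the
    # Hugging Face transformers text-generation pipeline.
    from transformers import pipeline

    CATEGORIES = ["invoice", "support ticket", "press release", "other"]

    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder small model
    )

    def categorize(document: str) -> str:
        prompt = (
            "Classify this document as one of: " + ", ".join(CATEGORIES)
            + ".\n\nDocument:\n" + document[:2000] + "\n\nCategory:"
        )
        # The pipeline returns the prompt plus the continuation, so strip
        # the prompt back off before matching against the category list.
        output = generator(prompt, max_new_tokens=5, do_sample=False)
        answer = output[0]["generated_text"][len(prompt):].strip().lower()
        return next((c for c in CATEGORIES if c in answer), "other")

    print(categorize("Invoice #1042: 3 widgets at $19.99 each, due 2025-03-01"))

The model never needs to "know" anything beyond following the instruction; all the relevant information is in the document you hand it.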
I'd be happy with it. It means that the universe is ours for the taking and the future will belong to our descendants.
If there are already intelligent aliens "out there" then they've got a head start of millions or billions of years on us and we'll never catch up; we'd be completely at their mercy.
No, a few million hits from bots is routine for anything that's facing the public at all. Others have posted in this thread (or others like it; this article's been making the rounds a lot in the past few days) that even the most basic of sites can get that sort of bot traffic, and that a simple recursion depth limit on the crawler is all it takes to avoid the "infinite maze" aspect.
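The depth limit really is as simple as it sounds. Here's a toy sketch, not any particular crawler's actual code, using requests and BeautifulSoup as stand-ins for whatever fetch/parse stack a real bot uses:

    # Toy crawler with a depth cap: links more than max_depth hops from
    # the start page are never followed, so an "infinite maze" can only
    # ever cost a bounded number of fetches. A real crawler would also
    # handle robots.txt, politeness delays, URL normalization, etc.
    from urllib.parse import urljoin
    import requests
    from bs4 import BeautifulSoup

    def crawl(start_url: str, max_depth: int = 3) -> set[str]:
        seen: set[str] = set()
        frontier = [(start_url, 0)]
        while frontier:
            url, depth = frontier.pop()
            if url in seen or depth > max_depth:
                continue
            seen.add(url)
            try:
                page = requests.get(url, timeout=10)
            except requests.RequestException:
                continue
            soup = BeautifulSoup(page.text, "html.parser")
            for link in soup.find_all("a", href=True):
                frontier.append((urljoin(url, link["href"]), depth + 1))
        return seen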
As for AI training, the access log says nothing about that. As I said, AI training sets are not made by just dumping giant piles of randomly scraped text into models any more. If a trainer scraped one of those "infinite maze" sites, the quality of the resulting data would be checked, and if it was generated by anything cheap enough for the site to run economically it'd almost certainly be discarded as junk.
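To be concrete about what "checked" can mean - this is just an illustrative corpus-level pass with arbitrary thresholds, not anyone's actual pipeline: if most of the pages sampled from a domain are near-duplicates of each other, the whole domain gets dropped.

    import hashlib
    from itertools import combinations

    def shingles(text: str, k: int = 8) -> set[int]:
        """Hash the overlapping k-word shingles of a page into a set."""
        words = text.lower().split()
        return {
            int(hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest(), 16)
            for i in range(max(len(words) - k + 1, 1))
        }

    def jaccard(a: set[int], b: set[int]) -> float:
        return len(a & b) / len(a | b) if a or b else 0.0

    def domain_looks_like_junk(pages: list[str],
                               similarity_cutoff: float = 0.7,
                               junk_fraction: float = 0.5) -> bool:
        """Drop a domain if most sampled page pairs are near-duplicates,
        which cheaply templated filler tends to be."""
        sets = [shingles(p) for p in pages[:50]]  # small sample keeps it cheap
        pairs = list(combinations(range(len(sets)), 2))
        if not pairs:
            return False
        similar = sum(jaccard(sets[i], sets[j]) > similarity_cutoff
                      for i, j in pairs)
        return similar / len(pairs) > junk_fraction

That's one cheap check among many; the point is that the filtering cost is paid once by the trainer, while the maze site pays to generate the junk on every visit.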
An even easier way to hide stuff is to not put it online in the first place.
It's still a problem the world needs to deal with, but I don't think it's as deep as everyone thinks.