Oh, sorry, I didn't mean to imply that consumer-grade hardware has gotten more efficient. I wouldn't really know about that, but I assume most of the focus is on data centers.
Those were two separate thoughts:
- Models are getting better, and the tooling built around them is getting better, so hopefully we can get to a point where small models (capable of running on consumer-grade hardware) become much more useful.
- Some modern data center GPUs and TPUs deliver more compute per watt-hour than previous generations did.
Did I claim that? If so, then maybe I worded something poorly, because that's wrong.
My hope is that as models, tooling, and practices evolve, small models will be (future tense) effective enough to use productively, so that we won't need expensive commercial models.
To clarify some things:
There's a difference between efficiency and effectiveness: the hardware is becoming more efficient (more compute per unit of energy), while models and tooling are becoming more effective (better results on the same task). Note, though, that the tooling/techniques that make LLMs more effective also tend to burn a LOT of tokens, as the sketch below illustrates.
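To make that last point concrete, here's a back-of-the-envelope sketch of why multi-step techniques (agent loops, tool use, etc.) burn so many tokens: each turn typically resends the entire conversation history as the new prompt, so total tokens processed grow roughly quadratically with the number of steps. All numbers here are made-up assumptions for illustration, not measurements:

```python
# Hypothetical illustration: an agent loop that resends the full
# history every turn. Both token counts below are assumed values.
SYSTEM_PROMPT_TOKENS = 1_500   # assumed: instructions + tool schemas
TOKENS_PER_TURN = 800          # assumed: one model reply + one tool result

def total_tokens(num_turns: int) -> int:
    """Total tokens processed when the whole history is resent each turn."""
    total = 0
    history = SYSTEM_PROMPT_TOKENS
    for _ in range(num_turns):
        total += history           # the entire history becomes the next prompt
        history += TOKENS_PER_TURN # the history grows every turn
    return total

print(total_tokens(1))   # 1,500   -- a single one-shot call
print(total_tokens(20))  # 182,000 -- a 20-step loop, ~120x the one-shot cost
```

Under these (invented) numbers, a 20-step loop processes roughly 120x the tokens of a single call, which is why "more effective" usage patterns can be so much more expensive to run.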
TL;DR: hardware efficiency and model/tooling effectiveness are improving on separate tracks, and my hope is that small, locally runnable models eventually become effective enough that expensive commercial models aren't necessary.