this post was submitted on 08 Aug 2025

Hacker News

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

HeartyOfGlass@piefed.social, 2 days ago

From the original post:

Sure, they have huge GPU clusters, but there must be more going on - model optimizations, sharding, custom hardware, clever load balancing, etc.

What engineering tricks make this possible at such massive scale while keeping latency low?

The amount of hardware they're using is far more than you're imagining, times a thousand. To answer your question as simply as possible: money. Money to buy nuclear power plants to power this specific technology. Money to market and sell the idea. Money to lobby for legislation so it can keep sucking up copyrighted works. Money, money, and more money is the reason these companies can run this stuff while OP cannot.

Also, and this is mostly just my own paranoia: I don't trust anything these LLM companies say. Their entire product is running on hype right now. Every one of these companies is dumping absurd amounts of money into something that barely works, and they have no incentive to be honest about any of it. If the hype doesn't keep going, they've wasted that money.