The original post: /r/datahoarder by /u/PXaZ on 2025-03-06 01:53:19.
Using the official command line tool, I can seemingly count all of the items in the Internet Archive:
ia search \* -n
The current count is 106,281,161.
This is about on par with Wikimedia Commons, where there are some 100 million media files.
But unlike Wikimedia Commons, for the life of me I cannot find a database dump which gives the full list of item identifiers along with metadata.
The command-line tool can list identifiers and also grab metadata for specific identifiers. Just listing the identifiers is fairly slow, maybe 1,500 items per second, but if that rate holds I could list all of them in about a day (roughly 20 hours). Metadata retrieval, however, runs at about 1 item per second, so getting it all would take more than three years.
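For anyone who wants to check the math, here is a rough Python sketch of those estimates. The item count and rates are just the approximate figures from this post, and the constant and function names are made up for illustration; nothing here comes from the ia tool itself.

```python
# Back-of-the-envelope estimate; all inputs are the approximate
# figures quoted in the post, not measured benchmarks.
TOTAL_ITEMS = 106_281_161   # count reported by `ia search \* -n`
LIST_RATE = 1500            # identifiers listed per second (approximate)
METADATA_RATE = 1           # metadata records fetched per second (approximate)

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def estimate_seconds(total_items, items_per_second):
    """Time in seconds to process total_items at a constant rate."""
    return total_items / items_per_second


listing_hours = estimate_seconds(TOTAL_ITEMS, LIST_RATE) / SECONDS_PER_HOUR
metadata_days = estimate_seconds(TOTAL_ITEMS, METADATA_RATE) / SECONDS_PER_DAY
metadata_years = metadata_days / DAYS_PER_YEAR

print(f"Listing all identifiers: about {listing_hours:.1f} hours")
print(f"Fetching all metadata:   about {metadata_days:.0f} days "
      f"({metadata_years:.1f} years)")
```

At a constant rate, listing comes out a bit under a day and metadata retrieval a bit over three years, which matches the rough figures above.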
Does anyone know of a bulk export of the IA metadata? Or some way of generating one?