Internet Archive - get metadata of all items? (zerobytes.monster)

submitted 11 months ago by bOt@zerobytes.monster to c/datahoarder@zerobytes.monster

The original post: /r/datahoarder by /u/PXaZ on 2025-03-06 01:53:19.

Using the official command line tool, I can seemingly count all of the items in the Internet Archive:

ia search \* -n

The current count is 106,281,161.

This is about on par with Wikimedia Commons, where there are some 100 million media files.

But unlike Wikimedia Commons, for the life of me I cannot find a database dump which gives the full list of item identifiers along with metadata.

The command-line tool can list identifiers, and can also grab metadata for specific identifiers. Even just listing the identifiers is quite slow, maybe 1,500 items per second, but if that rate holds, I could list all identifiers in about a day. Metadata retrieval, however, runs at about 1 item per second, so it would take more than three years to get them all.
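As a rough sanity check on those estimates, here is a back-of-the-envelope calculation. It assumes both observed rates stay constant over the whole run, which is only an approximation; the constant and function names are just illustrative.

```python
# Rough time estimates for crawling the Internet Archive item list.
# The count and rates are the approximate figures quoted in the post;
# treating them as constant over the whole run is an assumption.

ITEM_COUNT = 106_281_161      # items reported by `ia search \* -n`
LIST_RATE = 1500              # identifiers listed per second (approximate)
METADATA_RATE = 1             # metadata records fetched per second (approximate)

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def duration_seconds(items: int, rate_per_second: float) -> float:
    """Return how long it takes to process `items` at a fixed rate."""
    return items / rate_per_second


listing_days = duration_seconds(ITEM_COUNT, LIST_RATE) / SECONDS_PER_DAY
metadata_years = duration_seconds(ITEM_COUNT, METADATA_RATE) / SECONDS_PER_YEAR

print(f"Listing all identifiers: about {listing_days:.2f} days")
print(f"Fetching all metadata:   about {metadata_years:.2f} years")
```

So listing comes out a bit under a day, while one-by-one metadata retrieval lands at a little over three years.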

Does anyone know of a bulk export of the IA metadata? Or some way of generating it?
