this post was submitted on 30 Jan 2025
 
The original post: /r/datahoarder by /u/Refinery73 on 2025-01-30 15:16:23.

Hi everyone,

I’m sitting on a pile of a few hundred thousand PDFs from local government, such as city hall meetings, from half the county.

I’m wondering what to do with them and would like to hear your opinions.

I was able to scrape them easily from the gov website, and the files are public. I see archive value in them for city history and political studies. However, they were created by a bunch of different cities and departments and lack any clear license. The robots.txt didn’t prohibit scraping, but I don’t exactly own them. On the other hand, it’s public government information. I’m not US-based, so I don’t want to discuss the licensing of public documents, but rather how you would approach this dataset.

I thought about putting ‘preservation first’ and ‘public interest’ ahead of everything else, so creating a torrent archive for each city and starting to seed it. However, I’m not sure if someone has a better idea.
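
As a rough sketch of the per-city idea, something like the following could write a checksum manifest into each city's folder before a torrent is created from it, so that downloaders can verify the files later. It assumes the PDFs are already sorted into one folder per city under a common root, which is a hypothetical layout. The names `build_manifests` and `SHA256SUMS` are also just illustrative choices, and creating the torrent itself is left to whatever client or tool you prefer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never loaded fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifests(root: Path) -> dict[str, list[tuple[str, str]]]:
    """Write one SHA256SUMS file per city folder and return the entries.

    Assumption: each direct subfolder of `root` holds the PDFs of one city.
    The "*.pdf" pattern is case-sensitive on most Linux filesystems.
    """
    manifests: dict[str, list[tuple[str, str]]] = {}
    for city_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entries = [
            (sha256_of(pdf), pdf.relative_to(city_dir).as_posix())
            for pdf in sorted(city_dir.rglob("*.pdf"))
        ]
        # Same "hash, two spaces, name" layout that sha256sum produces.
        text = "".join(f"{file_hash}  {name}\n" for file_hash, name in entries)
        (city_dir / "SHA256SUMS").write_text(text, encoding="utf-8")
        manifests[city_dir.name] = entries
    return manifests
```

Calling `build_manifests(Path("archive"))` on a hypothetical `archive/` folder would leave a `SHA256SUMS` file next to each city's PDFs. Anyone with GNU coreutils could then check it with `sha256sum -c SHA256SUMS` from inside that folder.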

There is no public archive for this, and cities have been losing these files left and right when changing platforms without bothering to migrate them. For them, the relevant file is some signed printout in some drawer. They just don’t care.
