It's A Digital Disease!

3976

1

A zine which helped me learn to hoard the internets (zinebakery.com)

submitted 1 year ago by bOt@zerobytes.monster to c/datahoarder@zerobytes.monster

The original post: /r/datahoarder by /u/djnron on 2025-02-01 21:05:16.

https://zinebakery.com/assets/homemade-zines/bakeshop-zines/DIYWebArchiving-DombrowskiKijasKreymerWalshVisconti-V4.pdf

Yeah, so this is probably known here: it's kind of a manual for archiving. Anyway, maybe it is helpful for some folks.

3977

1

Hoarding the Datahoarder Subreddit Community: Discord Server? Community back up plan? (zerobytes.monster)

The original post: /r/datahoarder by /u/paperedbones on 2025-02-01 20:17:49.

First time poster, long time lurker. Recently read an article about Reddit deteriorating, eroded by a fresh wave of bot influx. This may be the usual doomsaying hysteria, but it did lead me to consider - amid all the other hijinks afoot within the US government - that it would be prudent to have a back up method by which the talented & knowledgeable individuals on this subreddit may share their skills with one another in the event of "something happening" to Reddit, eventually.

Basically, suspecting that the enshittification and censorship of the internet is soon to reach new levels of intensity, how can this community & its knowledgebase be backed up?

So this is the question: is there an active Discord server? Does anyone here recommend any other communities where this kind of knowledge is shared?

Personally, I'm not big on small talk and find most of the chatter in most Discord servers inane and needless, but I recognize the usefulness of having a network of intelligent, skillful people as a sort of brain trust. Haha. Maybe the idea is self-defeating: if a server exists, it needs to be active, but if there isn't anything urgent to say or ask, a lot of the activity will generally be rubbish chitchat, and if there's too much rubbish chitchat, most people who value quality exchanges will eventually just leave the server. But maybe I'm mistaken.

I imagine many of you feel similarly, and it would be a loss to all of us if our major means of idea exchange (i.e. this subreddit?) ever collapsed into oblivion. Anyway... your thoughts?

3978

1

What I backed up on M-Disc (old.reddit.com)

The original post: /r/datahoarder by /u/Blood_Wraith7777 on 2025-02-01 19:43:59.

3979

1

data.cdc.gov full archive (zerobytes.monster)

The original post: /r/datahoarder by /u/VeryConsciousWater on 2025-02-01 19:32:44.

Good morning r/DataHoarder,

Many of you have probably seen me working on the CDC datasets archive, but those threads have gotten a bit cluttered and I have a lot of people to notify, so I'm making this a new post.

Over the past several days I've been archiving and uploading a copy of all public datasets formerly available at data.cdc.gov, as of 2025-01-28. This does not include webpages themselves, as those have already largely been archived by projects like EOTArchive and the Wayback Machine.

This upload is now complete and available at https://archive.org/details/20250128-cdc-datasets. For seeders use the file "full-20250128-cdc-datasets-USETHIS.torrent" included in the files or the magnet at the end of this post.

For more context have a look at this post and this post.

Thank you to everyone who requested this important data, and particularly to those who have offered to mirror it. I'll ping everyone who has requested notice ~~in a comment~~, unless you DMed me requesting notice in which case I'll respond to your message.

Happy hoarding everyone!

Brief ETA: Reddit is really not a fan of bulk pinging apparently, so I'll have to go back through the thread to notify everyone. That'll take some time, so apologies for that.

Torrent mirror:

magnet:?xt=urn:btih:3bf9d780d838b6bbc977e9cc6a9530e70ec49732&dn=20250128-cdc-datasets&tr=udp%3A%2F%2Ftracker.0x7c0.com%3A6969%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.free-tracker.ga%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.qu.ax%3A6969%2Fannounce&tr=http%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Fns-1.x-fins.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.bittor.pw%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker-udp.gbitt.info%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.ololosh.space%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.tiny-vps.com%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Fopen.dstud.io%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.dler.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fopentracker.io%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.dump.cl%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.theoks.net%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce
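
For anyone scripting the seeding, a magnet link like the one above can be split into its info-hash and tracker list with Python's standard library. This is only a minimal sketch: the function name `parse_magnet` is made up for illustration, and the example link is a shortened copy of the one in the post with just two of its trackers kept.

```python
from urllib.parse import urlparse, parse_qs

def parse_magnet(magnet: str) -> dict:
    """Split a magnet URI into info-hash, display name, and trackers."""
    parsed = urlparse(magnet)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)  # also percent-decodes the tracker URLs
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info-hash (xt=urn:btih:...) found")
    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }

# Shortened copy of the link from the post (only two of its trackers kept).
EXAMPLE = (
    "magnet:?xt=urn:btih:3bf9d780d838b6bbc977e9cc6a9530e70ec49732"
    "&dn=20250128-cdc-datasets"
    "&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce"
    "&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce"
)

if __name__ == "__main__":
    info = parse_magnet(EXAMPLE)
    print(info["info_hash"])
    for tracker in info["trackers"]:
        print(tracker)
```

Comparing the extracted info-hash against the one in the .torrent file from the archive.org item is a cheap way to confirm that both point at the same data.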

3980

1

How to download YouTube videos on Internet Archive's Wayback Machine? (zerobytes.monster)

The original post: /r/datahoarder by /u/PickleGambino on 2025-02-01 19:15:23.

I have a video that I saved to the Internet Archive using RecoverMyVideo. I saw a Reddit post with this same question 6 years ago, but the link that someone posted to this tool for saving videos didn't work anymore.

3981

1

Price per terabyte isn't your only consideration (i.redd.it)

The original post: /r/datahoarder by /u/gummytoejam on 2025-02-01 18:09:28.

3982

1

Tool to scrape and monitor changes to the U.S. National Archives Catalog (zerobytes.monster)

The original post: /r/datahoarder by /u/itscalledabelgiandip on 2025-02-01 17:44:22.

I've been increasingly concerned about things getting deleted from the National Archives Catalog so I made a series of python scripts for scraping and monitoring changes. The tool scrapes the Catalog API, parses the returned JSON, writes the metadata to a PostgreSQL DB, and compares the newly scraped data against the previously scraped data for changes. It does not scrape the actual files (I don't have that much free disk space!) but it does scrape the S3 object URLs so you could add another step to download them as well.
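
The compare step described above can be sketched without a database. This is an illustration only, not the code from the repository: it assumes each scraped record is a dict carrying a unique ID under a key called `naId` (an assumed field name), and it fingerprints each record's JSON so that two scrapes can be diffed.

```python
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Stable hash of a record's metadata (key order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def diff_snapshots(previous: list, current: list, key: str = "naId") -> dict:
    """Compare two scrapes and report added, removed, and changed record IDs."""
    old = {r[key]: fingerprint(r) for r in previous}
    new = {r[key]: fingerprint(r) for r in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

if __name__ == "__main__":
    # Made-up records, purely to show the shape of the result.
    yesterday = [
        {"naId": 1, "title": "Report A"},
        {"naId": 2, "title": "Report B"},
    ]
    today = [
        {"naId": 2, "title": "Report B (revised)"},
        {"naId": 3, "title": "Report C"},
    ]
    print(diff_snapshots(yesterday, today))
```

Storing only the fingerprint next to each ID keeps the comparison cheap, at the cost of not knowing which field changed; keeping the full previous JSON, as the post describes doing in PostgreSQL, allows a field-level diff.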

I run this as a flow in a Windmill docker container along with a separate docker container for PostgreSQL 17. Windmill allows you to schedule the python scripts to run in order, stops if there's an error, and can send error messages to your chosen notification tool. But you could tweak the python scripts to run manually without Windmill.

If you're more interested in bulk data you can get a snapshot directly from the AWS Registry of Open Data and read more about the snapshot here. You can also directly get the digital objects from the public S3 bucket.

This is my first time creating a GitHub repository so I'm open to any and all feedback!

https://github.com/registraroversight/national-archives-catalog-change-monitor

3983

1

US GOV FTP and HTTP file servers (zerobytes.monster)

The original post: /r/datahoarder by /u/storytracer on 2025-02-01 16:55:27.

I'm currently mirroring all FTP and HTTP file servers of the US federal government I can find. Here's the current status of all downloads. Please let me know if you come across any other sites and I will add them to the download list! I have 150TB of storage available and can get more if necessary.

ftp.cdc.gov: Finished
ftp.opc.ncep.noaa.gov: Finished
ftp.census.gov: ~200GB downloaded, currently offline
ftp.ncbi.nlm.nih.gov: Transferred: 2.416 TiB / 2.866 TiB, 84%, 24.680 MiB/s, ETA 5h18m58s
gml.noaa.gov/aftp/: Transferred: 3.427 TiB / 16.223 TiB, 21%, 38.559 MiB/s, ETA 4d39m42s
ftp.cpc.ncep.noaa.gov: Transferred: 120.415 GiB / 129.118 GiB, 93%, 678.048 KiB/s, ETA 3h44m18s
ftp.emc.ncep.noaa.gov: Transferred: 276.323 GiB / 803.759 GiB, 34%, 2.317 MiB/s, ETA 2d16h45m
ftp.ncep.noaa.gov: Transferred: 1.214 TiB / 1.533 TiB, 79%, 5.659 MiB/s, ETA 16h27m3s
www.ncei.noaa.gov/data/: Transferred: 2.584 TiB / 2.844 TiB, 91%, 29.482 MiB/s, ETA 2h33m41s
ftp.nhc.ncep.noaa.gov: Transferred: 49.360 GiB / 76.977 GiB, 64%, 1.277 MiB/s, ETA 6h9m5s
ftp.nhc.noaa.gov: Transferred: 5.200 GiB / 5.272 GiB, 99%, 20.571 KiB/s, ETA 1h1m4s
ftp.wpc.ncep.noaa.gov: Transferred: 66.062 GiB / 70.366 GiB, 94%, 813.401 KiB/s, ETA 1h32m27s
tgftp.ncep.noaa.gov: Transferred: 209.090 GiB / 927.471 GiB, 23%, 15.391 MiB/s, ETA 13h16m35s
ftp.nlm.nih.gov: Stalled Transferred: 7.441 GiB / 90.150 GiB, 8%, 0 B/s, ETA -
ftp.ngdc.noaa.gov: Transferred: 282.839 GiB / 373.703 GiB, 76%, 3.068 MiB/s, ETA 8h25m31s
ftp.ee.lbl.gov: Stalled Transferred: 351.943 MiB / 351.943 MiB, 100%, 42.538 KiB/s, ETA 0s
gaftp.epa.gov: Transferred: 3.416 TiB / 4.830 TiB, 71%, 51.126 MiB/s, ETA 8h3m36s
ftp.wildfire.gov: Transferred: 1.539 TiB / 1.589 TiB, 97%, 11.657 MiB/s, ETA 1h14m53s
www.ncei.noaa.gov/pub/: Transferred: 414.599 GiB / 441.027 GiB, 94%, 3.209 MiB/s, ETA 2h20m32s
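
The status lines above follow a `Transferred: X / Y` pattern. As a rough sketch, assuming exactly that format, the volume still to be downloaded across all mirrors can be tallied like this (the function names are made up for illustration):

```python
import re

# Binary unit multipliers, matching the KiB/MiB/GiB/TiB suffixes in the list.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

PATTERN = re.compile(
    r"Transferred:\s*([\d.]+)\s*(B|KiB|MiB|GiB|TiB)\s*/\s*([\d.]+)\s*(B|KiB|MiB|GiB|TiB)"
)

def remaining_bytes(line: str):
    """Return bytes still to transfer for one status line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None  # e.g. "Finished" or "currently offline"
    done, done_unit, total, total_unit = match.groups()
    done_b = float(done) * UNITS[done_unit]
    total_b = float(total) * UNITS[total_unit]
    return max(total_b - done_b, 0.0)

def total_remaining_tib(lines) -> float:
    """Sum the remaining volume over all parsable lines, in TiB."""
    total = sum(r for r in map(remaining_bytes, lines) if r is not None)
    return total / UNITS["TiB"]

if __name__ == "__main__":
    sample = [
        "ftp.cdc.gov: Finished",
        "ftp.ncbi.nlm.nih.gov: Transferred: 2.416 TiB / 2.866 TiB, 84%, 24.680 MiB/s, ETA 5h18m58s",
        "gaftp.epa.gov: Transferred: 3.416 TiB / 4.830 TiB, 71%, 51.126 MiB/s, ETA 8h3m36s",
    ]
    print(f"{total_remaining_tib(sample):.3f} TiB left")
```

Lines without a `Transferred:` figure are skipped rather than counted as zero, so the total is a lower bound whenever a server's full size is not yet known.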

3984

2

Archive of NIST chemistry webbook? (zerobytes.monster)

The original post: /r/datahoarder by /u/verticalfuzz on 2025-02-01 13:30:44.

Has anyone archived the data at https://webbook.nist.gov/chemistry/ ?

Can someone help me figure out how, or preferably, do it and share it? I have some storage space but no idea how to archive stuff. This data is very important for research and the chemical/engineering/water/pharma industries.

I believe this may be the same data: https://catalog.data.gov/dataset/nist-chemistry-webbook-srd-69-de237
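
One quick way to answer the first question is to ask the Wayback Machine whether it already holds a snapshot of a page. The sketch below uses the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`); the helper names are made up for illustration. A hit on the landing page says nothing about whether the underlying database records were captured.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"

def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the query URL; timestamp is optional, formatted YYYYMMDD."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"

def closest_snapshot(payload: dict):
    """Pull the closest snapshot URL out of a decoded API response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def check(page_url: str):
    """Query the live API (needs network access)."""
    with urlopen(availability_url(page_url), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `check("https://webbook.nist.gov/chemistry/")` returns the URL of the closest stored snapshot, or `None` if the Wayback Machine reports nothing for that address.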

3985

1

As a new data hoarder, should I worry about the advanced settings in WinRAR or in Windows' native compression options? (zerobytes.monster)

The original post: /r/datahoarder by /u/igmkjp1 on 2025-02-01 13:29:38.

See title.

3986

1

Data.gov no longer shows the number of data sets that’s available (zerobytes.monster)

The original post: /r/datahoarder by /u/juliacakes on 2025-02-01 13:23:49.

I’m checking on mobile, in both Chrome and Safari.

3987

1

Breathing easier (zerobytes.monster)

The original post: /r/datahoarder by /u/Scienceyall on 2025-02-01 12:29:28.

My algorithm found you. I feel better knowing you exist. Your efforts will not be for nothing, Winston Smith.

3988

2

Does Internet Archive have any plans to move their data off U.S. soil? (zerobytes.monster)

The original post: /r/datahoarder by /u/galamsmsmsm on 2025-02-01 09:51:10.

With the way things are going, I wouldn't be surprised if Internet Archive became a target for censorship. Does anyone know if there are backups hosted in other countries or plans to move their data?

In a 2016 blog post, they mentioned that they were planning to host a copy of the archive in Canada and that they have partial copies hosted in Egypt and the Netherlands. Is that still relevant information?

3989

1

Archiving or scraping Brickshelf before it shuts down (zerobytes.monster)

The original post: /r/datahoarder by /u/quinyd on 2025-02-01 08:40:05.

https://brickshelf.com/ is shutting down March 1st.

I’m not well versed in scraping, but it would be sad to see so many Lego albums deleted, and there are lots of custom instructions on there too.

3990

1

Expanding old Storage or replacing it with newer drives. (zerobytes.monster)

The original post: /r/datahoarder by /u/MeepMeep2000 on 2025-02-01 08:21:55.

Hey,

I currently run 2x12TB and want to add more storage.

My main options are:

Buy 2x16TB and make two mirrors, for a total of 28TB

OR

Buy 2x12TB and make a raidz1 for a total of 36TB

Obviously the second option is not only cheaper but also provides more storage.
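
The two totals can be checked with a few lines of arithmetic. This sketch uses the usual rule of thumb (a mirror yields one drive's capacity, raidz1 yields the capacity of all drives but one) and ignores filesystem overhead and the TB versus TiB difference. It also assumes option 2 means a single raidz1 built from all four 12TB drives.

```python
def mirror_tb(drive_tb: float) -> float:
    """A mirror vdev of equal drives stores one drive's worth of data."""
    return drive_tb

def raidz1_tb(drive_tb: float, drives: int) -> float:
    """raidz1 spends roughly one drive's worth of space on parity."""
    if drives < 2:
        raise ValueError("raidz1 needs at least 2 drives")
    return drive_tb * (drives - 1)

# Option 1: keep the existing 2x12TB mirror and add a second 2x16TB mirror.
option_1 = mirror_tb(12) + mirror_tb(16)

# Option 2: buy two more 12TB drives and build one 4-drive raidz1.
option_2 = raidz1_tb(12, drives=4)

print(option_1, option_2)
```

The capacity gap comes from redundancy overhead: two mirrors give up half of the raw space, while a 4-drive raidz1 gives up a quarter, but it also survives only one drive failure in the whole pool.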

The problem is that the second option will lock me further into 12TB drives, while the first allows me to extend more easily with 16TB drives in the future.

Is it still worth it to go with 12TB drives or will prices of higher capacity drives drop quickly enough to already start with a 16TB array?

3991

1

Thank you (zerobytes.monster)

The original post: /r/datahoarder by /u/ladycaviar on 2025-02-01 07:50:52.

Never thought I'd have to think this, much less say it, but to all those of you who save humanity's data, I salute you

you all are heroes in a super weird world

3992

1

How much storage do you have? and how do you get so much? (zerobytes.monster)

The original post: /r/datahoarder by /u/CiaIsMyWaifu on 2025-02-01 06:46:52.

I always remember hearing that storage was really expensive and, growing up with mechanical drives, that higher capacities were more likely to give out with a lot of use. How are storage prices and failure rates in the current era? I'm still using about 4TB between two drives.

3993

1

This is the first time I’m in the sub (zerobytes.monster)

The original post: /r/datahoarder by /u/DangDoood on 2025-02-01 04:18:39.

Y’all probably feel so justified right now… it’s like being a survivalist/doomsday prepper and the zombie apocalypse just happens.

Appreciate y’all

(And of course this is ignoring the genuine fear, insecurity, and worries people are experiencing)

3994

1

US Census Bureau ftp (zerobytes.monster)

The original post: /r/datahoarder by /u/Narrow-Task on 2025-02-01 04:00:46.

Hi fellow hoarders, I noticed the detailed data downloads from the Census Bureau (the FTP site) are down right now. Is this a coincidence or just routine maintenance?

https://www2.census.gov/geo/tiger/TIGER2024/

I would like to save a copy of all of this, as I use it for a lot of personal and professional work. And it's just cool.
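
A first step toward mirroring a directory tree like the one linked above is listing what is in it. The sketch below assumes the server returns a plain HTML index page with ordinary `<a href>` links, which has not been verified for this site; the class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href values from every <a> tag in an index page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def list_children(index_html: str, base_url: str) -> list:
    """Return absolute URLs of entries located below base_url."""
    parser = LinkCollector()
    parser.feed(index_html)
    children = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        # Skip sort links ("?C=N;O=D"), parent links, and off-site links.
        if absolute.startswith(base_url) and absolute != base_url and "?" not in href:
            children.append(absolute)
    return children
```

Entries ending in `/` are subdirectories to descend into and everything else is a file to download. A polite crawler would also pause between requests and resume partial downloads instead of restarting them.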

3995

1

What environmental datasets should I try to preserve? (zerobytes.monster)

The original post: /r/datahoarder by /u/future__fires on 2025-02-01 03:16:30.

First post here. I’ve been lurking for a while until I had enough money saved to build a serious setup, but with the CDC website going down I guess I’ve run out of time. Climate data is extremely important to me, and I don’t even know where to start archiving or what is important, but I expect information on climate change will be sufficiently inconvenient to the Trump admin that it’ll come down soon as well. I’ve also considered the fact that a lot of climate data is kept by universities, and that will be harder for the White House to remove. I feel overwhelmed. I’d appreciate any ideas on where to start, or any insight into whether climate data is stored in enough places and by enough different entities that it will be around for a while. Also, just generally, what do I do? I don’t have the money for terabytes of storage space. I’ve got a desktop PC with about 1TB and a laptop.

3996

1

Thanks everyone! There is airflow now (www.reddit.com)

The original post: /r/datahoarder by /u/DROP_DAT_DURKA_DURK on 2025-02-01 03:09:40.

3997

1

Urgent Request: NIH & FDA Sites (zerobytes.monster)

The original post: /r/datahoarder by /u/Emotional_Bunch_799 on 2025-02-01 02:58:32.

I worked in the infectious diseases field, and I think the following sites are at high risk of being scrubbed.

We need help archiving the following. They require a large amount of storage space due to all their databases:

NIH National Library of Medicine: https://www.ncbi.nlm.nih.gov/

NIH National Institute of Allergy and Infectious Diseases: https://www.niaid.nih.gov/

FDA and their databases: https://www.fda.gov/

The FDA site has been noticeably slower and some pages are unresponsive.

Thank you and I'll donate to organizations that are fighting this!✊

3998

1

all Instagram story savers are capped at 720p!! no more Full HD. (zerobytes.monster)

The original post: /r/datahoarder by /u/uinstitches on 2025-02-01 02:24:14.

the last 1080p story I saved was January 6, and all 35 stories I've ripped since then are 720p. very disappointing, because if I had known I would have screen recorded instead. has Instagram blocked apps from ripping stories at max bitrate?

what apps or websites are u guys using?

3999

3

Trump's US National data purge has begun. How can we help preserve the past for the future? (www.theverge.com)

The original post: /r/datahoarder by /u/Megathreadd on 2025-02-01 01:04:02.

4000

1

Organized continuity effort (zerobytes.monster)

The original post: /r/datahoarder by /u/dominionman on 2025-02-01 01:03:46.

Is there any group organizing an effort to create a shadow instance of "vital sites and information"? I would be willing to bet that many of us have at least some spare space and the ability to host things like cdc.screwfascists.com or whatever to make sure that things are continued. Maybe this could be the beginning of a trusted decentralized register of scientific and historical data. Not to step on Wikipedia's toes.