It's A Digital Disease!

This is a sub that aims at bringing data hoarders together to share their passion with like-minded people.

founded 2 years ago
3701
 
 
The original post: /r/datahoarder by /u/haleemsab14 on 2025-02-08 11:23:07.

Hi

I have 75 files numbered S01E02 to S01E76. I need to rename them so they run from S01E01 to S01E75. What is a simple way to do this? Thanks.
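A minimal Python sketch of one way to do this, assuming every filename contains "S01E" followed by a two- or three-digit episode number (for example "Show.S01E02.mkv") and that all 75 files sit in one folder. The function name shift_episodes is hypothetical. The key detail is the order: when shifting numbers down, renaming the lowest episode first means each target name (E01, E02, ...) has already been vacated, so nothing gets overwritten.

```python
import re
from pathlib import Path

# Assumed naming pattern: "S01E" followed by the episode number, e.g. "Show.S01E02.mkv".
EPISODE_RE = re.compile(r"S01E(\d{2,3})", re.IGNORECASE)


def shift_episodes(folder: Path, offset: int = -1, dry_run: bool = True) -> list[tuple[str, str]]:
    """Renumber every S01Exx file in folder by offset; returns (old, new) name pairs."""
    episodes = []
    for path in folder.iterdir():
        match = EPISODE_RE.search(path.name)
        if path.is_file() and match:
            episodes.append((int(match.group(1)), path))
    # Shifting down: rename the lowest number first so each target name
    # is already free. Shifting up: go highest first instead.
    episodes.sort(key=lambda item: item[0], reverse=offset > 0)
    renames = []
    for number, path in episodes:
        match = EPISODE_RE.search(path.name)
        new_name = f"{path.name[:match.start(1)]}{number + offset:02d}{path.name[match.end(1):]}"
        renames.append((path.name, new_name))
        if not dry_run:
            target = path.with_name(new_name)
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            path.rename(target)
    return renames
```

Run it first with the default dry_run=True and print the returned pairs; if the preview looks right, call it again with dry_run=False to actually rename the files.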

3702
 
 
The original post: /r/datahoarder by /u/4einer on 2025-02-08 10:57:44.

I found tons of posts about this; however, most of them require doing stuff I have no idea about (installing Python, using Docker, etc.), for example this one: https://github.com/rhnfzl/reddit-stash

Any suggestions for a tool that is easy to set up, preferably which supports multiple accounts?

This was the easiest method I've found, so if there are no better alternatives I'll just try it out (unless anyone advises against it for some reason):

https://danielrosehill.medium.com/how-to-keep-a-running-backup-of-your-reddit-account-using-zapier-75adfbeafa93

3703
 
 
The original post: /r/datahoarder by /u/ChestNo90 on 2025-02-08 05:54:19.

I've been scraping some sites that require specific premium webhards. I used to use IDM (Internet Download Manager) and didn't really have to visit the links; just clicking, or using the right-click option, was enough for IDM to grab the download. I still needed premium, and I believe I was still using transfer quota for each file. Today I tried another downloader and suddenly hit the transfer limit for the next few days. I'm pretty sure I've been downloading a lot more than that every day recently, which makes me guess IDM somehow makes the logged transfer usage lower?

3704
 
 
The original post: /r/datahoarder by /u/undisputedx on 2025-02-08 09:34:15.
3705
 
 
The original post: /r/datahoarder by /u/das_zwerg on 2025-02-08 07:06:08.

I've been looking around for a good off-site backup solution. I see many, many people recommend Backblaze, but I can't find anything about their B2 pricing and their sales staff never get back to me. Any idea what costs actually look like? My use case is dumping ~22 GB of data every week.

I don't have any friends with setups I can use nor anyone willing to host another setup of mine for off-site backups.
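B2 storage is billed as a flat monthly rate per terabyte stored, so the bill mostly depends on whether each weekly dump is kept or replaced. The sketch below is a rough estimator under stated assumptions: the rate (about $6/TB/month at the time of writing) is an assumed figure to check against Backblaze's current pricing page, "TB" is taken as 1000 GB, and download, API and any free-tier details are left out (free_gb is an optional, hypothetical allowance). The function name b2_monthly_cost is illustrative, not part of any Backblaze tool.

```python
def b2_monthly_cost(weekly_upload_gb: float, weeks_elapsed: int,
                    price_per_tb_month: float, keep_history: bool = True,
                    free_gb: float = 0.0) -> float:
    """Estimated storage bill (USD) for one month after weeks_elapsed weeks of uploads."""
    if keep_history:
        stored_gb = weekly_upload_gb * weeks_elapsed  # every weekly dump is kept forever
    else:
        stored_gb = weekly_upload_gb  # each dump replaces the previous one
    billable_gb = max(stored_gb - free_gb, 0.0)
    return billable_gb / 1000 * price_per_tb_month  # 1 TB assumed to be 1000 GB


ASSUMED_PRICE_PER_TB = 6.00  # assumption: verify against Backblaze's current B2 price list

for weeks in (4, 26, 52):
    cost = b2_monthly_cost(22, weeks, ASSUMED_PRICE_PER_TB)
    print(f"after {weeks:2d} weeks: {22 * weeks:5d} GB stored, about ${cost:.2f}/month")
```

At the assumed rate, keeping every weekly 22 GB dump for a year adds up to 1,144 GB, roughly $6.86/month by the end of that year; keeping only the latest dump stays around $0.13/month. Pruning old versions with a retention policy lands somewhere in between.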

3706
 
 
The original post: /r/datahoarder by /u/piedamon on 2025-02-08 05:54:58.

I joined this sub this past week after seeing American government sites and datasets go down. I’m reasonably tech-savvy but not a coder, and new to data hoarding. I’ve looked around and seen that there are a few tools that help with downloading.

Which are my priority? I’ve just installed Kiwix and recently saw a thread discussing a few others.

This seems urgent, so what should newcomers and non-Americans like me focus on?

Thanks

3707
 
 
The original post: /r/datahoarder by /u/nicholasserra on 2025-02-08 05:18:39.

Will structure this better tomorrow. In the meantime use this thread for updates, concerns, data dumps, news articles, etc.

Too many one-liner posts are coming in just mentioning another site going down.

Peek at the other sticky for already archived data.

Run an archive team warrior if you wanna help!

3708
 
 
The original post: /r/datahoarder by /u/bluerasberry on 2025-02-07 18:57:48.

As recent posts are discussing, the United States government has recently stopped hosting many datasets. I am trying to develop a Wikipedia article on this topic. I need to know who has said anything about this, which datasets have been removed, and who might be privately providing archival copies. An unusual aspect of this is that the government did not issue documentation of all these changes.

Critically: I need to cite reliable sources for every claim in Wikipedia. Some of the reports on this are likely published in fringe professional newsletters. If anyone is aware of media coverage from reliable sources, could you please share? I can do a general news search, but I cannot know where the field-specific update reports would be posted without someone telling me.

If anyone can post reports here, I can organize them into a Wikipedia article covering the topic. Thanks.

3709
 
 
The original post: /r/datahoarder by /u/ArchonOSX on 2025-02-07 18:52:26.

Food for thought:

"A study looking at the stability of Blu-ray media has shown that overall, BD-Rs (whether they are the dye or the non-dye type) have rather poor stability compared to some CD-Rs and DVD±Rs (Iraci 2018)."

https://www.canada.ca/en/conservation-institute/services/conservation-preservation-publications/canadian-conservation-institute-notes/longevity-recordable-cds-dvds.html

Some other good information here too. Unfortunately, CD-Rs, with the smallest storage capacity, have the longest life span and BD-Rs are towards the short end of the list.

Be prepared to renew your archive every 5-20 years if you are using optical media for storage.

Happy Day!

3710
 
 
The original post: /r/datahoarder by /u/Capable-Commission74 on 2025-02-08 04:10:03.

I’m a data engineer and I’m very passionate about data personally. What’s going on? How can I get in on the action?

3711
 
 
The original post: /r/datahoarder by /u/noideawhatimdoing444 on 2025-02-08 03:00:23.
3712
 
 
The original post: /r/datahoarder by /u/didyousayboop on 2025-02-08 02:51:50.

Not to be confused with Discord...

Discourse forums are visually beautiful, thoughtfully designed, and just a pleasure to use. Here's what the Obsidian forum, running on Discourse, looks like:

Obsidian forum - https://forum.obsidian.md/

Discourse is free, open source software (GPL v2 license). The setup process is extremely fast and easy. I've done it and I'm hardly even tech savvy.

I was very happy with my tiny Discourse forum during the time I had it. I've also enjoyed using Discourse forums that I didn't run.

The cost is very reasonable too. The DigitalOcean droplet that Discourse recommends for smaller communities costs $6/month and the droplet for larger communities costs $12/month. You can also use other hosting companies.

Discourse also offers its own hosting, but the pageview limit is so low that I feel it should be a deal breaker. You have to think not just about how many people will be using your forum regularly, but about what happens if a post gets shared widely online.

My one big gripe with Discourse is that the Wayback Machine can't seem to properly scrape its pages. This is apparently due to JavaScript.

I would just set this up myself right now, except that a lot of people here have ten times the technical skills I do, not to mention many already own servers or rent VPSes with extra capacity.

3713
 
 
The original post: /r/datahoarder by /u/Guardiansaiyan on 2025-02-08 02:43:04.

I see everyone else keeping their data as raw information.

I am not that knowledgeable about how to do that. What I am doing is going wherever I can and saving everything as PDF, Word, .JPG and pretty much any way I can get that information.

Does anyone have some PDF-friendly sites that you think I should get?

I already did the CDC contraception/birth control site, reproductive rights, and the Constitution, and I'm trying to find the time to download Kiwix Wikipedia for offline use.

I am looking for stuff people are not looking at because the data is small and you think it might not have been scraped.

Even data for small programs would do. Like WordPad or a PDF on the history of Yarn (The Sequel).

3714
 
 
The original post: /r/datahoarder by /u/Comfortable_Dropping on 2025-02-08 01:38:52.

Any recs for a single or double non-RAID HDD enclosure with USB and/or eSATA? I’ve seen the options online but the reviews are spotty or questionable. Looking for a solid option for non-unique backups. E.g., the data will be in 2 other places, but I’d like these not to fail and, ideally, be made in the US. HDD size ~8 TB each.

3715
 
 
The original post: /r/datahoarder by /u/lkeels on 2025-02-08 01:34:49.

This one will probably be scrubbed clean soon. I hope it's in someone's queue. I tried and had no luck copying it, but I'm no expert at this. I didn't see any mention of it here. I was also unable to browse it on the Wayback Machine.

3716
 
 
The original post: /r/datahoarder by /u/rocksboulders on 2025-02-08 01:30:11.

Basically a cabinet with 24-hour power, Internet, and access in a self-storage style warehouse. You own your hardware, so you can pull the whole server out anytime. The site would be protected from flood and fire. You pay a monthly storage fee higher than regular self-storage units to cover the added electricity and Internet. Temperature controlled as well? You could buy disk drives and NAS storage at reception.

Use cases I'm thinking of are for photographers/ videographers (professional and personal), data scientists on the go, dark web stuff (criminals?) and of course data hoarders.

What do you think?

3717
 
 
The original post: /r/datahoarder by /u/Expensive-Vanilla-16 on 2025-02-08 00:30:39.

I've been following this sub for a while, mainly because I have a bit of a hoarding tendency lol. Not quite as extreme as most here. Mostly old computer software, drivers, "media" and info I find useful. I've seen and figured out how to grab YouTube videos, which seemed helpful, and it was pretty easy.

Now I've seen a lot of people saving websites due to the government removal of content and shutdown of sites. I don't really have the room for all this, but I would like to back up some of the websites I use as geology resources. I used to copy a few things and just bookmark sites. I've been sorting and backing up a lot of my hoard and came across my geology folder. I clicked on a few random bookmark links and they are gone. Which brings me here to ask: what's easy to use on Linux?

I originally just copied text and pasted it into an office document. What a pain.

Next, saving every page in Firefox one after another was a pain too, and I gave up after about 5 pages.

Next I tried HTTrack, and it didn't work at all. The main page flashed rapidly and all links errored out. I'm pretty sure I selected everything.

None of the stuff I want to save requires a sign-in as of now. Is it possible to save a site and its links so I can browse offline?

If I really would be better off using Windows, I do have a Win10 machine, though I hate using Windows anymore. So freaking slow to boot and always wants to update...
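On Linux, GNU Wget is the usual tool for this: wget --mirror --convert-links --page-requisites --no-parent followed by the site URL downloads the pages plus their images and stylesheets, and rewrites links so the copy can be browsed offline. To show roughly what such a tool does, here is a minimal Python sketch (standard library only) of a same-site, breadth-first crawl. All function names are hypothetical, it does not rewrite links or read robots.txt, it ignores query strings when choosing file names, and it assumes the pages are plain HTML that does not need JavaScript to show its links.

```python
import time
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects every href= and src= value found in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_site_links(html: str, page_url: str) -> list[str]:
    """Absolute, fragment-free links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found: list[str] = []
    for link in collector.links:
        absolute, _fragment = urldefrag(urljoin(page_url, link))
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def http_fetch(url: str) -> tuple[str, bytes]:
    """Download url and return (content type, body)."""
    with urlopen(url, timeout=30) as response:
        return response.headers.get_content_type(), response.read()


def crawl(start_url: str, fetch=http_fetch, max_pages: int = 500,
          delay: float = 1.0) -> dict[str, bytes]:
    """Breadth-first crawl limited to start_url's host; returns {url: body}."""
    seen = {start_url}
    queue = deque([start_url])
    pages: dict[str, bytes] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            content_type, body = fetch(url)
        except OSError as err:  # HTTP errors and network failures
            print(f"skipped {url}: {err}")
            continue
        pages[url] = body
        time.sleep(delay)  # be polite to the server between requests
        if content_type != "text/html":
            continue  # images, PDFs, etc. are saved but not parsed for links
        for link in same_site_links(body.decode("utf-8", errors="replace"), url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def local_path(url: str, out_dir: Path) -> Path:
    """Map a URL to a file under out_dir/<host>/ (query strings are ignored)."""
    parsed = urlparse(url)
    path = parsed.path
    if not path or path.endswith("/"):
        path += "index.html"
    return out_dir / parsed.netloc / path.lstrip("/")


def save(pages: dict[str, bytes], out_dir: Path) -> None:
    """Write every crawled page to disk, creating folders as needed."""
    for url, body in pages.items():
        target = local_path(url, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
```

Usage would look like pages = crawl("https://example.org/geology/") followed by save(pages, Path("mirror")), where the URL is a placeholder. Because links are not rewritten, Wget's --convert-links remains the better choice when the goal is clicking through the copy offline.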

3718
 
 
The original post: /r/datahoarder by /u/ApricotDismal3740 on 2025-02-07 23:45:44.

ftp://ftp2.census.gov/ appears to be back up. If you can, grab as much as you can. It's already gone down once, and my bet is data will start disappearing.
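For grabbing a tree from an FTP server, a minimal sketch with Python's standard ftplib follows. It assumes the server supports the MLSD directory-listing command from RFC 3659; not every FTP server does, and this is unverified for ftp2.census.gov. Files whose local size already matches the size reported by the server are skipped, so the script can be rerun after an interruption. The function names and the default local folder are hypothetical; starting with a small subdirectory rather than "/" is advisable.

```python
from ftplib import FTP
from pathlib import Path, PurePosixPath


def mirror_dir(ftp, remote_dir: str, local_dir: Path) -> int:
    """Recursively copy remote_dir into local_dir; returns the number of files fetched.

    Relies on the MLSD listing command; servers without it need a
    different listing strategy.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    fetched = 0
    for name, facts in ftp.mlsd(remote_dir, facts=["type", "size"]):
        kind = facts.get("type", "").lower()
        remote_path = str(PurePosixPath(remote_dir) / name)
        if kind == "dir":
            fetched += mirror_dir(ftp, remote_path, local_dir / name)
        elif kind == "file":
            target = local_dir / name
            size = facts.get("size")
            if target.exists() and size is not None and target.stat().st_size == int(size):
                continue  # already complete from an earlier run
            with open(target, "wb") as fh:
                ftp.retrbinary(f"RETR {remote_path}", fh.write)
            fetched += 1
        # "cdir"/"pdir" entries (the directory itself and its parent) are skipped
    return fetched


def main(remote_dir: str = "/", local_dir: str = "census_mirror") -> None:
    """Anonymous mirror of one directory tree; start with a small subdirectory."""
    with FTP("ftp2.census.gov", timeout=60) as ftp:  # host taken from the post above
        ftp.login()  # no arguments = anonymous login
        print(f"downloaded {mirror_dir(ftp, remote_dir, Path(local_dir))} files")
```

Calling main() with a specific remote_dir starts the download; nothing runs on import, so the functions can be reused for other anonymous FTP servers.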

3719
 
 
The original post: /r/datahoarder by /u/LaundryMan2008 on 2025-02-07 23:33:04.

I was going to post more frequently, but the increased stress of mobile game app development at college made me forget about the data storage mediums post until it was too late. It might be a good idea to prepare my posts in advance so they can go out on time without much work needed on the day of posting, to minimize stress.

I do like collecting various data storage media and seeing what I have on my wall, but preparing the posts is tedious. The sharing itself is fun, as I like to share my latest acquisitions even if you guys don’t care that much. For those who don’t care much and want my infrequent posts gone, please point me to a better sub to share my latest data storage media. That could be r/vintagecomputing, but I’m not sure they would appreciate it; since I’m mostly doing backup tapes, they might appreciate seeing what I put up, as it’s relevant to their stuff.

Today I don’t have much to share. I found a blue CD-R at the bottom of one of my dad’s many spindles. I had thought blue CD-Rs were a myth, or only found in America, as every CD-R I came across at home and on work experience was green, so I reasonably assumed they didn’t exist or could not be found in the UK (stupid, I know, as I see lots of posts talking about them and comparing them to the green/silver ones).

The main difference is in the quality of the disc. There is a third color, gold/silver, which I don’t have yet and assume is very rare to find. These gold/silver discs use a dye called phthalocyanine and are of the highest quality; they have a 99.99999% chance of not screwing up and wasting a disc during burning. That made them good for archival purposes that needed reliable discs, and for cases where a factory-stamped appearance was needed (very small batches of special-edition music discs, without making a master and hogging a production line).

The next step down in quality is blue discs, which use a dye called azo. People could buy these if they wanted longevity without paying the steep prices that I assume the gold/silver discs commanded back then. They are a slight step down in quality, but they give a nice shade of blue and are readable in most of the fussiest CD drives out there.

The final step down in quality is green discs, which use a dye called cyanine. These are the lowest quality and may fail a disc or two in every 10. They are the worst discs you can buy; they will rot over time and you will lose the data stored on them. But they are fine if you are on a tight budget (pocket money), or want a disc for exchanging software at a piracy convention or for installing some software on a legacy machine that only needs to be used once or twice at most.

There is one final, little-discussed CD-R format: ablative. There isn’t much on the web about it and not much is known about it. The only mention I found while researching the different types of CD-R was an archived website on Archive.org (sparse, but the tabs and hyperlinks provided small morsels of information and one picture giving a brief overview). All we know is that it was used in businesses and datacenters for backing up huge volumes of data compactly, without the massive ablative disks such as the Plasmon LM-xxxx, the Sony writable disk (WD) or the Kodak 6800, which need a large drive and media that some places might not have space for, and with a full-height drive that fits easily into a PC. (These large-format optical disks are my white whales; it would be amazing to find one anywhere or have anyone give me one. I’ve found a Plasmon LM-4000, a CRVdisc and a hyper-rare LaserRecorder re-recordable disk cartridge on eBay, commanding high prices for their rarity, and if a drive for any of these discs shows up in any condition, I’ll take it, as they are rarer than platinum nuggets in a common UK garden.) These discs didn’t come in a caddy for burning, as the laser could come up close and personal to the disc to etch the pits and lands with a lower-power laser. When it was time to put them away, they went into a caddy used by read-only drives, which were significantly cheaper than burners. These discs had hard sectors like DVD-RAM, which was cool to look at.

Also, one final thing: RIP Foone’s Twitter account. It has gone private, which is a massive shame, as it was my main source of information about extremely rare data storage formats that she/they were able to obtain and take good pictures of, along with the associated drives. I sent a follow request, but it has gone unanswered, unfortunately. Her account is https://x.com/Foone if you want to take a look and give it a try.

Thank you for reading this Friday‘s post and I hope you have a great day, if you have any queries, thoughts about the format, additional information or to point out a mistake, please put them in the comments :)

Link to previous post, post 12 (29th week): My data storage mediums, post 12 (29th week) : r/DataHoarder

This is the best picture I managed to capture of the color difference

I like the label design of this disc, feels very high quality/luxury to me, looks like a goldish silver CD-ROM if looked at directly but at an angle reveals the text and lines

That’s where it goes. I will do the same with the mini discs, and if I ever acquire the gold/silver one, I will arrange a “trifecta” triangle with the gold/silver disc on top, for both sizes if all three colors are available in the mini size.

3720
 
 
The original post: /r/datahoarder by /u/MagePages on 2025-02-07 23:15:32.

Forwarded message from a group chat of environmental professionals.

"Hey guys, just a PSA. I've heard indirectly from employees of NREL, the US Fish and Wildlife Service, and the Natural Resources Conservation Service that their databases will be taken offline tonight. I'm not sure what the extent of this will be, but it may be good to download/back up any critical data/material you use from those agencies just in case, if you're able, and probably other related gov agencies as well.

Can confirm. Also a message from a friend: A note for people who use GitHub: if you fork a public repository and the original repository gets deleted, the fork will remain. If you fork a repository that was originally public, and it then goes private and is deleted, the fork will still exist. If you use GitHub, I strongly recommend forking your government repositories.

Heads up, we heard about the database situation from: NREL, EIA, NRCS, and USFWS"
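Forking is done in the GitHub web interface; a complementary step is keeping a local copy that survives regardless of what happens on GitHub. A minimal Python sketch, assuming git is installed and on the PATH: git clone --mirror copies every branch and tag, and later runs fetch updates into the existing mirror. The repository URL in the test is a hypothetical placeholder. Note that issues and pull-request discussions are not stored in the git repository, so they are not captured this way.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def mirror_target(repo_url: str, dest_root: Path) -> Path:
    """https://github.com/owner/name(.git) -> dest_root/owner/name.git"""
    owner, name = urlparse(repo_url).path.strip("/").split("/")[:2]
    return dest_root / owner / f"{name.removesuffix('.git')}.git"


def mirror_command(repo_url: str, dest_root: Path) -> list[str]:
    """First run: full mirror clone. Later runs: fetch updates into it."""
    target = mirror_target(repo_url, dest_root)
    if target.exists():
        # No --prune: branches deleted upstream are deliberately kept locally.
        return ["git", "-C", str(target), "remote", "update"]
    return ["git", "clone", "--mirror", repo_url, str(target)]


def mirror_all(repo_urls: list[str], dest_root: Path) -> list[str]:
    """Mirror each repository; returns the URLs that failed (e.g. already deleted)."""
    failed = []
    for url in repo_urls:
        mirror_target(url, dest_root).parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(mirror_command(url, dest_root))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

If an upstream repository disappears, the update command simply fails for that URL and the previously mirrored copy stays on disk untouched.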

3721
 
 
The original post: /r/datahoarder by /u/Dangerous-Forever-22 on 2025-02-07 23:04:49.

Good day to all. I have recently gotten into data hoarding (specifically old shows and games that have been discontinued or are semi-lost). Are there any tools or guides that can help me do this? I have a YouTube downloader and am learning Wireshark. I also have a recycled external disk that I use to store shows and games. Any advice is welcome.

3722
 
 
The original post: /r/datahoarder by /u/Kade_the_healer on 2025-02-07 21:39:49.

I've got a couple of 3.5" SATA drives that I need enclosures for. While I prefer buying things in person, I seem to be running into a wall where I live. If I were to purchase an enclosure online, what's one with a good rep and a low price tag? I'd prefer not to buy on Amazon, but I know that sometimes things just are what they are.

3723
 
 
The original post: /r/datahoarder by /u/Quotillon on 2025-02-07 21:25:45.
3724
 
 
The original post: /r/datahoarder by /u/Full-Brain8205 on 2025-02-07 20:37:42.

So I am a bit of an eccentric and have a complete home comms room with a dedicated 10 Gb fiber connection, along with a storage array with ~480 TB free.

I want to put the storage array and the spare bandwidth to good use.

Many years ago, I used to run mirrors for Apache and other outlets; however, with the advent of CDNs, etc., they became obsolete.

Now I've fired up a torrent server with the entire fosstorrents.com archive along with Linuxtracker. I still have ample disk space, and seeding both of these has barely made a dent in the available bandwidth.

What other projects could I help with? Ideally torrent-based with an RSS feed so I can grab them automatically.

Any ideas, suggestions, and comments are greatly appreciated in advance.
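Many torrent clients (qBittorrent, for example) have a built-in RSS downloader, which is the simplest route. For a custom pipeline, a minimal Python sketch of the feed-reading step is below. It assumes an RSS 2.0 feed whose items carry either an enclosure of type application/x-bittorrent or a link ending in .torrent or starting with magnet:; Atom feeds and namespaced elements are not handled. Function names are hypothetical, and handing the links to a client (for instance via its watch folder) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

TORRENT_MIME = "application/x-bittorrent"


def torrent_links(rss) -> list[str]:
    """Return .torrent/magnet links from the <item>s of an RSS 2.0 feed (str or bytes)."""
    root = ET.fromstring(rss)
    links = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            url = enclosure.get("url", "")
            if enclosure.get("type") == TORRENT_MIME or url.endswith(".torrent"):
                links.append(url)
                continue
        link = (item.findtext("link") or "").strip()
        if link.endswith(".torrent") or link.startswith("magnet:"):
            links.append(link)
    return links


def poll_feed(feed_url: str, seen: set[str]) -> list[str]:
    """Fetch the feed and return only links not seen on earlier polls."""
    with urlopen(feed_url, timeout=30) as response:
        fresh = [u for u in torrent_links(response.read()) if u not in seen]
    seen.update(fresh)
    return fresh
```

Calling poll_feed on a schedule (cron or a systemd timer) with a persisted seen set keeps only new releases flowing to the client.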

3725
 
 
The original post: /r/datahoarder by /u/synthwavesurferart on 2025-02-07 20:12:40.

Of course, there has been a lot of discussion about the datasets being scrubbed from data.gov at the start of this Trump 2025 administration (I believe the count is around 2000 at present?). I recently stumbled on peculiar information that Biden had also scrubbed a significant chunk of datasets from data.gov around the start of his administration. From 2/6/21 to 2/7/21, I noticed on web.archive.org that the number of datasets available on data.gov decreased from 218,384 to 192,180 (26,204 datasets removed). I am genuinely curious why that happened and which datasets became inaccessible under Biden. It is more obvious what Trump has scrubbed from data.gov and why, but how do we explain the apparent Biden dataset scrub? Furthermore, what kinds of datasets were removed under his administration? I tried looking this up online and no one seemed to sufficiently address it.

https://web.archive.org/web/20210207101043/https://www.data.gov/
