It's A Digital Disease!


This is a sub that aims to bring data hoarders together to share their passion with like-minded people.

founded 2 years ago
4401
 
 
The original post: /r/datahoarder by /u/orschiro on 2025-01-21 12:30:38.

I know of HTTrack, which downloads an entire website plus subpages as HTML.

But I'd like to have them as markdown or text files.

Does anyone know of a free software that can do that?

Thanks!
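
One possible approach, sketched here only as an illustration: keep mirroring with HTTrack and post-process the saved HTML into plain text. The snippet below uses only Python's standard library; the `TextExtractor` class and `html_to_text` function are hypothetical names, and the output is plain text, not full Markdown (headings, links and tables are not converted).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and drops script/style content."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Running `html_to_text` over each `.html` file in the mirrored folder and writing the result next to it as `.txt` would give a text copy of the site.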

4402
 
 
The original post: /r/datahoarder by /u/sickTheBest on 2025-01-21 12:26:03.

Hello,

I am looking for new drives for my NAS, and Amazon currently has ST12000DM0007 certified drives. What are your experiences with them? Are they reliable?

4403
 
 
The original post: /r/datahoarder by /u/Lost_Ride9310 on 2025-01-21 12:19:11.
4404
 
 
The original post: /r/datahoarder by /u/luxfc on 2025-01-21 10:36:14.

Original Title: Got this 8TB QVO SSD for a great price, spent 16 hours doing some S.M.A.R.T tests and after that CrystalDiskInfo is only reporting 22h and 3GB written. Did someone sell a brand new drive or was the firmware altered before selling? (Got it "used" from Cex)

4405
 
 
The original post: /r/datahoarder by /u/Feeling_Usual1541 on 2025-01-21 10:24:02.

Hello,

I would like to hoard a local backup of Wikipedia.

I’ve read the Database Download page on Wikipedia, but most tools seem outdated. XOWA images are from 2014. The MzReader link no longer works.

What would be the best tool in 2025, if there is one, to browse a local backup of Wikipedia?

Thank you.

4406
 
 
The original post: /r/datahoarder by /u/theoldgaming on 2025-01-21 08:02:47.

So... I just hope this is not the wrong sub. I know pretty darn well that microSD cards are a bad idea for long-term storage, but I have a few questions about this very specific lineup of Samsung cards.

Context: I have had a Samsung Pro Plus 256GB microSD for 2 years now.

Questions: how long does a microSD card of this grade/class usually last with low usage? (Provided there would be no premature failure)

What "features" does this have that are not usually talked about (if anyone knows) like ECC?

Do you recommend Samsung microSD cards, or is there a better brand?

What is this particular microSD good for (device)?

How does one "care" for a microSD card?

And anything else I should know.

P.S. Why post here? Because there are a lot of discussions about storage devices here.

4407
 
 
The original post: /r/datahoarder by /u/STORMFIRE7 on 2025-01-21 07:52:40.

Hi there. Is there a browser that, as you browse, downloads each webpage and everything it requires to be viewed offline? Then, when you visit that specific URL again without an internet connection, the browser would just show you the offline version it archived on your previous visit.

I have searched around and found many crawler suggestions such as HTTrack, Heritrix, OpenWayback, SingleFile, etc., but I don’t want to archive entire websites. I only wish to crawl/download the webpage I am currently using so I can visit it later as well if the internet goes out.

4408
 
 
The original post: /r/datahoarder by /u/Difficult-Wasabi-988 on 2025-01-21 07:44:15.
4409
 
 
The original post: /r/datahoarder by /u/SnooSongs1525 on 2025-01-21 07:38:16.

Sorry I know extremely little about this stuff. I just bought a UGreen DXP2800 and have uploaded most of my files and access them using the UGreen client and site. Works great. My concern is that I do have some sensitive stuff and, in the process of doing part time work as a drone pilot, I've come in contact with concerns over companies like DJI where your data from Chinese consumer products can be shared with people you don't want to have it. Is my Ugreen NAS data private when I use their client app? Am I too paranoid?

4410
 
 
The original post: /r/datahoarder by /u/EthanWilliams_TG on 2025-01-21 07:24:29.
4411
 
 
The original post: /r/datahoarder by /u/Jacksharkben on 2025-01-21 05:33:19.
4412
 
 
The original post: /r/datahoarder by /u/primeSir64 on 2025-01-21 05:26:02.
4413
 
 
The original post: /r/datahoarder by /u/NeedstheFacts on 2025-01-21 03:46:18.

Hello All,

I hope this would be an appropriate place for this post, but if not, I apologize in advance.

Currently I have about 20 TB stored on a ZFS array with ~40TB of usable storage. Before I get into the details, I realize the setup isn't ideal; it was what I had at the time, but now I want to update and would like your 2 cents.

My pool is structured in two vdevs with the following setup:

  1. 4 x 14TB Seagate Exos in raidz2
    1. I know this is inefficient, but I wanted double redundancy for whatever reason.
  2. 6 x 4TB HGST in raidz2

I'm using an LSI SAS 9300-16I SAS-to-SATA HBA with 10/16 connections used, running Ubuntu Server 22.04. I realize the OS may not be ideal, but it's what I knew and was comfortable with. My case is a desktop case that I've added extra storage cages to, and it can hold 10 HDDs.

Overall I have used about half, but I'm worried about the 4TB drives and would like to swap them out for more 14TB drives that I now have. The issue is that I'm not sure of the best way to upgrade the pool while retaining the data. Most of this data is not critical, so I only had a local copy (mostly due to not wanting to spend on backups). My first thought was that I need to destroy the pool and rebuild. My plan was to copy everything to a Backblaze B2 bucket, destroy/rebuild, and then redownload. However, this is taking forever to upload with 300/300 FIOS, and I'm worried that the download would also take too long, possibly requiring multiple rsync calls if the connection breaks or I need to restart my server.
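
For a sense of scale, here is a back-of-the-envelope sketch of how long 20 TB takes over a 300 Mbit/s link. The `transfer_days` function and the `efficiency` factor are illustrative assumptions, not measurements; real throughput to a cloud bucket is usually below line rate.

```python
def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb decimal terabytes over a link_mbps link.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    print(f"{transfer_days(20, 300):.1f} days at full line rate")
    print(f"{transfer_days(20, 300, 0.7):.1f} days at an assumed 70% of line rate")
```

At full line rate each direction works out to a little over six days, so an upload plus a later download is on the order of two weeks of continuous transfer.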

I want to replace the 6 x 4TB drives with 2 x 14TB drives and make one single vdev (6 x 14TB raidz2). That would take the number of HDDs in my case from 10 (the maximum I can currently fit) to 6, giving me 4 extra slots in case I need to add a drive or replace anything, while also increasing my pool size.
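
The capacity side of this plan can be checked with a quick sketch. It assumes the simple rule that a raidz2 vdev loses two drives' worth of space to parity and ignores ZFS metadata, padding and reserved space, so real figures will come out somewhat lower. The function names are hypothetical.

```python
def raidz_usable_tb(drives: int, size_tb: float, parity: int = 2) -> float:
    """Approximate usable space of one raidz vdev in decimal TB (parity drives excluded)."""
    if drives <= parity:
        raise ValueError("a raidz vdev needs more drives than parity devices")
    return (drives - parity) * size_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 1e12 / 2**40


if __name__ == "__main__":
    current = raidz_usable_tb(4, 14) + raidz_usable_tb(6, 4)
    planned = raidz_usable_tb(6, 14)
    print(f"current pool: {current} TB ({tb_to_tib(current):.1f} TiB)")
    print(f"planned pool: {planned} TB ({tb_to_tib(planned):.1f} TiB)")
```

Under these assumptions the current two-vdev pool comes to 44 TB (about 40 TiB, matching the ~40TB figure above) and the single 6 x 14TB raidz2 vdev to 56 TB.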

Does anyone have any obvious tips that I'm missing or have I doomed myself with my poor setup?

Thanks for any help, as I'm a new data hoarder and have never attempted something like this before.

4414
 
 
The original post: /r/datahoarder by /u/keigo199013 on 2025-01-21 02:59:34.
4415
 
 
The original post: /r/datahoarder by /u/bennibeatnik on 2025-01-21 02:47:57.

I'm fairly new to gallery-dl for scraping galleries, so if there's a better way to accomplish this, please let me know.

I got tired of using the command line to scrape galleries, so I wrote this script that uses gallery-dl.

At this time it's only for Windows, but I am going to upload a Mac version as well in the coming days.

It's a simple batch file that uses a URL saved in your clipboard and when run, acts on that URL, opens a save dialog box, and prompts for a naming scheme. It then downloads the files to the chosen directory and appends the chosen naming scheme with file numbers. If run again using the same naming scheme and same folder, it checks for the largest number and starts from there.
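
The "continue from the largest number" step can be illustrated with a short Python sketch. This is not the batch file from the repository: the `next_index` helper and the `<name>_<number>.<extension>` file pattern are assumptions made purely for the example.

```python
import re
from pathlib import Path


def next_index(folder: Path, name: str) -> int:
    """Return the next free file number for files named '<name>_<number>.<ext>'.

    The naming pattern is an assumption for this sketch; returns 1 when no
    matching file exists yet.
    """
    pattern = re.compile(rf"^{re.escape(name)}_(\d+)\.[^.]+$")
    highest = 0
    for entry in folder.iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_file():
            highest = max(highest, int(match.group(1)))
    return highest + 1
```

With `trip_1.jpg` and `trip_7.png` already in the folder, `next_index(folder, "trip")` would continue at 8, so a second run with the same name does not overwrite earlier downloads.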

I set the file to load using a keyboard shortcut and button on my stream deck to make things even easier.

It's my first attempt at writing code, so it's definitely not perfect, but I hope some of you find it useful.

If you have any questions, feel free to ask!

https://github.com/bennibeatnik/Gallery-Downloader

4416
 
 
The original post: /r/datahoarder by /u/tuoepiw on 2025-01-21 01:59:56.

Hi there,

There's a good chance this is well known and documented and I just don't know what it's called, so bear with me.

My setup is currently a server chassis that has an LSI 9300-8e. I run two cables from that into a 24-bay box below it, which has a backplane with 4 SAS connectors.

I then use the other two connectors to run back out and connect to the second 24 Bay Box and this all works nicely.

I'm looking at getting a 3rd box to expand further, and it got me wondering if the only way to connect this is by running another 2 cables from the 2nd box to the 3rd box... or is it possible to create a ring where one of the ports on the 9300 goes to Box 1, the other goes to Box 3, and both of the backplanes within them connect to Box 2?

The reasoning is that, now that it's getting a little large, I'd prefer to have the ability for, say, Box 1 to be taken offline while drives from Box 2 and 3 are still available.

4417
 
 
The original post: /r/datahoarder by /u/Zelderian on 2025-01-20 16:09:49.

I run my Plex server on a refurbished mini desktop purchased off Amazon a few years ago, and it does everything I need it to. However, it's stuck on Win10 due to hardware limitations, and I received notice that, since Win10 will be EOL in October, there will be no future updates.

The machine is connected to my local network, and I'm assuming it'd run the same risk as any other computer running on an unsupported OS, where over time, it'll be a continuously bigger risk. Is anyone else in this boat with having to replace old hardware for the sake of future security updates? I'm assuming I know the answer, but is there any workaround to this to avoid unnecessarily upgrading?

4418
 
 
The original post: /r/datahoarder by /u/Majestic-Monitor-157 on 2025-01-20 15:50:13.

What's your process? Thinking about how to restore from both offline and online "cloud" backups.

For example, how do you test restoring your computer from a backup? I'm particularly nervous to test this and wonder if I should try restoring to a different computer to be safe.

Haven't found many resources about this online, even though people stress its importance. Would appreciate resources.

4419
 
 
The original post: /r/datahoarder by /u/cummy-hands on 2025-01-21 00:59:50.

My family and I are wanting to consolidate a bunch of our photos onto a shared service. We thought $10/mo for Google Drive would be good, but we'd like to explore the possibility of hosting something from one of our homes (probably mine).

It wouldn't have to be super fast - just always-accessible, private to us, and if we want to share, easy to do so from our phones and laptops.

What are my best bets for getting the hardware needed for something like this, and are there easy to follow guides that anyone here might recommend from their own experience?

4420
 
 
The original post: /r/datahoarder by /u/butmahm on 2025-01-21 00:39:10.

I have an older C2000 board, tick tick tick until she blows. I want ECC and will run a 6-drive z2 in a Node 304 case. I was eyeing the Supermicro A2SDi-4C-HLN4F for $300 on eBay, but there are a million options. I would prefer low power draw as this is just for backup, not main NAS duties. Xeon E, AMD Pro, etc. seem to be too expensive. Is there anything I should be looking at? I really wish the N100, N150, etc. had ECC, but alas. Thank you in advance.

4421
 
 
The original post: /r/datahoarder by /u/Free_Snails on 2025-01-21 00:34:34.

I downloaded Wikipedia last night. The most recent 102 GB ZIM available on their software was from January 2024.

There's a lot of important events from the rest of 2024 that I'd like a Wikipedia record of.

With the current political situation around the globe, I worry for Wikipedia. Losing it would be our equivalent of losing the library of Alexandria.

Is there any way that I can get a much more recent copy for use on Kiwix?

How often do they usually make these data dumps?

4422
 
 
The original post: /r/datahoarder by /u/DanSantos on 2025-01-21 00:28:54.

Original Title: I've read through the top posts on converting VHS to digital. I've read the guides, but I'm wanting to know if I can convert to a decent quality with this deck. Also, what software should I use on Mac OS?

4423
 
 
The original post: /r/datahoarder by /u/Uoipka on 2025-01-21 00:19:46.

I was recording in OBS with a fairly high bitrate and FPS, producing 1.5-2 GB for roughly 2.5 to 3.5 hours of video (Mediainfo Dump). Currently I convert in HandBrake with qsv_h264 at quality 30 and 20 fps, which results in 600-800 MB.

Any recommendation for reducing file size?

Video in question

Considering the video is mostly static, maybe a different video encoder, filter, or encoder tune would be better?

At the Balanced encode preset it takes ~25 min, and I would like not to wait more than 40 minutes. Technically I have an old PC with an i5-3330 that I could use remotely for a 1-2 hour encode, if that makes sense at all.

I have a 12400F and an Intel Arc A750 GPU.

4424
 
 
The original post: /r/datahoarder by /u/El_Chupachichis on 2025-01-21 00:18:42.

Trying to consolidate old disks (3.5" disks, sub-1.5 TB disks, disks that have been stored for years in a box, etc).

I'm 90% certain the files on them are redundantly copied, but before I toss the original drive, I'm moving files over to a disk for review. What apps (free preferred, modest cost if all the free ones suck or have goofy limits like only scanning the first TB of files, etc.) would you recommend for identifying files that the system believes are redundant?
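
For reference, the usual way such tools decide that two files are redundant is by comparing content: group files by size first, then hash only the candidates that share a size. Below is a minimal Python sketch of that idea, not a replacement for a dedicated deduplication app; `file_digest` and `find_duplicates` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have identical content."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_content[(size, file_digest(path))].append(path)

    return [sorted(group) for group in by_content.values() if len(group) > 1]
```

Each returned group lists paths with the same size and SHA-256 digest, which can then be reviewed by hand before anything is deleted.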

4425
 
 
The original post: /r/datahoarder by /u/silvermir on 2025-01-21 00:15:40.

Hello everyone,

I’m looking to move my data in a professional manner and seeking proven methods and tools. So far, I’ve encountered the following issues:

  • Copying: When copying, the creation or modification dates of files change, which is a disadvantage for me.
  • Moving: When moving, I’ve experienced data loss multiple times due to interrupted network connections, frozen computers, or power outages.

My questions to you:

  1. Moving vs. Copying: Which method do you prefer for transferring large amounts of data?
  2. Recommended Tools: What tools or programs do you use to securely move data while preserving metadata? (e.g., Robocopy, rsync, etc.)
  3. Safety Measures: What measures do you recommend to avoid data loss during interruptions?
  4. Automation: Are there scripts or automation tools that make the process easier and more secure?
  5. Best Practices: Are there general best practices you follow when professionally moving data?
  6. Error Handling: If you’ve moved a large amount of data (e.g., 5 TB) and an error occurs, how do you handle it? Do you verify all data with checksums despite the time it takes, or is there a more efficient solution to ensure data integrity?
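
One common pattern that touches several of these questions is "copy, verify, then delete": the source is removed only after the copy's checksum matches, so an interruption never leaves the destination as the only copy. Here is a minimal Python sketch under stated assumptions: `verified_move` and `sha256_of` are hypothetical helper names, and `shutil.copy2` preserves the modification time, while creation time handling depends on the platform and file system and is not guaranteed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_move(src: Path, dst: Path) -> None:
    """Copy src to dst with metadata, verify the copy, then delete src."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies content plus modification time
    if sha256_of(src) != sha256_of(dst):
        dst.unlink()  # discard the bad copy, keep the source untouched
        raise OSError(f"checksum mismatch while moving {src}")
    src.unlink()  # only reached once the copy is verified
```

If the process is interrupted at any point before the final `src.unlink()`, the original file is still in place and the move can simply be rerun.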

I would greatly appreciate hearing about your experiences and any tips you can share!

Thank you in advance!
