It's A Digital Disease!


This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

2576
 
 
The original post: /r/datahoarder by /u/RunEffective3479 on 2025-03-25 17:51:04.
2577
 
 
The original post: /r/datahoarder by /u/VagueMedal1 on 2025-03-25 17:47:59.

I am starting to rip my 500-600 disc DVD collection (~50 are Blu-ray). I have a few movies/shows stored on my second PC, which has two 4TB HDDs and runs regular Windows. I'm running Jellyfin on that computer just to try things out and see if I actually want to do this. Since I want to rip them all, should I look into a NAS (TrueNAS or OMV) and run Jellyfin as a VM, in Docker, or on a separate PC? Or should I just add more drives to the current PC? (I don't think 8TB is enough, but we'll see.)

It does need to be simple enough that, when I move out of my parents' house, I can walk them through repairs if something happens (e.g., a drive dies). If I can do secure remote management, that could work.

A general-purpose NAS would be nice, but I have to make sure it's powerful enough for Jellyfin streaming; otherwise I'd need a separate PC for that. I'm willing to DIY, but a prebuilt NAS does make things easier.
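A rough way to check the "is 8TB enough?" question is to bound the size of full-disc rips. This is a minimal sketch, assuming the 500-600 total includes the ~50 Blu-rays (so 450-550 DVDs), that every rip is as large as the whole disc, and the nominal capacities DVD-5 = 4.7 GB, DVD-9 = 8.5 GB, BD-25 = 25 GB and BD-50 = 50 GB. Main-feature-only or transcoded rips would come out smaller.

```python
# Bound the storage needed for full-disc (untranscoded) rips.
# Assumption: every rip is as large as the disc's nominal capacity.
DVD_SINGLE, DVD_DUAL = 4.7, 8.5   # decimal GB
BD_SINGLE, BD_DUAL = 25.0, 50.0   # decimal GB

def estimate_tb(dvds: int, blurays: int, dual_layer: bool) -> float:
    """Total size in decimal TB if every disc is single-layer or every disc is dual-layer."""
    dvd = DVD_DUAL if dual_layer else DVD_SINGLE
    bd = BD_DUAL if dual_layer else BD_SINGLE
    return (dvds * dvd + blurays * bd) / 1000

low = estimate_tb(450, 50, dual_layer=False)   # smallest collection, all single-layer
high = estimate_tb(550, 50, dual_layer=True)   # largest collection, all dual-layer
print(f"{low:.1f} TB to {high:.1f} TB")        # 3.4 TB to 7.2 TB
```

Under these assumptions the worst case lands just under 8TB, which leaves little headroom once the collection grows.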

2578
 
 
The original post: /r/datahoarder by /u/Thrillho_Sudaca on 2025-03-25 17:13:47.

I had an old MacBook with Mac the Ripper that I used to rip DVDs, and it would output VIDEO_TS folders, but that MacBook bit the dust. I'd like to find another program that still saves rips as VIDEO_TS folders, but I haven't found any; they all seem to copy to ISO now. Any recommendations?

2579
 
 
The original post: /r/datahoarder by /u/the-real-things on 2025-03-25 12:25:18.
2580
 
 
The original post: /r/datahoarder by /u/itsthelifeonmars on 2025-03-25 15:55:24.

I have about 12,000-13,000 song files I want to save, since they've maxed out my Mac's storage at the moment. I'll also use the drive for other document storage.

It mustn't give me a hard time when connecting to my iMac or MacBook.

Ideally 2TB and reliable.

2581
 
 
The original post: /r/datahoarder by /u/tomorrowplus on 2025-03-25 14:48:49.

What are your thoughts on it?

Self-encryption seems like an important feature, in that it allows for deduplication after encryption: the same file (or block) will produce the same encrypted version regardless of who encrypted it. Assuming the network becomes big, there will be lots and lots of deduplication potential, which will of course result in a lower price per unit.

It will be interesting to see where the price per TB converges. Massive deduplication will lower costs, but on the other hand, all storage will be permanent and there will be plenty of redundancy baked in.
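The "same block, same ciphertext" property described above is usually called convergent encryption: the key is derived from the content itself, so identical chunks encrypt identically and can be stored once. The sketch below only illustrates that idea; it is not the network's actual scheme (the post doesn't describe it), and its SHA-256 keystream is a toy stand-in for a real cipher, so don't use it to protect anything.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || counter) blocks. NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def convergent_encrypt(chunk: bytes) -> tuple[bytes, bytes, str]:
    """Key = hash of the plaintext, so equal chunks give equal ciphertext."""
    key = hashlib.sha256(chunk).digest()
    ciphertext = bytes(a ^ b for a, b in zip(chunk, _keystream(key, len(chunk))))
    address = hashlib.sha256(ciphertext).hexdigest()  # store chunks under the ciphertext hash
    return key, ciphertext, address

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream undoes the encryption."""
    return bytes(a ^ b for a, b in zip(ciphertext, _keystream(key, len(ciphertext))))

store: dict[str, bytes] = {}
for _uploader in ("alice", "bob"):
    key, ct, addr = convergent_encrypt(b"same holiday video chunk")
    store.setdefault(addr, ct)  # the second upload finds the address taken: deduplicated

print(len(store))  # 1
```

The flip side of this design is that anyone holding a copy of a file can compute its address and confirm whether the network stores it, which is a known trade-off of convergent encryption.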

2582
 
 
The original post: /r/datahoarder by /u/force73 on 2025-03-25 14:29:50.

Hi Hoarders,

I'm simply too stupid this week. Please see the image below. I have a complete set of year-based folders (2010 to 2019) with all images/videos from each year (no subfolders, just the files directly there).

I also have collection folders of events where I was the photographer (one folder per event), on the left side.

Question: How can I simply remove all files from the folders in (B) that are already sorted to folders in (A)?

I'd like to reduce duplicate files, but I'm too stupid to manage it. I tried different tools, but most of them search for duplicates in general, not specifically between side A and side B. The tool doesn't have to be free (though that would be nice), but it should just work for this scenario.

Any advice? Thanks.

https://preview.redd.it/rcxhx145juqe1.jpg?width=836&format=pjpg&auto=webp&s=61f13a999d48d266f25b606334c87029955bdfa4
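One way to get the one-directional "delete from B whatever already exists in A" behaviour is to compare file contents by hash. This is a minimal sketch: it matches on exact bytes (renamed copies are caught, re-edited or re-exported versions are not), it only deletes when you pass `--delete`, and the script name and argument order are made up for illustration. Usage: `python dedupe_ab.py <folder A> <folder B>` for a dry run, then add `--delete`.

```python
import hashlib
import sys
from pathlib import Path

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def duplicates_in_b(side_a: Path, side_b: Path) -> list[Path]:
    """Return files under side_b whose exact content already exists under side_a."""
    # Group A by size first: files of different sizes can't be identical,
    # so only A files whose size matches something in B ever get hashed.
    a_by_size: dict[int, list[Path]] = {}
    for p in side_a.rglob("*"):
        if p.is_file():
            a_by_size.setdefault(p.stat().st_size, []).append(p)

    a_hashes: set[str] = set()
    hashed_sizes: set[int] = set()
    dupes = []
    for p in sorted(side_b.rglob("*")):
        if not p.is_file():
            continue
        size = p.stat().st_size
        if size not in a_by_size:
            continue
        if size not in hashed_sizes:  # hash each size group of A once, lazily
            a_hashes.update(file_hash(q) for q in a_by_size[size])
            hashed_sizes.add(size)
        if file_hash(p) in a_hashes:
            dupes.append(p)
    return dupes

if __name__ == "__main__" and len(sys.argv) >= 3:
    side_a, side_b = Path(sys.argv[1]), Path(sys.argv[2])
    really_delete = "--delete" in sys.argv[3:]
    for dup in duplicates_in_b(side_a, side_b):
        print(("deleting " if really_delete else "would delete ") + str(dup))
        if really_delete:
            dup.unlink()
```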

2583
 
 
The original post: /r/datahoarder by /u/Nihan-gen3 on 2025-03-25 14:11:06.
2584
 
 
The original post: /r/datahoarder by /u/drupadoo on 2025-03-25 11:40:01.

Like, what happened to all the old beige boxes from the 2000s? Do recycling centers or repair shops have them lying around?

I have some old drives and a motherboard I want to just drop into an old case for low-importance backup.

It seems wasteful to buy a new case. My performance requirements are super low, and I don't need anything sexy with blinking lights and a full HVAC system.

2585
 
 
The original post: /r/datahoarder by /u/PervertedIncentive69 on 2025-03-25 11:05:35.

Something like this?

https://buy.hpe.com/us/en/storage/tape-storage/c/304612

I have also considered burning a huge stack of M-DISCs, but those are harder to bury in the backyard.

2586
 
 
The original post: /r/datahoarder by /u/eravulgaris on 2025-03-25 10:41:38.
2587
 
 
The original post: /r/datahoarder by /u/sunbeamian on 2025-03-25 07:19:40.

I don't use RAID or anything else; I just have a lot of data. I currently have an optical drive, 7 HDDs, and two NVMe drives in my Fractal Design Define R3. I don't have the money to consolidate the drives into larger ones, so I'm still going to need at least 6-9 HDDs in my build.

Searching around, I could only find the Godlike motherboard with 8 SATA ports and the Taichi and Taichi Lite with 6 ports; everything else seems to have only 4 ports or fewer.

So I'm thinking I will need to rely on either a PCI-E SATA adapter card or a SAS HBA.

I'd never even heard of a SAS HBA before delving into this topic, but I saw a lot of people saying that most SATA adapter cards have bad chipsets or firmware (Marvell in particular), and that you're 'better off with a SAS HBA'. However, I've also read people saying the opposite: that they prefer PCI-E SATA adapters.

As a Windows desktop user that just wants access to more data without any RAID or similar options, what is my best method for drive expansion?

I was looking at a SilverStone ECS06 6-port SATA3 controller, but it seems like SAS HBAs are even cheaper than that (and I'm not sure if its ASMedia controller is worthwhile). Perhaps I should also consider M.2 drive adapters? However, I feel like a PCI-E adapter would be more useful, as I might use the M.2 slot at some stage in the future, while I don't usually have a use for spare PCI-E slots. This is an M.2 option with a JMicron controller I could potentially use: SilverStone ECS07.

Also, if I did get a SAS HBA, what do I need to consider regarding the available PCI-E lanes? I was looking at potentially getting an MSI X670E GAMING PLUS WIFI, which has a spare PCI-E x4 slot.
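The lane question mostly comes down to arithmetic: a PCIe link trains to the lower generation and the narrower width of the card and the slot. This is a minimal sketch, assuming roughly 280 MB/s sequential per HDD and, purely for illustration, a PCIe 4.0 x4 slot (check the board manual for the real slot generation). Many used LSI HBAs are PCIe 2.0 or 3.0 x8 cards; an x8 card also needs a slot that is physically x16-length or open-ended to fit.

```python
# Approximate usable bandwidth per PCIe lane in MB/s after line encoding:
# Gen 2 = 5 GT/s with 8b/10b; Gen 3 = 8 GT/s and Gen 4 = 16 GT/s with 128b/130b.
PER_LANE_MB_S = {2: 500, 3: 985, 4: 1969}

def link_bandwidth(card_gen: int, card_lanes: int, slot_gen: int, slot_lanes: int) -> int:
    """Bandwidth of the negotiated link: lowest common generation, narrowest width."""
    return PER_LANE_MB_S[min(card_gen, slot_gen)] * min(card_lanes, slot_lanes)

def drives_at_full_speed(bandwidth_mb_s: int, per_drive_mb_s: int = 280) -> int:
    """HDDs that can all stream sequentially at once before the link becomes the bottleneck."""
    return bandwidth_mb_s // per_drive_mb_s

# Example: a PCIe 3.0 x8 HBA in a (hypothetical) PCIe 4.0 x4 slot.
bw = link_bandwidth(card_gen=3, card_lanes=8, slot_gen=4, slot_lanes=4)
print(bw, drives_at_full_speed(bw))  # 3940 14
```

Since the drives aren't in RAID, they will rarely all stream at once, so in practice an x4 link is unlikely to be the limit for 6-9 HDDs.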

Also, if I want to run more than 8 SATA drives but my PSU only has power connectors for 8, what options do I have, other than buying a PSU with 10-12 SATA power connectors?

I have also heard that SAS HBA cards without fans need additional cooling in a desktop. What should I consider there?

Thoughts?

2588
 
 
The original post: /r/datahoarder by /u/LilSassy69 on 2025-03-25 05:04:42.

I have a Legion Pro 7i Gen 9 that I want to put an 8TB SSD into, in its second slot. I already have a 1TB Samsung 990 Pro for the OS and a 4TB 990 Pro for storage.

Everything I've read so far has been entirely speculative: no anecdotes from someone who put an 8TB double-sided drive in a laptop and had problems, and no recorded temperatures. So I was hoping someone here has had a good or bad experience with an 8TB drive in a 16" laptop. Most questions about this are really about whether the drive will even fit physically, and I already know mine will, even if it's double-sided.

I assume temperatures should stay close to the 4TB drive's as long as I'm not constantly writing huge files, but I've read otherwise, so any input would be appreciated.

Edit: Video Editing and 3D Modeling/Animation. Currently looking at the WD_BLACK 8TB SN850X
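To record your own temperatures during a render or export, one option is to poll smartmontools. This is a minimal sketch, assuming smartctl 7.0 or later is installed (its `-j` flag prints JSON that includes a `temperature.current` field), that it runs with admin rights, and that the device name is whatever `smartctl --scan` reports on your machine. Usage: `python log_temp.py /dev/nvme0`, then stop it with Ctrl+C.

```python
import json
import subprocess
import sys
import time
from datetime import datetime
from typing import Optional

def current_temp_c(smartctl_json: dict) -> Optional[int]:
    """Pull the current temperature (Celsius) out of parsed `smartctl -j` output."""
    return smartctl_json.get("temperature", {}).get("current")

def read_temp_c(device: str) -> Optional[int]:
    # smartctl's exit status is a bitmask that can be non-zero even when it
    # printed valid data, so parse stdout instead of using check=True.
    result = subprocess.run(["smartctl", "-j", "-a", device], capture_output=True, text=True)
    return current_temp_c(json.loads(result.stdout))

def log_temps(device: str, interval_s: int = 30) -> None:
    """Print a timestamped temperature line every interval_s seconds until interrupted."""
    while True:
        print(f"{datetime.now():%H:%M:%S}  {device}  {read_temp_c(device)} C", flush=True)
        time.sleep(interval_s)

if __name__ == "__main__" and len(sys.argv) > 1:
    log_temps(sys.argv[1])
```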

2589
 
 
The original post: /r/datahoarder by /u/iswaosiwbagm on 2025-03-25 04:51:35.

Hi! My bluray burner (an LG BH14NS40 made in 2012) recently decided that its burning career was to end soon. It can now only burn BD-RE, and not without issues. It hasn't outright failed a burn since I cleaned the lens and lubricated the carriage's acme screw, but the laser diode seems to be failing, despite having only burned around a hundred discs. Some of them were burned at 12X or maybe even 14X though, which apparently really cuts into the lifespan of the blue laser diode.

I have about 1.5 terabytes of data to back up at the moment, but my data collection grows mostly slowly and incrementally, at most a hundred gigabytes per year. I've read that recently manufactured LG Blu-ray burners like the WH14NS40 are not as reliable as they once were. Is that truly the case? Are Pioneer drives really that much more long-lived for mostly burning jobs? I could get a BDR-S13UBK for ~210 USD (300 CAD) vs ~60 USD (90 CAD) for an LG WH14NS40. The external Pioneer drives are ~30% less expensive, but I question their reliability.

I'm also considering migrating away from Blu-ray for my backup needs. As much as I enjoy using optical media, Blu-ray is on the way out. I know the usual wisdom here says to use hard drives for under 50TB of data, but I've had the misfortune of learning twice that when a hard drive dies on a shelf, you lose the data on it, since the media can't easily be separated from the drive itself; that's why I switched to offline media in the form of Blu-ray for my main backup. I'm also clumsy enough to drop the precious backup hard drive when I need it the most, or unlucky enough to get a lightning strike that blows up stuff despite having a UPS (like a stuck bit in the server's Ethernet PHY's receive buffer), so at the very least, I'm looking for something that can be disconnected.

However, the slow transfer speed of BD-RE makes it impractical to do a full backup more than yearly, even with plenty of automation, especially if I want a duplicate set to take offsite. And, ironically, doing a full backup on BD-R at 6x or faster requires too-frequent intervention, even with automation. The only manageable way I've found would be to use 100 GB BD-R media, which still has a slight cost-per-gigabyte advantage if you get it from Amazon Japan. I could then burn a disc in the evening plus maybe a second disc at night, reducing the wall-clock time for a full backup from around a month to about a week.
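The intervention trade-off can be checked with a little arithmetic. 1x Blu-ray is 36 Mbit/s (4.5 MB/s); this sketch assumes an idealised constant burn speed (real drives ramp up, so actual times run longer) and, for the 100 GB discs, an assumed 4x rating.

```python
import math

BD_1X_MB_S = 4.5  # 1x Blu-ray = 36 Mbit/s = 4.5 MB/s

def discs_needed(total_gb: float, disc_gb: float) -> int:
    """Number of discs for a full backup (decimal GB throughout)."""
    return math.ceil(total_gb / disc_gb)

def minutes_per_disc(disc_gb: float, speed_x: float) -> float:
    """Idealised burn time at a constant speed_x."""
    return disc_gb * 1000 / (speed_x * BD_1X_MB_S) / 60

for disc_gb, speed_x in ((25, 6), (100, 4)):
    n = discs_needed(1500, disc_gb)
    print(f"{disc_gb} GB at {speed_x}x: {n} discs, ~{minutes_per_disc(disc_gb, speed_x):.0f} min each")
# 25 GB at 6x: 60 discs, ~15 min each
# 100 GB at 4x: 15 discs, ~93 min each
```

Sixty disc swaps every 15 minutes is the "too frequent intervention" problem, while 15 discs at two per evening/night comes to about 8 days, in line with the week estimated above.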

I would ideally need 2 burners, but I've found a manufacturer-refurbished Quantum LTO-5 SAS tape drive nearby for less than 2 Pioneer Blu-ray burners, so I'm tempted to make the jump to tape. I've also seen new-old-stock LTO-4 drives online at an okay price, but I'm guessing these will need some lubrication or other maintenance before powering them on, right? Also, are there any gotchas to know about with pre-owned SAS HBAs? Or with using a tape drive on Linux?

Another option I'm considering is an SSD that doesn't use QLC flash. Given that it would be plugged in once a week, I don't expect issues with data retention, at least not with a weekly scrub and a monthly full refresh. One or even two 2TiB TLC SSDs cost less than a tape drive, and solid-state media fares better in clumsy hands and doesn't need mechanical maintenance, but I was curious about the downsides of SSDs for cold-ish storage.

Finally, because my upload speed is only 30 Mbit/s and I work from home as a software developer, I'm not sure backing up to the cloud would be feasible.
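A quick estimate of what the uplink allows, assuming the link can be saturated and ignoring protocol overhead (real transfers will be somewhat slower):

```python
def upload_days(total_gb: float, uplink_mbit_s: float, share: float = 1.0) -> float:
    """Days to push total_gb (decimal GB) through an uplink, using `share` of its capacity."""
    seconds = total_gb * 8_000 / (uplink_mbit_s * share)  # 1 GB = 8,000 Mbit
    return seconds / 86_400

print(f"initial 1.5 TB: {upload_days(1500, 30):.1f} days flat out, "
      f"{upload_days(1500, 30, share=0.5):.1f} days at half the link")
print(f"100 GB of yearly growth: {upload_days(100, 30) * 24:.1f} hours")
# initial 1.5 TB: 4.6 days flat out, 9.3 days at half the link
# 100 GB of yearly growth: 7.4 hours
```

So the initial seed is the painful part; keeping up with ~100 GB per year would take only a few overnight sessions.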

Any other advice is very welcome, especially if you know of backup software on Linux that deals efficiently with folder reorganization and file renaming, which would help with using slower media.
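The general technique for surviving renames is to track files by content hash rather than by path; as far as I know, content-addressed tools such as restic and BorgBackup deduplicate this way, so a moved file isn't stored twice. This minimal sketch shows the idea on two hypothetical `{path: sha256}` snapshots, separating files whose bytes are already on backup media from files that genuinely need writing.

```python
from collections import defaultdict

def diff_manifests(old: dict[str, str], new: dict[str, str]):
    """Compare {path: sha256} snapshots; report renames/moves separately from new data."""
    old_by_hash = defaultdict(list)
    for path, digest in old.items():
        old_by_hash[digest].append(path)

    renamed, added = [], []
    for path, digest in new.items():
        if old.get(path) == digest:
            continue  # unchanged
        if digest in old_by_hash:
            # Content already on backup media; only the path changed (or it's a copy).
            renamed.append((old_by_hash[digest][0], path))
        else:
            added.append(path)  # genuinely new bytes to write
    new_hashes = set(new.values())
    removed = [p for p, d in old.items() if d not in new_hashes]
    return renamed, added, removed

old = {"a/x.jpg": "h1", "a/y.jpg": "h2"}
new = {"b/x_renamed.jpg": "h1", "a/y.jpg": "h2", "a/z.jpg": "h3"}
print(diff_manifests(old, new))
# ([('a/x.jpg', 'b/x_renamed.jpg')], ['a/z.jpg'], [])
```

With this split, only the `added` list has to go to slow media; the `renamed` list can be recorded as metadata.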

2590
 
 
The original post: /r/datahoarder by /u/ElectroCosplay on 2025-03-25 02:47:00.

This is from a game I played years back. I’m a cosplayer and have been getting into 3D printing. I want to be able to study this 3D model and make one for my cosplay. Is there a way to extract this file? I’m not very tech savvy.

https://aionpowerbook.com/powerbook/Item/102001046

Thanks in advance

2591
 
 
The original post: /r/datahoarder by /u/PowerHairy on 2025-03-25 00:25:19.
2592
 
 
The original post: /r/datahoarder by /u/kini9 on 2025-03-24 22:57:34.

I use StableBit Scanner to monitor my drives. I'm wondering: if I start encrypting them with VeraCrypt, will StableBit Scanner still work perfectly?

2593
 
 
The original post: /r/datahoarder by /u/fmillion on 2025-03-24 22:28:51.

I have two Seagate 8TB Archive (SMR) drives that I use strictly for offline backup purposes. Both of them were in Seagate USB 3 external enclosures. I originally got these on a Black Friday sale some time back, I knew they were SMR but for offline backup use I had no issues with that.

One of the disks started acting strangely during a backup. It seemed to be taking unusually long to read data during backup verification, sometimes stalling out and sometimes reading around 3-4MB/sec. You might expect that from an unmanaged SMR drive during intensive writes, but generally not during reads. I figured that perhaps the drive could be going bad - it's probably 6 years old now (but it has less than 500 hours of logged power-on time since I bought it on sale strictly to use for offline backup). I decided to go ahead and shuck the drive so I could connect it directly to my HBA.

I powered off the drive and opened the enclosure (which was pretty warm to the touch) and the drive was HOT. Way TOO hot. It was hot enough to burn you if you touched it for longer than a couple of seconds.

I let it cool down, thinking that perhaps the drive was actually going bad (maybe bad bearings or a seal leak?). But I decided it was worth seeing what would happen if I shoved it into my test bench machine. (I have an Icy Dock trayless SAS-capable bay attached to a flashed LSI SAS card; it works great for using cheap SAS drives for offline backups!) It showed up just fine, and I ran a SMART test. The temp was down to 55C, but the temp history log showed it reaching up to 79C! I definitely can't imagine that's "happy" territory for a spinning drive that was only running for a few hours.

I tried a full read test on the drive, and there was no slowdown or any performance issue. The read speed was consistently above 100MB/sec for sequential reads. Most importantly, the drive temp dropped to, and then never exceeded, 43C throughout the entire test. I also ran a random seek test for over 5 minutes, and even then the drive only hit 45C. I ran the backup again, and this time everything went perfectly, even the read-verify step, at the same speeds I'd normally expect from this drive.

Not shucking your drives could actually be worse for them than shucking them and putting them into an appropriate disk shelf with good ventilation!

2594
 
 
The original post: /r/datahoarder by /u/Clive1792 on 2025-03-24 21:52:55.

Maybe unlike me you're actually smart & organised from the get-go, so you never found yourself with a task like this. I, on the other hand, have 1000s and 1000s of photos, videos, documents, all sorts. On top of that, I'd find myself not sure if something was backed up or not, so I'd make a copy to a new drive; I'd maybe even buy a new drive & then copy things over. I know in some cases I've got things (files, folders, sometimes entire drive contents) backed up a number of times on a number of different drives. You may say this is good practice, but I've no idea what's where; it's just scattered with no organisation.

I'd like to organise things so say family photos are together in some kind of order, music is together in some kind of order, random images together, nrop is sorted (way too many files there!) so that when I want to find say a copy of a contract I signed then I know that I need to navigate to XYZ & it's right there, rather than spending hours pulling out all kinds of different drives searching for the needle in a haystack.

So how big was the task you took on & also importantly - how did you do it & how long did it take? Was it a manual file-by-file job that took weeks/months/years or did automated programs help you in parts?

Just feeling a little overwhelmed & wanted to hear how others did it.

2595
 
 
The original post: /r/datahoarder by /u/NameEfficient4047 on 2025-03-24 21:08:55.

Hi all. Long story short, I work for an organization that has been saving audiovisual materials to external hard drives for decades. These files only exist on these hard drives right now, which is obviously not great. We are in the process of creating an asset management system where the files can be migrated (and these drives will serve as back-ups).

For now, I am trying to create a system to "inventory" these drives so we know what's on each one. I'm using a script (batch file) to generate a file manifest for a given drive and including some technical metadata like file name, file path, file size, last modified date. It saves it as a .txt file and I am attaching it as an attachment in an Airtable base, where we're tracking the inventory.

I thought it would be good to generate checksums for these drives so I can monitor their integrity at set intervals (maybe every 6 months?). Most of these drives are 2TB and nearly full. I wrote a PowerShell script to generate SHA256 checksums and export them as a CSV. (I see it's doing this, but it's also generating a .txt file in each subfolder of the drive for each checksum, which I plan to delete once it's completed; I'll also tweak the script so it doesn't do that.)

At this point you may see where this is going. It's been nearly 5 hours and it's not completed yet. I understand SHA256 will take longer than MD5, and that 1.5 TB of mainly audiovisual files will also take a long time. I have been using PowerShell because it can be a bit of an ordeal to install software on our work machines, but I can go that route if need be...

A few newbie questions:

  • Is there a more efficient way to go about this? Or is this length of time unavoidable due to the size of/number of files?
  • Would separate software accomplish this task significantly more quickly than PowerShell?
  • Is it a fool's errand to be generating checksums at all at this point, when there is no duplicative copy to restore files if I discover they are degrading anyway? Should I just hold off on this part of the workflow and revisit it closer to the time we plan on copying these files to centralized storage (with these drives serving as the back-ups)?
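On the speed question: hashing a full drive is usually limited by how fast the disk can be read rather than by the hash algorithm. Assuming roughly 100 MB/s sequential reads from a USB hard drive, 1.5 TB takes about 15,000 seconds, or a bit over 4 hours, before any hashing overhead, so a 5-hour run isn't unusual and switching to MD5 may not save much. The sketch below creates and later re-verifies a single CSV manifest (relative path, size, modification time, SHA-256) without writing anything into the drive's folders. It assumes Python is available on the machine, and the script name and arguments are hypothetical. Usage: `python manifest.py create E:\ drive042.csv`, later `python manifest.py verify E:\ drive042.csv`; keep the CSV off the drive being hashed.

```python
import csv
import hashlib
import os
import sys
from datetime import datetime, timezone
from pathlib import Path

CHUNK = 4 * 1024 * 1024  # 4 MiB reads keep the disk streaming without using much RAM

def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(CHUNK):
            h.update(block)
    return h.hexdigest()

def write_manifest(root: Path, out_csv: Path) -> int:
    """Walk root and write one row per file: relative path, size, mtime (UTC), SHA-256."""
    count = 0
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes", "modified_utc", "sha256"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                p = Path(dirpath) / name
                st = p.stat()
                mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat()
                writer.writerow([p.relative_to(root).as_posix(), st.st_size, mtime, sha256_file(p)])
                count += 1
    return count

def verify_manifest(root: Path, manifest_csv: Path) -> list[str]:
    """Re-hash every file listed in the manifest; return paths that are missing or changed."""
    problems = []
    with manifest_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            p = root / row["path"]
            if not p.is_file() or sha256_file(p) != row["sha256"]:
                problems.append(row["path"])
    return problems

if __name__ == "__main__" and len(sys.argv) == 4:
    mode, root, manifest = sys.argv[1], Path(sys.argv[2]), Path(sys.argv[3])
    if mode == "create":
        print(write_manifest(root, manifest), "files hashed")
    elif mode == "verify":
        bad = verify_manifest(root, manifest)
        print("OK" if not bad else "\n".join(bad))
```

Even without a second copy to restore from, a manifest like this tells you which files changed, which is useful evidence when deciding which drives to migrate first.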

Since we have no record of these drives at all, I will still go forward with the inventory process either way, just so we have a list of what we have. If anyone is curious, in addition to the manifest, I'm assigning a unique barcode to each drive, and recording drive format, connection type, file types present, file manifest (attached as txt file), drive capacity/usage, date of last SMART health check. Definitely open to any other suggestions of important data to be recording while we're at it.

Thank you so much for any guidance and please be gentle as this is not my area of expertise, but I'm desperately trying to learn and do the right thing so we don't lose these audiovisual files forever. Thank you!

2596
 
 
The original post: /r/datahoarder by /u/Significant-Lab-1638 on 2025-03-24 17:44:44.

I want to store a lot of my data, including family photos and other private stuff like game clips, documents, etc. Since both of these go on sale in my country, which one should I pick for long-term storage of around 7 to 10 years or more? The T7 Shield seems more durable and handier, but I've heard that an HDD can last longer with good care. Which one should I pick? I can only pick one.

2597
 
 
The original post: /r/datahoarder by /u/TheTwelveYearOld on 2025-03-24 23:49:58.

For years I've on and off looked for web archiving software that can capture most sites, including ones that are "complex" with lots of AJAX and require logins like Reddit. Which ones have worked best for you?

Ideally I want one that can be started programmatically or via the command line, opens a Chromium instance (or any browser), and captures everything shown on the page. I could also open the instance myself, log into sites, and install add-ons like uBlock Origin. (By the way, archiveweb.page must be started manually.)

2598
 
 
The original post: /r/datahoarder by /u/JamesRitchey on 2025-03-24 23:18:13.

Inspired by this other post, I made a PHP function for copying files, and sorting them into folders by date factors.

Download Link: https://github.com/jamesdanielmarrsritchey/ritchey_copy_and_sort_files_i1

Pros:

  • Open source
  • Copies files, but doesn't do anything with the originals.
  • Can create sub-folders for year, month, and/or day (e.g. '/year/month/day' '/year/month' '/day'), provided the mixture doesn't result in file collisions.

Cons:

  • This is just something I whipped up, so it has had limited testing. Use at your own risk. The largest test I did was with 2,862 files.
  • It relies on an array to store a list of all the files it needs to process, and for its return.
  • Designed with Linux paths in mind. Compatibility with Windows untested, and unknown.

Other Considerations:

  • Uses date modified.
  • Fails on file collision, rather than renaming files.

Example Script:

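<?php
// Load the function from the same directory as this script.
$location = realpath(dirname(__FILE__));
require_once $location . '/ritchey_copy_and_sort_files_i1_v1.php';
// Copy files from Original into date-based folders under Copy.
$return = ritchey_copy_and_sort_files_i1_v1("{$location}/temporary/Original", "{$location}/temporary/Copy", TRUE, TRUE, TRUE, NULL);
// On success, the function returns an array of source/destination pairs.
if (@is_array($return) === TRUE){
    print_r($return);
} else {
    echo "FALSE" . PHP_EOL;
}
?>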

Example Return:

Array
(
    [0] => Array
        (
            [source_file] => /home/user1/Public/ritchey_copy_and_sort_files_i1_v1/temporary/Original/Example 2.txt
            [destination_file] => /home/user1/Public/ritchey_copy_and_sort_files_i1_v1/temporary/Copy/2022/September/21/Example 2.txt
        )

    [1] => Array
        (
            [source_file] => /home/user1/Public/ritchey_copy_and_sort_files_i1_v1/temporary/Original/Example 1.txt
            [destination_file] => /home/user1/Public/ritchey_copy_and_sort_files_i1_v1/temporary/Copy/2020/March/24/Example 1.txt
        )

    [2] => Array
        (
            [source_file] => /home/user1/Public/ritchey_copy_and_sort_files_i1_v1/temporary/Original/Sub Folder/Example 3.txt
            [destination_file] => /home/user1/Public/ritchey_copy_and_sort_files_i1_v1/temporary/Copy/2024/January/3/Example 3.txt
        )

)

2599
 
 
The original post: /r/datahoarder by /u/notpast8 on 2025-03-24 22:09:53.

This is the price I'm seeing in the US. Looking through the rules, I think this is okay to post but feel free to nuke it if I missed something.

WD Ultrastar DC HC550 WUH721818ALE604

New, $16.11/TB, free 2-day shipping, 3 year warranty. 38 left at the time of posting.

https://serverpartdeals.com/collections/18tb/products/western-digital-ultrastar-dc-hc550-wuh721818ale604-0f59266-18tb-7-2k-rpm-sata-6gb-s-512e-3-5-hard-drive

2600
 
 
The original post: /r/datahoarder by /u/Quantum_Key on 2025-03-24 21:56:56.

Hi,

This might be a long shot, but I was hoping to get some advice on downloading some Flash applications from a webpage so I can archive them.

The pages in question form an interactive language learning series called ‘Mi Vida Loca’ which the BBC seems to have abandoned. The files are still hosted on their website but the section of the site has been marked as archived, and not updated.

Each ‘episode’ consists of an interactive, video-based learning experience inside a Flash player. If I use the Pale Moon web browser, I can still access them and play them back.

There are plenty of assets for each episode: MP3 audio, FLV video clips, PNG stills, XML files, and several .SWF files, which I can see in the network panel of the browser inspector.

The bit I’m unsure of is how best to archive these as a whole package, and whether it’s possible to play them back offline exactly as intended. I’m not super knowledgeable when it comes to Flash SWF files and assets, so any advice would be very much appreciated.
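Since the assets already show up in the network panel, one approach is to save that panel's contents as a HAR file (a standard JSON format whose `log.entries[].request.url` fields list every request) and download each asset into a folder tree that mirrors the server's paths. This is a minimal sketch under loud assumptions: that Pale Moon's inspector can export HAR (Firefox-family dev tools offer "Save All As HAR", but I'm not sure Pale Moon's older tools do), that only assets actually requested while you played through the episode get captured, and that the SWFs load their files by relative URL, which is what the mirrored layout relies on. Usage: `python grab_assets.py ep01.har mividaloca_archive`.

```python
import json
import sys
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# File types seen in the network panel for each episode.
ASSET_EXTENSIONS = {".swf", ".flv", ".mp3", ".png", ".xml"}

def asset_urls(har: dict) -> list[str]:
    """Collect unique asset URLs from a HAR export, keeping first-seen order."""
    seen: dict[str, None] = {}
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        if Path(urlparse(url).path).suffix.lower() in ASSET_EXTENSIONS:
            seen.setdefault(url, None)
    return list(seen)

def local_path(url: str, out_dir: Path) -> Path:
    """Mirror host/path locally so relative references inside the SWFs have a chance to resolve."""
    parsed = urlparse(url)
    return out_dir / parsed.netloc / parsed.path.lstrip("/")

def download_all(har_file: Path, out_dir: Path) -> None:
    har = json.loads(har_file.read_text(encoding="utf-8"))
    for url in asset_urls(har):
        dest = local_path(url, out_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
        print("saved", dest)

if __name__ == "__main__" and len(sys.argv) == 3:
    download_all(Path(sys.argv[1]), Path(sys.argv[2]))
```

The `.shtml` pages themselves are deliberately skipped by the extension filter; saving those separately keeps the page that embeds each player.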

I fully understand Flash isn’t developed or supported anymore, but I would love to know if it’s possible to archive these. After all, back in 2009 the service won a BAFTA award for innovation, but as it’s Flash-based, it seems to have been forgotten and left behind.

If anyone is interested in having a look, each episode comes with the interactive video:

https://www.bbc.co.uk/languages/spanish/mividaloca/ep01.shtml

And an extra set of interactive learning tools:

https://www.bbc.co.uk/languages/spanish/mividaloca/ep01_pb.shtml

Many thanks in advance
