It's A Digital Disease!

This is a sub that aims to bring data hoarders together to share their passion with like-minded people.

The original post: /r/datahoarder by /u/NameEfficient4047 on 2025-03-24 21:08:55.

Hi all. Long story short, I work for an organization that has been saving audiovisual materials to external hard drives for decades. These files exist only on those hard drives right now, which is obviously not great. We are in the process of creating an asset management system to which the files can be migrated (these drives will then serve as backups).

For now, I am trying to create a system to "inventory" these drives so we know what's on each one. I'm using a script (batch file) to generate a file manifest for a given drive, including some technical metadata: file name, file path, file size, and last-modified date. It saves the manifest as a .txt file, which I then attach to an Airtable base where we're tracking the inventory.
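
For anyone who wants specifics, the manifest step amounts to roughly the following, shown here in PowerShell rather than as the actual batch file (the drive letter and output path are placeholders, not our real paths):

```powershell
# Rough sketch of the manifest step (PowerShell stand-in for the batch file).
# $drive and $outTxt are placeholders.
$drive  = "E:\"
$outTxt = "C:\Inventory\DRIVE-0001_manifest.txt"

# List every file on the drive with the metadata we track, as plain text.
Get-ChildItem -Path $drive -Recurse -File |
    Select-Object Name, FullName, Length, LastWriteTime |
    Format-Table -AutoSize |
    Out-String -Width 4096 |
    Set-Content -Path $outTxt
```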

I thought it would be good to generate checksums for these drives so I can monitor their integrity at set intervals (maybe every 6 months?). Most of these drives are 2TB and nearly full. I wrote a PowerShell script to generate SHA256 checksums and export them as a CSV. (It is doing this, but it's also writing a .txt file for each checksum into every subfolder of the drive; I plan to delete those once it's complete, and to tweak the script so it doesn't do that. The tweaked version I'm aiming for is sketched below.)
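
Something along these lines, hashing everything into a single CSV with no stray per-folder files (again, the drive letter and output path are placeholders):

```powershell
# Hash every file on the drive and write one CSV, nothing else.
# $drive and $outCsv are placeholders.
$drive  = "E:\"
$outCsv = "C:\Inventory\DRIVE-0001_sha256.csv"

Get-ChildItem -Path $drive -Recurse -File |
    Get-FileHash -Algorithm SHA256 |
    Select-Object Path, Hash |
    Export-Csv -Path $outCsv -NoTypeInformation
```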

At this point you may see where this is going. It's been nearly 5 hours and it's not complete yet. I understand SHA256 will take longer than MD5, and that 1.5 TB of mainly audiovisual files will take a long time regardless. I have been using PowerShell because it can be a bit of an ordeal to install software on our work machines, but I can go that route if need be...
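
To set expectations, one quick sanity check would be to time a single large file and extrapolate from the measured throughput (the sample path here is a placeholder):

```powershell
# Time one large sample file, then extrapolate to the full drive.
$sample  = "E:\path\to\large-sample-file.mov"   # placeholder path
$elapsed = Measure-Command { Get-FileHash -Path $sample -Algorithm SHA256 }

$mbPerSec = ((Get-Item $sample).Length / 1MB) / $elapsed.TotalSeconds
$totalMB  = 1.5 * 1MB   # MB in 1.5 TB (the TB-to-MB factor equals the 1MB constant)
"{0:N1} MB/s -> roughly {1:N1} hours for 1.5 TB" -f $mbPerSec, ($totalMB / $mbPerSec / 3600)
```

(For what it's worth, at the ~100 MB/s a USB spinning drive typically sustains, 1.5 TB works out to over four hours, so the bottleneck is likely the drive itself rather than PowerShell or the choice of SHA256 over MD5.)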

A few newbie questions:

  • Is there a more efficient way to go about this? Or is this length of time unavoidable given the size and number of files?
  • Would separate software accomplish this task significantly faster than PowerShell?
  • Is it a fool's errand to generate checksums at all at this point, when there is no duplicate copy from which to restore files if I discover they are degrading? Should I hold off on this part of the workflow and revisit it closer to the time we copy these files to centralized storage (with these drives serving as the backups)?

Since we have no record of these drives at all, I will go forward with the inventory process either way, just so we have a list of what we have. If anyone is curious, in addition to the manifest I'm assigning a unique barcode to each drive and recording drive format, connection type, file types present, the file manifest (attached as a .txt file), drive capacity/usage, and the date of the last SMART health check (a rough sketch of that check is below). Definitely open to suggestions of other important data to record while we're at it.
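
For reference, the health check amounts to something like this, using the built-in Storage cmdlets; it's best-effort, since many USB enclosures don't pass full SMART data through their bridge chipset:

```powershell
# Best-effort drive health snapshot via the built-in Storage module.
# USB bridge chipsets often block SMART passthrough, so fields may be empty.
Get-PhysicalDisk |
    Select-Object FriendlyName, MediaType, HealthStatus, OperationalStatus

# Error/wear counters, where the enclosure exposes them:
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, PowerOnHours
```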

Thank you so much for any guidance, and please be gentle, as this is not my area of expertise; I'm desperately trying to learn and do the right thing so we don't lose these audiovisual files forever. Thank you!
