The original post: /r/datahoarder by /u/Scott-Michaud on 2024-09-06 23:26:26.
Assuming that I will not just prune dumb files, what is a good method to optimize the following situation:
I personally have clumps of data (e.g. old projects) that contain tonnes (think millions) of tiny files.
I currently back up to the cloud plus a couple of external hard drives. Keeping them incrementally up to date is fine, but I'm currently moving to new external drives (long story involving Ubuntu 24.04 and the old drives using NTFS, which is currently bugged out... plus it gave me self-justification to buy larger drives), and my file copy looks like it'll take a week or two because all the tiny files push the transfer rate down to the tens of kilobytes per second. (When I hit the occasional large file, it jumps to ~150 MB/s. I tried letting it go anyway; we're on Day 3 and still nowhere near complete.)
A lot of these clumps of files are irreplaceable, but unlikely to be opened more than once in like... 10 years.
My thought is to crush each entire folder into an archive format of some sort, so I'll have one file (hundreds of gigabytes) instead of millions of files totaling hundreds of gigabytes. (Much of it is just generated crap that I could delete, but... I don't want to, particularly because I've had "oops, I actually needed that" moments in the past.)
The cost of storing this is perfectly acceptable. Zero compression required. My main concern is that I don't want to try to open it in 2040 and find out the entire archive is dead.
My initial instinct is to use 7-Zip (the original PC is Windows) to dump these folders into .tar format, and then delete the original folder. I know a little about the format, and it seems reasonable to me, but I don't know enough to tell whether there are any edge cases where everything could come crumbling down.
Main questions:
- Is there anything fundamentally stupid about the entire premise of what I'm saying?
- Is uncompressed TAR about peak resilience? Or is there a better format?
- Anything I can do to validate that the data is (initially) perfect before I delete the original?
- Any other advice?
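
For the validation question, one possible approach is sketched below. It is only an illustration under assumptions: it uses Python 3's standard `tarfile` and `hashlib` modules rather than 7-Zip, the helper names `make_tar` and `verify_tar` are made up for this example, and it only compares the contents of regular files (not timestamps, permissions, symlinks, or hard links). The idea is to hash every file inside the archive and compare it against the same file still on disk, and only delete the original folder if nothing is reported.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(fileobj, chunk_size=1 << 20):
    """Hash a binary file object in chunks so large files never sit fully in RAM."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def make_tar(source_dir, tar_path):
    """Pack source_dir into an uncompressed tar (mode "w" applies no compression)."""
    source_dir = Path(source_dir)
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source_dir, arcname=source_dir.name)


def verify_tar(source_dir, tar_path):
    """Return a list of problems; an empty list means every regular file matched."""
    source_dir = Path(source_dir)
    problems = []
    seen = set()
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, symlinks, hard links etc. are not hashed here
            # Member names look like "<folder name>/sub/file"; map them back to disk.
            relative = Path(member.name).relative_to(source_dir.name)
            seen.add(relative)
            on_disk = source_dir / relative
            if not on_disk.is_file():
                problems.append(f"in archive but not on disk: {relative}")
                continue
            with tar.extractfile(member) as archived, open(on_disk, "rb") as original:
                if sha256_of(archived) != sha256_of(original):
                    problems.append(f"content differs: {relative}")
    # Second pass: anything on disk that never showed up in the archive.
    for path in source_dir.rglob("*"):
        if path.is_file() and not path.is_symlink():
            relative = path.relative_to(source_dir)
            if relative not in seen:
                problems.append(f"on disk but not in archive: {relative}")
    return problems
```

With a hypothetical folder `D:/old_project`, usage might look like `make_tar("D:/old_project", "E:/old_project.tar")` followed by `print(verify_tar("D:/old_project", "E:/old_project.tar"))`. The same `verify_tar` check should also work on a .tar produced by another tool, assuming the archive stores the files under the folder's own name as the top-level entry.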
Thanks for your time and your consideration!