The original post: /r/datahoarder by /u/ericlindellnyc on 2025-02-15 06:26:37.

I have a gigantic deduplicating/reorganizing job ahead of me. I had no plan over the years, and I made backups of backups and then backups of that -- proliferating exponentially.

I am using rmlint, since it seems to do the most with the least hardware. dupeGuru was not up to this.

I've had to write a script that moves deeply nested folders up to the top level so that I don't tax my software or hardware with extremely large and complex structures. This is taking a looooong time -- maybe twelve hours for a fifty GB folder.
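
For context, the flattening script is basically this kind of thing (a stripped-down sketch, not the exact script; the source path is a placeholder and the rename-on-collision handling is just one way to do it):

```python
#!/usr/bin/env python3
"""Sketch: hoist deeply nested folders up to the top level of a backup tree."""
import shutil
from pathlib import Path

SOURCE = Path("/Volumes/Backups/old_backups")  # placeholder path

def flatten(root: Path) -> None:
    # Process the deepest folders first so a parent is never moved
    # out from under children that still need to be hoisted.
    for folder in sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if not folder.is_dir() or folder.parent == root:
            continue
        target = root / folder.name
        n = 1
        while target.exists():  # don't clobber a same-named top-level folder
            target = root / f"{folder.name}_{n}"
            n += 1
        shutil.move(str(folder), str(target))

if __name__ == "__main__":
    flatten(SOURCE)
```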

I'm also trying to sort the data by type, and make rmlint dedup one type of data at a time -- again, to prevent CPU bottlenecks or other forms of failure.
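
The type sorting is just extension-based bucketing, so each rmlint run only sees one slice of the data. Roughly this (another sketch; the extension map and paths are made up, and the destination folder has to live outside the source tree):

```python
#!/usr/bin/env python3
"""Sketch: bucket files by extension so rmlint can be run on one type at a time."""
import shutil
from pathlib import Path

SOURCE = Path("/Volumes/Backups/flattened")  # placeholder
DEST = Path("/Volumes/Backups/by_type")      # placeholder; must be outside SOURCE

# Hypothetical extension -> bucket map; extend as needed.
BUCKETS = {
    ".jpg": "photos", ".jpeg": "photos", ".png": "photos",
    ".mp4": "video", ".mov": "video",
    ".mp3": "audio", ".aiff": "audio",
    ".pdf": "documents", ".doc": "documents", ".docx": "documents",
}

def sort_by_type(source: Path, dest: Path) -> None:
    # Snapshot the file list first, since we move files while iterating.
    for f in list(source.rglob("*")):
        if not f.is_file():
            continue
        bucket = BUCKETS.get(f.suffix.lower(), "other")
        target_dir = dest / bucket
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f.name
        n = 1
        while target.exists():  # keep same-named files instead of overwriting
            target = target_dir / f"{f.stem}_{n}{f.suffix}"
            n += 1
        shutil.move(str(f), str(target))

if __name__ == "__main__":
    sort_by_type(SOURCE, DEST)
```

Then rmlint gets pointed at one bucket per run, which keeps any single pass small.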

I've also made scripts that clean up filenames and folder names.
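
Those are simple normalizers, something like this (sketch; the allowed character set is just my choice):

```python
#!/usr/bin/env python3
"""Sketch: normalize file and folder names to a safe character set."""
import re
from pathlib import Path

SOURCE = Path("/Volumes/Backups/by_type")  # placeholder

def clean_name(name: str) -> str:
    # Replace anything outside letters, digits, dot, dash, underscore with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    return cleaned.strip("._") or "unnamed"

def clean_tree(root: Path) -> None:
    # Rename bottom-up so a parent rename doesn't invalidate child paths.
    for path in sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        new_name = clean_name(path.name)
        if new_name != path.name:
            target = path.with_name(new_name)
            if not target.exists():  # skip rather than collide with a sibling
                path.rename(target)

if __name__ == "__main__":
    clean_tree(SOURCE)
```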

It's taking so long that I'm tempted to just run rmlint on everything now and let it deal with the deeply nested folders, but I'm afraid it might gag on the data. I'm also thinking of using rmlint's merge-directories feature, but it sounds experimental, and I don't fully understand it yet.

Moral of the story -- keep current with your data organization, and have a good backup system.

I'm using a 2015 iMac 27" with macOS Monterey: 4 GHz clock, 32 GB RAM.

Any pointers on how I can proceed? Thanks.
