The original post: /r/datahoarder by /u/Durnehvihr69 on 2025-07-24 15:10:57.

My mother has an assortment of cassette tapes with recordings of old concerts (60s Santana, 80s Grateful Dead, my dad’s band from college, etc.) that she would like to have digitized so that she can easily listen to them again (and for preservation purposes). Where would be the best place to upload these? If it’s a well-known band, would uploading to YouTube or Spotify for easy listening be a problem if it’s a concert recording rather than an album?

 
The original post: /r/datahoarder by /u/Kirbo96 on 2025-07-24 14:56:31.

I need some HDDs I can occasionally plug into my desktop PC to back up my media collection.

My plan is to buy about 3 larger SATA drives (maybe 5 TB each) to store all of my files, duplicates and all. I will plug these into my desktop periodically to check them with a CRC and verify them against a hash. Otherwise they will live in boxes in different houses. I will rotate a new HD in every few years.

I will also have 3 smaller SATA drives for consolidated file storage, with all the duplicates removed and my collection pruned and labelled. These will get the same verification and then “live in a box” treatment as the larger drives, unless something fails in my main PC and I need to restore data.

The idea for the large “ALL FILES!” drives is for peace of mind that I don’t accidentally delete something forever while scanning for duplicate files/trimming down my collection.

Questions:

  1. I’ve heard that where I buy drives matters, as Amazon tends to be rough with their drives and some arrive partially damaged (and then fail). All prices being equal, what online portals should I buy from?
  2. Should I look for Enterprise grade HDs for this use case, or is it unnecessary?
  3. Should I care about SMR or CMR for the big cold storage drives? I feel like since I won’t access them often, it doesn’t matter much.
  4. Since I’m looking for longevity and reliability, should I avoid refurbs?
  5. Any obvious holes in this plan that I’m overlooking?
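
The "verify with a hash" step in the plan above could be sketched roughly like this. It is only a minimal illustration, assuming SHA-256 as the hash and an in-memory manifest; the function names (`sha256_of`, `build_manifest`, `verify`) are made up for this example and do not come from any particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return one line per file that is missing or no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

In practice the manifest would presumably be built once when the drive is filled, saved somewhere other than the drive itself (for example as JSON), and passed to `verify` each time the drive comes out of its box. An empty result means every listed file still hashes to the recorded value.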
 
The original post: /r/datahoarder by /u/bankroll5441 on 2025-07-24 14:10:46.

Hey all,

I was hoping this sub could offer some advice on best practices for implementing immutable backups. Backups are something that I've been putting off for my homelab and recently began focusing on, as losing some of this data due to a compromised device, corruption, device failure, etc. would be a huge pain in the butt. I'll try to explain as best I can what I've implemented so far.

I'm currently using Borg Backup for full system backups on all necessary devices. Backups are pushed via ssh to a server where I've mounted a spare 2TB NVMe drive in an M.2 enclosure. Backups run automatically via a script tied to a systemd timer. Current de-duplicated data on the backups drive is about 1.2TB as of writing. My prune settings are: Daily - 7, Weekly - 4, Monthly - 3. I've stored the repo keys and passphrases for each device both physically and digitally (encrypted with gpg, credentials stored on YubiKeys). The only data redundancy I have at the moment besides device level is a sync of that drive to my Nextcloud, where data is stored on HDDs in RAID 1 (also on the same server).

I have a spare 4TB HDD that I could store the immutable backups on, but I'm just struggling to develop a way to implement it correctly. I would definitely be mounting the HDD on a different machine than the one receiving the Borg Backups. I would like the immutable backups drive to store all Borg archives without pruning. I understand I could use rsync to sync the two drives and automate it, but that would present a potential vulnerability with the drive being writable at the time of syncing. Would I have any issues running rsync with chattr +a on? Sudo perms are tied behind YubiKeys on almost every device, so I'm leaning towards this option.
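
The "keep everything, never prune" idea above can be illustrated with a small add-only mirror. This is a sketch under stated assumptions, not a replacement for rsync or `chattr +a`: the function name `add_only_mirror` is hypothetical, it only ever creates files that do not exist yet on the destination, and it says nothing about whether a Borg repository copied file by file like this stays consistent.

```python
import shutil
from pathlib import Path


def add_only_mirror(src: Path, dst: Path) -> list[str]:
    """Copy files that are new in src to dst; never overwrite, never delete."""
    copied = []
    for item in sorted(src.rglob("*")):
        if not item.is_file():
            continue
        rel = item.relative_to(src)
        target = dst / rel
        if target.exists():
            continue  # anything already on the destination is left untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        # "xb" is exclusive-create: it raises instead of clobbering a file
        # that appeared between the check above and this open.
        with item.open("rb") as fin, target.open("xb") as fout:
            shutil.copyfileobj(fin, fout)
        copied.append(rel.as_posix())
    return copied
```

The design choice here is that a file changed or deleted on the source after the first copy never propagates, which is the property the immutable drive is meant to have. The trade-off is that the destination only ever grows.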

I'm trying my best to achieve the 3-2-1 rule, but unfortunately storing this data on the cloud seems to be very expensive, so I'm leaning into redundancy and security of the data. I know I'm taking a risk with all of it being on site but at the moment it seems to be my only option.

Any advice or recommendations would be appreciated, for both my Borg backup flow and for immutable backups!

 
The original post: /r/datahoarder by /u/Putrid_Draft378 on 2025-07-24 13:20:46.
 
The original post: /r/datahoarder by /u/ActuallyApathy on 2025-07-24 12:57:52.
 
The original post: /r/datahoarder by /u/Duck_Dur on 2025-07-24 12:33:49.

Hello all,

How should I go about upgrading my NAS drives? I currently have 4 8TB drives in a RAID 10 config. How would I upgrade to, say, 12TB drives without losing any of my data during the upgrade, while still keeping RAID 10?
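
For reference, the capacity side of this question is simple arithmetic. The sketch below assumes a conventional RAID 10 in which every member is treated as being as large as the smallest drive and half the raw space goes to mirroring; vendor-specific hybrid schemes may behave differently, and `raid10_usable_tb` is just an illustrative name.

```python
def raid10_usable_tb(drive_sizes_tb: list[float]) -> float:
    """Usable capacity of a conventional RAID 10, limited by the smallest drive."""
    count = len(drive_sizes_tb)
    if count < 4 or count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    # Each drive contributes only as much as the smallest member,
    # and mirroring halves the total.
    return min(drive_sizes_tb) * count / 2


current = raid10_usable_tb([8, 8, 8, 8])        # the existing array
halfway = raid10_usable_tb([12, 12, 8, 8])      # two drives swapped so far
upgraded = raid10_usable_tb([12, 12, 12, 12])   # all four drives swapped
```

Under that assumption, a half-upgraded array is still limited by the remaining 8TB drives, so the extra space would only show up once all four drives have been replaced.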

 
The original post: /r/datahoarder by /u/SweetRefrigeratr3012 on 2025-07-24 09:49:52.

Hi everyone,

I’m using dupeGuru to find duplicate photos, but I’m running into two big problems, and I’m wondering if I’m missing a setting or doing something wrong.

  1. It doesn’t find all duplicates in one scan! I have a large photo collection (over 100,000 files). When I run dupeGuru, it only finds a few hundred duplicates. Then, after deleting them and scanning again, it finds more. I have to repeat this process many times. Is there a way to make it find all duplicates in one go?
  2. Sometimes when I find duplicates, I can’t select them to delete right inside the app (the checkbox is greyed out). Instead, I have to click the result, open the folder, and manually delete the file in Explorer.

Any help or tips would be really appreciated. Thanks in advance!
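
As a cross-check on the scan results described above, byte-identical duplicates can be listed independently of any GUI tool. This is a minimal sketch, not how dupeGuru works internally: it groups files by size and then by SHA-256, so it finds only exact copies and misses resized or re-encoded photos. The names `file_digest` and `find_exact_duplicates` are made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-identical."""
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        for p in paths:
            by_content[(size, file_digest(p))].append(p)

    # Sort inside and across groups so the output order is stable.
    return sorted(sorted(g) for g in by_content.values() if len(g) > 1)
```

Because it only reads and reports, nothing is deleted. Each returned group could be reviewed by hand before anything is removed.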

 
The original post: /r/datahoarder by /u/hiroo916 on 2025-07-24 09:21:53.
 
The original post: /r/datahoarder by /u/maxtrix7 on 2025-07-24 09:03:37.

Hi, I'm looking for an affordable way to fill my new NAS without breaking the bank.

HDD Toshiba MG07ACA14TE 14 TB 3,5" 8,89 cm 6G SATA 7,2K P/N: HDEPW10CGA51

The listing says it comes with 1 year of warranty and costs 160 euros

Is it worth buying? I want to populate a Synology DS418play.

 
The original post: /r/datahoarder by /u/mikeage on 2025-07-24 09:02:49.

Hi, I have about 1800 journal articles archived and I'm looking for an easy way to query them. All have full text (no weird OCR limitations), but they're in different languages with a lot of transliteration (and often inconsistently so), so I'm thinking that a simple keyword search is probably not sufficient.

I use paperless-ngx to index documents, and I looked at adding paperless-ai to it, but when I tried with my current archives, I was very underwhelmed (and frustrated; it tagged a lot of my stuff with nonsense and the Reset option, which I understood from the documentation would remove the changes it made, didn't, so I'm a bit bitter about having to manually undo a lot). But in any case, the way it organizes by correspondent and type is probably not really what I want.

Any suggestions for something that might be more suited for this type of indexing?
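
To illustrate why plain keyword search struggles with the inconsistent transliteration mentioned above, here is a small tolerant-matching sketch using only the Python standard library. It is an assumption-laden toy, not a recommendation of a specific indexing tool: it strips diacritics and compares strings by similarity ratio, and `normalize`, `fuzzy_find`, and the 0.8 cutoff are all illustrative choices.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Casefold and strip combining marks, e.g. 'Dvořák' becomes 'dvorak'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def fuzzy_find(query: str, vocabulary: list[str], cutoff: float = 0.8) -> list[str]:
    """Return vocabulary terms similar to the query, best match first."""
    q = normalize(query)
    scored = []
    for term in vocabulary:
        score = difflib.SequenceMatcher(None, q, normalize(term)).ratio()
        if score >= cutoff:
            scored.append((score, term))
    return [term for score, term in sorted(scored, key=lambda pair: -pair[0])]
```

With this, a query such as `fuzzy_find("Chaikovsky", ["Mozart", "Tchaikovsky"])` matches the "Tchaikovsky" spelling even though the strings differ. Comparing every query against every term will not scale well, so for 1800 full-text articles this would at most serve as a way to expand a query into its spelling variants before handing it to a real search index.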

 
The original post: /r/datahoarder by /u/Linuxdr0ptips on 2025-07-24 08:37:16.

Do you think it is reasonable to pay 878 USD for a WD SN850X 8TB NVMe SSD? Or is there another SSD you would recommend?

 
The original post: /r/datahoarder by /u/manzurfahim on 2025-07-24 06:20:00.