The original post: /r/datahoarder by /u/redditAgain3x on 2024-08-31 22:24:08.
Hello DataHoarders! I am trying to figure out how to set up the simplest local offline data backup solution I can, with a focus on medium- to longer-term data integrity (preventing data rot), mainly from macOS and other OSes (using ZFS or something similar, and as automated as possible). The data ranges from highly active to in need of longer-term archiving.
Please forgive my ignorance, as I'm new both to reddit and to this topic, and comp sci isn't my calling. I've been researching this as much as I can, but I'm finding it a rather complicated and confusing rabbit hole, and I'm trying to come up with a workable solution as soon as I can... my most limited resource on this project is time.
Due to iOS development (and wanting to avoid fighting with 'non-native' OS/FS/hardware) I mainly have to use macOS and APFS on one of the computers (a Mac mini M2) that produces the data. So:

- How do I get this to work with ZFS (or something with similar data-integrity capabilities)?
- Does a DAS, a NAS, or neither make more sense here?
- Is there a way to use ZFS in this context without building an entire separate computer (with RAID[X], ECC, etc.), or is that inevitable?
- If I can't use ZFS on the host computer, is DAS already out of the question? If DAS is a viable option, how do I use it with the Mac while avoiding USB (that causes problems with ZFS, right?)?
- If a NAS makes more sense, how do I keep it as 'offline' and as secure as possible to protect against malware, etc.?
- Though I'd prefer not to, if I do have to build a separate computer for this, what would be the fastest, easiest 'min-spec' setup I should focus on to get it working and sufficiently usable?
As far as I'm aware, APFS with Time Machine (and/or SD or CCC) doesn't provide nearly the same data-integrity functionality that ZFS does (e.g. no checksums for user data, and what it does provide is more obscure and harder to verify yourself). I was originally hoping I could just do manual backups to some external disks, but once I became aware of how important data integrity / file fixity is (and how awesome something like ZFS is compared to other tools), I can't 'un-see' that now. I then thought maybe I could do manual backups using checksums, but that seems like a horribly slow and inefficient long-term solution, especially for active data that will most likely keep growing in size. Zooming out, this would be part of a 3-2-1 or x-x-x strategy for me, with varied media as suitable (SSD, HDD, and then something like archival-grade optical or magnetic tape if needed), but I want to try to get the data integrity piece of this right.
I greatly appreciate any feedback, guidance, or wisdom you're willing to share with me on this. I can tell from these forums you guys have TONS of knowledge and experience with this stuff that I don't have anything close to.