this post was submitted on 11 Oct 2024

It's A Digital Disease!


This is a sub that aims to bring data hoarders together to share their passion with like-minded people.

The original post: /r/datahoarder by /u/InternationalMany6 on 2024-10-11 01:45:12.

I have some very large collections of files organized into nested folder structures, currently spread across a few 8 TB USB drives. All in all it's more than 100 million files, and I have a particular read pattern I usually follow where it would make sense to split them up across different drives based on the last part of the filename.

For example, *A.data should be stored on drive 1, *B.data on drive 2, and *C.data on drive 3. The * is an incremented number that is the same for all three files, and it basically represents a timestamp. The program I use to access these files always reads them in groups of three, so this organization scheme optimizes throughput 3x.
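The grouping rule above can be sketched like this (the drive assignments and the zero-padded index format are made-up examples, not the actual layout):

```python
# Hypothetical drive assignment per filename suffix letter.
DRIVES = {"A": "D:", "B": "E:", "C": "F:"}

def group_layout(index: int) -> dict[str, str]:
    """For one timestamp index, which drive holds which file of the triple."""
    return {f"{index:06d}{s}.data": DRIVES[s] for s in "ABC"}
```

Every group lands with one file per drive, which is what lets the three reads proceed in parallel.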

This is actually a program I wrote, and that's how I currently lay out the files across different drive letters, but the code has become a pain to manage, so I want to offload that functionality and just have my program think it's reading everything from a single drive letter.
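For scale, the lookup being offloaded can be as small as a resolver that rewrites a virtual path to its physical drive based on the suffix letter (a minimal sketch; the roots and the `V:` virtual drive are hypothetical):

```python
import ntpath  # Windows path semantics, regardless of host OS

# Hypothetical physical roots keyed by filename suffix letter.
PHYSICAL_ROOTS = {"A": r"D:\pool", "B": r"E:\pool", "C": r"F:\pool"}

def resolve(virtual_path: str) -> str:
    """Map a path on the virtual drive to its physical location,
    using the letter just before the extension."""
    name = ntpath.basename(virtual_path)
    stem, _ext = ntpath.splitext(name)
    return ntpath.join(PHYSICAL_ROOTS[stem[-1]], name)
```

DrivePool's balancers would replace exactly this kind of logic, which is why the question is whether its file-placement rules can key off that last character.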

Can DrivePool handle that configuration based on the last part of a filename? Does its performance suffer much compared to using native NTFS once 100+ million files are involved?

Edit: in case it's not obvious, my program is multithreaded so it issues multiple file read requests to the OS in parallel.
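The parallel group read can be sketched with a small thread pool (function names are illustrative, not from the actual program):

```python
from concurrent.futures import ThreadPoolExecutor

def read_one(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

def read_group(paths: list[str]) -> list[bytes]:
    """Issue the reads for one group concurrently, so each
    drive can service its request at the same time."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(read_one, paths))
```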

no comments (yet)