It's A Digital Disease!


This is a sub that aims to bring data hoarders together to share their passion with like-minded people.

founded 2 years ago
676
 
 
The original post: /r/datahoarder by /u/stranger_synchs on 2025-06-24 09:42:47.

That's a really important database. I'm trying, but I can't find company names to scrape because there are no company names listed.

677
 
 
The original post: /r/datahoarder by /u/Mr_Worcester on 2025-06-24 09:07:14.

Hi everyone, I'm not totally sure this sub is the right place to ask this (if it isn't, please direct me to a more suitable one). I have two 1TB NVMe SSDs (one Gen4, one Gen3) that I want to combine into RAID 0 for convenience. I just want one drive letter instead of managing two separate drives.

Problem: My ASUS ROG B550-F motherboard (using RAIDXpert2) automatically configured each drive as its own single-disk RAID array during setup. Now I can't combine them into RAID 0 without deleting the existing arrays, which would wipe my data. Windows Storage Spaces won't work either since the drives appear as single-disk RAID arrays rather than individual drives. Since I don't have a spare 1TB drive for backup, are there any alternatives to get these drives working as one volume without data loss? My main goal is just having all files in one location without manually moving things between the two.

678
 
 
The original post: /r/datahoarder by /u/garden-3750 on 2025-06-24 08:48:47.

The internet permission appears to have been removed at least from the Android games and Sega clarifies that the titles can be played offline. I recommend storing the APK files.

679
 
 
The original post: /r/datahoarder by /u/Mission_Grapefruit92 on 2025-06-24 08:36:05.

Amazon.com: ELUTENG NVME to USB Adapter USB 3.1 Gen 2 to M.2 NVMe SSD Converter Adapter 10Gbps PCIe Based M Key Hard Drive Reader Max 4TB Support UASP for 2280 2260 2242 2230 SSD (Only for M.2 NVME) : Electronics

Or do you know a good, cheap option for a USB to NVMe adapter? I don't want to spend a lot because I'm probably only gonna use it once or twice.

680
 
 
The original post: /r/datahoarder by /u/jhenn08 on 2025-06-24 01:56:36.

Looking to buy a pre-built 24-bay 4U server from The Server Store. Do you guys think this is a good deal? https://www.theserverstore.com/supermicro-superstorage-36x-bay-4u-plex-media-server-sas3

I'm selling my old server, so I won't have any spare internal parts. Are there any options or tweaks you'd suggest I make to the server listed above?

681
 
 
The original post: /r/datahoarder by /u/JesseJamesTheCowboy on 2025-06-24 00:45:15.

Looking for recommendations for a DAS bay that holds between 4 and 8 HDDs. No RAID, no nonsense: I just want my PC to read 4 drives or whatever's in the bays. It should also have a fan for active cooling, which I think is probably a given. Willing to spend more for something actually higher quality with more features, such as hot swapping or not inappropriately putting the drives to sleep. Mostly I just want something reliable that people here would recommend. Thanks.

682
 
 
The original post: /r/datahoarder by /u/TraitOpenness on 2025-06-24 00:43:53.
683
 
 
The original post: /r/datahoarder by /u/TalkAmbitious2248 on 2025-06-24 00:32:02.

I am looking for the best and preferably most economical way to safely archive around 5TB of various data that I have collected throughout my life. I’m talking photos, videos, games, software, movies, etc. Right now I have an external hard drive where this data is stored, but I’m afraid that it’s going to fail one day. I discovered M-Discs, but after visiting this sub I realized they might be a scam. Hope you can help.

684
 
 
The original post: /r/datahoarder by /u/prototype073 on 2025-06-23 23:08:35.
685
 
 
The original post: /r/datahoarder by /u/lunar-lullabies on 2025-06-23 23:06:07.

EDIT: I meant to put pCloud for macOS specifically in the title.

I'm working on a storage setup, and preserving the original Date Created (not just Date Modified) is really important to me. I’m trying to confirm whether pCloud actually preserves the “Date Created” metadata when adding files to the pCloud Drive on macOS.

Right now I’m running macOS off a temporary recovery SSD until I can get my laptop repaired, so I can’t fully test this until I’m back on my regular setup. But in my current setup, when I add files to the pCloud Drive, either by dragging and dropping, using cp -p in Terminal, or uploading through the desktop application, the "Date Created" gets changed to match Date Modified or the current date.

Support has been very specific that Date Created should be preserved, and they’ve tested it on their end as well. They said it should be retained whether uploading through the desktop application, copying manually, or going through Terminal. They're investigating the logs, but it's been a little while since I heard back. I’m not sure if this is a misunderstanding or if it really does preserve Date Created and something about my setup is interfering.
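One way to check this without relying on Finder is to compare the birth time of the source file and its copy directly. The following is a minimal sketch, assuming Python 3 is available on the Mac; the function names are hypothetical, and whether pCloud Drive's virtual filesystem reports a meaningful birth time at all is an assumption that this check would help confirm, not something the post establishes.

```python
import os


def creation_time(path):
    """Return the file's creation (birth) time as a Unix timestamp.

    os.stat() exposes st_birthtime on macOS and the BSDs; on platforms
    without it this returns None instead of guessing.
    """
    return getattr(os.stat(path), "st_birthtime", None)


def created_date_preserved(source, copy, tolerance_s=1.0):
    """Compare Date Created of a source file and its copy.

    Returns True or False, or None when the platform does not report
    birth times. The one-second tolerance is an arbitrary allowance for
    timestamp rounding.
    """
    src = creation_time(source)
    dst = creation_time(copy)
    if src is None or dst is None:
        return None
    return abs(src - dst) <= tolerance_s
```

Calling `created_date_preserved("/path/to/original.jpg", "/path/to/copy/in/pCloudDrive/original.jpg")` (both paths are placeholders) right after each upload method would show which of drag and drop, cp -p, or the desktop application keeps the date on a given setup.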

686
 
 
The original post: /r/datahoarder by /u/nilgiri on 2025-06-23 22:48:16.

Usage is a bunch of HDDs for a Plex library with an HP Pro Mini G9 mini PC. The TerraMaster is almost double the cost of the Orico ($230 vs. $115 pre-tax). Which one should I pick?

687
 
 
The original post: /r/datahoarder by /u/and-yet-it-grooves on 2025-06-23 22:42:21.

I recently bought my first pair of 12TB HDDs (WD Red Plus) for my home server, and while I was researching what drives to buy I noticed that consistently every recommendation for quieter drives topped out around 12TB or 14TB regardless of brand.

Is there a reason for that? Is there some technical boundary around that point of data, or is it more economic like larger drives are geared towards the enterprise market where noise isn't as much of a concern?

Otherwise it seems unclear to me why, for example, a 7200RPM 14TB WD Red Plus could be relatively quiet but bumping that to a 16TB WD Red Pro at the same RPM sees the volume become much more pronounced.

688
 
 
The original post: /r/datahoarder by /u/TimberTheDog on 2025-06-23 20:51:35.

I think it would serve the public interest if the videos of masked ICE agents were being stored somewhere, along with location. Anything like that happening? If not, any idea what the best way to do this would be?

689
 
 
The original post: /r/datahoarder by /u/hd805 on 2025-06-23 19:56:08.

Would that make for much more cost-effective network-attached storage? Any thoughts on potential trade-offs in terms of lag and such? Applications include hosting video locally on the LAN without transcoding.

690
 
 
The original post: /r/datahoarder by /u/-1D- on 2025-06-23 18:57:47.

So you all probably already know that around 2 years ago YouTube introduced 1080p 24/30 fps premium formats. Those were encoded in VP9 and were usually 10 to 15% higher in bitrate than the avc1/H.264 encodes, which were previously the highest-bitrate encodes.

Now YouTube is introducing 1080p 50/60 fps premium formats that are encoded in AV1 and most of the time are not even higher in bitrate than regular H.264/avc1. It's hard to confirm exactly by how much, because the format is still in an A/B test, meaning only some accounts see it and have access to it. Even the accounts that have it need Premium, because the iOS client way of downloading premium formats doesn't work when passing cookies (I have explained this in detail multiple times on the youtubedl sub). This makes avc1/H.264 encodes very often better looking than the premium formats.

Now YouTube is even switching to AV1 for 1080p 24/30 fps videos (proof).

And they're literally encoding them at about 20% lower bitrate than VP9, and it's noticeably worse looking than VP9 1080p premium, which they will probably (most likely) phase out soon, again making H.264/avc1 encodes better looking than even the premium ones.

Also, they disabled premium formats on Android mobile, for me at least, for the last 2 days.

Then they're now encoding 4K videos at abysmally low bitrates, like 8000 kbps for AV1 when VP9 gets 14000 kbps, and they almost look too soft imo, especially when watching on a TV.

Newly introduced YouTube live streams in AV1 look fine-ish in 1440p, at least for now, but when it comes to 1080p it's a soft fest. Literally, avc1 live encodes from 3 years ago looked better imo, though VP9 1080p live encodes don't look much better either. Funnily enough, AV1 encodes disappear from live streams after the stream is over; no way that's cost effective for YT.

Then YouTube's re-encoding of already encoded VP9 and avc1 formats is horrible. When the AV1 encode comes, they re-encode avc1 and VP9 and make them look worse. Sometimes, even when the bitrate isn't dropped by much, they still lose details somehow (thread talking about this).

And to top it off, they still don't encode premium formats for all videos, meaning even if I pay for Premium I still need to watch most videos in absolutely crap quality. But they will always encode every 4K video in 4K, and at a much higher bitrate than these 1080p premium formats. That means they're encouraging users to upscale their videos just to get encoded in even nearly decent quality, wasting resources, bitrate, and bandwidth just because they don't want to offer even remotely decent bitrates for 1080p content, even with Premium.

691
 
 
The original post: /r/datahoarder by /u/LxFx on 2025-06-23 18:57:26.

PCPartPicker Part List

| Type | Item | Price |
| --- | --- | --- |
| CPU | AMD Ryzen 5 8600G 4.3 GHz 6-Core Processor | $180.00 @ Newegg |
| CPU Cooler | Included AMD cooler | |
| Motherboard | ASRock B650I Lightning Wifi Mini ITX AM5 Motherboard | $199.99 @ Amazon |
| ECC Memory | 2x Kingston KSM48E40BS8KI-16HA | |
| SSD (boot) | 2x WD Blue SN5000 500GB | |
| HDD (RaidZ2) | 6x WD Ultrastar DC HC550 18 TB 3.5" 7200 RPM Internal Hard Drive | |
| Case | Jonsbo N3 Mini ITX Desktop Case | $162.00 @ Newegg Sellers |
| Power Supply | Cooler Master V550 SFX GOLD 550 W 80+ Gold Certified Fully Modular SFX Power Supply | |
| Storage Adapter | Delock 89042 SATA controller | |
| Case Fan | 2x Noctua A9 PWM 46.44 CFM 92 mm Fan | 2x $18.95 @ Amazon |
| Case Fan | 2x Noctua A8 PWM 32.66 CFM 80 mm Fan | 2x $17.95 @ Amazon |

Main server: ASUS RS520A-E11-RS12U, Epyc 7413, 128GB ECC, 2x WD Black SN770 (mirrored boot), 4x WD Black SN850X (docker, downloads, cameras), 6x WD Ultrastar DC HC580 24TB (RaidZ2)

Notes:

  • Backup will sync using ZFS tools on a daily or weekly basis. While not in use drives will spin down.
  • Hoping to connect from home to offsite location using tailscale to sync
  • Backup size is indeed smaller than main. New 6x 24TB drives will go into the main server and old 18TB drives will go into the backup. Currently 36TB in use, so still fine. Might cycle again in the future if the backup nears 100%. 6 unused bays in the ASUS chassis for now. Other approach could be to put largest drives in backup and increase main server with 6 new disks in the future.
  • Main goal was to keep the backup lightweight, silent and power efficient
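
The capacity reasoning in the notes can be checked with a quick calculation. This is a rough sketch, assuming RaidZ2 gives up two drives' worth of space to parity and ignoring ZFS metadata, padding, and the TB versus TiB difference, so real usable space will be somewhat lower. The function names are made up for illustration.

```python
def raidz2_usable_tb(drive_count, drive_tb):
    """Approximate usable capacity of one RaidZ2 vdev in TB.

    Two drives' worth of space go to parity; filesystem overhead is
    deliberately ignored in this estimate.
    """
    if drive_count <= 2:
        raise ValueError("RaidZ2 needs more drives than its two parity drives")
    return (drive_count - 2) * drive_tb


def percent_used(used_tb, usable_tb):
    """Share of the pool that is occupied, as a percentage."""
    return 100.0 * used_tb / usable_tb


backup_tb = raidz2_usable_tb(6, 18)  # the old 18 TB drives
main_tb = raidz2_usable_tb(6, 24)    # the new 24 TB drives
print(f"backup: {backup_tb} TB usable, {percent_used(36, backup_tb):.0f}% used")
print(f"main:   {main_tb} TB usable, {percent_used(36, main_tb):.1f}% used")
```

Under those assumptions the backup has about 72 TB usable against 36 TB in use, which matches the "still fine" assessment above.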
692
 
 
The original post: /r/datahoarder by /u/Kayect on 2025-06-23 17:11:26.

I have a large quantity (about to be 4000) of MP3 song files that can be found on Spotify, and I have backups on a PC, laptop, phone, USB stick, and HDD. I would also like to back up to OneDrive as a cloud-based backup because I have hundreds of GB free there, and all the music is currently under 50 GB. I understand this may be a gray area because of OneDrive's ToS regarding copyrighted content, but the purpose of the OneDrive backup would not be to distribute, share, or sell any content wrongly. It's solely a personal backup for myself. I've heard that Microsoft regularly scans content for copyrighted material, and I don't want to deal with losing account access or other data I store on OneDrive.

693
 
 
The original post: /r/datahoarder by /u/WorriedHelicopter764 on 2025-06-23 16:57:35.

Found this info on FB.

This 36GB .zip file unzips to give you over 350 .json.gz files. After gunzipping them, you are left with approximately 350GB of jsonl files.

Unsurprisingly, the most common MOT defect is "Nearside Front Tyre worn close to the legal limit (4.1. E.1)" (exact text match) - 1.33% of 1.81 billion recorded defects.

For scale... 12,973 defects are related to bananas.

https://documentation.history.mot.api.gov.uk/mot-history-api/download-vehicle-mot-history-data/
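
For anyone who wants to run this kind of text match without first expanding everything to 350GB, the .json.gz files can be streamed line by line. The sketch below is only an illustration under assumptions: it treats each line as one record and does a case-insensitive substring match on the raw line, because the actual field names in the dataset are not given in the post. It therefore counts matching records, not individual defects, and the function name is hypothetical.

```python
import gzip
from pathlib import Path


def count_matching_lines(directory, phrase):
    """Return (total_lines, matching_lines) across all *.json.gz files.

    Files are decompressed on the fly, so nothing is written to disk.
    A line counts as a match if it contains the phrase, ignoring case.
    """
    needle = phrase.lower()
    total = 0
    matches = 0
    for archive in sorted(Path(directory).glob("*.json.gz")):
        with gzip.open(archive, "rt", encoding="utf-8") as lines:
            for line in lines:
                total += 1
                if needle in line.lower():
                    matches += 1
    return total, matches
```

Something like `count_matching_lines("mot_data", "banana")` (the directory name is a placeholder) would give a rough count to compare against the figure quoted above.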

694
 
 
The original post: /r/datahoarder by /u/SwingDingeling on 2025-06-23 16:22:07.

I tested this so many times:

A UHD video (aka 4K, but UHD is the correct term) gets released. I download it and get, let's say, an 18k bitrate VP9 video.

I then download the video about a day later and get supposedly the exact same version, but the bitrate is at 25k now. At first I thought they had replaced the OG VP9 version with a better one. I then compared the quality many times and always got the same shocking result: the OG version is better.

YouTube replaces the best version you can get (av1 is more efficient, but quality is about the same as vp9 version 2) with a file that's up to 30% bigger, yet has 10% worse quality.

How can we get them to fix this? Why are they doing this?

695
 
 
The original post: /r/datahoarder by /u/The_Faceless1 on 2025-06-23 15:31:23.
696
 
 
The original post: /r/datahoarder by /u/calcium on 2025-06-23 15:29:11.

Was just looking at picking up some factory recertified drives through either SPD or GoHardDrive and was looking at the data sheets of the various drives when I noticed that the Seagate Factory Recertified Drive's data sheet had terrible metrics when compared to their newer drives.

Here's a comparison between the Seagate Exos X16, Exos X22, and Factory Recertified drives...

| Type | X16 | X22 | Factory Recertified |
| --- | --- | --- | --- |
| Limited Warranty | 5 years | 5 years | 6 months |
| Nonrecoverable Read Errors per Bits Read | 1 sector per 10E15 | 1 sector per 10E15 | 1 sector per 10E14 |
| Power-On Hours per Year (24×7) | 8760 | 8760 | 2400 |
| Max. Sustained Transfer Rate OD (MB/s,MiB/s) | 261/249 | 285/272 | 190/181 |
| Random Read/Write 4K QD16 WCD (IOPS) | 170/440 | 168/550 | 170/320 |
| Idle A (W) Average | 5.0W | 5.5W | 7.2W |
| Max Operating, Random Read 4K/16Q (W) | 10.0, 6.3 | 9.4, 6.4 | 10.5W |
| Temperature, Operating (°C) - drive reported | 5°C – 60°C | 5°C – 60°C | 10°C – 60°C |
| Shock, Operating 2ms (Read/Write) (Gs) | 50 | 40 | 30 |
| Datasheet | X16 | X22 | Exos Recertified |

It seems like Seagate's tolerances are loosened up a lot by recertifying their drives but their sustained transfer speeds really take a wallop and overall give me pause for concern. For anyone who's bought their Factory Recertified Drives (mostly through GoHardDrive) have you noticed lower overall read speeds on your drives compared to what's offered in the other data sheets? Comparatively, SPD tends to refurbish older X* stock and I've never had issues getting the faster speeds shown in their actual datasheets.

I'm only looking at GoHardDrive as they offer a 5 year warranty on their recertified drives, but a loss of close to 100MB/s across the drive range will really impact parity calculations. As an example, a parity calculation on a 24TB drive takes about 25h40m at 260MB/s versus about 35h6m at 190MB/s, which is a huge difference. Thoughts?
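
The timing comparison can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the drive sustains its outer-diameter rate for the whole pass. Real drives slow down toward the inner tracks, so actual times will be longer. The function name is made up for illustration.

```python
def full_pass_duration(capacity_tb, rate_mb_per_s):
    """Return (hours, minutes) to read or write the whole drive once.

    Assumes decimal TB/MB and a constant transfer rate, which makes
    this a best-case estimate.
    """
    capacity_bytes = capacity_tb * 10**12
    rate_bytes_per_s = rate_mb_per_s * 10**6
    seconds = capacity_bytes // rate_bytes_per_s
    hours, remainder = divmod(seconds, 3600)
    return hours, remainder // 60


for rate in (260, 190):
    hours, minutes = full_pass_duration(24, rate)
    print(f"24 TB at {rate} MB/s: {hours}h{minutes}m")
```

Under these assumptions the results land within a couple of minutes of the figures quoted above, and the gap between the two rates is roughly nine and a half hours per full pass.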

697
 
 
The original post: /r/datahoarder by /u/Stereogravy on 2025-06-23 14:37:39.

I’m new to setting up this type of solution and would appreciate any help.

I planned to buy a NAS, but I had parts to build a second PC and heard building would be cheaper.

I’ve built the PC using spare parts:

• ⁠9950X CPU (I sent in an old CPU to AMD under warranty and they sent me back this one)

• ⁠RTX 2080

• ⁠64GB DDR5 RAM

• ⁠512GB NVMe SSD

I’m undecided about the NAS OS, but I’m considering UnRAID or TrueNAS, and I’m open to any other options.

I’m thinking of buying manufacturer-refurbished drives from Serverpartdeals.com, based on positive reviews.

For my HDD bay, I’m considering the QNAP TL-D800C 8-Bay Desktop JBOD Storage Enclosure with USB 3.2 Gen 2 Type-C Connectivity.

https://a.co/d/iy6yxfE

My goal is to have 25-45 terabytes of workable data, assuming each project takes about 300-500GB.

I need fast and redundant RAID.

Ideally, I would want to add drives as my company grows to increase workable editing space.

I’ll have an editor in another state working off 720p proxies, who will access the RAID remotely.

My internet speed is fiber 1Gig up, 1Gig down, but I can usually get 1.5Gig up/down. I can upgrade to 5Gig speeds if needed.

My home is wired with Cat 6, and I’ve fully implemented 2.5Gb switches and access points with 2.5Gb ports.

698
 
 
The original post: /r/datahoarder by /u/Connect_Nerve_6499 on 2025-06-23 08:36:54.

Hi all,

I’m looking for a 4–8 bay DAS with RAID support (RAID 1, 5, or 10). It'll be used mostly for long-term HDD storage and backups — connected only when working with data. Speed isn’t a priority, HDD speeds are fine. I’ve seen models from ORICO, TerraMaster, Renkforce, but unsure about their build quality and RAID reliability.

Any recommendations or experiences with current setups?

Thanks!

699
 
 
The original post: /r/datahoarder by /u/twofoursixohdang on 2025-06-23 02:52:25.

I've been trying to clean up this scan of an obscure book of sheet music for a while now, and it's been driving me nuts.

The initial scan was made (somewhat hastily) with one of those overhead book-cameras. The main problems are that some of the images are lightly distorted due to the curve of the page, and the colors aren't right - every page is black-on-grey.

I've Googled around and found ScanTailor Advanced, but from the looks of things, while it can fix the distortion, it has to be manually, laboriously applied to each individual page. I guess I can just live with it.

The colors are what frustrate me. Messing about with Irfanview, I've tried to find combinations of successively adjusting the colors and shifting the contrast to bring back the original look of the page, and while I've had some success, some of the pages are just a little darker than others and still turn out looking grey while other pages look okay. Things are complicated by the fact that some pages have greyscale images that I also want to preserve.

Is there some obvious solution I'm overlooking here? It feels like this is a simple problem that someone, somewhere would have solved by now, but I can't seem to find the answer.
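
One idea for the uneven greys, offered as an untested sketch and not as a known fix: estimate the black and white points separately for each page and stretch the levels between them, so a slightly darker page gets a slightly stronger correction. The code below works on a plain list of 0 to 255 greyscale values to stay dependency-free; loading and saving the actual images is left out. It assumes the median pixel is paper, which will not hold on pages dominated by a greyscale image, and the function names and percentile defaults are hypothetical.

```python
def percentile(sorted_values, pct):
    """Pick the value at the given percentile (nearest rank) of a sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def stretch_levels(pixels, black_pct=1.0, white_pct=50.0):
    """Linearly remap one page so its ink goes to 0 and its paper to 255.

    The black and white points are measured per page, so pages that
    were scanned a little darker are corrected by a different amount.
    Values between the two points keep their relative order, which
    leaves midtones in greyscale images partly intact.
    """
    ordered = sorted(pixels)
    black = percentile(ordered, black_pct)
    white = percentile(ordered, white_pct)
    if white <= black:
        return list(pixels)  # flat page: nothing sensible to stretch
    scale = 255 / (white - black)
    return [max(0, min(255, round((p - black) * scale))) for p in pixels]
```

Pages with large embedded images would probably need a higher `white_pct` or manual handling, since the median there may fall inside the image.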

700
 
 
The original post: /r/datahoarder by /u/xrepair on 2025-06-23 02:42:01.

Hello everyone,

Just wanted to share a small program I wrote that writes and verifies data on a raw disk device. It's designed to stress-test hard drives and SSDs by dividing the disk into sections, writing data in parallel using multiple worker threads, and verifying the written content for integrity.

I use it regularly to test brand-new disks before adding them to a production NAS — and it has already helped me catch a few defective drives.

Hope you find it useful too!

The link to the project: https://github.com/favoritelotus/diskroaster.git
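
To illustrate the general idea described above (split the target into sections, write in parallel, then verify), here is a minimal sketch. It is not diskroaster's code and makes several assumptions: it runs against a regular file instead of a raw device, uses a SHA-256-derived pattern per block, and reads back through the OS page cache, so it can miss errors that a tool bypassing the cache would catch. All names are hypothetical.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # bytes per block; an arbitrary choice for this sketch


def block_pattern(block_index):
    """Deterministic content for a block, derived from its index."""
    digest = hashlib.sha256(block_index.to_bytes(8, "little")).digest()
    return digest * (BLOCK_SIZE // len(digest))


def write_and_verify_section(path, first_block, block_count):
    """Write one section, read it back, and return mismatching block indices."""
    bad_blocks = []
    with open(path, "r+b") as target:
        target.seek(first_block * BLOCK_SIZE)
        for index in range(first_block, first_block + block_count):
            target.write(block_pattern(index))
        target.flush()
        os.fsync(target.fileno())  # push the data to the device before re-reading
        target.seek(first_block * BLOCK_SIZE)
        for index in range(first_block, first_block + block_count):
            if target.read(BLOCK_SIZE) != block_pattern(index):
                bad_blocks.append(index)
    return bad_blocks


def stress_test(path, total_blocks, workers=4):
    """Split the target into one section per worker and test them in parallel."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    per_section = -(-total_blocks // workers)  # ceiling division
    sections = [
        (start, min(per_section, total_blocks - start))
        for start in range(0, total_blocks, per_section)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: write_and_verify_section(path, *s), sections)
        return sorted(index for bad in results for index in bad)
```

An empty list from `stress_test` means every block read back as written. Pointing something like this at a real device would destroy its contents, so it should only ever be tried on a scratch file or an empty disk.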
