It's A Digital Disease!


This is a sub that aims at bringing data hoarders together to share their passion with like-minded people.

2426
 
 
The original post: /r/datahoarder by /u/Philaire on 2025-03-31 12:24:49.

Hey fellow hoarders,

Crossposting this from r/selfhosted because I figured some of you might have run into the same problem - or have a hoarding-friendly solution 😄

After spending 8 full days digitizing ~300 CD-ROMs (mostly retro PC games) plus a bunch of OS install ISOs, I'm now looking for a clean, self-hosted web-based library manager to organize, browse, and possibly even boot these ISOs.

What I'd love:

  • Scan folders with .iso files
  • Add metadata (title, platform, year, notes, etc.)
  • Clean, searchable/sortable interface (covers or thumbnails would be awesome)
  • Bonus: integration with QEMU/VirtualBox
  • Self-hosted, preferably Docker-compatible

I tried Jellyfin, Plex, File Browser - nothing quite fits.

I'm ready to roll my own Flask app if I must, but I'd love to know if anyone has already built something similar!
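In case anyone wants to riff on it, here's a rough sketch of the folder-scanning half I have in mind (the paths, fields, and schema are placeholders I made up, not an existing tool):

    # Rough sketch: walk a folder tree and record every .iso with basic
    # metadata in SQLite. Paths and fields are placeholders; adapt freely.
    import sqlite3
    from pathlib import Path

    ARCHIVE_ROOT = Path("/srv/iso-archive")   # hypothetical archive location
    DB_PATH = "iso_catalog.db"

    def build_catalog(root: Path, db_path: str) -> None:
        con = sqlite3.connect(db_path)
        con.execute(
            """CREATE TABLE IF NOT EXISTS isos (
                   path TEXT PRIMARY KEY,
                   title TEXT,
                   platform TEXT,
                   year INTEGER,
                   size_bytes INTEGER,
                   notes TEXT
               )"""
        )
        for iso in sorted(root.rglob("*.iso")):
            con.execute(
                "INSERT OR IGNORE INTO isos (path, title, size_bytes) VALUES (?, ?, ?)",
                (str(iso), iso.stem, iso.stat().st_size),
            )
        con.commit()
        con.close()

    if __name__ == "__main__":
        build_catalog(ARCHIVE_ROOT, DB_PATH)

A small Flask view over that table would cover the searchable/sortable part, and the QEMU bonus could be as simple as shelling out to qemu-system-x86_64 -cdrom <path>.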

Note: All discs were legally owned and ripped - this is a personal preservation project.

If you're curious, I can share how I structured the archive too.

Here's the original post on r/selfhosted:

👉 Link to original post

Thanks in advance, and long live the stacks of spinning rust!

2427
 
 
The original post: /r/datahoarder by /u/q1525882 on 2025-03-31 11:51:27.

Have you experienced similar behavior with 20TB drives or other high-density drives, where they start up quite loudly and with enough vibration that it carries through the whole PC case?

This is in a Fractal Design Define R6 case.

In my experience, the 4TB WD Blues (5400 RPM) I had were dead silent.

Later I added 12TB WD white-label drives; these were louder but didn't vibrate much.

Now I have 20TB WD white-label drives, and one of them rattles quite heavily on startup, so the vibration carries through the whole case. Screwing it more tightly into the cage helped for a few days, but later it broke free from the shackles again.

2428
 
 
The original post: /r/datahoarder by /u/SuperCiao on 2025-03-31 11:41:04.

Hey everyone,

I recently converted a Blu-ray .m2ts file to .mkv using ffmpeg with the -c copy option to avoid any re-encoding or quality loss. The resulting file plays fine and seems identical, but I noticed something odd:

  • The original .m2ts file is 6.80 GB
  • The .mkv version is 6.18 GB
  • The average reported bitrate is slightly lower too: M2TS ≈ 37,766,375 bps, MKV ≈ 35,828,468 bps

I know MKV has a more efficient container format and that this size difference is expected due to reduced overhead, but part of me still wonders: can I really trust MKV to retain 100% of the original quality from an M2TS file?

Here's why I care so much:

I'm planning to archive a complete TV series onto a long-lasting M-Disc Blu-ray and I want to make sure I'm using the best possible format for long-term preservation and maximum quality, even if it means using a bit more space.

What do you all think?

Has anyone done deeper comparisons between M2TS and MKV in terms of technical fidelity?

Is MKV truly bit-for-bit identical when using -c copy, or is sticking with M2TS a safer bet for archival?
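One check I'm considering (a rough sketch; filenames are placeholders): decode the video stream of both files and compare ffmpeg's per-frame MD5s, which ignores container overhead and timestamp differences and only compares the actual frame data.

    # Sketch: compare decoded per-frame MD5s of the first video stream.
    # Matching hashes mean the -c copy remux preserved the video frames
    # exactly, regardless of container overhead.
    import subprocess

    def video_frame_hashes(path: str) -> list[str]:
        out = subprocess.run(
            ["ffmpeg", "-nostdin", "-loglevel", "error",
             "-i", path, "-map", "0:v:0", "-f", "framemd5", "-"],
            capture_output=True, text=True, check=True,
        ).stdout
        # Keep only the hash column; timestamps can differ between containers
        # (M2TS and MKV use different timebases) even when frames are identical.
        return [line.rsplit(",", 1)[-1].strip()
                for line in out.splitlines() if line and not line.startswith("#")]

    if __name__ == "__main__":
        a = video_frame_hashes("movie.m2ts")  # placeholder filenames
        b = video_frame_hashes("movie.mkv")
        print("video streams identical" if a == b else "video streams differ")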

Would love to hear your insights and workflows!

Thanks!

2429
 
 
The original post: /r/datahoarder by /u/Moviesinbed on 2025-03-31 10:42:52.

I have an HP P2000 and cannot access it at 10.0.0.2/24.

It's connected to my router and directly connected to my server with USB. The server sees the disk array on COM3. How can I access this and get it set up for use as a DAS?
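I'm guessing COM3 means the P2000's USB CLI port is showing up as a virtual serial port. A rough sketch of how I'd try to reach the management CLI from Python (the 115200 8N1 settings and the "show system" command are assumptions - check the CLI reference for your controller firmware):

    # Rough sketch: talk to the P2000's USB CLI port over serial (pyserial).
    # Baud rate and commands are assumptions; check the CLI reference guide.
    import serial  # pip install pyserial

    with serial.Serial("COM3", 115200, timeout=2) as port:
        port.write(b"\r\n")   # nudge the console to print a login prompt
        print(port.read(4096).decode(errors="replace"))
        # After logging in with the management credentials, commands can be
        # sent the same way, e.g.:
        # port.write(b"show system\r\n")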

2430
 
 
The original post: /r/datahoarder by /u/Itzhiss on 2025-03-31 07:54:06.

Recently I was able to purchase a second 12TB IronWolf for my dual-bay RAID enclosure. I'm looking for a site or service where I can temporarily upload my videos (just copying the 4 folders I have, about 3-4 TB of files in all), then re-download them after I switch my array over to JBOD. Otherwise it's going to be a hassle; I'm just trying to go the easy route.

thanks

2431
 
 
The original post: /r/datahoarder by /u/Few_Razzmatazz5493 on 2025-03-31 06:55:30.

I mistakenly bought a new MacBook with 1TB of space and didn't realize how quickly I'd use it. I purchased a QNAP TR-004 and am just wondering if anyone has opinions on the best HDDs to use with the device. I'm probably going to go with 4x8TB, but I just don't know which has the lowest failure rate and best overall quality. Thanks.

2432
 
 
The original post: /r/datahoarder by /u/The_Silver_Nuke on 2025-03-31 05:52:41.

According to some cursory research, there is an existing downloader that people like to use which hasn't been functioning correctly recently. I did some more looking online and couldn't find a viable alternative that doesn't scream scam. So does anyone have a fix for AlexCSDev's PatreonDownloader?

When I attempt to use it I get stuck on the Captcha in the Chromium browser. It tries and fails again and again, and when I close out of the browser after it fails enough, I see the following error:

2025-03-30 23:51:34.4934 FATAL Fatal error, application will be closed: System.Exception: Unable to retrieve cookies
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 138
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 128
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 68

2433
 
 
The original post: /r/datahoarder by /u/jemmysponz on 2025-03-31 04:40:30.

When Game Informer was unceremoniously shut down last year, I recall seeing some posts about folks collaborating on maintaining an archive of old issues in some form or another.

If you haven't heard yet, Game Informer got resurrected by a blockchain company called Gunzilla Games in the past couple weeks, and on their website, they have a magazine archive going back a little past a decade up to the most recent issue. These are, as far as I can tell, copies of the actual issues, not the "digital editions" that were available through their old phone app (which no longer displays any digital issues as far as I can tell).

Would it be worth trying to pursue mirroring this archive somehow? Is it even possible? The way it's set up is that the data for each issue seems to be dynamically loaded from some other site in the form of an image and an svg of the text overlaid atop it to form each individual page, and I've run into trouble trying to establish a local mirror of any individual issue. Is it worth the effort? I only feel compelled to attempt this because I don't really trust that the revival will last for very long.
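For what it's worth, once the per-page image and SVG URLs are visible in the browser's network tab, saving one issue locally is mostly a matter of fetching and pairing them. A rough sketch of that step (the URLs below are placeholders I'd have to fill in by hand or by scraping):

    # Sketch: download page assets (background image + text SVG) for one issue.
    # The URLs are placeholders -- collect the real ones from the browser's
    # network tab or from the issue's page data.
    from pathlib import Path
    from urllib.request import urlretrieve

    pages = [
        # (page number, image URL, SVG URL)
        (1, "https://example.com/issue/page1.jpg", "https://example.com/issue/page1.svg"),
    ]

    out = Path("gameinformer_issue")
    out.mkdir(exist_ok=True)

    for num, img_url, svg_url in pages:
        urlretrieve(img_url, out / f"page{num:03d}.jpg")
        urlretrieve(svg_url, out / f"page{num:03d}.svg")

The harder part is enumerating those URLs for every page and issue, which is where I keep running into trouble.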

2434
 
 
The original post: /r/datahoarder by /u/heartshapedgoth on 2025-03-31 02:51:22.

Hi there, I hope I don't annoy anyone with this post but I'm very out of my depth here and looking for advice from people that have more experience and knowledge than me.

My twin sister is having her first child, and we have been discussing her screen-time boundaries as a mother going forward. We have noticed a trend in our peers' children, specifically regarding VOD streaming and the ability to choose what they're watching, and also how much overstimulating content has crept into children's media in the last few years.

I have a pretty significant library of educational children's media downloaded and at my disposal, but I don't know of a way to build exactly what I'm seeking. I'm hoping for some kind of TV box that I can use to launch my library of programming and shuffle the episodes at random, similar to real over-the-air broadcasting, where it eliminates the ability to "choose" which show is coming on. Ideally I would be able to sort the downloaded and categorized files into individual playlists like "PBS Kids" and "Noggin", then shuffle them from there and play until the TV is turned off.

Through my perusal of some older posts on here, I saw a few options that may be close to fitting my needs, like PseudoTV on Kodi and Plex, but I was hoping for something the kids would be able to launch themselves, rather than requiring 10 steps and giving them access to things she would prefer they didn't have. Once I finish downloading my media collection it would no longer require updating, and I was hoping the program would not need much upkeep or an internet connection.

As I'm just a 25-year-old librarian who is admittedly kind of an airhead, and I don't have any experience with this, would I have better luck commissioning someone to build a basic program on a Raspberry Pi to fit my needs? Or is there something already out there that's closer to what I'm seeking?
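To be concrete, the behaviour I'm after is roughly what this sketch does (assuming mpv is installed and the shows live in per-channel folders; paths are placeholders):

    # Sketch of the desired behaviour: pick a "channel" folder, shuffle every
    # episode in it, and play them back-to-back fullscreen until the TV is
    # turned off. Assumes mpv is installed; folder paths are placeholders.
    import random
    import subprocess
    from pathlib import Path

    CHANNELS = {
        "pbs-kids": Path("/media/kids/pbs-kids"),
        "noggin": Path("/media/kids/noggin"),
    }

    def play_channel(name: str) -> None:
        episodes = [str(p) for p in CHANNELS[name].rglob("*")
                    if p.suffix.lower() in {".mkv", ".mp4", ".avi"}]
        random.shuffle(episodes)
        subprocess.run(["mpv", "--fullscreen", "--really-quiet", *episodes])

    if __name__ == "__main__":
        play_channel("pbs-kids")

Something like that on a Raspberry Pi, wired up so the kids only have to pick a channel, is about the level of simplicity I'm hoping for.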

I hope I explained this clearly, but I apologize if not, haha!!

2435
 
 
The original post: /r/datahoarder by /u/PlatformTall3121 on 2025-03-31 02:45:53.

I’ve been building some 3D printed tools for organizing and managing drives, NAS setups, and rack gear.

I'm curious, are there any simple physical tools or mounts that would make things easier for you? Stuff like better HDD trays, airflow guides, fan mounts, or cable organizers?

Just trying to solve some of the small-but-frustrating parts of building and maintaining a setup.

2436
 
 
The original post: /r/datahoarder by /u/Finebyme101 on 2025-03-31 02:20:10.

I've been backing up my files using a mix of external drives and cloud services, and I'm currently thinking of switching to a NAS. I get the idea (automatic syncing, version control, centralized storage, and so on), but I'm wondering if it's actually as reliable as it's claimed to be.

Is it really that much better than, say, Google Drive + a hard drive? What if it fails? Would love to hear your experience and thoughts. Thank you.

2437
 
 
The original post: /r/datahoarder by /u/digitalsignalperson on 2025-03-31 00:50:25.

My scenario is:

  • 4TB nvme drive
  • want to use thin provisioning
  • don't care so much about snapshots, but if ever used they would have limited lifetime (e.g. a temp atomic snapshot for a backup tool).
  • want to understand how to avoid running out of metadata, and simulate this
  • want to optimize for nvme ssd performance where possible

I'm consulting man pages for lvmthin, lvcreate, and thin_metadata_size. Also thin-provisioning.txt seems like it might provide some deeper details.

When using lvcreate to create the thinpool, --poolmetadatasize can be provided if not wanting the default calculated value. The tool thin_metadata_size I think is intended to help estimate the needed values. One of the input args is --block-size, which sounds a lot like the --chunksize argument to lvcreate but I'm not sure.

man lvmthin has this to say about chunksize:

  • The value must be a multiple of 64 KiB, between 64 KiB and 1 GiB.
  • When a thin pool is used primarily for the thin provisioning feature, a larger value is optimal. To optimize for many snapshots, a smaller value reduces copying time and consumes less space.

Q1. What makes a larger chunksize optimal for primary use of thin provisioning? What are the caveats? What is a good way to test this? Does it make it harder for a whole chunk to be "unused" for discard to work and return the free space back to the pool?

thin_metadata_size describes --block-size as: Block size of thin provisioned devices in units of bytes, sectors, kibibytes, kilobytes, ... respectively. Default is in sectors without a block size unit specifier. Size/number option arguments can be followed by unit specifiers in short one character and long form (eg. -b1m or -b1mebibytes).

And when using thin_metadata_size, I can tease out error messages like "block size must be a multiple of 64 KiB" and "maximum block size is 1 GiB". So it sounds very much like chunk size, but I'm not sure.

The kernel doc for thin-provisioning.txt says:

  • $data_block_size gives the smallest unit of disk space that can be allocated at a time expressed in units of 512-byte sectors. $data_block_size must be between 128 (64KB) and 2097152 (1GB) and a multiple of 128 (64KB).

  • People primarily interested in thin provisioning may want to use a value such as 1024 (512KB)

  • People doing lots of snapshotting may want a smaller value such as 128 (64KB)

  • If you are not zeroing newly-allocated data, a larger $data_block_size in the region of 256000 (128MB) is suggested

  • As a guide, we suggest you calculate the number of bytes to use in the metadata device as 48 * $data_dev_size / $data_block_size but round it up to 2MB if the answer is smaller. If you're creating large numbers of snapshots which are recording large amounts of change, you may find you need to increase this.

This talks about "block size" like in thin_metadata_size, so still wondering if these are all the same as "chunk size" in lvcreate.

While man lvmthin just says to use a "larger" chunksize for thin provisioning, here we get more specific suggestions like 512KB, but also a much bigger 128MB if not using zeroing.

Q2. Should I disable zeroing with lvcreate option -Zn to improve SSD performance?

Q3. If so, is a 128MB block size or chunk size a good idea?

For a 4TB VG, testing out 2MB chunksize:

  • lvcreate --type thin-pool -l 100%FREE -Zn -n thinpool vg results in 116MB for [thinpool_tmeta] and uses a 2MB chunk size by default.
  • 48B * 4TB / 2MB = 96MB from kernel doc calc
  • thin_metadata_size -b 2048k -s 4TB --max-thins 128 -u M = 62.53 megabytes

Testing out 64KB chunksize:

  • lvcreate --type thin-pool -l 100%FREE -Zn --chunksize 64k -n thinpool vg results in 3.61g for [thinpool_tmeta] (pool is 3.61t)
  • 48B * 4TB / 64KB = 3GB from kernel doc calc
  • thin_metadata_size -b 64k -s 4TB --max-thins 128 -u M = 1984.66 megabytes

The calcs agree within the same order of magnitude, which could support that chunk size and block size are the same.
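To line the two estimates up side by side, I'm using a quick calculation that just reproduces the kernel-doc formula (so the numbers are only as good as that guideline):

    # Reproduce the kernel-doc guideline: metadata ~= 48 bytes * data_size / block_size,
    # rounded up to at least 2 MiB. Purely an estimate, for comparison with
    # thin_metadata_size and the sizes lvcreate actually picks.
    def tmeta_estimate(data_size_bytes: int, chunk_size_bytes: int) -> int:
        est = 48 * data_size_bytes // chunk_size_bytes
        return max(est, 2 * 1024**2)

    TiB = 1024**4
    for chunk in (64 * 1024, 512 * 1024, 2 * 1024**2):
        mib = tmeta_estimate(4 * TiB, chunk) / 1024**2
        print(f"chunk {chunk // 1024:>5} KiB -> ~{mib:,.0f} MiB metadata")

For 4 TiB that prints roughly 3072 MiB, 384 MiB, and 96 MiB for 64 KiB, 512 KiB, and 2 MiB chunks, matching the hand calculations above.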

What actually uses metadata? I try the following experiment:

  • create a 5GB thin pool (lvcreate --type thin-pool -L 5G -n tpool -Zn vg)
  • it used 64KB chunksize by default
  • creates an 8MB metadata lv, plus spare
  • initially Meta% = 10.64 per lvs
  • create 3 lvs, 2GB each (lvcreate --type thin -n tvol$i -V 2G --thinpool tpool vg)
  • Meta% increases for each one to 10.69, 10.74, then 10.79%
  • write 1GB random data to each lv (dd if=/dev/random of=/dev/vg/tvol$i bs=1G count=1)
  • 1st: pool Data% goes to 20%, Meta% to 14.06% (+3.27%)
  • 2nd: pool Data% goes to 40%, Meta% to 17.33% (+3.27%)
  • 3rd: pool Data% goes to 60%, Meta% to 20.61% (+3.28%)
  • take a snapshot (lvcreate -s vg/tvol0 -n snap0)
  • no change to metadata used
  • write 1GB random data to the snapshot
  • the device doesn't exist until lvchange -ay -Ky vg/snap0
  • then dd if=/dev/random of=/dev/vg/snap0 bs=1G count=1
  • pool Data% goes to 80%, Meta% to 23.93% (+3.32%)
  • write 1GB random data to the origin of the snapshot
  • dd if=/dev/random of=/dev/vg/tvol0 bs=1G count=1
  • hmm, pools still at 80% Data% and 23.93% Meta%
  • write 2GB random data
  • dd if=/dev/random of=/dev/vg/tvol0 bs=1G count=1
  • pool is now full 100% Data% and 27.15% Meta%

Observations:

  • Creating a snapshot on its own didn't consume more metadata
  • Creating new LVs consumed a tiny amount of metadata
  • Every 1GB written resulted in ~3.3% metadata growth. I assume this is 8MB x 0.033 ≈ 270KB; with 64KB per chunk, 1GB is 16,384 chunks, so that works out to ~17 bytes of metadata per chunk, which sounds reasonable (quick check below).
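The quick check, for anyone who wants to repeat the arithmetic (numbers taken straight from the lvs output above):

    # Back-of-the-envelope check of metadata consumed per newly mapped chunk,
    # using the tmeta size and Meta% deltas reported by lvs above.
    tmeta_bytes = 8 * 1024**2     # 8 MiB metadata LV
    meta_growth = 0.033           # ~3.3% per 1 GiB written
    chunk_bytes = 64 * 1024       # 64 KiB pool chunk size

    bytes_used = tmeta_bytes * meta_growth
    chunks_mapped = (1 * 1024**3) // chunk_bytes
    print(f"~{bytes_used / 1024:.0f} KiB metadata for {chunks_mapped} chunks "
          f"= ~{bytes_used / chunks_mapped:.1f} bytes per chunk")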

Q4. So is metadata growth mainly just due to writes and mapping physical blocks to the addresses used in the LVs?

Q5. I reached max capacity of the pool and only used 27% of the metadata space. When would I ever run out of metadata?

And I think the final Q is, when creating the thin pool, should I use less than 100% of the space in the volume group? Like save 2% for some reason?

Any tips appreciated as I try to wrap my head around this!

2438
 
 
The original post: /r/datahoarder by /u/blarrybob on 2025-03-31 00:38:53.
2439
 
 
The original post: /r/datahoarder by /u/Jplakes on 2025-03-30 23:53:47.

Hi everyone, I’ll be traveling to the U.S. soon (1 week in New York and 3 days in Washington, D.C.), and I’m considering bringing back 2 hard drives since the savings seem significant. For example, a Seagate 12TB drive costs around $200 on Amazon, while in Santiago, it’s over $320

A few questions I have:

  1. Availability and purchase:
    • Are 12TB drives commonly found in physical stores, or are they mostly available online (Amazon, Newegg, etc.)?
    • If I want to buy in a physical store, which places in New York or Washington, D.C. would have good prices and stock? (Best Buy, Micro Center, etc.)
    • Since I'll only be in the U.S. for a short time, I'm not sure if ordering from Amazon is a good idea (in case of delivery delays or issues).
  2. Transport:
    • Is it safe to carry the drives in my carry-on, or is it better to check them in my luggage?
    • Any recommendations for protecting them during travel to avoid damage from shocks or vibrations?
    • Are there any customs issues when bringing hard drives into Chile?

If anyone has done this before and has advice, I’d really appreciate it.

2440
 
 
The original post: /r/datahoarder by /u/stikves on 2025-03-30 23:17:27.

I need some help.

Every now and then I look into moving my backups off of HDDs. Carrying around a large box of HDDs, and then carefully migrating them to fresher drives as they age, has been a chore.

Tape makes perfect sense, since optical media stalled at a maximum of 100GB per disc and SSDs are still too expensive.

And, we finally have Thunderbolt external drives:

https://ltoworld.com/products/owc-archive-pro-lto-8-thunderbolt-tape-storage-archiving-solution-0tb-no-software-copy?srsltid=AfmBOopwwRkLc2f07XFv7F_eLJWxeXvi7DyHAo7NOsHHeXnwkKCHnxD8j34&gQT=2

"OWC Archive Pro LTO-8 Thunderbolt Tape Storage/Archiving Solution, 0TB, No Software"

However, I still cannot make the math work.

For a $5,000 drive, I can still buy and shuck a bunch of external HDDs, at roughly $7/TB. So before buying any tapes at all, I would need to have 714TB of data to break even. (Of course not considering longevity or the hassle)
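My rough break-even math, for reference (the tape cartridge price is an assumption from a quick search, so plug in current numbers):

    # Rough break-even between shucked HDDs and an LTO-8 setup.
    # Prices are assumptions for illustration; substitute current ones.
    hdd_cost_per_tb = 7.0      # USD/TB for shucked external HDDs
    drive_cost = 5000.0        # USD for the Thunderbolt LTO-8 drive
    tape_cost = 60.0           # USD per LTO-8 cartridge (assumed)
    tape_capacity_tb = 12.0    # native LTO-8 capacity

    # Break-even where HDD spend equals drive + tapes for the same capacity:
    #   hdd_cost_per_tb * T = drive_cost + (tape_cost / tape_capacity_tb) * T
    tape_cost_per_tb = tape_cost / tape_capacity_tb
    break_even_tb = drive_cost / (hdd_cost_per_tb - tape_cost_per_tb)
    print(f"tape media ~${tape_cost_per_tb:.2f}/TB, break-even at ~{break_even_tb:.0f} TB")

Once the media cost is counted as well, the break-even only moves further out, which is why I can't make it work at 50-100TB.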

I checked whether older generations, like LTO-5, have dropped in price. The answer is still no, at least not for the easy-to-use external drives.

Did I miss anything?

Or is there a viable tape option for those of us with roughly 50TB - 100TB of data?

2441
 
 
The original post: /r/datahoarder by /u/thrthrowawayay on 2025-03-30 22:23:20.

I don't really have much going for me in life, so I think I should just put my purpose into this before doing anything else. I don't know much beyond getting an M-Disc drive and maybe burying the discs in a box somewhere. I just want to help preserve history as much as I can, but I really don't know what's important or likely to be lost for our future.

2442
 
 
The original post: /r/datahoarder by /u/Richard_Foresty on 2025-03-30 21:28:53.

Hey,

Just wanted to put it in here in case anyone gets the same issue as me.

I was getting Event ID 157 "drive has been surprise removed" in Windows and had no idea why.

I tried turning off the Seagate power features, reformatting, and changing the drive letter - nothing helped.

Granted, I don't know whether those other things might also have been part of the issue.

However, the thing that truly resolved it for me was disabling write caching in Windows.

Disabling write caching:

  • Open Device Manager.
  • Find your Seagate Exos drive under Disk Drives.
  • Right-click the drive and choose Properties.
  • Go to the Policies tab and uncheck Enable write caching on the device.

After that (at least so far) the issue no longer occurred.

Hope it helps someone in the future.

2443
 
 
The original post: /r/datahoarder by /u/sunrisedown on 2025-03-30 20:51:30.

Hi everyone,

Just got a photo scanner to digitise analogue photos from older family members.

What are the best possible settings for proper scan results? Does VueScan deliver better results than the stock software? Any settings advice here, too?

Thanks a lot!

2444
 
 
The original post: /r/datahoarder by /u/g-e-walker on 2025-03-30 19:08:29.

youtube-dl-react-viewer is a web app for yt-dlp that supports viewing and downloading videos. youtube-dl-react-viewer is 100% free and open-source.

Live Demo | Screenshots | GitHub Repo

Major Changes

  • Videos that you have watched will now appear marked as watched everywhere in the web app
  • Reworked the navbar to improve usability on mobile devices
  • Added advanced search which can be used to filter videos by uploader, playlist, download job, website, and date
  • Updated the video player
    • Added theater mode
    • Added audio only playback mode
    • Added a screenshot button
  • Added numerous video player settings (settings can be set individually for mobile, tablet, and desktop viewports)
    • Enable/disable autoplay
    • Keep player controls visible
    • Player UI scale
    • Default volume
    • Default playback rate
    • Show/hide large play and seek buttons
    • Seek button skip time
    • Position player controls on or below the video
    • Show/hide current and remaining time

The full changelog can be found on the releases page

2445
 
 
The original post: /r/datahoarder by /u/ex0hs on 2025-03-30 18:03:35.

Hello everyone.

I have a lot of data (4-5 TB of small files like photos, videos, and documents) spread across 3 computers, 2 mobile phones, 6+ Google Drive accounts, and Telegram. I also have a lot of credentials: 10+ active email accounts with each of 3 email providers for various purposes (over 500 accounts created across various websites), plus credentials on paper, in text files, in KeePassXC, in 5+ books, etc.

This is haunting me as the things are everywhere and messy.

How do I manage it all? Please help me :(

(PS: I'm in college right now, so I don't have money to buy additional storage for the time being. Thanks.)

2446
 
 
The original post: /r/datahoarder by /u/thehumbleandwiseone on 2025-03-30 16:46:53.
2447
 
 
The original post: /r/datahoarder by /u/BobInBowie on 2025-03-30 16:24:53.

Do the new 6TB and 8TB Blue drives have WDIDLE3?

I don't have either drive, just checking before I buy.

2448
 
 
The original post: /r/datahoarder by /u/Due_Replacement2659 on 2025-03-30 14:52:48.

I have no idea whether this makes sense to post here, so sorry if I'm wrong.

I have a huge library of existing spectral power density graphs (signal graphs), and I have to convert them back into raw data for storage and for use with modern tools.

Is there any way to automate this process? Does anyone know of any tools, or has anyone done something similar before?

An example of such a graph (this is not what we're actually working with - ours are far more complex - but it gives the idea):

https://preview.redd.it/yo47siwmbure1.png?width=554&format=png&auto=webp&s=1b70e08c514bd849eedd5ce46c1c5091f973940d
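One approach that seems to fit is classic plot digitization: isolate the curve pixels in the image and map pixel coordinates back to data coordinates using the known axis limits. A rough sketch of that idea (the axis limits, threshold, and filename are placeholders for whatever your graphs actually use):

    # Sketch of plot digitization: take the darkest pixel in each image column
    # (assumed to be the curve) and map pixel coordinates to data coordinates.
    # Axis limits, threshold, and filename are placeholders; a real version
    # would also need to crop to the plot area and handle gridlines/labels.
    import numpy as np
    from PIL import Image

    X_MIN, X_MAX = 0.0, 100.0    # x-axis range shown on the graph (placeholder)
    Y_MIN, Y_MAX = -120.0, 0.0   # y-axis range, e.g. dB (placeholder)

    img = np.array(Image.open("spectrum.png").convert("L"), dtype=float)
    h, w = img.shape

    xs, ys = [], []
    for col in range(w):
        column = img[:, col]
        row = int(column.argmin())        # darkest pixel = the curve
        if column[row] < 128:             # skip columns with no curve pixel
            xs.append(X_MIN + (X_MAX - X_MIN) * col / (w - 1))
            ys.append(Y_MAX - (Y_MAX - Y_MIN) * row / (h - 1))

    np.savetxt("spectrum.csv", np.column_stack([xs, ys]),
               delimiter=",", header="x,y", comments="")

Tools like WebPlotDigitizer automate the axis calibration and curve extraction, but the underlying idea is the same.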

2449
 
 
The original post: /r/datahoarder by /u/lawanda123 on 2025-03-30 14:30:14.

How do you folks catalog your data and make it searchable and explorable? I'm a data engineer currently planning to hoard datasets, LLM models, and basically a huge variety of random data in different formats - Wikipedia dumps, Stack Overflow, YouTube videos.

Is there an equivalent of something like Apache Atlas for this?

2450
 
 
The original post: /r/datahoarder by /u/bingobango2911 on 2025-03-30 14:19:23.

Hiya,

I've sorted through my photos and removed duplicates using dupeGuru.

I want to rename them (year/month/date, based on the embedded information in each file), but I don't want to move them. I was going to use PhotoMove, but it looks as though that would move them all into individual folders.

Does anyone know of any free software that will let me bulk rename the individual photo files?
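To be clear about what I'm after, it's essentially this (a sketch using Pillow; the naming scheme and fallback tag are just what I'd pick, and I'd run it on a copy of the library first):

    # Sketch: rename photos in place to "YYYY-MM-DD_originalname.ext" using the
    # EXIF DateTimeOriginal tag, falling back to the basic DateTime tag.
    # Run it on a copy of the library first; the naming scheme is just an example.
    from pathlib import Path
    from PIL import Image

    PHOTO_DIR = Path("/photos/sorted")   # placeholder path

    def taken_date(path: Path):
        exif = Image.open(path).getexif()
        value = exif.get_ifd(0x8769).get(36867) or exif.get(306)  # DateTimeOriginal, DateTime
        if not value:
            return None
        return value.split(" ")[0].replace(":", "-")  # "2021:07:04 ..." -> "2021-07-04"

    for photo in PHOTO_DIR.rglob("*.jpg"):
        date = taken_date(photo)
        if date and not photo.name.startswith(date):
            photo.rename(photo.with_name(f"{date}_{photo.name}"))

I gather exiftool can also rename files from their EXIF dates directly on the command line, which might be the simplest free-software answer.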

Thanks!
