It's A Digital Disease!

This is a sub that aims at bringing data hoarders together to share their passion with like-minded people.
3076
 
 
The original post: /r/datahoarder by /u/Main_Shock_4269 on 2025-02-26 15:53:46.

Since CD and DVD recordables are non-erasable, is there a way to make a file on a recordable disc unusable/unreadable? The purpose is that devices which can't read specific formats don't completely discard the disc because of one or two files in unsupported formats. Thanks

3077
 
 
The original post: /r/datahoarder by /u/Own_Ad6901 on 2025-02-26 15:26:52.
3078
 
 
The original post: /r/datahoarder by /u/Internal-Ad-2771 on 2025-02-26 15:25:02.

Hello! I want to download the End of Term Web Archive 2024 to perform text analysis and track changes in textual content. I know that the Internet Archive has a collection where we can download WARC files here https://archive.org/details/EndOfTerm2024WebCrawls, but it amounts to hundreds of terabytes, and I can't download everything. Since I'm only interested in HTML files, and perhaps not all domains but just the most visited ones, I wonder if there is a more optimal solution. I thought of two possible solutions:

  • WET files, which contain only the text extracted from the EOT and are much smaller, are available here: https://eotarchive.org/data/ for previous years, but not for 2024. Does anyone know of links for 2024?
  • I tried to download each HTML file individually using the Wayback Machine API, but there is a rate limit (around 20 requests per second, I think). For a website like state.gov, there are more than 500,000 captures between 2024 and 2025 to download, so it would take a very long time.

Any other ideas?
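One way to cut the volume before downloading anything is to enumerate captures through the Wayback Machine's CDX API, filtering to successful HTML responses and collapsing identical snapshots by digest. A minimal sketch of building such a query (the endpoint and parameters follow the public CDX API; `state.gov/*` is just an example pattern, and the per-IP rate limits still apply when you fetch the listed captures):

```python
import urllib.parse

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url_pattern, from_ts, to_ts, page=0):
    """Build a CDX query listing only successful HTML captures,
    collapsed by content digest so unchanged snapshots appear once."""
    params = [
        ("url", url_pattern),        # e.g. "state.gov/*"
        ("matchType", "prefix"),
        ("from", from_ts),           # e.g. "20240101"
        ("to", to_ts),               # e.g. "20250120"
        ("filter", "mimetype:text/html"),
        ("filter", "statuscode:200"),
        ("collapse", "digest"),      # drop captures whose content didn't change
        ("output", "json"),
        ("page", str(page)),
    ]
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

Collapsing by digest alone can remove a large fraction of the 500,000+ state.gov captures, since most re-crawls of a page are byte-identical.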

3079
 
 
The original post: /r/datahoarder by /u/samhep1 on 2025-02-26 14:55:45.

Apologies if this is the wrong sub, and if so please advise the appropriate place.

I currently use IDrive for personal use. It's not pretty, but it does the job and is ideal for my requirements.

My other business currently uses Nextcloud. This has more features (which aren't required) and is significantly more expensive as a file sharing and backup solution.

Ideally, I would simply transfer my business to IDrive and the business partner would have access to the account. We would both back up our files to IDrive and we would make use of the cloud drive (shared drive) for business files.

However, I can't simultaneously run my personal account through IDrive, alongside this hypothetical new account. This is quite frustrating - I have had conversations with IDrive and they have advised that this is unfortunately a limitation of their product.

Now I am looking for an alternative to IDrive; practically the same product, in fact. I want software where I can back up specific business-related folders across various machines (i.e., my machine and the other person's machine), have a shared drive that can be used for collaborative working, and share files externally via a link. Most importantly, it must be cheap, like IDrive. I'm not looking for software bloated with additional features. This is all I need, and it's a shame that IDrive cannot offer a solution where I maintain my personal account and simultaneously run a second account for the business.

3080
 
 
The original post: /r/datahoarder by /u/Sea-Paleontologist84 on 2025-02-26 14:55:22.

I used to be able to download embedded videos using FetchV (on Chrome) and 1DM+ (on Android), but now some videos can no longer be detected by them for downloading. Can anyone help me? This link:

https://v.xlys.ltd.ua/play/25279-5.htm

3081
 
 
The original post: /r/datahoarder by /u/BarneyBStinson on 2025-02-26 13:16:32.

So during pre-season testing, Sky F1's Crofty claimed each single Red Bull car sends back 1.5 billion terabytes of data each race. Ehhh, OK Crofty, give me a chance to catch my breath; I can only laugh so hard. It was his confidence in what he was saying that got me laughing.

3082
 
 
The original post: /r/datahoarder by /u/timeister on 2025-02-26 12:36:26.

Alright, so here’s the deal.

I bought a 45 Drives 60-bay server from some guy on Facebook Marketplace. Absolute monster of a machine. I love it. I want to use it. But there’s a problem:

🚨 I use Unraid.

Unraid is currently at version 7, which means it runs on Linux Kernel 6.8. And guess what? The HighPoint Rocket 750 HBAs that came with this thing don’t have a driver that works on 6.8.

The last official driver was for kernel 5.x. After that? Nothing.

So here’s the next problem:

🚨 I’m dumb.

See, I use consumer-grade CPUs and motherboards because they’re what I have. And because I have two PCIe x8 slots available, I have exactly two choices:

  1. Buy modern HBAs that actually work.

  2. Make these old ones work.

But modern HBAs that support 60 drives?

• I’d need three or four of them.

• They’re stupid expensive.

• They use different connectors than the ones I have.

• Finding adapter cables for my setup? Not happening.

So now, because I refuse to spend money, I am attempting to patch the Rocket 750 driver to work with Linux 6.8.

The problem?

🚨 I have no idea what I’m doing.

I have zero experience with kernel drivers.

I have zero experience patching old drivers.

I barely know what I’m looking at half the time.

But I’m doing it anyway.

I’m going through every single deprecated function, removed API, and broken structure and attempting to fix them. I’m updating PCI handling, SCSI interfaces, DMA mappings, everything. It is pure chaos coding.

💡 Can You Help?

• If you actually know what you’re doing, please submit a pull request on GitHub.

• If you don’t, but you have ideas, comment below.

• If you’re just here for the disaster, enjoy the ride.

Right now, I’m documenting everything (so future idiots don’t suffer like me), and I want to get this working no matter how long it takes.

Because let’s be real—if no one else is going to do it, I guess it’s down to me.

https://github.com/theweebcoders/HighPoint-Rocket-750-Kernel-6.8-Driver

3083
 
 
The original post: /r/datahoarder by /u/orcus on 2025-02-26 02:59:12.

I'm a unix grump. I mostly hoard code and distro ISOs, and here are my top aliases related to hoarding said things. I use zsh; ymmv with other shells.

These mostly came about from doing long shell pipelines and just deciding to slap an alias on them.

# yes I know I could configure aria2, but I'm lazy
# description: download my random shit urls faster
alias aria='aria2c -j16 -s16 -x16 -k1M'

# I'll let you figure this one out
alias ghrip='for i in $(gh repo list --no-archived $(basename $PWD) -L 9999 --json name | jq -r ".[].name"); do gh repo clone $(basename $PWD)/$i -- --recursive -j10; done'

# ditto last #
alias ghripall='for i in $(gh repo list $(basename $PWD) -L 9999 --json name | jq -r ".[].name"); do gh repo clone $(basename $PWD)/$i -- --recursive -j10; done'

3084
 
 
The original post: /r/datahoarder by /u/stormcomponents on 2025-02-26 08:54:31.

My data hoarding may be different from many; my actual storage needs are (relatively) low, but I want good-quality backup and redundancy at all times. Tape has always been the end-game for reliable long-term storage in my setup, and I've finally got it going! I don't really have anyone I could explain the setup to and get any response other than 'why?', so I had to quickly post my excitement here...

It feels as refreshing as when I first started playing with enterprise-grade hardware, getting new hardware set up and automated. I've got a (new to me) DL380 G9 as my VM server now, with an HBA passed through to one of the VMs. That HBA connects via SAS to the library, and the VM has an NFR license for Veeam to control the backups. It feels pretty magical: click 'backup' from upstairs via a web-based GUI, then come downstairs and hear tapes physically moving around in the rack as data gets written onto them. At the moment the tapes are only LTO-5 (28TB total), but it'll only take a few days to copy my entire hoard over, and then I know it's safe to a level far higher than most.

I have 4 free slots so I can throw in 12TB of LTO6 tapes if I need to expand, and if I moved the whole library to LTO6 it'd offer 72TB total.

Things have come a long way from when I used to have a single 1.3MB floppy with "Tom's stuff" written on it, to automating the writing of tape archives in a 42U rack via virtual backup systems. Feels good.

https://preview.redd.it/9gjrdyic7gle1.jpg?width=774&format=pjpg&auto=webp&s=3de1852bef87c60c353a6bd1a57f5862bb416866

3085
 
 
The original post: /r/datahoarder by /u/P0lpett0n3 on 2025-02-26 08:16:38.

Hello, I have some Mega accounts and I don't want them to be wiped after 3 months of inactivity. Last year I used megatools + cron to log in periodically, but now that's not working anymore: I can't log into an account using megatools until I first log into it using a browser.

Do you know any workarounds? Should I use proxies? Other tools?

3086
3087
 
 
The original post: /r/datahoarder by /u/True-Entrepreneur851 on 2025-02-26 06:52:59.

I have 100+ videos. Some are the same with cuts, others are bad-definition copies, and there are originals… but they're difficult to check, as many are about different things I filmed with my camera. I've used software such as Videdup and others, but they don't find the duplicates, so I need a manual check. It would be easy with 10 files, but I have many to compare. Can anyone suggest something, please?
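Matching cuts and re-encodes really does need perceptual tooling, but an exact-duplicate pre-pass can shrink the pile before any manual check. A rough standard-library sketch: it buckets files by size, then compares a hash of the first megabyte (identical prefixes of equal-size files are a strong, cheap signal; hash the whole file to be certain). It only catches byte-identical copies:

```python
import hashlib
import os
from collections import defaultdict

def find_exact_duplicates(root, sample_bytes=1024 * 1024):
    """Group files under `root` that appear byte-for-byte identical:
    bucket by size first (cheap), then hash the first `sample_bytes`."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size can't have duplicates
        by_hash = defaultdict(list)
        for path in paths:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                h.update(f.read(sample_bytes))
            by_hash[h.hexdigest()].append(path)
        duplicates.extend(group for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Whatever this weeds out is one fewer pair to eyeball; the remaining near-duplicates still need visual comparison or a perceptual-hash tool.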

3088
 
 
The original post: /r/datahoarder by /u/GTRacer1972 on 2025-02-26 06:45:58.

I read somewhere that it has issues doing that and just winds up making duplicates of all of your files while copying the new files over. I want to leave it running in the tray so that if I download a song, it backs up just that; if I save a picture, it backs up just that; and if my game folder changes, it backs up that. I do not want it creating duplicates of everything or overwriting files that may have the same name without asking me.

I suppose periodically I could just wipe my folders on iDrive and do another full backup with everything new, but that takes days.
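The behaviour being asked for is essentially an incremental sync: compare size and modification time, copy only what's new or changed, and leave everything else alone. A toy sketch of that logic (this is not iDrive's actual implementation, just the general technique that avoids re-uploading or duplicating unchanged files):

```python
import os
import shutil

def incremental_copy(src_root, dst_root):
    """Copy only files that are new or whose size/mtime changed since
    the last run; unchanged files in the destination are untouched."""
    copied = []
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            s = os.stat(src)
            if os.path.exists(dst):
                d = os.stat(dst)
                if d.st_size == s.st_size and int(d.st_mtime) >= int(s.st_mtime):
                    continue  # unchanged since the last backup, skip it
            shutil.copy2(src, dst)  # copy2 preserves mtime for the next comparison
            copied.append(os.path.relpath(dst, dst_root))
    return copied
```

Running it twice copies everything the first time and nothing the second; that's the property a good incremental backup tool should have out of the box.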

3089
 
 
The original post: /r/datahoarder by /u/SunTzy69 on 2025-02-26 03:44:42.
3090
 
 
The original post: /r/datahoarder by /u/DLMorrigan on 2025-02-26 02:55:40.

Hi all, I have been on a (mostly) successful adventure to fix the abysmally slow parity raid speeds in the windows storage spaces tool by following this incredible guide. https://storagespaceswarstories.com/storage-spaces-and-slow-parity-performance/#more-63

I have 6 identical Crucial 2tb MX500 ssds over sata directly on my motherboard

These are split into two separate 3-drive storage pools (as, to my knowledge, you cannot follow the guide above with 6 drives, one being parity). Either way, my pools are configured the same: 3 columns with an interleave of 32KB and an allocation size of 64KB, same as the guide. Yet when running both through CrystalDiskMark, I get half or less the read speed on one of the arrays, and I can't for the life of me figure out why. Increasing and decreasing the allocation size and interleave does not fix the issue, and reconfiguring both leads to the same result again. See screenshot attached.

Looking around online I am not seeing anything, but I am new to RAID and parity calculations using Storage Spaces, so it's possible I am missing something, but I am not sure what. Anyone have any ideas what would be causing this massive difference in read speeds? Any input would be greatly appreciated.

Two identical 3 drive arrays
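As a sanity check outside CrystalDiskMark, timing a plain sequential read of a large file on each array can confirm whether the gap is real at the filesystem level or an artifact of the benchmark. A rough cross-platform sketch (the test file path is up to you; beware the OS page cache inflating repeat runs, so use a file larger than RAM or a fresh file per run):

```python
import time

def sequential_read_mbps(path, chunk_mb=4):
    """Time an unbuffered sequential read of `path` and return MB/s."""
    chunk = chunk_mb * 1024 * 1024
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return (total / (1024 * 1024)) / elapsed
```

If this simple read shows the same 2x difference between the pools, the problem is in the array itself (column layout, a slow member disk, or cabling) rather than in CrystalDiskMark's access pattern.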

3091
 
 
The original post: /r/datahoarder by /u/Fun-Yard-6952 on 2025-02-26 02:10:11.

Hi, I have a couple of questions:

  1. Someone advised me to buy WD Red Pro to store my media files because they use CMR, whereas the Plus version does not. Is that correct? How important do you think having CMR is? I noticed that if I buy the Plus version, I can afford almost double the storage capacity compared to the Pro version. For example, with €200 (which is my maximum budget), I could get an 8TB WD Red Plus, whereas with the Pro version, I could only get a 4TB drive at most (which costs €175).
  2. I'm not building a NAS that will run 24/7—I need the drive as a tertiary storage disk, alongside two SSDs, for a computer I recently built. This means it will be turned on and off frequently, but it will never run 24/7 like a NAS. I'm not sure if this makes any difference when choosing the right drive.

If I want a drive that can last for years, even 10+, without worrying about failures, what would you recommend?

3092
 
 
The original post: /r/datahoarder by /u/cashregister9 on 2025-02-26 01:59:25.

I had a pretty simple setup: just two external hard drives, both about 2-3 years old, one Seagate 2TB drive and one WD 5TB drive. The 2TB drive died last month; not sure why, but one day Windows would not recognize it. It still spun and everything, but I just could not access the files, so it was probably corrupt. Now everything is stored on my main PC's SSD and that 5TB HDD, which I am now babying. That was the impetus to start taking my rampant data hoarding more seriously.

But I am a newbie at all of this, so where would I begin? For my purposes I am mostly saving images, PC backups, and videos. I do not have the means or funds to set up a NAS, and due to constantly moving around, something portable would be nice (but obviously not a requirement). I've tried cloud storage, but that is not really for me (I do not feel like paying for any subscriptions at this moment), so I've thought about picking up a portable SSD. But I am not sure if there is a simpler, cheaper, and more durable solution that I am not aware of.

EDIT: I also imagine having my game drive and archival drive be one and the same is not ideal, so I have been trying to separate games from saved data.

3093
 
 
The original post: /r/datahoarder by /u/PsychologicalCake337 on 2025-02-26 01:49:16.

Hi everyone. I have been self-hosting with my Dell OptiPlex 7050 SFF for a while now, with the 1 TB SATA SSD that it came with and an external 20 TB WD Elements HDD. I just bought a Seagate Exos X24 24 TB HDD (I have yet to test it; how should I proceed with this on Debian, by the way?).

I opened up the Dell Optiplex and saw the SSD connected with the SATA power cable and SATA data cable, both of which are connected to the motherboard. The SATA power cable also has another connector attached to it marked "ICT" and "slimline SATA" which I am unfamiliar with.

There is another SATA data cable connected to the motherboard, not connected to anything else, that I can use with this new hard drive. However, I'm unsure of how I should connect the SATA power cable to the new hard drive. Would I need a SATA power splitter cable? Could this be any generic cable I find on Amazon, or would I need to find a specific one? I also noticed another 4 pin port on the X24 hard drive, to the right of the SATA power and SATA data ports. Is there anything I need to connect to that, or can that be left with nothing in it?

3094
 
 
The original post: /r/datahoarder by /u/Scorge120 on 2025-02-26 00:49:35.

Hi folks, not sure if this is the right sub but figure this is data-related and there are some pretty creative people here.

As a self-employed business owner who enjoys doing a year of bookkeeping in one shot, I'm trying to automate that process as much as possible this year.

What tools and workflows are available to process hundreds of scanned receipts and generate spreadsheets I can review without manually inputting data?

In the past, I would scan receipts and manually create a spreadsheet to compare them with bank statements to validate transactions.

I've upgraded to OCR this year to scan all the receipts into a searchable PDF binder. And now I'm wondering if there is an AI tool that can comb through the text on each receipt, and to the best of its capability, create a spreadsheet where each receipt gets organized into rows and columns containing key data such as subtotals, totals, tips, category of transaction, etc.

To take it a step further, could it compare this spreadsheet to another spreadsheet containing bank transactions, and automatically pair receipts to transactions?

I know it wouldn't be perfect and I expect to have to review the result, but with technology now and LLMs, there's got to be something out there that can do this. It would save soo much time.
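As a baseline before reaching for an LLM, plain regex extraction over the OCR text already catches the easy fields, and amount matching against bank rows is a simple pairing loop. A toy sketch; the patterns and field names here are hypothetical, and real receipts will need far more robust parsing (or an LLM/invoice-OCR service for the messy cases):

```python
import re

# Hypothetical patterns for illustration -- real receipts vary wildly.
TOTAL_RE = re.compile(r"(?im)^\s*total[:\s]*\$?(\d+\.\d{2})")
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")

def receipt_to_row(text):
    """Pull the total and date out of one receipt's OCR text."""
    total = TOTAL_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "date": date.group(1) if date else "",
        "total": float(total.group(1)) if total else None,
    }

def match_to_bank(rows, bank_rows, tolerance=0.01):
    """Pair each receipt with the first unused bank transaction of the
    same amount (within `tolerance`); unmatched receipts get None."""
    unused = list(bank_rows)
    pairs = []
    for row in rows:
        hit = next((b for b in unused
                    if row["total"] is not None
                    and abs(b["amount"] - row["total"]) <= tolerance), None)
        if hit:
            unused.remove(hit)  # each bank row can match only one receipt
        pairs.append((row, hit))
    return pairs
```

Anything this baseline fails to extract or match is exactly the subset worth sending to a smarter tool, which keeps the expensive step small and the whole result reviewable.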

Any help or advice is appreciated! Thanks.

3095
 
 
The original post: /r/datahoarder by /u/Appropriate_Rent_243 on 2025-02-26 00:47:03.

So, here's what I think I would need: something that can be accessed easily, and something that can be written and updated frequently, even nightly for ongoing drafts, but that can also be stored long-term.

Obviously I know that with a novel you can just… print a book on archival paper, but I think it's good to have digital copies too.

3096
 
 
The original post: /r/datahoarder by /u/WisdomSky on 2025-02-26 00:40:50.

I'm no expert in what type of SSD to get for my use case. All I know is basic stuff like the difference between TLC and QLC.

Basically, I want an SSD that can endure heavy reading without me worrying about it failing from too much reading. It's used for storing (writing) photos once, and they never get deleted. It will also be permanently powered on, so no worries about bitrot.

Anything I need to consider? Or would QLC SSDs suffice for my use case?

3097
 
 
The original post: /r/datahoarder by /u/PrimaryRequirement28 on 2025-02-26 00:21:29.
3098
 
 
The original post: /r/datahoarder by /u/Professional-Bid69 on 2025-02-25 17:43:56.

I'm going to buy new hard drives for my Promise Pegasus2 R8 unit. I found some documentation on the manufacturer's website, but it's not clear whether those units will work with HDDs bigger than 6TB each.

https://www.promiseworks.com/datasheets/Pegasus2_DS.pdf

https://www.promise.com/DownloadFile.aspx?DownloadFileUID=6600

Anyone have some experience with that?

Thanks!

3099
 
 
The original post: /r/datahoarder by /u/Magnets on 2025-02-25 22:21:35.
3100
 
 
The original post: /r/datahoarder by /u/Lexard on 2025-02-25 21:51:55.

In the past, when I was using Rapidgator in free mode to download files, I remember it had a very convenient option to display the MD5 checksum of the downloaded file.

Yesterday when I checked the service, I was not able to find this MD5 checksum. Is it gone, or was it moved somewhere from the main download page?
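If the site no longer publishes an MD5, you can still hash downloads locally and compare against any checksum the uploader posts elsewhere (or against a second download of the same file). A standard streaming MD5, so large files never need to fit in memory:

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Stream the file through MD5 in chunks and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()
```

Two downloads with matching digests are almost certainly intact; a mismatch tells you one transfer was corrupted even without the site's own checksum.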
