It's A Digital Disease!


This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

founded 2 years ago
4226
 
 
The original post: /r/datahoarder by /u/CSManiac33 on 2025-01-26 03:11:48.

Hi, so I have a very specific issue I want to try to solve. I was trying to archive the Lord of the Rings galleries found on the Appendices. I ripped the VOBs, so I am able to screenshot the galleries. The problem is that some of the images have commentary on them. The audio for these commentaries is also in the VOB, but since each image is just a single frame, all of the audio gets played one clip after another and doesn't directly link to an image. Is there any way I can get the timings the DVD uses for the audio, or would I have to cut the audio up manually?
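A hedged sketch for the splitting step: if the per-image durations can be read from the disc's navigation data (for example, the cell or chapter lengths that a tool such as lsdvd reports; that the gallery is authored with one cell per image is an assumption, not something confirmed here), cumulative offsets can be turned into ffmpeg commands that cut the audio without re-encoding. The durations, the VOB name and the output names are hypothetical. `-c:a copy` into `.ac3` assumes the commentary track is AC-3, and a `-map` option may be needed to pick the commentary stream.

```python
"""Sketch: turn per-image durations into ffmpeg audio-cut commands.

Assumes the durations (in seconds) were already read from the DVD's
navigation data, e.g. one cell per gallery image. All values and file
names here are hypothetical.
"""
from typing import List, Tuple


def segments(durations: List[float]) -> List[Tuple[float, float]]:
    """Convert consecutive durations into (start, end) offsets in seconds."""
    result = []
    start = 0.0
    for d in durations:
        end = start + d
        result.append((start, end))
        start = end
    return result


def ffmpeg_commands(vob: str, durations: List[float]) -> List[List[str]]:
    """Build one ffmpeg argument list per segment.

    -vn drops the video, -c:a copy keeps the audio stream as-is
    (assumed AC-3). Output names image_001.ac3, ... are hypothetical.
    """
    cmds = []
    for i, (start, end) in enumerate(segments(durations), start=1):
        cmds.append([
            "ffmpeg", "-i", vob,
            "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
            "-vn", "-c:a", "copy",
            f"image_{i:03d}.ac3",
        ])
    return cmds


if __name__ == "__main__":
    # Hypothetical durations for three gallery images.
    for cmd in ffmpeg_commands("VTS_01_1.VOB", [12.5, 8.0, 20.25]):
        print(" ".join(cmd))
```

Using `-t` (a duration) rather than `-to` sidesteps any ambiguity about whether the end time is measured from the input or the output.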

4227
 
 
The original post: /r/datahoarder by /u/Tom_Sacold on 2025-01-26 02:20:25.

I wanted to do this for a particular project, not just the hoarding, but let's just say we want to do this.

Let's also say to make it simple we're going to download only .txt versions of the books.

Gutenberg has a page telling you that you're allowed to do this using wget with a 2-second wait between requests, and it gives the command as

wget -w 2 -m -H "http://www.gutenberg.org/robot/harvest?filetypes%5B%5D=txt&langs%5B%5D=en"

Now, I believe this is supposed to fetch a series of HTML pages (following a "next page" link each time) that contain links to zip files, and to download not just the pages but the linked zip files as well. Does that seem right?

This did not work for me. I have tried various options with the -A flag but it didn't download the zips.

So, OK, moving on, what I do have is 724 files (with annoying names because wget can't custom-name them for me), each containing 200-odd links to zip files like this:

<a href="http://aleph.gutenberg.org/1/0/0/3/10036/10036-8.zip">http://aleph.gutenberg.org/1/0/0/3/10036/10036-8.zip</a>

So we can easily grep those out of the files and get a list of the zipfile URLs, right?

grep -Eoh 'http://aleph\.gutenberg\.org/[^"<]+' * | uniq > zipurls.txt

Using uniq there because every URL appears twice, in the text and in the HREF attribute.
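The same extraction can be sketched in Python if grep's flavour or `uniq`'s adjacent-only de-duplication ever becomes a problem. The directory name `harvest_pages` is a hypothetical stand-in for wherever wget saved the 724 pages. `dict.fromkeys` keeps first-seen order while removing every duplicate, not just adjacent ones.

```python
"""Sketch: pull unique aleph.gutenberg.org zip URLs out of saved harvest pages.

The directory name is hypothetical; point it at wherever wget saved the pages.
"""
import re
from pathlib import Path
from typing import Iterable, List

# Same pattern as the grep: stop at a quote (end of href) or '<' (end of link text).
ZIP_URL = re.compile(r'http://aleph\.gutenberg\.org/[^"<]+')


def extract_urls(texts: Iterable[str]) -> List[str]:
    """Return URLs in first-seen order with all duplicates removed."""
    found = (m.group(0) for text in texts for m in ZIP_URL.finditer(text))
    return list(dict.fromkeys(found))


def main(page_dir: str = "harvest_pages", out_file: str = "zipurls.txt") -> None:
    """Read every file in page_dir and write the unique URLs, one per line."""
    pages = (p.read_text(encoding="utf-8", errors="replace")
             for p in sorted(Path(page_dir).iterdir()) if p.is_file())
    urls = extract_urls(pages)
    Path(out_file).write_text("\n".join(urls) + "\n", encoding="utf-8")
    print(f"{len(urls)} unique URLs written to {out_file}")
```

Calling `main()` produces the same `zipurls.txt` that the wget `--input-file` step expects.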

So now we have a huge list of the zip file URLs, and we can fetch them with wget using the --input-file option:

wget -w 2 --input-file=zipurls.txt

This works, except … some of the files aren't there.

If you go to this URL in a browser:

http://aleph.gutenberg.org/1/0/0/3/10036/

you'll see that 10036-8.zip isn't there. But there's an "old" folder, and the file is in there. What does the 8 mean? I think it means UTF-8 encoding, and I might be double-downloading: getting the same files twice in different encodings. What does "old" mean? Just … old?
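On the double-downloading worry: Project Gutenberg's long-standing file-naming convention (worth double-checking against their own documentation) is that a bare number such as `10036.zip` is plain ASCII, `-8` is 8-bit ISO-8859-1 and `-0` is UTF-8. Under that assumption, a sketch that keeps one variant per ebook number could look like this; the preference order (UTF-8 first) is a choice, not a Gutenberg rule.

```python
"""Sketch: keep one text variant per ebook from a list of zip URLs.

Assumes the NNNN.zip / NNNN-8.zip / NNNN-0.zip naming convention;
the preference order below is an arbitrary choice.
"""
import re
from typing import Dict, List, Optional, Tuple

# File name at the end of the URL: ebook number plus optional -0 or -8.
NAME = re.compile(r'/(\d+)(?:-([08]))?\.zip$')

# Lower rank wins: prefer UTF-8, then 8-bit, then plain ASCII.
PREFERENCE: Dict[Optional[str], int] = {"0": 0, "8": 1, None: 2}


def pick_one_per_book(urls: List[str]) -> List[str]:
    best: Dict[str, Tuple[int, str]] = {}
    unmatched: List[str] = []
    for url in urls:
        m = NAME.search(url)
        if not m:
            unmatched.append(url)  # keep anything that doesn't fit the pattern
            continue
        book, suffix = m.group(1), m.group(2)
        rank = PREFERENCE[suffix]
        if book not in best or rank < best[book][0]:
            best[book] = (rank, url)
    return [url for _, url in best.values()] + unmatched
```

Books keep their first-seen order, and URLs the pattern doesn't recognise are passed through rather than silently dropped.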

So now I'm working through the list, not with wget but with a script which is essentially this:

try to get the file
if the response is a 404
    add 'old' into the URL and try again
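The 404 fallback described above can be sketched with the standard library alone. The `old/` location is inferred from the single 10036-8.zip example and may not hold for every book; the 2-second delay mirrors the `wget -w 2` politeness rule; the output directory name `zips` is hypothetical.

```python
"""Sketch of the 404 fallback: try the URL, then the same file under old/.

The old/ location is inferred from one example and may not apply everywhere.
"""
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Optional


def old_url(url: str) -> str:
    """Insert 'old/' before the file name: .../10036/x.zip -> .../10036/old/x.zip"""
    head, _, name = url.rpartition("/")
    return f"{head}/old/{name}"


def fetch(url: str, dest_dir: Path, delay: float = 2.0) -> Optional[str]:
    """Download url (or its old/ fallback) into dest_dir.

    Returns the URL that worked, or None if both returned 404.
    Any other HTTP error is re-raised.
    """
    for candidate in (url, old_url(url)):
        time.sleep(delay)  # wait before every request, including the retry
        try:
            with urllib.request.urlopen(candidate) as resp:
                data = resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                continue
            raise
        (dest_dir / candidate.rsplit("/", 1)[1]).write_bytes(data)
        return candidate
    return None


def fetch_all(list_file: str = "zipurls.txt", dest: str = "zips") -> None:
    """Walk the URL list and report anything missing from both locations."""
    dest_dir = Path(dest)
    dest_dir.mkdir(exist_ok=True)
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and fetch(url, dest_dir) is None:
            print(f"not found: {url}")
```

Logging the misses rather than stopping makes it easy to see how many books the `old/` guess doesn't cover.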

How am I doing? What have I missed? Are we having fun yet?

4228
 
 
The original post: /r/datahoarder by /u/Pasta-hobo on 2025-01-26 01:50:40.

I'm looking for something I can keep a backup of that is essentially an offline Rosetta Stone for as many languages as possible, preferably even including some of the more significant conlangs like Esperanto.

4229
 
 
The original post: /r/datahoarder by /u/busymom0 on 2025-01-26 01:49:44.

I am trying to search the Internet Archive using their ia command line tool:

https://archive.org/developers/internetarchive/cli.html

I tried this in macOS Terminal:

./ia search 'site:"theverge.com"'

But it returns nothing. It literally just returns blank:

https://i.sstatic.net/A2GOf7C8.png

I have already run the ./ia configure command and confirmed the configuration with access keys have been saved to my /Users/username/.config/internetarchive/ia.ini file.

I tried performing an advanced search using Proxyman:

https://archive.org/advancedsearch.php?q=site%3Atheverge.com&output=json

This also returns nothing in the docs in the returned JSON:

https://i.sstatic.net/E41hAu1Z.png

Am I missing something?

The other option I tried was using their CDX:

http://web.archive.org/cdx/search/cdx?url=https%3A%2F%2Fwww.theverge.com&output=json&filter=statuscode%3A200&fl=timestamp%2Curlkey%2Cdigest&collapse=digest&from=20250101

This gives me a bunch of timestamps and hashes:

[["timestamp","urlkey","digest"],
["20250101115214","com,theverge)/","TKDDQK2R4D6GYWVKB3TXZHICBK3SX5X6"],
["20250101171809","com,theverge)/","XH5RGNUK4TIFBIM3BHB3ZPGMLVUP4RLZ"],
["20250102155507","com,theverge)/","TYPMLYUTBEP6HKRXWBLFYJ3B7VVKY4MH"],
["20250103055042","com,theverge)/","FCZ7ZRULMJLO4CZLYWHHWE5FKNMKIUNB"],

Is there a way I can download each of these files using the hash?
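For the CDX question: as far as I know, the digest is a base32-encoded SHA-1 of the captured content, used for things like `collapse=digest`, not a key you can fetch by. Captures are retrieved by timestamp plus original URL through the usual Wayback replay pattern `https://web.archive.org/web/<timestamp>/<url>`, and adding `id_` right after the timestamp returns the archived bytes without the toolbar or rewritten links. The sketch assumes `original` is added to the `fl` parameter (for example `fl=timestamp,original,digest`), because the `urlkey` in the query above is a SURT-style key, not a fetchable URL.

```python
"""Sketch: turn CDX JSON rows into Wayback replay URLs.

Assumes the CDX query included `original` in fl, so each row carries the
captured URL; the digest is not needed to build the links.
"""
import json
from typing import List


def replay_urls(cdx_json: str, raw: bool = True) -> List[str]:
    rows = json.loads(cdx_json)
    header, records = rows[0], rows[1:]  # first row names the fields
    ts_i = header.index("timestamp")
    url_i = header.index("original")
    # "id_" after the timestamp requests the capture as archived,
    # without the Wayback toolbar or link rewriting.
    flag = "id_" if raw else ""
    return [f"https://web.archive.org/web/{r[ts_i]}{flag}/{r[url_i]}" for r in records]
```

Each resulting URL can then be fetched with a delay between requests, much like the wget `-w 2` approach elsewhere in this thread.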

4230
 
 
The original post: /r/datahoarder by /u/Ecstatic_Constant_63 on 2025-01-26 01:38:21.

I keep reading that these HDDs are just not good as daily drives. I'm currently in a situation where I found someone selling several 4-year-old 6TB SkyHawks for a third of the price of a brand-new 8TB WD Blue....

Now, I'm not that picky, and I currently have a 4TB WD Purple as my D: drive, which came with the second-hand SFF computer that I bought. My C: drive is on an SSD, and that is where I store my VMs and app data. The Purple is used for my data hoards like ISOs and family pictures and videos. Within this year I'm planning to convert this HDD into storage for Immich so the family can 'stream' our photos and videos over the network. That would mean more reads.

Anyway, I'm really considering moving from the WD Purple to the Seagate SkyHawk, mainly because of the cheap price. I'm OK with waiting a bit longer for reads, as I will have most of my VMs on the faster SSD.

Any thoughts?

4231
 
 
The original post: /r/datahoarder by /u/ijjat on 2025-01-26 00:59:26.

I have a WD Passport drive that I used to back up my data from my Windows 7 laptop. I recently got a new laptop with Windows 11. When I plug in the WD Passport, I can't find the recovery option or the WD Backup software. Does anyone know how I can recover my data?

Any help is appreciated.

4232
 
 
The original post: /r/datahoarder by /u/Bruceshadow on 2025-01-26 00:30:52.
4233
 
 
The original post: /r/datahoarder by /u/Massive-Lettuce-1630 on 2025-01-25 23:12:33.

Hey, I'm not an expert and don't know that much about these things, so I'd really appreciate your help.

I have two options: SATA SSDs (not NVMe, the ones that look like laptop hard drives) and the regular desktop HDDs that we all use.

I need just one drive to store some data like pictures and text files. I will keep it unplugged and only plug it like once a month for a few hours to move files there. I need the data to be fine for 10 years. (Realistically I'll probably change and get a new drive in future, but let's say 10 years for the sake of my understanding)

And let's say that in all ten years, I'll only put around 500GB of data on it.

Where I live, I only have options like WD and Toshiba for HDDs. For SSDs there are many brands like Transcend, but I don't think I can afford SSDs from well-known brands because they cost a lot.

But setting cost aside, can you tell me which would be better for my case: SSDs or HDDs?

Thank you.

4234
 
 
The original post: /r/datahoarder by /u/punisher2002-19 on 2025-01-25 23:08:05.

Hey everyone, first-time poster. I just had a question: I'm starting a new NAS build and want to do it a bit better than in the past. The only issue is the drives I have: 2x 8TB, 3x 4TB and a 3TB. Some drives are older, and sadly I can't afford to just buy a bunch of new ones. I want an array of these drives where, if one fails, only that drive's data is lost. It's not important data, so I'm not super worried, plus I will check health and remove drives as needed. I'm looking for suggestions on how to do this and which OS to do it on.

Currently I'm using mergerfs with multiple drives on an old HP MicroServer, and I'm upgrading to an 8-core Xeon.

Thanks everyone

4235
 
 
The original post: /r/datahoarder by /u/Zmashcat on 2025-01-25 23:03:11.

I have been finding my way around TrueNAS for the past couple of days on an old PC, and now I want to do a real NAS build. In my area there is an auction ending tomorrow for an MSI P43 Neo + Intel Core 2 Quad Q9550 + Corsair XMS2 8GB DDR2. Would this be good for a first NAS? I really only plan on storing photos and other files on it, though I might dabble in some Plex streaming or try dividing the storage space into multiple profiles (for family members). Would this be a good starting point? The auction is currently at 4 bucks with one bid, so I doubt it will go much higher.

4236
 
 
The original post: /r/datahoarder by /u/Ballin_Like_Curry on 2025-01-25 22:44:54.

I have 2 external USB 3.0 Seagate hard drives and was wondering if there's a recommended way to eject them when they're not in use. I normally go to Settings, hit Eject, wait about 30 seconds, then unplug the USB cable from my laptop and then unplug the power cord. Is there a better/safer way to go about this, or is this perfectly fine? I don't want to cause unnecessary wear in case I'm doing something wrong.

4237
 
 
The original post: /r/datahoarder by /u/Aniwaya1 on 2025-01-25 22:40:58.

Will I see any significant difference or noticeable issues from using different DVD RW brands? Similarly, will I see noticeable differences or issues between SATA and IDE?

4238
 
 
The original post: /r/datahoarder by /u/BesaidBoy on 2025-01-25 22:10:40.
4239
 
 
The original post: /r/datahoarder by /u/xEska1337 on 2025-01-25 22:08:23.

Is there any good software to auto-tag (not manually tag) and search pictures and videos? I would like to organize my meme library. Preferably open source, but it's not mandatory.

4240
 
 
The original post: /r/datahoarder by /u/ElaborateCantaloupe on 2025-01-25 21:48:10.
4241
 
 
The original post: /r/datahoarder by /u/StrategosRisk on 2025-01-25 21:43:55.

I'm using SingleFile to back up old message board threads, some of which might end up being nearly 30 MB with all of the images. That's fine for me to store locally, but I'd like to also host them online as a mirror. Does anyone have any tips for reducing their size? Should I compress them somehow? How can I reduce the size of the saved image data streams? Or should I just use a different format for posting the pages?
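Before choosing between compression and re-encoding, it may help to measure how much of a saved page is embedded image data. SingleFile typically inlines images as base64 `data:` URIs, and base64 makes binary data about a third larger (4 characters per 3 bytes). The sketch below tallies the decoded size per MIME type and compares the whole file with a gzipped copy; the file name passed to `report` is hypothetical.

```python
"""Sketch: measure how much of a SingleFile page is embedded base64 data.

Tallies the decoded size of data: URIs per MIME type and compares the
whole file with a gzipped copy. The path passed to report() is hypothetical.
"""
import gzip
import re
from collections import Counter
from pathlib import Path
from typing import Dict

# data:<mime>[;params];base64,<payload>
DATA_URI = re.compile(r'data:([\w.+-]+/[\w.+-]+)[^,"\')\s]*;base64,([A-Za-z0-9+/=]+)')


def decoded_len(b64: str) -> int:
    """Bytes a base64 string decodes to (4 chars -> 3 bytes, minus padding)."""
    return len(b64) * 3 // 4 - b64.count("=", -2)


def embedded_sizes(html: str) -> Dict[str, int]:
    totals: Counter = Counter()
    for mime, payload in DATA_URI.findall(html):
        totals[mime] += decoded_len(payload)
    return dict(totals)


def report(path: str) -> None:
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    raw = html.encode("utf-8")
    for mime, size in sorted(embedded_sizes(html).items(), key=lambda kv: -kv[1]):
        print(f"{mime:20} {size / 1e6:8.2f} MB decoded")
    print(f"whole file: {len(raw) / 1e6:.2f} MB, "
          f"gzipped: {len(gzip.compress(raw)) / 1e6:.2f} MB")
```

If images dominate, gzip mostly recovers part of the base64 overhead, since JPEG and PNG data are already compressed; shrinking the images themselves is what would make the bigger difference.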

4242
 
 
The original post: /r/datahoarder by /u/Reasonable-Finger-87 on 2025-01-25 21:43:10.

https://preview.redd.it/rhqiritkm7fe1.png?width=1855&format=png&auto=webp&s=10d43bb9367e06128812cf9c8e54ffa15480ae12

I'm downloading it as "Webpage, Complete". It's being saved to a flash drive and downloaded using Google Chrome.

I think the problem might be that it may use scripts to load the page. I tried looking up ways to fix that, and it said to download it as .mht. The issue with that is that it seems "Internet Explorer"/Microsoft Edge no longer supports downloading pages in that file type.

4243
 
 
The original post: /r/datahoarder by /u/ThinkerBe on 2025-01-25 21:11:46.

I'm using rclone to mount my cloud storage to Windows Explorer, but I've noticed that it only works while the cmd window is open. I want it to run in the background without the cmd window appearing in the taskbar. How can I achieve this on Windows?
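A minimal sketch, assuming rclone is on PATH and a remote named `mycloud:` (hypothetical) is already configured: start the mount as a separate process with no console window. rclone's own `--no-console` flag is documented as hiding the console on Windows only, so it is passed too; check `rclone mount --help` for your version. Launching the script with `pythonw.exe` (for example from Task Scheduler at logon) keeps the launcher itself invisible as well.

```python
"""Sketch: start `rclone mount` on Windows without a visible console.

Remote name and drive letter are hypothetical; rclone must be on PATH.
"""
import subprocess
import sys
from typing import List, Sequence


def mount_command(remote: str, drive: str, extra: Sequence[str] = ()) -> List[str]:
    # --no-console asks rclone itself to hide its console (Windows only).
    return ["rclone", "mount", remote, drive, "--no-console", *extra]


def start_hidden(cmd: List[str]) -> subprocess.Popen:
    # CREATE_NO_WINDOW only exists on Windows; fall back to 0 elsewhere.
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    return subprocess.Popen(cmd, creationflags=flags,
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)


if __name__ == "__main__" and sys.platform == "win32":
    start_hidden(mount_command("mycloud:", "X:", ["--vfs-cache-mode", "writes"]))
```

The mount keeps running after the launcher exits; stopping it means ending the rclone process (for example from Task Manager).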

Thanks in advance for any tips!

4244
 
 
The original post: /r/datahoarder by /u/jazzdabb on 2025-01-25 20:55:41.
4245
 
 
The original post: /r/datahoarder by /u/hiihiiii on 2025-01-25 21:58:25.

Is it just me, or is the archive.org API down?

4246
 
 
The original post: /r/datahoarder by /u/Soft-Bobcat2122 on 2025-01-25 21:02:28.
4247
 
 
The original post: /r/datahoarder by /u/hefas on 2025-01-25 20:26:55.

I'm currently using a VM with TrueNAS as my NAS for media and important things. I really don't like it. It's too complicated and too powerful for what I need, and I'm constantly afraid that I'm one bad, uninformed decision away from losing my data. So I decided to move at least my important files to something I would feel more confident using and be less likely to brick, and I bought a UNAS Pro. This will also let me keep a backup on a separate machine: the TrueNAS I'm currently using.

I have one 16TB HDD that I use for media and two mirrored 4TB HDDs for my important things, which are also backed up to the 16TB drive.

I can't decide which drives to move to the UNAS, as it can only have one pool.

At first I considered buying 4TB drives (to have 4 in total for when RAID 6 becomes available, and adding more when needed), but the price per TB is much higher than for 16TB drives. Also, more drives = more watts, which do add up over the years (about 8 EUR per drive per year for me).

Another option is buying an extra 16TB drive, mirroring the two on the UNAS, and using the 2x 4TB drives on TrueNAS to back up the important stuff. With this, the UNAS would store important files + media and TrueNAS only backups. My issue with this is that I don't need my media on RAID, so this would waste storage (and maybe accelerate wear on the HDDs and increase the failure rate?).

I'm looking at buying a 16TB IronWolf Pro for 264 EUR (16.5 EUR/TB), as it's the same model I already have, or 4x 4TB IronWolf Pro at 86 EUR each (21.5 EUR/TB). I think I can't reuse my old 4TB HDDs because they're WD Reds (very likely SMR), so I would have to buy all 4 new (86 x 4 = 344 EUR). The initial cost is higher with smaller drives, but is it safer? And failures would be less costly. My important data is currently about 3TB, so I'm not too concerned about running out of drive bays.
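The trade-off in this post is mostly arithmetic, so here is a worked comparison using the post's own prices and the 8 EUR per drive per year power estimate. The 5-year horizon is an arbitrary assumption, and the mirror option counts both 16TB drives at full price even though one is already owned, so it overstates the mirror's out-of-pocket cost.

```python
"""Worked comparison of the two UNAS options using the prices in the post.

Power cost (8 EUR per drive per year) is the poster's estimate; the
5-year horizon is an arbitrary assumption for illustration.
"""
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    drives: int
    drive_tb: int
    drive_price: float  # EUR per drive
    layout: str  # "mirror" or "raid6"

    def usable_tb(self) -> int:
        if self.layout == "mirror":
            return self.drive_tb  # every drive holds the same data
        if self.layout == "raid6":
            return (self.drives - 2) * self.drive_tb  # two drives' worth of parity
        raise ValueError(f"unknown layout: {self.layout}")

    def cost(self, years: int, power_per_drive_year: float = 8.0) -> float:
        """Purchase price plus estimated power over the given number of years."""
        return self.drives * (self.drive_price + years * power_per_drive_year)


OPTIONS = [
    Option("2 x 16TB mirror", 2, 16, 264.0, "mirror"),
    Option("4 x 4TB RAID 6", 4, 4, 86.0, "raid6"),
]

if __name__ == "__main__":
    for o in OPTIONS:
        total = o.cost(years=5)
        print(f"{o.name}: {o.usable_tb()} TB usable, {total:.0f} EUR over 5 years, "
              f"{total / o.usable_tb():.2f} EUR per usable TB")
```

With these inputs the mirror comes out at 608 EUR for 16TB usable and RAID 6 at 504 EUR for 8TB usable; RAID 6 buys tolerance of two simultaneous drive failures rather than capacity.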

4248
 
 
The original post: /r/datahoarder by /u/mobdk on 2025-01-25 20:06:32.

I am building a NAS (TrueNAS) and I want an ultrafast storage pool for video editing. I would like to use a total of 8 M.2 NVMe drives in a RAID setup with 2x parity.

The solution that seems like the best way would be to use 2x Asus Hyper m.2 gen5 Card (https://www.asus.com/motherboards-components/motherboards/accessories/hyper-m-2-x16-gen5-card/) with 4 drives on each.

However, for an Asus Hyper M.2 card to work, you need a PCIe x16 slot that supports bifurcation (4x4x4x4). So to have two of these cards installed and working, I would need two PCIe x16 slots with bifurcation, and I can't find any information on whether this is possible. It seems that every motherboard behaves differently when using the PCIe slots, and some limit the speed depending on what kind of cards you install...

I would also like to install a 25G SFP+ card, but other than that I don't need a beefy GPU; I would prefer a CPU with a built-in GPU.

ECC RAM would be nice, as would low power consumption. The only computing power I need is for running file transfers, software RAID, etc. in TrueNAS.
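Two bifurcated x16 slots only help if the platform can feed two full x16 links at the same time; many desktop boards instead split one x16 into x8/x8 when both slots are populated. A back-of-envelope lane count makes the requirement concrete. The widths are typical values, not measured: x4 per NVMe drive (four per Hyper card) and x8 for a 25G NIC, which is common but should be checked for the specific card. The CPU lane figure is something to look up for whichever platform you consider.

```python
"""Sketch: count the CPU PCIe lanes the planned build needs.

Assumed widths: x4 per NVMe drive (4 drives per Hyper M.2 card in a
4x4x4x4-bifurcated x16 slot) and x8 for the 25G NIC. The CPU lane
count passed to fits() must be looked up for the chosen platform.
"""
from typing import Dict


def lanes_needed(nvme_drives: int = 8, nic_lanes: int = 8) -> Dict[str, int]:
    cards = -(-nvme_drives // 4)  # ceiling division: 4 drives per card
    nvme_lanes = cards * 16       # each card occupies a full x16 slot
    return {"hyper_cards": cards, "nvme_lanes": nvme_lanes,
            "nic_lanes": nic_lanes, "total": nvme_lanes + nic_lanes}


def fits(cpu_lanes: int, **kwargs: int) -> bool:
    """True if the CPU-attached lanes cover everything at full width."""
    return lanes_needed(**kwargs)["total"] <= cpu_lanes
```

With the defaults the build needs 40 lanes at full width, which is why boards with that many CPU lanes tend to be workstation or server platforms rather than typical desktop ones.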

Does this motherboard exist?

4249
 
 
The original post: /r/datahoarder by /u/genes1x on 2025-01-25 19:57:52.

Hi there, data hoardarians, I invoke you!

I already have 2 WD Red Plus 12TB drives. I bought them because of this sub: I read a thousand times that they are really quiet, and it's true. I'm very happy with them.

I'm ready to get my 3rd HDD, but in my country it's really hard to find used WD Red Plus drives right now (WD Red Plus drives are sold out). I'm in Europe, and there's no ServerPartDeals here.

I got my 2 HDDs on a second-hand website; they were new and sealed, for 200€. (New, they're over 300€, almost 400€.) I need an alternative to the WD Red Plus. Every time I find a good deal (Toshiba MG, Seagate, etc.), I search for noise reviews, opinions, etc., and end up not buying it because of the scratching noise (the WD sounds more like blurpblurp).

I know noise is relative: for you it may be quiet, and for others it's noisy as fuck.

I have seen reviews of the Toshiba MG and N300, IronWolfs (maybe not the Pro version?), Exos, HGST Ultrastar, and shucked My Book/Elements drives.

YouTube videos about noise are really bad... They amplify the sound, and it scares you into thinking you're buying a chainsaw instead of an HDD.

Can you recommend an alternative to these brands/models that is really quiet? Or one of those I mentioned above? (I'm scared of IronWolfs because of this sub and YouTube.)

4250
 
 
The original post: /r/datahoarder by /u/Beneficial_Ad_4911 on 2025-01-25 18:16:04.