this post was submitted on 06 Feb 2025
 
The original post: /r/datahoarder by /u/planepoint101 on 2025-02-06 05:15:00.

NOAA (more specifically, the National Centers for Environmental Information) has ~25 years of climate reports (monthly + annual) at https://www.ncei.noaa.gov/access/monitoring/monthly-report/national, and I'd like to download them. I've never done anything like this before.

I tried HTTrack first. With the GUI I got a ~700 MB folder that was basically an 'empty-looking' version of the above webpage, with no working links to the reports. I also tried the CLI version, with a little help from GPT-4o (at duck.ai), and got similar results:

$ httrack "https://www.ncei.noaa.gov/access/monitoring/monthly-report/" -O "/home/af/web_copies/" -N "*.*" -%P -%e0 -%k -%s

(Actually, that command was supposed to report the total size without downloading anything, but it didn't work.)

Lastly I tried wget...

$ wget -m https://www.ncei.noaa.gov/access/monitoring/monthly-report/ --convert-links --page-requisites --no-parent

...and did get a folder with an HTML file buried in it that, when opened, displays a stripped-down version of the webpage in question. But none of the links work and none of the reports can be opened, so none of that content was saved (I turned wifi off to test it).
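
One way to narrow this down is to list the links that are actually present in the page's HTML (saved or freshly fetched) and see which of them sit at or below the starting URL, since --no-parent keeps wget from following anything above that directory. The following is a minimal sketch using only the Python standard library; the names LinkCollector, extract_links, and in_scope are made up for illustration, and the scope check is only a rough approximation of what wget does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every <a href> found in html as an absolute URL."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def in_scope(url, start_url):
    """True if url is on the same host and at or below start_url's directory.

    Assumption: this only approximates wget's --no-parent rule.
    """
    u, s = urlparse(url), urlparse(start_url)
    start_dir = s.path.rsplit("/", 1)[0] + "/"
    return u.netloc == s.netloc and u.path.startswith(start_dir)
```

If the list comes back nearly empty, the report links may be generated by JavaScript, which neither wget nor HTTrack executes. If most links are out of scope, the starting URL or the --no-parent restriction is the more likely culprit.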

I'm also wondering how I might figure out the total size of the files before I go ahead and download.
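
A rough way to estimate the size up front is to send a HEAD request for each file URL and add up the Content-Length values the server reports. The following is a minimal sketch, assuming a list of report URLs has already been collected; content_length and estimate_total are hypothetical helper names. Servers often omit Content-Length for dynamically generated pages, so the total should be read as a lower bound.

```python
import urllib.request


def content_length(url, timeout=30):
    """Return the Content-Length the server reports for url, or None.

    Sends a HEAD request, so the body itself is not downloaded.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def estimate_total(urls, fetch=content_length):
    """Sum the known sizes; also count the URLs that reported no size."""
    total, unknown = 0, 0
    for url in urls:
        size = fetch(url)
        if size is None:
            unknown += 1
        else:
            total += size
    return total, unknown
```

Calling estimate_total(urls) returns a pair: the number of bytes accounted for and the number of URLs whose size the server did not report. The fetch parameter exists only so the summing logic can be tried without touching the network.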

Can anyone at least point me in the right direction? Which tools are best, and is there an online resource on how to use them? (The guide at https://www.httrack.com/html/fcguide.html was very detailed, but too technical for me to get much out of at this point.)

Thanks!
