this post was submitted on 09 Aug 2024
1 points (100.0% liked)

It's A Digital Disease!

23 readers
1 users here now

This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

founded 2 years ago
MODERATORS
 
The original post: /r/datahoarder by /u/-shloop on 2024-08-09 07:32:29.

Tool and source code available here: https://github.com/shloop/google-book-scraper

A couple weeks ago I randomly remembered about a comic strip that used to run in Boys' Life magazine, and after searching for it online I was only able to find partial collections of it on the official magazine's website and the website of the artist who took over the illustration in the 2010s. However, my search also led me to find that Google has a public archive of the magazine going back all the way to 1911.

I looked at what existing scrapers were available, and all I could find was one that would download a single book as a collection of images, and it was written in Python which isn't my favorite language to work with. So, I set about making my own scraper in Rust that could scrape an entire magazine's archive and convert it to more user-friendly formats like PDF and CBZ.

The tool is still in its infancy and hasn't been tested thoroughly, and there are still some missing planned features, but maybe someone else will find it useful.

Here are some of the notable magazine archives I found that the tool should be able to download:

Billboard: 1942-2011

Boys' Life: 1911-2012

Computer World: 1969-2007

Life: 1936-1972

Popular Science: 1872-2009

Weekly World News: 1981-2007

Full list of magazines here.

no comments (yet)
sorted by: hot top controversial new old
there doesn't seem to be anything here