this post was submitted on 23 Jul 2025
1 points (100.0% liked)

It's A Digital Disease!

23 readers
1 users here now

This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

founded 2 years ago
MODERATORS
 
The original post: /r/datahoarder by /u/PenileContortionist on 2025-07-22 23:25:55.

Hey folks, threw this together last night since seeing the post about ultimate-guitar.com getting rid of the download button and deciding to charge users for the content created by other users. I've already done the scraping and included the output in the tabs.zip file in the repo, so with that extracted you could begin downloading right away.

Supports all tab types (beyond """OFFICIAL"""), they're stored as text unless they're Pro tabs, in which case it'll get the original binary file. For non-pro tabs, the metadata can optionally be written to the tab file, but each artist has a json file that contains the metadata for each processed tab so it's not lost if not. Later this week (once I've hopefully downloaded all the tabs) I'd like to have a read-only (for now) front end up for them.

It's not the prettiest, and fairly slow since it depends on Selenium and is not parallelized to avoid being rate limited (or blocked altogether), but it works quite well. You can run it on your local machine with a python venv (or raw with your system environment, live your life however you like), or in a Docker container - probably should build the container yourself from the repo so the bind mounts function with your UID, but there's an image pushed up to Docker Hub that expects UID 1000.

The script acts as a mobile client, as the mobile site is quite different (and still has the download button for Guitar Pro tabs). There was no getting around needing to scrape with a real JS-capable browser client though, due to the random IDs and band names being involved. The full list of artists is easily traversed though, and from there it's just some HTML parsing to Valhalla.

I recommend ~~running the scrape-only mode first~~ using the metadata in tabs.zip and using the download-only mode with the generated json output files, but it doesn't really matter. There's quasi-resumption capability given by the summary and individual band metadata files being written on exit, and the --skip-existing-bands + --starting/end-letter flags.

Feel free to ask questions, should be able to help out. Tested in Ubuntu 24.04, Windows 11, and of course the Docker container.

no comments (yet)
sorted by: hot top controversial new old
there doesn't seem to be anything here