How do you all clone websites, especially ones that pull in content from other domains, without ending up copying the entire internet? I'm trying to clone a few websites about some old Lego robots before they disappear, but am struggling to do so. The SourceForge ones have been the hardest because they host the downloadable files at a different URL (and I think a few also use a '.io' address). I have been trying wget, wget2, curl, and httrack, but none have worked well: wget2 has generally been the best, though one site worked better with plain wget. They all miss most of the actual external downloads, so I end up fetching those files manually and rewriting the internal links by hand with awk/sed. I like the idea of httrack, but I've had no luck with it at all.
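For reference, this is roughly the wget invocation I've been experimenting with (using brickos as the example). The --span-hosts/--domains combination is my attempt to pull in the off-site files without crawling everything; the sourceforge.net and downloads.sourceforge.net entries are just my guesses at where the project downloads actually live:

```sh
# Rough sketch of what I've been running (brickos as the example).
# --span-hosts lets wget leave the starting host, and --domains is the
# allowlist that is supposed to keep it from wandering across the web.
# sourceforge.net / downloads.sourceforge.net are guesses at where the
# project file releases are actually served from.
wget --mirror --convert-links --adjust-extension --page-requisites \
     --span-hosts \
     --domains=brickos.sourceforge.net,sourceforge.net,downloads.sourceforge.net \
     --wait=1 --random-wait -e robots=off \
     --directory-prefix=./brickos \
     https://brickos.sourceforge.net/
```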
Has anyone tried to back up similar sites, and if so, what tools did you use (and, better yet, how)? Manually editing and reviewing the contents has not scaled well for me...
The list of sites I'm trying to back up is below (a sketch of the loop I've been running over them follows the list):
- https://brickos.sourceforge.net/
- https://bricxcc.sourceforge.net/
- http://enchanting.robotclub.ab.ca/
- https://lejos-osek.sourceforge.net/
- https://philohome.com/
- https://www.ev3dev.org/
- https://www.instructables.com/Making-8-Mindstorms-RCX-Work-Again/
- https://www.johnholbrook.us/
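For what it's worth, here is the loop I've been running over that list. The per-host directory layout is just my own convention, and I fetch the Instructables page separately since it's a single page on a much bigger site:

```sh
#!/usr/bin/env bash
# Hypothetical wrapper around the invocation above: mirror each site into a
# directory named after its host. Note that the --domains allowlist here is
# only the site's own host, which is exactly why the off-site SourceForge
# file downloads keep getting skipped.
sites=(
  https://brickos.sourceforge.net/
  https://bricxcc.sourceforge.net/
  http://enchanting.robotclub.ab.ca/
  https://lejos-osek.sourceforge.net/
  https://philohome.com/
  https://www.ev3dev.org/
  https://www.johnholbrook.us/
)

for url in "${sites[@]}"; do
  host=$(printf '%s\n' "$url" | awk -F/ '{print $3}')   # e.g. brickos.sourceforge.net
  wget --mirror --convert-links --adjust-extension --page-requisites \
       --span-hosts --domains="$host" \
       --wait=1 --random-wait -e robots=off \
       --directory-prefix="./$host" \
       "$url"
done
```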