Yeah, I know about the wiki; it links to a bunch of stuff, but I'm interested in hearing your workflow.
I have in the past used wget to mirror sites, which is fine for just getting the files. But ideally I'd like something that can also produce WARCs, SingleFile dumps from headless Chrome, and the like. My dream would be something that can handle (mostly) everything, including site-specific handlers like yt-dlp: just a web interface where I can paste a link, set whether to grab recursively, and choose whether it can follow outside links.
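For what it's worth, wget itself can cover two of those boxes at once: its WARC options can run alongside a recursive mirror, so one crawl yields both a browsable file tree and a WARC. A hedged sketch (example.com and the WARC name are placeholders; it's printed as a dry run via echo, so remove the echo line's `echo "$CMD"` in favor of `eval "$CMD"` or just type the command out to actually crawl):

```shell
# Sketch: recursive mirror that also records a WARC as it goes.
# --warc-file names the archive (wget appends .warc.gz itself);
# --warc-cdx writes an index alongside it. The wait flags are just
# politeness toward the target server.
CMD="wget --mirror --page-requisites --adjust-extension --convert-links \
--warc-file=example-site --warc-cdx --wait=1 --random-wait \
https://example.com/"
echo "$CMD"
```

It won't get you the headless-Chrome or yt-dlp side of things, but it does mean recursive grabbing and WARC output don't require two separate tools.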
I was looking at ArchiveBox yesterday and was quite excited about it. I set it up and it's soooo close to what I want, but there's no way to do recursive mirroring (wget -m style), so I can't really grab a whole site with it, which really limits its usefulness to me.
So, yeah. What's your workflow, and do you have any tools to recommend that would check these boxes?