Selfhosted

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around self-hosting, please include details to make it clear.
Don't duplicate the full text of your blog or GitHub here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.
No low-effort posts. This is subjective and will largely be determined by community member reports.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago

MODERATORS

HybridSarcasm@lemmy.world

HybridSarcasm@lemmy.hybridsarcasm.xyz

Self-host Reddit – 2.38B posts, works offline, yours forever (github.com)

submitted 2 months ago by 19_84@lemmy.dbzer0.com to c/selfhosted@lemmy.world

Reddit's API is effectively dead for archival. Third-party apps are gone. Reddit has threatened to cut off access to the Pushshift dataset multiple times. But 3.28TB of Reddit history exists as a torrent right now, and I built a tool to turn it into something you can browse on your own hardware.

The key point: This doesn't touch Reddit's servers. Ever. Download the Pushshift dataset, run my tool locally, get a fully browsable archive. Works on an air-gapped machine. Works on a Raspberry Pi serving your LAN. Works on a USB drive you hand to someone.

What it does: Takes compressed data dumps from Reddit (.zst), Voat (SQL), and Ruqqus (.7z) and generates static HTML. No JavaScript, no external requests, no tracking. Open index.html and browse. Want search? Run the optional Docker stack with PostgreSQL – still entirely on your machine.
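The dump-to-static-HTML step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: it assumes the decompressed dump is newline-delimited JSON with `title`, `author`, and `selftext` fields, and the function names `parse_posts`, `render_post`, and `render_page` are hypothetical. Decompressing the `.zst` file is left out so the sketch only needs the standard library.

```python
import html
import json
from typing import Iterable, Iterator


def parse_posts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty NDJSON line, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a damaged line should not abort the whole run
        if isinstance(record, dict):
            yield record


def render_post(post: dict) -> str:
    """Render one post as an HTML fragment; every field is escaped."""
    # Field names are assumptions about the dump layout.
    title = html.escape(str(post.get("title", "(untitled)")))
    author = html.escape(str(post.get("author", "[deleted]")))
    body = html.escape(str(post.get("selftext", "")))
    return (
        "<article>"
        f"<h2>{title}</h2>"
        f'<p class="byline">by {author}</p>'
        f"<p>{body}</p>"
        "</article>"
    )


def render_page(subreddit: str, posts: Iterable[dict]) -> str:
    """Build a complete page with no scripts and no external requests."""
    name = html.escape(subreddit)
    items = "\n".join(render_post(p) for p in posts)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>r/{name}</title></head>\n'
        f"<body><h1>r/{name}</h1>\n{items}\n</body></html>\n"
    )
```

Writing the string returned by `render_page` to an `index.html` file would give a page that opens straight from disk. Escaping every field matters here because archived posts are untrusted input.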

API & AI Integration: Full REST API with 30+ endpoints – posts, comments, users, subreddits, full-text search, aggregations. Also ships with an MCP server (29 tools) so you can query your archive directly from AI tools.

Self-hosting options:

USB drive / local folder (just open the HTML files)
Home server on your LAN
Tor hidden service (2 commands, no port forwarding needed)
VPS with HTTPS
GitHub Pages for small archives

Why this matters: Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away.

Scale: Tens of millions of posts per instance. PostgreSQL backend keeps memory constant regardless of dataset size. For the full 2.38B post dataset, run multiple instances by topic.
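One common way to keep memory flat while loading a very large dump is to stream records in fixed-size batches instead of reading everything at once. The sketch below shows only that idea; whether the tool works this way is an assumption, and the helper name `batched` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` records, holding one batch in memory at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # input exhausted
        yield batch
```

Each batch could then be handed to a bulk insert into the database, so peak memory depends on the batch size rather than on the size of the dataset.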

How I built it: Python, PostgreSQL, Jinja2 templates, Docker. Used Claude Code throughout as an experiment in AI-assisted development. Learned that the workflow is "trust but verify" – it accelerates the boring parts but you still own the architecture.

Live demo: https://online-archives.github.io/redd-archiver-example/
GitHub: https://github.com/19-84/redd-archiver (Public Domain)

Pushshift torrent: https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4

[–] euAppleHater@feddit.org 2 points 2 months ago (1 children)

Wait, do you have an issue with piracy in general or an issue with the arr stack specifically? No judgement or interest in argument, just genuinely curious. Feel free to DM if you don’t want to start a whole thing, or beat your tin pan as you said, in an unrelated post.

[–] irmadlad@lemmy.world 3 points 2 months ago* (last edited 2 months ago) (1 children)

Wait, do you have an issue with piracy in general

I don't mind stating here: Piracy in general. I don't condemn those who do because, as I've said, you are autonomous adults capable of making your own decisions. You know the risks and you take steps to mitigate those risks. You and I have both heard all the pros and cons and all the supporting arguments of both sides. Now, I know there are lots of people who rip and catalog their own DVDs, CDs, etc. All fine and dandy.

The comparison was that every time AI is used here in this comm, or even suspected of use, people have a conniption and start piling on. Like moths to a flame. What does that accomplish? Nothing. It seems to just make those who are anti-AI feel superior, is about all I can get from it. To me, it's just a tool. I'll grant you it's a tool that needs some heavy regulation, even as much as I chafe against regulation. It is necessary. AI isn't going away. It's not a fad. It's here to stay. If using AI makes your blood boil, fine. Don't. Although I foresee a time where you'll use AI and not even know it.

Opinions are great too. I, like others, have a long list of them. Stating your opinions is fine too. It seems here tho, opinions turn into castigation and denigration, which is in direct violation of 'Rule 1: Be civil: we’re here to support and learn from one another.' State your opinion on AI: 'I'd rather guide my pops into my mum before I'd use AI'. Then move on. Personally, I don't state my opinion on the arr stack, because it would accomplish nothing and in the long run become tedious and obnoxious.

As far as the arr stack as software, I've never deployed it, but it is pretty darn amazing from what I've read. The dev teams that have put it all together have some knowledge to say the least. It's just not my bag.

[–] euAppleHater@feddit.org 2 points 2 months ago (1 children)

Ah OK, I see, thanks for the explanation. I assumed it was a general issue with piracy, but was wondering if maybe I had missed something negative about the software specifically or the contributors behind it or something.

[–] irmadlad@lemmy.world 1 points 2 months ago

IMHO, we can all co-exist under the selfhosting umbrella