Self-Hosted Alternatives to Popular Services

paperless-gpt – Yet another Paperless-ngx AI companion with LLM-based OCR focus (old.reddit.com)

submitted 8 months ago by bot@lemmit.online to c/selfhosted@lemmit.online

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/selfhosted by /u/Spare_Put8555 on 2025-01-09 14:47:56+00:00.

Hey everyone,

I've noticed discussions in other threads about paperless-ai (which is awesome), and some folks asked how it differs from my project, paperless-gpt. Since I’m a newer user here, I’ll keep things concise:

Context

paperless-ai leans toward doc-based AI chat, letting you converse with your documents.
paperless-gpt focuses on LLM-based OCR (for more accurate scanning of messy or low-quality docs) and a robust pipeline for auto-generating titles/tags.

Why Another Project?

I didn't know about paperless-ai in Sept. '24. True story :D
LLM-based OCR: I wanted a solution that does advanced text extraction from scans, harnessing large language models (via OpenAI or Ollama).
Tag & Title Workflows: My main passion is building flexible, automated naming and tagging pipelines for paperless-ngx.
No Chat (Yet): If you do want doc-based chatting, paperless-ai might be a better fit. Or you can run both: use paperless-gpt for OCR and tagging, then pass the cleaned-up text into paperless-ai for Q&A.
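
To make the title/tag pipeline idea more concrete, here is a minimal Python sketch of a single suggestion step. It assumes a generic `llm` callable (prompt in, text out) standing in for an OpenAI or Ollama client; the prompt wording, the JSON reply format, and the `suggest_metadata` name are all hypothetical and are not paperless-gpt's actual code.

```python
import json
from typing import Callable

# Hypothetical prompt; paperless-gpt's real (customizable) prompts may differ.
PROMPT = (
    "Suggest a short title and up to {max_tags} tags for the document below.\n"
    "Only use tags from this list: {allowed}.\n"
    'Answer as JSON: {{"title": "...", "tags": ["..."]}}\n\n'
    "{text}"
)


def suggest_metadata(
    text: str,
    allowed_tags: list[str],
    llm: Callable[[str], str],
    max_tags: int = 3,
) -> dict:
    """Ask an LLM for a title and tags, keeping only tags that already exist."""
    prompt = PROMPT.format(max_tags=max_tags, allowed=", ".join(allowed_tags), text=text)
    reply = json.loads(llm(prompt))

    # Map lowercase names to their canonical spelling for case-insensitive matching.
    allowed = {t.lower(): t for t in allowed_tags}
    tags: list[str] = []
    for tag in reply.get("tags", []):
        canonical = allowed.get(str(tag).lower())
        if canonical and canonical not in tags:  # drop unknown tags and duplicates
            tags.append(canonical)

    return {"title": str(reply.get("title", "")).strip(), "tags": tags[:max_tags]}
```

Restricting suggestions to tags that already exist keeps the model from inventing near-duplicate tags; whether paperless-gpt filters its suggestions exactly this way is an assumption.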

Key Features

Multiple LLM Support (OpenAI or Ollama).
Customizable Prompts for specialized docs.
Auto Document Processing via a “paperless-gpt-auto” tag.
Vision LLM-based OCR (experimental) that outperforms standard OCR in many tough scenarios.
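
The tag-triggered processing and customizable prompts can be pictured as a simple loop. This sketch is hypothetical: the `Document` class is a simplified stand-in for a paperless-ngx document, the `$title`/`$content` placeholders use Python's `string.Template` rather than whatever template syntax paperless-gpt actually uses, and removing the trigger tag after processing is an assumed way to avoid handling the same document twice.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Iterable

AUTO_TAG = "paperless-gpt-auto"  # trigger tag named in the post


@dataclass
class Document:
    """Hypothetical, simplified stand-in for a paperless-ngx document."""
    id: int
    title: str
    content: str
    tags: set[str] = field(default_factory=set)


# User-editable prompt template (placeholder syntax is an assumption).
DEFAULT_TEMPLATE = Template(
    "Propose a concise title for this document.\nCurrent title: $title\n\n$content"
)


def process_tagged(
    docs: Iterable[Document],
    llm: Callable[[str], str],
    template: Template = DEFAULT_TEMPLATE,
) -> list[int]:
    """Retitle every document carrying AUTO_TAG, then drop the tag."""
    processed = []
    for doc in docs:
        if AUTO_TAG not in doc.tags:
            continue  # only documents the user explicitly tagged are touched
        prompt = template.safe_substitute(title=doc.title, content=doc.content)
        doc.title = llm(prompt).strip()
        doc.tags.discard(AUTO_TAG)  # assumed: untag so the doc is not reprocessed
        processed.append(doc.id)
    return processed
```

Swapping in a different `Template` is the sketch's equivalent of a customized prompt for specialized documents.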

Combining With paperless-ai?

Totally possible. You could have paperless-gpt handle the scanning & metadata assignment, then feed those improved text results into paperless-ai for doc-based chat.
Some folks asked about overlap: we do share the “metadata extraction” idea, but the focus differs.
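
One way to picture that hand-off: a vision LLM transcribes each scanned page, and the joined text becomes the improved content that a chat tool such as paperless-ai could later read. This is a minimal sketch assuming a generic `vision_llm(image_bytes, prompt)` callable; the function name, prompt, and page-marker format are illustrative only, and writing the result back to paperless-ngx is left out.

```python
from typing import Callable, Iterable


def ocr_document(
    pages: Iterable[bytes],
    vision_llm: Callable[[bytes, str], str],
    prompt: str = "Transcribe all text on this page exactly.",
) -> str:
    """Run a vision LLM page by page and join the transcripts into one text."""
    parts = []
    for number, image in enumerate(pages, start=1):
        text = vision_llm(image, prompt).strip()
        if text:  # skip blank pages so the joined text has no empty sections
            parts.append(f"--- page {number} ---\n{text}")
    return "\n\n".join(parts)
```

Keeping the original page numbers in the markers means a downstream Q&A tool can still point back to where an answer came from, even when blank pages are skipped.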

If You’re Curious

The project has a short README, Docker Compose snippet, and minimal environment vars.
I’m grateful to a few early sponsors who donated (thank you so much!). That support motivates me to keep adding features (like multi-language OCR support).

Anyway, just wanted to clarify the difference, since people were asking. If you're looking for OCR specifically, especially for messy scans, paperless-gpt might fit the bill. If you need doc-based conversation, paperless-ai is out there. Or combine them both!

Happy to answer any questions or hear any feedback you have. Thanks for reading!

Links (in case you want them):

paperless-gpt code and docs: github.com/icereed/paperless-gpt
paperless-ngx: github.com/paperless-ngx/paperless-ngx

Cheers!