Self-Hosted Alternatives to Popular Services

Docext: Open-Source, On-Prem Document Intelligence Powered by Vision-Language Models (old.reddit.com)

submitted 4 months ago by bot@lemmit.online to c/selfhosted@lemmit.online

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/selfhosted by /u/SouvikMandal on 2025-04-07 13:33:46+00:00.

We’re excited to open source docext, a zero-OCR, on-premises tool for extracting structured data from documents like invoices, passports, and more — no cloud, no external APIs, no OCR engines required.

Powered entirely by vision-language models (VLMs), docext understands documents visually and semantically to extract both field data and tables — directly from document images.

Run it fully on-prem for complete data privacy and control.

Key Features:

Custom & pre-built extraction templates
Table + field data extraction
Gradio-powered web interface
On-prem deployment with REST API
Multi-page document support
Confidence scores for extracted fields
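To illustrate how per-field confidence scores might be used downstream, here is a minimal sketch. It does not use docext's actual API or output format: the ExtractedField class, the 0.0 to 1.0 score range, the 0.8 threshold, and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical record shape; docext's real output may differ."""
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def split_by_confidence(fields, threshold=0.8):
    """Separate fields to accept automatically from those to review by hand."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


# Made-up sample values, for illustration only.
sample = [
    ExtractedField("invoice_number", "INV-001", 0.97),
    ExtractedField("total_amount", "1,250.00", 0.91),
    ExtractedField("due_date", "2025-05-01", 0.62),
]

accepted, needs_review = split_by_confidence(sample)
print("accepted:", [f.name for f in accepted])
print("needs review:", [f.name for f in needs_review])
```

The threshold is a judgment call: a stricter value sends more fields to manual review, a looser one accepts more automatically.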

Whether you're processing invoices, ID documents, or any form-heavy paperwork, docext helps you turn them into usable data in minutes.

Try it out:

pip install docext or launch via Docker
Spin up the web UI with python -m docext.app.app
Dive into the Colab demo
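For anyone who wants to script the launch step, here is a small sketch that wraps the `python -m docext.app.app` command quoted above. The helper names are hypothetical and nothing here is part of docext itself; the sketch only checks whether the package is importable and builds the same command with the current interpreter.

```python
import importlib.util
import subprocess
import sys


def docext_installed():
    """Return True if a top-level package named docext is importable."""
    return importlib.util.find_spec("docext") is not None


def docext_launch_command():
    """Build the argv equivalent of `python -m docext.app.app`."""
    return [sys.executable, "-m", "docext.app.app"]


def launch_web_ui():
    """Start the web UI in the foreground; blocks until the process exits."""
    if not docext_installed():
        raise RuntimeError("docext not found; run `pip install docext` first")
    subprocess.run(docext_launch_command(), check=True)


# Show the command without starting anything.
print(" ".join(docext_launch_command()))
```

Call launch_web_ui() once the package is installed. Using sys.executable keeps the launch inside the same virtual environment that pip installed into.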

GitHub:

Questions? Feature requests? Open an issue or start a discussion!
