this post was submitted on 14 Mar 2025
The original post: /r/datahoarder by /u/alexlazar98 on 2025-03-14 10:37:22.

https://preview.redd.it/zp9vlha0vmoe1.png?width=1200&format=png&auto=webp&s=25233afd4d8804e65b7d6dff7bab03f33fe6ef53

I want to start a personal project where I scan old books, OCR them, and index the resulting markdown. This one is a book covering ALL of Romania's roads as of 1974. It has tables, maps, and all sorts of other interesting historical data points.

I already have some background in data engineering. I'm a software engineer and I've built a project that helps with RAG, search, and indexing of markdown files (even very big ones). My problem is the OCR part. Any tips?
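
To make the question concrete, here is a rough sketch of the step I'm missing: turning per-page scans into one markdown file my indexer can consume. Everything in it is an assumption for illustration. `pages_to_markdown`, `OcrEngine`, and `fake_ocr` are hypothetical names, and the OCR engine is just a pluggable function since I haven't picked one yet.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical type: any function that turns one scanned page image into text.
OcrEngine = Callable[[Path], str]


def pages_to_markdown(title: str, pages: Iterable[Path], ocr: OcrEngine) -> str:
    """Run `ocr` on each page image and join the results into one markdown string."""
    parts = [f"# {title}"]
    for number, image_path in enumerate(pages, start=1):
        text = ocr(image_path).strip()
        # Keep a per-page heading so a search hit can be traced back to its scan.
        parts.append(f"## Page {number} ({image_path.name})")
        parts.append(text if text else "_(no text recognized)_")
    return "\n\n".join(parts) + "\n"


def fake_ocr(image_path: Path) -> str:
    # Stand-in for a real OCR engine (assumption); returns canned text.
    return f"text from {image_path.stem}"


if __name__ == "__main__":
    scans = [Path("p1.png"), Path("p2.png")]
    print(pages_to_markdown("Roads 1974", scans, fake_ocr))
```

Keeping the engine behind a plain function should make it easy to swap in whatever gets recommended here without touching the markdown side. Tables and maps would probably need separate handling that this sketch doesn't attempt.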
