this post was submitted on 02 Dec 2024

The original post: /r/datahoarder by /u/GetTheBlinkerFluid on 2024-12-02 02:22:55.

I'm looking for a tool to "de-rasterize" scanned PDFs. As in, replace the heavy images they're made of with lightweight text and structured elements.

https://preview.redd.it/2mklk9k6hc4e1.png?width=2516&format=png&auto=webp&s=f2ae81b43004ef0436d89eab79db3fb536d97a24

Replacing the scanned images with text and basic shapes could help shrink my libraries of PDFs by 95%+.
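
As a rough sanity check on that 95%+ figure, here is a minimal Python sketch of the arithmetic. The per-page sizes are hypothetical placeholders, not measurements from any real library, and the function names are made up for illustration.

```python
def reduction_percent(scanned_bytes: int, rebuilt_bytes: int) -> float:
    """Percentage of space saved when a file shrinks from scanned_bytes to rebuilt_bytes."""
    if scanned_bytes <= 0:
        raise ValueError("scanned_bytes must be positive")
    return 100.0 * (scanned_bytes - rebuilt_bytes) / scanned_bytes


def max_rebuilt_size(scanned_bytes: int, target_percent: int) -> int:
    """Largest rebuilt size (in bytes) that still meets a whole-number savings target."""
    return scanned_bytes * (100 - target_percent) // 100


# Hypothetical per-page sizes (assumptions, not measured values):
SCANNED_PAGE = 500_000  # a compressed page image of about 500 KB
REBUILT_PAGE = 20_000   # text plus simple vector shapes, about 20 KB

print(f"{reduction_percent(SCANNED_PAGE, REBUILT_PAGE):.1f}% smaller")  # 96.0% smaller
print(max_rebuilt_size(SCANNED_PAGE, 95))  # 25000 bytes per page to hit 95%
```

Under these assumed numbers, a rebuilt page has to stay at or below about 25 KB to reach the 95% mark, so pages with photos or dense diagrams that must stay raster would pull the average down.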

I tried Adobe Acrobat's .docx converter, and it did a mediocre job. I'm not looking for a perfect 1:1 replica, but that converter is probably outdated in this age of ML and AI tools. Are there any better tools or pipelines for doing what I described above?
