this post was submitted on 24 Jul 2024
The original post: /r/datahoarder by /u/-Bionicman- on 2024-07-24 13:26:46.

I have about 20-30 folders of old documents from university (mostly typewritten with illustrations and handwritten notes).

I recently bought a duplex scanner (Epson DS-410) and am still pondering the choice of file format, DPI setting, and OCR. I would like to archive the documents in the best possible quality and then dispose of the folders.
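To put the DPI question in perspective, here is a rough sketch of how the DPI setting translates into pixel dimensions and uncompressed size per page. The A4 page size (210 x 297 mm) and 24-bit colour are assumptions for illustration, not settings taken from the scanner, and the function names are made up for this example.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page scanned at the given DPI."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def raw_size_bytes(width_mm, height_mm, dpi, bytes_per_pixel=3):
    """Uncompressed size of one page side; 3 bytes per pixel means 24-bit colour."""
    width_px, height_px = pixel_dimensions(width_mm, height_mm, dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Assumed A4 page; compare a few common DPI settings.
    for dpi in (200, 300, 600):
        w, h = pixel_dimensions(210, 297, dpi)
        mib = raw_size_bytes(210, 297, dpi) / (1024 * 1024)
        print(f"{dpi} DPI: {w} x {h} px, about {mib:.1f} MiB uncompressed")
```

Doubling the DPI quadruples the pixel count, so an assumed A4 page at 300 DPI (2480 x 3508 px, roughly 25 MiB raw in colour) grows to about four times that at 600 DPI before any compression is applied.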

Question 1: I am currently leaning towards PDF/A and wondering whether it is the best choice, or whether .tiff or .png would have an advantage. I noticed that, unlike with regular PDF, the scanner software doesn't let me choose a compression level for PDF/A. Is PDF/A lossless?

Question 2: Can multiple .tiff or .png documents also be easily converted into a PDF, or is this not a good idea?
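For reference, a minimal sketch of what such a conversion could look like in Python, assuming the third-party Pillow library is installed and each file holds exactly one page. The helper names (`sorted_pages`, `images_to_pdf`) are hypothetical. One caveat, as far as I understand it: Pillow's PDF writer may re-encode colour and greyscale pages as JPEG, so this route is not guaranteed to be lossless.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that orders 'page_2' before 'page_10'."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd indices are always the digit runs.
    return [int(t) if i % 2 else t.lower() for i, t in enumerate(parts)]


def sorted_pages(folder, suffixes=(".png", ".tif", ".tiff")):
    """Return the image files in a folder in natural page order."""
    pages = [p for p in Path(folder).iterdir() if p.suffix.lower() in suffixes]
    return sorted(pages, key=lambda p: natural_key(p.name))


def images_to_pdf(folder, output_pdf, dpi=300):
    """Combine one-page-per-file images into a single multi-page PDF."""
    from PIL import Image  # third-party: pip install Pillow

    pages = sorted_pages(folder)
    if not pages:
        raise ValueError(f"no image files found in {folder}")
    # Convert to RGB so images with an alpha channel or palette can be saved.
    images = [Image.open(p).convert("RGB") for p in pages]
    first, rest = images[0], images[1:]
    # dpi is assumed to match the scan resolution so the page size comes out right.
    first.save(output_pdf, "PDF", save_all=True,
               append_images=rest, resolution=float(dpi))
```

The natural sort is there because scanner software often numbers files without zero padding, and a plain alphabetical sort would put page 10 before page 2.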

Question 3: For the PDF and PDF/A formats, my scanner has an option to create a searchable document (OCR) directly. Is this recommended, or would it be better to add OCR afterwards, either with a specialized tool or by importing the scans into paperless-ngx (which I don't have yet, but will probably start using in the near future)?

I'm not sure whether OCR results differ between tools, or whether there is such a thing as "bad" or "good" OCR.

My goal is to create the scans in the best possible file format. If .tiff or .png is preferable for archiving, I need to be able to convert the files to PDF easily; otherwise those formats are not an option for me.

I would appreciate your advice.
