this post was submitted on 21 Sep 2024
 
The original post: /r/datahoarder by /u/AntiProtonBoy on 2024-09-21 05:13:01.

I have accumulated a large collection of research papers in the field I work in, in PDF, PS and DJVU formats. Some of them come with supplementary material, such as ZIP files, images or video clips. The collection has reached a point where searching and browsing has become a nightmare, because the documents are only loosely sorted into categories across different folders. Retrieving documents by topic, author or content is hard.

I was hoping to automate this somehow, and I was wondering if there are any good off-the-shelf solutions out there? I'm basically looking for a library system with the following features:

  • Runs on a centralised web server that client machines can access through a web browser.
  • Server stores and organises documents and their supplementary material in a database.
  • Can search by author, title, or content.
  • OCR capability to index/cache the content of documents.
  • Ideally able to generate citation metadata for each document by cross-checking with a DOI database.
  • Preferably an open source project.
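
To make the search and DOI requirements a bit more concrete, here is a rough sketch of what I have in mind. It is only a toy in-memory index, not a real library system: the names `Document`, `Library` and `find_dois` are made up for illustration, the DOI pattern is a loose assumption that will miss some valid DOIs, and the text extraction/OCR step is assumed to have happened elsewhere.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Loose DOI pattern (an assumption: covers common modern DOIs, not every valid one).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


@dataclass
class Document:
    path: str
    title: str
    authors: list[str]
    text: str  # extracted or OCR'd text, produced by some other tool
    attachments: list[str] = field(default_factory=list)  # supplementary files


def tokenize(s: str) -> set[str]:
    """Lower-case a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


class Library:
    def __init__(self) -> None:
        self.docs: list[Document] = []
        # One inverted index per searchable field: word -> set of document ids.
        self.index = {f: defaultdict(set) for f in ("author", "title", "content")}

    def add(self, doc: Document) -> int:
        doc_id = len(self.docs)
        self.docs.append(doc)
        fields = {
            "author": " ".join(doc.authors),
            "title": doc.title,
            "content": doc.text,
        }
        for name, value in fields.items():
            for tok in tokenize(value):
                self.index[name][tok].add(doc_id)
        return doc_id

    def search(self, field_name: str, query: str) -> list[Document]:
        """Return documents whose given field contains every word in the query."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(
            *(self.index[field_name].get(t, set()) for t in tokens)
        )
        return [self.docs[i] for i in sorted(hits)]


def find_dois(text: str) -> list[str]:
    """Pull DOI-looking strings out of text, trimming trailing punctuation."""
    return [m.rstrip(".,;)") for m in DOI_RE.findall(text)]
```

With something like this, `lib.search("author", "doe")` or `lib.search("content", "ray tracing")` would return the matching documents, and any DOI found by `find_dois` could then be looked up in a DOI database to fill in the citation metadata.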

Is there such a thing, or am I asking too much?
