I have accumulated a large collection of research papers in the field I'm working in, in PDF, PS and DJVU format. Some of these come with supplementary material, such as ZIP files, images or video clips. The collection has reached a point where searching and browsing documents has become a nightmare, as they are only loosely sorted into categories across different folders. Retrieving documents by topic, author or content is hard.
I was hoping to automate this somehow, and I was wondering if there are any good off-the-shelf solutions out there? I'm basically looking for a library system with the following features:
- Runs on a centralised web server that can be accessed from client machines through a web browser.
- Server stores and organises documents and their supplementary material in a database.
- Can search by author, title, or content.
- OCR capability to index/cache the content of documents.
- Perhaps able to generate citation metadata for each document by cross-checking against a DOI database.
- Preferably an open-source project.
Is there such a thing, or am I asking too much?