this post was submitted on 01 Aug 2024

The original post: /r/datahoarder by /u/ECrispy on 2024-08-01 08:02:45.

Use case: I have lots of local files, mostly MHTML (saved from the web using Chrome), PDF, TXT, DOC, etc. They are stored in folders with my own folder hierarchy.

Issues: it's getting too big to manage and search; the total size is 100GB+. There are lots of duplicates, i.e. the same web page saved multiple times, files that are very similar, or the same content in multiple formats. It's also hard to search and view.
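To make the duplicate problem concrete, here is a minimal sketch of two checks that go beyond byte-for-byte comparison. It assumes the text has already been extracted from each file (that step is not shown), and the function names, the 5-word shingle size, and the idea of a similarity threshold are illustrative assumptions, not something a particular tool does.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences disappear."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_key(text: str) -> str:
    """Hash of the normalized text; equal keys mean 'same content'."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def group_exact_duplicates(docs: dict) -> list:
    """Group document names whose normalized text is identical.

    `docs` maps a file name to its already-extracted text (assumption).
    """
    groups = defaultdict(list)
    for name, text in docs.items():
        groups[content_key(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def shingles(text: str, k: int = 5) -> set:
    """Set of overlapping k-word windows from the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

The hash catches the same page saved twice or the same text in two formats; the Jaccard score is for "very similar" files, where a pair above some chosen threshold would be flagged for review. Comparing every pair does not scale to 100GB, so a real tool would need something like MinHash on top.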

What I'd like: import all of these into a self-hosted app. Use some kind of tagging, where the tags would initially come from the local folder hierarchy, but hopefully with some kind of AI and advanced document inspection on top (e.g. MHTML files have a source URL, which can be used to classify). Identify duplicates (not just binary dups), etc. Have full-text search over all docs.
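A rough sketch of the tagging idea, under two assumptions: that Chrome-saved MHTML files carry the page address in a top-level `Snapshot-Content-Location` header (with `Content-Location` tried as a fallback), and that folder names can be used as tags as-is. The helper names are hypothetical and not part of any existing app.

```python
from email import policy
from email.parser import BytesParser
from pathlib import PurePath
from typing import List, Optional
from urllib.parse import urlparse


def tags_from_path(file_path: str, root: str) -> List[str]:
    """Turn the folders between `root` and the file into lowercase tags."""
    rel = PurePath(file_path).relative_to(root)
    return [part.lower() for part in rel.parent.parts]


def source_url_from_mhtml(data: bytes) -> Optional[str]:
    """Read the source URL from the top-level MHTML headers, if present.

    Assumption: the URL is stored in Snapshot-Content-Location (as in
    Chrome-saved files) or, failing that, Content-Location.
    """
    msg = BytesParser(policy=policy.default).parsebytes(data, headersonly=True)
    for header in ("Snapshot-Content-Location", "Content-Location"):
        value = msg[header]
        if value:
            return str(value).strip()
    return None


def domain_tag(url: Optional[str]) -> Optional[str]:
    """Use the host name of the source URL as an extra tag."""
    if not url:
        return None
    return urlparse(url).hostname
```

For a file at `/archive/Tech/Linux/page.mhtml` with root `/archive`, this would give the tags `tech` and `linux`, plus the site's host name as a third tag. Reading only the first few KB of each file should be enough for the headers, which matters at this collection size.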

Is Nextcloud or paperless-ng the recommended solution for documents, and can it do the above? Any other advice, or tools I haven't heard of, is welcome.
