The original post: /r/datahoarder by /u/m4d40 on 2025-02-23 14:00:34.
Hi, I have a lot of large txt, csv, and sql (dump) files and I'm wondering what the best way is to organize them and make them more searchable.
First I thought about pushing everything into a NoSQL database, but the data comes to over 1 TB, which I think would be overkill to set up and run queries against.
My next thought was to look for common IDs or fields and build my own tree structure of files: an index-like file for each ID/field, with references back to the big files (and the lines in them) where the detailed data is stored. If I then want the details, another script could jump to those specific files and lines and grep/collect them (roughly like the sketch at the end of this post).
(I also thought about Elasticsearch, Apache Solr, or something similar, but I have no knowledge in that area yet.)
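
For illustration only, here is a minimal Python sketch of the index-file idea described above, assuming plain CSV files that all share some ID column. The column name (`user_id`), the file paths, and `index.json` are made-up placeholders, not anything from the original files:

```python
# Sketch of the "index file" idea: scan each big CSV once, record the
# byte offset of every row keyed by an ID column, and later seek straight
# to those offsets instead of grepping the whole file.
# ID_COLUMN, the paths and index.json are hypothetical placeholders.
import csv
import json
from pathlib import Path

ID_COLUMN = "user_id"           # assumed: whatever field the dumps share
INDEX_PATH = Path("index.json")

def build_index(csv_files):
    """Map id -> list of (file, byte_offset) entries."""
    index = {}
    for path in csv_files:
        with open(path, "rb") as f:
            header = f.readline().decode("utf-8", "replace")
            columns = next(csv.reader([header]))
            id_pos = columns.index(ID_COLUMN)
            offset = f.tell()               # start of the first data row
            for raw in f:
                row = next(csv.reader([raw.decode("utf-8", "replace")]))
                index.setdefault(row[id_pos], []).append((str(path), offset))
                offset += len(raw)          # track offsets ourselves
    INDEX_PATH.write_text(json.dumps(index))
    return index

def lookup(key):
    """Fetch the full rows for one id by seeking, without re-scanning."""
    index = json.loads(INDEX_PATH.read_text())
    rows = []
    for file_name, offset in index.get(key, []):
        with open(file_name, "rb") as f:
            f.seek(offset)
            rows.append(f.readline().decode("utf-8", "replace"))
    return rows

if __name__ == "__main__":
    build_index(Path(".").glob("*.csv"))
    print(lookup("12345"))
```

Two caveats with this sketch: it assumes one record per line, so CSV rows with quoted embedded newlines would break it, and for 1 TB of mostly unique IDs the in-memory dict may not fit in RAM, in which case something like an SQLite table keyed by ID could hold the (file, offset) pairs instead.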