The original post: /r/datahoarder by /u/Vizdun on 2024-09-12 19:03:59.
I want to collect a lot of data from a variety of sources (scraping, saving some of the HTTP requests my browser makes, random files I got off the internet, DB dumps, etc.) over a relatively long period of time, and eventually process it and actually do something useful with it. The problem I run into is that I'm not sure how to store it.

Since I obviously have no clue about the schema ahead of time, a typical RDBMS would be problematic. I also have no clue which data will turn out to be relevant, since I don't know how I'll want to process it in the future, which complicates a schemaful approach even further. I've considered object stores, but those are designed around storing files, and a lot of this data probably wouldn't fit neatly into that model, so it just seems kind of inconvenient.

So far the best idea seems to be running NATS JetStream and just pushing all the new data to it as messages, to be processed at some later point (rough sketch below). What do y'all think?
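For concreteness, here's a minimal sketch of what I'm imagining, using the official nats-py client. The stream name `ingest`, the `ingest.>` subject hierarchy, and the sample payload are all just placeholders I made up, not anything settled:

```python
import asyncio
import nats

async def main():
    # connect to a local NATS server running with JetStream enabled
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    # one catch-all stream; the subject hierarchy encodes where each blob came from
    await js.add_stream(name="ingest", subjects=["ingest.>"])

    # push a captured HTTP response as-is; interpreting it is deferred to later
    await js.publish(
        "ingest.http.example-com",
        b'{"url": "https://example.com", "status": 200, "body": "..."}',
        headers={"Captured-At": "2024-09-12T19:03:59Z"},
    )

    await nc.close()

asyncio.run(main())
```

The appeal to me is that consumers can be attached later with different subject filters and replay the stream from the beginning, so I wouldn't have to commit to any particular processing up front.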