The original post: /r/datahoarder by /u/Vizdun on 2024-09-12 19:03:59.
I want to collect a lot of data from a variety of sources (scraping, saving some of the HTTP requests my browser makes, random files I got off the internet, DB dumps, etc.) over a relatively long period of time, and eventually process it and actually do something useful with it. The problem I run into is that I'm not sure how to store it.

Since I obviously have no clue about the schema ahead of time, a typical RDBMS would be problematic. I also have no clue which data will turn out to be relevant, since I don't know how I'll want to process it in the future, which complicates a schemaful approach even further. I've considered object stores, but those are designed around storing files, and a lot of this data probably wouldn't fit neatly into that model, so it just seems kind of inconvenient.

So far the best idea seems to be running NATS JetStream and just pushing all the new data to it as messages, to be processed at some later point (rough sketch below). What do y'all think?
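For concreteness, here's a minimal sketch of what I'm imagining, using the official nats-py client. The stream name `ingest`, the `ingest.>` subject hierarchy, and the sample payload are all just placeholders I made up, not anything settled:

```python
import asyncio
import nats

async def main():
    # connect to a local NATS server running with JetStream enabled
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    # one catch-all stream; the subject hierarchy encodes where each blob came from
    await js.add_stream(name="ingest", subjects=["ingest.>"])

    # push a captured HTTP response as-is; interpreting it is deferred to later
    await js.publish(
        "ingest.http.example-com",
        b'{"url": "https://example.com", "status": 200, "body": "..."}',
        headers={"Captured-At": "2024-09-12T19:03:59Z"},
    )

    await nc.close()

asyncio.run(main())
```

The appeal to me is that consumers can be attached later with different subject filters and replay the stream from the beginning, so I wouldn't have to commit to any particular processing up front.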