this post was submitted on 27 Jan 2025
The original post: /r/datahoarder by /u/zzswol on 2025-01-27 00:52:30.

The open-source AI community is releasing powerful models. Things are moving fast. You might not have the hardware, expertise, or attention to take proper advantage of them in the moment. Many people are in this position. The future is uncertain. I believe it is important to preserve the moment. Maybe we get AGI and It becomes ashamed of its infantile forms, or user AI becomes illegal, etc. (humor me).

What appears to be lacking: distribution mechanisms that prioritize archival.

I don't know what's going on, but I want to download stuff. What training data should I download? Validation data? Which models do I download? Which quantizations? In the future, to understand the present moment, we will want all of it. How do we support this?

I am imagining a place people of all sorts can go to find various distributions prepared:

prepper package: (high storage, low compute) - save all "small" models, distillations, etc

tech enthusiast package: (medium storage, medium compute) - save all major base models with scripts to reproduce published quantizations, fine-tunes, etc? [An archeologist will want closest access to what was commonly deployed at any given time]

rich guy package: (high storage, high compute) - no work needed here? just download ~everything~

alien archeologist package: ("minimal" storage, high compute) - a complete, non-redundant set of training data and source code for all pipelines? something a particularly dedicated and resourceful person might choose to laser etch into a giant artificial diamond and launch into space
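
To make the four packages a bit more concrete, here is a rough Python sketch of how such a catalog could be described and filtered by what a person actually has available. Everything in it is hypothetical: the `Level` scale, the `Package` fields, and the `packages_for` helper are made up for illustration, and mapping "minimal" storage to `LOW` is my own assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Coarse resource scale (hypothetical; the tiers above only say low/medium/high)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Package:
    name: str
    storage: Level   # storage needed to hold the package
    compute: Level   # compute needed to make use of it
    contents: str


# The four packages from the list above. "minimal" storage is treated as LOW (assumption).
PACKAGES = [
    Package("prepper", Level.HIGH, Level.LOW,
            "all small models, distillations, etc"),
    Package("tech enthusiast", Level.MEDIUM, Level.MEDIUM,
            "major base models plus scripts to reproduce quantizations and fine-tunes"),
    Package("rich guy", Level.HIGH, Level.HIGH,
            "everything"),
    Package("alien archeologist", Level.LOW, Level.HIGH,
            "non-redundant training data and source code for all pipelines"),
]


def packages_for(storage: Level, compute: Level) -> list[Package]:
    """Return the packages whose needs fit within the given resources."""
    return [p for p in PACKAGES if p.storage <= storage and p.compute <= compute]


if __name__ == "__main__":
    # Someone with lots of disks but only a modest machine.
    for pkg in packages_for(Level.HIGH, Level.MEDIUM):
        print(f"{pkg.name}: {pkg.contents}")
```

A real version would presumably also need file lists, sizes, and checksums per package, but none of that is specified here.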

Does this exist already?
