this post was submitted on 20 Apr 2024

[–] possiblylinux127@lemmy.zip 7 points 1 year ago (9 children)

I think we need to reevaluate what it means for a model to be FOSS. There isn't a good answer, and it would be nice if some free software organization would release guidelines on AI.

[–] projectmoon@lemm.ee 2 points 1 year ago (1 children)

I would think access to the training data, or at least no restrictions on what you can do with the model, would be a good definition.

[–] General_Effort@lemmy.world 3 points 1 year ago (1 children)

access to the training data

That's just not realistic. There are too many legal problems with that.

Besides, Llama 3 was trained on 15 trillion tokens. Whatcha gonna do with something like that?

[–] rufus@discuss.tchncs.de 1 points 1 year ago* (last edited 1 year ago)

Hmm. Sure, the legal issues are why it is the way it is. That doesn't necessarily mean it should be that way... But it's more complicated than that.

With the dataset, I'm sure people could figure out something to do with it. There are community-curated datasets and previous attempts to recreate models, like RedPajama... Sure, this is a lot more, but other people are making progress, too. And if not that, we could at least have a look at it, do some research, statistics... Maybe use parts of it for something else. That's the spirit of the free software movement.

I'm a bit split on the topic. FOSS doesn't translate directly to ML models. Not being able to recreate something isn't how it's supposed to be. But a model isn't software either, and it works differently. Releasing datasets would give us some progress and give the tools to people other than just the big tech companies, who are free to violate copyright law. But we're still missing the millions to afford the compute to train a model anyway.
