this post was submitted on 11 Jun 2024
21 points (81.8% liked)

AI


Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality. The distinction between the former and the latter categories is often revealed by the acronym chosen.

founded 4 years ago
[–] Smorty@lemmy.blahaj.zone 2 points 9 months ago* (last edited 9 months ago) (1 children)

I'm even more excited for running 8B models at the speed of 1B! Laughably fast ok-quality generations in JSON format would be crazy useful.

Also yeah, that 7B on mobile was not the best example. Again, probably 1B to 3B is the sweet spot for mobile (I'm running Qwen2.5 0.5B on my phone and it works really well for simple JSON).
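For anyone wondering what "simple JSON" from a tiny model looks like in practice: small models sometimes wrap the object in extra chatter, so it helps to check the output before using it. Here's a minimal sketch of that check. The helper name `parse_model_json` and the example keys are made up for illustration, not from any particular library.

```python
import json


def parse_model_json(raw_text, required_keys=()):
    """Return the parsed JSON object, or None if the output is not usable.

    Hypothetical helper: it grabs the outermost {...} span from the model's
    text, parses it, and checks that the expected keys are present.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None  # no complete-looking JSON object in the text
    try:
        data = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None  # looked like JSON but did not parse
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in required_keys):
        return None  # parsed fine, but a required field is missing
    return data


# Example with made-up model output
reply = 'Sure! {"intent": "light_on", "room": "kitchen"}'
parsed = parse_model_json(reply, required_keys=("intent", "room"))
```

If it returns `None` you can just re-prompt, which is cheap when the model is this fast.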

EDIT: And imagine the context lengths we would be able to run on our GPUs at home! What a time to be alive.

[–] Fisch@discuss.tchncs.de 2 points 9 months ago (1 children)

Being able to run 7B quality models on your phone would be wild. It would also make it possible to run those models on my server (which is just a mini pc), so I could connect it to my Home Assistant voice assistant, which would be really cool.

[–] Smorty@lemmy.blahaj.zone 1 points 9 months ago (1 children)

Something similar to this already kinda exists on HF with the 1.58-bit quantisation, which seems to get very similar performance to the original Llama 3 8B model. That's essentially a two-bit quantisation with reasonable performance!
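In case the "1.58 bit" number looks odd: as far as I understand it, each weight is limited to three values (-1, 0, +1), and log2(3) is about 1.58 bits. Here's a rough sketch of that kind of ternary rounding using a mean-absolute-value scale. It's a toy illustration under my own assumptions (function names included), not necessarily how the models on HF were actually produced.

```python
import math


def ternary_quantise(weights, eps=1e-8):
    """Map each weight to -1, 0 or +1, plus one shared scale factor.

    Assumption: the scale is the mean absolute value of the weights.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, clip to [-1, 1]
    quantised = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Approximate the original weights from the ternary values."""
    return [q * scale for q in quantised]


# Three possible values per weight carry log2(3) bits of information
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]
quantised, scale = ternary_quantise(weights)
approx = dequantise(quantised, scale)
```

With these example weights `quantised` comes out as `[1, 0, 1, -1, 0, 1]`, so you lose a lot of per-weight detail and only the sign plus one shared scale survives.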

[–] Fisch@discuss.tchncs.de 2 points 9 months ago

That's really interesting, gonna try out how well it runs