this post was submitted on 31 May 2024
[–] floofloof@lemmy.ca 8 points 1 year ago (1 children)

> In February, Wei’s team announced BitNet 1.58b, in which parameters can equal -1, 0, or 1, which means they take up roughly 1.58 bits of memory per parameter. A BitNet model with 3 billion parameters performed just as well on various language tasks as a full-precision LLaMA model with the same number of parameters and amount of training, but it was 2.71 times as fast, used 72 percent less GPU memory, and used 94 percent less GPU energy. Wei called this an “aha moment.” Further, the researchers found that as they trained larger models, efficiency advantages improved.

That's pretty impressive.

[–] remotelove@lemmy.ca 1 points 1 year ago (1 children)

What exactly is a half bit? Is a transistor "half open" or something?

[–] floofloof@lemmy.ca 5 points 1 year ago (1 children)

I'm no expert, but I think it's just a statistical measure of information, not something that can be physically realized in isolation. If two possible states are 1 bit and four possible states are 2 bits, then three possible states must lie somewhere in between: log2(3) ≈ 1.58 bits.

I did a bit of a search and found this: "How can there be a fraction of a bit?"
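To put a number on it for this case, here's a quick sketch in Python (the arithmetic is just standard information theory, not anything specific from the article):

```python
import math

# A parameter with 3 possible values (-1, 0, 1) carries at most
# log2(3) bits of information.
print(math.log2(3))  # ~1.585, hence the "1.58" in the name

# You can't store a single ternary value in 1.58 physical bits, but you
# can approach that rate by packing several together: 5 ternary values
# have 3**5 = 243 combinations, which fit in one 8-bit byte.
print(3**5, "<=", 2**8, "so 5 ternary params fit in 8 bits:", 8 / 5, "bits each")
```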

[–] remotelove@lemmy.ca 1 points 1 year ago

Oh cool. I was wondering whether it was just a math thing or whether it was related to the hardware used in ternary computers.

[–] qjkxbmwvz@startrek.website 6 points 1 year ago (1 children)

I only skimmed it briefly, but don't you need nonlinearity for these things to work (e.g., a rectifier, sigmoid...)? Otherwise it's just linear algebra, and more layers can't help (since consecutive matrices can be multiplied together, the dimensionality is the only thing that matters). I don't think you can really get nonlinearity with one bit.

Not my field, so I'm sure I'm missing something. If anyone wants to ELI5 though...
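For anyone who wants the linear-algebra point spelled out, here's a toy sketch (plain numpy, sizes of my choosing): with no activation function between them, two stacked linear layers collapse into a single one.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)          # input vector
W1 = rng.normal(size=(5, 3))    # first "layer"
W2 = rng.normal(size=(4, 5))    # second "layer"

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)

# ...are exactly one linear layer with the product matrix.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: depth adds nothing here
```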

[–] howrar@lemmy.ca 6 points 1 year ago

This article got me curious about how these 1-bit models work, so I read up on it a bit.

https://arxiv.org/html/2402.11295v3

The model parameters aren't completely converted to 1-bit. Each weight matrix is decomposed into a sign matrix (the 1-bit part) and two full-precision vectors; the outer product of the two vectors is a rank-1 matrix that, multiplied element-wise with the sign matrix, approximates the original weights. So if I understand correctly, everything still functions the same way as in a regular transformer: input vectors, intermediate values, and outputs are all full precision and have no problem going through the nonlinearities.
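Here's a rough sketch of that decomposition as I understand it (toy numpy code; the variable names and the rank-1-SVD step are my reading of the paper, not its exact recipe):

```python
import numpy as np

# Toy full-precision weight matrix (stand-in for a transformer linear layer).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6)).astype(np.float32)

# 1-bit part: just the signs of the weights.
W_sign = np.sign(W)  # entries in {-1, +1} (0 is vanishingly rare for float weights)

# Two full-precision vectors whose outer product is a rank-1
# approximation of the weight magnitudes |W|.
# (One simple way to get them: the leading singular triplet of |W|.)
U, S, Vt = np.linalg.svd(np.abs(W))
a = U[:, 0] * np.sqrt(S[0])
b = Vt[0, :] * np.sqrt(S[0])

# Reconstructed weights: sign matrix times the rank-1 magnitude estimate.
W_approx = W_sign * np.outer(a, b)

print(np.abs(W - W_approx).mean())  # approximation error on the toy matrix
```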

[–] django@discuss.tchncs.de 4 points 1 year ago

Interesting read, especially the idea of specialized hardware for 1-bit LLMs.

[–] Thordros@hexbear.net 1 points 1 year ago

I got an imprecise large language model right here for them

JUST WRITE GOOD CODE HOLY SHIT