Technology

83295 readers

5516 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

853

This is the technology worth trillions of dollars huh (media.kbin.earth)

submitted 6 months ago by HarkMahlberg@kbin.earth to c/technology@lemmy.world

221 comments fedilink hide all child comments

This is the technology worth trillions of dollars huh

you are viewing a single comment's thread
view the rest of the comments

[–] fading_person@lemmy.zip 2 points 6 months ago (1 children)

Thank you very much for taking your time to explain this. if you don't mind, do you recommend some reference for further reading on how llms work internally?

[–] JustTesting@lemmy.hogru.ch 2 points 6 months ago (1 children)

For the byte pair encoding (how those tokens get created) i think https://bpemb.h-its.org/ does a good job at giving an overview. after that i'd say self attention from 2017 is the seminal work that all of this is based on, and the most crucial to understand. https://jtlicardo.com/blog/self-attention-mechanism does a good job of explaining it. And https://jalammar.github.io/illustrated-transformer/ is probably the best explanation of a transformer architecture (llms) out there. Transformers are made up of a lot of self attention.

it does help if you know how matrix multiplications work, and how the backpropagation algorithm is used to train these things. i don't know of a good easy explanation off the top of my head but https://xnought.github.io/backprop-explainer/ looks quite good.

and that's kinda it, you just make the transformers bigger, with more weight, pluck on a lot of engineering around them, like being able to run code and making it run more efficientls, exploit thousands of poor workers to fine tune it better with human feedback, and repeat that every 6-12 month for ever so it can stay up to date.

[–] fading_person@lemmy.zip 1 points 6 months ago

Thank you very much