this post was submitted on 11 Sep 2025

Technology


This is the technology worth trillions of dollars, huh

fading_person@lemmy.zip (18 hours ago):

Thank you very much for taking the time to explain this. If you don't mind, could you recommend some references for further reading on how LLMs work internally?

JustTesting@lemmy.hogru.ch (10 hours ago):

For byte pair encoding (how those tokens get created), I think https://bpemb.h-its.org/ gives a good overview. After that, I'd say the 2017 self-attention paper ("Attention Is All You Need") is the seminal work all of this is based on, and it's the most crucial thing to understand. https://jtlicardo.com/blog/self-attention-mechanism does a good job of explaining it. And https://jalammar.github.io/illustrated-transformer/ is probably the best explanation out there of the transformer architecture that LLMs are built on. Transformers are essentially many layers of self-attention stacked on top of each other (each followed by a small feed-forward network).
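If it helps to see self-attention as code before reading the links, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It's my own toy illustration, not taken from the linked posts: the helper names (`matmul`, `softmax`, `self_attention`) are hypothetical, and it assumes hand-picked weight matrices, whereas real models learn `w_q`, `w_k`, `w_v`, use a tensor library, run many heads in parallel and usually apply a causal mask. The core idea is: score every pair of tokens with q·k / sqrt(d_k), softmax each row into weights, then average the value vectors with those weights.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (toy sketch, no masking).

    x has one row per token. Returns (outputs, attention_weights).
    """
    q = matmul(x, w_q)  # what each token is "looking for"
    k = matmul(x, w_k)  # what each token "offers"
    v = matmul(x, w_v)  # the content that gets mixed together
    d_k = len(k[0])
    # scores[i][j]: how strongly token i attends to token j
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # each row sums to 1
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Hypothetical usage: three 2-d "token embeddings", identity projections.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, attn = self_attention(tokens, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

The nested-list version is slow but keeps every step visible; in practice the three projections and the weighted sum are each a single matrix multiplication on a GPU, which is why matrix multiplication matters so much here.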

It does help if you know how matrix multiplication works and how the backpropagation algorithm is used to train these things. I don't know of a good, easy explanation off the top of my head, but https://xnought.github.io/backprop-explainer/ looks quite good.
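As a rough illustration of backpropagation (my own sketch, not from the linked explainer), here is a tiny reverse-mode automatic differentiation on scalars in plain Python. The names `Value` and `train_single_neuron` are hypothetical. Each value remembers which values it was computed from and the local derivative with respect to each; `backward()` then walks the graph in reverse and applies the chain rule. Frameworks like PyTorch do the same thing, just over whole tensors instead of single numbers.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically sort so every node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk backwards, pushing gradients to parents via the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def train_single_neuron(steps=50, lr=0.1):
    """Fit y = tanh(w*x + b) to one (x, target) pair; return the loss per step."""
    w, b = Value(0.8), Value(-0.1)
    x, target = 1.5, 0.5
    losses = []
    for _ in range(steps):
        w.grad = b.grad = 0.0        # gradients accumulate, so reset them
        y = (w * x + b).tanh()
        diff = y + (-target)
        loss = diff * diff           # squared error
        loss.backward()              # fills in w.grad and b.grad
        w.data -= lr * w.grad        # gradient descent step
        b.data -= lr * b.grad
        losses.append(loss.data)
    return losses
```

Training an LLM is conceptually this same loop (forward pass, backward pass, nudge every weight against its gradient), just with billions of weights and matrix operations instead of one neuron.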

And that's kinda it: you just make the transformers bigger, with more weights, tack a lot of engineering on around them (like being able to run code, and making them run more efficiently), exploit thousands of poor workers to fine-tune them with human feedback, and repeat that every 6-12 months forever so they stay up to date.

fading_person@lemmy.zip (2 hours ago):

Thank you very much