sisyphean

joined 2 years ago
sisyphean@programming.dev 9 points 2 years ago

I finally got into the beta, and all I can say is wow, I’m in love with this app 🤩

Keep up the good work!

sisyphean@programming.dev 3 points 2 years ago

It’s a Node.js app because the lemmy-bot library is for Node.

I will definitely open source it, but the code is currently in a disgusting state, so I need to clean it up first.
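
In the meantime, here’s the rough shape of a lemmy-bot app. I’m writing the option and handler names from memory, so treat them as approximate and check the library’s README for the exact API:

```typescript
import LemmyBot from 'lemmy-bot';

// Placeholder for the actual summarization call.
async function summarize(url?: string): Promise<string> {
  return 'TL;DR: (AI-generated 🤖) …';
}

// Rough skeleton only; option and handler names are from memory and may not
// match the current lemmy-bot API exactly.
const bot = new LemmyBot({
  instance: 'programming.dev',
  credentials: {
    username: process.env.BOT_USERNAME!,
    password: process.env.BOT_PASSWORD!,
  },
  handlers: {
    // Called for each new post the bot sees.
    post: async ({ postView, botActions: { createComment } }) => {
      const summary = await summarize(postView.post.url);
      await createComment({
        content: summary,
        post_id: postView.post.id,
      });
    },
  },
});

bot.start();
```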

sisyphean@programming.dev 2 points 2 years ago

It’s still in preview, so I can’t access it either; only a select few on Twitter have it, and they post about it all the time and make me jealous :)

sisyphean@programming.dev 1 point 2 years ago

Yes, OpenAI, and I pay out of pocket. Unfortunately, cheaper or self-hosted models are much worse.

sisyphean@programming.dev 2 points 2 years ago

Thank you, that’s a reasonable suggestion; I’ve added it to the comment template:

TL;DR: (AI-generated 🤖)

sisyphean@programming.dev 1 point 2 years ago

Yes, they have explicitly promised not to use API data for training.

Thank you, I’ll take a look at these models, I hope I can find something a bit cheaper but still high-quality.

sisyphean@programming.dev 2 points 2 years ago

Unfortunately I don’t yet have access to it so I can’t check if the description always comes first. But your theory sounds interesting, I hope we’ll be able to find out more soon.

sisyphean@programming.dev 1 point 2 years ago

GPT-3.5 is also really good, but I’ve been using GPT-4 for almost everything since it became available. 3.5 hallucinates more often, but I used it a lot before April and was really satisfied with it.

sisyphean@programming.dev 3 points 2 years ago

I implemented it, and the feature will be available right from the start. The bot will reply with this if the user has opted out:

🔒 The author of this post or comment has the #nobot hashtag in their profile. Out of respect for their privacy settings, I am unable to summarize their posts or comments.
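
The check itself is trivial. Here’s the gist of it as a simplified sketch; fetching the author’s bio from the Lemmy API is left out:

```typescript
// Simplified sketch of the #nobot opt-out check. The bio string would come
// from the author's profile via the Lemmy API; fetching it is omitted here.
const NOBOT_REPLY =
  '🔒 The author of this post or comment has the #nobot hashtag in their ' +
  'profile. Out of respect for their privacy settings, I am unable to ' +
  'summarize their posts or comments.';

function hasNoBotTag(bio: string | undefined): boolean {
  // Match "#nobot" as a standalone tag, case-insensitively.
  return /(^|\s)#nobot\b/i.test(bio ?? '');
}

// When the tag is present, the bot replies with NOBOT_REPLY instead of a summary.
if (hasNoBotTag('Rust enjoyer. #nobot')) {
  console.log(NOBOT_REPLY);
}
```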

sisyphean@programming.dev 0 points 2 years ago

It will be me 😭

I limited it to 100 summaries per day, so it won’t cost more than about $20/month in the worst case.
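
The cap is just a counter that resets daily; a toy version of the idea (not the bot’s actual code):

```typescript
// Toy daily quota guard: allow at most MAX_PER_DAY summaries, resetting at
// midnight UTC. State is in memory; a real bot might persist it instead.
const MAX_PER_DAY = 100;

let currentDay = '';
let usedToday = 0;

function tryConsumeSummary(now: Date = new Date()): boolean {
  const day = now.toISOString().slice(0, 10); // e.g. "2023-06-15"
  if (day !== currentDay) {
    currentDay = day;
    usedToday = 0;
  }
  if (usedToday >= MAX_PER_DAY) return false;
  usedToday += 1;
  return true;
}

// Worst case: 100 summaries/day × ~30 days ≈ 3,000 calls/month. Assuming
// gpt-3.5-turbo pricing at the time (~$0.002 per 1K tokens) and a few
// thousand tokens per call, that lands in the ~$20/month ballpark.
```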

sisyphean@programming.dev 1 point 2 years ago

I haven’t looked into it yet, but the screencast on its website looks really promising! I have a lot on my plate right now, so I think I’ll release it first with the GPT-3.5 integration, but I’ll definitely try GPT4All later!

sisyphean@programming.dev 0 points 2 years ago

It can also summarize links, so it’s already useful even if there are few people posting walls of text.
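
The link flow is roughly: fetch the page, strip the markup, and ask the model for a TL;DR. A bare-bones sketch of that pipeline (not the bot’s real code; a production version would need proper article extraction and error handling):

```typescript
// Bare-bones link summarizer: fetch a page, crudely strip HTML, and ask
// gpt-3.5-turbo for a summary. Assumes OPENAI_API_KEY is set and Node 18+
// (for the global fetch).
async function summarizeLink(url: string): Promise<string> {
  const page = await fetch(url).then((r) => r.text());
  const text = page
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .slice(0, 12000); // stay well under the model's context limit

  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'system', content: 'Summarize the following page in a few bullet points.' },
        { role: 'user', content: text },
      ],
    }),
  });
  const data = await res.json();
  return `TL;DR: (AI-generated 🤖)\n\n${data.choices[0].message.content}`;
}
```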


Excellent Twitter thread by @goodside 🧵:

The wisdom that "LLMs just predict text" is true, but misleading in its incompleteness.

"As an AI language model trained by OpenAI..." is an astoundingly poor prediction of what a typical human would write.

Let's resolve this contradiction — a thread:

For widely used LLM products like ChatGPT, Bard, or Claude, the "text" the model aims to predict is itself written by other LLMs.

Those LLMs, in turn, do not aim to predict human text in general, but specifically text written by humans pretending they are LLMs.

There is, at the start of this, a base LLM that works as popularly understood — a model that "just predicts text" scraped from the web.

This is tuned first to behave like a human role-playing an LLM, then again to imitate the "best" of that model's output.

Models that imitate humans pretending to be (more ideal) LLMs are known as "instruct models" — because, unlike base LLMs, they follow instructions. They're also known as "SFT models" after the process that re-trains them, Supervised Fine-Tuning.

This describes GPT-3 in 2021.

SFT/instruct models work, but not well. To improve them, their output is graded by humans, so that their best responses can be used for further fine-tuning.

This is "modified SFT," used in the GPT-3 version you may remember from 2022 (text-davinci-002).

Eventually, enough examples of human grading are available that a new model, called a "preference model," can be trained to grade responses automatically.

This is RLHF — Reinforcement Learning from Human Feedback. This process produced GPT-3.5 and ChatGPT.

Some products, like Claude, go beyond RLHF and apply a further step where model output is corrected and rewritten using feedback from yet another model. The base model is tuned on these responses to yield the final LLM.

This is RLAIF — Reinforcement Learning from AI Feedback.

OpenAI's best known model, GPT-4, is likely trained using some other extension of RLHF, but nothing about this process is publicly known. There are likely many improvements to the base model as well, but we can only speculate what they are.

So, do LLMs "just predict text"?

Yes, but perhaps without the "just" — the text they predict is abstract, and only indirectly written by humans.

Humans sit at the base of a pyramid with several layers of AI above, and humans pretending to be AI somewhere in the middle.

Added note:

My explanation of RLHF/RLAIF above is oversimplified. RL-tuned models are not literally tuned to predict highly-rated text as in modified SFT — rather, weights are updated via Proximal Policy Optimization (PPO) to maximize the reward given by the preference model. (Also, that last point does somewhat undermine the thesis of this thread, in that RL-tuned LLMs do not literally predict any text, human-written or otherwise. Pedantically, "LLMs just predict text" was true before RLHF, but is now a simplification.)
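
To make the "maximize the reward given by the preference model" step concrete, here is a toy version of that loop. It deliberately swaps in plain REINFORCE for PPO, and a three-armed bandit over canned replies for a language model, so it is a cartoon of the idea rather than actual RLHF:

```typescript
// Cartoon of the RL-tuning loop described above: a "policy" picks among a few
// canned responses, a stand-in "preference model" scores the pick, and a
// REINFORCE update (not PPO) shifts the policy toward higher-reward replies.
const responses = [
  'As an AI language model trained by OpenAI, I cannot...',
  'Sure! Here is a step-by-step answer to your question...',
  'lol idk',
];

// Stand-in preference model: hard-coded scores, purely for the demo.
const reward = [0.2, 1.0, -0.5];

const logits = [0, 0, 0];

function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / z);
}

function sample(probs: number[]): number {
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

const lr = 0.1;
for (let step = 0; step < 2000; step++) {
  const probs = softmax(logits);
  const a = sample(probs);
  // REINFORCE: d/d(logit_i) of log pi(a) is (1 if i==a else 0) - p_i,
  // scaled by the reward of the sampled reply.
  for (let i = 0; i < logits.length; i++) {
    logits[i] += lr * reward[a] * ((i === a ? 1 : 0) - probs[i]);
  }
}

console.log(softmax(logits)); // most of the mass ends up on the helpful reply
```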

You know a video is going to be the most interesting thing you’ll watch this week when this unkempt guy with the axe on the wall appears in it.

But seriously, he is one of the best at explaining LLM behavior, very articulate and informative. I highly recommend watching all of his Computerphile videos.

OpenAI’s official guide. Short and to the point, no bullshit, covers the basics very well.
