sisyphean

[–] sisyphean@programming.dev 4 points 2 years ago* (last edited 2 years ago)

Here is an example of tokenization being biased toward English (using the author's Observable notebook):

This is the same sentence in English and my native Hungarian. I understand that this is due to the difference in the amount of text available in the two languages in the training corpus. But it's still a bit annoying that using the API for Hungarian text is more expensive :)
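If you want to reproduce the effect yourself, here is a minimal sketch using OpenAI's tiktoken library; the two sentences are my own illustration, not the exact ones from the screenshot:

```python
# Compare token counts for the same sentence in English and Hungarian,
# using the cl100k_base encoding (the one used by the GPT-3.5/GPT-4 API).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "The weather is beautiful today, so we are going for a walk."
hungarian = "Gyönyörű ma az idő, ezért sétálni megyünk."

print(len(enc.encode(english)))    # English: relatively few tokens
print(len(enc.encode(hungarian)))  # Hungarian: typically noticeably more
```

Since API usage is billed per token, the same meaning expressed in Hungarian costs more.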

[–] sisyphean@programming.dev 3 points 2 years ago* (last edited 2 years ago) (3 children)

The best hacker is, of course, the one who can guess the password the fastest (an all-lowercase dictionary word).

[–] sisyphean@programming.dev 7 points 2 years ago

Oh yes, terrible indeed. Saved.

[–] sisyphean@programming.dev 2 points 2 years ago

It will work with any larger instance because of federation: all communities with subscribers from an instance are available on that instance. So site:lemmy.world, site:programming.dev, etc. will work.

[–] sisyphean@programming.dev 1 points 2 years ago (1 children)

Made the switch 4 years ago. No regrets.

[–] sisyphean@programming.dev 3 points 2 years ago (1 children)

Nice! This could be used to visualize history in tutorials or presentations.

[–] sisyphean@programming.dev 12 points 2 years ago (1 children)

Here is your Lemmy Gold:

[Image: Lemmy Gold]

[–] sisyphean@programming.dev 1 points 2 years ago

Like everything else by Lilian Weng, this is a very good, no-nonsense overview of the state of LLM-based agents. Highly recommended.

[–] sisyphean@programming.dev 6 points 2 years ago* (last edited 2 years ago)

YAML is extremely complex for a configuration format and it has many really weird edge cases:

https://noyaml.com/

The problem is IMHO made worse because it looks so friendly at first glance.
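For example, here is the classic "Norway problem" demonstrated with Python's PyYAML (which implements YAML 1.1 type resolution); the country list is just an illustrative assumption:

```python
# In YAML 1.1, unquoted no/yes/on/off resolve to booleans,
# so a list of country codes silently corrupts.
import yaml

doc = """
countries:
  - se
  - no
"""
print(yaml.safe_load(doc))
# {'countries': ['se', False]}  <- 'no' became a boolean, not "no"
```

Quoting the value ("no") avoids this, but you have to know to do it.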

[–] sisyphean@programming.dev 2 points 2 years ago (4 children)
  1. Only on programming.dev, at least in the beginning, but it will be open source so anyone will be able to host it for themselves.
  2. I set up a hard limit of 100 summaries per day to limit costs. This way it won’t go over $20/month. I hope I will be able to increase it later. (A sketch of how such a cap could work is below.)
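
For the curious: 100 summaries/day is about 3,000/month, so the $20 budget implies roughly $0.007 per summary. Here is a minimal sketch of how such a hard daily cap could work; this is not the bot's actual code, just an illustration:

```python
# A daily hard cap: a counter that resets when the (UTC) date changes
# and refuses further requests once the limit is reached.
from datetime import datetime, timezone

class DailyCap:
    def __init__(self, limit=100):
        self.limit = limit
        self.day = datetime.now(timezone.utc).date()
        self.count = 0

    def try_acquire(self):
        today = datetime.now(timezone.utc).date()
        if today != self.day:         # new day: reset the counter
            self.day, self.count = today, 0
        if self.count >= self.limit:  # cap reached: refuse this request
            return False
        self.count += 1
        return True

cap = DailyCap(limit=100)
if cap.try_acquire():
    ...  # call the API and post the summary
```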
[–] sisyphean@programming.dev 1 points 2 years ago

Yes, for a moment you think “oh, there’s such a convenient API for this” and then you realize…

But we programmers can at least compile/run the code and find out if it’s wrong (most of the time). It is much harder in other fields.

[–] sisyphean@programming.dev 3 points 2 years ago* (last edited 2 years ago)

Hungarian here. It is safe to drink without boiling. People only boil water for baby formula to be extra safe.


Original tweet: https://twitter.com/emollick/status/1671528847035056128

Screenshots (from the tweet):

cross-posted from: https://programming.dev/post/133153

Quote:

In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities.
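
To get a feel for the scale they are talking about, here is a minimal sketch of a one-transformer-block language model that comes in well under 10 million parameters. This is PyTorch, and all the dimensions are my own illustrative choices, not the paper's actual architecture:

```python
# A single-block causal transformer LM at TinyStories scale (~5M parameters).
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=8192, d_model=256, n_heads=4,
                 d_ff=1024, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)  # token embeddings
        self.pos = nn.Embedding(max_len, d_model)     # learned positions
        self.block = nn.TransformerEncoderLayer(      # the one transformer block
            d_model, n_heads, d_ff, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)    # next-token logits

    def forward(self, ids):                           # ids: (batch, seq)
        n = ids.shape[1]
        x = self.tok(ids) + self.pos(torch.arange(n, device=ids.device))
        causal = torch.triu(                          # hide future positions
            torch.full((n, n), float("-inf"), device=ids.device), diagonal=1)
        return self.head(self.block(x, src_mask=causal))

model = TinyLM()
print(sum(p.numel() for p in model.parameters()))     # ~5.1M parameters
```

Most of the parameters sit in the input and output embeddings; the transformer block itself is tiny.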

Related:

Old but gold.


The original thread is on the devil’s website and I don’t want to direct traffic to it, so here’s a link to the tweet instead:

https://twitter.com/davidfowl/status/1671351948640129024?s=46&t=OEG0fcSTxko2ppiL47BW1Q

