NotAwfulTech

a community for posting cool tech news you don’t want to sneer at

non-awfulness of tech is not required or else we wouldn’t have any posts

Lobsters Vibecoding Challenge (Winter 2025-2026) (gist.github.com)

submitted 2 months ago by corbin@awful.systems to c/notawfultech@awful.systems

I’m tired of hearing about vibecoding on Lobsters, so I’ve written up three of my side tasks for coding agents. Talk is cheap; show us the code.

[–] corbin@awful.systems 9 points 2 months ago (1 children)

It occurs to me that this audience might not immediately understand how hard the chosen tasks are. I was fairly adversarial with my task selection.

Two of them are in RPython, an old dialect of Python 2.7 that chatbots will have trouble emitting because they're trained on the incompatible Python 3.x lineage. The odd task out asks for the bot to read Raku, which is as tough as its legendary predecessor Perl 5, and to write low-level code that is very prone to crashing. All three tasks must be done relative to a Nix flake, which is easy for folks who are used to it but not typical for bots. The third task is an open-ended optimization problem where a top score will require full-stack knowledge and a strong sense of performance heuristics; I gave two examples of how to do it, but by construction neither example can result in an S-tier score if literally copied.

This test is meant to shame and embarrass those who attempt it. It also happens to be a slice of the stuff that I do in my spare time.
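
For readers who haven't run into the Python 2.7 versus 3.x incompatibility mentioned above, here is a minimal sketch of it. The snippets are ordinary Python 2 syntax chosen for illustration, not code from the challenge tasks, and the helper name `parses_as_python3` is made up for this example. It only checks whether a Python 3 interpreter can parse each snippet; it says nothing about RPython's additional restrictions.

```python
# Sketch: check which Python 2 idioms a Python 3 interpreter refuses to parse.
# The snippets are illustrative assumptions, not taken from the challenge.

PY2_SNIPPETS = {
    "print statement": "print 'hello'",
    "old octal literal": "mode = 0777",
    "raise with comma": "raise ValueError, 'bad input'",
}


def parses_as_python3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        # compile() only parses and compiles; nothing is executed.
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


def rejected_snippets(snippets):
    """Return the names of the snippets that Python 3 cannot parse."""
    return sorted(name for name, src in snippets.items()
                  if not parses_as_python3(src))


if __name__ == "__main__":
    for name in rejected_snippets(PY2_SNIPPETS):
        print("rejected by Python 3:", name)
```

A model that defaults to Python 3 habits makes the mirror-image mistake: it emits syntax and library calls that a Python 2.7 toolchain rejects.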

[–] gerikson@awful.systems 6 points 2 months ago (1 children)

Let's see if you get any takers.

[–] BlueMonday1984@awful.systems 7 points 2 months ago* (last edited 2 months ago) (4 children)

There's already a couple of 'em - one of 'em, as expected, is being a sneerable little shit:

[–] V0ldek@awful.systems 12 points 2 months ago* (last edited 2 months ago)

Vibe coding is of course a less than optimal process for the kind of tasks you've specified here

Oh no, tasks that have actual concrete outcomes and requirements! Vibe coders' biggest nemesis!

[–] blakestacey@awful.systems 7 points 2 months ago

Here's something. It doesn't follow your rules.

Then why did you submit it, dipshit?

Given your tone in these posts it seems unlikely to meet the kind of standards you are looking for.

That "kind of standards" being basic competence.

[–] corbin@awful.systems 6 points 2 months ago

I've started grading and his grade is ready to read. I didn't define an F tier for this task, so he did not place on the tier list. The most dramatic part of this is overfitting to the task at agent runtime (that is, "meta in-context learning"); it was able to do quite well at the given benchmark but at the cost of spectacular failure on anything complex outside of the context.

[–] istewart@awful.systems 5 points 2 months ago

it was worth trying to start from my phone

[–] V0ldek@awful.systems 8 points 2 months ago* (last edited 2 months ago)

I especially love the third task because that's exactly the kind of shit you get thrown on your plate in the field as a SWE.

There's been some work an old member of our team did a year ago. No one remembers what it was, but it is important. Please do something.

That's almost exactly one of the first tasks I got as an intern when I was starting out. THIS is what they are saying LLMs are going to replace.

[–] corbin@awful.systems 3 points 1 month ago

I've finished grading all of the entries so far. I don't think that we'll get any more, so here's a preview of the upcoming blog post.

The tier listings are as follows:

B tier: Corbin S. (Task 1), Corbin S. (Task 2), Corbin S. (Task 3)
C tier: Piper M. (Task 1)

Admittedly, we didn't get a whole lot of players, but that's it. That's the entire tier listing. I had three things I wanted to do in my spare time. I did them and got an average ranking based on my average predictions of the future; I met expectations. Piper also placed and I greatly appreciate her sportsmanship here.

My solutions are available as notes and source code. For Task 1, I have three main commits: one, two, three, and a bugfix. For Task 2, the commits are internal to my homelab, but I do have notes and source code. Finally, for Task 3, I put the entire repository into a flat gist including notes, source code, and Nix flake.