this post was submitted on 27 Jan 2026
NotAwfulTech
Let's see if you get any takers.
There's already a couple of 'em - one of 'em, as expected, is being a sneerable little shit:
Oh no, tasks that have actual concrete outcomes and requirements! Vibe coders' biggest nemesis!
Then why did you submit it, dipshit?
That "kind of standards" being basic competence.
I've started grading and his grade is ready to read. I didn't define an F tier for this task, so he did not place on the tier list. The most dramatic part of this is overfitting to the task at agent runtime (that is, "meta in-context learning"); it was able to do quite well at the given benchmark but at the cost of spectacular failure on anything complex outside of the context.
it was worth trying to start from my phone