this post was submitted on 27 Jan 2026
NotAwfulTech
I've started grading and his grade is ready to read. I didn't define an F tier for this task, so he did not place on the tier list. The most dramatic part is the overfitting to the task at agent runtime (that is, "meta in-context learning"): it did quite well on the given benchmark, but at the cost of spectacular failure on anything complex outside that context.