this post was submitted on 11 Jun 2024
88 points (98.9% liked)

technology


The big AI models are running out of training data (and it turns out most of the training data was produced by fools and the intentionally obtuse), so this might mark the end of rapid model advancement

bazingabrain@hexbear.net 11 points 1 year ago (1 children)

I fail to see how synthetic data is a good thing if all it does is make the AI that's used to justify job cuts "better".

lurkerlady@hexbear.net 9 points 1 year ago* (last edited 1 year ago)

Synthetic data is basically a fancy way of saying 'I'm properly formatting data and reinforcing the AI's good outputs': rearranging words, fixing or adding tags, that sort of thing. It's generated with various tools that usually have an LLM or VLM plugged in, though some are as simple as a regex script.
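
To give a rough idea of the "simple regex script" end of that spectrum, here's a minimal sketch in Python. The comma/semicolon-separated tag format and the `normalize_tags` name are made up for illustration and don't come from any particular tool; real pipelines will differ.

```python
import re

def normalize_tags(raw: str) -> str:
    """Clean up a tag string (hypothetical comma/semicolon-separated format)."""
    # Split on commas or semicolons, tolerating stray whitespace around them.
    parts = re.split(r"\s*[,;]\s*", raw.strip())
    seen = set()
    cleaned = []
    for part in parts:
        # Collapse runs of whitespace/underscores into one space, then lowercase.
        tag = re.sub(r"[\s_]+", " ", part).strip().lower()
        # Skip empty fragments and duplicates, keeping first-seen order.
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return ", ".join(cleaned)

print(normalize_tags("Blue_Sky ,  sunset;; blue sky,Mountain  Range "))
# prints: blue sky, sunset, mountain range
```

Nothing here involves a model at all; it just makes messy tags consistent so the same concept is always written the same way in the training set.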