OpenAI says it’s “impossible” to create useful AI models without copyrighted material
(arstechnica.com)
A comedian isn't forming a sentence based on which word is most likely to appear after the previous one. This is such a bullshit argument: it reduces human competency to "monkey see thing, monkey draw thing" and completely overlooks the craft and intent behind creative works. Do you know why ChatGPT uses certain words over others? Probability. As a result of its training, it learned that one word is likely to follow the previous ones in certain contexts. It absolutely doesn't take into account things like "maybe this word would be better here because its sound and syllables maintain the flow of the sentence".
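To make the "probability" point concrete, here is a toy sketch of next-word selection. It assumes a hand-written table of next-word probabilities (`NEXT_WORD_PROBS`, plus the helper functions, are hypothetical names for illustration only). Real models such as ChatGPT work on subword tokens and compute these probabilities with a neural network over the whole preceding context, so treat this as an analogy, not an implementation. It shows both greedy choice (always take the most likely word) and sampling (draw a word in proportion to its probability).

```python
import random

# Hypothetical toy table: for each previous word, the probability of each next word.
# Real language models condition on much longer contexts with learned weights,
# not a hand-written lookup table like this one.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "idea": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"barked": 0.7, "sat": 0.3},
}


def most_probable_next(word):
    """Greedy choice: return the single most likely next word, or None if unknown."""
    options = NEXT_WORD_PROBS.get(word)
    if not options:
        return None
    return max(options, key=options.get)


def sample_next(word, rng):
    """Sampled choice: draw the next word in proportion to its probability."""
    options = NEXT_WORD_PROBS.get(word)
    if not options:
        return None
    words = list(options)
    weights = [options[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(start, steps, pick):
    """Build a sentence by repeatedly choosing a next word with `pick`."""
    out = [start]
    for _ in range(steps):
        nxt = pick(out[-1])
        if nxt is None:  # no known continuation, so stop early
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate("the", 3, most_probable_next))  # the cat sat
    print(generate("the", 3, lambda w: sample_next(w, rng)))
```

Note that nothing in either selection rule looks at how a word sounds or how it affects the rhythm of the sentence; the only input is the probability attached to each candidate.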
Baffling takes from people who don't know what they're talking about.
I wish I could upvote this more than once.
What people always seem to miss is that a human doesn't need billions of examples to be able to produce something that's kind of "eh, close enough". Artists don't look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For comparison, I did an art and design course last year and looked at about 300 artworks in total (the course required 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn't looking at billions of examples: it's looking at a few, then practicing the skill and developing a process that allows them to convey what they're trying to express.
If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.
When people say that the "model is learning from its training data", it means just that, not that it is human, and not that it learns exactly like humans do. It doesn't make sense to judge boats on how well they simulate human swimming patterns, just on how well they perform their task.
Every human has the benefit, as a baby, of training on the things around them and being trained by the people around them, which builds a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble things for themselves.
For example, when a model takes in a thousand images of circles, it doesn't "learn" a thousand circles. It learns what a circle GENERALLY is like: the concept of it. That representation, along with random noise, is how you create images with these models. The same happens for every concept the model trains on, from "cat" to more complex things like color relationships, reflections, or lighting. Machines are not human, but they can learn despite that.
In general I agree with you, but AI doesn't learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle, but there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it's important to make the distinction.
That is why current models aren't regarded as actual intelligence, although people already call them that...
I understand. I didn't mean to imply any sort of understanding with the language I used.