Yes. I do. And I'm right.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
That's part of the allegation, but it's unsubstantiated. It isn't entirely coherent.
> these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.
That wouldn't be copyright infringement.
It isn't infringement to use a copyrighted work for whatever purpose you please. What's infringement is reproducing it.
Maybe you don't care, but the OSI definition does.
In fairness, they didn't release anything open at all.
You're getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.
First of all, copyright law does not care about the algorithms used and how well they map what a human mind does. That's irrelevant. There's nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn't. Either it's derivative or it isn't.
What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If a LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That's true regardless of algorithm and there's certainly nothing in copyright or common sense that separates one from another. If I read enough Hunter S Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.
Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy's estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy's writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.
So it's really, really hard to make the case that there's any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.
The problem is that as a consumer, if I buy a book for $12, I'm fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn't be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure).
Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?
Congress has been here before. In the early days of radio, DJs were infringing on recording copyrights by playing music on the air. Congress knew it wasn't feasible to require every song be explicitly licensed for radio reproduction, so they created a compulsory license system where creators are required to license their songs for radio distribution. They do get paid for each play, but at a rate set by the government, not negotiated directly.
Another issue: who decides which works are more valuable, and how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains fewer words? If I self-publish a book, is it worth as much as Mark Twain's? Sure, his is more popular, but maybe mine is longer and contains more content; what's my payout in this scenario?
I'd say no one. Just like Taylor Swift gets the same payment as your garage band per play, a compulsory licensing model doesn't care who you are.
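To put rough numbers on that: under a flat compulsory-license model, the payout is just a pool divided by the number of uses. A back-of-envelope sketch in Python, where every figure is a made-up assumption for illustration, not anything from an actual proposal:

```python
# Back-of-envelope math for a flat-rate compulsory license on training data.
# Every number here is a hypothetical assumption, purely for illustration.

license_pool = 100_000_000     # annual license fees paid by an AI trainer, in dollars
works_in_corpus = 10_000_000   # documents/books used for training that year

# A flat statutory rate doesn't care whether you're Mark Twain or a
# self-published author: every work used earns the same per-use fee.
per_work_payout = license_pool / works_in_corpus
print(f"${per_work_payout:.2f} per work per year")  # -> $10.00 per work per year
```

The point of the sketch: the per-work number stays small no matter whose work it is, which is exactly the radio outcome. Individually negotiated deals would mostly be worth pennies anyway, so a fixed statutory rate buys predictability rather than riches.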
Isn’t learning the basic act of reading text?
Not even close. That’s not how AI training models work, either.
Of course it is. It's not a 1:1 comparison, but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a TensorFlow script more closely emulated a human's learning process, would that matter to you? I doubt that very much.
Thousands of authors demand payment from AI companies for use of copyrighted works: Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools
Having to individually license each unit of work for an LLM would be as ridiculous as trying to run a university where you have to individually license each textbook for each student. It would never work.
What we're broadly talking about is generative work. That is, by absorbing a body of work, the model incorporates it into an overall corpus of learned patterns (the toy sketch at the end of this comment makes that concrete). That's not materially different from how anyone learns to write. Even my use of the word "materially" in the last sentence is, surely, based on seeing it used in similar patterns of text.
The difference is that a human's ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style, but I can only do that a handful of times in a lifetime. An LLM can do it almost infinitely, and that ability can then be reused by any number of other consumers.
There's a case here that the remuneration process we have for original work doesn't fit well with AI training models, and maybe Congress should remedy that, but on its face I don't think it's feasible to just shut it all down. Something like a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.
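To make the "corpus of learned patterns" point concrete: a real LLM is vastly more complicated, but even a toy bigram model shows the same property. Training stores pooled statistics about which word tends to follow which, not copies of any source text. This is just an illustrative sketch, not how any production model actually works:

```python
from collections import Counter, defaultdict
import random

# Toy "training": absorb word-pair statistics from a tiny corpus.
# Nothing is stored verbatim -- only counts of which word tends to
# follow which, pooled across every input.
corpus = [
    "the road was long and the road was dark",
    "the night was long and cold",
]

follows = defaultdict(Counter)
for text in corpus:
    words = text.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

# Toy "generation": sample from the pooled statistics. The output
# echoes the style and idioms of the corpus without reproducing
# any single source.
word = "the"
out = [word]
for _ in range(8):
    options = follows.get(word)
    if not options:
        break
    word = random.choices(list(options), weights=list(options.values()))[0]
    out.append(word)
print(" ".join(out))
```

Whether the real thing is a transformer or this toy counter, the relevant fact is the same: what's retained is a statistical digest of many works, and the infringement question attaches to what gets generated, not to the absorption itself.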
Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right, but also, if your position is that only humans can learn from and adapt text, that broadly rules out any AI ever.
But in recent conversations with investors, Altman has played his ace card, using advances in AI to entice people to put money into Tools for Humanity, the company behind Worldcoin, according to people briefed on the matter. If the value of the coin increases, it could be a massive windfall for Altman and other investors.
The token economics — a breakdown of how the tokens will be distributed — will be made public Monday, the people said.
Tools for Humanity has offered people around the world free Worldcoin tokens, called “WLD,” in exchange for scanning their irises with a device called “The Orb.” The iris scans ensure that each person can have only one Worldcoin ID.
Maybe I'm cynical, but this rubs me very much the wrong way.
Basically credit card theft.
Over twenty years ago, when I was pretty young and inexperienced, I answered a newspaper ad for IT/programming at a so-called "startup." It sounded great.
My first day was in someone's living room turned office, and I didn't actually have any real idea what the business was. I was told it was a financial company and that it was taking off like gangbusters. Relatively quickly, within days actually, we moved into a very nice class-A office building. The owner was a remarkably charismatic man, and being in his presence made you feel warm and understood, like you had a world of possibilities around you. I felt like a badass: I had a good-paying job, worked in a beautiful and prestigious office, and had a boss who made me feel great.
I found out, however, he was basically just running a scam. Between about 2 and 4 a.m., he would have TV spots running, selling naive housewives, unemployed breadwinners, alcoholics, etc., a "system" to earn huge sums of money very quickly. His system? You find people selling notes. You find people who want to buy notes. You introduce them and take a commission. A huuuuuuge commission.
Was that illegal? I don't know. I kind of doubt the people in the ads were real, but my paychecks were clearing.
I learned that when his sales people (who worked late at night, when the infomercials ran) took orders, they would record everyone's credit card info. Then, the owner directed us to automatically sign them up for things they didn't ask for -- recurring subscriptions to his membership-based "note marketplace" website. This was before the Internet was so mainstream, and many people buying this package didn't even have a computer.
If people tried to place an order and one credit card was declined, he'd just have them quietly try another card we had on file for them, without asking. If anyone complained, they'd obviously just refund the whole charge to avoid pissing off the credit card companies, but he was really just hoping no one would notice.
I quit pretty quickly and got a "real" real job.
Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.
Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.