Hard to say from the article alone, but if it is like the status quo in the EU and USA, then only the way the training data was obtained can be illegal. If I have an AI that can recite the script of Bee Movie verbatim, I will be sued.
Google Books had a similar issue. They scanned pretty much every book in existence and indexed them. Small issue: they did not obtain the consent of the copyright holders before doing this. They were sued and won. You can use copyrighted data as long as you do not provide access to it.
Largely Didier Raoult. He was the one who published the paper that really started this whole mess. His shoddy research practices and disregard for patients did plenty of harm.
Good thing they forced him to retire.