modulus

joined 2 years ago
[–] modulus@lemmy.ml 18 points 2 years ago (13 children)

Worth considering that this is already the law in the EU. Specifically, Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market provides exceptions for text and data mining.

Article 3 has a very broad exception for scientific research: "Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access." There is no opt-out clause to this.

Article 4 has a narrower exception for text and data mining in general: "Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining." This one's narrower because it also provides that, "The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online."

So, effectively, scientific research can data-mine freely, with no way for rightholders to opt out, while other uses such as commercial applications can data-mine only where the rightholder has not reserved that use in an appropriate manner, such as machine-readable means for content published online.
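
To make the "machine-readable means" part concrete, here is a minimal sketch of how a crawler might check for a reservation before mining a page. It assumes the opt-out is expressed in robots.txt, which is one common convention; whether that counts as an "appropriate manner" under Article 4 is a legal question this sketch does not answer. The bot name `example-tdm-bot`, the helper `tdm_allowed` and the sample file are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def tdm_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: the site reserves its content against one mining bot
# (an assumed name) and leaves it open to every other crawler.
ROBOTS = """\
User-agent: example-tdm-bot
Disallow: /

User-agent: *
Allow: /
"""

print(tdm_allowed(ROBOTS, "example-tdm-bot", "https://example.org/article"))
print(tdm_allowed(ROBOTS, "other-bot", "https://example.org/article"))
```

Under the Article 3 research exception this check would not be legally required, since there is no opt-out there; it only matters for the general Article 4 exception.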

[–] modulus@lemmy.ml 1 points 2 years ago (1 children)

Well, in a way that's what we're doing now, and by and large it works. Obviously there's some leakage, which is impossible to bring down to zero, but it makes sense to keep working on reducing it.

The other side of the coin is that the price of this moderation model is subjecting a lot more people to a lot more horrible shit, and I unfortunately don't know any way around that.

[–] modulus@lemmy.ml 2 points 2 years ago (3 children)

Perhaps the manual reporting tool is enough? Then that content can be forwarded to the central ms service. I wonder if that API can report back to say whether it is positive.

The problem with a lot of this tooling is you need some sort of accreditation to use it, because it somewhat relies on security through obscurity. As far as I know you can't just hit MS's servers and ask "is this CSAM?" If something like that were possible it might work.

Can you elaborate on the hash problem?

Sure. When you have an image, you can do lots of things that change it in some way: change the compression or the format, crop it, apply a filter... All of these change the file, and so they change its ordinary (cryptographic) hash. The perceptual hash system relies on some computer vision techniques, and the idea is that it tries to generate the same hash for pictures that are substantially the same. But this tech is imperfect and the algorithm will probably change over time. If the way the hash is calculated changes, keeping only the hashes wouldn't be enough: you'd have to keep the original files to recalculate them, which means storing CSAM, which is ordinarily not allowed, and for good reason.
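
A toy illustration of the difference, as a sketch only: this is a simple "average hash" over a tiny grayscale grid, not PhotoDNA or any algorithm actually used for CSAM detection, and the pixel values are made up. It shows a cryptographic hash changing under a slight brightness shift while the perceptual bits stay the same.

```python
import hashlib


def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def sha256_hex(pixels):
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Made-up 4x4 grayscale image: dark left half, bright right half.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# Simulate a mild re-encode by making every pixel slightly brighter.
altered = [[min(255, p + 3) for p in row] for row in original]

# The cryptographic hashes no longer match...
print(sha256_hex(original) == sha256_hex(altered))
# ...while the toy perceptual hashes differ in zero bits.
print(hamming(average_hash(original), average_hash(altered)))
```

The catch described above still applies: if `average_hash` were replaced by a different algorithm, the stored bit lists would be useless without the original images to rehash.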

For a hint on how bad these hashes can get, they are reversible, vulnerable to pre-image attacks, and so on.

Some of this is probably inevitable in this type of system. You don't want to make it easy for someone to hit the servers with a large number of hashes and then use IPFS or the BitTorrent DHT to retrieve the positives (you'd be helping people get CSAM). The problem is hard.

Personally I was thinking of generating a federated set based on user reporting. Perhaps enhanced by checking with the central service as mentioned above. This db can then be synced with trusted instances.

Something like that could work, maybe obscuring some of the hash content (random parts of it) so that it doesn't become a way to actually find the stuff.
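
A rough sketch of what I mean by obscuring random parts, with everything here hypothetical (the function names, the 64-bit hash length, the 25% hidden fraction): the shared entry hides some bit positions, and matching only looks at the positions that are still visible.

```python
import random


def mask_hash(bits, hidden_fraction, rng):
    """Return a copy of bits with a random subset of positions replaced by None."""
    n_hidden = int(len(bits) * hidden_fraction)
    hidden = set(rng.sample(range(len(bits)), n_hidden))
    return [None if i in hidden else b for i, b in enumerate(bits)]


def matches(masked, candidate, max_distance=0):
    """Compare a candidate hash with a masked entry on the visible positions only."""
    distance = sum(1 for m, c in zip(masked, candidate) if m is not None and m != c)
    return distance <= max_distance


rng = random.Random(0)  # fixed seed so the example is repeatable
full = [rng.randint(0, 1) for _ in range(64)]  # stand-in for a real perceptual hash
entry = mask_hash(full, 0.25, rng)             # what would be synced to other instances

print(matches(entry, full))                   # the original hash still matches
print(matches(entry, [1 - b for b in full]))  # a completely different hash does not
```

The trade-off is that the more bits are hidden, the more unrelated images will match by accident, so this reduces the value of the list as a lookup tool only at the cost of more false positives.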

Whatever decisions are made have to be well thought through so as not to make the problem worse.

[–] modulus@lemmy.ml 9 points 2 years ago

Clearly this particular suit by this particular person is iffy. However, I don't think this framing is very good: the fact Wikimedia is headquartered elsewhere shouldn't make it immune from being sued where an affected party lives.

Also, this part of the article seems a bit contradictory:

Just because someone doesn’t like what’s written about them doesn’t give them the right to unmask contributors. And if the plaintiff still believes he’s been wronged by these contributors, he can definitely sue them personally for libel (or whatever). What he has no right to demand is that a third party unmask users simply because it’s the easiest target to hit.

OK, but how does he sue them personally without knowing who they are? It's fine to say this shouldn't be regarded as libel (I agree: it's a factual point and should be covered by exceptio veritatis or whatever), but I think it's a bit dishonest to say that he can't go after Wikimedia and should sue the individual users, while also saying Wikimedia shouldn't be forced to reveal them.

Much better if the court would consider this information as being accurate and in the public interest.

Of course the GDPR cuts two ways here, because political information is an especially protected category, with certain exceptions (notorious information). So I'm not sure how the information on this person's affiliation to the far right was obtained and so on.

[–] modulus@lemmy.ml 3 points 2 years ago (5 children)

IMO the hardest part is the legal side, and in fact I'm not very clear on how MS skirted that issue other than through lax US enforcement against corporations. In order to have a db like this, one must store stuff that is ordinarily illegal to store. Because of the use of imperfect, so-called perceptual hashes, and in case of algorithm updates, I don't think one can get away with simply storing the hash of the file. Some kind of computer vision/AI-ish solution might work out, but I wouldn't want to be the person compiling that training set...

[–] modulus@lemmy.ml -4 points 2 years ago (11 children)

There's a reason really existing socialist formations almost invariably come down hard on drugs. It harms public health, it harms proletarian culture, productivity, and so on. There's no problem with industrial hemp but conflating this with THC-bearing weed for entertainment is a bit of a trick. Same thing for the medical uses. I admit I'm sceptical, and I suspect a lot of people with prescriptions are in fact using it for entertainment or escapism, but if it has genuine medical applications that's fine, same as it's fine to use morphine for pain management but not just for fun.

[–] modulus@lemmy.ml 18 points 2 years ago

Welcome back!

There were points at which Firefox was difficult to stick with, especially after the extension apocalypse, but I think it's evolving pretty well at this point.

[–] modulus@lemmy.ml 5 points 2 years ago (2 children)

It's not primarily the younger voters going fash though. Otherwise I mostly agree with your comment.

[–] modulus@lemmy.ml 11 points 2 years ago (1 children)

Thank goodness. But now what? I wonder if we'll have another election by the end of the year.

Hoping for the undoubtedly difficult negotiations to yield a left government instead.

[–] modulus@lemmy.ml 4 points 2 years ago

I find it impossible not to see it as a symmetric situation. The notion the US is restricting access to chips for natsec reasons may be true, if that includes restricting Chinese economic and technical development to maintain its hegemony. That China responds in kind is not only to be expected, but also fair from any possible neutral stance. The special pleading is especially apparent here. "No, it's different when they do it to us because we're the good guys." Really?

[–] modulus@lemmy.ml 2 points 2 years ago

Greg Egan's Distress has some of that.

[–] modulus@lemmy.ml 1 points 2 years ago

A bit soon to tell, but it's quite unclear what will happen. I don't find the article believable when it blames cultural issues, or UP's "messianic" ministers, for the changes, though.

I think the issues are economic. Inflation has made many people angry and uncertain, and so have higher interest rates. It's not as bad as in much of the EU, but arguably there was less disposable income to begin with. Whether the left can regain the initiative remains to be seen.
