this post was submitted on 11 Dec 2025

547 points (96.6% liked)

A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It (www.404media.co)

submitted 2 days ago by themachinestops@lemmy.dbzer0.com to c/technology@lemmy.world

91 comments

[–] Devial@discuss.online 176 points 2 days ago* (last edited 2 days ago) (7 children)

The article headline is wildly misleading, bordering on being just a straight up lie.

Google didn't ban the developer for reporting the material, they didn't even know he reported it, because he did so anonymously, and to a child protection org, not Google.

Google's automatic tools, correctly, flagged the CSAM when he unzipped the data and subsequently nuked his account.

Google's only failure here was to not unban on his first or second appeal. And whilst that is absolutely a big failure on Google's part, I find it very understandable that the appeals team generally speaking won't accept "I didn't know the folder I uploaded contained CSAM" as a valid ban appeal reason.

It's also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does of the fact that hundreds of CSAM images were freely available online and openly sharable by anyone, and to anyone, for god knows how long.

[–] ulterno@programming.dev 4 points 8 hours ago* (last edited 8 hours ago) (1 children)

Another point is, the reason Google's AI is able to identify CSAM is because it has that in its training data, flagged as such.

In that case, it would have detected the training material as ~100% match.

I don't get, though, how it ended up being openly available: if it were properly tagged, they would probably have excluded it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.

And while some governmental body got a lot of backlash for trying to implement such an AI thing on chat stuff, Google gets to do so all it wants because it's E-Mail/GDrive and all on their servers and you can't expect privacy.

Considering how many stories of people having problems due to this system are coming up, is there any statistic on legitimate catches using this model? I suspect not, because why would anyone use Google services for this kind of stuff?

[–] arararagi@ani.social 2 points 7 hours ago (1 children)

You would think, but none of these companies actually make their own dataset, they buy from third parties.

[–] ulterno@programming.dev 0 points 6 hours ago

I am not sure which point you are answering.
Could you please specify?

[–] MangoCats@feddit.it 17 points 1 day ago

Google’s only failure here was to not unban on his first or second appeal.

My experience of Google and the unban process is: it doesn't exist, never works, doesn't even escalate to a human evaluator in a 3rd world sweatshop - the algorithm simply ignores appeals inscrutably.

[–] forkDestroyer 22 points 1 day ago (2 children)

I'm being a bit extra but...

Your statement:

The article headline is wildly misleading, bordering on being just a straight up lie.

The article headline:

A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It

The general story in reference to the headline:

He found csam in a known AI dataset, a dataset which he stored in his account.
Google banned him for having this data in his account.
The article mentions that he tripped the automated monitoring tools.

The article headline is accurate if you interpret it as

"A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It" ("it" being "csam").

The article headline is inaccurate if you interpret it as

"A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It" ("it" being "reporting csam").

I read it as the former, because the action of reporting isn't listed in the headline at all.

^___^

[–] WildPalmTree@lemmy.world 1 points 8 hours ago

The inclusion of "found" indicates that it is important to the action taken by Google, would be my interpretation.

[–] Blubber28@lemmy.world 6 points 1 day ago (2 children)

This is correct. However, many websites/newspapers/magazines/etc. love to get more clicks with sensational headlines that are technically true, but can be easily interpreted as something much more sinister/exciting. This headline is a great example of it. While you interpreted it correctly, or claim to at least, there will be many people that initially interpret it the second way you described. Me among them, admittedly. And the people deciding on the headlines are very much aware of that. Therefore, the headline can absolutely be deemed misleading, for while it is absolutely a correct statement, there are less ambiguous ways to phrase it.

[–] MangoCats@feddit.it 3 points 1 day ago

can be easily interpreted as something...

This is pretty much the art of sensational journalism, popular song lyric writing and every other "writing for the masses" job out there.

Factual / accurate journalism? More noble, but less compensated.

[–] obsoleteacct@lemmy.zip 1 points 1 day ago (1 children)

It is a terrible headline. It can be debated whether it's intentionally misleading, but if the debate is even possible then the writing is awful.

[–] MangoCats@feddit.it 2 points 1 day ago

if the debate is even possible then the writing is awful.

Awfully well compensated in terms of advertising views as compared with "good" writing.

Capitalism in the "free content market" at work.

[–] ayyy@sh.itjust.works 3 points 1 day ago

The article headline is wildly misleading, bordering on being just a straight up lie.

A 404Media headline? The place exclusively staffed by former BuzzFeed/Cracked employees? Noooo, couldn’t be.

[–] cupcakezealot@piefed.blahaj.zone 1 points 1 day ago (1 children)

so they got mad because he reported it to an agency that actually fights csam instead of them so they can sweep it under the rug?

[–] Devial@discuss.online 24 points 1 day ago* (last edited 1 day ago) (1 children)

They didn't get mad, they didn't even know THAT he reported it, and they have no reason or incentive to sweep it under the rug, because they have no connection to the data set. Did you even read my comment?

I hate Alphabet as much as the next person, but this feels like you're just trying to find any excuse to hate on them, even if it's basically a made up reason.

[–] cupcakezealot@piefed.blahaj.zone -5 points 1 day ago (3 children)

they obviously did if they banned him for it; and if they're training on csam and refuse to do anything about it then yeah they have a connection to it.

[–] Devial@discuss.online 7 points 1 day ago* (last edited 1 day ago)

Also, the data set wasn't hosted, created, or explicitly used by Google in any way.

It was a common data set used in various academic papers on training nudity detectors.

Did you seriously just read the headline, guess what happened, and are now arguing based on that guess that I, who actually read the article, am wrong about its content? Because that's sure what it feels like reading your comments...

[–] MangoCats@feddit.it 3 points 1 day ago

Google doesn't ban for hate or feels, they ban by algorithm. The algorithms address legal responsibilities and concerns. Are the algorithms perfect? No. Are they good? Debatable. Is it possible to replace those algorithms with "thinking human beings" that do a better job? Also debatable; from a legal standpoint, they're probably much better off arguing from a position of algorithm vs human training.

[–] Devial@discuss.online 3 points 1 day ago* (last edited 1 day ago)

So you didn't read my comment then, did you?

He got banned because Google's automated monitoring system, entirely correctly, detected that the content he unzipped contained CSAM. It wasn't even a manual decision to ban him.

His ban had literally nothing whatsoever to do with the fact that the CSAM was part of an AI training data set.

[+] bobzer@lemmy.zip -33 points 1 day ago (2 children)

CSAM images

ATM machine

[–] Goodlucksil@lemmy.dbzer0.com 27 points 1 day ago (1 children)

CSAM stands for "material". Adding "image" specifies what kind of material it is.

[–] Devial@discuss.online 10 points 1 day ago (1 children)

Which of the letters in CSAM stands for images, then?

[+] bobzer@lemmy.zip -6 points 1 day ago (1 children)

Material.

[–] Devial@discuss.online 5 points 1 day ago (2 children)

Material can be anything. It can be images, videos, theoretically even audio recordings.

Images is a relevant and sensible distinction. And judging by the downvotes you're collecting, the majority of people disagree with you.

[–] MangoCats@feddit.it 1 points 1 day ago

Material can be anything.

And, if you're trying to authorize law enforcement to arrest and prosecute, you want the broadest definitions possible.

[–] bobzer@lemmy.zip -4 points 1 day ago (1 children)

You're right. It can be images, that's exactly why saying "this man was found in possession of child abuse material images" does not make grammatical sense. It's why CP still defines it better as we're not arresting people for owning copies of Lolita, which you could argue is also CSAM.

the majority of people disagree with you.

The majority of people can be wrong.

[–] Devial@discuss.online 2 points 1 day ago* (last edited 1 day ago) (1 children)

Big "Ben Shapiro ranting about renewable energies because of the first law of thermodynamics" energy right here.

And your point is literally the opposite. Lolita could be argued to be child porn, as it's pornographic material showing (fictional/animated) children. It is objectively NOT CSAM, because it does not contain CSA, because you can't sexually abuse a fictional animated character.

CP is also a common acronym that can mean many other things.

Porn also implies it's a work of artistic intent, which is just wrong for CSAM.

The majority of people can be wrong.

No they can't, not with regards to linguistics. Linguistics is a descriptive science, not a prescriptive one. Words and language, by definition, and convention of every serious linguist in the world, mean what the majority of people think them to mean. That's how language works.

[–] bobzer@lemmy.zip -3 points 1 day ago (1 children)

"I'm mad you're right so let me compare you to a hateful right wing grifter and also by the way, you're wrong because all my friends say so."

It may shock you but a handful of Lemmy users doesn't constitute the linguistic consensus you're trying to inherit here.

[–] Devial@discuss.online 3 points 1 day ago* (last edited 1 day ago) (1 children)

I'm not comparing you to Ben Shapiro, I'm comparing your grammar nazi pedantry to a single specific instance of his grammar nazi pedantry.

I also gave several explicit reasons why using CP over CSAM is idiotic, not just "my friends say so".

So that's 2 for 2 for wildly and dishonestly misrepresenting my points.

But hey, if you want to be like that sure.

You're right, everyone else is wrong, you do you and keep using CP instead of CSAM, and keep getting irrationally upset and angry at people who think CSAM is a better term. Happy now?

[–] bobzer@lemmy.zip -2 points 1 day ago* (last edited 1 day ago) (1 children)

Sorry. To address your two points, where did people get the idea that the word porn implies artistic merit or consent? It simply means material intended to sexually excite people. Is that meaning not important to retain as it's the difference between what we consider CSAM and a picture of your kid in the tub?

CP can stand for a lot of things but it's common parlance now. CSAM just causes confusion.

Also really? Now you're stooping to the old "why so mad bro?". You're the one having a meltdown, I'm wasting time at work by sharing an opinion.

[–] Devial@discuss.online 3 points 1 day ago* (last edited 13 hours ago)

To address your two points, where did people get the idea that the word porn implies artistic merit or consent?

I didn't say merit (or consent, though I assume that one's a typo), I said artistic intent. Which every creative work by definition has. And I don't consider CSAM to be a creative work. It's just abuse, created opportunistically with no real artistic or creative consideration.

Also, there is nothing ethically wrong with porn in a vacuum, so categorising this material as a sub-category of something that isn't inherently ethically wrong in my opinion makes it a bad term. The term CSAM clearly and strictly delineates it from consensual porn.

CP can stand for a lot of things but it's common parlance now. CSAM just causes confusion.

Ah yes. The Acronym with MORE common definitions somehow causes less confusion. That makes perfect sense. Of course. That explains why so many people in this thread were confused by it. Oh no wait. They weren't.

Also really? Now you're stooping to the old "why so mad bro?". You're the one having a meltdown, I'm wasting time at work by sharing an opinion.

You're the one who got upset enough about me using a common abbreviation, that no one in the thread was remotely confused by, to kick off this entire shit. You decided you needed to pedantically comment on this. I'm simply defending myself from your pedantic grammar nazi shit.

[+] Cybersteel@lemmy.world -27 points 1 day ago (4 children)

We need to block access to the web to certain known actors and tie ipaddresses to IDs, names, passport number. For the children.

[–] tetris11@feddit.uk 11 points 1 day ago

Also, pay me exorbitant amounts of taxpayer money to ineffectually enforce this. For the children.

[–] kylian0087@lemmy.dbzer0.com 16 points 1 day ago (1 children)

Oh hell no. That's a privacy nightmare to be abused like hell.

Also, what you're describing wouldn't work at all.

[+] Cybersteel@lemmy.world -9 points 1 day ago (1 children)

In the current digitized world, trivial information is accumulating every second; preserved in all its triteness, never fading, always accessible; rumors of petty issues, misinterpretations, slander.

All junk data preserved in an unfiltered state, growing at an alarming rate, it will only slow down social progress.

The digital society furthers human flaws and selectively rewards development of convenient half-truths. Just look at the strange juxtaposition of morality around us. Billions spent on new weapons to humanely murder other humans. Rights of criminals are given more respect than the privacy of their own victims. Although there are people in poverty, huge donations are made to protect endangered species; everyone grows up being told what to do.

"Be nice to other people."

"But beat out the competition."

"You're special, believe in yourself and you will succeed".

But it's obvious from the start that only a few can succeed.

You exercise your right to freedom and this is the result. All the rhetoric to avoid conflict and protect each other from hurt. The untested truths spun by different interests continue to churn and accumulate in the sandbox of political correctness and value systems.

Everyone withdraws into their own small gated community, afraid of a larger forum; they stay inside their little ponds leaking whatever "truth" suits them into the growing cesspool of society at large.

The different cardinal truths neither clash nor mesh, no one is invalidated but no one is right. Not even natural selection can take place here.

The world is being engulfed in "Truth". And this is the way the world ends. Not with a BANG, but with a...

[–] zalgotext@sh.itjust.works 7 points 1 day ago

Is this a fresh new copypasta, or are you just a really long-winded, elaborate troll?

[–] jjlinux@lemmy.zip 5 points 1 day ago (1 children)

Fuck you, and everything you stand for.

[–] driving_crooner@lemmy.eco.br 15 points 1 day ago (1 children)

That sounds like sarcasm to me

[–] x0x7@lemmy.world -3 points 1 day ago (1 children)

People on Lemmy don't understand sarcasm because they have brain damage.

[–] asudox@lemmy.asudox.dev 1 points 1 day ago (1 children)

including you

[–] x0x7@lemmy.world -1 points 4 hours ago (1 children)

Apparently you didn't get the sarcasm.

[–] asudox@lemmy.asudox.dev 2 points 4 hours ago

I don't see the sarcasm in your comment.

[–] NoForwardslashS@sopuli.xyz 0 points 1 day ago

No need to go that far. If we just require one valid photo ID for TikTok, the children will finally be safe.