this post was submitted on 11 Dec 2025

547 points (96.6% liked)

A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It (www.404media.co)

submitted 2 days ago by themachinestops@lemmy.dbzer0.com to c/technology@lemmy.world

91 comments

[–] billwashere@lemmy.world 1 points 3 hours ago

I imagine most of these models have all kinds of nefarious things in them, sucking up all the info they could find indiscriminately.

[–] arararagi@ani.social 6 points 7 hours ago

"stop noticing things" -Google

[–] cyberpunk007@lemmy.ca 23 points 23 hours ago (2 children)

Child sexual abuse material.

Is it just me or did anyone else know what "CSAM" was already?

[–] chronicledmonocle@lemmy.world 5 points 7 hours ago

I had no idea what the acronym was. Guess I'm just sheltered or something.

[–] pipe01@programming.dev 13 points 22 hours ago (1 children)

Yeah it's pretty common, unfortunately

[–] TipsyMcGee@lemmy.dbzer0.com 3 points 20 hours ago (2 children)

I don’t quite get it. Is it a wider term than child pornography or a narrower one (e.g. does it exclude some types of material that could be considered porn but, strictly speaking, don’t depict abuse)? The abbreviation sounds like some kind of exotic Surface to Air Missile lol

[–] Midvikudagur@lemmy.world 11 points 9 hours ago

"Child pornography" is a term NGOs and law enforcement are trying to phase out. It makes it sound like CSAM is related to porn, when in fact it is simply abuse of a minor.

[–] ulterno@programming.dev 1 points 9 hours ago

The abbreviation sounds like some kind of exotic Surface to Air Missile lol

It does.
Somehow acronyms just end up sounding cool. Guess we should just use the full form. That would be better.

[–] finitebanjo@lemmy.world 29 points 1 day ago (3 children)

My dumb ass sitting here confused for a solid minute thinking CSAM was in reference to a type of artillery.

[–] pigup@lemmy.world 15 points 1 day ago

Combined surface air munitions

[–] llama@lemmy.zip 4 points 1 day ago (1 children)

Right, I thought it was cyber security something or other, like API keys. Now DuckDuckGo probably thinks I'm a creep.

[–] echodot@feddit.uk 1 points 10 hours ago

This guy is into some really weird stuff, and not the normal hentai that everyone else is asking for.

[–] Hozerkiller@lemmy.ca 2 points 1 day ago

I feel that. I assumed it was something like SCCM.

[–] Devial@discuss.online 176 points 2 days ago* (last edited 2 days ago) (37 children)

The article headline is wildly misleading, bordering on being just a straight up lie.

Google didn't ban the developer for reporting the material. They didn't even know he had reported it, because he did so anonymously, and to a child protection org, not to Google.

Google's automated tools correctly flagged the CSAM when he unzipped the data, and Google subsequently nuked his account.

Google's only failure here was to not unban on his first or second appeal. And whilst that is absolutely a big failure on Google's part, I find it very understandable that the appeals team generally speaking won't accept "I didn't know the folder I uploaded contained CSAM" as a valid ban appeal reason.

It's also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does of the fact that hundreds of CSAM images were freely available online and openly sharable by anyone, and to anyone, for god knows how long.

[–] ulterno@programming.dev 4 points 8 hours ago* (last edited 8 hours ago) (1 children)

Another point is, the reason Google's AI is able to identify CSAM is because it has that in its training data, flagged as such.

In that case, it would have detected the training material as ~100% match.

I don't get, though, how it ended up being openly available. If it were properly tagged, they would probably exclude it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.

And while some governmental body got a lot of backlash for trying to implement such an AI thing on chat stuff, Google gets to do so all it wants because it's E-Mail/GDrive and all on their servers and you can't expect privacy.

Considering how many such stories of people having problems due to this system are coming up, are there any statistics on legitimate catches using this model? I suspect not, because why would anyone use Google services for this kind of stuff?

[–] arararagi@ani.social 2 points 7 hours ago (1 children)

You would think, but none of these companies actually make their own dataset, they buy from third parties.

[–] ulterno@programming.dev 0 points 6 hours ago

I am not sure which point you are responding to.
Could you please specify?

[–] MangoCats@feddit.it 17 points 1 day ago

Google’s only failure here was to not unban on his first or second appeal.

My experience of Google and the unban process is: it doesn't exist, never works, doesn't even escalate to a human evaluator in a 3rd world sweatshop - the algorithm simply ignores appeals inscrutably.

[–] forkDestroyer 22 points 1 day ago (2 children)

I'm being a bit extra but...

Your statement:

The article headline is wildly misleading, bordering on being just a straight up lie.

The article headline:

A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It

The general story in reference to the headline:

He found csam in a known AI dataset, a dataset which he stored in his account.
Google banned him for having this data in his account.
The article mentions that he tripped the automated monitoring tools.

The article headline is accurate if you interpret it as

"A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It" ("it" being "csam").

The article headline is inaccurate if you interpret it as

"A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It" ("it" being "reporting csam").

I read it as the former, because the action of reporting isn't listed in the headline at all.

^___^

[–] WildPalmTree@lemmy.world 1 points 8 hours ago

The inclusion of "found" indicates that it is important to the action taken by Google, would be my interpretation.

[–] Blubber28@lemmy.world 6 points 1 day ago (3 children)

This is correct. However, many websites/newspapers/magazines/etc. love to get more clicks with sensational headlines that are technically true, but can be easily interpreted as something much more sinister/exciting. This headline is a great example of it. While you interpreted it correctly, or claim to at least, there will be many people that initially interpret it the second way you described. Me among them, admittedly. And the people deciding on the headlines are very much aware of that. Therefore, the headline can absolutely be deemed misleading, for while it is absolutely a correct statement, there are less ambiguous ways to phrase it.

[–] MangoCats@feddit.it 3 points 1 day ago

can be easily interpreted as something...

This is pretty much the art of sensational journalism, popular song lyric writing and every other "writing for the masses" job out there.

Factual / accurate journalism? More noble, but less compensated.

[–] ayyy@sh.itjust.works 3 points 1 day ago

The article headline is wildly misleading, bordering on being just a straight up lie.

A 404Media headline? The place exclusively staffed by former BuzzFeed/Cracked employees? Noooo, couldn’t be.

[–] TheJesusaurus@sh.itjust.works 99 points 2 days ago

Why confront the glaring issues with your "revolutionary" new toy when you could just suppress information instead

[–] killea@lemmy.world 45 points 2 days ago (11 children)

So in a just world, Google would be heavily penalized not only for allowing CSAM on their servers, but also for violating their own ToS with a customer?

[–] shalafi@lemmy.world 19 points 2 days ago (3 children)

We really don't want that first part to be law.

Section 230 was enacted as part of the Communications Decency Act of 1996 and is a crucial piece of legislation that protects online service providers and users from being held liable for content created by third parties. It is often cited as a foundational law that has allowed the internet to flourish by enabling platforms to host user-generated content without the fear of legal repercussions for that content.

Though I'm not sure if that applies to scraping other servers' content. But I wouldn't say it's fair to expect the scraper to review everything. If we don't like that take, then we should outlaw scraping altogether, but I'm betting there are unwanted side effects to that.

[–] hummingbird@lemmy.world 19 points 2 days ago

It goes to show: developers should make sure they don't make their livelihood dependent on access to Google services.
