this post was submitted on 03 Jan 2026
913 points (99.4% liked)

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.


On January 1, I received a $155 bill from my web hosting provider for a bandwidth overage. I've never had this happen before. For comparison, I pay about $400/year for the hosting service, and usually the limitation is disk space.

Turns out, on December 17, my bandwidth usage jumped dramatically - see the attached graph.

I run a few different sites, but tech support was able to help me narrow it down to one site. This is a hobbyist site, with a small phpBB forum, for a very specific model of motorhome that hasn't been built in 25 years. This is NOT a high traffic site; we might get a new post once a week...when it's busy. I run it on my own dime; there are no ads, no donation links, etc.

Tech support found that AI bots were crawling the site repeatedly. In particular, OpenAI's bot was hitting it extremely hard.

Here's an example: There are about 1,500 attachments to posts (mostly images), totaling about 1.5 GB on disk. None of these are huge; a few are in the 3-4 megabyte range, probably larger than necessary, but not outrageously large either. The bot pulled 1.5 terabytes on just those pictures, roughly a thousand times their total size. It kept pulling the same pictures repeatedly and only stopped because I locked the site down. This is insane behavior.

I locked down the pictures so you had to be logged in to see them, but the attack continued. This morning I took the site offline to stop the deluge.

My provider recommended implementing Cloudflare, which initially irritated me, until I realized there was a free tier. Cloudflare can block bots, apparently. I'll re-enable the site in a few days after the dust settles.

I contacted OpenAI, arguing with their bot on the site, demanding that the bug that caused this be fixed. The bot suggested things like "robots.txt", which I set up, but...come on, the bot shouldn't be doing that, and I shouldn't be on the hook to fix their mistake. It's clearly a bug. Eventually the bot gave up talking to me, and an apparent human emailed me with the same info. I replied, trying to tell them that their bot has a bug that causes this. I doubt they care, though.

I also asked for their billing address, so I can send them a bill for the $155 plus my time at my consulting rate. I know it's unlikely I'll ever see a dime. Fortunately my provider said they'd waive the fee as a courtesy, as long as I addressed the issue, but if OpenAI does end up coming through, I'll tell my provider not to waive it. OpenAI is responsible for this and should pay for it.

This incident reinforces all of my beliefs about AI: Use everyone else's resources and take no responsibility for it.

[–] artyom@piefed.social 236 points 1 week ago* (last edited 1 week ago) (11 children)

What you are experiencing is the unfortunate reality of hosting any kind of site on the open internet in the AI era. You can't do it without implementing some sort of bot detection and rate limiting, or your site will either be DDoS'd or you'll incur insane fees from your provider.

> The bot suggested things like "robots.txt",

You can do that but they will ignore it.

> I'll re-enable the site in a few days after the dust settles.

They'll just attack again.

> It's clearly a bug.

It's not a bug. This is very common practice these days.

> My provider recommended implementing Cloudflare, which initially irritated me, until I realized there was a free tier.

Please consider Anubis instead.

[–] AsgerFD@programming.dev 19 points 1 week ago* (last edited 1 week ago) (1 children)

There's also Iocaine ~~as an alternative to Anubis~~. I've not tried Anubis nor Iocaine myself though.

[–] db0@lemmy.dbzer0.com 37 points 1 week ago

Iocaine is not an alternative to Anubis. It's a different tool in the toolbox and can be used along with it, but it has a different purpose. Something like haphash is an Anubis alternative.

[–] halcyoncmdr@lemmy.world 127 points 1 week ago* (last edited 1 week ago) (3 children)

File in small claims court for the fees and time if they refuse or don't respond. OpenAI isn't going to bother sending a representative for such a small amount.

[–] hansolo@lemmy.today 70 points 1 week ago

OP, please consider this. It's likely to actually work.

[–] tate@lemmy.sdf.org 25 points 1 week ago (2 children)

Once you win a small claim it is up to you to collect. They will never manage to collect.

[–] Nollij@sopuli.xyz 60 points 1 week ago (1 children)

Something to remember is that small claims is very cheap, and accessible for the average person. It's something like $35 filing, and they can't even send their lawyers. You need to do some research and bring all sorts of documentation to support your claims, but it's not meant to be intimidating.

Once you win, you can enlist the police to help you enforce the judgment. See what Warren and Maureen Nyerges did to Bank of America in 2011.

Yes, you will probably need additional judgments to enforce the original one that they will ignore, but you can keep getting attorney's fees added to the total.

[–] toiletobserver@lemmy.world 26 points 1 week ago (1 children)

When they collect via sheriff enforcement, do something funny like seizing any and all Ethernet terminations for bulk resale.

[–] halcyoncmdr@lemmy.world 37 points 1 week ago

You just go back to the court showing they're not paying the court mandated restitution.

Yes it takes time, yes it will probably cost more in time alone than the $155 issue that started it. But you can get increased penalties awarded for failure to pay.

Small claims courts really don't like big businesses ignoring the little man.

[–] UpperBroccoli@lemmy.blahaj.zone 75 points 1 week ago (3 children)

I have experienced something similar. I run a small forum for a computer games series, a series I myself have not been interested in for a long time. I am just running it because the community has no other place to go, and they seem to really enjoy it.

A few months ago, I received word from them that the forum barely responded anymore. I checked it out and noticed there were several hundred active connections at any time, something we had never seen before. After checking the whois info on the IPs, I realized they were all connected to Meta, Google, Apple, Microsoft and other AI companies.

It felt like a coordinated DDoS attack and certainly had almost the same effect. Now, I have a hosting contract where I pay a flat monthly fee for a complete server and any traffic going through it, so it was not a problem financially speaking, but those AI bots made the server almost unusable. Naturally, I went ahead and blocked all the crawler IPs that I could find, and that relieved the pressure a lot, but I still keep finding new ones.

Fuck all of those companies, fuck the lot of them. All they do is rob and steal and plunder, and leave charred ruins. And for what? Fan fiction. Unbelievable.

[–] dgriffith@aussie.zone 27 points 1 week ago (1 children)

Maybe it's time to implement an AI tarpit. Each response for a request from a particular IP address or range takes double the time of the previous, with something like a 30 second cool down window before response time halves.

Would stop AI scrapers in their tracks, but it wouldn't hurt normal users too much.

Maybe I should start looking into it a bit more 🤔
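
For anyone who wants to play with the idea, the doubling delay described above could be sketched roughly like this in Python. This is only a sketch under assumptions: the `Tarpit` class, its parameter names, and the default values are made up for illustration, and the caller is expected to do the actual stalling (for example by sleeping for the returned number of seconds before answering).

```python
import time


class Tarpit:
    """Per-client delay that doubles on every request and halves after quiet periods."""

    def __init__(self, base_delay=0.1, max_delay=60.0, cooldown=30.0):
        self.base_delay = base_delay  # delay for a client's first request (seconds)
        self.max_delay = max_delay    # upper bound so the delay cannot grow forever
        self.cooldown = cooldown      # length of one quiet window (seconds)
        self._state = {}              # client key -> (current delay, time of last request)

    def delay_for(self, client, now=None):
        """Return how long to stall this request and update the client's state."""
        now = time.monotonic() if now is None else now
        if client not in self._state:
            self._state[client] = (self.base_delay, now)
            return self.base_delay
        delay, last_seen = self._state[client]
        # Halve the delay once for every full cooldown window of silence.
        # The cap keeps the power of two small enough for float division.
        quiet_windows = min(int((now - last_seen) // self.cooldown), 64)
        delay = max(self.base_delay, delay / (2 ** quiet_windows))
        # Each new request doubles the delay, up to the maximum.
        delay = min(self.max_delay, delay * 2)
        self._state[client] = (delay, now)
        return delay
```

The client key could be a single IP address or a whole range, as suggested above. A human clicking around now and then stays near the base delay, while a crawler firing requests back to back hits the maximum after a handful of requests.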

[–] limelight79@lemmy.world 17 points 1 week ago (1 children)

Apparently my phpBB forum served as a nice tar pit. The only thing I can figure is that they neglected to take session IDs into account, so they assumed every URL was a different page.
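
To illustrate that guess (and it is only a guess about what the crawler gets wrong), here is a small Python sketch of what a well-behaved crawler could do before deciding whether it has already seen a page: strip the session parameter from the URL. phpBB appends the session as a `sid` query parameter; the function name `canonical_url` and the `SESSION_PARAMS` set are made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters that identify a session rather than a page.
# Assumption: "sid" is the only one that matters here; extend as needed.
SESSION_PARAMS = {"sid"}


def canonical_url(url):
    """Return the URL with session parameters and the fragment removed."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

With this, two links to the same topic that differ only in `sid` collapse to one URL, so the page would be fetched once instead of once per session.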

[–] gothic_lemons@lemmy.world 16 points 1 week ago

Not an expert or anything, but could a script be made that feeds a bot an endless stream of unique tinyurls that point to images OpenAI pays to host?

[–] Candice_the_elephant@lemmy.world 10 points 1 week ago* (last edited 1 week ago) (1 children)

Could you run a script that presents the AI bots with alternative, believable but incorrect text-based information? That would be a great way to fight back.

You could even implement an AI to rewrite your content with intentional errors so you don't have to generate the misinformation yourself. Sounds like a great use for AI.

[–] Cypher@lemmy.world 21 points 1 week ago (1 children)

Nepenthes already does a better job of this than what you’re proposing and doesn’t require AI.

https://hackaday.com/2025/01/23/trap-naughty-web-crawlers-in-digestive-juices-with-nepenthes/

[–] 4am@lemmy.zip 67 points 1 week ago (1 children)

Send an invoice to OpenAI for abusing your EULA and demand payment. Report them to all three credit bureaus when they don’t. Encourage others to do the same.

[–] IphtashuFitz@lemmy.world 10 points 1 week ago

Hell. I’d look into taking them to small claims court if they don’t pay the invoice. If that became common practice then OpenAI may actually do something about it.

[–] callyral@pawb.social 50 points 1 week ago (1 children)

Consider looking into Anubis or Iocaine. I have heard about them and apparently they're pretty helpful, though I don't self-host my own website so take this with a grain of salt.

[–] snooggums@piefed.world 45 points 1 week ago

This isn't a bug; this is how AI is designed to work, and it is absolutely terrible for the web. If it were actually designed well, it would respect robots.txt (it doesn't care) and cache common query results, but instead it sends out fresh queries and pulls down data over and over again just in case something changed.

It is malicious and should be treated as such, but it isn't.

[–] adamth0@lemmy.world 40 points 1 week ago (2 children)

Where robots.txt has failed for me in the past, I have added dummy paths to it (and other similar paths hidden in html or in JS variables) which, upon being visited, cause the offending IP to be blocked.
Eg, I'll add a /blockmeplease/ reference in robots.txt, and when anything visits that path, its IP, User-Agent, etc get recorded and it gets its IP blocked automatically.
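
A rough Python sketch of that trap, for illustration only: the `HoneypotBlocker` class and its method names are hypothetical, and a real setup would hand the offending IP to the firewall or web server rather than keep it in memory. The trap path is listed under `Disallow:` in robots.txt, so anything that requests it has either ignored the file or used it as a map.

```python
# What the robots.txt entry for the trap could look like.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blockmeplease/
"""


class HoneypotBlocker:
    """Blocks any client that requests a path it was told to stay away from."""

    def __init__(self, trap_paths=("/blockmeplease/",)):
        self.trap_paths = tuple(trap_paths)
        self.blocked = {}  # ip -> User-Agent recorded when the trap was hit

    def allow(self, ip, path, user_agent=""):
        """Return True if the request may proceed, False if it must be rejected."""
        if ip in self.blocked:
            return False
        if any(path.startswith(trap) for trap in self.trap_paths):
            # Record who walked into the trap and block them from now on.
            self.blocked[ip] = user_agent
            return False
        return True
```

Ordinary visitors never see the trap path, so they are unaffected unless they share an IP with something that hit it.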

[–] perviouslyiner@lemmy.world 14 points 1 week ago* (last edited 1 week ago)

https://www.jwz.org/robots.txt is a useful template that results in fail2ban "running full tilt", but apparently the AI companies will slurp that shit up enough to cause more technical problems. That site seems to have some practical advice on handling this sort of thing without giving control to Cloudflare.

[–] drunkpostdisaster@lemmy.world 34 points 1 week ago

It gets worse every fucking day

[–] FalschgeldFurkan@lemmy.world 29 points 1 week ago (1 children)

That shit cannot be legal. It's like DDoS but without getting the target offline... I hope this all works out for you, and that you get OpenAI to pay for it.

(Why are these asshats calling themselves "open" anyways when they are clearly not?)

[–] echodot@feddit.uk 28 points 1 week ago (1 children)

The robots.txt scene is a waste of time. I've had arguments with people about this for about 10 years now and they still seem to think it's some god level solution.

By all means make use of it, as some crawlers do pay attention, but you can just download a basic boilerplate and use that. There is zero point in customising it beyond that unless you find that the basic template causes a problem for your specific configuration. Lots of bots simply ignore it; this has been a problem for years and has only got worse in the AI era.

Cloudflare would probably stop most of the problems. Of course, the other option is to just rate limit the site in general. It sounds like you probably don't need anything particularly complicated, since the site doesn't seem to be hugely active.
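
As a sketch of what "just rate limit the site" can mean, here is a simple per-client token bucket in Python. The class name, the parameters, and the numbers are assumptions for illustration; on shared hosting you would more likely use whatever rate limiting the web server or provider already offers.

```python
import time


class TokenBucket:
    """Allows short bursts, then at most `rate` requests per second per client."""

    def __init__(self, rate=1.0, burst=10):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum number of tokens a client can hold
        self._buckets = {}    # client key -> (tokens, time of last update)

    def allow(self, client, now=None):
        """Return True and spend a token if the client still has one."""
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(client, (float(self.burst), now))
        # Refill according to the time that has passed, up to the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client] = (tokens - 1, now)
            return True
        self._buckets[client] = (tokens, now)
        return False
```

For a forum that sees a post a week, even a very low rate and small burst would be invisible to real users while cutting off a crawler that re-downloads the same attachments in a loop.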

[–] minorkeys@lemmy.world 27 points 1 week ago (2 children)

Get used to it, there won't be AI regulation until Trump and MAGA are gone.

[–] HeyThisIsntTheYMCA@lemmy.world 9 points 1 week ago (1 children)

there might be regulation protecting ai

[–] SLVRDRGN@lemmy.world 26 points 1 week ago* (last edited 1 week ago) (1 children)

> Robots.txt is a standard, developed in 1994, that relies on voluntary compliance.

> Voluntary compliance is conforming to a rule, without facing negative consequences if not complying.

> Malicious web robots are unlikely to honor robots.txt; some may even use the robots.txt as a guide to find disallowed links and go straight to them.

This is all from Wikipedia's entry on Robots.txt.
I don't get how we only have voluntary protocols for things like this at this point in 2025 AD..

[–] limelight79@lemmy.world 10 points 1 week ago (4 children)

Yeah that's part of why I was so frustrated with the answer from OpenAI about it. I don't think I mentioned it in the writeup, but I actually did modify robots.txt on Jan 1 to block OpenAI's bot, and it didn't stop. In fairness, there's probably some delay before it re-reads the file, but who knows how long it would have taken for the bot to re-read it and stop flooding the site (assuming it obeys at all) - and it still would have been sucking data until that point.
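
For reference, this is roughly what such a rule looks like and how a crawler that chooses to comply would read it, using Python's standard `urllib.robotparser`. It assumes the crawler identifies itself with the `GPTBot` token; the helper name `is_allowed` and the example URLs are made up. Note that this only shows what a compliant client would conclude. Nothing here enforces anything on the server side.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that shuts out one crawler and leaves everyone else alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed("GPTBot", "https://example.org/download/file.php?id=1")` gives `False`, while any other crawler token gets `True`. Whether the real bot ever asks that question is exactly the problem.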

I also didn't mention that the support bot gave me the wrong URL for the robots.txt info on their site. I pointed it out and it gave me the correct link. So, it HAD the correct link and still gave me the wrong one! Supporters say, "Oh, yeah, you have to point out its errors!" Why the fuck would I want to argue with it? Also, I'm asking questions because I don't know the answer! If I knew the correct answer, why would I be asking?

In the abstract, I see the possibilities of AI. I get what they're trying to do, and I think there may be some value to AI in the future for some applications. But right now they're shoveling shit at all of us and ripping content creators off.

[–] nublug@piefed.blahaj.zone 26 points 1 week ago* (last edited 1 week ago) (1 children)

try out crowdsec, it's a modern alternative to fail2ban that crowdsources ip blocks from its users with similar setups (optional to contribute to). in the first day i had it set up it had blocked over 50k attempts, mostly scraping and enumeration but also some known http exploit attempts and bruteforcing.

you get 3 blocklists with a free acct so sort the blocklists on their site by size and get the three biggest and you'll block the vast majority before crowdsec even has to evaluate rules. only like 100 or so of mine have been blocked dynamically by crowdsec, the rest of the now 200k or so total have been those blocklists.

edit: tho i don't know how much control you have on your hosting service, whether you can install something like this or only plugin things they have integrated into the service themselves.

[–] limelight79@lemmy.world 10 points 1 week ago

Yeah that's kind of the issue with some of the solutions that have been offered, I can't install new software on the server.

[–] Greg@lemmy.ca 25 points 1 week ago (1 children)

Cloudflare also has caching on the free tier which will reduce these kinds of AI attacks

[–] gkaklas@lemmy.zip 23 points 1 week ago (7 children)

For self-hosting there is go-away and anubis

Also, at least on my server, the LLM bots have correct User-Agents set (they identify themselves) so you could block them from that as well

(If you are able to manage it yourself, self-hosting on a VPS instead of a web hosting provider would also have the benefit of costing around €80/year with ~20TB/month; phpBB should be easy to set up with containers (I don't know about migrating your existing data though))
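
The User-Agent blocking mentioned above can be as simple as a substring check. A minimal Python sketch, assuming the bots really do identify themselves: the token list is illustrative rather than complete, and the names `BLOCKED_AGENT_TOKENS` and `is_blocked_agent` are made up for this example.

```python
# Substrings that self-identifying AI crawlers put in their User-Agent header.
# Assumption: illustrative list only; check your own access logs for what shows up.
BLOCKED_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def is_blocked_agent(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if the User-Agent header contains any blocked token."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)
```

This only catches crawlers that are honest about who they are. Anything that sends a browser-like User-Agent passes straight through, which is where tools like the ones named above come in.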

[–] Damage@feddit.it 22 points 1 week ago (1 children)

Tragedy of the commons, modern web edition

[–] DrFistington@lemmy.world 21 points 1 week ago

If you sue them in civil court, you have a surprisingly good chance of winning

[–] ThirdConsul@lemmy.zip 15 points 1 week ago (1 children)

If you have money to spend you might want to go to a small claims court (consult a lawyer first). It would be extra funny if you managed to get a lien over OpenAI infrastructure lol, or just get in and start taking their laptops and such.

[–] KelvarCherry@lemmy.blahaj.zone 11 points 1 week ago (6 children)

Small Claims is cheap AF (IIRC it's like 25 dollars to sue) and by the rules in the US you HAVE to represent yourself - no lawyers allowed. I doubt an executive is going to take a private jet down to your town. You should win by default.

[–] utopiah@lemmy.world 15 points 1 week ago (2 children)

> Tech support found that AI bots were crawling the site repeatedly. In particular, OpenAI’s bot was hitting it extremely hard.

Yup... I just had to read your title to know how it happened. In fact, more than a year ago at OFFDEM (the off discussion parallel to FOSDEM in Brussels) we discussed how to mitigate such practices because at least 2 of us who self-host had this problem. I had a problem with my own forge because AI crawlers make it generate archives, and that quickly takes up quite a bit of space. It's a well-known problem, which is why there are quite a few "mazes" out there, or simply blocking rules for HTTPS or reverse proxies.

AI hype is so destructive for the Web.

[–] LiveLM@lemmy.zip 14 points 1 week ago (1 children)

> The bot pulled 1.5 terabytes on just those pictures

It's no wonder these assholes still aren't profitable. Idiots burning all this bandwidth on the same images over and over

[–] Stupidmanager@lemmy.world 14 points 1 week ago

I ran a small hobby site that used a custom lambda to serve a serverless, white-label “what’s my IP” page. It was a learning exercise that got hammered repeatedly by OpenAI. robots.txt was useless, and Cloudflare worked wonders after I blocked access to the real site for all IPs except Cloudflare's.

Cost was near $1000 for just 2 weeks of it repeatedly hitting the site and I wish I got credit.

[–] gwl@lemmy.blahaj.zone 10 points 1 week ago (3 children)

Check out the idea of "Tarpits", a tool that traps bots
