The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall : technology

[–] poopkins@lemmy.world -1 points 1 week ago* (last edited 1 week ago) (2 children)

I've developed my own agent for assisting me with researching a topic I'm passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I'm a human using a web browser. (For my network requests, I've defined my own user agent.)

So I use that as a signal that the website doesn't want automated tools scraping their data. That's fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.

I completely understand where Perplexity is coming from, but at scale, implementations like ~~this~~ Perplexity's are awful for the web.

(Edited for clarity)

load more comments (2 replies)

[–] xxce2AAb@feddit.dk -2 points 1 week ago

Ooh, that's though sweetheart. If the owners of those servers want you to visit, they'll just choose another WAF than CF's.

All zero of them.

[–] FauxLiving@lemmy.world -5 points 1 week ago* (last edited 1 week ago) (21 children)

The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

Which is the point of the article and the article’s title.

It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

[–] unpossum@sh.itjust.works -2 points 1 week ago (1 children)

Thank you for trying to fight the irrational anti-AI brainrot on lemmy! It’s probably a lost cause, but your efforts are appreciated :)

load more comments (1 replies)

load more comments (20 replies)

[+] drmoose@lemmy.world -10 points 1 week ago (13 children)

It's insane that anyone would side with Cloudflare here. To this day I cant visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare turnstile just refreshes infinitely and has been for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.

load more comments (13 replies)

[+] panda_abyss@lemmy.ca -23 points 1 week ago* (last edited 1 week ago) (11 children)

I actually agree with them

This feels like cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.

There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching a several pages, scanning/summarizing them, then presenting the findings to the user.

I don’t really think that’s wrong, it’s just a faster version of rummaging through the SEO shit you do when you Google something.

(I’ve never used perplexity, I do use Kagi’s ki assistant for similar search. It runs 3 searches and scans the top results and then provides citations)

load more comments (11 replies)

Technology

Our Rules

Approved Bots