Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives (blog.cloudflare.com)

submitted 3 days ago by kid@sh.itjust.works to c/cybersecurity@sh.itjust.works

[–] beeng@discuss.tchncs.de 9 points 3 days ago (1 children)

Saw this, and Perplexity's reply on their blog essentially said "cos the user asked us to find the information, we do it on behalf of the user and therefore robots.txt doesn't apply."

It is different to how Google crawls and makes a database of info, but... Not sure how I feel. It's a greenfield out there.

[–] NotForYourStereo@lemmy.world 8 points 3 days ago (1 children)

There's no question about "how to feel."

If the user wants information, they can seek it out themselves. No bots means no bots.

[–] beeng@discuss.tchncs.de 1 points 3 days ago (1 children)

Define "themselves." Can I use Python requests?

[–] MTK@lemmy.world 6 points 3 days ago

No, the point of it is only live interactive browsing.

The closest thing would be lynx; anything less than that should respect robots.txt.

Of course, as a single user you don't really have an impact, and no one cares if you decide to ignore it, but once you are talking about automated systems...
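For anyone who does script their fetches, here is a minimal sketch of what "respecting robots.txt" could look like, using `urllib.robotparser` from the Python standard library. The robots.txt content, the `ExampleBot` and `OtherBot` user agents, the `example.com` URLs, and the helper names are all made up for illustration; a real client would download the site's actual robots.txt (for example with `RobotFileParser.set_url()` and `read()`) instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# An automated client would run this check before every request
# and skip any URL for which it returns False.
for agent, url in [
    ("ExampleBot", "https://example.com/page"),
    ("OtherBot", "https://example.com/page"),
    ("OtherBot", "https://example.com/private/notes"),
]:
    print(agent, url, may_fetch(parser, agent, url))
```

The check only works if the client identifies itself honestly: a crawler that sends a different user agent than the one the site blocked will simply match the `*` rules instead.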