this post was submitted on 09 Aug 2025
187 points (99.0% liked)

Ask Lemmy


I keep trying to find things like “making waffles from sourdough discard” and all the sites are the same: long meandering paragraphs full of links to other things on the site, with dubious instructions.

Considering that at this point I can pretty much identify the type of site by looking at it, are there good extensions or search engines which might remove them from search results?

all 40 comments
[–] BlameTheAntifa@lemmy.world 14 points 2 days ago

Recipe sites got bad long before AI thanks to SEO changes by Google that rewarded long, meandering text with repetitive term usage.

[–] Blackmist@feddit.uk 20 points 2 days ago

Sure there is, and I'll tell you all about it right after this meandering story about my grandmother growing up on a farm.

[–] crimsonpoodle@pawb.social 89 points 2 days ago (2 children)

Update: for now it seems DuckDuckGo’s date range filter is kind of a magic bullet for this type of thing. Set the range between 2010 and 2020 and the top results for a lot of temporally agnostic searches get much better.

[–] burntbacon@discuss.tchncs.de 2 points 1 day ago

I don't think you're going back far enough. As others have pointed out, the problem existed due to SEO stuff long before any LLM would see common usage, even in advertising rat races. Copy-pasted-extended descriptions became a problem around 2015, in my filtering journey. Before that point is where I see real blogs (and just simple websites) start to be the majority of search results.

[–] crandlecan@mander.xyz 18 points 2 days ago (1 children)

I've switched to presearch.com long ago. No more tracking.

[–] brucethemoose@lemmy.world 5 points 2 days ago* (last edited 1 day ago) (1 children)

This is one of the neater concepts for blockchain I’ve seen, though the “PRE” coin is giving me very serious pause… I mean, I guess they have to make money, but still.

EDIT: I take this back. It's a plan for a neat concept that's not implemented yet... Even though the crypto token is.

The UI is pretty nice though.

[–] crandlecan@mander.xyz 2 points 1 day ago (1 children)

Just ignore that and don't make a wallet? I don't think they're making much money from the coin. The main revenue stream is the ads, which recently got sneaked into the search results. But you'll get used to that quickly. Its search results are far better than the Duck's... :)

[–] brucethemoose@lemmy.world 5 points 1 day ago

It seems like it's just a Google search fetcher now. I did a couple of searches side by side (both extremely niche, and popular SEO tests) and got almost identical results, down to the exact same ordering and forum posts, minus the YouTube and ad spam and such on Google.

That's... fine I guess? But it's nothing like what they advertise, and also something many services provide without muddying it.

[–] fetchies@lemmy.blahaj.zone 47 points 2 days ago (1 children)

Try uBlacklist, with these blocklists:

# AI Spam
https://raw.githubusercontent.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/main/list_uBlacklist.txt
# Copycat Sites
https://raw.githubusercontent.com/quenhus/uBlock-Origin-dev-filter/main/dist/other_format/uBlacklist/global.txt
# SEO Spam & Junk
https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/main/ublacklist.txt
[–] naught101@lemmy.world 1 points 2 days ago (1 children)

Thanks for this! Been looking for something like it. I guess it just blocks the sites though, and doesn't block them appearing in search results?

[–] fetchies@lemmy.blahaj.zone 2 points 2 days ago (2 children)

So uBlacklist actually removes sites specified in your subscribed rulesets from your search results.

I found it helped out a lot on programming or tech support searches, as there's so many content mirrors and SEO spam sites for that domain.
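
For anyone curious what "removes sites from your search results" amounts to, here is a rough Python sketch of the idea. It is not uBlacklist's actual code: it only understands simple match-pattern lines such as `*://*.example.com/*`, it skips `#` comments, and it ignores everything else a real ruleset may contain (regular-expression rules and so on). The function names and the `.example` domains are made up for illustration.

```python
import re

# Hypothetical ruleset in the same spirit as the lists above (not a real list).
RULESET = """
# Made-up spam domains
*://*.spam-recipes.example/*
*://mirror.example/posts/*
"""


def parse_rules(text):
    """Keep match-pattern lines; skip blanks, '#' comments and regex rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("/"):
            continue
        if "://" in line:
            rules.append(line)
    return rules


def pattern_to_regex(pattern):
    """Translate a match pattern such as *://*.example.com/* into a regex."""
    scheme, rest = pattern.split("://", 1)
    host, _, path = rest.partition("/")
    scheme_re = "https?" if scheme == "*" else re.escape(scheme)
    if host == "*":
        host_re = "[^/]+"
    elif host.startswith("*."):
        # "*.example.com" covers example.com itself and any subdomain.
        host_re = r"(?:[^/]+\.)?" + re.escape(host[2:])
    else:
        host_re = re.escape(host)
    # A "*" in the path matches any run of characters.
    path_re = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(f"{scheme_re}://{host_re}/{path_re}")


def filter_results(urls, rules):
    """Return only the URLs that no rule matches."""
    compiled = [pattern_to_regex(rule) for rule in rules]
    return [u for u in urls if not any(c.fullmatch(u) for c in compiled)]


results = [
    "https://www.spam-recipes.example/waffles",
    "https://goodblog.example/sourdough-waffles",
    "https://mirror.example/posts/123",
    "https://mirror.example/about",
]
kept = filter_results(results, parse_rules(RULESET))
# kept == ["https://goodblog.example/sourdough-waffles",
#          "https://mirror.example/about"]
```

The subscribed lists are just long versions of `RULESET`, which is why sharing them works so well: one person finds the junk domain, everyone subscribed stops seeing it.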

I also found that using search shortcuts has helped me reduce the need for the middle-man in a surprising number of my searches (e.g., “@w” for Wikipedia, “@g” for the Gentoo Wiki, “@git” for GitHub, “@p” for ProtonDB, “@y” for YouTube, “@s” for Stack Overflow, et cetera).

[–] naught101@lemmy.world 1 points 1 day ago

Oh, cool. Thank you!

[–] DoGeeseSeeGod@lemmy.blahaj.zone 1 points 1 day ago (1 children)

Is there one for DuckDuckGo?

[–] fetchies@lemmy.blahaj.zone 2 points 1 day ago

Yes, it supports DDG and every major search engine (see: GitHub). You might have to manually give it permission first (like for specific SearXNG instances), after which it'll work.

If you meant can you add DDG as a search shortcut, also yes.

[–] TropicalDingdong@lemmy.world 20 points 2 days ago (1 children)

I don't have an answer. What I can tell you is that it is BAD. I pretty much can't find useful results post 2022/23

[–] jballs@sh.itjust.works 10 points 2 days ago (1 children)

I just realized how freaking bad this was, especially on Reddit. I was doing a search for IPTV providers. I checked Reddit first, since in the past that has been a reliable way of finding people recommending stuff, not just whatever is search engine optimized.

Holy shit was it bad. Nearly every comment was AI slop. They all followed a very similar structure. Tons and tons of comments like:

Appreciate the share! LunoTV is great — super affordable and has tons of channels. Definitely recommend checking it out.

Appreciate the share! Xalvon IPTV offers tons of channels and the streaming has been really smooth on my devices. Good value overall.

LunoTV has been great for me. Tons of channels, really affordable, and easy to set up.

Luno TV has been great for me. Tons of channel selection is huge, and their app runs smoothly .

We're seriously getting to a point where we won't be able to search for anything due to AI fake results.

[–] hardcoreufo@lemmy.world 4 points 2 days ago (1 children)

Did you find an IPTV service? I've been thinking about getting one, but whenever I look into it I just find junk.

[–] jballs@sh.itjust.works 4 points 2 days ago

Will report back. I used one a couple years ago, but it's since gone out of business. Like I said, yesterday I was having trouble sifting through all the slop to find an actual real answer. I signed up for a free trial of one and will respond if it's any good. I'm not holding my breath though.

[–] kossa@feddit.org 12 points 2 days ago

With kagi.com you can at least blacklist domains to never show up in your search again. I guess there are some filter lists out there already, to get a head start.

[–] razorcandy@discuss.tchncs.de 12 points 2 days ago (2 children)

You can add filters to uBlock Origin to remove AI overviews if you are using Google, or add “-ai” at the end of your search query. That, or switch to a browser which doesn’t have AI overviews.

I think what you are describing is just typical filler that you find on recipe websites and not AI. They’re all full of unnecessary stories and links, but most will have a “jump to recipe” button somewhere so you can skip the BS.

[–] SGforce@lemmy.ca 25 points 2 days ago (1 children)

The vast majority of that blogspam has been algorithmically generated long before LLMs

[–] roguetrick@lemmy.world 8 points 2 days ago

Mechanical Turk even before that, which frankly wasn't much better.

[–] crimsonpoodle@pawb.social 2 points 2 days ago

I’ll have to try it out ^^

[–] moonlight@fedia.io 7 points 2 days ago (1 children)
[–] friend_of_satan@lemmy.world 4 points 2 days ago

Vintage websites going up in value.

[–] garbagebagel@lemmy.world 7 points 2 days ago

Ironically, the stupid AI overview is more convenient for some stuff now because it cuts down to the core of what you're asking and you don't have to deal with those goddamn SEO terribly written websites. I still prefer not to use it but when it's really bad and I really can't find anything useful, I will leave it on sometimes just to at least get a general answer (usually for how to-s and stuff that's easily verifiable)

[–] imsufferableninja@sh.itjust.works 7 points 2 days ago (1 children)

Use a search engine that doesn't do bullshit LLM stuff. Kagi, for example

[–] LifeInMultipleChoice@lemmy.dbzer0.com 16 points 2 days ago (4 children)

If it were websites made with AI, why wouldn't Kagi find them just the same? Search engines just search key terms. Can't see how it would know if the term was typed by a person or a bot. That said, I used SearXNG and it wasn't bad.

[–] Dave@lemmy.nz 10 points 2 days ago (1 children)

Kagi does seem to cut out a lot of blogspam. I think Google is incentivised to send people to these sites with adwords ads on them.

[–] jh34@lemmy.world 2 points 1 day ago

You can also block any slopsite that makes it past so you only have to see it once.

[–] tyler@programming.dev 7 points 2 days ago (1 children)

Kagi doesn’t know, but I think Kagi’s indexing is just better, so you don’t get the blogspam as much, and when you do you can block it across your searches.

[–] P1nkman@lemmy.world 4 points 2 days ago

And customize the result page. Mine looks like the early 2000's, without the image, text only. It's just a great customizable search engine!

Ah, I guess I misunderstood the problem

[–] tal@lemmy.today 5 points 2 days ago* (last edited 2 days ago)

No, because there's no reliable way to distinguish AI-generated spam sites from non-AI-generated spam sites. I'll also add that I don't expect there to be one promptly forthcoming: any attempt to identify them is going to run into improved systems, and that's gonna happen even if the systems aren't explicitly intending to evade detection. If it were easy, Google would have done so years back. I can recognize some now, but the SEO spam crowd that's creating this is trying hard to pollute search engine results, and if someone implements a generalized "block" that's effective, they're going to keep looking for alternatives until they find something that gets through.

On Kagi, I can set the acceptable date range on results to prior to the emergence of LLMs, but that cuts out a lot of material that I want to see. For some searches, that might work, but it's not really a general solution.

You can manually blacklist or deprioritize sites on Kagi. Probably can either run some sort of local proxy or Greasemonkey-style plugin that would let you do so in browser on any search engine. Problem is that there are people making these sites faster than you're going to be banning them.

Kagi's also got a "pin" and a "raised priority" feature for a list of sites, and I suppose could whitelist some "known good" sites. Kagi's "blacklist/deprioritize/prioritize/pin" feature does not have the ability to exchange sites between users (and I imagine that there'd be some privacy issues with doing so) aside from Kagi running a "leaderboard" of the most-blacklisted/deprioritized/prioritized/pinned sites. One could probably do the "proxy" or "plugin" route as well for a variety of websites on other search engines. Any general solution would need to have some level of interchange, since requiring every individual user to maintain a "killfile" on websites is going to be impractical. It may be that the human labor involved in curation is outweighed by how cheap it is to generate new websites; not sure.
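
To make the "proxy or plugin" route a bit more concrete, here is a minimal Python sketch of a personal blacklist/deprioritize/prioritize/pin list applied to a page of results. This is only an illustration of the idea, not how Kagi implements it; the `PREFS` table, the bucket numbers and the `.example` domains are all hypothetical, and it matches exact hostnames only (no subdomains, no path-level rules).

```python
from urllib.parse import urlsplit

# Hypothetical per-user preferences, keyed by exact hostname.
PREFS = {
    "trusted-baking.example": "pin",
    "decent-blog.example": "raise",
    "meh-aggregator.example": "lower",
    "slop-farm.example": "block",
}

# Smaller numbers sort first; unlisted domains land in the neutral bucket.
BUCKET = {"pin": 0, "raise": 1, None: 2, "lower": 3}


def rerank(urls, prefs):
    """Drop blocked domains, then reorder the rest by preference bucket."""
    kept = []
    for url in urls:
        action = prefs.get(urlsplit(url).hostname)
        if action != "block":
            kept.append((BUCKET[action], url))
    # sorted() is stable, so the engine's own order survives within a bucket.
    return [url for _, url in sorted(kept, key=lambda pair: pair[0])]


page = [
    "https://slop-farm.example/a",
    "https://unknown.example/b",
    "https://meh-aggregator.example/c",
    "https://trusted-baking.example/d",
    "https://decent-blog.example/e",
    "https://other.example/f",
]
ranked = rerank(page, PREFS)
# ranked == [".../d", ".../e", ".../b", ".../f", ".../c"] (full URLs, in that order)
```

The code is the easy part. The hard part is the one described above: somebody has to keep `PREFS` current, which is why some form of list exchange between users matters.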

At some point, I assume that it may become practical to just make a conservative whitelist of "non-spam" sites that accepts that many useful websites will be excluded because we just can't validate them as being non-spam. That would probably require human curation, which is either going to need volunteer labor or a commercial service.

There's also a secondary problem that if you curate content at the domain level, Web 2.0 sites that permit posting content (Reddit, Wikipedia, the Threadiverse, etc) can have individual users inserting AI-generated spam. So a general solution is probably going to need to permit some sort of sub-domain level filtering for at least major sites.

And there's also the wrinkle that a "trusted good" site or user can become a spammer at some point. Spammers/people who want to run influence operations have been buying high-karma Reddit accounts (and the reputation that comes with them) for quite some years. Domains expire, or their operators change. Reputation has value, and it can be sold. So that also has to be addressed.

This isn't really a qualitative change. I mean, people have hand-crafted spam websites that try to grab searchers before. It's just that the ability to use a computer to do it is way more cost-efficient, brings the cost way down, and thus opens up a lot of opportunity for spam that wouldn't have made sense financially before. So what you're really aiming to do is to get the cost to make a spam website up. One possibility (which I am absolutely confident that TLS certificate issuers would like) would be to have tiers of TLS certificate, some of which are a lot more expensive. Search engine indexers could check and validate the TLS "cost tier" when indexing a site. That will artificially inflate the cost of running a website, and can be done to an arbitrary degree. That's not fantastic, since it also tends to cut out non-spam individual/low-cost websites, but if you're a large company somewhere, the price is basically a rounding error compared to what a spammer needs to make to make his super-cheap-to-generate LLM-generated website worthwhile. Could be a component in a system that takes into account other factors.

[–] felixwhynot@lemmy.world 2 points 2 days ago
[–] chosensilence@pawb.social 1 points 2 days ago

yeah, use engines like startpage.com instead.

[–] forrgott@lemmy.sdf.org 1 points 2 days ago (1 children)

I use Firefox with the udm14 extension for Google - gives me only the web results. No AI, no shopping, images, etc, only website results.
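
As far as I know, the whole trick is a `udm=14` parameter on the Google search URL, which selects the plain "Web" results tab. A small Python sketch of building such a URL by hand (the function name is made up, and I'm assuming the parameter keeps working the way it does today):

```python
from urllib.parse import urlencode


def google_web_only_url(query):
    """Build a Google search URL that asks for the plain "Web" results tab."""
    # udm=14 is the parameter the extension is named after (assumed stable).
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "14"})


url = google_web_only_url("sourdough discard waffles")
# url == "https://www.google.com/search?q=sourdough+discard+waffles&udm=14"
```

You can get the same effect without an extension by adding a custom search engine in the browser that uses this URL shape with a `%s` placeholder for the query.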

[–] breecher@sh.itjust.works 3 points 2 days ago (1 children)

Plenty of websites are AI created, with more being created every hour, which I gather is what OP is talking about.

And there isn't really any useful way to filter those out, with only lots of profitable reasons for people to create more of them, and they will shortly be the complete death of the internet as we knew it.

[–] forrgott@lemmy.sdf.org 1 points 2 days ago

Oof.

And I read it wrong like multiple times.

Thank you