You know, I think I'm overdue for a donation to Wikipedia. They honestly might end up being the last bastion of sanity
I downloaded the entirety of wikipedia as of 2024 to use as a reference for "truth" in the post-slop world. Maybe I should grab the 2022 version as well just in case...
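For anyone wanting to do the same: the Wikimedia Foundation publishes full database dumps at dumps.wikimedia.org. Here's a minimal Python sketch for grabbing the latest English pages-articles dump; the URL below is the real "latest" path, and you'd swap in a dated directory (e.g. a specific snapshot under https://dumps.wikimedia.org/enwiki/) to pin a particular year. It assumes the third-party requests library is installed (pip install requests).

```python
# Minimal sketch: stream the latest English Wikipedia pages-articles dump
# to disk. The file is tens of gigabytes, so stream it rather than loading
# it into memory. Replace "latest" with a dated directory from
# https://dumps.wikimedia.org/enwiki/ to pin a specific snapshot.
import requests

DUMP_URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

def download_dump(url: str, out_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream the dump to out_path in 1 MiB chunks."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)

if __name__ == "__main__":
    download_dump(DUMP_URL, "enwiki-pages-articles.xml.bz2")
```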
If anyone has specific questions about this, let me know, and I can probably answer them. Hopefully I can be to Lemmy and Wikimedia what Unidan was to Reddit and ecology before he crashed out over jackdaws and got exposed for vote fraud.
How do I get started on contributing to new articles (written by a human) for my language? I always wanted to help out but never found an easy way to do so.
I'm going to write this from the perspective of the English Wikipedia, but most specifics should have some analog in other Wikipedias. By "contribute to new articles", do you mean create new articles, contribute to articles which are new that you come across, or contribute to articles which you haven't before (thus "new to you")? Asking because the first one has a very different – much more complicated – answer from the other two.
Both. How do I get started creating a new article, and how do I contribute to them, or other articles?
The short answer is that I really, really suggest you try other things before trying to create your first article. This isn't just me; every experienced editor will tell you that creating a new article is one of the hardest things any editor can do, let alone a newer one. It's why the task center lists it as being appropriate for "advanced editors". Finding an existing article which interests you and then polishing and expanding it is almost always more rewarding, more useful, easier, and less stressful than creating an article from scratch. And if creating articles sounds appealing, expanding existing stub articles is great experience for that.
The long answer is "you can", but it's really hard:
- New editors are subject to Articles for Creation, or AfC, when creating an article. The article sits in a draft state until the editor flags it for review. The backlog is very long, and while reviewers can go in any order they want, they usually prioritize the oldest articles out of fairness and because most AfC submissions are about equal in urgency and time consumption. "Months" is the expected waiting time.
- If you're not using the English Wikipedia, you can try translating over a well-established article from English. There's no rule that says sources have to be in the language of the Wikipedia they're on, although it's still considered a big plus if sources are in the same language.
- Wikipedia's notability guidelines are predicated on you understanding other policies and guidelines like "reliable sources" and "independent sources". They're also intentionally fuzzy so people don't play lawyer and follow the exact letter without considering the spirit of the guideline.
- The English Wikipedia currently has over 7 million articles. There are still a lot of missing articles (mostly in taxonomy, where notability is almost guaranteed), but you really need to know where to look.
- When choosing an article subject, it's extremely important to avoid a conflict of interest (COI).
- Assuming you have a subject you think meets criteria, now you have to go out and find reliable, independent sources with substantial coverage of the subject to confirm your hypothesis.
- Now you need to start the article, and you need to do this in a manner which:
- Is verifiable (all claims are cited)
- Is not original research (i.e. nothing you say can be based on "because I know it")
- Is reliable (all citations are to reliable sources)
- Is neutral (you've minimized bias as much as you can, let the sources speak for themselves, and made sure your source selection isn't biased)
- Is stylistically correct (there's a manual of style, but just use your best judgment, and small mistakes can be copy-edited out by people familiar with style guidelines)
- If the article is nominated for deletion, you have to keep your cool and argue based solely on guidelines (not on perceived importance of the subject) that the article should be kept.
- New articles are almost always given more scrutiny than articles which have been around; this isn't a cultural problem as much as it is a heuristic one.
- Having an article deleted feels much more personal than having edits reverted (despite the fact that subject notability is 100% out of your control).
Some of these apply to normal editing too, but working within an article others have worked on and might be willing to help with is vastly easier than building one from scratch. If you want specific help in picking out, say, an article to try editing and are on the English Wikipedia, I have no problem acting like bowling bumpers if you're afraid your edits won't meet standards.
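And if you want a programmatic way to find stubs to practice on rather than clicking around, the public MediaWiki API can list members of the English Wikipedia's stub-tracking category. A minimal sketch, assuming the requests library; "Category:All stub articles" is the tracking category name I believe enwiki uses, so double-check it if the query comes back empty:

```python
# Minimal sketch: list a few titles from the English Wikipedia's
# stub-tracking category via the standard MediaWiki API
# (action=query, list=categorymembers).
import requests

API = "https://en.wikipedia.org/w/api.php"

def list_stubs(limit: int = 10) -> list[str]:
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:All stub articles",
        "cmlimit": limit,
        "format": "json",
    }
    # Wikimedia asks for a descriptive User-Agent; this one is a placeholder.
    resp = requests.get(API, params=params, timeout=30,
                        headers={"User-Agent": "stub-browser-example/0.1"})
    resp.raise_for_status()
    return [m["title"] for m in resp.json()["query"]["categorymembers"]]

if __name__ == "__main__":
    for title in list_stubs():
        print(title)
```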
Well now I want to know about jackdaws and voter fraud
what about the jackdaws thing?
unzips
Is there a danger that unscrupulous actors will try to build out a Wikipedia edit history with this and then mass-skew articles with propaganda using their "trusted" accounts?
Or what might be the goal here? Is it just stupid and bored people?
So Wikipedia has three methods for deleting an article:
- Proposed deletion (PROD): An editor tags an article explaining why they think it should be uncontroversially deleted. After seven days, an administrator will take a look and decide if they agree. Proposed deletion can only be done to a given article once, the tag can be removed by anyone passing by who disagrees with it, and an article deleted via PROD can be recreated at any time.
- Articles for deletion (AfD): A discussion is held to delete an article. Pretty much always, this is about the subject's notability. After the discussion (a week by default), a closer (almost always an administrator, especially for contentious discussions) will evaluate the merits of the arguments made and see if a consensus has been reached to e.g. delete, keep, redirect, or merge. Articles deleted via discussion cannot be recreated until they've satisfied the concerns of said discussion, else they can be summarily re-deleted.
- Speedy deletion: An article is so fundamentally flawed that it should be summarily deleted at best or needs to be deleted as soon as possible at worst. The nominating editor will choose one or more of the criteria for speedy deletion (CSD), and an administrator will delete the article if they agree. Like a PROD, articles deleted this way can be recreated at any time.
This new criterion has nothing to do with preempting the kind of trust building you described. The editor who created the article will not be treated any differently than without this criterion. It's there so editors don't have to deal with the bullshit asymmetry principle and comb through everything to make sure it's verifiable. Sometimes editors make these LLM-generated articles because they think they're helping but don't know how to do it themselves; sometimes it's for some bizarre agenda (e.g. there's a sockpuppet editor who's been occasionally popping up trying to push LLM-generated articles about the Afghan–Mughal Wars). But whatever the reason, it does nothing but waste other editors' time, and the content can be effectively considered unverified. All this criterion does is expedite the process of purging that bullshit.
I'd argue meticulously building trust to push an agenda isn't a prevalent problem on Wikipedia, but that's a very different discussion.
Thank you for your answer. I'm really happy that Wikipedia is safe, then. Everything happening nowadays makes me assume the worst.
Do you think your problem is similar to what open-source developers face with AI pull requests? There it was theorised that some people try to train their models by making them submit code changes, abusing the maintainers' time and effort to get training data.
Is it possible that this is an effort to steal work from Wikipedia editors to get you to train their AI models?
Is it possible that this is an effort to steal work from Wikipedia editors to get you to train their AI models?
I can't definitively say "no", but I've seen no evidence of this at all.
How frequently are images generated/modified by diffusion models uploaded to Wikimedia Commons? I can wrap my head around evaluating cited sources for notability, but I don't know where to start determining the repute of photographs. So many images Wikipedia articles use are taken by seemingly random people not associated with any organization.
So far, I haven't seen all that many, and the ones that do show up are very obvious, like a very glossy crab at the beach wearing a Santa Claus hat. I have yet to see one that's undisclosed, let alone actively disguising itself. I also have yet to see someone try using an AI-generated image on Wikipedia. Disclosing generative AI usage is made trivial in the upload process with an obvious checkbox, so the only reason not to disclose is straight-up lying.
I can't say how much of an issue this will be in the future, or what good steps toward finding and eliminating it would be should it become one.
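For what it's worth, a file's upload history and declared authorship are queryable, which is one concrete starting point for judging provenance. A minimal sketch against the Commons imageinfo API; the endpoint and parameters are the standard MediaWiki ones, and "File:Example.jpg" is just a placeholder title:

```python
# Minimal sketch: look up who uploaded a Commons file and its declared
# author/credit via prop=imageinfo with iiprop=user|extmetadata.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def file_provenance(title: str) -> dict:
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "user|extmetadata",
        "titles": title,
        "format": "json",
    }
    # Placeholder User-Agent; Wikimedia asks for a descriptive one.
    resp = requests.get(API, params=params, timeout=30,
                        headers={"User-Agent": "provenance-example/0.1"})
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    # Note: a missing file will have no "imageinfo" key; this sketch
    # assumes the title exists.
    info = next(iter(pages.values()))["imageinfo"][0]
    meta = info.get("extmetadata", {})
    return {
        "uploader": info.get("user"),
        "author": meta.get("Artist", {}).get("value"),
        "credit": meta.get("Credit", {}).get("value"),
    }

if __name__ == "__main__":
    print(file_provenance("File:Example.jpg"))
```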
How would you know if an image is AI generated? That was easy to do in the past, but have you seen what they are capable of now?
Oh for fuck's sake...
I'd not considered this was happening (people submitting AI wiki articles)
Isn't Wikipedia where AI gets like half of its information from anyway?
Reddit seems to be a substantial source, if the many bits of questionable advice that Google famously offered are any indication
Reddit allows Google to scrape it for its AI because Google allows them to use reCAPTCHA v3 for their moderation and banning purposes.
Do you think these people surreptitiously submitting articles written by AI are gonna be capable of validating that what they're submitting is even true? Particularly if the (presumably effective) Wikipedia defense for this is detecting made-up citations?
This kind of thing ultimately makes something valuable to everyone, like Wikipedia, a less valuable resource, and it should be resisted and rejected by anyone with their head screwed on.
Oh, I think this is a good move by Wikipedia. I just hate to imagine the disaster that an ouroboros of AI citing AI-generated Wikipedia articles would come up with.
The headline reflects a sensible move by Wikipedia to protect content quality. AI-generated articles often include errors or fake citations, so giving admins the authority to quickly delete such content helps maintain accuracy and credibility. While there's some risk of overreach, the policy targets misuse, not responsible AI-assisted editing, and aligns with Wikipedia’s existing standards for removing low-quality material.
Did you generate this comment with an LLM for irony?
Ha, fair question! But no irony here—I actually wrote it myself. That said, it's kind of funny how quickly we've reached the point where any well-written, balanced take sounds like it could be AI-generated. Maybe that's part of the problem we're trying to solve!
But no irony here—I actually wrote it myself.
I see that em dash I know what you're doing
It really is crazy how predictable it is.
Even saying "fair question" set off alarms. At this point, saying anything good about a response at the start is an immediate red flag.
These lists of red flags make me feel like I must be a replicant. I wrote a comment just like that one, em dash and all, on a different site just the other day, with my own organic brain!
My first instinct was to use an em dash instead of that last comma, but it seemed too on the nose.
I've started to drop using em dashes because AI ruined them--bastards.
Honestly, I don't think dropping them is much of a loss. I use them in work writing, and in more casual writing only if I happen to be on the keyboard I use for that work, since I have a key binding for them; but that's all. The distinction of dash length (or of dashes from hyphens) doesn't bring anything useful to our writing, in my opinion.
Username does not check out.
It always feels weird when people write an essay as if it's their final quarter project for high school. Too neat, thoughts too organized, too much flowery prose.
I do that on almost all these posts now. And I've stopped leaving in em dashes.
It's a step. Why wouldn't they default to not accepting any AI-generated content, and maybe have a manual approval process? It would both protect the content and discourage LLM use in the places where LLMs suck.
A manual approval process would kill the site, I think; there's just so much content on it that gets updated constantly that it would grind everything to a halt.
Right, and manual approval would just become the absolute lowest priority. Kind of like the automated message "we're experiencing higher than normal call volumes" as companies gently tell us their margins are more important than their customers.
They call the rule "LLM-generated without human review". The specific criteria are mistakes that LLMs frequently make.
common wikipedia w