this post was submitted on 11 Sep 2025
Technology
The famous 'invisible D' of Connecticut, my favorite SCP.
That actually sounds like a fun SCP: a word that doesn't seem to contain a letter, but when you test for that letter with an algorithm that checks only for its presence, it reports that the letter is indeed there. Any attempt to find where in the word the letter is, or to list all the letters in the word, spuriously fails. Containment could be fun, probably involving amnestics and widespread societal influence. I also wonder if they could create an algorithm for checking letter presence that can be performed by hand without leaking any other information to the person performing it, reproducing the anomaly without computers.
ct -> d is a not-uncommon OCR fuck-up. Maybe that's the source of its garbage data?
No, LLMs produce the most statistically likely (in their training data) token to follow a certain list of tokens (there's nothing remotely resembling reasoning going on in there, it's pure hard statistics, with some error and randomness thrown in), and there are probably a lot more lists where Colorado is followed by Connecticut than ones where it's followed by Delaware, so they're obviously going to be more likely to produce the former.
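To make the "most statistically likely token" point concrete, here is a deliberately tiny sketch. It is not how a real LLM works internally (those use neural networks conditioned on the whole context, not a lookup of pair counts); the corpus and the `most_likely_next` helper are made up for illustration. It only shows that if "Colorado" is followed by "Connecticut" more often than by "Delaware" in the data, pure counting picks "Connecticut".

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data": alphabetical lists of states.
# One list skips Connecticut, so both continuations of "Colorado" occur.
corpus = [
    ["California", "Colorado", "Connecticut", "Delaware"],
    ["Colorado", "Connecticut", "Delaware", "Florida"],
    ["Colorado", "Delaware"],
]

# Count how often each token directly follows each other token.
follow_counts = defaultdict(Counter)
for seq in corpus:
    for prev, nxt in zip(seq, seq[1:]):
        follow_counts[prev][nxt] += 1

def most_likely_next(token):
    """Return the token seen most often right after `token` in the corpus."""
    return follow_counts[token].most_common(1)[0][0]

print(follow_counts["Colorado"])     # counts for each observed follower
print(most_likely_next("Colorado"))  # the most frequent follower wins
```

Real models also sample with some randomness instead of always taking the top candidate, which this sketch leaves out.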
Moreover, there aren't going to be many texts listing the spelling of states (maybe transcripts of spelling bees?), so that information is unlikely to be in their training data, and they can't extrapolate because it's not really something they do and because they use words or parts of words as tokens, not letters, so they literally have no way of listing the letters of a word if said list is not in their training data (and, again, that's not something we tend to write, and if we did we wouldn't include d in Connecticut even if we were reading a misprint). Same with counting how many letters a word has, and stuff like that.
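A rough sketch of the tokenization point, assuming a made-up vocabulary and a simple greedy longest-match splitter (real tokenizers learn their vocabulary from data and split words differently; `VOCAB` and `tokenize` are hypothetical). What matters is what comes out: a list of integer IDs, with no letters in it.

```python
# Hypothetical subword vocabulary mapping pieces of words to integer IDs.
VOCAB = {"Conn": 101, "ect": 102, "icut": 103, "Del": 104, "aware": 105}

def tokenize(word):
    """Split `word` into known pieces (greedy longest match) and return their IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            # No piece in the toy vocabulary covers this position.
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids

# The model-side view of the word: three opaque numbers.
print(tokenize("Connecticut"))

# The character-level view, which the model never receives directly.
print("d" in "Connecticut".lower())
```

In this sketch, answering "is there a d in Connecticut?" on the character level is a one-liner, but nothing in the ID list says which letters each ID stands for.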
SCP-00WTFDoC (lovingly called "where's the fucking D of Connecticut" by the foundation workers, also "what the fuck, doc?")
People think it's safe, because it's "just an invisible D", not even a dick, just the letter D, and it only manifests verbally when someone tries to say "Connecticut" or write it down. Then, when you least expect it, everyone hears "Donnedtidut", everyone reads that thing, and a portal to that fucking place opens and drags you in.
Words are full of mystery! Besides the invisible D, Connecticut has that inaudible C...
I hear the Invisible D and Silent C are happily married.