196


Be sure to follow the rule before you head out.

Rule: You must post before you leave.

Other rules

Behavior rules:

No bigotry (transphobia, racism, etc…)
No genocide denial
No support for authoritarian behaviour (incl. Tankies)
No namecalling
Accounts from lemmygrad.ml, threads.net, or hexbear.net are held to higher standards
Other things seen as clearly bad

Posting rules:

No AI generated content (DALL-E etc…)
No advertisements
No gore / violence
Mutual aid posts are not allowed

NSFW: NSFW content is permitted but it must be tagged and have content warnings. Anything that doesn't adhere to this will be removed. Content warnings should be added like: [penis], [explicit description of sex]. Non-sexualized breasts of any gender are not considered inappropriate and therefore do not need to be blurred/tagged.

If you have any questions, feel free to contact us on our matrix channel or email.

Mayonnaise Rule (files.catbox.moe)

submitted 2 years ago by Gork@lemm.ee to c/196@lemmy.blahaj.zone

67 comments

(page 2) 18 comments


[–] Mac@mander.xyz 2 points 1 year ago

[–] gerryflap@feddit.nl 1 points 2 years ago (1 children)

I think these models struggle with this because they don't process text as individual characters, but rather as tokens that often contain parts of a word. So the model never sees the actual characters within a token, and can only infer the contents of a token from the training data itself if the training data contains more information about it. It can get it right, but this depends on how much it can infer from training data and context. It's probably a bit like trying to infer what an English word sounds like when you've only heard 10% of the dictionary spoken aloud and knowing what it sounds like isn't actually that important to you.

More info can be found here: https://platform.openai.com/tokenizer
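A minimal sketch of the idea above, assuming a made-up vocabulary and a simple greedy longest-match split. Real tokenizers (such as the one at the link) learn their pieces from data and split words differently; `TOY_VOCAB` and `tokenize` are hypothetical names used only for illustration.

```python
# Toy vocabulary: the pieces and IDs are invented for this example.
TOY_VOCAB = {
    "may": 0, "onna": 1, "ise": 2,
    "m": 3, "a": 4, "y": 5, "o": 6, "n": 7, "i": 8, "s": 9, "e": 10,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into token IDs, always taking the longest matching piece."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


word = "mayonnaise"
print(tokenize(word))   # [0, 1, 2]  <- all the model receives
print(word.count("n"))  # 2          <- trivial when you can see the characters
```

The model only gets the ID list, so "how many n's?" has to be answered from whatever it learned about token 1 during training, not by looking at the letters.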

[–] Krauerking@lemy.lol 1 points 2 years ago

Ok, so tokenization is why I've seen tech nerds get so excited: it's a system that can come up with auto-generated synonyms for a word, with a basic ability to sometimes be correct, by looking at the words before and after it...

But it's such a shitty way to look up synonyms! Using the words on either side doesn't mean you found a synonym, just that you found another word that might work, and it still has to use the full horsepower of a ridiculously overpowered system.

Or you could have a lookup table that just reads the frickin word and returns predefined synonyms, and that was able to run in Word 97.

It's ridiculous that we think this is better in any meaningful way instead of just wasteful development.
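For comparison, the lookup-table approach described above can be sketched in a few lines. The entries in `SYNONYMS` and the `lookup_synonyms` helper are hypothetical, not taken from any real thesaurus.

```python
# Hypothetical predefined synonym table; a real thesaurus would be far larger.
SYNONYMS = {
    "big": ["large", "huge", "enormous"],
    "small": ["little", "tiny"],
    "fast": ["quick", "rapid"],
}


def lookup_synonyms(word, table=SYNONYMS):
    """Return the predefined synonyms for a word, or an empty list if unknown."""
    return table.get(word.lower(), [])


print(lookup_synonyms("Big"))         # ['large', 'huge', 'enormous']
print(lookup_synonyms("mayonnaise"))  # []
```

This reads the word itself and ignores the surrounding context entirely, which is the trade-off being argued about here: cheap and predictable, but limited to whatever was entered ahead of time.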
