lvxferre

joined 4 years ago
[–] lvxferre@lemmy.ml 2 points 2 years ago* (last edited 2 years ago)

The source that I've linked mentions semantic embedding; so does further literature on the internet. However, the operations are still being performed with the vectors resulting from the tokens themselves, with said embedding playing a secondary role.

This is evident for example through excerpts like

The token embeddings map a token ID to a fixed-size vector with some semantic meaning of the tokens. These brings some interesting properties: similar tokens will have a similar embedding (in other words, calculating the cosine similarity between two embeddings will give us a good idea of how similar the tokens are).

Emphasis mine. A similar conclusion (that the LLM is still handling the tokens, not their meaning) can be reached by analysing the hallucinations that your typical LLM bot outputs, and asking why that hallu is there.
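To make the cosine similarity mentioned in the excerpt concrete, here's a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration; real token embeddings are learned and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example only.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

if __name__ == "__main__":
    print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"]))
    print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]))
```

With these toy vectors, "cat" scores closer to "dog" than to "car", which is all that "similar tokens have similar embeddings" means here.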

What I'm proposing is deeper than that. It's to use the input tokens (i.e. morphemes) only to retrieve the sememes (units of meaning; further info here) that they're conveying, then discard the tokens themselves, and perform the operations solely on the sememes. Then for the output you translate the sememes obtained by the transformer into morphemes=tokens again.

I believe that this would have two big benefits:

  1. The amount of data necessary to "train" the LLM will decrease. Perhaps by orders of magnitude.
  2. A major type of hallucination will go away: self-contradiction (for example: states that A exists, then that A doesn't exist).

And it might be an additional layer, but the whole approach is considerably simpler than what's being done currently - pretending that the tokens themselves have some intrinsic value, then playing whack-a-mole with situations where the token and the contextually assigned value (by the human using the LLM) differ.

[This could even go deeper, handling a pragmatic layer beyond the tokens/morphemes and the units of meaning/sememes. It would be closer to what @njordomir@lemmy.world understood from my other comment, as it would then deal with the intent of the utterance.]
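A toy sketch of the token → sememe → token round trip proposed above, just to show the shape of the idea. Everything here is an assumption for illustration: the lexicon, the sememe labels, the `RELATED` hints and the disambiguation rule are all made up, and the middle step (the transformer operating on sememes) is left out entirely.

```python
# Hypothetical lexicon: each token maps to the sememes it may convey.
LEXICON = {
    "bank": ["FINANCIAL_INSTITUTION", "RIVER_SIDE"],
    "shore": ["RIVER_SIDE"],
    "money": ["CURRENCY"],
    "river": ["WATERCOURSE"],
}

# Hypothetical hints: sememes that tend to appear together.
RELATED = {
    "FINANCIAL_INSTITUTION": {"CURRENCY"},
    "RIVER_SIDE": {"WATERCOURSE"},
}

def to_sememes(tokens):
    """Retrieve one sememe per known token, using the unambiguous tokens as context."""
    context = {
        LEXICON[t][0] for t in tokens if t in LEXICON and len(LEXICON[t]) == 1
    }
    result = []
    for token in tokens:
        candidates = LEXICON.get(token)
        if not candidates:
            continue  # unknown token: nothing to retrieve
        # Pick the candidate with the most related sememes in the context;
        # on a tie, the first candidate listed in the lexicon wins.
        best = max(candidates, key=lambda s: len(RELATED.get(s, set()) & context))
        result.append(best)
    return result

def to_tokens(sememes):
    """Translate sememes back into tokens (first lexicon entry that can convey each)."""
    output = []
    for sememe in sememes:
        for token, candidates in LEXICON.items():
            if sememe in candidates:
                output.append(token)
                break
    return output
```

The point of the sketch is that, once `to_sememes` has run, "bank" next to "river" and "bank" next to "money" are different objects, so whatever operates in the middle never sees the shared spelling.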

[–] lvxferre@lemmy.ml 3 points 2 years ago

Soap and water do wonders for 90% of the restroom cleaning.

The problem is that the other 10% are important too.

[–] lvxferre@lemmy.ml 4 points 2 years ago (2 children)

Not quite. I'm focusing on chatbots like Bard, ChatGPT and the likes, and their technology (LLM, or large language model).

At the core those LLMs work like this: they pick words, split them into "tokens", and then perform a few operations on those tokens, across multiple layers. But at the end of the day they still work with the words themselves, not with the meaning being encoded by those words.

What I want is an LLM that assigns multiple meanings for those words, and performs the operations above on the meaning itself. In other words the LLM would actually understand you, not just chain words.

[–] lvxferre@lemmy.ml 1 points 2 years ago

Yup, that's the stuff. It's mostly a finishing touch, to get rid of bacteria.

[–] lvxferre@lemmy.ml 14 points 2 years ago* (last edited 2 years ago) (4 children)

At the very least, I'd recommend you:

  • gloves - because you'll get really close to that gross shit. You don't want to touch it.
  • a sponge - it doesn't need to be new; your old kitchen sponge is enough, just don't use it again in the kitchen. Use the yellow side to spread the cleaning agent, and the green side to remove obnoxious grime stuck to something. (Do it gently, and only with a really old sponge, to avoid scratching the surface.)
  • a bucket - mostly to mix some soap and water.
  • a dry rag - mostly for finishing/drying. A cringey old shirt that you won't be using again is usually enough.
  • toilet brush - don't use the sponge to clean inside the toilet bowl; you'll be spreading the bacteria from your shit and piss to the rest of the restroom.

Everyone has the cleaning agents that they swear by, so look for something that works for you. For me it's

  • alcohol vinegar - to get rid of that brown crust in the sink (water in my city is hard as a brick) and around the shower drain. I usually apply it, wait a few minutes, then use the sponge to scrub it a bit. Then I remove the vinegar with the rag.
  • bleach - exclusively used inside the toilet bowl. I squish some bleach there, then scrub it with the toilet brush, then flush it off, making sure that there's no bleach left behind.
  • disinfecting agent - I squish a bit of that inside the toilet bowl and just leave it there. It smells good, and it gets rid of the bacteria.
  • an ammonium-based cleaning agent - I squish it on obvious grime on the walls (except the above), then scrub it with the sponge.
  • soap and water - to "wash" the walls with the sponge.
  • plain water with some disinfecting agent - to rinse it. Then I just remove the excess water with the rag and let the restroom dry naturally (with closed doors, otherwise my cats will step into the bathroom, step outside, and now I have to clean the bathroom again plus the corridor and furniture).

Important detail: do not mix any two of the cleaning agents that I've mentioned. Especially not ammonium and bleach.

For reference, the disinfecting agent that I use is called "pinho sol", but I have no idea if it's sold outside Brazil. You probably have some similar product wherever you live.

[–] lvxferre@lemmy.ml 1 points 2 years ago (4 children)

Complexity does not mean sophistication when it comes to AI and never has and to treat it as such is just a forceful way to make your ideas come true without putting in the real effort.

It's a bit off-topic, but what I really want is a language model that assigns semantic values to the tokens, and handles those values instead of directly working with the tokens themselves. That would probably be far less complex than current state-of-the-art LLMs, but way more sophisticated, and require far less data for "training".

[–] lvxferre@lemmy.ml 1 points 2 years ago

creating a label and checking the skip invoice box

That works great too, especially if you want to use less foolproof filters. Or even a mix of both strategies.

[–] lvxferre@lemmy.ml 67 points 2 years ago (6 children)

Oh "great", more crap between Ctrl and Alt.

[Grumpy grandpa] In my times, the space row only had five keys! And we did more than those youngsters do with eight, now nine keys!

[–] lvxferre@lemmy.ml 1 points 2 years ago

Thank you! It's working now.

[–] lvxferre@lemmy.ml 2 points 2 years ago (2 children)

It's giving me an error, "Error Finding Entity // Make sure you spelled the entity correctly and that it exists!", when I use my username for lemmy.ml; curiously it works well when I do it for my beehaw.org account.

[–] lvxferre@lemmy.ml 34 points 2 years ago (1 children)

Create your account through old.reddit.com; when it asks you for an email, simply press "next". And, if you need an e-mail provider for some other reason, protonmail.com doesn't ask you for your phone number.

That said, do you really need a reddit account?

[–] lvxferre@lemmy.ml 1 points 2 years ago* (last edited 2 years ago)

[Note: this is my personal take, not Chomsky's]

We can recognise colours and things even without properly labelling them. (Colour example: I have no clue what to call the colour of my cat's fur, but I'm fairly certain that I remember it, and thus can recognise it.) However, it's hard to handle them logically this way.

  • if you are outside and it is raining, then you get wet
  • if you get wet, you might get sick
  • so if you are outside and it is raining, you might get sick

And at least for me this is the main role of the internal monologue. It isn't just about repeating the state of the things, it's about connecting pieces of info together, as if I was explaining the link to another person.

Perhaps those without verbal internal monologue/dialogue have a more persistent innate language, that is not overwritten by common external language?

Possible; I don't know, really. It's also possible that the "innate language" doesn't really exist, only the innate ability to learn a language; but that ability is already enough to structure simple reasoning.

 

There's a general tendency across languages to order the adjectives connected to the same noun the same way; for example, usually adjectives referring to colour or other innate attributes are closer to the noun than the ones dealing with subjective attributes. This tendency is so strong that it made some linguists (and psychologists) believe that this order might actually be innate.

This study contradicts that. Excerpt from the conclusion:

Taking these findings together, we have argued that there is no universal hierarchy for adjective ordering imposing a hard constraint which then translates into one rigid, unmarked order.

 

If you want, please add your tips to the comments.

Here are mine - just a bunch of opinions/suggestions from someone who used to be a forum mod, then a subreddit mod.

As this is a rather big wall of text, I'll split it into sections, contained within spoiler tags.

Mindset and duty.

A happy mod is a good mod. Take care of your personal life first.

Your comm[unity] is not your personal possession or project, it's a collective effort. You're just its representative - be humble but proud about it.

Use your comm as any other user would. Be active in it, interact with other users, discuss, learn, have fun.

If you don't enjoy your comm any more, for whatever reason, pass the torch to newer mods.

Check your comm at least a few times per day. A quick peek is fine for slower comms.

It's useful to follow the RSS feed of the comms that you moderate, as it'll be quicker to spot rule-breaking posts. You can do it here:

On days when your comm is too slow, especially at the beginning, it's your job to provide content for your comm.

Recruitment

Avoid recruiting mods who:

  • never post/comment in your comm, or only did it after you announced "we want new mods"
  • asked over and over to be a mod
  • already mod lots of other comms
  • claim to have a "vision" about your comm, and propose 9001 drastic changes for it*.
  • rush towards certainty on things that they cannot reliably know (intrinsically unfair)
  • cannot reasonably infer things from context (ditto)

*Major exception: if your comm got some biiiig problem, and nobody seems to be able to solve it.

If you get multiple people willing to mod, you can be a bit pickier. Use open-ended questions to trial them. Ideally new mods should:

  • be active members of the comm
  • work well alongside the rest of the mod team. (are they strict rule enforcers, or more on the "let users have fun" side?)
  • active in different hours than the rest of the mod team. (e.g. night owls, different longitudes, etc.)

A lazy mod is less bad than a well-intentioned but dumb mod.

Rules and their enforcement

You do not know the users' intentions, thoughts or beliefs. However, you do know how they behave and what they say. Use the latter, not the former, in your rules.

If a rule cannot be enforced, it is not a rule. It's at most a request.

Enforce rules by spirit, but the letter should follow suit. There's some room to be sloppy with this in smaller comms, but not in the bigger ones.

Do not enforce "hidden rules". If there's some shitty behaviour that needs to be addressed, do it in the open.

Do not enforce new rules retro-actively. You're just creating more work for yourself and pissing off users, for no good reason.

Be succinct when phrasing the rules. If necessary/desired, write down two versions of it:

  • short version - addressing what users can/can't/need to do in broad strokes, without "why". Keep it in the side panel, visible at all times.
  • long version - addressing specificities of each rule, as well as reasoning. Keep it in a post or similar.

Synchronise changes in all versions of your rules. Few things confuse users more than rule disparity.

If your comm got more than seven rules, it's probably already too much. Consider merging them.

It's fine to use imperative in the rules ("do this", "don't do that"), as it's succinct and you're in a position to do so.

Every rule has a grey area, of things that are only arguably rule-breaking content. Try to minimise the grey area when possible, but keep in mind that you'll never get rid of it.

Beware the fluff principle: voting alone will allow only the lowest common denominator content to rise to the top, and shove down well-thought-out content that is hard to judge. Take that into account when creating rules.

Handling other users

Ask for community input periodically. Don't use votes for this, let other users speak their mind.

Input from rule-lawyers is surprisingly useful to find issues in your rules. And a few people will be able to phrase your own rules better than you do.

Even then, asking for community input is not an excuse to pass on responsibility. You're still the mod.

It's useful to keep notes of a few users from your comm:

  • notes about good users, especially engaged and helpful ones, might be useful later on, as you're recruiting newer mods
  • notes about bad users are useful for rule enforcement. Certain types of bad behaviour are only revealed in long-term tendencies.

Activity of your comm's users outside your own comm should only be taken into account as much as it might predict their future activity in your comm. There are a few corner cases where doing so makes sense, but by default you're better off not doing this.

Don't be afraid to upstream reports of especially problematic users to the admins of your instance, especially if they're on the stricter side.

This is debatable, but I personally believe that a few types of users, regardless of their intentions, should be handled as you would handle trolls and shitposters. They are:

  • witch hunters: users who point fingers at other specific users without rational grounds to do so
  • entitled, bossy or whiny users: users who are eager to tell other users what to do, for their own sake
  • obtuse users: users who "conveniently" pretend to not read or understand counter-arguments of other users in discussions
  • assumers: users who are prone to say things that they cannot reliably know, especially about other users. (A superset of witch hunters.)

This sort of user is prone to piss off other users, especially the ones who contribute the most.

When there's a fight between users, typically, at least one of them is stupid. Make sure to know which one (or if both) before intervening.

 

Plenty of Google Search users were appending "site:reddit.com" to their searches to avoid SEO and get actual human answers. This became less useful with the blackouts, and Google is actually addressing it - through a new feature called "Perspectives". Allegedly the feature highlights forums and videos from social media (TikTok, YouTube, Reddit, Quora).

This means that those search users won't beeline towards Reddit anymore. Instead there's a reasonable chance that they end up on Reddit's competitors, including YouTube (owned by Alphabet, the same parent company as Google Search).

Given that 47% of the traffic of Reddit comes from organic search, this is going to hurt. A lot.

 

TL;DR: say hello to our friend u/ModCodeOfConduct, disguising threats behind feigned politeness, yet again!

 

Fun scientific paper talking about the odd rarity of *b in the current Proto-Indo-European reconstructions. It doesn't propose why this happens, but it claims that most PIE instances of *b might be actually from a later stage of the language, that the author calls "Indo-Celtic" (the common ancestor of all IE languages minus Anatolian and Tocharian; also known in the literature as "core PIE").

 

Archive link, original link. I'll copypaste the contents here.

So Long, and Thanks for All the Feedback

As you have no doubt heard by now, Reddit management introduced changes recently that have led to rule and moderation changes across many subreddits. Because of these changes, we no longer feel that Reddit is an appropriate place to post official content or refer our players to.

We want to thank you for all the feedback and discussion you've participated in in past changelog threads. You are of course welcome to post unofficial update threads going forward, and if you want to reach the team with feedback about the game, please visit our feedback site at feedback.minecraft.net or contact us on one of our official social media channels.

The text is rather polite and professional, but at the end of the day it boils down to "we'll GTFO of this place. Keep discussing Minecraft here if you want, but you won't get official content through Reddit."

And who's "we"? The one posting the text above was u/sliced_lime aka Mikael Hedberg, a game developer at Mojang and interim tech lead for the Minecraft launcher. Odds are that his position represents Mojang AB's position. That is, the position of a subsidiary of Microsoft.

Yup - a subsidiary of one of the GAFAM companies is refusing to deal with Reddit.

 

In Reddit, users can create lists of subs, called "multireddits". And you can browse the content of all those subs in a multireddit as if it was a single community. You can also share your multireddits with other people.

Reddit itself implemented the idea and never touched it again, but it would be amazing in the federation. For example, someone who's interested in cooking could create the following ~~multireddit~~ multicomm:

That increases discoverability of the communities across the Lemmyverse (as people share their multicomms), and also makes it easier to handle redundant communities across instances. Because of that, I feel like the concept would be right at home in Lemmy.
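A minimal sketch of what a multicomm could look like as a data structure: a named set of community addresses, plus a merged feed. The class names, fields and community addresses are all hypothetical; this isn't how Lemmy stores anything, just an illustration of the concept.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class Post:
    community: str       # hypothetical address, e.g. "cooking@instance-a.example"
    title: str
    published: datetime

@dataclass
class Multicomm:
    name: str
    communities: set = field(default_factory=set)

    def feed(self, posts):
        """Return the posts from member communities, newest first."""
        selected = [p for p in posts if p.community in self.communities]
        return sorted(selected, key=lambda p: p.published, reverse=True)

if __name__ == "__main__":
    cooking = Multicomm(
        name="cooking",
        communities={"cooking@instance-a.example", "cooking@instance-b.example"},
    )
    posts = [
        Post("cooking@instance-a.example", "Bread", datetime(2023, 7, 1)),
        Post("memes@instance-a.example", "Meme", datetime(2023, 7, 2)),
        Post("cooking@instance-b.example", "Custard", datetime(2023, 7, 3)),
    ]
    for post in cooking.feed(posts):
        print(post.community, post.title)
```

Since a multicomm is only a list of addresses, sharing one is just sharing that list, and redundant communities on different instances end up in the same feed.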

 

I just deep-fried and filled a batch of the thing above. They turned out delicious, so I'm sharing here.

Here's a link for the recipe for the dough; I followed it closely, the only thing was that I subbed the butter with veg oil. The recipe yielded ~20 of them, but this depends a lot on the size that you cut them.

Recipe for the custard filling:

  • 300g sweetened condensed milk
  • 300g 20% fat milk cream
  • 300g whole milk
  • 2 eggs
  • vanilla essence
  • [optional] 1/2 tsp cornstarch

Just mix everything together until homogeneous, then heat it on low fire, while stirring constantly, until it thickens. I didn't use the cornstarch because I wanted it slightly runnier, but do add it if you like a thicker custard.

 

Those puzzles are fun to solve, so why not give them a try? Feel free to use this post to share hints or the solution as you've found it, but please use spoilers to do so.

The first three puzzles boil down to "retro-engineering" tidbits of the grammar of three languages (Ubyx, Alabama, N|uuki). The fourth one is to deduce the words for family relationships used in Arabana. The fifth one is historical linguistics, deducing the sound changes from Proto-Chamic to Phan Rang Cham and Tsat.

Check this link for the puzzles of previous years, solutions, as well as versions in other metalanguages (in case you feel more comfortable solving them in another language than English).

submitted 2 years ago* (last edited 2 years ago) by lvxferre@lemmy.ml to c/lemmywishlist@lemmy.ml
 

I'm not suggesting this straight to the devs because come on, they got too much on their hands already.

Skin switcher

A drop-down menu allowing you to choose which style you want to use with Lemmy, like a built-in Stylus.

Reverse Slashdot voting system

You have one upvote ("like"), but plenty downvotes ("disagree", "rude", "unfunny", "off-topic", "incorrect", "insightless"). So to downvote you'd click first on the downvote button, then again on your chosen reaction.

I'm suggesting this asymmetry because good content often has multiple qualities, so it's sometimes hard to choose which one to vote on. Plus positive feedback should be as streamlined as possible, as it makes people feel good.

On the other hand, bad content often has an obvious flaw, and it's more important for the commenter/poster to know why people dislike their content than why they like it. It would demand a bit more effort to downvote (two clicks instead of one), slightly discouraging people from mindlessly downvoting.
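A rough sketch of the vote record this would need: a single upvote counter, plus one counter per downvote reason. The class and method names are made up, and it ignores things a real implementation would need (one vote per user, federation, undoing votes).

```python
from collections import Counter

# The downvote reasons listed above.
DOWNVOTE_REASONS = (
    "disagree", "rude", "unfunny", "off-topic", "incorrect", "insightless",
)

class Votes:
    """Hypothetical vote tally for one post or comment."""

    def __init__(self):
        self.upvotes = 0
        self.downvotes = Counter()

    def upvote(self):
        # One click, no reason needed.
        self.upvotes += 1

    def downvote(self, reason):
        # Second click: the voter has to pick a reason.
        if reason not in DOWNVOTE_REASONS:
            raise ValueError(f"unknown downvote reason: {reason!r}")
        self.downvotes[reason] += 1

    def main_complaint(self):
        """Most common downvote reason, or None if nobody downvoted."""
        if not self.downvotes:
            return None
        return self.downvotes.most_common(1)[0][0]
```

`main_complaint` is the bit that matters for the poster: it tells them why people disliked the content, not just that they did.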

Custom sorting system

It would require a multi-dimensional voting system, as above. Let users sort their feeds by assigning a weight to each thing that could be used to sort it - recency (as in "new"), upvotes, each type of downvote, etc. So for example if you don't care about coarse language you'd weight the "rude" downvotes as zero.

Bonus points if comm mods can set up a custom sorting system as the comm default. I could picture for example a discussion-based community ignoring "unfunny" downvotes for the sake of sorting, while a memes community would ignore if the content is off-topic.
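The weighting idea could be sketched as a plain weighted sum. The signal names, the example weights and the choice of negative weights for downvotes are my assumptions; recency could be added as just another signal, as long as newer posts get a higher number.

```python
def score(item, weights):
    """Weighted sum of an item's signals; signals without a weight count as zero."""
    return sum(
        weights.get(name, 0.0) * value
        for name, value in item["signals"].items()
    )

def sort_feed(items, weights):
    """Return the items ordered from highest to lowest score."""
    return sorted(items, key=lambda item: score(item, weights), reverse=True)

# Hypothetical feed: signal counts per post.
FEED = [
    {"title": "Sweary but good", "signals": {"upvotes": 10, "rude": 8}},
    {"title": "Polite and fine", "signals": {"upvotes": 5, "rude": 0}},
]

# Someone who minds coarse language vs. someone who doesn't care about it.
STRICT = {"upvotes": 1.0, "rude": -1.0}
TOLERANT = {"upvotes": 1.0, "rude": 0.0}

if __name__ == "__main__":
    print([item["title"] for item in sort_feed(FEED, STRICT)])
    print([item["title"] for item in sort_feed(FEED, TOLERANT)])
```

A comm-wide default would then be nothing more than a weights dictionary chosen by the mods, which each user could override with their own.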

 

https://myanimelist.net/anime/51632/Isekai_wa_Smartphone_to_Tomo_ni_2

Note: this is the final episode, so expect spoilers.
