overview for noli

good girl in c/traaaaaaannnnnnnnnns@lemmy.ca

[–] noli@programming.dev 1 points 2 years ago

Username checks out :p

Async Rust Is A Bad Language in c/rust@programming.dev

[–] noli@programming.dev -1 points 2 years ago (2 children)

It's a joy to do async in go IMO

Bloom filters: real-world applications in c/programming@programming.dev

[–] noli@programming.dev 2 points 2 years ago

That's actually a really nice application, in this case to reduce bandwidth constraints as opposed to the usual use case of memory constraints!

Which side are you? Javascript or Typescript in c/programmer_humor@programming.dev

[–] noli@programming.dev 19 points 2 years ago

The typical arguments for a dynamic typed language are that it takes less time to write something in it.

The benefits of static typed languages are that your development environment can be a lot smarter (ironically enough leading to faster development speed) and several classes of bugs being unable to happen. In a statically typed language, the IDE can detect if you're trying to call a function that takes a number but you're actually providing a string. In this case the IDE will let you know and you can immediately fix silly mistakes like that.

Bloom filters: real-world applications in c/programming@programming.dev

[–] noli@programming.dev 1 points 2 years ago (1 children)

Cool, so in this case your filter is basically a classifier ML model. How would you set the hash functions then though?

Bloom filters: real-world applications in c/data_engineering@programming.dev

[–] noli@programming.dev 2 points 2 years ago (1 children)

The classical example of scanning a k:v space in a memory efficient manner is the best application I've seen, but it still doesn't really work if you need to know a key is definitely in that space. You still have to perform the lookup.

In the google BigTable paper they use it like this. The reason it makes sense in that context is that there is that bloom filters are tiny and can be easily held in memory for a large dataset, whereas performing the actual lookup requires a slow disk access/network request. This way you can reduce the amount of expensive lookups massively at the cost of a small amount of memory.

Also since the underlying data format is immutable the cost of recalculating upon removal doesn't matter since it doesn't need to happen

it turned out that even with absurdly high lookup rates, adding an initial check for negative presence in a record set, we didn't benefit at all.

Compared to which alternative? I also don't understand exactly what the filter's purpose was in this case, did you have all DNS records in a bloom filter to quickly check for misses? Or was it some kind of allowlist/blocklist of clients?

Finally, what metrics were you using to decide it did not benefit at all?

Bloom filters: real-world applications in c/programming@programming.dev

[–] noli@programming.dev 2 points 2 years ago (1 children)

Interesting. Do I understand it correctly if I say it's a bloom filter where instead of setting a bit to 1 for each of the hashes, you increment a counter for that hash?

How do you infer the count then, take the minimum of all matching hashes? Because intuitively it seems to me like you would need a lot more space to avoid counts being too high

The birth of JS in c/programmer_humor@programming.dev

[–] noli@programming.dev 4 points 2 years ago* (last edited 2 years ago)

I am dumb. The more things I need to think about when reading code that is not the logic of the code, the worse it is. Any time I have to spend thinking about the peculiarities of the way the language handles something is time wasted.

I'll give a very simple example, think like you're trying to find a bug. Assume we're in a dynamic language that allows implicit conversion like this. We can write our code very "cleanly" as follows:

if(!someVar) doSomething();

-> ok, now we gotta check where someVar's value is last set to know what type of data this is. Then I need to remember or look up how those specific types are coerced into a bool.

When trying the same code in a statically typed language that doesn't do implicit coercion that code will fail to run/compile so probably you'll have something like this:

if(someVar.length() == 0) doSomething();

-> this time I can just look at the type of someVar to see it's a string and it's clear what the condition actually means.

The second option is both easier to read and less bug prone even without the type system. It takes maybe 3 seconds longer to type, but if your productivity in coding is that limited by typing speed then I envy you

The birth of JS in c/programmer_humor@programming.dev

[–] noli@programming.dev 22 points 2 years ago (9 children)

Which is why I'm of the opinion that dynamically typed languages are evil. !!"false" should either be caught at compile time or raise an exception.

I'm thoroughly convinced that the only use of dynamically typed languages is to introduce bugs

Bloom filters: real-world applications in c/programming@programming.dev

[–] noli@programming.dev 5 points 2 years ago* (last edited 2 years ago) (3 children)

I know they are used in google's BigTable. All data there is stored in seperate SSTables and you can specify that a locality group should have bloom filters generated for its SSTables. Apparently cassandra has them too.

Both are the same general application though and you already mentioned databases.

I did think about using them at some point for authentication purposes in a webservice. The idea being to check for double uses of a refresh token. This way the user database would need to store only a small amount of extra storage to check for the reuse of a refresh token and if you set the parameters accordingly, the false positives are kind of a benefit in that users cannot infinitely refresh and they actually have to reauthenticate sometimes.

Edit to add: I also read a paper recently that uses a datastructure called a collage that is closely related to bloom filters to perform in-network calculations in a sensor network. If I understand correctly, the basic idea there is that every node in the network adds a bit to the datastructure while it is in transit, so data from the entire network is aggregated. The result can then be fed to a classifier ML model. (Source: Oostvogels, J., Michiels, S., & Hughes, D. (2022). One-Take: Gathering Distributed Sensor Data Through Dominant Symbols for Fast Classification. )

Simple trick in c/programmer_humor@programming.dev

[–] noli@programming.dev 26 points 2 years ago (3 children)

Nvm fixed it

What is the best file format for configuration file? in c/experienced_devs@programming.dev

[–] noli@programming.dev 17 points 2 years ago

You disgust me