this post was submitted on 28 Oct 2023
317 points (97.9% liked)

Programmer Humor


 
[–] alvvayson@lemmy.world 115 points 2 years ago* (last edited 2 years ago) (1 children)

It's a joke.

UTF-16 already exists, which doesn't favor Roman characters as much, but UTF-8 is more popular because it is backward compatible with legacy ASCII.

UTF-32 also exists, which uses exactly four bytes for every code point.
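
To make the size difference concrete, here is a minimal Python sketch that counts bytes per encoding. The sample words and the `encoded_sizes` helper are just illustrations, not anything from a standard library.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    # The "-le" codecs are used so Python does not prepend a byte-order mark.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

latin = "hello"        # 5 characters, all ASCII
cyrillic = "привет"    # 6 characters, all Cyrillic letters

print(encoded_sizes(latin))     # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes(cyrillic))  # {'utf-8': 12, 'utf-16-le': 12, 'utf-32-le': 24}

# Backward compatibility: pure ASCII text is byte-for-byte identical in UTF-8.
assert latin.encode("ascii") == latin.encode("utf-8")
```

Cyrillic costs two bytes per letter in both UTF-8 and UTF-16, while ASCII text doubles in size when moved to UTF-16.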

But the thing that equalizes languages is compression.

Yes, a text written in Cyrillic with UTF-8 will take more space than one in a Roman-alphabet language, easily double. However, this extra space is much more easily compressed by an algorithm like gzip.

So after compression, the two texts will be similarly sized and much smaller than UTF-16 or UTF-32.
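
If you want to check this yourself, a rough sketch with Python's standard `gzip` module could look like the following. The two sample sentences are placeholders, and repeating them makes them compress far better than real prose would, so treat the numbers as a demonstration of the method rather than a benchmark. Run it on real text in each language before drawing conclusions.

```python
import gzip

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def raw_and_gzipped(text):
    """Map each encoding to (raw byte count, gzip-compressed byte count)."""
    result = {}
    for enc in ENCODINGS:
        raw = text.encode(enc)
        result[enc] = (len(raw), len(gzip.compress(raw)))
    return result

# Placeholder samples (assumption): swap in real documents for a fair test.
english = "The quick brown fox jumps over the lazy dog. " * 200
russian = "Съешь же ещё этих мягких французских булок, да выпей чаю. " * 200

for label, text in (("english", english), ("russian", russian)):
    for enc, (raw_size, gz_size) in raw_and_gzipped(text).items():
        print(f"{label:8} {enc:10} raw={raw_size:6} gzip={gz_size:6}")
```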

[–] jmcs@discuss.tchncs.de 17 points 2 years ago (1 children)

Besides, most text on the average computer is either in some configuration file (which tends to use Latin script) or in some SGML-derived format that has a bunch of Latin characters in it. For network transmission, most things will use HTML, XML or JSON with English-language property names, even in countries where English isn't spoken (see Yandex's and Baidu's APIs for example).
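
As a rough illustration, here is how the bytes split in a tiny JSON payload. The property names and values are made up for this sketch and are not taken from any real API.

```python
import json

# Hypothetical payload: English property names, Russian value.
payload = {"query": "погода", "language": "ru"}

# ensure_ascii=False keeps the Cyrillic as real UTF-8 instead of \uXXXX escapes.
encoded = json.dumps(payload, ensure_ascii=False).encode("utf-8")

ascii_bytes = sum(1 for b in encoded if b < 0x80)
non_ascii_bytes = len(encoded) - ascii_bytes

print(f"total={len(encoded)} ascii={ascii_bytes} non_ascii={non_ascii_bytes}")
```

Even with a Cyrillic value, the keys, quotes and punctuation keep most of the payload in single-byte ASCII.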

No one is moving large amounts of .txt files around.

[–] Buckshot@programming.dev 24 points 2 years ago (2 children)

You've never worked in finance then. All our systems at work do nothing but move large amounts of txt files around.

That said, many of our clients still don't support UTF-8, so it's all ASCII and non-Latin alphabets are screwed. They can't even handle characters 128-255, so even stuff like £ is unsupported.
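
For anyone wondering why £ in particular breaks, here is a small Python sketch. The "Fee: £25" string is an invented example, and the `errors="replace"` fallback is only one way a system might degrade, not a description of how any particular client behaves.

```python
pound = "£"  # U+00A3, code point 163, so it falls in the 128-255 range

print(pound.encode("latin-1"))  # b'\xa3'
print(pound.encode("utf-8"))    # b'\xc2\xa3'

# Strict 7-bit ASCII has no slot for it at all.
try:
    pound.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII cannot represent it:", exc.reason)

# A lossy fallback silently swaps the character for a question mark.
lossy = "Fee: £25".encode("ascii", errors="replace")
print(lossy)  # b'Fee: ?25'
```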

[–] LaggyKar@programming.dev 11 points 2 years ago

> That said, many of our clients still don't support UTF-8, so it's all ASCII and non-Latin alphabets are screwed.

Ah, yes, I heard about that sort of thing. Some bank getting a GDPR complaint because they couldn't correct the spelling of someone's name, because their system uses EBCDIC.
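
A minimal sketch of how that kind of failure can happen, using Python's `cp037` codec as a stand-in. Which EBCDIC code page the bank actually used is not known here, and the name is a made-up example.

```python
name = "Łukasz"  # hypothetical name; Ł (U+0142) is not in the cp037 repertoire

# Assumption: the system stores text in EBCDIC code page 037 and replaces
# anything it cannot represent instead of rejecting the record.
stored = name.encode("cp037", errors="replace")
restored = stored.decode("cp037")
print(restored)  # ?ukasz

# EBCDIC also lays out the basic letters differently from ASCII.
print("A".encode("cp037"))  # b'\xc1'
print("A".encode("ascii"))  # b'A'
```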

[–] anytimesoon@lemmy.ml 7 points 2 years ago

> finance

> even stuff like £ is unsupported.

Probably not an issue then...