Programming

22709 readers

42 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 2 years ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

MaungaHikoi@lemmy.nz

UlrikHD@programming.dev

How do I design a multi-language database? (lemmy.dbzer0.com)

submitted 2 days ago* (last edited 2 days ago) by orsetto@lemmy.dbzer0.com to c/programming@programming.dev

21 comments fedilink hide all child comments

Soo I'm working on a database that needs to support multiple languages (two for now, but who knows). I stumbled across this blog post that explains how to develop what it calls a "translation subschema" (haven't seen it called like this anywhere else so I don't know if it's actually how you'd call it), which seems like a very nice way of dealing with things.

I'm not very experienced with DB stuff, so it took me a while to fully understand what it was doing, but now that (I think) I do, I'm wondering if I could just ignore the Languages table, and just use a language field in the tables TextContent and Translations, without loosing any functionality. (except of course having a table listing the available languages, which is not however something I'm interested in)

In my head everything would still work, I'd insert stuff with

INSERT INTO TextContent (OriginalText, OriginalLanguage)
VALUES ("Ciao", "it");

DECLARE TextContentId INT = SCOPE_IDENTITY();

INSERT INTO Translations (TextContentId, Language, Translation)
VALUES (@TextContentId, "it", "Ciao");
INSERT INTO Translations (TextContentId, Language, Translation)
VALUES (@TextContentId, "en", "Hello");

and given a TextContentId, i'd retrieve the correct translation with

SELECT Translation FROM Translations WHERE TextContentId = TCId AND Language = "en"

At this point, I'm thinking I could drop TextContent too, and just have a Translations table with TextContentId, Language, and Translation, with (TextContentId, Language) as primary key.

Am I missing something? I'm trying to simplify this solution but I don't want to risk making random errors.

Edit: translations on the DB are for user inserted text, which will also provide translations. The client then will only receive text it the correct language.

you are viewing a single comment's thread
view the rest of the comments

[–] TehPers@beehaw.org 15 points 2 days ago (5 children)

Localization is a hard problem, but storing your translations in the DB is a bit unusual unless you're trying to translate user data or something.

I'd recommend looking into tools like Project Fluent or similar that are designed around translating.

As for the schema you have, if you're sticking with it, I would change the language into an IETF language tag or similar instead. The important part is that it separates language variants. For example, US English and British (or international) English have differences, Brazilian Portuguese and Portugal Portuguese have differences, Mexican Spanish and Spain Spanish have differences, etc.

Using an ID instead of the text content itself as part of the PK should be a no-brainer. Languages evolve over time, and translations change. PKs should not. Your choice of PK = (TextContentId, Language) is the most reasonable to me, though I still think that translations should live as assets to your application instead to better integrate with existing localization tools.

One last thing: people tend to believe that translating is enough to localize. It is not. For example, RTL languages often swap the entire UI direction to RTL, not just the text direction. Also, different cultures sometimes use different colors and icons than each other.

[–] xianjam@programming.dev 1 points 2 days ago (1 children)

Why is storing resources in the database unusual? I've done that my entire career, and I've believed it to be a best practice.

[–] TehPers@beehaw.org 2 points 2 days ago

Storing UI assets in a database is unusual because assets aren't data, they are part of your UI. This is of course assuming a website - an application may choose to save assets in a local sqlite database or similar for convenience.

It's the same reason I wouldn't store static images in a database though - there's no reason to do so. Databases provide no additional value over just storing the images next to the code, and same with localizations.

User-generated content changes things because that data is now dynamically generated, not static assets for a frontend.

load more comments (3 replies)