Language Names, In Those Languages

While looking for a simple table mapping the ISO 639-1 two-letter language codes to localized versions of language names, I could not find this information published in a straightforward, sensible form like CSV or a JSON object.

What I’m looking for is something like:
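The original sample did not survive on this page, so here is a minimal sketch of the shape in Python. The variable name `language_name_map` and the helper `language_name` are illustrative, and only German and English are filled in.

```python
# Outer key: the locale doing the naming. Inner key: the language being named.
language_name_map = {
    "de": {"de": "Deutsch", "en": "Englisch"},
    "en": {"de": "German", "en": "English"},
}


def language_name(locale_code, language_code):
    """Look up how `language_code` is called by speakers of `locale_code`."""
    return language_name_map[locale_code][language_code]
```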

This way, when I want to display the name of a language in a particular locale, I can just do a simple lookup: languageNameMap[localeCode][languageCode].

So here is my attempt at putting something like this together, using reference material from the Unicode Common Locale Data Repository (CLDR).

Inside of the core.zip file, a number of locale definition files are located under common/main/*.xml:
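One way to see which locale files are available without unpacking the archive is to list them from Python. This is a rough sketch: the function name `list_locale_files` is made up for illustration, and it assumes the archive keeps the common/main/ layout described above.

```python
import zipfile
from pathlib import PurePosixPath


def list_locale_files(archive):
    """Return the sorted names of the .xml files directly under common/main/.

    `archive` may be a path to core.zip or an open binary file object.
    """
    main_dir = PurePosixPath("common/main")
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.endswith(".xml") and PurePosixPath(name).parent == main_dir
        )
```

Calling `list_locale_files("core.zip")` should then return entries such as `common/main/de.xml` and `common/main/en.xml`.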

Each of these files contains a list of the world’s languages, as they would be named in that locale.

For example, the German locale definition file “de.xml”, like many of the other files, contains a “languages” list that looks like:
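The excerpt itself is missing here, so the snippet below uses a short hand-written fragment in the shape of the CLDR markup (the real file lists many more languages) and reads it with Python's standard XML parser. The names `DE_XML_EXCERPT` and `read_language_names` are illustrative. Skipping entries that carry an `alt` attribute is an assumption on my part, made to avoid alternate spellings overwriting the default name.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt shaped like de.xml; not a verbatim copy of the CLDR file.
DE_XML_EXCERPT = """<ldml>
  <localeDisplayNames>
    <languages>
      <language type="de">Deutsch</language>
      <language type="en">Englisch</language>
      <language type="fr">Französisch</language>
    </languages>
  </localeDisplayNames>
</ldml>
"""


def read_language_names(xml_text):
    """Map each language code in the "languages" list to its localized name."""
    root = ET.fromstring(xml_text)
    return {
        element.get("type"): element.text
        for element in root.findall("./localeDisplayNames/languages/language")
        if element.get("alt") is None  # assumption: keep only the default name
    }
```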

Now, let’s say we want the names of the German language and the English language, in those languages, respectively. The output should be a 2 x 2 grid of language names.

I’ve written a PHP script to parse the necessary locale definition files and to create a JSON object containing this information:
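The PHP script is not reproduced on this page. As a stand-in, here is a rough Python sketch of the same idea; it is not the original script. It assumes the locale files have been extracted to a directory, that each file is named after its locale code, and that entries with an `alt` attribute can be ignored. All function names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def language_names_in_locale(xml_path, wanted_codes):
    """Read one locale file and return {language code: localized name}."""
    root = ET.parse(str(xml_path)).getroot()
    names = {}
    for element in root.findall("./localeDisplayNames/languages/language"):
        code = element.get("type")
        # Assumption: entries with an "alt" attribute are alternate forms.
        if code in wanted_codes and element.get("alt") is None:
            names[code] = element.text
    return names


def build_language_name_map(main_dir, codes):
    """Build {locale code: {language code: name}} for the given codes."""
    return {
        locale: language_names_in_locale(Path(main_dir) / f"{locale}.xml", codes)
        for locale in codes
    }


def language_name_map_json(main_dir, codes):
    """Serialize the map as compact JSON, keeping non-ASCII names readable."""
    return json.dumps(build_language_name_map(main_dir, codes), ensure_ascii=False)
```

With the archive unpacked, `language_name_map_json("common/main", ["de", "en"])` would produce the 2 x 2 grid described above.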

When it is run from the command line (with a little help from Python, since the PHP that ships with OS X by default doesn’t pretty-print JSON), the following should pop out:
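The captured output is missing from this page. The sketch below shows the pretty-printing step done in Python on a compact JSON string; the German and English values are the ones I would expect for the 2 x 2 case, not a copy of the original output.

```python
import json

# Compact JSON, as a script might emit it on a single line (illustrative values).
compact = '{"de":{"de":"Deutsch","en":"Englisch"},"en":{"de":"German","en":"English"}}'

# Re-serialize with indentation; ensure_ascii=False keeps accented names readable.
pretty = json.dumps(json.loads(compact), indent=4, ensure_ascii=False)
print(pretty)
```

Python's standard library also exposes the same formatting from the shell as `python -m json.tool`, which reads JSON on standard input.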

So if you want to use this later to display the name of the English language in German, you just do something like languageNameMap['de']['en']. (I realize it might even be easier to rewrite the script so it’s LNM['en']['de'] instead, but I’ll leave that as an exercise for the reader.)
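For what it's worth, that exercise does not require touching the script: the finished map can be flipped afterwards. A small Python sketch, with the function name `flip_language_name_map` made up for illustration:

```python
def flip_language_name_map(name_map):
    """Turn {locale: {language: name}} into {language: {locale: name}}."""
    flipped = {}
    for locale_code, names in name_map.items():
        for language_code, name in names.items():
            flipped.setdefault(language_code, {})[locale_code] = name
    return flipped


by_locale = {
    "de": {"de": "Deutsch", "en": "Englisch"},
    "en": {"de": "German", "en": "English"},
}
by_language = flip_language_name_map(by_locale)
```

After the flip, `by_language["en"]["de"]` holds the German name for English.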
