While looking around for a decent spreadsheet containing a map between the ISO 639-1 two-letter language codes and localized versions of language names, I could not find a straightforward version of this information published in a sensible form like CSV or as a JSON object.
What I’m looking for is something like:
var languageNameMap = {
'de': { 'de': 'Deutsch', 'en': 'Englisch', 'fr': 'Französisch' },
'en': { 'de': 'German', 'en': 'English', 'fr': 'French' },
'fr': { 'de': 'Allemand', 'en': 'Anglais', 'fr': 'Français' }
};
This way, when I want to display the name of a language in a particular locale, I can just do a simple lookup: languageNameMap[localeCode][languageCode].
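In practice the map will not cover every language, so a bare lookup can come back undefined. As a minimal sketch, assuming a hypothetical helper named getLanguageName (it is not part of the script below) and a cut-down two-language map, the lookup could fall back to the language code itself:

```javascript
// Hypothetical helper (an assumption for illustration): look up a language
// name, falling back to the bare language code when the map has no entry.
function getLanguageName(map, localeCode, languageCode) {
  var names = map[localeCode];
  if (names && Object.prototype.hasOwnProperty.call(names, languageCode)) {
    return names[languageCode];
  }
  return languageCode;
}

// Cut-down sample data in the same shape as languageNameMap above.
var sampleMap = {
  'de': { 'de': 'Deutsch', 'en': 'Englisch' },
  'en': { 'de': 'German', 'en': 'English' }
};

console.log(getLanguageName(sampleMap, 'de', 'en')); // Englisch
console.log(getLanguageName(sampleMap, 'de', 'ja')); // ja (no entry, so the code is returned)
```

Returning the code rather than undefined keeps a UI from showing a blank label, at the cost of hiding gaps in the data.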
So here is my attempt at putting together something like this, using reference material from the Unicode Common Locale Data Repository.
Inside of the core.zip file, a number of locale definition files are located under common/main/*.xml:
-rw-r--r--@ 1 user staff 8202 Aug 1 22:53 aa.xml
-rw-r--r--@ 1 user staff 702 Apr 27 2011 aa_DJ.xml
-rw-r--r--@ 1 user staff 99651 Oct 11 12:59 af.xml
-rw-r--r--@ 1 user staff 2033 Sep 23 15:02 af_NA.xml
-rw-r--r--@ 1 user staff 297 May 5 2009 af_ZA.xml
-rw-r--r--@ 1 user staff 27386 Sep 23 15:02 agq.xml
-rw-r--r--@ 1 user staff 298 Aug 1 22:53 agq_CM.xml
-rw-r--r--@ 1 user staff 24593 Oct 11 03:06 ak.xml
[...]
The larger, base-language files each contain a list of the world’s languages, as they would be named in that locale.
For example, in the German language locale definition file “de.xml” and many of the other files, there’s a “languages” list that looks like:
<languages>
<language type="aa">Afar</language>
<language type="ab">Abchasisch</language>
<language type="ace">Aceh-Sprache</language>
<language type="ach">Acholi-Sprache</language>
<language type="ada">Adangme</language>
[...]
</languages>
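To show the shape of the data this list yields, here is a rough sketch that turns a snippet like the one above into a code-to-name object. It uses a regular expression rather than a real XML parser, so treat it as an illustration only: the function name extractLanguageNames is made up, entries carrying any attribute other than type are skipped, and XML entities are not decoded.

```javascript
// Sketch only: pulls <language type="..."> entries out of a CLDR-style
// snippet with a regular expression. A real XML parser is the safer choice
// for full files; this just shows the shape of the resulting data.
function extractLanguageNames(xml) {
  var names = {};
  // Matches elements whose only attribute is type="...".
  var pattern = /<language\s+type="([^"]+)"\s*>([^<]*)<\/language>/g;
  var match;
  while ((match = pattern.exec(xml)) !== null) {
    names[match[1]] = match[2];
  }
  return names;
}

// A shortened copy of the de.xml excerpt shown above.
var sampleXml = [
  '<languages>',
  '<language type="aa">Afar</language>',
  '<language type="ab">Abchasisch</language>',
  '<language type="ace">Aceh-Sprache</language>',
  '</languages>'
].join('\n');

var names = extractLanguageNames(sampleXml);
console.log(names['ab']); // Abchasisch
```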
Now, let’s say we want the names of the German language and the English language, each written in both German and English. The output should be a 2 x 2 grid of language names.
I’ve written a PHP script to parse the necessary locale definition files and to create a JSON object containing this information:
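<?php
// Usage:
// php create_language_map.php CLDR-main-dir "2 letter language codes separated by spaces"
if ($argc < 3)
{
    fwrite(STDERR, "Usage: php create_language_map.php CLDR-main-dir \"de en ...\"\n");
    exit(1);
}
$cldrDir = $argv[1];
$which = $argv[2];
$whichList = explode(' ', $which);
// Collects $output[locale][language] = name of that language in that locale.
$output = array();
foreach ($whichList as $languageA)
{
    $localeFile = $cldrDir . "/" . "$languageA.xml";
    // Load the locale definition file.
    if ($data = simplexml_load_file($localeFile))
    {
        /*
        Example of what the xpath() call below returns:
        array(1) {
          [0]=>
          object(SimpleXMLElement)#11 (2) {
            ["@attributes"]=>
            array(1) {
              ["type"]=>
              string(2) "en"
            }
            [0]=>
            string(20) "الإنجليزية"
          }
        }
        */
        // Loop over the language codes and get the names we want.
        foreach ($whichList as $languageB)
        {
            $L = $data->xpath("//languages/language[@type='$languageB']");
            // Only accept an unambiguous, single match.
            if (is_array($L) && 1 == count($L))
            {
                // Coerce the SimpleXMLElement to a string.
                $output[$languageA][$languageB] = (string) $L[0];
            }
        }
    }
}
// defined() takes the constant's name as a string.
// JSON_PRETTY_PRINT only exists in PHP 5.4 and later.
if (defined('JSON_PRETTY_PRINT'))
    print json_encode($output, JSON_PRETTY_PRINT);
else
    print json_encode($output);
?>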
When it is run from the command line (with a little help from Python, since the PHP that ships with OS X by default doesn’t support pretty-printing JSON), the following should pop out:
$ php create_language_map.php ~/Downloads/core/common/main "de en" | python -m json.tool
{
"de": {
"de": "Deutsch",
"en": "Englisch"
},
"en": {
"de": "German",
"en": "English"
}
}
So if you want to use this later to display the name of the English language in German, you just do something like languageNameMap['de']['en']. (I realize it might even be easier to rewrite the script so it’s LNM['en']['de'] instead, but I’ll leave that as an exercise for the reader.)
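For anyone who would rather not touch the PHP, the same flip can be done on the generated object after the fact. This is one possible sketch, not the only way to do it; invertLanguageNameMap is a hypothetical name, and the sample data is the two-language output shown above.

```javascript
// Hypothetical helper: swap the two index levels so the result is keyed
// by [languageCode][localeCode] instead of [localeCode][languageCode].
function invertLanguageNameMap(map) {
  var inverted = {};
  Object.keys(map).forEach(function (localeCode) {
    Object.keys(map[localeCode]).forEach(function (languageCode) {
      if (!inverted[languageCode]) {
        inverted[languageCode] = {};
      }
      inverted[languageCode][localeCode] = map[localeCode][languageCode];
    });
  });
  return inverted;
}

var languageNameMap = {
  'de': { 'de': 'Deutsch', 'en': 'Englisch' },
  'en': { 'de': 'German', 'en': 'English' }
};

var LNM = invertLanguageNameMap(languageNameMap);
console.log(LNM['en']['de']); // Englisch (the name of English, in German)
```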