Internationalization (I18N) and Localization (L10N)

Many web applications built with PHP were not written with internationalization in mind. They may never have been intended for use in other languages and cultures. Internationalization has become important because of the increasing adoption of the Internet in many non-English-speaking countries. The process of internationalization and localization involves some difficulties. Below are some general guidelines for internationalizing an existing application.

Separate culture/locale sensitive data

Identify and separate data that varies with culture. The most obvious are text, strings, and messages. Other types of data, such as the dates, times, numbers, and currency amounts covered later in this section, should also be considered.

If possible, all text should be isolated and stored in a persistent format. This text includes application error messages, hard-coded strings in PHP files, emails, static HTML text, and text on form elements (e.g. buttons).

Configuration

To enable the localization features in Prado, you need to add a few options to your application configuration. First, you need to add the System.I18N.* namespace to your paths.

Then, if you wish to translate some text in your application, you need to add one translation message data source.

In the translation element, source is the dot path to the directory where you are going to store your translated message catalogue. The autosave attribute, if enabled, saves untranslated messages back into the message catalogue. With cache enabled, translated messages are saved in the application runtime/i18n directory.

With the configuration complete, you can now start to localize your application. If you have autosave enabled, after running your application with some localization activity (i.e. translating some text), you will see a directory and a messages.xml file created within your source directory.

What to do with messages.xml?

The translation message catalogue file, if using type="XLIFF", is in a standardized XML format for interchanging translation messages. You can edit the XML file using any UTF-8 aware editor. The format of the XML is something like the following.

<trans-unit>
  <source>Hello world.</source>
  <target>Hi World!!!</target>
</trans-unit>

Each translation message is wrapped within a trans-unit tag, where source is the original message and target is the translated message. Editors such as Heartsome XLIFF Translation Editor can help in editing these XML files.
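To make the trans-unit structure concrete, here is a minimal sketch that reads such a catalogue into a source-to-target array using PHP's SimpleXML extension. The sample document, its id and file attributes, and the read_catalogue() helper are all assumptions made for illustration; this is not how PRADO itself loads catalogues.

```php
<?php
// Hypothetical sample catalogue. Only the trans-unit, source, and target
// elements come from the description above; the rest is assumed.
$xliff = <<<'XML'
<?xml version="1.0"?>
<xliff version="1.0">
  <file source-language="EN" datatype="plaintext" original="example">
    <body>
      <trans-unit id="1">
        <source>Hello world.</source>
        <target>Hi World!!!</target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML;

/**
 * Build a source => target map from an XLIFF string.
 * read_catalogue() is an illustrative helper, not a PRADO function.
 */
function read_catalogue(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new InvalidArgumentException('Catalogue is not well-formed XML.');
    }
    $messages = [];
    // The hyphen in "trans-unit" requires the brace syntax for property access.
    foreach ($doc->file->body->{'trans-unit'} as $unit) {
        $messages[(string) $unit->source] = (string) $unit->target;
    }
    return $messages;
}

$messages = read_catalogue($xliff);
echo $messages['Hello world.'], PHP_EOL; // prints "Hi World!!!"
```

A lookup keyed by the original message is what lets untranslated text fall back to the source string when no entry exists.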

Setting and Changing Culture

Once globalization is enabled, you can access the globalization settings, such as Culture and Charset, using

$globalization = $this->getApplication()->getGlobalization();
echo $globalization->Culture;
$globalization->Charset = "GB-2312"; // change the charset

You can also change the way the culture is determined by changing the class attribute in the module configuration. For example, to set the culture depending on the browser settings, you can use the TGlobalizationAutoDetect class. ...
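Browser-based detection generally relies on the Accept-Language request header. The sketch below shows one way such a header could be reduced to a single culture name. The detect_culture() helper, the default of "en", and the conversion of hyphens to underscores are assumptions for illustration, and the code makes no claim about how TGlobalizationAutoDetect is implemented.

```php
<?php
/**
 * Pick the preferred culture from an Accept-Language header value.
 * detect_culture() is an illustrative helper, not part of PRADO.
 */
function detect_culture(string $header, string $default = 'en'): string
{
    $best = $default;
    $bestQ = 0.0;
    foreach (explode(',', $header) as $item) {
        $parts = explode(';', trim($item));
        $tag = trim($parts[0]);
        if ($tag === '' || $tag === '*') {
            continue; // skip empty entries and the wildcard
        }
        $q = 1.0; // quality defaults to 1 when no q parameter is given
        foreach (array_slice($parts, 1) as $param) {
            $param = trim($param);
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q > $bestQ) { // strict comparison keeps the earliest tag on ties
            $best = str_replace('-', '_', $tag); // assumed form: zh-CN becomes zh_CN
            $bestQ = $q;
        }
    }
    return $best;
}

echo detect_culture('zh-CN,zh;q=0.9,en;q=0.8'), PHP_EOL; // prints "zh_CN"
```

In a real request the header value would typically come from $_SERVER['HTTP_ACCEPT_LANGUAGE'], which may be absent, so a default culture is still needed.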

You may also provide your own globalization class to change how the application culture is set. Lastly, you can change the globalization settings on a page-by-page basis using template control tags. For example, the following changes the Culture to "zh".

<%@ Application.Globalization.Culture="zh" %>

Localizing your Prado application

There are two areas in your application that may need message or string localization: the PHP code and the templates. To localize strings within PHP, use the localize function detailed below. To localize text in a template, use the TTranslate component.

Using localize function to translate text within PHP

The localize function searches your translation source for a translated string that matches the original. First, you need to locate all the hard-coded text in PHP that is displayed or sent to the end user. The following example localizes the text of the $sender (assuming, say, the sender is a button). The original code before localization is as follows.

function clickMe($sender,$param)
{
    $sender->Text="Hello, world!";
}

The hard coded message "Hello, world!" is to be localized using the localize function.

function clickMe($sender,$param)
{
    $sender->Text=localize("Hello, world!");
}

Compound Messages

Compound messages can contain variable data. For example, in the message "There are 12 users online.", the integer 12 may change depending on some data in your application. This is difficult to translate because the position of the variable data may differ between languages. In addition, different languages have their own rules for plurals (if any) and quantifiers. The following example cannot be translated easily, because the sentence structure is fixed by hard-coding the variable data within the message.

$num_users = 12;
$message = "There are " . $num_users . " users online.";

This problem can be solved using the localize function with string substitution. For example, the $message string above can be constructed as follows.

$num_users = 12;
$message = localize("There are {num_users} users online.", array('num_users'=>$num_users));

The second parameter of localize takes an associative array in which each key is the substitution to find in the text and the associated value is its replacement. The localize function does not solve the problem of localizing languages that have plural forms; for that, use TChoiceFormat.
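The reason placeholders help is that the translated text decides where the value lands. The following is a minimal sketch of lookup plus substitution; sketch_localize(), the $catalogue array, and the German entry are hypothetical stand-ins for illustration, not PRADO's localize implementation.

```php
<?php
/**
 * Look up a translation and substitute {name} placeholders.
 * sketch_localize() and $catalogue are illustrative, not PRADO's code.
 */
function sketch_localize(string $text, array $params = [], array $catalogue = []): string
{
    // Fall back to the original text when no translation is available.
    $translated = $catalogue[$text] ?? $text;

    // Wrap each key in braces so that "num_users" matches "{num_users}".
    $pairs = [];
    foreach ($params as $key => $value) {
        $pairs['{' . $key . '}'] = (string) $value;
    }
    return $pairs === [] ? $translated : strtr($translated, $pairs);
}

// Hypothetical catalogue entry in which the placeholder sits elsewhere.
$catalogue = [
    'There are {num_users} users online.' => 'Derzeit sind {num_users} Benutzer online.',
];

$source = 'There are {num_users} users online.';
echo sketch_localize($source, ['num_users' => 12]), PHP_EOL;
// prints "There are 12 users online."
echo sketch_localize($source, ['num_users' => 12], $catalogue), PHP_EOL;
// prints "Derzeit sind 12 Benutzer online."
```

Because substitution runs after the lookup, the translator is free to move {num_users} anywhere in the sentence without any change to the calling code.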


I18N Components

TTranslate

Messages and strings can be localized in PHP or in templates. To translate a message or string in the template, use TTranslate.

<com:TTranslate>Hello World</com:TTranslate>
<com:TTranslate Text="Goodbye" />

TTranslate can also perform string substitution. Each occurrence of {attribute name} in the translation is replaced with the value of the TTranslate attribute of that name. E.g.

<com:TTranslate time="late"> The time is {time}. </com:TTranslate>

A shorthand for TTranslate is also provided using the following syntax.

<%[string]%>

where string will be translated to different languages according to the end-user's language preference. This syntax can be used with attribute values as well.

<com:TLabel Text="<%[ Hello World! ]%>" />

TDateFormat

Formatting localized dates and times is straightforward.

<com:TDateFormat Value="12/01/2005" />

There are four localized date patterns and four localized time patterns. They can be used in any combination. In a combined pattern, the date pattern must come first, followed by a space and then the time pattern. For example, the following uses the full date pattern with the short time pattern.

<com:TDateFormat Pattern="fulldate shorttime" />

If the Value property is not specified, the current date and time is used.

TNumberFormat

PRADO's internationalization framework provides localized currency formatting and number formatting. Please note that the TNumberFormat component provides formatting only; it does not perform currency conversion or exchange.

<com:TNumberFormat Type="currency" Value="100" />

The Culture and Currency properties may be specified to format locale-specific numbers.

TTranslateParameter

Compound messages, i.e. string substitution, can be accomplished with TTranslateParameter. In the following example, the strings "{greeting}" and "{name}" will be replaced with the values "Hello" and "World", respectively. The substitution string must be enclosed in "{" and "}". The parameters can be further translated by using TTranslate.

<com:TTranslate>
  {greeting} {name}!
  <com:TTranslateParameter Key="name">World</com:TTranslateParameter>
  <com:TTranslateParameter Key="greeting">Hello</com:TTranslateParameter>
</com:TTranslate>

TChoiceFormat

Using the localize function or the TTranslate component to translate messages does not tell the translator the cardinality of the data, which is needed to determine the correct plural structure. It only tells them that there is some variable data, which could be anything. Thus, the translator cannot determine the correct plural, language structure, or phrase to use for the substituted data. E.g. in English, the translation of the sentence "There are {number} of apples." should differ depending on the number of apples.

The TChoiceFormat component performs message/string choice translation. The following example demonstrates a simple two-choice message translation.

<com:TChoiceFormat Value="1">[1] One Apple. |[2] Two Apples</com:TChoiceFormat>

In the above example, the Value is "1" (one), so the translated string is "One Apple". If the Value were "2", it would show "Two Apples".

The message/string choices are separated by the pipe "|", and each choice begins with a set written in set notation, such as [1] in the example above.

Any non-empty combination of square and round brackets as delimiters is acceptable. The string chosen for display depends on the Value property: the Value is tested against each set in turn until it is found to belong to one.
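The evaluation order described above can be sketched as a small function that walks the choices and returns the first message whose set contains the value. The sketch assumes the usual mathematical reading of the delimiters (a square bracket includes the endpoint, a round bracket excludes it) and accepts Inf as a bound. Both points, along with the choose_message() and parse_bound() helpers, are assumptions for illustration rather than a description of TChoiceFormat's actual parser.

```php
<?php
/**
 * Convert a bound such as "1", "-Inf", or "Inf" to a float.
 * Accepting "Inf" is an assumption made for this sketch.
 */
function parse_bound(string $text): float
{
    $text = strtolower(trim($text));
    if ($text === 'inf' || $text === '+inf') {
        return INF;
    }
    if ($text === '-inf') {
        return -INF;
    }
    return (float) $text;
}

/**
 * Return the first message whose leading set contains $value, or null.
 * choose_message() is an illustrative helper, not part of PRADO.
 */
function choose_message(float $value, string $choices): ?string
{
    // Matches "[a]", "[a,b]", "(a,b)", "[a,b)" or "(a,b]" followed by the message.
    $pattern = '/^\s*([\[(])\s*([^,\])]+?)\s*(?:,\s*([^\])]+?)\s*)?([\])])\s*(.*)$/s';
    foreach (explode('|', $choices) as $choice) {
        if (!preg_match($pattern, $choice, $m)) {
            continue; // skip choices without a recognizable set
        }
        [, $open, $low, $high, $close, $message] = $m;
        $high = ($high === '') ? $low : $high; // "[1]" means the single value 1
        $lowBound = parse_bound($low);
        $highBound = parse_bound($high);
        $aboveLow = ($open === '[') ? $value >= $lowBound : $value > $lowBound;
        $belowHigh = ($close === ']') ? $value <= $highBound : $value < $highBound;
        if ($aboveLow && $belowHigh) {
            return trim($message);
        }
    }
    return null;
}

echo choose_message(2, '[1] One Apple. |[2] Two Apples'), PHP_EOL;
// prints "Two Apples"
echo choose_message(5, '[0] No apples|[1] One apple|(1,Inf) Many apples'), PHP_EOL;
// prints "Many apples"
```

Returning null when no set matches leaves the decision about a fallback message to the caller.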