Internationalize your software, Part 3

Learn how to develop software for the global marketplace

Page 4 of 5

So what happens if your code needs a date formatter for a locale that's not supported? Java provides a solution via the SimpleDateFormat and DateFormatSymbols classes. (These classes are used internally by DateFormat.) SimpleDateFormat allows you to create your own patterns for controlling the visual representation of a formatted value. DateFormatSymbols allows you to define the actual characters that appear in this visual representation -- such as month names and time zone strings.
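As a rough illustration of how these two classes fit together, the following sketch builds a formatter with custom month names. The class name CustomDateFormatSketch, the pattern "d MMMM yyyy", and the choice of Esperanto month names are assumptions made for this example only; they stand in for whatever locale your runtime happens not to support.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class CustomDateFormatSketch {
    // Esperanto month names, used only as a stand-in for an unsupported
    // locale (an assumption for illustration, not part of the article).
    static final String[] MONTHS = {
        "januaro", "februaro", "marto", "aprilo", "majo", "junio",
        "julio", "a\u016dgusto", "septembro", "oktobro", "novembro", "decembro"
    };

    static String format(Date date) {
        // Start from the US symbols and replace only the month names.
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.US);
        symbols.setMonths(MONTHS);
        // Pattern: day of month, full month name, four-digit year.
        SimpleDateFormat sdf = new SimpleDateFormat("d MMMM yyyy", symbols);
        return sdf.format(date);
    }

    public static void main(String[] args) {
        Calendar c = new GregorianCalendar(1998, Calendar.AUGUST, 11);
        System.out.println(format(c.getTime()));
    }
}
```

The pattern letters control the layout, while the DateFormatSymbols object supplies the text that the MMMM field expands to.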

Please consult The Java Tutorial (see Resources) for more information on using these classes. Detailed information about SimpleDateFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.SimpleDateFormat.html. And detailed information about DateFormatSymbols is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.DateFormatSymbols.html.

Message formatters

A message formatter is an object that formats a compound message. What's a compound message? Before defining this term, let's look at a definition for message.

A message is textual data (usually) that provides the user with status or error information, descriptive names (such as widget names), and so on. There are two categories of messages: simple and compound. A simple message consists of static (non-changing) text, and a compound message consists of static and variable (changing) text, such as dates, currencies, and file counts. This table provides an example of a compound message; its variable text consists of the error number, the disk name, the backup date, the file count, and the account balance.

Error 26!
Disk "Accounts" last backed up on August 11, 1998.
826 files were deleted
Account balance is: $3,659.23

Unlike simple messages, compound messages cannot be stored directly in resource bundles. (After all, how can we store variable text?) However, as we will see, there is an indirect way to store compound messages in resource bundles.

Each variable text item can be replaced by an argument: instructional text that provides a message formatter with information on how to format that item. The resulting combination of static text and arguments is known as a message pattern. The next table shows a message pattern that's based on the compound message in the previous table, with an argument in place of each variable text item.

Error {0, number, integer}!
Disk "{1}" last backed up on {2, date, long}.
{3, number, integer} files were deleted
Account balance is: {4, number, currency}

Each argument is surrounded by brace characters ("{}"). The first component of an argument is a digit -- the argument number. Argument numbers identify arguments and do not need to be placed in any particular order. For example, I could have placed the digit 1 in the argument that follows Error and the digit 0 in the argument that follows Disk. This table describes each argument.

Argument: Description
{0, number, integer}: This argument represents a Number object, formatted using the INTEGER style.
{1}: This argument will be associated with a corresponding String object in the resource bundle that represents a disk name.
{2, date, long}: This argument represents the date portion (day, month, and year) of a Date object. This date will be formatted using the date formatter's LONG formatting style.
{3, number, integer}: This argument represents a Number object, formatted using the INTEGER style.
{4, number, currency}: This argument represents a Number object, formatted using the CURRENCY style.

The message pattern now can be stored in a resource bundle that's backed by a properties file for each locale. Let's create a resource bundle called Demo. For the United States locale, the name of this properties file is Demo_en_US.properties.

The following table shows the contents of this file. Each logical line in the template string is terminated by the Unicode linefeed character ('\u000A') so that it will appear on a separate line of output. The single quotes around each linefeed escape are MessageFormat quoting characters; they do not appear in the formatted message. The continuation character ("\") tells the properties file loader that the next physical line is part of the same value.

# Demo_en_US.properties

template = Error {0, number, integer}'\u000A'\
           Disk "{1}" last backed up on {2, date, long}.'\u000A'\
           {3, number, integer} files were deleted'\u000A'\
           Account balance is: {4, number, currency}'\u000A'

diskname = Accounts

The code fragment below shows how to obtain the resource bundle for the United States locale:

ResourceBundle mb = ResourceBundle.getBundle ("Demo", Locale.US);

The next step is to create an array of arguments, as follows:

Object [] arguments = { new Integer (26), mb.getString ("diskname"), null, new Integer (826), new Double (3659.23) };

Calendar c = Calendar.getInstance ();
c.set (1998, 7, 11); // months are zero-based, so 7 is August
arguments [2] = c.getTime ();

The position of each element in the arguments array must match the corresponding argument number in the message pattern. For example, element 2 in the preceding code fragment is initially null; after it's been set to a specific Date object, it matches {2, date, long} in the message pattern.

Once the arguments array has been created, we can create a message formatter:

MessageFormat mf = new MessageFormat ("");
mf.setLocale (Locale.US);
mf.applyPattern (mb.getString ("template"));

In addition to creating the message formatter, the preceding code initializes its locale and establishes the message pattern that this formatter will use. The message can now be formatted by calling the MessageFormat's format (Object []) method, as shown in this code fragment:

String result = mf.format (arguments);
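The fragments above can be pulled together into one small program. The following is only a sketch, not the article's example17.java: the class name MessageFormatSketch is hypothetical, the pattern and the disk name are inlined so that the code runs without Demo_en_US.properties, and Integer.valueOf and Double.valueOf are used in place of the constructors shown earlier.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class MessageFormatSketch {
    // Inlined pattern (assumption: in the article it comes from the
    // "template" key of the Demo resource bundle).
    static final String TEMPLATE =
        "Error {0, number, integer}\n"
        + "Disk \"{1}\" last backed up on {2, date, long}.\n"
        + "{3, number, integer} files were deleted\n"
        + "Account balance is: {4, number, currency}\n";

    static String format(Locale locale) {
        Calendar c = new GregorianCalendar(1998, Calendar.AUGUST, 11);
        // Element i of this array is substituted for argument {i}.
        Object[] arguments = {
            Integer.valueOf(26),
            "Accounts",
            c.getTime(),
            Integer.valueOf(826),
            Double.valueOf(3659.23)
        };
        MessageFormat mf = new MessageFormat("");
        mf.setLocale(locale);        // set the locale before applying the pattern
        mf.applyPattern(TEMPLATE);   // subformats are created for that locale
        return mf.format(arguments);
    }

    public static void main(String[] args) {
        System.out.print(format(Locale.US));
    }
}
```

Setting the locale before calling applyPattern matters here, because the date and number subformats are created when the pattern is applied.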

The source code to an application that demonstrates message formatting is located in example17.java. Here are the results of running this application with the United States and France locales:

Error 26
Disk "Accounts" last backed up on August 11, 1998.
826 files were deleted
Account balance is: $3,659.23

Erreur 26
Le disque " rend compte " dernier sauvegardé 11 août 1998.
826 fichiers ont été effacés
L'équilibre de compte est: 3 659,23 F

Detailed information about MessageFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.MessageFormat.html.

Let's suppose your code needs to generate messages similar to the following: "There are 3 delinquent accounts." You could convert this message into the message pattern "There are {0, number} delinquent accounts." and store this pattern in a resource bundle. Now what happens if there is only one delinquent account? Your code would generate the message "There are 1 delinquent accounts." This is bad grammar, and something that no professional application should display to a user. So what do you do?

The answer is to use Java's ChoiceFormat class. I'm going to defer discussion of ChoiceFormat to another resource as this article is already long enough. Please consult The Java Tutorial (see Resources) for more information on using this class. Detailed information about ChoiceFormat is also available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.ChoiceFormat.html.
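For readers who want a quick taste before turning to those resources, here is a minimal sketch of one way to apply a choice format to the delinquent accounts message. It embeds the choice directly in a message pattern, following the style of the example in the MessageFormat class reference. The class name ChoiceFormatSketch and the exact wording of each alternative are assumptions made for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ChoiceFormatSketch {
    // Each alternative is "limit#text" (or "limit<text" for values strictly
    // greater than the limit); alternatives are separated by '|'.
    static final String PATTERN =
        "There {0,choice,0#are no delinquent accounts"
        + "|1#is one delinquent account"
        + "|1<are {0,number,integer} delinquent accounts}.";

    static String describe(int count) {
        MessageFormat mf = new MessageFormat("");
        mf.setLocale(Locale.US);
        mf.applyPattern(PATTERN);
        return mf.format(new Object[] { Integer.valueOf(count) });
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
        System.out.println(describe(1));
        System.out.println(describe(3));
    }
}
```

Because the whole pattern is a single string, it can be stored in a resource bundle and translated like any other message pattern. Note that plural rules differ between languages, so each locale's pattern may need a different set of alternatives.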

International fonts and non-Unicode text

How do we display Chinese, Arabic, or other intensely visual symbols? The answer is to make use of international fonts. How do we convert non-Unicode text to Unicode? The answer is to make use of encodings, Java's reader/writer classes, and tools such as native2ascii.

International fonts

The internationalization section of the JDK 1.1.6 documentation discusses how to add international fonts to the Java runtime. Specifically, it discusses how to add Japanese, Korean, Chinese, and Traditional Chinese fonts. Adding fonts involves working with a special file that's distributed with the runtime: font.properties. Rather than duplicate what's already been said, please consult this documentation for more information.

Java's "virtual" fonts are mapped to real fonts on the host machine. The internationalization section of the JDK 1.1.6 documentation discusses this mapping and provides detailed information on the structure of the font.properties file. Once again, please refer to this documentation for more information.

Non-Unicode text

As you already know, char variables in the Java programming language represent Unicode characters. However, few text editors support Unicode text entry; many are based on the ASCII character set. Using such an editor, you can still enter Java source code in ASCII and represent Unicode characters with special '\uxxxx' escape sequences (each x represents a hexadecimal digit). The Java compiler and runtime environment automatically convert ASCII and International Organization for Standardization (ISO) Latin-1 characters to Unicode characters. But if you want to convert characters from other encodings to Unicode, you'll need to do these conversions yourself.
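To make the escape sequences concrete, here is a small sketch written entirely in ASCII. The class name UnicodeEscapeSketch and the sample values are assumptions chosen for illustration.

```java
class UnicodeEscapeSketch {
    // The French word for August; its third character is u-circumflex,
    // written here as a Unicode escape so the source stays pure ASCII.
    static final String AUGUST_FR = "ao\u00fbt";

    // A single character given by its Unicode value (the euro sign).
    static final char EURO = '\u20AC';

    public static void main(String[] args) {
        // The compiler has already replaced each escape with one character.
        System.out.println(AUGUST_FR.length());
        System.out.println(Integer.toHexString(AUGUST_FR.charAt(2)));
        System.out.println(Integer.toHexString(EURO));
    }
}
```

Each escape counts as exactly one char, so the string above has four characters even though it occupies nine characters in the source file.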

Java contains APIs for translating non-Unicode text to Unicode. Before using these APIs, you must make sure that the character encoding of characters that must be converted to Unicode is supported. The internationalization section of the JDK 1.1.6 documentation provides a list of supported encodings.

You can convert a byte array of non-Unicode text to a Java String object by using code that's similar to this code fragment:

byte [] nonUnicodeBytes = ...;

String UnicodeCharacters = new String (nonUnicodeBytes, "UTF8");

In the preceding code fragment, the String (byte [], String) constructor is called to create a new String object from a byte array. The "UTF8" parameter specifies the encoding of nonUnicodeBytes. In this example, nonUnicodeBytes contains bytes that are stored using the UTF8 encoding (UTF8 is a variable-length encoding that represents each 16-bit Unicode character as a sequence of one to three 8-bit bytes). Conversely, once you have a String object, you can extract its contents into a byte array by calling its getBytes () or getBytes (String) methods.

The getBytes () method converts characters to bytes based on the JDK platform's default character encoding. The getBytes (String) method converts characters to bytes based on a specified encoding. For example, in the previous code fragment, we could call UnicodeCharacters.getBytes ("UTF8") to obtain a copy of the original byte array.
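The round trip between bytes and a String can be sketched as follows. The class name EncodingRoundTripSketch, the helper method names, and the sample bytes are assumptions made for this example. The helpers wrap the checked UnsupportedEncodingException so that callers stay short; that is a convenience of this sketch, not a requirement of the API.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class EncodingRoundTripSketch {
    // Decode bytes in the named encoding into a String.
    static String decode(byte[] bytes, String encoding) {
        try {
            return new String(bytes, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }
    }

    // Encode a String's characters into bytes in the named encoding.
    static byte[] encode(String text, String encoding) {
        try {
            return text.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }
    }

    public static void main(String[] args) {
        // The French word for August in UTF8: u-circumflex takes two bytes.
        byte[] utf8 = { 0x61, 0x6F, (byte) 0xC3, (byte) 0xBB, 0x74 };
        String text = decode(utf8, "UTF8");
        System.out.println(text.length());                                // 4 characters
        System.out.println(Arrays.equals(utf8, encode(text, "UTF8")));    // same bytes back
        System.out.println(encode(text, "ISO-8859-1").length);            // 4 bytes in Latin-1
    }
}
```

The same four characters occupy five bytes in UTF8 and four bytes in ISO Latin-1, which is why the encoding name must always accompany the bytes.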

Java contains InputStreamReader and OutputStreamWriter classes for converting between Unicode character streams and byte streams of non-Unicode text. Please consult The Java Tutorial (see Resources) for more information on using these classes.
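As a brief sketch of how these two classes might be used, the following code writes a string to an in-memory byte stream in a chosen encoding and reads it back. The class name StreamEncodingSketch and its method names are hypothetical, and in-memory streams are used only to keep the example self-contained; a real program would more likely wrap file or socket streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

class StreamEncodingSketch {
    // Characters in, encoded bytes out.
    static byte[] write(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(bytes, encoding);
            out.write(text);
            out.close();                 // flushes any buffered bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encoded bytes in, characters out.
    static String read(byte[] data, String encoding) {
        try {
            Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
            in.close();
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "ao\u00fbt";
        byte[] utf8 = write(text, "UTF8");
        System.out.println(utf8.length);
        System.out.println(read(utf8, "UTF8").equals(text));
    }
}
```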

Detailed information about InputStreamReader is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.io.InputStreamReader.html. And detailed information about OutputStreamWriter is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.io.OutputStreamWriter.html.

The native2ascii tool is used to convert a file that contains characters in a native encoding (characters that are neither ISO Latin-1 nor Unicode) to a file in which those characters are written as '\uxxxx' Unicode escapes. This tool is useful if you've created a file with a tool that generates characters using a character set that's alien to ASCII, ISO Latin-1, or Unicode. Please refer to the internationalization section of the JDK 1.1.6 documentation for more information on native2ascii.
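To give a feel for the kind of output such a conversion produces, here is a rough sketch that rewrites every non-ASCII character of an already-decoded string as a Unicode escape. It is not the native2ascii implementation: the class name EscapeSketch, the method name, the choice to escape everything above ASCII, and the lowercase hexadecimal digits are all assumptions of this example.

```java
class EscapeSketch {
    // Replace each character above the ASCII range with a six-character
    // escape: backslash, the letter u, and four hexadecimal digits.
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < 0x80) {
                sb.append(ch);           // plain ASCII passes through unchanged
            } else {
                sb.append("\\u");
                String hex = Integer.toHexString(ch);
                for (int pad = hex.length(); pad < 4; pad++) {
                    sb.append('0');      // left-pad to four digits
                }
                sb.append(hex);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiEscapes("ao\u00fbt 1998"));
    }
}
```

Text escaped this way can be pasted into Java source or a properties file and edited with an ASCII-only editor.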

Beyond JDK 1.1.6

I think that the perfect way to conclude this series is to look beyond JDK 1.1.6 by exploring some of the internationalization features that are new to JDK 1.1.7 and Java 2.

What's new in JDK 1.1.7?

JDK 1.1.7 has introduced at least two new features: euro currency symbol support and changes to the java.lang.Character class to reflect updates to the Unicode standard. (I haven't come across any other new features.)

On January 1, 1999, the Economic and Monetary Union (EMU) will introduce the euro as the new common currency in 11 European countries. Applications that handle international currencies will need to take this new currency into account. Figure 2 shows a picture of the euro currency symbol.

Figure 2: Euro currency symbol

The euro currency symbol is represented by the Unicode character '\u20AC'. Figure 3 is an example of a number that uses this symbol.

Figure 3: Euro currency example

In Part 1 of this series, I mentioned that a locale consists of language, region, and variant components. Usually, it's sufficient to describe a locale by a combination of language and region. However, this is not always possible. For example, France is one of the European countries that will support the euro in 1999. Because it will take time to make a full transition from the French franc to the euro, France will use the two currencies. This affects Java.

As I've already mentioned, Java's NumberFormat class can be used to format currency values. To format a currency value, all your code needs to do is instantiate a currency formatter by calling NumberFormat's getCurrencyInstance (Locale) factory method and calling one of the resulting object's format methods.

Suppose you want to format the value of a double variable as a French currency value. You can do this by using the following code fragment:

double value = 1.23;

NumberFormat nf = NumberFormat.getCurrencyInstance (new Locale ("fr", "FR"));

System.out.println ("French currency: " + nf.format (value));

The displayed result is: 1,23 F. This is fine if we're dealing with francs, but how do we handle the French euro? The solution is to make use of the variant part of the locale. Take a look at the following code fragment:

double value = 1.23;

NumberFormat nf = NumberFormat.getCurrencyInstance (new Locale ("fr", "FR", "EURO"));

System.out.println ("French currency: " + nf.format (value));

If you compare the two preceding code fragments, you'll notice only one difference. The variant portion of the France locale is set to the string "EURO". In fact, this is the only way to differentiate a France locale that uses francs from a France locale that uses euros.
