Internationalize dynamic messages

Build flexible formatters for international applications with Java 1.1

If you have ever used a piece of software written for speakers of one language -- human, not computer -- that has been hastily rewritten for speakers of another, you probably noticed that the interface, unless it had been redone very professionally, felt a bit awkward.

This awkwardness comes from a number of sources -- some related to poor word selection and some related to differences in the syntax of the two languages.

From the perspective of the programmer rewriting the software in a new language, the words are usually the easiest part of the code to get right. For example, when you change the strings in the code from English to German, "Yes" becomes "Ja," "No" becomes "Nein," and so on. Pretty simple stuff. Even simple phrases, such as "Do you wish to proceed?" can be replaced in their entirety with the appropriate translation.

If translations were limited to these simple examples, there would be no problem. Consider, however, the following example of a compiler diagnostic:

"There were 3 errors in 2 files"

Unlike the earlier examples, this example contains two pieces of information that change depending on the result of the compilation: the number of errors and the number of source files compiled.

A naive implementation of the code to generate this diagnostic might look like this:

   String str = "There were " + nErrorCount +
                " errors in " + nFileCount +
                " files";

In this case the variables nErrorCount and nFileCount hold the number of errors and the number of files, respectively.

Imagine that we wanted to make it possible to display this compiler diagnostic in other languages. Using resource bundles and the resource bundle API (which we covered in detail in last month's column), we could easily modify the code so that it doesn't contain any non-localized text.

   ResourceBundle res = ResourceBundle.getBundle("Strings");
   String str = res.getString("first") + " " + nErrorCount +
                res.getString("second") + " " + nFileCount +
                res.getString("third");

The resource bundle keys "first," "second," and "third" in the resource bundle named "Strings" are associated with the textual pieces that make up the diagnostic. Now each piece of the original can be translated and placed in the appropriate resource bundle -- and the appropriate string will be constructed at runtime. We're in great shape, right?

Wrong!

The example above contains implicit assumptions about the position and order of the elements that make up the text. That might not seem like such a problem, but it is.

A closer look

Consider the following three phrases:

"You copied 3 files"

"You deleted 2 files"

"You moved 0 files"

These three statements are identical except for the operation (copied, deleted, and moved) and the number of files affected (3, 2, and 0). Let's call the operation parameter zero, {0}, and the number of files parameter one, {1}.

Here's the phrase with the operation and the number of files affected replaced by parameter markers zero and one.

"You {0} {1} files"

What if English, like Japanese, required the verb to be at the end of the sentence? Our three phrases would then become:

"3 files were copied"

"2 files were deleted"

"0 files were moved"

Once again, these three statements are identical except for the operation and the number of files affected.

Here's the reordered phrase with the operation and the number of files affected replaced by parameter markers zero and one.

"{1} files were {0}"

Notice that the order of the parameter markers has changed. It appears we can't simply concatenate pieces of a message together in order to create a display string, even if we translate each piece separately.

What we need is a syntactically correct template for each locale. At runtime, we combine the template with the supplied parameters:

"You {0} {1} files" , "copied" , "2" => "You copied 2 files"

"{1} files were {0}" , "copied" , "2" => "2 files were copied"

Voilà, the correct message!

The ability to combine template and parameters is provided by class MessageFormat.

The MessageFormat class

MessageFormat's constructor takes a template string. The template string should come from a resource bundle appropriate for the current locale. The template string contains parameter markers that indicate where parameters go.

   MessageFormat mf = new MessageFormat("You {0} {1} files");

The parameter markers must be replaced by parameters at the appropriate time. The format method takes an array of objects. The formatted representations of the objects are substituted into the template string at the positions indicated by the parameter markers.

The first object of the array replaces the {0} parameter marker, the second object of the array replaces the {1} parameter marker, and so on.

   Object[] args =
   {
      "moved",
      new Integer(3)
   };
   String str = mf.format(args);

The result is a properly formatted string for the given locale.
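To tie the template back to resource bundles, here is a minimal sketch. The class names BundleTemplateDemo, VerbFirst, and VerbLast and the key "fileReport" are hypothetical, invented for illustration. The bundles are instantiated directly to keep the example self-contained; a real application would more likely load them with ResourceBundle.getBundle.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

class BundleTemplateDemo {
    // Hypothetical bundle for a word order that puts the verb first.
    static class VerbFirst extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "fileReport", "You {0} {1} files" } };
        }
    }

    // Hypothetical bundle for a word order that puts the verb last.
    static class VerbLast extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "fileReport", "{1} files were {0}" } };
        }
    }

    // Looks up the template in the bundle and substitutes the parameters.
    static String report(ResourceBundle res, String operation, String count) {
        MessageFormat mf = new MessageFormat(res.getString("fileReport"));
        return mf.format(new Object[] { operation, count });
    }

    public static void main(String[] args) {
        System.out.println(report(new VerbFirst(), "copied", "2")); // You copied 2 files
        System.out.println(report(new VerbLast(), "copied", "2"));  // 2 files were copied
    }
}
```

The calling code is the same for both bundles. Only the template changes, which is the point of moving the word order out of the code.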

The MessageFormat class provides a static convenience method also named format. This method takes both a template string and an array of objects and creates the appropriate string.
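As a rough illustration of the convenience method, the sketch below wraps it in a hypothetical helper named describe (the class name StaticFormatDemo is also invented). It assumes a current JDK, where the static format method accepts a variable-length argument list as well as an array.

```java
import java.text.MessageFormat;

class StaticFormatDemo {
    // One-shot formatting: no MessageFormat instance is kept around.
    static String describe(String template, String operation, String count) {
        return MessageFormat.format(template, operation, count);
    }

    public static void main(String[] args) {
        System.out.println(describe("You {0} {1} files", "moved", "3"));  // You moved 3 files
        System.out.println(describe("{1} files were {0}", "moved", "3")); // 3 files were moved
    }
}
```

The static method is handy for one-off messages. If the same template is formatted many times, creating one MessageFormat and reusing it avoids parsing the template on every call.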

Parameter marker tricks

The examples we used above illustrated the simplest form of a parameter marker. You will probably require more complex forms in your own work, so let's review complete parameter marker syntax.

A parameter marker consists of up to three fields separated by commas (I'll show you some examples in a moment). The first field, which is required, is the parameter number; in Java 1.1 it must be between 0 and 9. This field indicates which object in the array of objects to use to obtain the formatted representation.

The second field is the optional format. If the second field is not present, each object is examined and formatted based on its type. If the object is an instance of class Number, the NumberFormat class is used to format it. If the object is an instance of class Date, the DateFormat class is used to format it. If it is neither, the object's formatted representation is obtained by calling its toString method.
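Here is a small sketch of that type-based default. The names DefaultFormattingDemo, Widget, and show are hypothetical. The locale is pinned to Locale.US (using a constructor that was added after Java 1.1) so that the grouping separator in the output is predictable.

```java
import java.text.MessageFormat;
import java.util.Locale;

class DefaultFormattingDemo {
    // Hypothetical type, used only to show the toString fallback.
    static class Widget {
        @Override
        public String toString() {
            return "widget#7";
        }
    }

    // Formats a single value with a marker that has no format field.
    static String show(Object value) {
        MessageFormat mf = new MessageFormat("value = {0}", Locale.US);
        return mf.format(new Object[] { value });
    }

    public static void main(String[] args) {
        System.out.println(show(Integer.valueOf(1234567))); // value = 1,234,567
        System.out.println(show(new Widget()));             // value = widget#7
        System.out.println(show("plain text"));             // value = plain text
    }
}
```

Note that the number picks up locale-specific grouping even though the marker is just {0}. Pass the number as a string if you want it inserted verbatim.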

The following formats are valid: time, date, number, and choice. Each causes a default instance of the appropriate formatter to be created and applied. The result is a string to use in place of the parameter marker.

The third field is the style. The style field controls how the chosen formatter renders the object. If the format is either time or date, the style can take one of two forms:

  • One of the following: short, medium, long, or full
  • A date format pattern

If the format is number, the style can take one of two forms:

  • One of the following: currency, percent, or integer
  • A number format pattern

Let's take a look at some examples:

Parameter marker                               Description
"{0,date}" or "{0,time}"                       Formats the zeroth parameter as either a date or a time
"{0,number,percent}" or "{0,number,currency}"  Formats the zeroth parameter as either a percent or a currency amount
"{0,number,integer}"                           Formats the zeroth parameter as an integer
"{0,date,short}" or "{0,date,long}"            Formats the zeroth parameter as either a short date or a long date
"{0,number,+#,###}"                            Formats the zeroth parameter as a number using the supplied pattern
"{0,date,EEEE, MMMM d, ''yy}"                  Formats the zeroth parameter as a date using the supplied pattern

As you can see, parameter markers can be very expressive.
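To make a few of these markers concrete, here is a hedged sketch. The class MarkerStylesDemo and its helpers apply and sampleDate are hypothetical names. The locale is fixed to Locale.US so the number output is predictable, and the sample date is built in the default time zone so that the pattern-based date comes out the same on any machine.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class MarkerStylesDemo {
    // Applies a one-parameter template to a single value, using US conventions.
    static String apply(String template, Object value) {
        return new MessageFormat(template, Locale.US).format(new Object[] { value });
    }

    // Noon on July 4, 1997, in the default time zone (an arbitrary sample date).
    static Date sampleDate() {
        return new GregorianCalendar(1997, Calendar.JULY, 4, 12, 0).getTime();
    }

    public static void main(String[] args) {
        System.out.println(apply("{0,number,percent}", 0.25));     // 25%
        System.out.println(apply("{0,number,currency}", 1234.5));  // $1,234.50
        System.out.println(apply("{0,number,integer}", 1234.56));  // 1,235
        System.out.println(apply("{0,number,#,##0.00}", 1234.5));  // 1,234.50
        System.out.println(apply("{0,date,yyyy-MM-dd}", sampleDate())); // 1997-07-04
        // The named date styles depend on the locale data shipped with the JDK,
        // so no expected output is shown for this one.
        System.out.println(apply("{0,date,long}", sampleDate()));
    }
}
```

Commas after the second one belong to the style, which is why a pattern such as #,##0.00 can be written directly inside the marker.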

The choice format

Let's take a look at the compiler diagnostic one more time.

"There were 3 errors in 2 files"

What if there had been only one error or only one file? Suppose we had naively used the following template:

"There were {0} errors in {1} files"

We would have obtained:

"There were 1 errors in 1 files"

We may both know what that statement means, but it isn't correct English. In the past, programmers typically handled the singular case in one of three ways:

  • They ignored it
  • They modified the message so that it read: "There was/were 1 error(s) in 1 file(s)"
  • They wrote special code to handle it

We can do better!

The choice format option (in the second field of the parameter marker) allows us to attach arbitrary messages to ranges of numbers. This point is best made by illustration.

We would like the compiler diagnostic to take one of three forms, depending on the number of errors and the number of files:

  • "There were no errors"
  • "There was 1 error in 1 file"
  • "There were 3 errors in 2 files"

The choice format style works by associating a message with a range of numbers. For example, the range consisting of only the number 0 would be associated with the message "were no errors." The range consisting of only the number 1 would be associated with the message "was 1 error." And the range consisting of the numbers greater than 1 would be associated with the message "were N errors" where N would be the number of errors.

Files work in the same manner.

Let's review the syntax for specifying choice.

A choice is specified as a set of alternatives separated by vertical bars (|). An alternative consists of a numeric limit and a message. The alternatives are ordered by their limit. The limit is a double and can be specified in one of two ways:

  • N# means from N upward, inclusive
  • N< means from N upward, exclusive

The message consists of a parameterized template.
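The limit rules can be tried in isolation with the ChoiceFormat class, which is what a choice marker uses behind the scenes. In this sketch the class name ChoiceLimitsDemo, the method classify, and the three messages are made up for illustration.

```java
import java.text.ChoiceFormat;

class ChoiceLimitsDemo {
    // Three alternatives: from 0 upward, exactly 1 upward, and anything above 1.
    static String classify(double n) {
        ChoiceFormat sizes = new ChoiceFormat("0#none|1#exactly one|1<more than one");
        return sizes.format(n);
    }

    public static void main(String[] args) {
        System.out.println(classify(0));   // none
        System.out.println(classify(0.5)); // none (still below the next limit, 1)
        System.out.println(classify(1));   // exactly one
        System.out.println(classify(1.5)); // more than one
        System.out.println(classify(2));   // more than one
        System.out.println(classify(-3));  // none (values below the first limit use the first alternative)
    }
}
```

Because the limits are doubles, each alternative covers a range rather than a single value. An alternative applies from its limit up to, but not including, the next alternative's limit.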

Here's an example of a message string that will generate the appropriate compiler diagnostics:

"There {0,choice,0#were no errors|1#was 1 error|1<were {0} errors}{1,choice,0#|1# in 1 file|1< in {1} files}"

The string is composed of two choice parameter markers. We'll look at each independently.

The first choice parameter marker consists of three alternatives:

"{0,choice,0#were no errors|1#was 1 error|1<were {0} errors}"

If the first object in the array of objects is 0, the string "were no errors" results; if it is 1, the string "was 1 error" results; and if it is greater than 1, the string "were N errors" results (where N is its value).

The second choice marker also consists of three alternatives:

"{1,choice,0#|1# in 1 file|1< in {1} files}"

If the second object in the array of objects is 0, the empty string results; if it is 1, the string "in 1 file" results; and if it is greater than 1, the string "in N files" results (where N is its value).
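Putting the two markers together, the following sketch runs the full template against a few error and file counts. The class name CompilerDiagnosticDemo and the method diagnostic are hypothetical, and the locale is fixed to Locale.US so that the numbers are rendered predictably.

```java
import java.text.MessageFormat;
import java.util.Locale;

class CompilerDiagnosticDemo {
    // The message string from the article, split across two lines for readability.
    static final String TEMPLATE =
        "There {0,choice,0#were no errors|1#was 1 error|1<were {0} errors}"
        + "{1,choice,0#|1# in 1 file|1< in {1} files}";

    // Parameter 0 is the error count, parameter 1 is the file count.
    static String diagnostic(int errorCount, int fileCount) {
        MessageFormat mf = new MessageFormat(TEMPLATE, Locale.US);
        return mf.format(new Object[] { errorCount, fileCount });
    }

    public static void main(String[] args) {
        System.out.println(diagnostic(0, 0)); // There were no errors
        System.out.println(diagnostic(1, 1)); // There was 1 error in 1 file
        System.out.println(diagnostic(3, 2)); // There were 3 errors in 2 files
    }
}
```

The two markers are evaluated independently, so a call such as diagnostic(0, 2) produces "There were no errors in 2 files". If that combination should read differently, the calling code (or a differently structured template) has to handle it.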

Show and tell -- an applet for dynamic strings

The following applet provides a dynamic environment in which to experiment with the MessageFormat class.


Note: You must use a Java 1.1-compliant browser to access the features of this applet.

Begin by entering a simple template in the Template field. I suggest:

"{0}"

Now type a number in the Parameter 0 field. When you press Enter, the string you entered will replace the template parameter {0} and appear in the Result field. From the drop-down menu to the right of the Parameter 0 field, select Number. The applet will attempt to parse the string in the Parameter 0 field as a Number. Next, enter the following in the Template field:

"{0,number,percent}"

You should see an appropriately formatted percent appear in the Result field.

Conclusion

We've completed our brief tour of the internationalization features present in the Java 1.1 language and class library. We began by working with date, time, and number formats to accommodate the cultural differences your apps may encounter. Such modifications, while trivial in nature, make all the difference between an awkward and an easy-to-use interface.

Our next step was to examine text messages. The developers at Sun could not possibly anticipate all the different text messages used by each application (current and future) and provide the appropriate translations. So we learned how to supply translations of our own messages for each locale we needed by using resource bundles, which are collections of the resources and information for a specific locale.

Finally, this month we learned how to handle dynamic text-based messages by using the MessageFormat class to combine a syntactically correct template, which contains parameter markers, with parameters supplied at runtime. This approach allowed us to format a string properly for each locale.

Todd Sundsted has been writing programs since computers became available in convenient desktop models. Though originally interested in building distributed applications in C++, Todd moved to the Java programming language when it became the obvious choice for that sort of thing. In addition to writing, Todd is president of Etcee, which offers Java-centric training, mentoring, and consulting.