Internationalize your software, Part 3

Learn how to develop software for the global marketplace

Last month, I presented the second part of a three-part series exploring the development of Java-based software for an international audience. Part 2 provided a complete list of Java's internationalization and localization classes -- as of JDK 1.1.6 -- and introduced the concept of an "umbrella" class. We explored character properties, string comparisons, and character-, line-, sentence-, and word-break detection. We saw how to set the default locale via the host computer's operating system (Windows 95 was used as an example) and continued to explore resource bundles -- specifically, we learned how to store image data in a list resource bundle.

Part 3 closes the internationalization series with an expansion of the material presented in Part 2, including:

  • Dates, time zones, and calendars
  • Formatters
  • International fonts and non-Unicode text
  • Beyond JDK 1.1.6

In Part 1 of this series, I included calendars on a list of items requiring localization. In Part 3, we're going to examine calendars, along with dates and time zones, from Java's perspective.

How do we display numbers, dates, and messages, according to the conventions of different locales, without writing lots of code? We'll answer this question by examining Java's formatter classes. As we'll see, it's possible to use these same classes to parse user input in a locale-sensitive manner.

So far, we haven't seen an applet that displays Chinese, Arabic, Hebrew, or Japanese characters. Why? We'll find out when we explore international fonts and non-Unicode text.

And finally, although this series has been based on JDK 1.1.6, we'll move beyond JDK 1.1.6 and explore new internationalization features that have been introduced in JDK 1.1.7 and what's now known as the Java 2 platform (previously JDK 1.2).

In this article, Java applets are used to illustrate Java's internationalization and localization features. These applets were compiled with the JDK 1.1.6 compiler and tested with the JDK 1.1.6 appletviewer and Netscape Navigator 4.06 programs. Netscape was running version 1.1.5 of the Java runtime environment during testing.

Dates, time zones, and calendars

Many Java programs work with the concept of time. For example, one program might measure the interval between two events while another is designed to calculate a person's age. Different cultures tend to measure time in standardized units such as minutes and days. However, they don't all use the same calendar. For example, one culture might use the Gregorian calendar while another uses a lunar calendar. We need to make sure that our international software takes this varying calendar usage into account, so that it exhibits consistent behavior for the particular locale in which it's used.

But before we look at calendars, we'll need to study dates and time zones. Why? Java's Calendar class is intricately connected to the Date and TimeZone classes. Therefore, it would be a good idea to see how Java deals with dates and time zones before exploring the more complex concept of calendars.

Dates

A date is a concrete representation of a precise instant in time. Dates consist of several components -- day, month, year, hour, minute, second, and so on. Normally, we think of a date as consisting of a year, a month, and a day. We also think of a time as consisting of an hour, a minute, and a second.

Over the years, I've come across operating systems, programs, articles, and books that combine these two concepts into the single concept of a date. I've also seen other examples that treat these concepts as separate entities. Is one right and the other wrong? I think it's a case of "six of one and a half-dozen of the other." In other words, I think separating the entities is splitting hairs.

For the purposes of my article, I decided to combine these elements as a date. Think of the hour, minute, and second, as representing the fraction of a day when specifying a precise instant in time.

Java's Date class is used to instantiate objects that represent dates. Internally, the Date () constructor calls System.currentTimeMillis () to obtain the host computer's current time -- expressed as the number of milliseconds that have elapsed since midnight GMT on January 1, 1970.

The following is a digital clock applet that uses the Date class. Press the Start button to start this clock and the Stop button to stop it. The source code to this applet is located in example7.java.

The digital clock applet calls the Date () constructor to instantiate a new Date object. It also calls Date's toString () method to return a String object that contains a human-readable date in the language and format -- weekday name, short month name, day of month, time (24-hour format), time zone, and year -- of the United States locale. Since toString () always works with the United States locale, it's not a good idea to use this method to format the contents of Date objects when developing international software. A better way to format Date objects is to use the DateFormat class, as we'll find out.
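
As a rough sketch of the difference, the following code formats the same Date object for two locales by way of DateFormat. The class name DateFormatSketch and the formatLongDate helper are my own illustrative choices (they are not part of the digital clock applet), and the formatter is pinned to GMT so that the result does not depend on the host's default time zone.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical example class; not taken from example7.java.
class DateFormatSketch {
    // Format a Date as a long-style date string for the given locale.
    // The formatter is pinned to GMT so the host's time zone is irrelevant.
    static String formatLongDate(Date date, Locale locale) {
        DateFormat df = DateFormat.getDateInstance(DateFormat.LONG, locale);
        df.setTimeZone(TimeZone.getTimeZone("GMT"));
        return df.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // midnight GMT on January 1, 1970

        // toString () ignores the locale we care about.
        System.out.println("toString (): " + epoch);

        // DateFormat follows each locale's conventions.
        System.out.println("en_US: " + formatLongDate(epoch, Locale.US));
        System.out.println("fr_FR: " + formatLongDate(epoch, Locale.FRANCE));
    }
}
```

The exact strings depend on the locale data that ships with the Java runtime, so treat the printed output as illustrative.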

Several of Date's methods -- getTimezoneOffset (), setYear (int), getMonth (), parse (String), and so on -- have been deprecated because they are not amenable to internationalization. In other words, they are either based exclusively on the United States locale or they exclusively support the Gregorian calendar. There is no room for growth. These methods should not be used.

Their functionality has been replaced by the TimeZone, Calendar, and DateFormat classes. Date's JDK documentation provides examples of replacement code. For example, a call to setYear (int year) could be replaced by a call to Calendar.set (Calendar.YEAR, year + 1900).
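
To make the replacement concrete, here is a minimal sketch of reading and changing a date's year through Calendar instead of Date's deprecated methods. The class and method names (DeprecatedDateReplacement, yearOf, withYear) are hypothetical. Note that these helpers take a full four-digit year, whereas setYear (int) expected the year minus 1900, which is why the documentation's replacement adds 1900.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper class illustrating Calendar-based replacements.
class DeprecatedDateReplacement {
    // Replacement for the deprecated date.getYear () + 1900.
    static int yearOf(Date date, TimeZone zone) {
        Calendar c = new GregorianCalendar(zone);
        c.setTime(date);
        return c.get(Calendar.YEAR);
    }

    // Replacement for the deprecated date.setYear (fullYear - 1900).
    // Returns a new Date rather than modifying the argument.
    static Date withYear(Date date, int fullYear, TimeZone zone) {
        Calendar c = new GregorianCalendar(zone);
        c.setTime(date);
        c.set(Calendar.YEAR, fullYear);
        return c.getTime();
    }

    public static void main(String[] args) {
        TimeZone gmt = TimeZone.getTimeZone("GMT");
        Date epoch = new Date(0L);
        System.out.println("Year: " + yearOf(epoch, gmt));
        System.out.println("Changed year: " + yearOf(withYear(epoch, 1998, gmt), gmt));
    }
}
```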

More detailed information about Date is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.Date.html.

Time zones

A time zone is a set of geographical regions that share a common time zone offset -- a specific number of hours relative to Greenwich Mean Time (GMT), the time kept at the prime meridian in Greenwich, England, against which all other time zones are measured. For example, the Central Standard Time (CST) and Eastern Standard Time (EST) time zones represent all the geographical regions located -6 and -5 hours, respectively, from GMT.

Why is Standard Time a part of the names given to these time zones? Standard Time is the default (normal) time used by a time zone. To capitalize on daylight hours as the seasons change, many regions within a time zone move their time setting forward in spring and backward in autumn by one or more hours. The period of time that lies between spring and autumn time changes is known as daylight savings time. Since standard time is the default time for a time zone, it makes sense to include Standard Time as part of a time zone's name.

Java's TimeZone class is used to obtain objects that represent time zone offsets. Because TimeZone is an abstract class, you must call one of TimeZone's two static factory methods -- getDefault () and getTimeZone (String) -- to return objects that have been instantiated from TimeZone's concrete subclasses.

This table shows the results of running a Java application that calls some of TimeZone's methods. These methods include getAvailableIDs (), getAvailableIDs (int), getDefault (), getID (), getRawOffset (), getTimeZone (), and useDaylightTime (). The source code to this application is located in example8.java.

Default time zone ID: CST

Available IDs
=============

GMT UTC ECT EET ART EAT MET NET PLT IST BST VST CTT JST ACT AET SST NST MIT HST AST PST PNT MST CST EST IET PRT CNT AGT BET CAT

IDs associated with time zone -10 hours from GMT
================================================

HST

Offsets to add to UTC to get local time
=======================================

CST: -6
EST: -5

Daylight Savings Time Usage for HST and MST
===========================================

HST (Hawaiian Standard Time): false
MST (Mountain Standard Time): true

How does the preceding application work? The first task is to obtain TimeZone subclass objects for the default and EST time zones. The following code fragment shows how this is done. It calls TimeZone's getDefault () and getTimeZone (String) static factory methods.

// Get the default TimeZone object for this computer.

TimeZone tz1 = TimeZone.getDefault ();

// Get the TimeZone object associated with Eastern Standard Time.

TimeZone tz2 = TimeZone.getTimeZone ("EST");

The next step is to display the time zone identifier (ID) associated with the default time zone, and obtain an array of IDs. A time zone identifier is a string of characters that uniquely identifies either a time zone -- such as CST or EST -- or a region within a time zone that differs, based on daylight savings time behavior, from the rest of the time zone. For example, Phoenix, AZ, and Denver, CO, lie within the same time zone, Mountain Standard Time (MST), but differ in daylight savings time behavior. Denver takes daylight savings into account, but Phoenix does not. The three-letter ID for Denver is MST while the three-letter ID for Phoenix is PNT. (I'm unsure what PNT stands for; I found it in a comment for Phoenix in the TimeZone.java source file. I suspect PNT was chosen because the more natural PST is already used for Pacific Standard Time.)

The getAvailableIDs () method, as of JDK 1.1.6, returns these three-letter names. However, this will probably change. Because the three-letter IDs differ in some respects from current standards, they've been replaced by longer and more meaningful names. Denver's new ID is America/Denver while Phoenix's new ID is America/Phoenix.
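
The following sketch compares the two regions by their longer IDs. It assumes a Java runtime that recognizes these IDs; on current runtimes an unrecognized ID silently falls back to GMT, so the output is only meaningful where the IDs are supported. The class name LongTimeZoneIds and the describe helper are hypothetical.

```java
import java.util.TimeZone;

// Hypothetical example class; assumes the runtime knows the long IDs.
class LongTimeZoneIds {
    static final int MILLIS_IN_HOUR = 60 * 60 * 1000;

    // Summarize a time zone's raw offset (in hours) and DST usage.
    static String describe(String id) {
        TimeZone tz = TimeZone.getTimeZone(id);
        return tz.getID() + ": offset " + tz.getRawOffset() / MILLIS_IN_HOUR
               + ", uses DST: " + tz.useDaylightTime();
    }

    public static void main(String[] args) {
        // Same raw offset from GMT, different daylight savings behavior.
        System.out.println(describe("America/Denver"));
        System.out.println(describe("America/Phoenix"));
    }
}
```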

This name change has been partially reflected in JDK 1.1.6. TimeZone's getDefault () method obtains the default ID from the system properties. If this ID follows the new format, it will be remapped to the older (and less accurate) three-letter ID -- for compatibility reasons. The following code fragment calls getID () to obtain the default time zone's ID and getAvailableIDs () to obtain an array of supported IDs, which are then displayed to the user.

// Get the ID associated with the default time zone and display it.

System.out.println ("Default time zone ID: " + tz1.getID () + "\n");

// Get an array of IDs.

String [] IDs = tz1.getAvailableIDs ();

// Display all of these IDs.

System.out.println ("Available IDs");
System.out.println ("=============\n");

for (int i = 0; i < IDs.length; i++)
     System.out.println (IDs [i]);

The next step involves calling the getAvailableIDs (int) method to obtain an array of IDs for all geographical regions located in a time zone that is -10 hours from GMT. We also display these IDs to the user, as shown in the following code fragment.

// Get an array of IDs associated with a time zone -10 hours from GMT.

IDs = tz1.getAvailableIDs (-10 * millisInHour);

// Display all of these IDs.

System.out.println ("IDs associated with time zone -10 hours from GMT");
System.out.println ("================================================\n");

for (int i = 0; i < IDs.length; i++)
     System.out.println (IDs [i]);

Now, let's find out what offsets need to be added to GMT to obtain local time for someone living in the CST or EST time zones. The following code fragment calls the getRawOffset () method, divided by the number of milliseconds in one hour -- 60 * 60 * 1000 -- to obtain these offsets (expressed as hours). Daylight savings time isn't compensated for when obtaining these offsets.

System.out.println (tz1.getID () + ": " + tz1.getRawOffset () / millisInHour);

System.out.println (tz2.getID () + ": " + tz2.getRawOffset () / millisInHour + "\n");

Finally, we examine the Hawaiian Standard Time (HST) and MST time zones to find out if they use daylight savings time. As it turns out, HST does not. The following code fragment calls useDaylightTime () to obtain this information.

TimeZone tz = TimeZone.getTimeZone ("HST");

System.out.println ("HST (Hawaiian Standard Time): " + tz.useDaylightTime ());

tz = TimeZone.getTimeZone ("MST");

System.out.println ("MST (Mountain Standard Time): " + tz.useDaylightTime ());

Detailed information about TimeZone is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.TimeZone.html.

As of JDK 1.1.6, TimeZone has only one concrete subclass -- SimpleTimeZone. Objects instantiated from this class represent time zone offsets for use with the Gregorian calendar and can take daylight savings time into account. Although your code should normally work with the TimeZone umbrella class (for maximum portability), it may come across a locale that has no time zone support. In this case, it would need to create a time zone object for this locale -- via SimpleTimeZone.

Here are the results of running a Java application that creates a new time zone. Why would you want to create another time zone? Your code might come across a locale where Java doesn't provide time zone support. The source code to this application is located in example9.java.

Time zone ID: XYZ
Raw offset (hours): -6
Daylight savings time not in use.
Daylight savings time not in effect
Daylight savings time in use.
Daylight savings time not in effect

The preceding application results show that a new XYZ time zone object was created. The following code fragment shows how this object was created. As you can see, this time zone is a combination of a GMT offset and an ID, and is created by calling the SimpleTimeZone (int, String) constructor. Also, this time zone does not yet support daylight savings time.

// Create a time zone -6 hours from GMT (same as CST).
// Name this time zone XYZ. There is no daylight savings
// time.

SimpleTimeZone stz = new SimpleTimeZone (-6 * millisInHour, "XYZ");

Although I have not done so, it would be a good idea first to check if the XYZ ID has been assigned to the specified offset before using this ID. Call SimpleTimeZone's getAvailableIDs (int) method to return an array of IDs that are assigned to a specified offset. Scan this list to see if XYZ has already been assigned.
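
A minimal sketch of that check might look like the following. The class name TimeZoneIdCheck and the isIdAssigned helper are hypothetical; the only library call is TimeZone's static getAvailableIDs (int) method, which SimpleTimeZone inherits.

```java
import java.util.TimeZone;

// Hypothetical helper class; not part of example9.java.
class TimeZoneIdCheck {
    static final int MILLIS_IN_HOUR = 60 * 60 * 1000;

    // Return true if the given ID is already assigned to the given
    // raw offset (expressed in milliseconds from GMT).
    static boolean isIdAssigned(String id, int rawOffsetMillis) {
        String[] ids = TimeZone.getAvailableIDs(rawOffsetMillis);
        for (int i = 0; i < ids.length; i++) {
            if (ids[i].equals(id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean taken = isIdAssigned("XYZ", -6 * MILLIS_IN_HOUR);
        System.out.println("XYZ already assigned: " + taken);
    }
}
```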

At some point, we decide this time zone should enforce daylight savings time behavior. How do we do this? Check out the following code fragment. It calls the setStartRule (int, int, int, int) and setEndRule (int, int, int, int) methods to set the rules for when daylight savings time starts and ends within this time zone.

// Set the start rule for daylight savings time.
// This rule states that daylight savings begins at
// 2 AM standard time on the first Sunday in April.

stz.setStartRule (Calendar.APRIL, 1, Calendar.SUNDAY, 2 * millisInHour);

// Set the end rule for daylight savings time.
// This rule states that daylight savings ends at
// 2 AM standard time on the last Sunday in October.

stz.setEndRule (Calendar.OCTOBER, -1, Calendar.SUNDAY, 2 * millisInHour);

The setStartRule (int, int, int, int) method sets the starting rule for determining daylight savings time within a time zone object and has the following prototype:

public void setStartRule (int month, int dayOfWeekInMonth, int dayOfWeek, int time)

The setEndRule (int, int, int, int) method sets the ending rule for determining daylight savings time within a time zone object and has the following prototype:

public void setEndRule (int month, int dayOfWeekInMonth, int dayOfWeek, int time)

What rules can be defined? Check out the following table for a list of rules. These rules apply equally to both methods.

Rule 1: If dayOfWeekInMonth is x (a positive number) and dayOfWeek is y (a positive number), this indicates the xth occurrence of y from the start of month.

// DST starts at 2 AM standard time on the first Sunday in April.

stz.setStartRule (Calendar.APRIL, 1, Calendar.SUNDAY, 2 * millisInHour);

Rule 2: If dayOfWeekInMonth is -x (a negative number) and dayOfWeek is y (a positive number), this indicates the xth occurrence of y from the end of month.

// DST starts at 3 AM standard time on the last Sunday in April.

stz.setStartRule (Calendar.APRIL, -1, Calendar.SUNDAY, 3 * millisInHour);

Rule 3: If dayOfWeek is 0, this specifies an exact day within month.

// DST starts at 5 AM standard time on June 5.

stz.setStartRule (Calendar.JUNE, 5, 0, 5 * millisInHour);

Rule 4: If dayOfWeekInMonth is x (a positive number) and dayOfWeek is -y (a negative number), this indicates the first occurrence of y on or after the xth of month.

// DST starts at 1 PM standard time on the first WEDNESDAY on or after the 15th
// of August.

stz.setStartRule (Calendar.AUGUST, 15, -Calendar.WEDNESDAY, 13 * millisInHour);

Rule 5: If dayOfWeekInMonth is -x (a negative number) and dayOfWeek is -y (a negative number), this indicates the last occurrence of y on or before the xth of month.

// DST starts at 6 AM standard time on the last FRIDAY on or before the 21st
// of February.

stz.setStartRule (Calendar.FEBRUARY, -21, -Calendar.FRIDAY, 6 * millisInHour);

Finally, let's take a look at a code fragment that calls inDaylightTime (Date) to see if a given date, specified by a Date object argument, falls within daylight savings time.

// Check if date is in daylight savings time.

if (stz.inDaylightTime (now))
    System.out.println ("Daylight savings time in effect");
else
    System.out.println ("Daylight savings time not in effect");
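
Putting the fragments together, the following self-contained sketch builds the XYZ time zone, applies the April and October rules, and tests two fixed dates instead of the current time. It is not the example9.java source; the class name CustomTimeZoneSketch and the helpers createZone and noonOn are my own, and the test dates are arbitrary.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.SimpleTimeZone;

// Hypothetical example class assembled from the fragments above.
class CustomTimeZoneSketch {
    static final int MILLIS_IN_HOUR = 60 * 60 * 1000;

    // Create the XYZ zone (-6 hours from GMT) with daylight savings rules:
    // starts first Sunday in April, ends last Sunday in October, both at 2 AM.
    static SimpleTimeZone createZone() {
        SimpleTimeZone stz = new SimpleTimeZone(-6 * MILLIS_IN_HOUR, "XYZ");
        stz.setStartRule(Calendar.APRIL, 1, Calendar.SUNDAY, 2 * MILLIS_IN_HOUR);
        stz.setEndRule(Calendar.OCTOBER, -1, Calendar.SUNDAY, 2 * MILLIS_IN_HOUR);
        return stz;
    }

    // Build a Date for noon (wall-clock time in the given zone) on a given day.
    static Date noonOn(SimpleTimeZone zone, int year, int month, int day) {
        Calendar c = new GregorianCalendar(zone);
        c.clear();
        c.set(year, month, day, 12, 0, 0);
        return c.getTime();
    }

    public static void main(String[] args) {
        SimpleTimeZone stz = createZone();
        Date summer = noonOn(stz, 2000, Calendar.JULY, 1);
        Date winter = noonOn(stz, 2000, Calendar.JANUARY, 15);

        System.out.println("Uses DST: " + stz.useDaylightTime());
        System.out.println("July 1 in DST: " + stz.inDaylightTime(summer));
        System.out.println("January 15 in DST: " + stz.inDaylightTime(winter));
    }
}
```

Using fixed dates rather than new Date () makes the result repeatable regardless of when the program is run.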

Detailed information about SimpleTimeZone is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.SimpleTimeZone.html.

Calendars

A calendar is a system for setting the beginning, length, and division of a year into days, weeks, and months. There are many different kinds of calendars: lunar (based on the phases of the moon), Gregorian, Julian, Mayan, Hebrew, and so on. These days, most of our world recognizes the Gregorian calendar. However, unless I'm mistaken, a lunar calendar is still used in some countries in the Middle East.

Ideally, international code should work with calendars in a generic fashion. For example, if the default locale shows that Java code is running in North America, this code should work with the Gregorian calendar. On the other hand, this code should probably work with the lunar calendar if it detects a Middle Eastern locale. For most applications, this point may not seem to be important.

Why not just use the Gregorian calendar, since it's already used by most of the world? Software that has been internationalized cannot afford to make assumptions. If a given locale uses a non-Gregorian calendar, calendar logic must automatically use that calendar whenever that locale is the default.

Java's Calendar class is used to obtain objects that represent calendars. Because Calendar is an abstract class, you must call one of Calendar's four static factory methods -- getInstance (), getInstance (TimeZone), getInstance (Locale), and getInstance (TimeZone, Locale) -- to return objects that have been instantiated from Calendar's concrete subclasses.

The applet below instantiates a Calendar object for each user-selected locale and then queries this object for date/time information. The source code to this applet is located in example10.java.

Most of the information shown in the preceding applet is pretty straightforward. However, there are some elements to point out.

Zone offset identifies the number of hours that must be added to GMT to achieve local standard time. For example, adding -6 to GMT results in the CST time zone, found in North America and South America.

DST offset represents the number of hours that must be subtracted from local time to obtain standard time (when daylight savings time is in effect). For example, one hour is added to CST on the first Sunday in April to begin daylight savings time; the adjusted time is known as Central Daylight Time (CDT). This extra hour is then subtracted from CDT on the last Sunday in October to return to CST.

First day of week is a one-based integer that represents the first day of a new week. A value of 1 represents Sunday.

In our applet, Sunday is shown as the first day of the week in Japan and the United States, while Monday marks the first day of the week in France. This is yet another example of a cultural difference that internationalized software must handle.
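
The same comparison can be sketched outside the applet. The class name FirstDayOfWeekSketch and the firstDay helper are hypothetical; the values returned come from the locale data that ships with the Java runtime.

```java
import java.util.Calendar;
import java.util.Locale;

// Hypothetical example class; not taken from example10.java.
class FirstDayOfWeekSketch {
    // Return the first day of the week (1 = Sunday) for a locale.
    static int firstDay(Locale locale) {
        return Calendar.getInstance(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.JAPAN };
        for (int i = 0; i < locales.length; i++) {
            System.out.println(locales[i] + ": first day of week = "
                               + firstDay(locales[i]));
        }
    }
}
```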

Let's look at a code fragment to see how Calendar subclass objects are instantiated.

if (locale.equals ("Default"))
    c = Calendar.getInstance ();
else
if (locale.equals ("France"))
    c = Calendar.getInstance (TimeZone.getTimeZone ("ECT"), new Locale ("fr", "FR"));
else
if (locale.equals ("Japan"))
    c = Calendar.getInstance (TimeZone.getTimeZone ("JST"), new Locale ("ja", "JP"));
else
    c = Calendar.getInstance (TimeZone.getTimeZone ("CST"), new Locale ("en", "US"));

// Update the text area to show calendar information.

populate (c);

In the preceding code fragment, two of Calendar's factory methods are called to instantiate calendar objects from concrete subclasses: getInstance () and getInstance (TimeZone, Locale). You would call the getInstance () method whenever you want to work with a calendar that's associated with the default time zone and default locale. When working with other locales, you need to obtain appropriate time zones. This is done by calling TimeZone's getTimeZone (String) factory method with an appropriate time zone ID. As shown in the preceding code fragment, ECT is the ID for French standard time while JST is the ID for Japanese standard time.

Once a Calendar subclass object has been created, we can query this object to obtain the current values of its time fields. Calendar's get (int) method is used to obtain these values. The argument passed to get (int) identifies a time field that is stored within the calendar object. Calendar defines suitable field name constants for these time fields.

StringBuffer sb = new StringBuffer ();

sb.append ("Year = " + c.get (c.YEAR) + "\n");
sb.append ("Month = " + c.get (c.MONTH) + "\n");
sb.append ("Week of Year = " + c.get (c.WEEK_OF_YEAR) + "\n");
sb.append ("Week of Month = " + c.get (c.WEEK_OF_MONTH) + "\n");
sb.append ("Day = " + c.get (c.DATE) + "\n");
sb.append ("Day of Month = " + c.get (c.DAY_OF_MONTH) + "\n");
sb.append ("Day of Year = " + c.get (c.DAY_OF_YEAR) + "\n");
sb.append ("Day of Week = " + c.get (c.DAY_OF_WEEK) + "\n");
sb.append ("Day of Week in Month = " + c.get (c.DAY_OF_WEEK_IN_MONTH) + "\n");
sb.append ("AM/PM = " + ((c.get (c.AM_PM) == c.AM) ? "AM" : "PM") + "\n");
sb.append ("Hour = " + c.get (c.HOUR) + "\n");
sb.append ("Hour of day = " + c.get (c.HOUR_OF_DAY) + "\n");
sb.append ("Minute = " + c.get (c.MINUTE) + "\n");
sb.append ("Second = " + c.get (c.SECOND) + "\n");
sb.append ("Millisecond = " + c.get (c.MILLISECOND) + "\n");
sb.append ("Zone Offset = " + c.get (c.ZONE_OFFSET) / millisInHour + "\n");
sb.append ("DST Offset = " + c.get (c.DST_OFFSET) / millisInHour + "\n");
sb.append ("First Day of Week = " + c.getFirstDayOfWeek () + "\n");

// Populate text area control with calendar information.

ta.setText (sb.toString ());

Internally, a Calendar subclass object uses an integer array of time fields. Here are just a handful of the many useful methods Calendar provides:

  • You can call Calendar's add (int, int) method to add an offset (either positive or negative) to a time field.

  • You can call Calendar's roll (int, boolean) method to roll a time field up or down by a single unit of time. For example, roll (Calendar.MINUTE, true) rolls the minute field ahead by one minute.

  • You can call Calendar's getTimeZone () method to obtain the time zone object associated with a calendar, and Calendar's getTime () method to return the time fields as an equivalent Date object.
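
The methods listed above can be sketched as follows. The class name CalendarArithmeticSketch and the sample helper are hypothetical, and the starting date (January 31, 1999, 10:59 GMT) is chosen only because it makes the difference between add and roll easy to see: add carries into the next month, while roll changes only the field it is given.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical example class; not taken from example10.java.
class CalendarArithmeticSketch {
    // Build a calendar set to January 31, 1999, 10:59:00 GMT.
    static Calendar sample() {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("GMT"), Locale.US);
        c.clear();
        c.set(1999, Calendar.JANUARY, 31, 10, 59, 0);
        return c;
    }

    public static void main(String[] args) {
        // add () carries into larger fields: January 31 + 1 day is February 1.
        Calendar added = sample();
        added.add(Calendar.DATE, 1);
        System.out.println("Month after add: " + added.get(Calendar.MONTH));
        System.out.println("Day after add: " + added.get(Calendar.DAY_OF_MONTH));

        // roll () wraps within the field: minute 59 rolls to 0, hour is unchanged.
        Calendar rolled = sample();
        rolled.roll(Calendar.MINUTE, true);
        System.out.println("Hour after roll: " + rolled.get(Calendar.HOUR_OF_DAY));
        System.out.println("Minute after roll: " + rolled.get(Calendar.MINUTE));

        // getTimeZone () and getTime () expose the zone and the equivalent Date.
        TimeZone tz = rolled.getTimeZone();
        Date d = rolled.getTime();
        System.out.println(tz.getID() + ": " + d.getTime() + " ms since the epoch");
    }
}
```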

Please consult the JDK documentation for additional information on these and other methods. Detailed information about Calendar is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.Calendar.html.

As of JDK 1.1.6, Calendar has only one concrete subclass -- GregorianCalendar. Objects instantiated from this class represent calendars based on the Gregorian calendar. Although your code normally should work with the Calendar umbrella class (for maximum portability, since future versions of Java could introduce support for new calendars), it may need to access the various constants and methods available only in the GregorianCalendar class.

Detailed information about GregorianCalendar can be found in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.GregorianCalendar.html.

Formatters

A formatter is an object that arranges data into a locale-specific, user-oriented representation. For example, the United States locale formats numbers with comma characters (,) in the "thousands" positions. In contrast, the France locale formats numbers with space characters (" ") in these positions.

There are three categories of formatters: numeric, date, and message. Note that Java's various numeric and date formatters may not handle every possible locale. So prior to attempting to instantiate either formatter, your code should check for locale support by calling the actual numeric or date umbrella class's getAvailableLocales () method -- to obtain a list of all supported locales -- and then examining this list to see if a given locale is supported.

For the sake of simplicity, the various formatter examples do not check to see if a locale is supported. (The locales they work with are pretty standard.) In a production environment, however, these examples should contain logic to check for locale support.
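
A minimal sketch of such a check for numeric formatters follows. The class name LocaleSupportCheck and both helper methods are hypothetical, and falling back to the default locale's formatter is just one possible policy; a real application might prefer to report the problem to the user instead.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class; the article's examples omit this check.
class LocaleSupportCheck {
    // Return true if NumberFormat reports support for the given locale.
    static boolean isSupported(Locale wanted) {
        Locale[] available = NumberFormat.getAvailableLocales();
        for (int i = 0; i < available.length; i++) {
            if (available[i].equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Return a number formatter for the wanted locale if it is supported;
    // otherwise fall back to the default locale's formatter (an assumption).
    static NumberFormat numberFormatFor(Locale wanted) {
        if (isSupported(wanted)) {
            return NumberFormat.getNumberInstance(wanted);
        }
        return NumberFormat.getNumberInstance();
    }

    public static void main(String[] args) {
        System.out.println("en_US supported: " + isSupported(Locale.US));
        System.out.println(numberFormatFor(Locale.US).format(1234567.89));
    }
}
```

The same pattern applies to date formatters by calling DateFormat's getAvailableLocales () method instead.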

Numeric formatters

A numeric formatter is an object that formats numeric data into a locale-specific representation. Java contains four kinds of numeric formatters: number, currency, percentage, and scientific. For whatever reason, the scientific formatter is not available for use. (Internally, JDK 1.1.6 source code has commented out the public reserved word for each of the scientific formatter's two static factory methods.)

Java's NumberFormat umbrella class is used to instantiate numeric formatters. Since NumberFormat is an abstract class, you must call a static factory method to return a numeric formatter. NumberFormat contains a pair of static factory methods for each kind of numeric formatter; one of these methods works with the default locale, while the other method requires a Locale object argument.

The getNumberInstance () method returns a number formatter for the default locale, while the getNumberInstance (Locale) method returns a number formatter for a specified locale. The getCurrencyInstance () method returns a currency formatter for the default locale, and the getCurrencyInstance (Locale) method returns a currency formatter for a specified locale. And finally, the getPercentInstance () method returns a percent formatter for the default locale, while the getPercentInstance (Locale) method returns a percent formatter for a specified locale. As previously stated, the scientific formatter's factory methods are not publicly accessible.

This table displays the results of running a Java application that instantiates number formatters for the U.S., France and Germany locales, and formats a floating-point number according to each locale's conventions. The source code to this application is located in example11.java.

Unformatted: 1234567.89

Formatted for en_US: 1,234,567.89
Formatted for fr_FR: 1 234 567,89
Formatted for de_DE: 1.234.567,89

If you were to observe the preceding results at a Microsoft Windows command prompt, you might be surprised to see a couple of strange-looking characters located in the formatted number for the fr_FR locale. Instead of seeing a space character between the 1 and 2 characters (as well as the 4 and 5 characters), you would see an accented lower-case "a" character. What's going on?

These strange-looking characters, located in the grouping (thousands) separator positions for French numbers, are represented by the Unicode value '\u00A0' and are known as non-breaking space characters.

A non-breaking space character is a space character that cannot be used as the location for a line-break. Normally, when a sequence of characters exceeds the maximum length of a line, software must find a place within this line where the line can be broken (all characters following the break are moved to the next line). It would not be appropriate to break a word between its letters. Nor would it be appropriate to break a number between its digits.

The logical choice is to break a line between words and numbers -- in the area occupied by space characters. Unfortunately, there's a problem with this approach, and it manifests itself with French numbers. The appropriate symbol used as a group separator to identify the "thousands" positions within French numbers is a space character. It would be insulting to a French user if software broke a line at one of these separators; part of the number would appear on one line while the other part appears on the next line. The Unicode standardization committee came up with a novel solution to this problem: the non-breaking space character. This character appears as a space character, but it's not treated as one by software.
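
If the formatted text is headed for a console that cannot display the non-breaking space, one workaround is to swap it for an ordinary space just before printing. The sketch below does that; the class name NonBreakingSpaceSketch and the toPlainSpaces helper are hypothetical. It also replaces the narrow no-break space (U+202F), on the assumption that some Java runtimes use that character as the French grouping separator instead.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class for console display only.
class NonBreakingSpaceSketch {
    // Replace no-break space characters with ordinary spaces.
    // U+00A0 is the non-breaking space discussed in the text; handling
    // U+202F (narrow no-break space) is an assumption about other runtimes.
    static String toPlainSpaces(String s) {
        return s.replace('\u00A0', ' ').replace('\u202F', ' ');
    }

    public static void main(String[] args) {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.FRANCE);
        String formatted = nf.format(1234567.89);
        System.out.println("Formatted for fr_FR: " + toPlainSpaces(formatted));
    }
}
```

This replacement is appropriate only for display; text that will be laid out or wrapped by other software should keep the non-breaking character.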

So how do we use number formatters? Take a look at the following code fragment.

double value = 1234567.89;
...

Locale locales [] = { Locale.US, Locale.FRANCE, Locale.GERMANY };

for (int i = 0; i < locales.length; i++)
{
     NumberFormat nf = NumberFormat.getNumberInstance (locales [i]);

     System.out.println ("Formatted for " + locales [i] + ": " + nf.format (value));
}

This code fragment iterates through an array of Locale objects -- United States, France and Germany. For each Locale object, NumberFormat's getNumberInstance (Locale) static factory method is called to instantiate a number formatter. Once instantiated, the formatter's format (double) method is called to format the contents of the value variable according to each locale's conventions.

The next table shows the results of running a Java application that instantiates currency formatters for the U.S., France, Germany and Japan locales, and formats a floating-point number according to each locale's conventions. The source code to this application is located in example12.java.

Unformatted: 1234567.89

Formatted for en_US: $1,234,567.89
Formatted for fr_FR: 1 234 567,89 F
Formatted for de_DE: 1.234.567,89 DM
Formatted for ja_JP: ¥1,234,567.89

The code behind this example is almost identical to the previous example's code. The only differences are the inclusion of the Japan locale and the static factory method that's called to instantiate a currency formatter: NumberFormat nf = NumberFormat.getCurrencyInstance (locales [i]);.

Here are the results of running a Java application that instantiates percent formatters for the U.S., France and Germany locales, and formats a floating-point number according to each locale's conventions. The source code to this application is located in example13.java.

Unformatted: 120.89

Formatted for en_US: 12,089%
Formatted for fr_FR: 12 089%
Formatted for de_DE: 12.089%

The code behind this example is almost identical to the previous example's code. The only differences are the absence of the Japan locale and the static factory method that's called to instantiate a percent formatter: NumberFormat nf = NumberFormat.getPercentInstance (locales [i]);. Once again, since France uses a space character as its grouping (thousands) separator, this formatter places a non-breaking space character in the result.

NumberFormat's numeric, currency, and percentage formatters also can be used to parse locale-specific strings of numeric, currency or percentage values. This is accomplished through a numeric formatter's parse (String) method.

Figures 1a through 1e show the results of running a Java application that parses user-entered numeric, currency and percentage values for a variety of locales. The Java command line that was used to generate each set of results is shown at the top of each figure. The source code to this application is located in example14.java.

java example14 23.0

args [0] = 23.0

Locale = en_US

Using numeric parser.

Parsed result = 23.0

Figure 1a: Numeric parser results for United States locale

java example14 $54,322.89 C

args [0] = $54,322.89
args [1] = C

Locale = en_US

Using currency parser.

Parsed result = 54322.89

Figure 1b: Currency parser results for United States locale

java example14 95.8% P

args [0] = 95.8%
args [1] = P

Locale = en_US

Using percentage parser.

Parsed result = 0.958

Figure 1c: Percentage parser results for United States locale

java example14 134,89 N fr FR

args [0] = 134,89
args [1] = N
args [2] = fr
args [3] = FR

Locale = fr_FR

Using numeric parser.

Parsed result = 134.89

Figure 1d: Numeric parser results for France locale

java example14 "1.234,89 DM" C de

args [0] = 1.234,89 DM
args [1] = C
args [2] = de

Locale = de

Using currency parser.

Parsed result = 1234.89

Figure 1e: Currency parser results for Germany locale

The first parameter to the Example 14 application is a numeric string representing a number, a currency value, or a percentage. This string must be surrounded by double-quote characters if it contains embedded space characters. This is optionally followed by an uppercase letter that defines the kind of parser: C for currency, N for number, or P for percentage. The default is number. Following this parser code is an optional language ID and an optional country ID.
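The parser selection described above can be sketched roughly as follows. This is not the source of example14.java; the class name ParseSketch and its parse helper are hypothetical, and the sketch assumes a current JDK.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Illustrative sketch only; not the source of example14.java.
class ParseSketch {
    // kind: 'C' for currency, 'P' for percentage, anything else for a plain number.
    static double parse(String text, char kind, Locale locale) throws ParseException {
        NumberFormat nf;
        switch (kind) {
            case 'C': nf = NumberFormat.getCurrencyInstance(locale); break;
            case 'P': nf = NumberFormat.getPercentInstance(locale); break;
            default:  nf = NumberFormat.getNumberInstance(locale); break;
        }
        // parse (String) throws ParseException when the start of the text cannot be parsed.
        return nf.parse(text).doubleValue();
    }

    public static void main(String[] args) {
        try {
            System.out.println(parse("23.0", 'N', Locale.US));
            System.out.println(parse("$54,322.89", 'C', Locale.US));
            System.out.println(parse("95.8%", 'P', Locale.US));
        } catch (ParseException e) {
            System.out.println("Unable to parse: " + e.getMessage());
        }
    }
}
```

Because parse (String) returns a Number, the sketch calls doubleValue () so that callers do not need to care whether a Long or a Double came back.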

Detailed information about NumberFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.NumberFormat.html.

So what happens if your code needs a number, currency, or percent formatter for a locale that's not supported? Java provides a solution via the DecimalFormat and DecimalFormatSymbols classes. (These classes are used internally by NumberFormat.)

DecimalFormat allows you to create your own patterns (structures that define ways of doing things) for controlling the visual representation of a formatted value. DecimalFormatSymbols allows you to define the actual characters that appear in this visual representation. Please consult The Java Tutorial (see Resources) for more information on using these classes.
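As a minimal sketch of how the two classes cooperate, the following code builds a formatter from a pattern and a hand-edited symbol set. The separators chosen here (an apostrophe for grouping, a comma for decimals) and the class name CustomNumberFormat are assumptions made purely for illustration, and the sketch assumes a current JDK.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Illustrative sketch only; the separators are arbitrary choices.
class CustomNumberFormat {
    static String format(double value) {
        // Start from a known symbol set, then override the characters we care about.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.US);
        symbols.setGroupingSeparator('\'');
        symbols.setDecimalSeparator(',');

        // Pattern: group digits by thousands and always show two fraction digits.
        DecimalFormat df = new DecimalFormat("#,##0.00", symbols);
        return df.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.89)); // prints 1'234'567,89
    }
}
```

The pattern controls the shape of the result, while the symbols object controls which characters fill that shape.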

Detailed information about DecimalFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.DecimalFormat.html. And detailed information about DecimalFormatSymbols is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.DecimalFormatSymbols.html.

Date formatters

A date formatter is an object that formats a date's components into a locale-specific representation.

Java's DateFormat umbrella class is used to instantiate date formatters. Since DateFormat is an abstract class, you must call a static factory method to return a date formatter. There are several factory methods.

  • getDateInstance ()
  • getDateInstance (int)
  • getDateInstance (int, Locale)
  • getTimeInstance ()
  • getTimeInstance (int)
  • getTimeInstance (int, Locale)
  • getDateTimeInstance ()
  • getDateTimeInstance (int, int)
  • getDateTimeInstance (int, int, Locale)
  • getInstance ()

Earlier, when I defined date, I mentioned that a date consists of several components -- day, month, hour, minute, and so on. These factory methods are used to format some or all of these components. For example, the getDateInstance methods format only the components whose length is greater than or equal to a day: day, month, year, and so on. The getTimeInstance methods format only the components whose length is less than a day -- hour, minute, second, and so on. If you want to format all of the components, use the getDateTimeInstance methods. Finally, the getInstance () method is a special case of getDateTimeInstance (int, int).

Dates are formatted by using formatting styles. There are five available styles:

  • DEFAULT
  • SHORT
  • MEDIUM
  • LONG
  • FULL

The DEFAULT style is the normal style that's used by a given locale. The SHORT style is completely numeric. The MEDIUM style allows for short month names. The LONG style allows for long month names. The FULL style provides everything. This table provides some examples.

| Style | U.S. Locale Date | U.S. Locale Time | U.S. Locale Date & Time | German Locale Date | German Locale Time | German Locale Date & Time |
| --- | --- | --- | --- | --- | --- | --- |
| SHORT | 8/25/98 | 12:21 PM | 8/25/98 12:21 PM | 25.08.98 | 12:21 | 25.08.98 12:21 |
| MEDIUM | 25-Aug-98 | 12:21:55 PM | 25-Aug-98 12:21:55 PM | 25.08.1998 | 12:21:55 | 25.08.1998 12:21:55 |
| LONG | August 25, 1998 | 12:21:55 PM CDT | August 25, 1998 12:21:55 PM CDT | 25. August 1998 | 12:21:55 GMT-05:00 | 25. August 1998 12:21:55 GMT-05:00 |
| FULL | Tuesday, August 25, 1998 | 12:21:55 o'clock PM CDT | Tuesday, August 25, 1998 12:21:55 o'clock PM CDT | Dienstag, 25. August 1998 | 12.21 Uhr GMT-05:00 | Dienstag, 25. August 1998 12.21 Uhr GMT-05:00 |

Below are the partial results of running a Java application that creates date formatters using the FULL formatting style. The source code to this application is located in example15.java.

Default Locale: en_US Date: Monday, August 17, 1998
Default Locale: en_US Time: 4:15:04 o'clock PM CDT
Default Locale: en_US Date/Time: Monday, August 17, 1998 4:15:04 o'clock PM CDT

...

Locale: fi_FI Date: 17. elokuuta 1998
Locale: fi_FI Time: 16:15:04 GMT-05:00
Locale: fi_FI Date/Time: 17. elokuuta 1998 16:15:04 GMT-05:00

...

Locale: fr_CA Date: 17 août, 1998
Locale: fr_CA Time: 16 h 15 GMT-05:00
Locale: fr_CA Date/Time: 17 août, 1998 16 h 15 GMT-05:00

The following code fragment demonstrates how to create a date formatter:

// Get the current date.

Date now = new Date ();

// Get a date formatter for the default locale using the FULL date style.

DateFormat df = DateFormat.getDateInstance (DateFormat.FULL);

After the current date is obtained and stored in the now object, the getDateInstance (int) factory method is called. This method's single argument defines a formatting style. In this case, the FULL formatting style is represented by the DateFormat.FULL constant. The resulting formatter's format (Date) method then returns the formatted string.
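Here is a minimal sketch that walks through the four named styles for two locales, using the getDateInstance (int, Locale) factory method. The class name DateStyles is hypothetical, and the exact strings printed depend on the locale data shipped with the JDK in use, so they may not match the table above character for character.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Illustrative sketch only; the class name is hypothetical.
class DateStyles {
    // Formats the date portion of a Date using the given style and locale.
    static String format(Date date, int style, Locale locale) {
        DateFormat df = DateFormat.getDateInstance(style, locale);
        return df.format(date);
    }

    public static void main(String[] args) {
        // A fixed date (August 25, 1998, around midday) so that runs are repeatable.
        Date date = new GregorianCalendar(1998, Calendar.AUGUST, 25, 12, 21, 55).getTime();

        int[] styles = { DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL };
        String[] names = { "SHORT", "MEDIUM", "LONG", "FULL" };

        for (int i = 0; i < styles.length; i++) {
            System.out.println(names[i] + " (U.S.):   " + format(date, styles[i], Locale.US));
            System.out.println(names[i] + " (German): " + format(date, styles[i], Locale.GERMANY));
        }
    }
}
```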

DateFormat's date formatters also can be used to parse locale-specific strings. This is accomplished through a date formatter's parse (String) method.
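A minimal sketch of date parsing might look like the following. It is not the source of example16.java; the class name DateParseSketch and the decision to return null on failure are assumptions made for illustration.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

// Illustrative sketch only; not the source of example16.java.
class DateParseSketch {
    // Returns the parsed Date, or null if the text does not fit the locale's SHORT date style.
    static Date parseShortDate(String text, Locale locale) {
        DateFormat df = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        try {
            return df.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] inputs = { "8/25/98", "not a date" };
        for (int i = 0; i < inputs.length; i++) {
            Date parsed = parseShortDate(inputs[i], Locale.US);
            System.out.println(inputs[i] + " -> " + (parsed == null ? "Unable to Parse!" : parsed.toString()));
        }
    }
}
```

The text handed to parse (String) has to follow the same style as the formatter that parses it, which is why the input here uses the SHORT form.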

This applet parses user-entered dates and formats the results using date formatters. The source code to this applet is located in example16.java.


In the Input text field, enter a date, time, or date/time combination (as specified by the Formatter choice) using locale-specific conventions (as specified by the Locale choice). Press the FULL, LONG, MEDIUM, and SHORT buttons to parse the contents of the Input text field according to an appropriate style. (The entered text must be appropriate for a style.) If parsing is successful, the parsed text will be displayed in the Output text field. If it is not successful, an Unable to Parse! error message will be displayed in the Output text field.

IMPORTANT NOTE: Unfortunately, the applet above works erratically with JDK versions earlier than 1.1.6. If you run this applet in a browser that does not support JDK 1.1.6 or higher, parsing will not always work (you will receive the Unable to Parse! message). In addition, the Formatter and Locale drop-down listboxes are "crammed" together, with the lower part of the Locale listbox missing, and the contents of these listboxes are occasionally overwritten. The applet does work correctly with the JDK 1.1.6 appletviewer program.

Detailed information about DateFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.DateFormat.html.

So what happens if your code needs a date formatter for a locale that's not supported? Java provides a solution via the SimpleDateFormat and DateFormatSymbols classes. (These classes are used internally by DateFormat.) SimpleDateFormat allows you to create your own patterns for controlling the visual representation of a formatted value. DateFormatSymbols allows you to define the actual characters that appear in this visual representation -- such as month names and time zone strings.
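As a rough sketch of the idea, the following code supplies its own month names through DateFormatSymbols and its own layout through a SimpleDateFormat pattern. The Finnish month names are typed in by hand purely for illustration (spelled with ASCII-safe escapes), and the class name CustomDateFormat is hypothetical.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Illustrative sketch only; month names are supplied by hand.
class CustomDateFormat {
    static String format(Date date) {
        // Begin with an existing symbol set and replace the month names.
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.US);
        symbols.setMonths(new String[] {
            "tammikuuta", "helmikuuta", "maaliskuuta", "huhtikuuta",
            "toukokuuta", "kes\u00e4kuuta", "hein\u00e4kuuta", "elokuuta",
            "syyskuuta", "lokakuuta", "marraskuuta", "joulukuuta"
        });

        // d = day of month, MMMM = full month name, yyyy = four-digit year.
        SimpleDateFormat sdf = new SimpleDateFormat("d. MMMM yyyy", symbols);
        return sdf.format(date);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(1998, Calendar.AUGUST, 17, 12, 0, 0).getTime();
        System.out.println(format(date));
    }
}
```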

Please consult The Java Tutorial (see Resources) for more information on using these classes. Detailed information about SimpleDateFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.SimpleDateFormat.html. And detailed information about DateFormatSymbols is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.DateFormatSymbols.html.

Message formatters

A message formatter is an object that formats a compound message. What's a compound message? Before defining this term, let's look at a definition for message.

A message is textual data (usually) that provides the user with status or error information, descriptive names (such as widget names), and so on. There are two categories of messages: simple and compound. A simple message consists of static (non-changing) text, and a compound message consists of static and variable (changing) text, such as dates, currencies, and file counts. This table provides an example of a compound message with the variable text shown in boldface type.

Error 26!
Disk "Accounts" last backed up on August 11, 1998.
826 files were deleted
Account balance is: $3,659.23

Unlike simple messages, compound messages cannot be stored directly in resource bundles. (After all, how can we store variable text?) However, as we will see, there is an indirect way to store compound messages in resource bundles.

Each variable text item can be replaced by an argument: instructional text that provides a message formatter with information on how to format that item. The resulting combination of static text and arguments is known as a message pattern. The next table shows a message pattern that's based on the compound message in the previous table with arguments shown in boldface type.

Error {0, number, integer}!
Disk "{1}" last backed up on {2, date, long}.
{3, number, integer} files were deleted
Account balance is: {4, number, currency}

Each argument is surrounded by brace characters ("{}"). The first component of an argument is a digit -- the argument number. Argument numbers identify arguments and do not need to be placed in any particular order. For example, I could have placed the digit 1 in the argument that follows Error and the digit 0 in the argument that follows Disk. This table describes each argument.

| Argument | Description |
| --- | --- |
| {0, number, integer} | This argument represents a Number object, using the INTEGER style. |
| {1} | This argument will be associated with a corresponding String object in the resource bundle that represents a disk name. |
| {2, date, long} | This argument represents the components of a Date object whose length is greater than or equal to a day. This date will be formatted using the date formatter's LONG formatting style. |
| {3, number, integer} | This argument represents a Number object, using the INTEGER style. |
| {4, number, currency} | This argument represents a Number object, using the CURRENCY style. |

The message pattern now can be stored in a resource bundle that's backed by a properties file for each locale. Let's create a resource bundle called Demo. For the United States locale, the name of this properties file is Demo_en_US.properties.

The following table shows the contents of this file. Each logical line in the template string is terminated by the Unicode linefeed '\u000A' character so that it will appear on a separate physical line. The continuation character ("\") at the end of a physical line indicates that the next physical line is part of the same string.

# Demo_en_US.properties

template = Error {0, number, integer}'\u000A'\
           Disk "{1}" last backed up on {2, date, long}.'\u000A'\
           {3, number, integer} files were deleted'\u000A'\
           Account balance is: {4, number, currency}'\u000A'

diskname = Accounts

The code fragment below shows how to obtain the resource bundle for the United States locale:

ResourceBundle mb = ResourceBundle.getBundle ("Demo", Locale.US);

The next step is to create an array of arguments, as follows:

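Object [] arguments =
{
   new Integer (26),
   mb.getString ("diskname"),
   null,                        // placeholder for the backup date, filled in below
   new Integer (826),
   new Double (3659.23)
};

// Calendar months are zero-based, so 7 represents August.
Calendar c = Calendar.getInstance ();
c.set (1998, 7, 11);
arguments [2] = c.getTime ();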

The position of each argument must match the argument number in the message pattern. For example, the null argument in the preceding code fragment matches {2, date, long} in the message pattern after it's been set to a specific Date object.

Once the arguments array has been created, we can create a message formatter:

MessageFormat mf = new MessageFormat ("");
mf.setLocale (Locale.US);
mf.applyPattern (mb.getString ("template"));

In addition to creating the message formatter, the preceding code initializes its locale and establishes the message pattern that this formatter will use. The message can now be formatted by calling the MessageFormat's format (Object []) method, as shown in this code fragment:

String result = mf.format (arguments);
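Pulling these steps together, a minimal self-contained sketch might look like this. It is not the source of example17.java: the pattern and disk name are inlined instead of being read from the Demo resource bundle, and the class name MessageSketch is hypothetical.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Illustrative sketch only; the pattern is inlined rather than loaded from a bundle.
class MessageSketch {
    static final String TEMPLATE =
        "Error {0, number, integer}\n"
        + "Disk \"{1}\" last backed up on {2, date, long}.\n"
        + "{3, number, integer} files were deleted\n"
        + "Account balance is: {4, number, currency}\n";

    static String format(Locale locale) {
        // The position of each element matches its argument number in the pattern.
        Object[] arguments = {
            Integer.valueOf(26),
            "Accounts",
            new GregorianCalendar(1998, Calendar.AUGUST, 11).getTime(),
            Integer.valueOf(826),
            Double.valueOf(3659.23)
        };

        // Set the locale before applying the pattern so the subformats pick it up.
        MessageFormat mf = new MessageFormat("");
        mf.setLocale(locale);
        mf.applyPattern(TEMPLATE);
        return mf.format(arguments);
    }

    public static void main(String[] args) {
        System.out.print(format(Locale.US));
    }
}
```

The order of the calls matters: the locale is set first because applyPattern creates the number and date subformats using whatever locale the formatter holds at that moment.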

The source code to an application that demonstrates message formatting is located in example17.java. Here are the results of running this application with the United States and France locales:

Error 26
Disk "Accounts" last backed up on August 11, 1998.
826 files were deleted
Account balance is: $3,659.23

Erreur 26
Le disque " rend compte " dernier sauvegardé 11 août 1998.
826 fichiers ont été effacés
L'équilibre de compte est: 3 659,23 F

Detailed information about MessageFormat is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.MessageFormat.html.

Let's suppose your code needs to generate messages similar to the following: "There are 3 delinquent accounts." You could convert this message into a message pattern, "There are {0, number} delinquent accounts.", and store this pattern in a resource bundle. Now what happens if there is only one delinquent account? Your code would generate the message "There are 1 delinquent accounts." This is bad grammar, and something that no professional application should display to a user. So what do you do?

The answer is to use Java's ChoiceFormat class. I'm going to defer discussion of ChoiceFormat to another resource as this article is already long enough. Please consult The Java Tutorial (see Resources) for more information on using this class. Detailed information about ChoiceFormat is also available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.text.ChoiceFormat.html.
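For readers who want a quick taste, here is a minimal sketch of one way to approach the delinquent-accounts message. It relies on the choice argument type in a message pattern, which creates a ChoiceFormat behind the scenes. The wording of the three alternatives and the class name ChoiceSketch are assumptions made for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Illustrative sketch only; the message wording is an assumption.
class ChoiceSketch {
    // 0# applies to zero, 1# applies to exactly one, 1< applies to anything greater than one.
    static final String PATTERN =
        "There {0,choice,0#are no delinquent accounts"
        + "|1#is one delinquent account"
        + "|1<are {0,number,integer} delinquent accounts}.";

    static String describe(int count) {
        MessageFormat mf = new MessageFormat("");
        mf.setLocale(Locale.US);
        mf.applyPattern(PATTERN);
        return mf.format(new Object[] { Integer.valueOf(count) });
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // There are no delinquent accounts.
        System.out.println(describe(1)); // There is one delinquent account.
        System.out.println(describe(3)); // There are 3 delinquent accounts.
    }
}
```

Because the alternatives live inside the pattern, a translator can adjust them per locale in the resource bundle without touching the code.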

International fonts and non-Unicode text

How do we display Chinese, Arabic, or other intensely visual symbols? The answer is to make use of international fonts. How do we convert non-Unicode text to Unicode? The answer is to make use of encodings, Java's reader/writer classes, and tools such as native2ascii.

International fonts

The internationalization section of the JDK 1.1.6 documentation discusses how to add international fonts to the Java runtime. Specifically, it discusses how to add Japanese, Korean, Chinese, and Traditional Chinese fonts. Adding fonts involves working with a special file that's distributed with the runtime: font.properties. Rather than duplicate what's already been said, please consult this documentation for more information.

Java's "virtual" fonts are mapped to real fonts on the host machine. The internationalization section of the JDK 1.1.6 documentation discusses this mapping and provides detailed information on the structure of the font.properties file. Once again, please refer to this documentation for more information.

Non-Unicode text

As you already know, char variables in the Java programming language represent Unicode characters. However, few text editors support Unicode text entry; many are based on the ASCII character set. Using such an editor, you can still enter Java source code in ASCII and represent Unicode characters with special '\uxxxx' escape sequences (each x represents a hexadecimal digit). The Java compiler and runtime environment automatically convert ASCII and International Organization for Standardization (ISO) Latin-1 characters to Unicode characters. But if you want to convert characters from other encodings to Unicode, you'll need to do these conversions yourself.

Java contains APIs for translating non-Unicode text to Unicode. Before using these APIs, you must make sure that the character encoding of characters that must be converted to Unicode is supported. The internationalization section of the JDK 1.1.6 documentation provides a list of supported encodings.

You can convert a byte array of non-Unicode text to a Java String object by using code that's similar to this code fragment:

byte [] nonUnicodeBytes = ...;

String UnicodeCharacters = new String (nonUnicodeBytes, "UTF8");

In the preceding code fragment, the String (byte [], String) constructor is called to create a new String object from a byte array. The "UTF8" parameter specifies the encoding of nonUnicodeBytes. In this example, nonUnicodeBytes contains bytes that are stored using the UTF8 encoding (UTF8 is a compact binary form that encodes each 16-bit Unicode character as a sequence of one to three 8-bit bytes). Conversely, once you have a String object, you can extract its contents into a byte array by calling its getBytes () or getBytes (String) methods.

The getBytes () method converts characters to bytes based on the JDK platform's default character encoding. The getBytes (String) method converts characters to bytes based on a specified encoding. For example, in the previous code fragment, we could call UnicodeCharacters.getBytes ("UTF8") to obtain the original byte array.
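The round trip between bytes and a String can be sketched as follows. The class name EncodingRoundTrip and the sample bytes (the UTF8 form of the character e with an acute accent) are chosen for illustration only.

```java
import java.io.UnsupportedEncodingException;

// Illustrative sketch only; the class name and sample data are hypothetical.
class EncodingRoundTrip {
    // Converts encoded bytes to a String of Unicode characters.
    static String decode(byte[] bytes, String encoding) throws UnsupportedEncodingException {
        return new String(bytes, encoding);
    }

    // Converts a String of Unicode characters back to encoded bytes.
    static byte[] encode(String text, String encoding) throws UnsupportedEncodingException {
        return text.getBytes(encoding);
    }

    public static void main(String[] args) {
        // Two bytes that together encode one accented character in UTF8.
        byte[] nonUnicodeBytes = { (byte) 0xC3, (byte) 0xA9 };
        try {
            String text = decode(nonUnicodeBytes, "UTF8");
            byte[] back = encode(text, "UTF8");
            System.out.println("Characters: " + text.length() + ", bytes: " + back.length);
        } catch (UnsupportedEncodingException e) {
            System.out.println("Encoding not supported: " + e.getMessage());
        }
    }
}
```

Both conversions can throw UnsupportedEncodingException, which is why it is worth checking the list of supported encodings first.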

Java contains InputStreamReader and OutputStreamWriter classes for converting between Unicode character streams and bytestreams of non-Unicode text. Please consult The Java Tutorial (see Resources) for more information on using these classes.

Detailed information about InputStreamReader is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.io.InputStreamReader.html. And detailed information about OutputStreamWriter is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.io.OutputStreamWriter.html.
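As a minimal sketch of the stream-based approach, the following code writes Unicode characters out as ISO Latin-1 bytes and reads them back. In-memory byte arrays stand in for files, and the class name StreamConversion and the "ISO8859_1" encoding name used here are choices made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Illustrative sketch only; byte arrays stand in for files.
class StreamConversion {
    // Encodes Unicode characters into bytes as they pass through the writer.
    static byte[] write(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bytes, encoding);
        writer.write(text);
        writer.close();
        return bytes.toByteArray();
    }

    // Decodes bytes into Unicode characters as they pass through the reader.
    static String read(byte[] data, String encoding) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encoding);
        StringBuilder result = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            result.append((char) c);
        }
        reader.close();
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "Gr\u00fc\u00dfe";
        byte[] latin1 = write(text, "ISO8859_1");
        System.out.println("Bytes written: " + latin1.length);
        System.out.println("Round trip ok: " + text.equals(read(latin1, "ISO8859_1")));
    }
}
```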

The native2ascii tool converts a file that contains characters in a native encoding (characters that are neither ISO Latin-1 nor Unicode-encoded) to a file that contains only ISO Latin-1 characters and '\uxxxx' Unicode escape sequences. This tool is useful if you've created a file with an editor that generates characters using a character set other than ASCII, ISO Latin-1, or Unicode. Please refer to the internationalization section of the JDK 1.1.6 documentation for more information on native2ascii.

Beyond JDK 1.1.6

I think that the perfect way to conclude this series is to look beyond JDK 1.1.6 by exploring some of the internationalization features that are new to JDK 1.1.7 and Java 2.

What's new in JDK 1.1.7?

JDK 1.1.7 has introduced at least two new features: euro currency symbol support and changes to the java.lang.Character class to reflect updates to the Unicode standard. (I haven't come across any other new features.)

On January 1, 1999, the European Monetary Union (EMU) will introduce the euro as the new common currency in 11 European countries. Applications that handle international currencies will need to take this new currency into account. Figure 2 shows a picture of the euro currency symbol.

Figure 2: Euro currency symbol

The euro currency symbol is represented by the Unicode character '\u20AC'. Figure 3 is an example of a number that uses this symbol.

Figure 3: Euro currency example

In Part 1 of this series, I mentioned that a locale consists of language, region, and variant components. Usually, it's sufficient to describe a locale by a combination of language and region. However, this is not always possible. For example, France is one of the European countries that will support the euro in 1999. Because it will take time to make a full transition from the French franc to the euro, France will use the two currencies. This affects Java.

As I've already mentioned, Java's NumberFormat class can be used to format currency values. To format a currency value, all your code needs to do is instantiate a currency formatter by calling NumberFormat's getCurrencyInstance (Locale) factory method and calling one of the resulting object's format methods.

Suppose you want to format the value of a double variable as a French currency value. You can do this by using the following code fragment:

double value = 1.23;

NumberFormat nf = NumberFormat.getCurrencyInstance (new Locale ("fr", "FR"));

System.out.println ("French currency: " + nf.format (value));

The displayed result is: 1,23 F. This is fine if we're dealing with francs, but how do we handle the French euro? The solution is to make use of the variant part of the locale. Take a look at the following code fragment:

double value = 1.23;

NumberFormat nf = NumberFormat.getCurrencyInstance (new Locale ("fr", "FR", "EURO"));

System.out.println ("French currency: " + nf.format (value));

If you compare the two preceding code fragments, you'll notice only one difference. The variant portion of the France locale is set to the string "EURO". In fact, this is the only way to differentiate a France locale that uses francs from a France locale that uses euros.

The displayed result is the same as the previous result except that the F is replaced by the euro currency symbol. In Figure 3, I showed an example of a euro number with the euro currency symbol to the left of the number. However, Java's currency formatter places the euro currency symbol to the right of the number. Which is correct? I'm not sure, but I'll stick with Java. Let Java's currency formatter "worry" about which side of a number to place the euro currency symbol.

JDK 1.1.7 introduced the concept of the EURO variant. Unfortunately, its currency formatter for the EURO variant does not use the euro currency symbol. Instead, it falls back to using existing currency formats. This problem has been corrected in JDK 1.1.7B. (I'm not sure about JDK 1.1.7A.)

The following results were generated from an application that formats French and German currency values according to the France and Germany euro locale conventions. (This application must be run under JDK 1.1.7B or higher). The source code to this application is located in example18.java. (NOTE: There is one additional caveat that I will shortly discuss.)

Unformatted: 1234567.89

Formatted for en_US: $1,234,567.89
Formatted for fr_FR: 1 234 567,89 F
Formatted for de_DE: 1.234.567,89 DM
Formatted for ja_JP: ¥1,234,567.89
Formatted for fr_FR_EURO: 1 234 567,89 €
Formatted for de_DE_EURO: 1.234.567,89 €

If you can't see the euro currency symbol, then you probably have guessed that the previously mentioned caveat is about this symbol. Windows 95, Windows NT, and versions of Solaris prior to version 7 cannot display the euro currency symbol without an update to their fonts. Information on obtaining the update is distributed with the README file that accompanies JDK 1.1.7B. I suggest that you obtain a copy of this update.

The other major new feature that I've come across in JDK 1.1.7 is an update to Java's Character class. This update makes it possible for Java to transition from the Unicode 2.0.14 standard (used by JDK versions prior to 1.1.7) to the Unicode 2.1.2 standard. The following table lists these changes. Basically, some of Character's methods have been modified so that certain Unicode argument values are remapped according to Unicode 2.1.2.

| java.lang.Character Method | Method Argument | Return Value Under JDK Versions Prior to 1.1.7 | Return Value Under JDK Version 1.1.7 |
| --- | --- | --- | --- |
| toLowerCase (char) | '\u018E' | '\u0258' | '\u01DD' |
| toLowerCase (char) | '\u019F' | '\u019F' | '\u0275' |
| toUpperCase (char) | '\u01DD' | '\u01DD' | '\u018E' |
| toTitleCase (char) | '\u01DD' | '\u01DD' | '\u018E' |
| toUpperCase (char) | '\u0258' | '\u018E' | '\u0258' |
| toTitleCase (char) | '\u0258' | '\u018E' | '\u0258' |
| toLowerCase (char) | '\u0275' | '\u0275' | '\u019F' |
| toUpperCase (char) | '\u03C2' | '\u03C2' | '\u03A3' |
| toTitleCase (char) | '\u03C2' | '\u03C2' | '\u03A3' |
| toUpperCase (char) | '\u1E9B' | '\u1E9B' | '\u1E60' |
| toTitleCase (char) | '\u1E9B' | '\u1E9B' | '\u1E60' |
| getType (char) | '\u20AC' | 0 - unassigned | 26 - currency symbol |
| isDefined (char) | '\u20AC' | false | true |
| isJavaIdentifierPart (char) | '\u20AC' | false | true |
| isJavaIdentifierStart (char) | '\u20AC' | false | true |
| getType (char) | '\u301F' | 21 - start of punctuation | 22 - end of punctuation |
| getType (char) | '\uFFFC' | 0 - unassigned | 28 - other symbol |
| isDefined (char) | '\uFFFC' | false | true |
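A few of these remappings can be checked directly with a minimal sketch such as the following. The class name UnicodeUpdateCheck and its escape helper are hypothetical, and the values printed reflect whatever Unicode version the running JDK supports.

```java
// Illustrative sketch only; the class and helper names are hypothetical.
class UnicodeUpdateCheck {
    // Renders a char as a four-digit hexadecimal escape for display.
    static String escape(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        while (hex.length() < 4) {
            hex = "0" + hex;
        }
        return "\\u" + hex;
    }

    public static void main(String[] args) {
        char euro = '\u20AC';
        System.out.println("isDefined: " + Character.isDefined(euro));
        System.out.println("is currency symbol: "
            + (Character.getType(euro) == Character.CURRENCY_SYMBOL));
        System.out.println("isJavaIdentifierStart: " + Character.isJavaIdentifierStart(euro));

        // Case mappings taken from rows of the table.
        System.out.println("toUpperCase of final sigma: " + escape(Character.toUpperCase('\u03C2')));
        System.out.println("toLowerCase of reversed E:  " + escape(Character.toLowerCase('\u018E')));
    }
}
```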

What's new in Java 2?

After spending some time with JDK 1.2 Beta 4 (and beyond), I've uncovered the following new internationalization features.

  • Complex character input (such as Japanese, Chinese and Korean) is supported through the Input Method Framework.

  • Complex character output (such as Japanese, Arabic and Hebrew) is supported through Java's 2D API.

  • The concept of Unicode character blocks (subsections of Unicode's character set) has been introduced to simplify the classification of characters.

  • The Character class has a new compareTo (Character) method that compares the current Character object with a Character object argument (for ordering).

  • The Character class has a new compareTo (Object) method that compares the current Character object with an Object object argument (for ordering). This argument must be castable to another Character object, otherwise a ClassCastException object is thrown. If the argument is another Character object, then this method behaves like the compareTo (Character) method.

  • The Character class has a new getUnicodeBlock (char) method, which returns a constant that identifies the character block to which the character located in the char argument belongs.

  • The Collator class has a new compare (Object, Object) method that compares two Object object arguments (for ordering). These two arguments must be castable to two String objects. Otherwise, a ClassCastException object is thrown.

  • The CollationKey class has a new compareTo (Object) method that compares the current CollationKey object with the Object object argument (for ordering).

  • The Comparator interface is new to Java 2. The String class uses this interface as the type of its public CASE_INSENSITIVE_ORDER constant.

  • The Comparable interface is new to Java 2.

  • The Date class has a new compareTo (Date) method that compares the current Date object with the Date object argument (for ordering).

  • The Date class has a new compareTo (Object) method that compares the current Date object with an Object object argument (for ordering). This argument must be castable to another Date object, otherwise a ClassCastException object is thrown. If the argument is another Date object, then this method behaves like the compareTo (Date) method.

  • The String class has a new compareTo (Object) method that compares the current String object with an Object object argument (for ordering). This argument must be castable to another String object, otherwise a ClassCastException object is thrown. If the argument is another String object, then this method behaves like the compareTo (String) method.

  • The String class has a new compareToIgnoreCase (String) method that compares the current String object with a String object argument and ignores case (for ordering).
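Several of these additions can be tried out with a minimal sketch like the one below. The class name OrderingSketch and the sample words are hypothetical. Note one assumption: the character-block lookup is written as Character.UnicodeBlock.of (char), which is how current JDKs expose it, rather than under the method name listed above.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Illustrative sketch only; the class name and sample data are hypothetical.
class OrderingSketch {
    // Sorts a copy of the words without regard to case, using String's Comparator constant.
    static String[] sortIgnoringCase(String[] words) {
        String[] copy = (String[]) words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    // Sorts a copy of the words with a locale-sensitive Collator acting as the Comparator.
    static String[] sortWithCollator(String[] words, Locale locale) {
        String[] copy = (String[]) words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "pear", "Apple", "banana" };
        System.out.println(Arrays.asList(sortIgnoringCase(words)));
        System.out.println(Arrays.asList(sortWithCollator(words, Locale.US)));

        // A result of zero means the two strings are equal when case is ignored.
        System.out.println("apple".compareToIgnoreCase("APPLE"));

        // Character blocks: the euro sign belongs to the Currency Symbols block.
        System.out.println(Character.UnicodeBlock.of('\u20AC'));
    }
}
```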

Conclusion

We've certainly been busy. After exploring Java's Date, TimeZone, and Calendar classes, we explored the formatter classes for dates, messages, and numbers. Along the way, we discovered that parsing user input is handled by these same formatter classes. We briefly looked at international fonts and working with non-Unicode text. Finally, we ventured beyond JDK 1.1.6 and explored some internationalization features that are new to JDK 1.1.7 and Java 2.

We've come to the end of this series. I hope you've learned a few things and had some fun. If you'd like to learn more about internationalizing your software, check out the Web sites in the Resources section. These sites contain a wealth of useful material. In trying to keep this series to a manageable size, I did not go into a great deal of depth on some aspects of internationalization, such as calendars. Hopefully, some of you will expand on this work and present your results in future JavaWorld articles. Interested?

Jeff is a consultant working with various technologies including C++, digital signatures/encryption, Java, smart cards and Win32. He has worked for a number of technology-related consulting firms including EDS (Electronic Data Systems).

Learn more about this topic

  • More information on calendars is available in The Calendar FAQ http://www.pip.dknet.dk/~pip10160/calendar.html
  • An interesting overview article on internationalization is available at http://developer.java.sun.com/developer/technicalArticles/intl.html
  • The official Unicode Web site contains a wealth of information on this character definition standard http://www.unicode.org
  • IBM has a "cookbook" on creating global applications http://www.ibm.com/java/education/globalapps/index.html
  • Sun offers a tutorial on Java, The Java Tutorial, that contains a really good section on internationalization http://java.sun.com/docs/books/tutorial/index.html
  • A complete list of ISO-639 language codes is available at http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
  • A complete list of ISO-3166 country codes is available at http://www.ics.uci.edu/pub/ietf/http/related/iso3166.txt
  • An interesting article on localization with resource bundles is available at http://developer.java.sun.com/developer/technicalArticles/ResourceBundles.html
  • You can translate text from one language to another from this site http://babelfish.altavista.com/cgi-bin/translate?
  • Note: Two of the Resources links take you to Sun's Java Developer Connection site. You need to be a member of the Java Developer Connection in order to view articles, and you will be prompted to enter a user ID and password the first time you access this site. There is no charge to become a member. You can register when prompted to enter a user ID/password.