Software that breaks free of a single region's or country's conventions is known as international software. Although it can be costly to develop international software -- because of the time required to write the software, the cost of language translation, and so on -- the results can be rewarding. There is simply a larger market for this software.
"Internationalize your software" is a three-part series that explores the topic of developing Java-based software for an international audience, and it's divided into numerous subtopics. The following subtopics are covered in Part 1:
Java applets are used to illustrate Java's internationalization and localization features. These applets were compiled with the JDK 1.1.6 compiler and tested with the JDK 1.1.6 appletviewer and Netscape Navigator 4.06 programs. Netscape was running version 1.1.5 of the Java runtime environment during testing.
First, let's define internationalization and localization.
The process of designing an application that can automatically adapt to different regions and countries without the need for recompilation is called internationalization. Because this word contains 18 letters between the first i and the last n, a shorter term, i18n, is sometimes used. A truly internationalized program contains no hard-coded region- or country-specific elements -- for example, audio clips, text (GUI labels and other messages), graphics, currency/date/number formats, and so on. Instead, these elements are stored outside of the program, meaning that the program doesn't need to be recompiled every time a new region or country requires support.
The process of creating a set of region- or country-specific elements (including the translation of text) to support a new region or country is called localization. Because this word contains 10 letters between the first l and the last n, a shorter term, l10n, is sometimes used. The most time-consuming part of localization usually involves translating text. However, region- or country-specific elements such as currency/date/number formats need to be verified and this can also take a considerable time. Below is a partial checklist of elements that should be verified when localizing an internationalized program to a new region or country.
Human beings understand symbols -- letters, digits, punctuation, and so on -- while computers recognize binary numbers. Symbols must be mapped to binary numbers in order for a computer to be used effectively. Once mapped, the association between a symbol and a binary number is known as a character. The set of all mappings used on a particular computer is known as that computer's character set. Over the years, various standards have been developed for defining characters. Three of these standards have gained considerable fame: EBCDIC, ASCII, and Unicode. Following is a description of each of these character standards:
EBCDIC: Extended Binary Coded Decimal Interchange Code (EBCDIC) was developed by IBM as a standard for associating 8-bit binary values, ranging from 0 to 255, with symbols taken from the English language. EBCDIC is a complex and proprietary code, existing in at least six mutually incompatible versions. (Even standards can get it wrong!) EBCDIC is used mainly in the mainframe world.
ASCII: The American Standard Code for Information Interchange (ASCII) was developed by the American National Standards Institute (ANSI) committee as a standard for associating 7-bit binary values, ranging from 0 to 127, with symbols taken from the English language. ASCII is used primarily by smaller (that is, nonmainframe) computers.
Tables 1 and 2 identify all of the characters defined by ASCII. The first table lists those characters that have been used in the past to control devices, such as teletypes and printers, or have a special purpose in a programming language. (For example, the ASCII null [0] character is used in the C/C++ languages to identify the end of a sequence of characters -- a string.) These characters are not displayed to the user. The second table lists characters that can be displayed to the user.
Table 2: Displayable characters
There is a problem with the EBCDIC and ASCII character definition standards. Both standards are based on the English language and have no room for growth. How can they represent the many thousands of written characters used by modern and ancient languages? This problem has been addressed during the last few years and has resulted in the emergence of a new character definition standard called Unicode.
Unicode: The Unicode standard maps symbols to 16-bit binary numbers, which gives Unicode the ability to define 65,536 distinct characters. As of version 2.1, Unicode has defined a total of 38,887 characters. In contrast, ASCII defines a maximum of 128 characters. A link to the official Unicode Internet site is available in the Resources section.
The Java language supports Unicode. For example, Java's char (character) type has a defined size of 16 bits, allowing it to hold any one of Unicode's 65,536 characters. In contrast, the C and C++ languages have a char type that has no defined size (that is, the size can vary from one platform to another) and usually has a size of 8 bits, allowing it to hold a maximum of 256 characters. The Java language also defines a Unicode character constant notation. The '\uxxxx' notation identifies a Unicode character where xxxx represents that character's hexadecimal code (a number that ranges from 0000 to FFFF, covering the entire 65,536-character Unicode range).
Note: ANSI/ISO C also defines the wchar_t wide-character type. This type is intended for representing characters from the ISO 10646 Universal Character Set. However, on various platforms it can also be used to represent Unicode characters. For more information about this type, check out Wikipedia's Wide character entry and the UTF-8 and Unicode FAQ.
Below is a code fragment that defines several Unicode international character constants and prints them. The printed characters are also shown.
char [] characters = { '\u00e5', '\u00a5', '\u00c7' };
System.out.println (new String (characters));
å¥Ç
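To make the link between a character and its 16-bit code more tangible, here is a minimal sketch. The class name CharCodes and its two helper methods are hypothetical names chosen for illustration; they are not part of the applets that accompany this article.

```java
class CharCodes {
    // Returns the numeric code of a character as a four-digit hexadecimal string.
    static String hexCode(char c) {
        String hex = Integer.toHexString(c);
        return "0000".substring(hex.length()) + hex;
    }

    // True if the character falls inside ASCII's 7-bit range (0 to 127).
    static boolean isAscii(char c) {
        return c <= 127;
    }

    public static void main(String[] args) {
        // 'A' is an ASCII character; the other three lie outside ASCII's range.
        char[] characters = { 'A', '\u00e5', '\u00a5', '\u00c7' };
        for (int i = 0; i < characters.length; i++) {
            char c = characters[i];
            System.out.println("code " + hexCode(c) + " ascii=" + isAscii(c));
        }
    }
}
```

The sketch prints codes rather than the characters themselves, so its output does not depend on the console's encoding.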
I originally defined internationalization as the process of designing an application that automatically adapts to different regions and countries without the need to recompile the application. When we talk about "different regions and countries," we're really talking about locales. A locale is a geographical, political, or cultural region (possibly an entire country) that shares some combination of common geography, politics, or culture.
Java treats locales as objects. A locale object is nothing more than an identifier (made up of a language and a region/country
code) that is used by locale-sensitive classes -- classes containing locale-specific functionality (for example, Calendar). Locale objects are instantiated from the Locale class. Detailed information about Locale is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.Locale.html.
The concept of a global locale is not present in Java. Therefore, it's possible for different parts of a program to use different locales. This makes it possible to create multilingual programs. For example, imagine a Java program that displays a spreadsheet of financial transactions. Each column displays the same transactions based on locale-specific currencies using locale-specific formats.
A program can determine the default locale by calling the Locale class's static getDefault () method; it can set the default locale by calling the static setDefault (Locale) method. However, attempting to set the default locale from within an applet while running under a Netscape browser results
in a security violation (since an attempt is being made to change a fundamental property on a user's machine).
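The spreadsheet idea above can be sketched in a few lines. This is an illustration only: the class name PerLocaleFormats and the method asCurrency are hypothetical, and the sketch formats the same number under each locale's conventions without converting between currencies. It relies on NumberFormat.getCurrencyInstance (Locale), which takes the locale as an argument instead of reading a global setting.

```java
import java.text.NumberFormat;
import java.util.Locale;

class PerLocaleFormats {
    // Formats an amount as currency using the conventions of the given locale,
    // without touching the default locale.
    static String asCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        System.out.println("Default locale: " + Locale.getDefault());

        // Each "column" uses its own locale; the default locale stays unchanged.
        Locale[] columns = { Locale.US, Locale.FRANCE, Locale.GERMANY };
        for (int i = 0; i < columns.length; i++) {
            System.out.println(columns[i] + "\t" + asCurrency(1234.5, columns[i]));
        }
    }
}
```

Because the locale is passed explicitly, nothing here calls setDefault (Locale), so the security restriction described above does not come into play.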
Figure 1 shows a Java applet that lets you change the applet's locale to English, French, German, or Italian. Once the locale
has been changed, some of the Locale class's information methods are called and the results are displayed in a text area. The source code to this applet is located
in example1.java.
Figure 1: Locale example
Below is a code fragment, taken from the applet shown above, that creates a locale object.
l = new Locale ("en", "US");
The first argument, en, identifies the language that this locale will use (English), while the second argument, US, identifies the region used by this locale (United States). Both arguments were obtained from lists of standard language and region/country codes that are defined and maintained by the International Organization for Standardization (ISO). For a complete list of codes, refer to the Resources section.
Below is a code fragment, taken from the applet shown in Figure 1, which calls some of the Locale class's information methods.
sb.append ("Language Code = " + l.getLanguage () + "\n");
sb.append ("Country Code = " + l.getCountry () + "\n");
sb.append ("Variant = " + l.getVariant () + "\n");
sb.append ("ISO 3-letter Language Abbreviation = " + l.getISO3Language () + "\n");
sb.append ("ISO 3-letter Country Abbreviation = " + l.getISO3Country () + "\n");
sb.append ("Display Language = " + l.getDisplayLanguage (l) + "\n");
sb.append ("Display Country = " + l.getDisplayCountry (l) + "\n");
sb.append ("Display Variant = " + l.getDisplayVariant (l) + "\n");
sb.append ("Display Name = " + l.getDisplayName (l) + "\n");
The getLanguage () method returns the lowercase, two-letter ISO-639 language code. The getCountry () method returns the uppercase, two-letter ISO-3166 region/country code. The getVariant () method returns the variant portion of the locale. (A variant is specified as the third argument in the Locale (String, String, String) constructor and is used to further differentiate a region -- such as North, South -- or to provide some vendor-specific code.)
The getISO3Language () method returns the three-letter ISO abbreviation for the locale's language. The getISO3Country () method returns the three-letter ISO abbreviation for the locale's region/country. The getDisplay methods take a locale object argument and return information using the locale argument's language (français would be returned as the display language for the French locale while English would be returned as the display language for the English locale). Locale defines several equivalent getDisplay methods that take no arguments. These methods return values based on the default locale. The getDisplayName (Locale) method calls getDisplayLanguage (Locale), getDisplayCountry (Locale), and getDisplayVariant (Locale), and concatenates this information into a single value.
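The difference between a locale's codes and its display names is easier to see when the same locale is displayed in two languages. The following is a small sketch, not code from the applet; the class LocaleInfo and the method summarize are hypothetical names.

```java
import java.util.Locale;

class LocaleInfo {
    // Summarizes a locale using its codes plus its display name rendered
    // in the language of the displayIn locale.
    static String summarize(Locale l, Locale displayIn) {
        return l.getLanguage() + "_" + l.getCountry()
            + " (" + l.getISO3Language() + "/" + l.getISO3Country() + ") "
            + l.getDisplayName(displayIn);
    }

    public static void main(String[] args) {
        Locale french = Locale.FRANCE;
        // Same locale, display name requested in English and then in French.
        System.out.println(summarize(french, Locale.ENGLISH));
        System.out.println(summarize(french, french));
    }
}
```

The codes are identical on both lines; only the display name changes with the locale passed to getDisplayName (Locale).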
Suppose you want to dynamically determine which locales are available on a particular platform. What do you do? Many of the
locale-sensitive classes define a getAvailableLocales () method that returns an array of locale objects. These objects represent all of the supported locales. Surprisingly, this
method is not part of the Locale class in any JDK version prior to 1.2.
Figure 2 shows a Java applet that lists all locales that are available on the current platform. Each locale is shown on a separate line starting with a lowercase, two-letter ISO language code, followed by an underscore character ("_") and an uppercase, two-letter ISO region/country code. This is followed by a descriptive name. If you view this applet with the JDK 1.1.6 appletviewer program, you'll see all of the region/country codes. Because Netscape Navigator 4.06 contains version 1.1.5 of the Java runtime environment, you won't see all of the region/country codes when running the applet under this browser. The source code to this applet is located in example2.java.
Below is a code fragment, taken from the applet shown in Figure 2, that calls the Calendar class's getAvailableLocales () method to return all locale objects that are available on the current platform. Each locale object's getLanguage (), getCountry (), and getDisplayName () methods are called to obtain the ISO language and region/country codes along with descriptive text. This information is concatenated
together and appended to a StringBuffer object.
// Obtain all currently available locales.
Locale [] locales = Calendar.getAvailableLocales ();
// Create a buffer for holding locale text.
StringBuffer sb = new StringBuffer ();
// Populate the buffer with locale text.
for (int i = 0; i < locales.length; i++)
     sb.append (locales [i].getLanguage () + "_" + locales [i].getCountry () + "\t" + locales [i].getDisplayName () + "\n");
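On JDK 1.2 and later, the same listing can be built from the Locale class itself. The sketch below assumes such a JDK; the class AvailableLocales and the helper describe are hypothetical names, and the list it prints will vary from one platform and runtime to another.

```java
import java.util.Locale;

class AvailableLocales {
    // Builds one line of text for a locale: codes, a tab, then a descriptive name.
    static String describe(Locale l) {
        return l.getLanguage() + "_" + l.getCountry() + "\t" + l.getDisplayName();
    }

    public static void main(String[] args) {
        // Locale.getAvailableLocales() is available from JDK 1.2 onward.
        Locale[] locales = Locale.getAvailableLocales();
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < locales.length; i++) {
            sb.append(describe(locales[i])).append("\n");
        }
        System.out.print(sb);
    }
}
```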
When a program is localized, a set of locale-specific elements is created for each locale where this program will be used. These elements aren't stored in source code. Instead, they're stored in resource bundles. A resource bundle is a container that holds one or more locale-specific elements and is associated with one and only one locale.
A program works with one or more families of resource bundles. Each family contains resource bundles for all supported locales and differs from another family in the kind of elements that are stored in these bundles. For example, one family might hold text in its bundles while another family holds audio clips containing language-specific verbal instructions.
Each family shares a common family name, and each of a family's resource bundles has a unique locale designation appended to this family name. This designation is what differentiates one resource bundle from another within the family. For example, suppose that you plan to localize a financial applet for French-speaking investors who live in France and follow French customs, and for German-speaking investors who live in different regions/countries. There will be one family of resource bundles, and FinRes will be the name of this family. You do your research and learn that the appropriate language code for French is fr, and the appropriate region/country code for France is FR. Appending these codes to the bundle's family name, you end up with FinRes_fr_FR. You then learn that de is the appropriate language code for German. Because the German-speaking investors aren't tied to a single region/country, you append only this language code to the family name, and you end up with FinRes_de.
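The naming rule can be expressed as a one-line helper. This is a sketch under the assumption that a locale's toString () value (language and country codes joined by an underscore) is a suitable locale designation; the class BundleNames and the method bundleName are hypothetical.

```java
import java.util.Locale;

class BundleNames {
    // Hypothetical helper: appends a locale designation to a family name.
    // Locale.toString() yields the locale's codes joined by "_".
    static String bundleName(String family, Locale locale) {
        String designation = locale.toString();
        return designation.length() == 0 ? family : family + "_" + designation;
    }

    public static void main(String[] args) {
        System.out.println(bundleName("FinRes", Locale.FRANCE)); // FinRes_fr_FR
        System.out.println(bundleName("FinRes", Locale.GERMAN)); // FinRes_de
    }
}
```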
Resource bundles are instantiated from subclasses of Java's abstract ResourceBundle class. More detailed information about ResourceBundle is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.ResourceBundle.html
When your program needs a locale-specific element, it calls one of the ResourceBundle class's static getBundle methods, either getBundle (String) for the default locale or getBundle (String, Locale) for a specified locale, to return an object that allows access to the element. Below is a code fragment that illustrates a call to getBundle (String, Locale).
currentLocale = Locale.FRANCE;
ResourceBundle resources = ResourceBundle.getBundle ("FinRes", currentLocale);
The first getBundle argument specifies the family name of the resource bundle family (shown as FinRes in the code sample above), while the second argument identifies the desired locale (shown as currentLocale in the code sample above). Both arguments are used by getBundle (String, Locale) to construct the locale-specific name of the desired resource bundle object.
The getBundle methods search for a resource bundle object in a specific order, as follows:
The search begins by looking for a resource bundle whose name matches the family name followed by the language, country, and variant components. If there's no match, the search continues with the family name followed by the language and country components. If there's still no match, the search continues with the family name followed by the language component alone. (The JDK then repeats these steps using the default locale's components.) If none of those bundles exists, the search looks for a bundle that matches only the family name.
Finally, if no match is made at all, a MissingResourceException object is thrown. This search process is a graceful degradation algorithm: when the specified bundle cannot be found or doesn't exist, it finds the bundle that most closely matches the one being searched for.
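The order of the search can be sketched as a list of candidate names, from most to least specific. This is a simplification for illustration, not the JDK's implementation: it leaves out any fallback through the default locale, and the class BundleSearchOrder and the method candidateNames are hypothetical names. The variant MAC is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BundleSearchOrder {
    // Hypothetical helper: lists bundle names from most to least specific.
    static List<String> candidateNames(String family, Locale locale) {
        String language = locale.getLanguage();
        String country = locale.getCountry();
        String variant = locale.getVariant();
        List<String> names = new ArrayList<String>();
        if (language.length() > 0 && country.length() > 0 && variant.length() > 0) {
            names.add(family + "_" + language + "_" + country + "_" + variant);
        }
        if (language.length() > 0 && country.length() > 0) {
            names.add(family + "_" + language + "_" + country);
        }
        if (language.length() > 0) {
            names.add(family + "_" + language);
        }
        names.add(family); // the family name alone is the last resort
        return names;
    }

    public static void main(String[] args) {
        // Prints the names in the order they would be tried.
        System.out.println(candidateNames("FinRes", new Locale("fr", "FR", "MAC")));
    }
}
```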
If the resource bundle can be found, the getBundle methods return an object (shown as resources in the code sample above) that contains methods for extracting a locale-specific element. This object can extract an element from a property file, a class file, or some other developer-defined entity. It doesn't matter where the element is stored, because the object returned by getBundle provides a common interface for accessing it.
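Since getBundle either returns a usable object or throws MissingResourceException, calling code usually has to handle both outcomes. Here is one possible way to do that, as a sketch; the class BundleLookup and the method findBundle are hypothetical, and returning null on failure is only one of several reasonable designs.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

class BundleLookup {
    // Returns the bundle for the family and locale, or null if no bundle
    // in the family can be found at all.
    static ResourceBundle findBundle(String family, Locale locale) {
        try {
            return ResourceBundle.getBundle(family, locale);
        } catch (MissingResourceException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "FinRes" only resolves if a matching class or property file
        // is on the class path when this sketch is run.
        ResourceBundle resources = findBundle("FinRes", Locale.FRANCE);
        System.out.println(resources == null ? "FinRes not found" : "FinRes loaded");
    }
}
```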
Sun's Java Software Division has defined two kinds of resource bundles: property resource bundles and list resource bundles.
A property resource bundle is a resource bundle that is based on a property file (a text-based list of key = value entries). This kind of bundle is useful for storing text. A common reason for a MissingResourceException object being thrown when working with property resource bundles is that the underlying property file is missing a .properties extension. A sample property file is shown below. Optional comments can appear in this file as long as they are prefaced
by a # character.
Hello=Bonjour
Goodbye=Au revoir
Property resource bundles are implemented by Java's PropertyResourceBundle class. Detailed information about PropertyResourceBundle is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.PropertyResourceBundle.html
Figure 3 shows the results of running a Java applet that lets you change the applet's locale to English, French, German, or Italian, and view the words Hello and Goodbye in the languages represented by these four locales. Language-specific text is obtained from a property resource bundle (with a family name of ex3). The image on the left shows this text using the English/United States locale, while the image on the right shows this text using the Italian/Italy locale. The source code to this applet is located in example3.java.
Figure 3: PropertyResourceBundle example. Left: Example3 Applet (English - United States). Right: Example3 Applet (Italian).
The example3 applet will not run under Netscape because this applet attempts to read from a property file stored on the user's
machine and this read attempt results in a security violation. This security violation causes the Java runtime to throw a
MissingResourceException object.
I've created an HTML file, example3.html, and placed it in this article's zip file (see the Resources section). You can use this HTML file with appletviewer to run this applet.
Below is a code fragment, taken from the example3 applet, that calls the resource bundle object's getString (String) method to obtain a text element from the resource bundle whose family name is ex3. The string argument passed to getString (String) identifies the key in the list of key=value entries that are stored in the property resource bundle's underlying property file.
ResourceBundle resources = ResourceBundle.getBundle ("ex3", l);
StringBuffer sb = new StringBuffer ();
sb.append ("Hello = " + resources.getString ("Hello") + "\n");
sb.append ("Goodbye = " + resources.getString ("Goodbye") + "\n");
// Populate text area control with locale information.
ta.setText (sb.toString ());
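For experimenting with getString (String) without an applet or a property file on disk, a property resource bundle can also be built from text held in memory. The sketch below is not how the example3 applet works. It assumes a newer JDK (Java 8 or later as written), because the PropertyResourceBundle constructor that takes a Reader was added after JDK 1.1. The class InMemoryPropertyBundle and the method fromText are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class InMemoryPropertyBundle {
    // Builds a property resource bundle from text in property-file format.
    static ResourceBundle fromText(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from a string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String french = "# French text\nHello=Bonjour\nGoodbye=Au revoir\n";
        ResourceBundle resources = fromText(french);
        System.out.println("Hello = " + resources.getString("Hello"));
        System.out.println("Goodbye = " + resources.getString("Goodbye"));
    }
}
```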
A list resource bundle is a resource bundle that is based on a Java class file. This kind of bundle is useful for storing nontext elements such
as graphics and audio clips. A sample class file is shown below. This class must implement a single method, getContents ().
import java.util.*;

public class ex4_fr_FR extends ListResourceBundle
{
   public Object [][] getContents ()
   {
      return contents;
   }

   private Object [][] contents =
   {
      { "Hello", "Bonjour" },
      { "Goodbye", "Au revoir" }
   };
}
List resource bundles are implemented by Java's ListResourceBundle class. Detailed information about ListResourceBundle is available in the following class reference, located at Sun's Java Web site: http://java.sun.com/products/jdk/1.1/docs/api/java.util.ListResourceBundle.html
Figure 4 shows a Java applet that lets you change the applet's locale to English, French, German, or Italian and view the
words Hello and Goodbye in the languages represented by these four locales. This text is obtained from a list resource bundle (with a family name
of ex4). The source code to this applet is located in example4.java.
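Because a list resource bundle stores its values as objects, it can hold more than strings. The sketch below is an illustration rather than code from the example4 applet: the bundle DemoRes_fr_FR, the class ListBundleDemo, and the PageWidthMm key are hypothetical, and an Integer stands in for a nontext element such as a graphic or an audio clip. The bundle is instantiated directly to keep the sketch self-contained; a real program would normally obtain it through getBundle.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical bundle that mixes text with a nontext element.
class DemoRes_fr_FR extends ListResourceBundle {
    private static final Object[][] CONTENTS = {
        { "Hello", "Bonjour" },
        { "Goodbye", "Au revoir" },
        { "PageWidthMm", Integer.valueOf(210) } // nontext value stored as an object
    };

    protected Object[][] getContents() {
        return CONTENTS;
    }
}

class ListBundleDemo {
    public static void main(String[] args) {
        ResourceBundle resources = new DemoRes_fr_FR();

        // Text elements come back through getString.
        System.out.println("Hello = " + resources.getString("Hello"));

        // Nontext elements come back through getObject and need a cast.
        Integer width = (Integer) resources.getObject("PageWidthMm");
        System.out.println("Page width = " + width + " mm");
    }
}
```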
Internationalizing your software is worth considering if access to the global marketplace is important, but this isn't a task to be undertaken lightly. Fortunately, Java has simplified the job, in part due to its platform-independent status and its internationalization and localization features.
In Part 1 of our three-part series, we've defined internationalization and localization. We've presented a list of elements that must be localized when creating international software. We encountered new characters and character sets, and explored the EBCDIC, ASCII, and Unicode character definition standards. We also examined locales and resource bundles. In Part 2, we'll explore text processing in a locale-sensitive manner along with formatters for messages, dates, times, numbers, and currencies.
If you have any questions about the material that's been presented in this article, please send me e-mail, using the link in my bio below. See you next month.