Multibyte-character processing in J2EE

Develop J2EE applications with multibyte characters

The Chinese language is one of the most complex and comprehensive languages in the world. Sometimes I feel lucky to be Chinese, specifically when I see some of my foreign friends struggle to learn the language, especially writing Chinese characters. However, I do not feel so lucky when developing localized Web applications using J2EE. This article explains why.

Though the Java platform and most J2EE servers support internationalization well, I am still confronted by many multibyte-character problems when developing Chinese or Japanese language-based applications:

  • What is the difference between encoding and charset?
  • Why do multibyte-character applications display differently when ported from one operating system to another?
  • Why do multibyte-character applications display differently when ported from one application server to another?
  • Why do my multibyte-character applications display well on the Internet Explorer browser but not on the Mozilla browser?
  • Why do applications on most J2EE servers display poorly when using UTF-16 (Unicode Transformation Format) encoding?


If you are asking the same questions, this article will help you answer them.

Basic knowledge of characters

Characters existed long before computers. More than 3,000 years ago, characters now known as oracle bone script appeared in ancient China. These characters have distinct visual forms and meanings, and most have names and pronunciations. Together, these facets make up a character repertoire: a set of distinct characters defined by a particular language, with no relationship to computers at all. Over thousands of years, many languages evolved and thousands of characters were created. Now we are trying to digitize all of these characters into 1s and 0s so that computers can process them.

When you type words on a keyboard, you are using a character input method. For simple scripts, there is a one-to-one mapping between a key and a character. For a more complex language, a single character may need multiple keystrokes.

Before you can see characters on the screen, the operating system must store them in memory. To do so, the OS defines a one-to-one correspondence between the characters in a character repertoire and a set of nonnegative integers, which are stored in memory and used by the OS. These integers are called character codes.
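The mapping from characters to integers can be observed directly in Java, which uses Unicode as its character repertoire. The following is a minimal sketch rather than anything from the original article: the class name CodePointDemo and the helper codePointOf are hypothetical names chosen for illustration.

```java
public class CodePointDemo {

    // Returns the character code (Unicode code point) of the first
    // character in the given string. Hypothetical helper for illustration.
    static int codePointOf(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // "A" (Latin), U+4E2D (a Chinese character), U+3042 (Japanese hiragana)
        String[] samples = {"A", "\u4e2d", "\u3042"};
        for (String s : samples) {
            int code = codePointOf(s);
            // Print only the numeric code so the output does not depend
            // on the console's own encoding.
            System.out.printf("U+%04X (decimal %d)%n", code, code);
        }
    }
}
```

Each character maps to exactly one nonnegative integer, regardless of how that integer is later written to a file or sent over the network.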

Characters can be stored in a file or transmitted through the network. Software uses a character encoding to define a method (algorithm) for mapping sequences of character codes into sequences of octets. Some character codes map into one byte, as in ASCII; others, such as those for Chinese and Japanese characters, map into two or more bytes, depending on the character-encoding scheme.
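To make the difference between a character code and its encoded bytes concrete, here is a small sketch that encodes the same characters with several encodings from the standard java.nio.charset.StandardCharsets class. The class name EncodingDemo and the hex helper are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Formats a byte array as space-separated uppercase hex octets.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String latin = "A";         // character code 0x41
        String chinese = "\u4e2d";  // character code 0x4E2D

        // One byte in ASCII
        System.out.println(hex(latin.getBytes(StandardCharsets.US_ASCII)));   // 41
        // Three bytes in UTF-8
        System.out.println(hex(chinese.getBytes(StandardCharsets.UTF_8)));    // E4 B8 AD
        // Two bytes in big-endian UTF-16
        System.out.println(hex(chinese.getBytes(StandardCharsets.UTF_16BE))); // 4E 2D
    }
}
```

The character code stays the same in every case; only the octet sequence produced by the chosen encoding changes.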

Different languages may use different character repertoires, and each character repertoire has its own encodings. Sometimes, when you choose a language, you implicitly choose a character repertoire, which in turn implies a character encoding. For example, when you choose the Chinese language, you may, by default, use the GBK Chinese character repertoire and an encoding scheme that is also named GBK.
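The implicit choice described above can be inspected at runtime. The sketch below prints the JVM's default charset and, if the runtime ships a GBK charset, encodes one Chinese character with it. GBK is not one of the charsets every Java runtime is required to support, so the code checks with Charset.isSupported first; the class name GbkDemo and the helper encodeIfSupported are hypothetical.

```java
import java.nio.charset.Charset;

public class GbkDemo {

    // Encodes text with the named charset, or returns null when this
    // runtime does not provide that charset. Hypothetical helper.
    static byte[] encodeIfSupported(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The default depends on the platform, locale, and JVM settings.
        System.out.println("Default charset: " + Charset.defaultCharset());

        byte[] gbk = encodeIfSupported("\u4e2d", "GBK");
        if (gbk == null) {
            System.out.println("GBK is not available in this runtime");
        } else {
            // GBK encodes U+4E2D as the two octets D6 D0.
            System.out.printf("GBK bytes: %02X %02X%n", gbk[0] & 0xFF, gbk[1] & 0xFF);
        }
    }
}
```

Because the default charset varies between machines, code that relies on it can behave differently after being moved to another operating system, which is why naming the charset explicitly is usually the safer choice.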


Resources