Multibyte-character processing in J2EE

Develop J2EE applications with multibyte characters

public class PostForm extends HttpServlet {
    ....    
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-16");
        PrintWriter out = response.getWriter();
        out.println("<html><head>");
        out.println("<meta content=\"text/html; charset=UTF-16\" http-equiv=\"content-type\">");
        out.println("</head><body>");
        out.println("<form action=\"servlet/ByteTest\" method=\"POST\">");
        out.println("<input type=\"text\" name=\"name\"><input type=\"submit\">");
        out.println("</form></body></html>");
        out.close();
    }
    
   ....    
}

Listing 6. ByteTest.java

public class ByteTest extends HttpServlet {
      ... 
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        ServletInputStream in = request.getInputStream();
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
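        // Read at most 50 raw bytes of the POST body, before any character decoding,
        // so the bytes the browser actually sent can be inspected.
        byte[] postdata = new byte[50];
        int size = in.read(postdata,0,50);
        in.close();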
        out.println("<html>");
        out.println("<head>");
        out.println("<title>Servlet</title>");
        out.println("</head>");
        out.println("<body>");
        printBytes(out,postdata, size, "postdata");
        out.println("</body>");
        out.println("</html>");
        out.close();
    }
...
}

When run, the PostForm page is served with UTF-16 encoding. So what does the ByteTest servlet output?

  • Internet Explorer: Whatever characters we enter, the browser submits them in UTF-8, even though the page is encoded with UTF-16.
  • Mozilla: Whatever characters we enter in this UTF-16-encoded page, only the single character = is shown, which is clearly wrong.
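One way to read such a byte dump is to compare it with what a form field would look like under each candidate encoding. The following is a minimal sketch, not one of the article's listings: the class FormEncodingProbe and its method encodeField are hypothetical names, and the sample character U+4E2D is chosen only for illustration. It uses the standard java.net.URLEncoder to percent-encode a value the way a form body carries it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration class; not one of the article's listings.
final class FormEncodingProbe {

    // Percent-encodes a field value as a form body would carry it
    // if the browser had chosen the given charset.
    static String encodeField(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d"; // one CJK character, U+4E2D (sample input)
        // Three escaped bytes indicate UTF-8; two indicate UTF-16.
        System.out.println("UTF-8:    name=" + encodeField(value, "UTF-8"));
        System.out.println("UTF-16BE: name=" + encodeField(value, "UTF-16BE"));
    }
}
```

For this sample character, the UTF-8 form is %E4%B8%AD and the UTF-16BE form is %4E%2D, so a dump containing three escaped bytes per CJK character points to UTF-8 submission, as reported for Internet Explorer above.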

Result

UTF-16 encoding can be used in a J2EE application only if it:

  • Uses only servlet technology
  • Restricts the supported browsers to Internet Explorer
  • Performs UTF-8 decoding on the server side, even though the pages are UTF-16 encoded on the browser side
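The last condition, decoding the submitted data as UTF-8 on the server, can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the author's code: FormBodyDecoder and decodeParameter are hypothetical names, and the raw body is passed in as a plain string so the example runs without a servlet container.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper; assumes the body is application/x-www-form-urlencoded
// and that the browser percent-encoded UTF-8 bytes.
final class FormBodyDecoder {

    // Returns the decoded value of the named field, or null if it is absent.
    static String decodeParameter(String body, String name) {
        try {
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip malformed pairs without '='
                }
                String key = URLDecoder.decode(pair.substring(0, eq), "UTF-8");
                if (key.equals(name)) {
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                }
            }
            return null;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "name=%E4%B8%AD%E6%96%87"; // sample UTF-8 submission
        String value = decodeParameter(body, "name");
        System.out.println("decoded length: " + value.length());
    }
}
```

Inside a real servlet, the usual alternative is to call request.setCharacterEncoding("UTF-8") before the first parameter is read and let the container do the decoding.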

In fact, the UTF-8 encoding scheme can be used in J2EE applications with no difficulties. As of Unicode 3.1, UTF-8 can encode the same set of characters as UTF-16. The two differ only in storage and processing efficiency.
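The storage difference is easy to measure. The sketch below is an illustration only: EncodingSizes and its methods are hypothetical names, and it assumes Java 7 or later for java.nio.charset.StandardCharsets. It counts the bytes each encoding needs for a few sample characters, using UTF-16BE so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the article's listings.
final class EncodingSizes {

    // Number of bytes needed to store the string in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to store the string in UTF-16 (big-endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // ASCII letter
            "\u00e9",       // Latin small letter e with acute
            "\u4e2d",       // CJK ideograph U+4E2D
            "\uD840\uDC00"  // U+20000, a supplementary character added in Unicode 3.1
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": UTF-8 = " + utf8Length(s) + " bytes, UTF-16 = " + utf16Length(s) + " bytes");
        }
    }
}
```

ASCII text is half the size in UTF-8, most CJK text is smaller in UTF-16 (two bytes instead of three per character), and supplementary characters take four bytes in both.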

Conclusion

So, if you have multibyte-character problems in J2EE applications, you must examine every phase of the development lifecycle, check the configurations on both the server and client sides, and use debugging tools to find the root cause of each problem.

Wang Yu presently works for Sun Microsystems as a Java technology engineer and technology architecture consultant. His duties include supporting local ISVs, evangelizing and consulting on important Java technologies such as J2EE, EJB (Enterprise JavaBeans), JSP/Servlet, JMS (Java Message Service), and Web services technologies.
