End-to-end internationalization of Web applications

Going beyond the JDK

This risk arises only when a user forces the browser to interpret the page with an encoding other than the one it was written in. In general, assuming the submitted text uses the same encoding as the form page is perfectly reasonable.

As noted earlier, applications that render different pages using different encodings are problematic, and needing to know which encoding the browser used for each submission only adds to the mess. The character encoding used to decode submitted text must be set by calling setCharacterEncoding() on the ServletRequest object before the first call to getParameter(). Hence, you cannot carry the page encoding in a hidden form field, because reading that field already requires the parameters to be decoded, unless you bypass the Servlet API (which is not recommended). Your best solution is to pick a single Unicode encoding, such as UTF-8, and use it consistently throughout your application.
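To make the decoding problem concrete, the following is a minimal sketch that uses only the JDK, not the Servlet API. The class FormDecodingDemo and its methods submit() and decode() are hypothetical names introduced for illustration: the first stands in for the browser encoding a form value, the second for the server decoding it with whatever encoding it assumes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of the Servlet API.
final class FormDecodingDemo {

    // Stands in for the browser: encodes the submitted text
    // using the encoding of the form page.
    static byte[] submit(String text, Charset pageEncoding) {
        return text.getBytes(pageEncoding);
    }

    // Stands in for the server: decodes the submitted bytes
    // using the encoding it assumes the browser used.
    static String decode(byte[] submitted, Charset assumedEncoding) {
        return new String(submitted, assumedEncoding);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        byte[] submitted = submit(original, StandardCharsets.UTF_8);

        // Matching encodings round-trip the text unchanged.
        String matched = decode(submitted, StandardCharsets.UTF_8);
        System.out.println(matched.equals(original));

        // A mismatched encoding turns the two UTF-8 bytes of the
        // accented letter into two unrelated characters.
        String mismatched = decode(submitted, StandardCharsets.ISO_8859_1);
        System.out.println(mismatched.equals(original));
    }
}
```

In a servlet, the role of the assumedEncoding argument is played by the value passed to setCharacterEncoding(), which is why that call has to come before any parameter is read.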

Controlling output character encoding

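Because browsers submit form data in the encoding of the page that contains the form, the output character encoding effectively controls the input character encoding. You must therefore ensure the pages sent to your user are encoded as you intended.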

You have several options for controlling output character encoding in a J2EE application. If you're writing a servlet, you can set the content type directly on the ServletResponse object. In doing so, however, be sure to render your output through the java.io.PrintWriter returned by getWriter(). If you write bytes directly to the output stream returned by getOutputStream(), the charset you declared is not applied to them, and your response will not be encoded as you intended:

   ServletResponse response = getServletResponse();
   // Always set the content type before getting the PrintWriter
   response.setContentType( "text/html; charset=UTF-8" );
   // Now, get the writer that will handle your output
   PrintWriter writer = response.getWriter();
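
The effect of the writer's charset can be observed without a servlet container. The sketch below uses only the JDK; WriterEncodingDemo and render() are hypothetical names, and the ByteArrayOutputStream stands in for the response's underlying byte stream. It assumes the container wraps that stream in a charset-aware writer, much as the OutputStreamWriter does here.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of the Servlet API.
final class WriterEncodingDemo {

    // Renders text through a PrintWriter bound to the given charset
    // and returns the bytes that would go out on the wire.
    static byte[] render(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(sink, charset))) {
            writer.print(text);
        } // closing the writer flushes the encoded bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u20ac5"; // euro sign followed by the digit 5

        // UTF-8 needs three bytes for the euro sign plus one for the digit.
        System.out.println(render(text, StandardCharsets.UTF_8).length);

        // ISO-8859-1 has no euro sign, so the writer substitutes '?'.
        System.out.println(render(text, StandardCharsets.ISO_8859_1).length);
    }
}
```

The same characters produce different bytes depending on the writer's charset. Bytes pushed straight into the stream skip this conversion entirely, so nothing guarantees they match the charset named in the content type.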


Setting the content type directly on the response object in a servlet is essentially the same as using a JSP (JavaServer Pages) page directive like this:

   <%@ page contentType="text/html; charset=UTF-8" %>


Both methods set the output response encoding, but they share a shortcoming. Even if you use the same page encoding throughout your Web application, you'll need to replicate this code in every one of your application's servlets and JSP pages. Are you certain that neither you nor another developer on your team will ever forget this subtle one-liner? If you set the encoding in the servlet, then you can, of course, encapsulate this behavior in a common superclass for all of your servlets. However, this approach isn't recommended: it prevents you from extending other framework-related base classes, because Java allows only single inheritance of implementation.

If you're using Struts, you're in luck. The contentType attribute on the controller element in your struts-config.xml file sets the default content type, and with it the default character encoding, of your responses:

   <controller contentType="text/html; charset=UTF-8" />


This attribute sets only the default content type and encoding. A JSP page directive that sets the content type, or a call that sets the content type on the response object, overrides this setting.

If your Struts application has workflows that pass through servlets, or go directly to JSP pages without first passing through Struts, this configuration setting won't help.
