Server-Side Java: Using XML and JSP together

Two great tastes that taste great together

For the purpose of this article I'm going to assume that you know what JavaServer Pages (JSP) and Extensible Markup Language (XML) are, but you may be a little unclear on how you can use them. JSP use is pretty easy to defend. It allows you to design a Website built from files that look and act a lot like HTML. The only difference is that JSPs also act dynamically -- for example, they can process forms or read databases -- using Java as a server-side scripting language. XML use is more difficult to justify. While it seems as if every new product supports it, each one seems to be using XML for a different purpose.

In this article, you will learn to design a system using XML in a fairly modest way. Many Websites have vast collections of data that are displayed in a more or less standard way. I will design a system that uses XML files to store data on a Web server and JSP files to display that data.

XML versus relational databases

"But wait," you may ask, "you're using XML to store data? Why not use a database?" Good question. The answer is that for many purposes, a database is overkill. To use a database, you have to install and support a separate server process, which often also requires installing and supporting a database administrator. You must learn SQL, and write SQL queries that convert data from a relational to an object structure and back again. If you store your data as XML files, you lose the overhead of an extra server. You also gain an easy way to edit your data: just use a text editor, rather than a complicated database tool. XML files are also easier to back up, to share with your friends, or to download to your clients. You can also easily upload new data to your site, using FTP.

A more abstract advantage of XML is that, being a hierarchical rather than a relational format, it can be used in a much more straightforward manner to design data structures that fit your needs. You don't need to use an entity relationship editor nor normalize your schema. If you have one element that contains another element, you can represent that directly in the format, rather than using a join table.

Note that for many applications, a filesystem will not suffice. If you have a high volume of updates, a filesystem may get confused or corrupted by simultaneous writes; databases usually support transactions, which allow concurrency without corruption. Further, a database is an excellent tool if you need to make complicated queries, especially if they will vary from time to time. Databases build indexes, and are optimized for keeping the indexes up to date with a constantly changing data set. Relational databases also have many other advantages, including a rich query language, mature authoring and schema design tools, proven scalability, fine-grained access control, and so on.

(Note: You can use simple file locking to provide a poor man's transaction server. And you can also implement an XML index-and-search tool in Java, but that's a topic for another article.)

In this case, as in most low-to-medium volume, publishing-based Websites, you can assume the following: most of the data access is reads, not writes; the data, though potentially large, is relatively unchanging; you won't need to do complicated searches, but if you do, you'll use a separate search engine. The advantages of using a mature RDBMS fade, while the advantage of using an object-oriented data model come to the fore.

Finally, it's entirely possible to provide a wrapper for your database that makes SQL queries and translates them into XML streams, so you could have it both ways. XML becomes a more robust, programmer-friendly frontend to a mature database for storing and searching. (Oracle's XSQL servlet is one example of this technique.)

The application: An online photo album

Everybody loves photos! People love showing pictures of themselves, their friends, their pets, and their vacations. The Web is the ultimate medium for self-indulgent shutterbugs -- they can annoy their relatives from thousands of miles away. While a full-fledged photo album site would require a complicated object model, I'll focus on defining a single Picture object. (The source code for this application is available in Resources.) The object representing a picture needs fields representing its title, the date it was taken, an optional caption, and, obviously, a pointer to the image source.

An image, in turn, needs a few fields of its own: the location of the source file (a GIF or JPEG) and the height and width in pixels (to assist you in building <img> tags). Here there is one neat advantage to using the filesystem as your database: you can store the image files in the same directory as the data files.

Finally, let's extend the picture record with an element defining a set of thumbnail images for use in the table of contents or elsewhere. Here I use the same concept of image I defined earlier.

The XML representation of a picture could look something like this:

  <title>Alex On The Beach</title>
  <caption>Trying in vain to get a tan</caption>

Note that by using XML, you put all the information about a single picture into a single file, rather than scattering it among three or four separate tables. Let's call this a .pix file -- so your filesystem might look like this:



There's more than one way to skin a cat, and there's more than one way to bring XML data on to your JSP page. Here is a list of some of those ways. (This list is not exhaustive; many other products and frameworks would serve equally well.)

  • DOM: You can use classes implementing the DOM interface to parse and inspect the XML file
  • XMLEntryList: You can use my code to load the XML into a java.util.List of name-value pairs
  • XPath: You can use an XPath processor (like Resin) to locate elements in the XML file by path name
  • XSL: You can use an XSL processor to transform the XML into HTML
  • Cocoon: You can use the open source Cocoon framework
  • Roll your own bean: You can write a wrapper class that uses one of the other techniques to load the data into a custom JavaBean

Note that these techniques could be applied equally well to an XML stream you receive from another source, such as a client or an application server.

JavaServer Pages

The JSP spec has had many incarnations, and different JSP products implement different, incompatible versions of the spec. I will use Tomcat, for the following reasons:

  • It supports the most up-to-date versions of the JSP and servlet specs
  • It's endorsed by Sun and Apache
  • You can run it standalone without configuring a separate Web server
  • It's open source

(For more information on Tomcat, see Resources.)

You are welcome to use any JSP engine you like, but configuring it is up to you! Be sure that the engine supports at least the JSP 1.0 spec; there were many changes between 0.91 and 1.0. The JSWDK (Java Server Web Development Kit) will work just fine.

The JSP structure

When building a JSP-driven Website (also known as a Webapp), I prefer to put common functions, imports, constants, and variable declarations in a separate file called init.jsp, located in the source code for this article.

I then load that file into each JSP file using <%@include file="init.jsp"%>. The <%@include%> directive acts like the C language's #include, pulling in the text of the included file (here, init.jsp) and compiling it as if it were part of the including file (here, picture.jsp). By contrast, the <jsp:include> tag compiles the file as a separate JSP file and embeds a call to it in the compiled JSP.

Finding the file

When the JSP starts, the first thing it needs to do after initialization is find the XML file you want. How does it know which of the many files you need? The answer is from a CGI parameter. The user will invoke the JSP with the URL picture.jsp?file=summer99/alex-beach.pix (or by passing a file parameter through an HTML form).

However, when the JSP receives the parameter, you're still only halfway there. You still need to know where on the filesystem the root directory lies. For example, on a Unix system, the actual file may be in the directory /home/alex/public_html/pictures/summer99/alex-beach.pix. JSPs do not have a concept of a current directory while executing, so you need to provide an absolute pathname to the package.

The Servlet API provides a method to turn a URL path, relative to the current JSP or Servlet, into an absolute filesystem path. The method ServletContext.getRealPath(String) does the trick. Every JSP has a ServletContext object called application, so the code would be:

String picturefile =
  application.getRealPath("/" + request.getParameter("file"));


String picturefile =
  getServletContext().getRealPath("/" + request.getParameter("file"));

which also works inside a servlet. (You must append a / because the method expects to be passed the results of request.getPathInfo().)

One important note: whenever you access local resources, be very careful to validate the incoming data. A hacker, or a careless user, can send bogus data to hack your site. For instance, consider what would happen if the value file=../../../../etc/passwd were entered. The user could in this way read your server's password file.

The Document Object Model

DOM stands for the Document Object Model. It is a standard API for browsing XML documents, developed by the World Wide Web Consortium (W3C). The interfaces are in package org.w3c.dom and are documented at the W3C site (see Resources).

There are many DOM parser implementations available. I have chosen IBM's XML4J, but you can use any DOM parser. This is because the DOM is a set of interfaces, not classes -- and all DOM parsers must return objects that faithfully implement those interfaces.

Unfortunately, though standard, the DOM has two major flaws:

  1. The API, though object-oriented, is fairly cumbersome.
  2. There is no standard API for a DOM parser, so, while each parser returns a org.w3c.dom.Document object, the means of initializing the parser and loading the file itself is always parser specific.

The simple picture file described above is represented in the DOM by several objects in a tree structure.

Document Node
 --> Element Node "picture"
      --> Text Node "\n  " (whitespace)
      --> Element Node "title"
        --> Text Node "Alex On The Beach"
      --> Element Node "date"
           --> ... etc.

To acquire the value Alex On The Beach you would have to make several method calls, walking the DOM tree. Further, the parser may choose to intersperse any number of whitespace text nodes, through which you would have to loop and either ignore or concatenate (you can correct this by calling the normalize() method). The parser may also include separate nodes for XML entities (like &amp;), CDATA nodes, or other element nodes (for instance, the <b>big<b> bear would turn into at least three nodes, one of which is a b element, containing a text node, containing the text big). There is no method in the DOM to simply say "get me the text value of the title element." In short, walking the DOM is a bit cumbersome. (See the XPath section of this article for an alternative to DOM.)

From a higher perspective, the problem with DOM is that the XML objects are not available directly as Java objects, but they must be accessed piecemeal via the DOM API. See my conclusion for a discussion of Java-XML Data Binding technology, which uses this straight-to-Java approach for accessing XML data.

I have written a small utility class, called DOMUtils, that contains static methods for performing common DOM tasks. For instance, to acquire the text content of the title child element of the root (picture) element, you would write the following code:

Document doc = DOMUtils.xml4jParse(picturefile);
Element nodeRoot = doc.getDocumentElement();
Node nodeTitle = DOMUtils.getChild(nodeRoot, "title");
String title = (nodeTitle == null) ? null : DOMUtils.getTextValue(nodeTitle);

Getting the values for the image subelements is equally straightforward:

Node nodeImage = DOMUtils.getChild(nodeRoot, "image");
Node nodeSrc = DOMUtils.getChild(nodeImage, "src");
String src =  DOMUtils.getTextValue(nodeSrc);

And so on.

Once you have Java variables for each relevant element, all you must do is embed the variables inside your HTML markup, using standard JSP tags.

 <table bgcolor="#FFFFFF" border="0" cellspacing="0" cellpadding="5">
 <td align="center" valign="center">
 <img src="<%=src%>" width="<%=width%>" height="<%=height%>" border="0" alt="<%=src%>"></td>
1 2 Page 1
Page 1 of 2