Easy Java/XML integration with JDOM, Part 1

Learn about a new open source API for working with XML


Those attributes are directly available on an Element.

  String width = table.getAttributeValue("width");

You can also retrieve the attribute as an Attribute instance. That ability helps JDOM support advanced concepts such as Attributes residing in a namespace. (See the section Namespaces later in the article for more information.)

  Attribute widthAttrib = table.getAttribute("width");
  String width = widthAttrib.getValue();

For convenience you can retrieve attributes as various primitive types.

  int border = table.getAttribute("border").getIntValue();

You can retrieve the value as any Java primitive type. If the value cannot be converted to the requested type, a DataConversionException is thrown. If the attribute does not exist, getAttribute() returns null, so check for null before chaining a call such as getIntValue().

Extracting element content

We touched on getting element content earlier, and showed how easy it is to extract an element's text content using element.getText(). That is the standard case, useful for elements that look like this:

<name>Enlightenment</name>

But sometimes an element can contain comments, text content, and child elements. It may even contain, in advanced documents, a processing instruction:

  <table>
    <!-- Some comment -->
    Some text
    <tr>Some child</tr>
    <?pi Some processing instruction?>
  </table>

This isn't a big deal. You can retrieve text and children as always:

  String text = table.getText(); // "Some text"
  Element tr = table.getChild("tr"); // <tr> child

That keeps the standard uses simple. Sometimes, as when writing output, it's important to get all the content of an Element in the right order. For that, you can use a special method on Element called getMixedContent(). It returns a List of content that may contain instances of Comment, String, Element, and ProcessingInstruction. Java programmers can use instanceof to determine what's what and act accordingly. The following code prints a summary of an element's content:

  List mixedContent = table.getMixedContent();
  Iterator i = mixedContent.iterator();
  while (i.hasNext()) {
    Object o = i.next();
    if (o instanceof Comment) {
      // Comment has a toString()
      out.println("Comment: " + o);
    }
    else if (o instanceof String) {
      out.println("String: " + o);
    }
    else if (o instanceof ProcessingInstruction) {
      out.println("PI: " + ((ProcessingInstruction)o).getTarget());
    }
    else if (o instanceof Element) {
      out.println("Element: " + ((Element)o).getName());
    }
  }
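
For readers who want to try the same walk without the JDOM jar on the classpath, here is a minimal sketch that uses the JDK's built-in org.w3c.dom API instead of JDOM. It is a swapped-in technique, not JDOM code: the class name MixedContentSketch and its SAMPLE constant are made up for illustration, and whitespace-only text nodes are skipped so that the summary stays short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JDOM: walks mixed content with the JDK DOM API.
class MixedContentSketch {
  // The <table> sample from the article, as a Java string.
  static final String SAMPLE =
      "<table>\n"
      + "  <!-- Some comment -->\n"
      + "  Some text\n"
      + "  <tr>Some child</tr>\n"
      + "  <?pi Some processing instruction?>\n"
      + "</table>";

  // Returns one summary line per child of the root element, in document order.
  static List<String> summarize(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
    List<String> summary = new ArrayList<>();
    NodeList kids = doc.getDocumentElement().getChildNodes();
    for (int i = 0; i < kids.getLength(); i++) {
      Node n = kids.item(i);
      switch (n.getNodeType()) {
        case Node.COMMENT_NODE:
          summary.add("Comment: " + n.getNodeValue().trim());
          break;
        case Node.TEXT_NODE:
          String text = n.getNodeValue().trim();
          if (!text.isEmpty()) {  // skip indentation-only text nodes
            summary.add("String: " + text);
          }
          break;
        case Node.PROCESSING_INSTRUCTION_NODE:
          summary.add("PI: " + ((ProcessingInstruction) n).getTarget());
          break;
        case Node.ELEMENT_NODE:
          summary.add("Element: " + n.getNodeName());
          break;
        default:
          break;
      }
    }
    return summary;
  }

  public static void main(String[] args) {
    for (String line : summarize(SAMPLE)) {
      System.out.println(line);
    }
  }
}
```

The DOM version switches on a node-type constant where the JDOM version uses instanceof, but the idea is the same: the children come back in document order, and the caller decides what to do with each kind.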

Dealing with processing instructions

Processing instructions (often called PIs for short) are directives that some XML documents carry to control the tool that's processing them. For example, with the Cocoon Web content creation library, the XML files may have cocoon processing instructions that look like this:

  <?cocoon-process type="xslt"?>

Each ProcessingInstruction instance has a target and data. The target is the first word, the data is everything afterward, and they're retrieved by using getTarget() and getData().

  String target = pi.getTarget(); // cocoon-process
  String data = pi.getData(); // type="xslt"

Since the data often appears like a list of attributes, the ProcessingInstruction class internally parses the data and supports getting data attribute values directly with getValue(String name):

  String type = pi.getValue("type");  // xslt
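
To make the idea of parsing PI data as attribute-like pairs concrete, here is a small sketch of how such a lookup could be built with a regular expression. It is only an illustration and makes no claim about how JDOM's ProcessingInstruction is implemented; the class name PiDataSketch and the pattern are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not JDOM's implementation: splits PI data into name/value pairs.
class PiDataSketch {
  // Matches name="value" or name='value', allowing whitespace around '='.
  private static final Pattern PAIR =
      Pattern.compile("([\\w:.-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

  // Returns the pairs in the order they appear; later duplicates overwrite earlier ones.
  static Map<String, String> parse(String data) {
    Map<String, String> values = new LinkedHashMap<>();
    Matcher m = PAIR.matcher(data);
    while (m.find()) {
      // Group 2 holds a double-quoted value, group 3 a single-quoted one.
      String value = m.group(2) != null ? m.group(2) : m.group(3);
      values.put(m.group(1), value);
    }
    return values;
  }

  public static void main(String[] args) {
    // The data portion of <?cocoon-process type="xslt"?>
    System.out.println(parse("type=\"xslt\"").get("type"));
  }
}
```

Data that does not follow the name="value" convention yields an empty map here, which mirrors the fact that PI data is free-form text and only looks like attributes by convention.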

You can find PIs anywhere in the document, just like Comment objects, and can retrieve them the same way as Comments -- using getMixedContent():

  List mixed = element.getMixedContent();  // List may contain PIs

PIs may reside outside the root Element, in which case they're available using the getMixedContent() method on Document:

  List mixed = doc.getMixedContent();

It's actually very common for PIs to be placed outside the root element, so for convenience, the Document class has several methods that help retrieve all the Document-level PIs, either by name or as one large bunch:

  List allOfThem = doc.getProcessingInstructions();
  List someOfThem = doc.getProcessingInstructions("cocoon-process");
  ProcessingInstruction oneOfThem =
    doc.getProcessingInstruction("cocoon-process");

That allows the Cocoon parser to read the first cocoon-process type with code like this:

  String type =
    doc.getProcessingInstruction("cocoon-process").getValue("type");

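As you probably expect, getProcessingInstruction(String) returns null if no such PI exists, so the chained call above throws a NullPointerException when the document has no cocoon-process PI. Check for null first if the PI is optional.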

Namespaces

Namespaces are an advanced XML concept that has been gaining in importance. Namespaces allow elements with the same local name to be treated differently because they're in different namespaces. They work similarly to Java packages and help avoid name collisions.

Namespaces are supported in JDOM using the helper class org.jdom.Namespace. You retrieve namespaces using the Namespace.getNamespace(String prefix, String uri) method. In XML, the following markup declares the xhtml prefix to correspond to the URI "http://www.w3.org/1999/xhtml". An <xhtml:title> element is then treated as a title in the "http://www.w3.org/1999/xhtml" namespace.

<html xmlns:xhtml="http://www.w3.org/1999/xhtml">

When a child is in a namespace, you can retrieve it using overloaded versions of getChild() and getChildren() that take a second Namespace argument.

  Namespace ns =
    Namespace.getNamespace("xhtml", "http://www.w3.org/1999/xhtml");
  List kids = element.getChildren("p", ns);
  Element kid = element.getChild("title", ns);

If a Namespace is not given, the element is assumed to be in the default namespace, which lets Java programmers ignore namespaces if they so desire.
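
The name-collision point can be tried out with nothing but the JDK. The sketch below uses the built-in org.w3c.dom API rather than JDOM (a swapped-in technique, named as such), and the class name NamespaceSketch and its sample document are invented for the example. The document holds two title elements, one in the XHTML namespace and one in no namespace; a lookup by URI and local name finds only the first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JDOM: namespace-aware lookup with the JDK DOM API.
class NamespaceSketch {
  static final String XHTML = "http://www.w3.org/1999/xhtml";

  // Two <title> elements: one bound to the xhtml prefix, one in no namespace.
  static final String SAMPLE =
      "<html xmlns:xhtml=\"" + XHTML + "\">"
      + "<xhtml:title>Home</xhtml:title>"
      + "<title>Other</title>"
      + "</html>";

  // Returns the text of every <title> element that lives in the given namespace URI.
  static List<String> titlesIn(String xml, String uri) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);  // off by default in the JDK parser
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList hits = doc.getElementsByTagNameNS(uri, "title");
      List<String> texts = new ArrayList<>();
      for (int i = 0; i < hits.getLength(); i++) {
        texts.add(hits.item(i).getTextContent());
      }
      return texts;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(titlesIn(SAMPLE, XHTML));
  }
}
```

Note that the lookup never mentions the xhtml prefix. The prefix is only shorthand inside the document; the URI is what identifies the namespace.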

Making a list, checking it twice

JDOM has been designed using the List and Map interfaces from the Java 2 Collections API. The Collections API provides JDOM with great power and flexibility through standard APIs. It does mean that to use JDOM, you either have to use Java 2 (JDK 1.2) or use JDK 1.1 with the Collections library installed.

All the List and Map objects are mutable, meaning their contents can be changed, reordered, added to, or deleted, and the change will affect the Document itself -- unless you explicitly copy the List or Map first. We'll get deeper into that in Part 2 of the article.
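
Until then, the difference between a live list and a copy can be seen in a toy model. The TinyElement class below is hypothetical and stands in for a JDOM Element only in the sense that it hands out its backing List directly; it says nothing about how JDOM itself stores children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an element whose child list is "live".
class TinyElement {
  private final String name;
  private final List<TinyElement> children = new ArrayList<>();

  TinyElement(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }

  // Adds a child and returns this element so calls can be chained.
  TinyElement addChild(TinyElement child) {
    children.add(child);
    return this;
  }

  // Returns the backing list itself, so changes to it change this element.
  List<TinyElement> getChildren() {
    return children;
  }
}

class LiveListSketch {
  public static void main(String[] args) {
    TinyElement table = new TinyElement("table")
        .addChild(new TinyElement("tr"))
        .addChild(new TinyElement("tr"));

    // Working on a copy leaves the element untouched.
    List<TinyElement> copy = new ArrayList<>(table.getChildren());
    copy.clear();
    System.out.println("after clearing the copy: " + table.getChildren().size());

    // Working on the live list changes the element itself.
    table.getChildren().remove(0);
    System.out.println("after removing from the live list: " + table.getChildren().size());
  }
}
```

Copying with new ArrayList<>(...) is the usual way to iterate or experiment without side effects on the tree.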

Exceptions

As you probably noticed, several exception classes in the JDOM library can be thrown to indicate various error situations. As a convenience, all of those exceptions extend the same base class, JDOMException. That allows you the flexibility to catch specific exceptions or all JDOM exceptions with a single try/catch block. JDOMException itself is usually thrown to indicate the occurrence of an underlying exception such as a parse error; in that case, you can retrieve the root cause exception using the getRootCause() method. That is similar to how RemoteException behaves in RMI code and how ServletException behaves in servlet code. However, the underlying exception isn't often needed because the JDOMException message contains information such as the parse problem and line number.
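
The pattern described above, a common base class plus an accessor for the underlying exception, can be sketched in plain Java. The classes SketchException and SketchConversionException are made up for this illustration; they only mimic the shape of JDOMException and DataConversionException and are not the JDOM classes.

```java
// Hypothetical base class: carries a message and the underlying exception.
class SketchException extends Exception {
  private final Throwable rootCause;

  SketchException(String message, Throwable rootCause) {
    super(message);
    this.rootCause = rootCause;
  }

  // Returns the exception that triggered this one, or null if there was none.
  Throwable getRootCause() {
    return rootCause;
  }
}

// Hypothetical specific exception that extends the common base class.
class SketchConversionException extends SketchException {
  SketchConversionException(String message, Throwable rootCause) {
    super(message, rootCause);
  }
}

class ExceptionSketch {
  // Converts a raw attribute value to an int, wrapping any failure.
  static int toInt(String raw) throws SketchConversionException {
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new SketchConversionException("Cannot convert '" + raw + "' to int", e);
    }
  }

  // Catches the base class, which also covers every subclass.
  static String describe(String raw) {
    try {
      return "ok: " + toInt(raw);
    } catch (SketchException e) {
      return e.getMessage() + " (root cause: "
          + e.getRootCause().getClass().getSimpleName() + ")";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("42"));
    System.out.println(describe("wide"));
  }
}
```

A caller that cares about one failure mode can catch the subclass first and fall back to the base class for everything else.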

Using JDOM to read a web.xml file

Now let's see JDOM in action by looking at how you could use it to parse a web.xml file, the Web application deployment descriptor from Servlet API 2.2. Let's assume that you want to look at the Web application to see which servlets have been registered, how many init parameters each servlet has, what security roles are defined, and whether or not the Web application is marked as distributed.

Here's a sample web.xml file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">
<web-app>
    <servlet>
        <servlet-name>snoop</servlet-name>
        <servlet-class>SnoopServlet</servlet-class>
    </servlet>
    <servlet>
        <servlet-name>file</servlet-name>
        <servlet-class>ViewFile</servlet-class>
        <init-param>
            <param-name>initial</param-name>
            <param-value>1000</param-value>
            <description>
                The initial value for the counter  <!-- optional -->
            </description>
        </init-param>
    </servlet>
    <servlet-mapping>
        <servlet-name>mv</servlet-name>
        <url-pattern>*.wm</url-pattern>
    </servlet-mapping>
    <distributed/>
    <security-role>
      <role-name>manager</role-name>
      <role-name>director</role-name>
      <role-name>president</role-name>
    </security-role>
</web-app>

On processing that file, you'd want to get output that looks like this:

This WAR has 2 registered servlets:
        snoop for SnoopServlet (it has 0 init params)
        file for ViewFile (it has 1 init params)
This WAR contains 3 roles:
        manager
        director
        president
This WAR is distributed

With JDOM, achieving that output is easy. The following example reads the web.xml file, builds a JDOM document representation in memory, then extracts the pertinent information from it:

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
public class WarReader {
  public static void main(String[] args) {
    PrintStream out = System.out;
    // Exactly one argument is expected: the path to the web.xml file
    if (args.length != 1) {
      out.println("Usage: WarReader [web.xml]");
      return;
    }
    try {
      // Request document building without validation
      SAXBuilder builder = new SAXBuilder(false);
      Document doc = builder.build(new File(args[0]));
      // Get the root element
      Element root = doc.getRootElement();
      // Print servlet information
      List servlets = root.getChildren("servlet");
      out.println("This WAR has "+ servlets.size() +" registered servlets:");
      Iterator i = servlets.iterator();
      while (i.hasNext()) {
        Element servlet = (Element) i.next();
        out.print("\t" + servlet.getChild("servlet-name")
                                .getText() +
                  " for " + servlet.getChild("servlet-class")
                                .getText());
        List initParams = servlet.getChildren("init-param");
        out.println(" (it has " + initParams.size() + " init params)");
      }
      // Print security role information
      List securityRoles = root.getChildren("security-role");
      if (securityRoles.size() == 0) {
        out.println("This WAR contains no roles");
      }
      else {
        Element securityRole = (Element) securityRoles.get(0);
        List roleNames = securityRole.getChildren("role-name");
        out.println("This WAR contains " + roleNames.size() + " roles:");
        i = roleNames.iterator();
        while (i.hasNext()) {
          Element e = (Element) i.next();
          out.println("\t" + e.getText());
        }
      }
      // Print distributed information (notice this is out of order)
      List distrib = root.getChildren("distributed");
      if (distrib.size() == 0) {
        out.println("This WAR is not distributed");
      } else {
        out.println("This WAR is distributed");
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Looking forward

At the time of this writing, JDOM is in beta 5 release. Quite a few significant features are already being discussed for the next release: an optimized builder, XPath support, a listener interface, and a tree-walk mechanism. Which features are actually included depends largely on feedback from members of the jdom-interest mailing list -- an open, medium-traffic list; sign-up information is available in Resources. People who just want to know when new JDOM releases are available can subscribe to jdom-announce.

Jason Hunter is a senior technologist with Collab.net, a company that provides tools and services for open source collaboration. In addition to being the cocreator of JDOM, he is the author of Java Servlet Programming (O'Reilly) and the publisher of http://Servlets.com. He has worked on projects from the largest (setting up an intranet application for a Fortune 100 company) to the smallest (helping develop a commercial product for a small startup). He contributes to Apache's Jakarta project and belongs to the working group responsible for Servlet API development.

Brett McLaughlin works as an Enterprise Java consultant at Metro Information Services and specializes in distributed systems architecture. In addition to cocreating JDOM, he has written Java and XML (O'Reilly) and Enterprise Applications in Java (O'Reilly). Brett is involved in technologies such as Java servlets, Enterprise JavaBeans, XML, and business-to-business applications. He is an active developer on the Apache Cocoon project and EJBoss EJB server, and he is a cofounder of the Apache Turbine project.

