Simplify XML file processing with the Jakarta Commons Digester

How to use Digester to parse XML configuration files

The Apache Jakarta site is home to many well-known Java open source projects, including Tomcat, Ant, and log4j. A lesser-known subproject of Jakarta is Jakarta Commons, a repository of reusable Java components. These components, such as Commons BeanUtils, Commons DBCP, and Commons Logging, alleviate the pain of some standard programming tasks. This article will focus on the Jakarta Commons Digester, a utility that maps XML files to Java objects.

Note: To use Digester, you must have the following libraries in your classpath: Digester, BeanUtils, Collections, Logging, and an XML parser conforming to SAX (Simple API for XML) 2.0 or JAXP (Java API for XML Parsing) 1.1. Links to all Jakarta Commons components, along with two suitable parsers, Crimson and Xerces, can be found in Resources.

XML parsing overview

There are two basic approaches to parsing XML documents. One is the Document Object Model (DOM) approach: the parser reads the entire document and creates a tree-like representation of it in memory. The second approach uses SAX, which reports the document's contents to your code as a stream of events while it reads. DOM, while sometimes easier to work with, is typically slower and more memory-intensive than SAX because the whole tree must be built before you can use it. Digester simplifies SAX parsing by providing a higher-level interface to SAX events. This interface hides much of the complexity involved in XML document navigation, allowing developers to concentrate on processing XML data instead of parsing it.
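
To make the event-driven model concrete, here is a minimal sketch that uses only the SAX classes bundled with the JDK, not Digester. The class name SaxEventDemo and the sample XML string are made up for illustration. The handler records each element name as the parser reports its start tag:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: SaxEventDemo is a hypothetical name, not part of Digester.
class SaxEventDemo {

    // Returns the element names in the order their start tags were reported.
    static List<String> startedElements(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName); // one callback per opening tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<datasources><datasource><name>a</name></datasource></datasources>";
        System.out.println(startedElements(xml));
    }
}
```

Even in this small case the handler has to keep its own state between callbacks, and that bookkeeping is what Digester takes over.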

Digester concepts

Digester introduces three important concepts: element matching patterns, processing rules, and the object stack.

Element matching patterns associate XML elements with processing rules. The following example shows the element matching patterns for an XML hierarchy:

<datasources>          'datasources'
  <datasource>         'datasources/datasource'
    <name/>            'datasources/datasource/name'
    <driver/>          'datasources/datasource/driver'
  </datasource>
  <datasource>         'datasources/datasource'
    <name/>            'datasources/datasource/name'
    <driver/>          'datasources/datasource/driver'
  </datasource>
</datasources>

Each time a pattern is matched, an associated rule is fired. In the above example, a rule associated with 'datasources/datasource' executes twice.
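
The idea behind pattern matching can be sketched with plain SAX. The code below is not Digester's implementation, and the class name PatternMatchSketch is an assumption made for this example. It keeps a list of the currently open elements, joins them with '/', and counts how often the resulting path equals a given pattern:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a hand-rolled stand-in for exact-path pattern matching.
class PatternMatchSketch {

    // Counts how many elements in the document sit at exactly the given path.
    static int countMatches(String xml, final String pattern) {
        final Deque<String> open = new ArrayDeque<>(); // root first, current element last
        final int[] hits = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                open.addLast(qName);
                if (String.join("/", open).equals(pattern)) {
                    hits[0]++; // this is where a rule would fire
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.removeLast();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return hits[0];
    }

    public static void main(String[] args) {
        String xml = "<datasources>"
            + "<datasource><name/><driver/></datasource>"
            + "<datasource><name/><driver/></datasource>"
            + "</datasources>";
        System.out.println(countMatches(xml, "datasources/datasource")); // prints 2
    }
}
```

This sketch handles only exact paths; it is meant to show why the 'datasources/datasource' rule fires once per datasource element, nothing more.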

Processing rules define what happens when Digester matches a pattern. Digester includes predefined processing rules. Custom rules can also be created by subclassing org.apache.commons.digester.Rule.

The object stack makes objects available for manipulation by processing rules. Objects can be pushed onto or popped off the stack either manually or through processing rules.
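
As a rough mental model, the object stack behaves like any last-in, first-out stack. The sketch below uses java.util.ArrayDeque and a hypothetical ObjectStackSketch class. It is not Digester's own stack, only an illustration of the push, peek, and pop behavior that rules rely on:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: a plain LIFO stack standing in for Digester's object stack.
class ObjectStackSketch {
    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object o) { stack.push(o); }      // add to the top
    Object pop()        { return stack.pop(); } // remove and return the top
    Object peek()       { return stack.peek(); } // look at the top, or null if empty
    int depth()         { return stack.size(); }

    public static void main(String[] args) {
        ObjectStackSketch objects = new ObjectStackSketch();
        List<String> pool = new ArrayList<>();

        objects.push(pool);              // pushed manually, before parsing
        objects.push("per-element item"); // pushed by a rule when an element starts

        // A rule working on the current element sees the most recent push.
        System.out.println(objects.peek());

        objects.pop();                   // the rule pops it when the element ends
        System.out.println(objects.peek() == pool);
    }
}
```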

Using Digester

Digester is often used to parse XML configuration files. In the following examples, we have an XML configuration file that contains the information needed to build a pool of DataSources. DataSource is a hypothetical class that has an empty constructor and a set of get and set methods that take and return strings.

<?xml version="1.0"?>
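
For reference, the hypothetical DataSource class described above might look something like the following sketch. The property names are assumptions taken from the element names used in this article's patterns (name, driver, url, username, password):

```java
// Illustrative only: a plain bean with an empty constructor and String
// getters and setters, as the article describes. Property names are assumed.
class DataSource {
    private String name;
    private String driver;
    private String url;
    private String username;
    private String password;

    DataSource() {
        // empty constructor so the object can be created reflectively by class name
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getDriver() { return driver; }
    public void setDriver(String driver) { this.driver = driver; }

    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }

    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
}
```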

To use Digester you must create an instance of the Digester class, push any required objects to the Digester's object stack, add a set of processing rules, and finally parse the file. Here is an example:

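Digester digester = new Digester();
digester.addObjectCreate("datasources/datasource", "DataSource");
digester.addCallMethod("datasources/datasource/name", "setName", 0);
digester.addCallMethod("datasources/datasource/driver", "setDriver", 0);
// Parse the configuration file (the file name here is a placeholder).
digester.parse(new File("datasource.xml"));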

In this example, the addObjectCreate() method adds an ObjectCreateRule to the 'datasources/datasource' pattern. The ObjectCreateRule creates a new instance of the DataSource class and pushes the instance onto the Digester's object stack. Next, addCallMethod() adds a CallMethodRule to two patterns. The CallMethodRule calls the specified method of the object at the top of the object stack. The last addCallMethod() argument is the number of additional parameters to be passed into the method. Because that number is zero, Digester passes the matching element's body text to the method as its single argument.

If this code runs against our sample XML file, here's what happens:

  • A new instance of the DataSource class is created and pushed to the object stack
  • The setName(String name) method of the newly instantiated DataSource object is called with the argument 'HsqlDataSource'
  • The setDriver(String driver) method of the newly instantiated DataSource object is called with the body text of that datasource's driver element
  • At the end of the 'datasource' element, the object pops off the stack, and the process repeats itself
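
The sequence above can be approximated by hand with a plain SAX handler, which also shows how much bookkeeping the three Digester rules replace. This is a sketch, not Digester's implementation: the class names HandWrittenEquivalent and Ds, and the idea of returning a log of steps, are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: create, call setters, pop, done by hand in SAX.
class HandWrittenEquivalent {
    static class Ds { String name; String driver; } // stand-in for DataSource

    // Returns a log of the steps taken while parsing.
    static List<String> trace(String xml) {
        final Deque<Object> stack = new ArrayDeque<>(); // object stack
        final Deque<String> path = new ArrayDeque<>();  // open elements
        final StringBuilder body = new StringBuilder();
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                path.addLast(qName);
                body.setLength(0);
                if (String.join("/", path).equals("datasources/datasource")) {
                    stack.push(new Ds()); // what the object-create rule does
                    log.add("create");
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                body.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String current = String.join("/", path);
                String text = body.toString().trim();
                if (current.equals("datasources/datasource/name")) {
                    ((Ds) stack.peek()).name = text; // call-method rule on the top object
                    log.add("setName(" + text + ")");
                } else if (current.equals("datasources/datasource/driver")) {
                    ((Ds) stack.peek()).driver = text;
                    log.add("setDriver(" + text + ")");
                } else if (current.equals("datasources/datasource")) {
                    stack.pop(); // the created object leaves the stack here
                    log.add("pop");
                }
                path.removeLast();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }
}
```

Calling trace() on a document with two datasource elements yields the create, setName, setDriver, pop sequence twice, and the stack is empty at the end.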

The problem with this example is that the ObjectCreateRule pops off the object it creates when its associated element completes. When Digester finishes parsing the document, only the last object created remains. Solve this problem by pushing an object to the stack before parsing begins, and then call that object's methods to create any objects you need. The following class provides an example of this:

import java.io.File;
import java.io.IOException;
import org.apache.commons.digester.Digester;
import org.xml.sax.SAXException;

public class SampleDigester
{
  public void run() throws IOException, SAXException
  {
    Digester digester = new Digester();
    // This method pushes this (SampleDigester) class to the Digester's
    // object stack making its methods available to processing rules.
    digester.push(this);
    // This set of rules calls the addDataSource method and passes
    // in five parameters to the method.
    digester.addCallMethod("datasources/datasource", "addDataSource", 5);
    digester.addCallParam("datasources/datasource/name", 0);
    digester.addCallParam("datasources/datasource/driver", 1);
    digester.addCallParam("datasources/datasource/url", 2);
    digester.addCallParam("datasources/datasource/username", 3);
    digester.addCallParam("datasources/datasource/password", 4);
    // This method starts the parsing of the document.
    // (The file name here is a placeholder.)
    digester.parse(new File("datasource.xml"));
  }

  // Example method called by Digester.
  public void addDataSource(String name,
                            String driver,
                            String url,
                            String userName,
                            String password)
  {
    // create DataSource and add to collection...
  }
}

In the SampleDigester class, the addDataSource() method is called each time the pattern 'datasources/datasource' is matched. The addCallParam() methods add CallParamRules that pass the matching elements' bodies as addDataSource() method parameters. In the addDataSource() method, you create the actual DataSource and add it to your collection of DataSources.
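
One way to fill in the body of addDataSource() is sketched below. The DataSourceRegistry and Entry names are hypothetical, and keeping the entries in a map keyed by name is a design assumption, not something the article prescribes. Only the method signature comes from the SampleDigester listing:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: collects the values passed to addDataSource().
class DataSourceRegistry {

    // Simple immutable holder; a real application would build its own DataSource here.
    static class Entry {
        final String name, driver, url, userName, password;

        Entry(String name, String driver, String url, String userName, String password) {
            this.name = name;
            this.driver = driver;
            this.url = url;
            this.userName = userName;
            this.password = password;
        }
    }

    // LinkedHashMap keeps entries in the order they were added.
    private final Map<String, Entry> byName = new LinkedHashMap<>();

    // Same signature as the method Digester calls in the SampleDigester example.
    public void addDataSource(String name, String driver, String url,
                              String userName, String password) {
        byName.put(name, new Entry(name, driver, url, userName, password));
    }

    public Entry get(String name) { return byName.get(name); }

    public int size() { return byName.size(); }
}
```

Because the collecting object is pushed before parsing starts and is never popped by a rule, everything it gathers is still available once parsing finishes.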

Digest the Digester

Although Digester was initially developed to simplify XML-configuration file parsing, it is useful any time you need to map XML files to Java objects. This article has provided an introduction to Digester. To learn more about Digester and other Jakarta Commons components, visit the Jakarta Commons Website. In addition, look at the open source projects in the Resources section below for real-world examples of Digester in action. You can also download the source code that accompanies this article below.

Erik Swenson is a consultant and the founder of Open Source Software Solutions. Swenson specializes in Java development using open source software and components. He also developed the JasperEdit and OpenReports open source projects.

Learn more about this topic

  • Open source projects that use Digester:
