Mapping XML to Java, Part 2

Create a class library that uses the SAX API to map XML documents to Java objects

As I mentioned in Part 1, one of the big problems programmers face when using the SAX API is conceptual. I'd like to address that issue again as the foundation for the reusable class library that we will develop in this article.

Regardless of whether you use DOM or SAX, when mapping XML data into Java, two things happen -- navigation and data collection. DOM and SAX differ in how those aspects are addressed. That is, DOM separates navigation and data collection, while SAX merges navigation and collection.

Most of DOM's performance weakness stems from that separation. The DOM programming model makes separating navigation from data collection seem natural and even required, but the separation is not, in fact, a runtime requirement. SAX pierces that illusion by merging navigation and data collection at runtime, at the cost of making its programming model less obvious.

Using DOM, once you've created your in-memory DOM tree, you navigate to find the node that interests you. Then, once you've found the correct node, you collect data. You navigate and collect data, and those two aspects are conceptually separated. Unfortunately, as previously mentioned, using the in-memory DOM tree presents big performance implications.
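
As a point of comparison, here is a minimal sketch of the DOM style just described, assuming the standard JAXP DocumentBuilderFactory API. The class name DomNavigateThenCollect and the child() helper are made up for illustration; they are not part of this article's library. Note how every navigation step finishes before any data is collected:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomNavigateThenCollect {

    // Navigation helper (hypothetical): returns the first direct child
    // element with the given tag name, or null if there is none.
    static Element child(Element parent, String tagName) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(tagName)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Builds the whole in-memory tree first, then walks to
    // /CustomerOrder/Customer/Name and reads its text.
    static String customerName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            Element order = doc.getDocumentElement();     // navigate
            Element customer = child(order, "Customer");  // navigate
            Element name = child(customer, "Name");       // navigate
            return name.getTextContent().trim();          // collect
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CustomerOrder><Customer><Name> Customer X </Name></Customer>"
                + "<Items><Item><Name> Item 1 </Name></Item></Items></CustomerOrder>";
        System.out.println(customerName(xml)); // prints "Customer X"
    }
}
```

The item's Name tag never gets confused with the customer's Name tag here, because navigation always starts from a known parent. The price is that the entire document sits in memory before the first lookup.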

With SAX, it's more of a juggling game. You listen to SAX events to keep track of where you are -- a different form of navigation. When the SAX events have positioned you in just the right place, you collect data. One of the reasons that SAX hasn't dominated the XML APIs is that the navigational aspect of its programming model is not as intuitive as it is with DOM.

So wouldn't it be really cool if we could get the navigational and data-collection aspects of SAX into separate corners but keep the runtime performance advantages? Well, pay attention, because that's exactly what we will do. Nothing stops us from separating the two aspects in the programming model during development while leaving them intermixed at runtime.

You are here

In Part 1, I went through some basic applications of SAX. I also mentioned that there were some situations that needed special attention, such as recursive data structures. To create a class library that separates out the navigational aspects of SAX in the programming model, we will need a general-purpose approach to dealing with navigation. That approach will have to deal with all the special cases, including ambiguous tag names and recursive data structures.

So, how do we do that?

The key to navigation in SAX: at all times keep track of where you are during parsing. The most complicated navigational case is keeping track of where you are while receiving SAX events for recursive data structures generated while parsing an XML document. The conventional programming approach to using recursive structures -- sometimes called walking the tree -- is to use either a stack data structure or recursive function calls. Unfortunately, we can't use recursive function calls in SAX because we have to return control back to the XML parser after processing each SAX event. But we can use a stack data structure to keep track of SAX events.
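
To make the stack idea concrete for recursive data, here is a small sketch under some assumptions. The Folder element, its name attribute, and the class RecursiveStackSketch are invented for illustration. The sketch uses JAXP's SAXParserFactory instead of the XMLReaderFactory call used in this article's listings, and it matches on qName because the default factory is not namespace aware. Each handler method returns to the parser right away, and the stack remembers which folders are still open:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RecursiveStackSketch {

    // Hypothetical recursive structure: a folder that may contain folders.
    static class Folder {
        final String name;
        final List<Folder> children = new ArrayList<>();
        Folder(String name) { this.name = name; }
    }

    static class FolderHandler extends DefaultHandler {
        // Folders whose start tag has been seen but not their end tag.
        private final Deque<Folder> open = new ArrayDeque<>();
        Folder root;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            if (!qName.equals("Folder")) return;
            Folder f = new Folder(attrs.getValue("name"));
            if (open.isEmpty()) {
                root = f;                     // outermost folder
            } else {
                open.peek().children.add(f);  // attach to enclosing folder
            }
            open.push(f);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("Folder")) open.pop();
        }
    }

    static Folder parse(String xml) {
        try {
            FolderHandler handler = new FolderHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Ordinary recursion is fine once parsing is over.
    static String describe(Folder f) {
        StringBuilder sb = new StringBuilder(f.name);
        if (!f.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < f.children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(describe(f.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<Folder name='a'><Folder name='b'><Folder name='c'/>"
                + "</Folder><Folder name='d'/></Folder>";
        System.out.println(describe(parse(xml))); // prints "a(b(c),d)"
    }
}
```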

Using a stack fixes a second problem that I mentioned in my previous article: if you have an ambiguous tag name, such as name or location, that appears in more than one place within the XML document, you have to do something to remove the ambiguity. Using the full XML path from the document root to the ambiguous tag accomplishes that, and a stack makes that full path available at all times. So, using a stack addresses both special cases.

To demonstrate, let's look at a simple example that pushes tag names onto a stack as they are discovered and concatenates the stack's contents into a path string:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example1 extends DefaultHandler {
   // A stack to keep track of the tag names
   // that are currently open ( have called
   // "startElement", but not "endElement".)
   private Stack tagStack = new Stack();
   // Local list of item names...
   private Vector items = new Vector();
   // Customer name...
   private String customer;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
      // push the tag name onto the tag stack...
      tagStack.push( localName );
      // display the current path that has been found...
      System.out.println( "path found: [" + getTagPath() + "]" );
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      if ( getTagPath().equals( "/CustomerOrder/Customer/Name" ) ) {
         customer = contents.toString().trim();
      }
      else if ( getTagPath().equals( "/CustomerOrder/Items/Item/Name" ) ) {
         items.addElement( contents.toString().trim() );
      }
      // clean up the stack...
      tagStack.pop();
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
   // Build the path string from the current state
   // of the stack...
   //
   // Very inefficient, but we'll address that later...
   private String getTagPath( ){
      //  build the path string...
      String buffer = "";
      Enumeration e = tagStack.elements();
      while( e.hasMoreElements()){
         buffer = buffer + "/" + (String) e.nextElement();
      }
      return buffer;
   }
   public Vector getItems() {
      return items;
   }
   public String getCustomerName() {
      return customer;
   }
   public static void main( String[] argv ){
      System.out.println( "Example1:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         Example1 ex1 = new Example1();
         xr.setContentHandler( ex1 );
         System.out.println();
         System.out.println("Tag paths located:");
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example1.xml" )) );
         System.out.println();
         System.out.println("Names located:");
         // Display Customer
         System.out.println( "Customer Name: " + ex1.getCustomerName() );
         // Display all item names to stdout...
         System.out.println( "Order Items: " );
         String itemName;
         Vector items = ex1.getItems();
         Enumeration e = items.elements();
         while( e.hasMoreElements()){
            itemName = (String) e.nextElement();
            System.out.println( itemName );
         }
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

Below you'll find sample data to use with our example:

<?xml version="1.0"?>
<CustomerOrder>
   <Customer>
      <Name> Customer X </Name>
      <Address> unknown  </Address>
   </Customer>
   <Items>
      <Item>
         <ProductCode> 098 </ProductCode>
         <Name> Item 1 </Name>
         <Price> 32.01 </Price>
      </Item>
      <Item>
         <ProductCode> 4093 </ProductCode>
         <Name> Item 2 </Name>
         <Price> 0.76 </Price>
      </Item>
      <Item>
         <ProductCode> 543 </ProductCode>
         <Name> Item 3 </Name>
         <Price> 1.42 </Price>
      </Item>
   </Items>
</CustomerOrder>

Running our example with the sample data yields the following output:

Example1:
Tag paths located:
path found: [/CustomerOrder]
path found: [/CustomerOrder/Customer]
path found: [/CustomerOrder/Customer/Name]
path found: [/CustomerOrder/Customer/Address]
path found: [/CustomerOrder/Items]
path found: [/CustomerOrder/Items/Item]
path found: [/CustomerOrder/Items/Item/ProductCode]
path found: [/CustomerOrder/Items/Item/Name]
path found: [/CustomerOrder/Items/Item/Price]
path found: [/CustomerOrder/Items/Item]
path found: [/CustomerOrder/Items/Item/ProductCode]
path found: [/CustomerOrder/Items/Item/Name]
path found: [/CustomerOrder/Items/Item/Price]
path found: [/CustomerOrder/Items/Item]
path found: [/CustomerOrder/Items/Item/ProductCode]
path found: [/CustomerOrder/Items/Item/Name]
path found: [/CustomerOrder/Items/Item/Price]
Names located:
Customer Name: Customer X
Order Items:
Item 1
Item 2
Item 3

You are here -- now do something

Now that you have an idea of where I'm going, what should we keep on the stack? Obviously, performing string operations to keep track of where you are during parsing proves inefficient. We also must tackle how to use the stack effectively. Even though I used a stack in the example, I didn't leverage it to activate data collection at key locations during parsing. I simply used it to name the current "you are here" location on every single start-tag SAX event.

Back to the question, what should we keep on the stack? Well, I just mentioned that we want to leverage the stack to help us activate the collection of data. So, maybe a good candidate for the contents of the stack is data collection actions. As those actions are sometimes just place holders, I've named them tag trackers.

Tag trackers are markers that represent positions within the XML document. To reflect the structure of the XML document, tag trackers have one parent and zero-to-many children. Starting with a root tag tracker, all other tag trackers are connected via a parent-child relationship. When a startElement SAX event occurs, the active tag tracker compares the tag name to the tag names that were associated with each of its child tag trackers when they were created. When a match is found, the active tag tracker places itself on the stack and makes the matching child tag tracker the new active tag tracker. Later, when the child has finished processing SAX events, the parent will be popped back off of the stack and reestablished as the active tag tracker.

Tag trackers not only mark positions within the XML document but also associate actions with those positions. That is where the navigational aspects and the data collection aspects coordinate. When a tag tracker activates, it will fire an associated event, indicating that a particular position in the XML document has been announced by the SAX API. Unlike SAX, the tag tracking network fires that event only when the full path has been reached, making the event fully specified and unambiguous.

Tag trackers work as a group in a tag tracker network to navigate an XML document. Programs that use a tag tracker network start by creating a root tag tracker node. Then they create child tag trackers and bind them to the root tag tracker for each possible XML tag that can occur in the root of the XML document. That process is repeated recursively for each child, at every level of the XML document that is to be mapped, until all XML tags in which the program is interested have a tag tracker linked to the network. In that way, a network is created.

Our next example simply demonstrates tag trackers and a stack to keep track of where we are during XML parsing:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example2 extends DefaultHandler {
   // A stack for the tag trackers to
   // coordinate on.
   //
   private Stack tagStack = new Stack();
   private TagTracker root;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
      // delegate the event handling to the tag tracker
      // network.
      TagTracker activeTracker = (TagTracker) tagStack.peek();
      activeTracker.onStart( localName, tagStack );
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      // delegate the event handling to the tag tracker
      // network.
      TagTracker activeTracker = (TagTracker) tagStack.peek();
      activeTracker.onEnd( tagStack );
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
   //
   // Sets the root tag tracker
   //
   // See the TagTracker class.
   //
   //
   public void setRoot( TagTracker root ) {
      // This tag tracker anchors the tag
      // tracking network to the beginning
      // of the XML document. ( before the
      // first tag name is located ).
      //
      // By placing it first on the stack
      // all future tag tracking will follow
      // the network anchored by this
      // root tag tracker.
      //
      tagStack.push( root );
   }
   // Note: Example1's getTagPath() helper is omitted here. The stack
   // now holds TagTracker objects rather than tag-name strings, so
   // casting its elements to String would fail at runtime.
   public static void main( String[] argv ){
      System.out.println( "Example2:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Create the ContentHandler...
         Example2 ex2 = new Example2();
         // Create the tag tracking network.
         //
         // This is different from our previous
         // examples.
         //
         // We are going to create a network
         // of tag trackers that will track
         // the opening and closing of tags
         // during XML parsing.
         //
         // This process is more declarative
         // and resembles changing directories
         // from a command line with the restriction
         // of only being able to change one
         // directory at a time.
         //
         // -- create root: /
         TagTracker root = new TagTracker();
         // -- create /CustomerOrder
         TagTracker customerOrder = new TagTracker();
         root.track( "CustomerOrder", customerOrder );
         // -- create /CustomerOrder/Customer
         TagTracker customer = new TagTracker();
         customerOrder.track( "Customer", customer );
         // -- create announcement /CustomerOrder/Customer/Name
         TagTracker customerName = new TagTracker(
                                  "Found :[/CustomerOrder/Customer/Name]");
         customer.track( "Name", customerName );
         // -- create /CustomerOrder/Items
         TagTracker items = new TagTracker();
         customerOrder.track( "Items", items );
         // -- create /CustomerOrder/Items/Item
         TagTracker item = new TagTracker();
         items.track( "Item", item );
         // -- create announcement /CustomerOrder/Items/Item/Name
         TagTracker itemName = new TagTracker(
                              "Found :[/CustomerOrder/Items/Item/Name]");
         item.track( "Name", itemName );
         // Associate the tag tracking network
         // with the ContentHandler...
         ex2.setRoot( root );
         // Set the ContentHandler...
         xr.setContentHandler( ex2 );
         // Parse the file...
         // All output from this program will come from
         // the TagTracker announcements set up in
         // the tag tracking network.
         //
         xr.parse( new InputSource(
               new FileReader( "Example2.xml" )) );
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

Next, we turn our attention to a tag tracker class that represents a simplified introduction to the class we will develop for the library:

import java.util.*;
public class TagTracker {
   // Table of tag trackers.
   // This table contains an entry for
   // every tag name that this TagTracker
   // has been configured to follow.
   // This is a single-level parent-child relation.
   //
   private Hashtable trackers = new Hashtable();
   // Useful for skipping tag names that are not
   // being tracked.
   private static SkippingTagTracker skip = new SkippingTagTracker();
   // This version of tag tracker only
   // does simple announcements...
   private String announcement = "";
   private boolean haveAnnouncement = false;
   // A special constructor for TagTrackers
   // that are supposed to make an announcement...
   public TagTracker( String message ) {
      haveAnnouncement = true;
      announcement = message;
   }
   // default constructor
   public TagTracker() {}
   // Configuration method for setting up a network
   // of tag trackers...
   // Each parent tag name should be configured
   // ( call this method ) for each child tag name
   // that it will track.
   public void track( String tagName, TagTracker tracker ){
      trackers.put( tagName, tracker);
   }
   // Tag trackers work together on a stack.
   // The tag tracker at the top of the stack
   // is the "active" tag tracker and is responsible
   // for delegating the tracking to a child tag
   // tracker or putting a skipping place marker on the
   // stack.
   //
   public void onStart( String tagName, Stack tagStack ){
      // Look up the tag name in the tracker table.
      TagTracker tracker = (TagTracker) trackers.get( tagName );
      //
      // Are we tracking this tag name?
      //
      if ( tracker == null ) {
         // Not tracking this
         // tag name.  Skip the
         // entire branch.
         tagStack.push( skip );
      }
      else {
         // Found a tracker for this
         // tag name.  Make it the
         // new top of stack tag
         // tracker
         tagStack.push( tracker );
         // Announce the tag tracker
         tracker.announce();
      }
   }
   // Tag trackers work together on a stack.
   // The tag tracker at the top of the stack
   // is the "active" tag tracker and is responsible
   // for reestablishing its parent tag tracker
   // ( next to top of stack ) when it has
   // been notified of the closing tag.
   //
   public void onEnd( Stack tagStack ){
      // Clean up the stack...
      tagStack.pop();
   }
   public void announce() {
      if( haveAnnouncement ){
         System.out.println( announcement );
      }
   }
}
class SkippingTagTracker extends TagTracker {
   // Tag trackers work together on a stack.
   // The tag tracker at the top of the stack
   // is the "active" tag tracker.
   //
   // This class represents a skipping place
   // marker on the stack.  When a real tag
   // tracker places a skipping tag tracker on
   // the stack, that is an indication that
   // all tag names found during the skip are
   // of no interest to the tag tracking network.
   //
   // This means that if the skipping tag tracker
   // is notified of a new tag name, this new
   // tag name should also be skipped.
   //
   // Since this class never varies its behavior,
   // it is OK for it to skip new tag names by
   // placing itself on the stack again.
   public void onStart( String tagName, Stack tagStack ){
      //
      // If the current tag name is being
      // skipped, all children should be
      // skipped.
      //
      tagStack.push( this );
   }
   //
   // The skipping tag tracker has
   // nothing special to do when
   // a closing tag is found other
   // than to remove itself from
   // the stack, which as a side
   // effect replaces it with its
   // parent as the "active," top
   // of stack tag tracker.
   //
   public void onEnd( Stack tagStack ){
      // Clean up the stack...
      tagStack.pop();
   }
}

Here's some sample data to use with our example:

<?xml version="1.0"?>
<CustomerOrder>
   <Customer>
      <Name> Customer X </Name>
      <Address> unknown  </Address>
   </Customer>
   <Items>
      <Item>
         <ProductCode> 098 </ProductCode>
         <Name> Item 1 </Name>
         <Price> 32.01 </Price>
      </Item>
      <Item>
         <ProductCode> 4093 </ProductCode>
         <Name> Item 2 </Name>
         <Price> 0.76 </Price>
      </Item>
      <Item>
         <ProductCode> 543 </ProductCode>
         <Name> Item 3 </Name>
         <Price> 1.42 </Price>
      </Item>
   </Items>
</CustomerOrder>

Running our example with the sample data yields the following output:

Example2:
Found :[/CustomerOrder/Customer/Name]
Found :[/CustomerOrder/Items/Item/Name]
Found :[/CustomerOrder/Items/Item/Name]
Found :[/CustomerOrder/Items/Item/Name]

Score

Tag trackers coordinating on a stack separate the navigational and the data collection aspects. You specify the navigational aspect when you create the network of tag trackers. Once the network is connected, the tag trackers work together on the stack to keep track of the location of the XML parser during runtime -- exactly what we were looking for.

An interesting note for the previous example: the skip tag tracker. When the active tag tracker receives a startElement SAX event for a tag name it doesn't recognize, it places a skip marker on the stack. The marker keeps ignoring every nested tag until it receives its matching endElement SAX event. You might think that step is unnecessary, but it is important. Without a skip marker, a tag nested deep in a branch of XML that is being skipped could match one of the active tag tracker's child tag trackers. That situation would definitely map data incorrectly.
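
The following sketch illustrates that failure mode. It is not the TagTracker class from this article but a stripped-down stand-in: the Node class, the Referrer tag, and the useSkip switch are hypothetical, and parsing goes through JAXP's SAXParserFactory with matching on qName. With the skip marker turned off, an unknown tag is simply ignored, so the Name nested inside it is mistaken for the customer's Name:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SkipMarkerDemo {

    // Hypothetical stand-in for a tag tracker: a name plus named children.
    static class Node {
        final String name;
        final Map<String, Node> children = new HashMap<>();
        Node(String name) { this.name = name; }
        Node child(String tag) {
            Node n = new Node(tag);
            children.put(tag, n);
            return n;
        }
    }

    // Shared place marker pushed for every tag inside an untracked branch.
    static final Node SKIP = new Node("(skip)");

    static class Handler extends DefaultHandler {
        final boolean useSkip;
        final Deque<Node> stack = new ArrayDeque<>();
        final List<String> names = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Handler(Node root, boolean useSkip) {
            this.useSkip = useSkip;
            stack.push(root);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes a) {
            text.setLength(0);
            Node next = stack.peek().children.get(qName);
            if (next != null) {
                stack.push(next);   // tracked tag: make it active
            } else if (useSkip) {
                stack.push(SKIP);   // untracked tag: mask the whole branch
            }
            // Without the skip marker the active node stays in place,
            // so tags nested in the unknown branch can still match it.
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            Node top = stack.peek();
            if (top == SKIP) {
                stack.pop();
            } else if (top.name.equals(qName)) {
                if (top.children.isEmpty()) names.add(text.toString().trim());
                stack.pop();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Collects the text of every tag matched as /CustomerOrder/Customer/Name.
    static List<String> collectNames(String xml, boolean useSkip) {
        Node root = new Node("");
        root.child("CustomerOrder").child("Customer").child("Name");
        Handler handler = new Handler(root, useSkip);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        String xml = "<CustomerOrder><Customer><Referrer><Name>Wrong</Name></Referrer>"
                + "<Name>Right</Name></Customer></CustomerOrder>";
        System.out.println(collectNames(xml, true));  // prints "[Right]"
        System.out.println(collectNames(xml, false)); // prints "[Wrong, Right]"
    }
}
```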

All dressed up

Now we need to clean up the packaging of that programming model into a class library. Take another look at the previous example. Notice that I've isolated all of the SAX code in the Example2 class. Once you begin using tag trackers, you become interested in tag tracker actions, not raw SAX events. That feature will allow us to hide the SAX code in a base class. Neat, right?

Further, I want to force our class library's user to do the bare minimum to specify the tag tracking network. So what is the bare minimum? Well, we know that we need to do at least two things:

  1. Specify the tag tracking network that will navigate the XML document we are interested in
  2. Specify any actions that have to be attached to locations in the tag tracking network

On the other side of the coin, we want to make using XML mapping code created with the class library as simple as possible. To accomplish that goal, we turn to the builder design pattern, which describes the behavior our library should display. The builder pattern delegates the construction of an object to a specialized class that encapsulates the logic for constructing objects -- exactly what we want to do here. With that in mind, any XML mapping code generated with that library should fit the builder pattern.
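
From the caller's side, builder-shaped mapping code might look like the sketch below. It is only an illustration of the pattern, not the library developed in this article: the Order and OrderBuilder classes are hypothetical, the path matching is the simple approach from Example1, and parsing uses JAXP's SAXParserFactory. The point is that the caller sees a single fromXML() call that returns a finished object, with all SAX handling hidden inside the builder:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OrderBuilderSketch {

    // Hypothetical object being built from the XML.
    static class Order {
        String customer;
        final List<String> items = new ArrayList<>();
    }

    // The builder: owns all construction logic, including the SAX parsing.
    static class OrderBuilder extends DefaultHandler {
        private final Deque<String> open = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        private Order order;

        Order fromXML(Reader in) {
            order = new Order();
            open.clear();
            try {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(in), this);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return order;
        }

        // Joins the open tags from the document root down to the current tag.
        private String path() {
            StringBuilder sb = new StringBuilder();
            for (Iterator<String> it = open.descendingIterator(); it.hasNext();) {
                sb.append('/').append(it.next());
            }
            return sb.toString();
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes a) {
            text.setLength(0);
            open.push(qName);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            String path = path();
            if (path.equals("/CustomerOrder/Customer/Name")) {
                order.customer = text.toString().trim();
            } else if (path.equals("/CustomerOrder/Items/Item/Name")) {
                order.items.add(text.toString().trim());
            }
            open.pop();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    public static void main(String[] args) {
        String xml = "<CustomerOrder><Customer><Name> Customer X </Name></Customer>"
                + "<Items><Item><Name> Item 1 </Name></Item>"
                + "<Item><Name> Item 2 </Name></Item></Items></CustomerOrder>";
        // The caller never touches SAX directly.
        Order order = new OrderBuilder().fromXML(new StringReader(xml));
        System.out.println(order.customer); // prints "Customer X"
        System.out.println(order.items);    // prints "[Item 1, Item 2]"
    }
}
```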

So in summary, we want to create a class library that uses tag trackers, hides SAX, makes the specification of tag tracker networks simple, and follows the builder design pattern when completed.

Now let's take a look at the class library source code.

The SaxMapper and TagTracker classes, seen below, form the core of the library. We start with SaxMapper:

package reh.SaxMapper;
import java.io.*;
import java.util.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public abstract class SaxMapper extends DefaultHandler {
   // Must be overridden by all subclasses...
   public abstract Object getMappedObject();
   public abstract TagTracker createTagTrackerNetwork();
   // A stack for the tag trackers to
   // coordinate on.
   //
   private Stack tagStack = new Stack();
   // The SAX 2 parser...
   private XMLReader xr;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   public SaxMapper( ) {
      try {
         // Create the XML reader...
         xr = XMLReaderFactory.createXMLReader();
      } catch ( Exception e ) {
         e.printStackTrace();
      }
      // Create the tag tracker network
      // and initialize the stack with
      // it.
      //
      // This constructor anchors the tag
      // tracking network to the beginning
      // of the XML document. ( before the
      // first tag name is located ).
      //
      // By placing it first on the stack
      // all future tag tracking will follow
      // the network anchored by this
      // root tag tracker.
      //
      // The createTagTrackerNetwork() method
      // is abstract.  All subclasses are
      // responsible for reacting to this
      // request with the creation of a
      // tag tracking network that will
      // perform the mapping for the subclass.
      //
      SaxMapperLog.trace( "Creating the tag tracker network." );
      tagStack.push( createTagTrackerNetwork() );
      SaxMapperLog.trace( "Tag tracker network created." );
   }
   public Object fromXML( String url ) {
      try {
         return fromXML( new InputSource( url ) );
      } catch ( Exception e ) {
         e.printStackTrace();
         return null;
      }
   }
   public Object fromXML( InputStream in ) {
      try {
         return fromXML( new InputSource( in ) );
      } catch ( Exception e ) {
         e.printStackTrace();
         return null;
      }
   }
   public Object fromXML( Reader in ) {
      try {
         return fromXML( new InputSource( in ) );
      } catch ( Exception e ) {
         e.printStackTrace();
         return null;
      }
   }
   private synchronized Object fromXML( InputSource in ) throws Exception {
      // notes,
      // 1.  The calling "fromXML" methods catch
      //     any parsing exceptions.
      // 2.  The method is synchronized to keep
      //     multiple threads from accessing the XML parser
      //     at once.  This is a limitation imposed by SAX.
      // Set the ContentHandler...
      xr.setContentHandler( this );
      // Parse the file...
      SaxMapperLog.trace( "About to parse XML document." );
      xr.parse( in );
      SaxMapperLog.trace( "XML document parsing complete." );
      return getMappedObject();
   }
   // Implement the content handler methods that
   // will delegate SAX events to the tag tracker network.
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      // Resetting contents buffer.
      // Assuming that tags have either content or children, not both.
      // This is usually the case with XML that is representing
      // data structures in a programming language independent way.
      // This assumption is not typically valid where XML is being
      // used in the classical text mark up style where tagging
      // is used to style content and several styles may overlap
      // at once.
      contents.reset();
      // delegate the event handling to the tag tracker
      // network.
      TagTracker activeTracker = (TagTracker) tagStack.peek();
      activeTracker.startElement( namespaceURI, localName,
                   qName, attr, tagStack );
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      // delegate the event handling to the tag tracker
      // network.
      TagTracker activeTracker = (TagTracker) tagStack.peek();
      activeTracker.endElement( namespaceURI, localName,
                 qName, contents, tagStack );
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
}

And here's TagTracker:

package reh.SaxMapper;
import java.util.*;
import java.io.*;
import org.xml.sax.*;
public class TagTracker {
   // Table of tag trackers.
   // This table contains an entry for
   // every tag name that this TagTracker
   // has been configured to follow.
   // This is a single-level parent-child relation.
   //
   private Hashtable trackers = new Hashtable();
   // Useful for skipping tag names that are not
   // being tracked.
   private static SkippingTagTracker skip = new SkippingTagTracker();
   // default constructor
   public TagTracker() {}
   // Configuration method for setting up a network
   // of tag trackers...
   // Each parent tag name should be configured
   // ( call this method ) for each child tag name
   // that it will track.
   public void track( String tagName, TagTracker tracker ){
      int slashOffset = tagName.indexOf( "/" );
      if( slashOffset < 0 ) {
         // if it is a simple tag name ( no "/" separators )
         // simply add it.
         trackers.put( tagName, tracker);
      } else if ( slashOffset == 0 ) {
         // Oooops leading slash, remove it and
         // try again recursively.
         track( tagName.substring( 1 ), tracker );
      } else {
         // if it is not a simple tag name
         // recursively add the tag.
         String topTagName = tagName.substring( 0, slashOffset );
         String remainderOfTagName = tagName.substring( slashOffset + 1 );
         TagTracker child = (TagTracker)trackers.get( topTagName );
         if ( child == null ) {
            // Not currently tracking this
            // tag. Add new tracker.
            child = new TagTracker();
            trackers.put( topTagName, child );
         }
         child.track( remainderOfTagName, tracker );
      }
   }
   // Tag trackers work together on a stack.
   // The tag tracker at the top of the stack
   // is the "active" tag tracker and is responsible
   // for delegating the tracking to a child tag
   // tracker or putting a skipping place marker on the
   // stack.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr,
              Stack tagStack ) {
      // Look up the tag name in the tracker table.
      // Note, this implementation does not address
      // the XML namespace support that is now available
      // with SAX2.
      // We are simply using the localName as a key
      // to find a possible tracker.
      TagTracker tracker = (TagTracker) trackers.get( localName );
      //
      // Are we tracking this tag name?
      //
      if ( tracker == null ) {
         // Not tracking this
         // tag name.  Skip the
         // entire branch.
         SaxMapperLog.trace( "Skipping tag: [" + localName + "]");
         tagStack.push( skip );
      }
      else {
         // Found a tracker for this
         // tag name.  Make it the
         // new top of stack tag
         // tracker
         SaxMapperLog.trace( "Tracking tag: [" + localName + "]");
         // Send the deactivate event to this tracker.
         SaxMapperLog.trace( "Deactivating current tracker.");
         onDeactivate();
         // Send the on start to the new active
         // tracker.
         SaxMapperLog.trace( "Sending start event to [" + localName + "] tracker.");
         tracker.onStart(namespaceURI, localName,
               qName, attr );
         tagStack.push( tracker );
      }
   }
   // Tag trackers work together on a stack.
   // The tag tracker at the top of the stack
   // is the "active" tag tracker and is responsible
   // for reestablishing its parent tag tracker
   // ( next to top of stack ) when it has
   // been notified of the closing tag.
   //
   public void endElement(   String namespaceURI,
               String localName,
              String qName,
              CharArrayWriter contents,
              Stack tagStack ) {
      // Send the end event.
      SaxMapperLog.trace( "Finished tracking tag: [" + localName + "]");
      onEnd( namespaceURI, localName, qName, contents );
      // Clean up the stack...
      tagStack.pop();
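      // Send the reactivate event. Stack.peek() throws
      // EmptyStackException on an empty stack instead of
      // returning null, so test for emptiness first.
      if ( !tagStack.empty() ) {
         TagTracker activeTracker = (TagTracker) tagStack.peek();
         SaxMapperLog.trace( "Reactivating previous tag tracker.");
         activeTracker.onReactivate();
      }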
   }
   // Methods for collecting content. These methods
   // are intended to be overridden with specific
   // actions for nodes in the tag tracking network
   // that require data collection.
   public void onStart( String namespaceURI,
              String localName,
              String qName,
              Attributes attr ) {
      // default is no action...
   }
   public void onDeactivate() {
      // default is no action...
   }
   public void onEnd(   String namespaceURI,
               String localName,
              String qName,
              CharArrayWriter contents ){
      // default is no action...
   }
   public void onReactivate() {
      // default is no action...
   }
}
class SkippingTagTracker extends TagTracker {
   // Tag trackers work together on a stack.
   // The tag tracker at the top of the stack
   // is the "active" tag tracker.
   //
   // This class represents a skipping place
   // marker on the stack.  When a real tag
   // tracker places a skipping tag tracker on
   // the stack, that is an indication that
   // all tag names found during the skip are
   // of no interest to the tag tracking network.
   //
   // This means that if the skipping tag tracker
   // is notified of a new tag name, this new
   // tag name should also be skipped.
   //
   // Since this class never varies its behavior,
   // it is OK for it to skip new tag names by
   // placing itself on the stack again.
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr,
              Stack tagStack ) {
      //
      // If the current tag name is being
      // skipped, all children should be
      // skipped.
      //
      SaxMapperLog.trace( "Skipping tag: [" + localName + "]");
      tagStack.push( this );
   }
   //
   // The skipping tag tracker has
   // nothing special to do when
   // a closing tag is found other
   // than to remove itself from
   // the stack, which as a side
   // effect replaces it with its
   // parent as the "active," top
   // of stack tag tracker.
   //
   public void endElement(   String namespaceURI,
               String localName,
              String qName,
              CharArrayWriter contents,
              Stack tagStack ) {
      // Clean up the stack...
      SaxMapperLog.trace( "Finished skipping tag: [" + localName + "]");
      tagStack.pop();
   }
}
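
The stack protocol described in the comments above can also be modeled in isolation. What follows is only a rough sketch, not the library's code: SkipSketch, Node, SKIP, and walk are hypothetical names, element events are simulated with plain strings (a tag name opens an element, "/" closes the current one), and no action methods are fired. It shows how one untracked tag causes its whole branch to be skipped by pushing the same place marker repeatedly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stripped-down model of the tag stack; not the library's classes.
class SkipSketch {
    // A node in the tracking network: it only knows its tracked child tags.
    static class Node {
        final Map<String, Node> children = new HashMap<>();

        // Returns the child for a tag, creating it if necessary.
        Node child(String tag) {
            return children.computeIfAbsent(tag, t -> new Node());
        }
    }

    // Shared place marker pushed for every tag inside an untracked branch.
    static final Node SKIP = new Node();

    // Replays the simulated events and records which tags were tracked or skipped.
    static List<String> walk(Node root, String... events) {
        List<String> log = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        for (String e : events) {
            if (e.equals("/")) {
                // Closing tag: the parent becomes the active top of stack again.
                stack.pop();
                continue;
            }
            Node active = stack.peek();
            // While skipping, every nested tag is skipped as well.
            Node next = (active == SKIP) ? null : active.children.get(e);
            if (next == null) {
                log.add("skip " + e);
                stack.push(SKIP);
            } else {
                log.add("track " + e);
                stack.push(next);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.child("order").child("customer").child("name");
        // Prints: [track order, track customer, track name, skip address, skip items, skip item]
        System.out.println(walk(root,
            "order", "customer", "name", "/", "address", "/", "/",
            "items", "item", "/", "/", "/"));
    }
}
```

Because the marker carries no state, a single shared instance can sit on the stack any number of times, which is the same reasoning the SkippingTagTracker comments give.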

Next comes the logger, a utility class that helps with debugging XML mappers:

package reh.SaxMapper;
public class SaxMapperLog {
   static boolean doTraceLogging = Boolean.getBoolean( "reh.SaxMapper.trace" );
   public static void trace( String msg ){
      if ( doTraceLogging )  {
         System.out.println( "trace: " + msg );
      }
   }
   public static void error(  String msg ){
      System.out.println( "error: " + msg );
   }
   // testing main method...
   public static void main( String[] argv ) {
      // String concatenation converts the boolean directly, so
      // the deprecated Boolean constructor is not needed.
      System.out.println( "Tracing is: [" + doTraceLogging + "]" );
      trace( "test message" );
   }
}

Before we move on to some examples using the library, I'd like to comment on a few things in its design.

First, notice that the model of implementing XML mappers with that library is based on extending a base class. I'm not the biggest fan of that library style, but it serves our purpose of hiding the SAX implementation and making the specification of the tag tracking library clear.

Second, notice that the TagTracker class implements a few action methods that are empty -- all of the onXXXX methods. We associate actions with specific tag trackers by subclassing the TagTracker class and overriding those methods.

Trace logging can be turned on by setting the reh.SaxMapper.trace system property to true, which you do on the command line by passing -Dreh.SaxMapper.trace=true to the JVM (note that there is no space after -D). You can log to that logger within tag tracker actions to trace your own actions intermixed with the tracing of the library's execution. The trace methods are called regardless of whether they actually log, so for debugged production code I would probably create a second implementation of the library with the tracing code removed to eliminate that overhead. You may also want to replace the trace debugging and error logging with one of the great logging libraries available on the Internet. Both JLog from IBM's alphaWorks and Log4j are good candidates.
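
If maintaining a second copy of the library is unappealing, part of that overhead can be avoided another way. The sketch below is an assumption-laden variant, not part of the library: GuardedLog and its methods are hypothetical names, and it assumes Java 8 or later for the Supplier type. It reads the same system property but builds the message string only when tracing is on.

```java
import java.util.function.Supplier;

// Hypothetical variant of the logger; the names are illustrative only.
class GuardedLog {
    // Same system property as the library's logger, read once at class load.
    static final boolean TRACE = Boolean.getBoolean("reh.SaxMapper.trace");

    static boolean isTraceEnabled() {
        return TRACE;
    }

    // The supplier is evaluated only when tracing is enabled, so the
    // string concatenation is skipped entirely on debugged systems.
    static void trace(Supplier<String> msg) {
        if (TRACE) {
            System.out.println("trace: " + msg.get());
        }
    }

    public static void main(String[] args) {
        String localName = "Customer";
        trace(() -> "Tracking tag: [" + localName + "]");
        System.out.println("Tracing is: [" + isTraceEnabled() + "]");
    }
}
```

Running it with java -Dreh.SaxMapper.trace=true GuardedLog turns the trace line on; without the property, only the final status line is printed.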

Last, notice that the track method of the TagTracker class is more complicated than the one in our previous example. That implementation of track accepts tag names concatenated with the Unix-style forward slash separator. That is merely a convenience for specifying all the tag trackers of a path at once when the intermediate trackers just act as placeholders on the tag stack.
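
To make the path handling concrete, here is a small standalone sketch of the same idea. PathNode is a hypothetical stand-in for TagTracker that keeps only the registration logic, and its handling of a leading slash (simply dropping it) is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TagTracker, reduced to path registration.
class PathNode {
    final Map<String, PathNode> children = new HashMap<>();

    // Registers leaf under a slash-separated path, creating
    // placeholder nodes for the intermediate tag names.
    void track(String path, PathNode leaf) {
        // Assumption of this sketch: leading slashes are dropped.
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        int slash = path.indexOf('/');
        if (slash < 0) {
            // Simple tag name: register the tracker directly.
            children.put(path, leaf);
            return;
        }
        String top = path.substring(0, slash);
        String rest = path.substring(slash + 1);
        PathNode child = children.get(top);
        if (child == null) {
            // Not tracking this tag yet: add a placeholder.
            child = new PathNode();
            children.put(top, child);
        }
        // Let the child register the remainder of the path.
        child.track(rest, leaf);
    }

    public static void main(String[] args) {
        PathNode root = new PathNode();
        PathNode name = new PathNode();
        root.track("a/b/name", name);
        PathNode found = root.children.get("a").children.get("b").children.get("name");
        System.out.println(found == name); // prints true
    }
}
```

Two paths that share a prefix reuse the same placeholder nodes, so registering a/b/name and then a/b/price creates the a and b nodes only once.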

Using the tag tracker library

The basic steps to using the tag tracker are:

  1. Create a Java class or classes that will represent the data being mapped from the XML document.
  2. Create a Mapper class that extends the reh.SaxMapper.SaxMapper class.
  3. Implement two methods -- createTagTrackerNetwork() and getMappedObject().
  4. For the createTagTrackerNetwork() method, create a network of tag trackers and return the root tag tracker.
  5. When creating the tag tracker network, for every tag tracker that must perform a data collection action, subclass the TagTracker class and override the action methods. Attributes are available in the onStart action. Content is available in the onEnd action. The easiest way to subclass tag trackers: use anonymous inner classes. The syntax of anonymous inner classes is a little ugly, but they do the trick.
  6. For the getMappedObject() method, simply return the object that you have mapped from XML.

Using mappers based on the tag tracker library

Using mappers based on the tag tracker library proves straightforward:

  1. Create an instance of the mapper
  2. Call the fromXML() method with either a file name, URL, input stream, or reader
  3. Cast the object returned from fromXML() to the proper type
  4. Reuse the mapper for further documents; mappers are reusable, so you shouldn't be constantly creating new ones

Using the library

Now let's create a basic mapper that collects data from an attribute, from tagged content, and from tagged content representing a list. Pay attention to the createTagTrackerNetwork() method. We associate data collection actions with tag trackers by creating anonymous inner classes that override the empty tag tracker action methods with methods that actually collect the data.

First, we see the class that will contain the mapped data:

import java.util.*;
import java.io.*;
public class TestObject {
   String customerName = "";
   String riskFactor = "";
   Vector itemNames = new Vector();
   void addItemName( String name ) {
      itemNames.addElement( name );
   }
   Enumeration getItemNames() {
      return itemNames.elements();
   }
}

Next is our first XML mapper that employs the tag tracker class library. Notice how I use anonymous inner classes to override the default tag tracker actions:

import reh.SaxMapper.*;
import org.xml.sax.*;
import java.util.*;
import java.io.*;
public class Example3 extends SaxMapper {
   private TestObject target;
   public Object getMappedObject() {
      return target;
   }
   public TagTracker createTagTrackerNetwork() {
      SaxMapperLog.trace( "creating tag track network" );
         // -- create root: /
         TagTracker root = new TagTracker() {
            public void onDeactivate() {
               // The root will be deactivated when
               // parsing a new document begins.
               target = new TestObject();
            }
         };
         // -- create action for /CustomerOrder/Customer/Name
         TagTracker customerName = new TagTracker(){
            public void onStart( String namespaceURI,
                 String localName,
                 String qName,
                 Attributes attr ) {
               // Example of capturing an attribute.
               target.riskFactor = attr.getValue("RiskFactor");
            }
            public void onEnd(   String namespaceURI,
                       String localName,
                       String qName,
                       CharArrayWriter contents ){
               // Example of capturing contents.
               SaxMapperLog.trace( "Found :[/CustomerOrder/Customer/Name]" );
               target.customerName =  contents.toString();
            }
         };
         root.track( "/CustomerOrder/Customer/Name", customerName );
         // -- create action for /CustomerOrder/Items/Item/Name
         TagTracker itemName = new TagTracker(){
            public void onEnd(   String namespaceURI,
                       String localName,
                       String qName,
                       CharArrayWriter contents ){
               // Example of capturing contents to a list...
               SaxMapperLog.trace( "Found :[/CustomerOrder/Items/Item/Name]" );
               target.addItemName( contents.toString() );
            }
         };
         root.track( "/CustomerOrder/Items/Item/Name", itemName );
         return root;
   }
   public static void main( String[] argv ){
      try {
         System.out.println( "Example3:");
         Example3 e3 = new Example3();
         TestObject to  = (TestObject) e3.fromXML("Example3.xml");
         // Display name and risk factor.
         System.out.println( "Customer Name: " + to.customerName
                     + " Risk Factor: " + to.riskFactor );
         // Display list of item names.
         String name;
         Enumeration e = to.getItemNames();
         while( e.hasMoreElements()){
            name  = (String) e.nextElement();
            System.out.println( name );
         }
      } catch ( Exception e ) {
         e.printStackTrace();
      }
   }
}

Here's sample data for our example:

<?xml version="1.0"?>
<CustomerOrder>
   <Customer>
      <Name RiskFactor="high"> Customer X </Name>
      <Address> unknown  </Address>
   </Customer>
   <Items>
      <Item>
         <ProductCode> 098 </ProductCode>
         <Name> Item 1 </Name>
         <Price> 32.01 </Price>
      </Item>
      <Item>
         <ProductCode> 4093 </ProductCode>
         <Name> Item 2 </Name>
         <Price> 0.76 </Price>
      </Item>
      <Item>
         <ProductCode> 543 </ProductCode>
         <Name> Item 3 </Name>
         <Price> 1.42 </Price>
      </Item>
   </Items>
</CustomerOrder>

Running our example with the sample data yields the following output:

Example3:
Customer Name:  Customer X  Risk Factor: high
 Item 1
 Item 2
 Item 3

Putting it all together

When we were exploring the ideas behind a generic solution to tracking the navigational aspect of XML parsing, I mentioned that the library must handle recursive data structures. Now that we have the class library and the basic process for creating mappers, let's put it all together in an example that tackles the toughest mapping problem: recursive data structures. For our example, we'll use a recursive data structure that most of us are familiar with -- an XML representation of an operating-system directory.

There are two important things to notice in the code that follows. First, the recursion is accomplished by adding a circular tag tracker reference to the tag tracker that tracks the recursive tag. Pointing the tracker back to itself allows the recursive tag tracker to place itself onto the stack multiple times in order to recurse into nested structures.

Second, notice the use of a second stack to keep track of the current directory, which helps us keep track of where we are in creating the mapped object. Think of the tag tracker library stack as the "from" part of navigation and that second stack as the "to" part.
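
Both ideas can be tried out in miniature before reading the full listing. The sketch below is a simplified model under stated assumptions, not the library itself: MiniTracker and RecursionSketch are hypothetical names, every opening tag is assumed to be a directory, element events are simulated with strings (a name opens a directory, "end" closes the current one), and skipping is left out. The tracker tracks itself, and a second stack records where we are in the structure being built.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a tag tracker.
class MiniTracker {
    private final Map<String, MiniTracker> children = new HashMap<>();

    void track(String tag, MiniTracker t) { children.put(tag, t); }
    MiniTracker childFor(String tag) { return children.get(tag); }
    void onStart(String name) { /* default is no action */ }
    void onEnd() { /* default is no action */ }
}

class RecursionSketch {
    // Replays simulated events and returns the directory paths that were built.
    static List<String> run(String[] events) {
        List<String> paths = new ArrayList<>();
        Deque<String> dirStack = new ArrayDeque<>();      // the "to" stack
        Deque<MiniTracker> tagStack = new ArrayDeque<>(); // the "from" stack

        MiniTracker root = new MiniTracker();
        MiniTracker dir = new MiniTracker() {
            @Override void onStart(String name) {
                // Attach the new directory to the current one.
                String parent = dirStack.isEmpty() ? "" : dirStack.peek();
                String path = parent + "/" + name;
                paths.add(path);
                dirStack.push(path);
            }
            @Override void onEnd() {
                dirStack.pop();
            }
        };
        root.track("directory", dir);
        dir.track("directory", dir); // the tracker tracks itself

        tagStack.push(root);
        for (String e : events) {
            if (e.equals("end")) {
                tagStack.pop().onEnd();
            } else {
                MiniTracker next = tagStack.peek().childFor("directory");
                next.onStart(e);
                tagStack.push(next);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String[] events = { "javaworld", "reh", "end", "example2_4", "end", "end" };
        // Prints: [/javaworld, /javaworld/reh, /javaworld/example2_4]
        System.out.println(run(events));
    }
}
```

The same dir instance sits on the tag stack once per nesting level, while the second stack is what distinguishes one level from the next.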

I'll use an example that reads an XML file that contains a listing of a directory tree for a directory on my PC.

First, we see a class that will contain the hierarchical tree of information representing the directories on my PC:

import java.io.*;
import java.util.*;
public class Dir {
   String dirName = "";
   Vector fileNames = new Vector();
   Vector dirList = new Vector();
   public Dir ( String name) {
      dirName = name;
   }
   void addFileName( String name ){
      fileNames.addElement( name.trim() );
   }
   void addDirectory( Dir newDir ){
      dirList.addElement( newDir );
   }
   void print( PrintStream out, String offset ){
      out.println( offset + dirName + ":" );
      offset = offset + "  ";
      // Display list of file names.
      String name;
      Enumeration e = fileNames.elements();
      while( e.hasMoreElements()){
         name  = (String) e.nextElement();
         // Write to the stream passed in, not always to System.out.
         out.println( offset + "- " +  name );
      }
      // Display list of directories.
      Dir d;
      e = dirList.elements();
      while( e.hasMoreElements()){
         d  = (Dir) e.nextElement();
         d.print( out, offset );
      }
   }
}

Finally, our recursive example code:

import reh.SaxMapper.*;
import org.xml.sax.*;
import java.util.*;
import java.io.*;
public class Example4 extends SaxMapper {
   private Dir target;
   private Stack stack = new Stack();
   public Object getMappedObject() {
      return target;
   }
   public TagTracker createTagTrackerNetwork() {
      SaxMapperLog.trace( "creating tag track network" );
         // -- create root: /
         TagTracker root = new TagTracker() {
            public void onDeactivate() {
               // The root will be deactivated when
               // parsing a new document begins.
               // clear the stack
               stack.removeAllElements();
               // create the root "dir" object.
               target = new Dir("root");
               // push the root dir on the stack...
               stack.push( target );
            }
         };
         // -- create action /listing/directory  and */directory
         TagTracker dir = new TagTracker(){
            public void onStart( String namespaceURI,
                 String localName,
                 String qName,
                 Attributes attr ) {
               // Capture the directory name...
               String dirName = attr.getValue("name");
               Dir d = new Dir( dirName );
               Dir temp = (Dir) stack.peek();
               // Log a trace message...
               SaxMapperLog.trace( "Creating directory: " + dirName );
               // Connect new directory to its parent...
               temp.addDirectory( d );
               // Make the new directory the active directory...
               stack.push(d);
            }
            public void onEnd(   String namespaceURI,
                       String localName,
                       String qName,
                       CharArrayWriter contents ){
               // Clean up the directory stack...
               stack.pop();
            }
         };
         root.track( "listing/directory", dir );
         // Key to handling recursive data in XML mappers...
         // have the directory tag tracker track itself...
         dir.track( "directory", dir );
         // -- create action */file
         TagTracker file = new TagTracker(){
            public void onEnd(   String namespaceURI,
                       String localName,
                       String qName,
                       CharArrayWriter contents ){
               // Example of capturing contents to
               // a file in the current directory
               Dir temp = (Dir) stack.peek();
               temp.addFileName( contents.toString() );
            }
         };
         dir.track( "file", file );
         return root;
   }
   public static void main( String[] argv ){
      try {
         System.out.println( "Example4:");
         Example4 e4 = new Example4();
         Dir root  = (Dir) e4.fromXML("Example4.xml");
         // Display the directory...
         root.print( System.out, "" );
      } catch ( Exception e ) {
         e.printStackTrace();
      }
   }
}

Below you'll find sample data to use with our example. Notice how directories contain both files and other directories:

<?xml version="1.0"?>
<listing>
   <directory name="javaworld">
      <directory name="reh">
         <directory name="SaxMapper">
         <file>    SaxMapper.class      </file>
         <file>   SaxMapperLog.class   </file>
         <file>   SkippingTagTracker.class   </file>
         <file>   TagTracker.class      </file>
         </directory>
      </directory>
      <file> article2.txt </file>
      <directory name="example2_4">
         <file>   buildExample4.bat      </file>
         <file>   Dir.class         </file>
         <file>   Dir.java         </file>
         <file>   example2_4.tws      </file>
         <file>   Example4.class      </file>
         <file>   Example4.class      </file>
         <file>   Example4.class      </file>
         <file>   Example4.class      </file>
         <file>   Example4.java      </file>
         <file>   Example4.out      </file>
         <file>   Example4.xml      </file>
         <file>   runExample4.bat      </file>
      </directory>
   </directory>
</listing>

Running our example with the sample data yields the following output:

Example4:
root:
  javaworld:
    - article2.txt
    reh:
      SaxMapper:
        - SaxMapper.class
        - SaxMapperLog.class
        - SkippingTagTracker.class
        - TagTracker.class
    example2_4:
      - buildExample4.bat
      - Dir.class
      - Dir.java
      - example2_4.tws
      - Example4.class
      - Example4.class
      - Example4.class
      - Example4.class
      - Example4.java
      - Example4.out
      - Example4.xml
      - runExample4.bat

Conclusion

Well, we did it. We've created a class library and an approach to using SAX parsers that keeps the navigational and data-collection aspects isolated from each other, giving us a conceptual model that is easier than SAX but without the runtime penalties associated with DOM.

As a pleasant side effect, we've isolated all of the SAX code into a base class so that our mapping code is isolated from SAX.

One more thing I want to mention is AElfred, a lightweight XML parser. The latest version of the AElfred jar is less than 24 KB, compared to 1.4 MB for the Xerces jar. The AElfred XML parser has a non-SAX event interface and a SAX version 1 interface, so you will need to adjust the code in the SaxMapper class to work with one of those interfaces. If you are not using XML namespaces, AElfred is a very small and fast parser. Using the techniques we developed in this article with AElfred should be just as easy as using DOM.

Learn more about this topic

  • Recent XML articles in JavaWorld
  • XML help
  • Other valuable XML resources
  • Information on logging packages available on the Internet
