Take the sting out of SAX

Generate SAX parsers with XML Schemas

A Simple API for XML (SAX) parser is an invaluable tool for parsing XML files, especially large input files that do not fit into main memory. A SAX parser also helps when input arrives over a slow stream, such as an Internet connection, and you need to process the bytes as soon as they arrive instead of waiting for the complete input. As a bonus, a well-designed SAX parser is generally faster than building and then processing a DOM (Document Object Model) tree: it makes only one pass over the XML data, whereas the DOM approach needs two (one to build the tree and one to process it).
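A minimal sketch of that one-pass behavior follows. It counts <employee> elements as the parser reports them, using the standard JAXP entry point javax.xml.parsers.SAXParserFactory (which returns whichever SAX implementation is installed) rather than Xerces-specific classes. That choice, and the names EmployeeCounter and countEmployees, are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts <employee> elements in a single pass over the input
class EmployeeCounter
{
    static int countEmployees(InputStream in)
    {
        final int[] count = {0};   // the only state the handler keeps

        DefaultHandler handler = new DefaultHandler()
        {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes)
            {
                if (qName.equals("employee"))
                {
                    count[0]++;
                }
            }
        };

        try
        {
            // Assumption: JAXP picks the installed SAX parser (the JDK's own or Xerces)
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        }
        catch (ParserConfigurationException | SAXException | IOException e)
        {
            throw new IllegalStateException("Parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args)
    {
        String xml = "<company name=\"My Widgets Inc.\"><employees>"
                   + "<employee><name><first>John</first></name></employee>"
                   + "<employee><name><first>Jane</first></name></employee>"
                   + "</employees></company>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countEmployees(in));   // prints 2
    }
}
```

The handler keeps only a running count, so its memory use does not depend on how many employees the document holds, and the same code works whether the InputStream wraps a file or a network connection.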

Unfortunately, a SAX parser can be difficult to develop because of its event-driven nature. In this article, I build a source code generator that makes developing a SAX parser much easier.

Note: I don't explain SAX in detail here; see Resources below for some excellent references.

SAX reviewed

SAX is a standard API through which a parser reads an XML input stream, such as a file or network connection, and reports what it finds as events to an event-handler class. Many SAX parser implementations are available for Java. In my examples here, I use Xerces from the Apache XML Project, one of the most popular.
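To see what "reports events to an event handler" means in practice, the sketch below records each callback in the order the parser makes it. It assumes JAXP's SAXParserFactory as the way to obtain a parser, and EventTracer and trace are hypothetical names. Only startElement, characters, and endElement are overridden; DefaultHandler supplies empty implementations for the remaining callbacks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records every start tag, text chunk, and end tag in order
class EventTracer extends DefaultHandler
{
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
    {
        events.add("start " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length)
    {
        String chunk = new String(ch, start, length).trim();
        if (!chunk.isEmpty())   // skip whitespace-only text between tags
        {
            events.add("text " + chunk);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
        events.add("end " + qName);
    }

    static List<String> trace(String xml)
    {
        EventTracer tracer = new EventTracer();
        try
        {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), tracer);
        }
        catch (ParserConfigurationException | SAXException | IOException e)
        {
            throw new IllegalStateException("Parsing failed", e);
        }
        return tracer.events;
    }

    public static void main(String[] args)
    {
        for (String event : trace("<name><first>John</first><last>Dole</last></name>"))
        {
            System.out.println(event);
        }
    }
}
```

For the fragment in main(), the recorded sequence is typically start name, start first, text John, end first, start last, text Dole, end last, end name. A parser may split one text node across several characters() calls, so real handlers should buffer text rather than assume a single call. Nothing in a characters() event says which element the text belongs to; the handler has to work that out from the order of events.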

Listings 1 and 2 below show an XML file and a SAX event handler, respectively. (You can download all source code and examples for this article from Resources.)

Listing 1. Example XML

<company name="My Widgets Inc.">
  <employees>
    <employee>
      <name>
        <first>John</first>
        <last>Dole</last>
      </name>
      <office>1-50</office>
      <telephone>123456</telephone>
    </employee>
    <employee>
      <name>
        <first>Jane</first>
        <last>Dole</last>
      </name>
      <office>1-51</office>
      <telephone>123457</telephone>
    </employee>
  </employees>
</company>


Listing 2. SAX handler

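import java.io.CharArrayWriter;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// The original listing shows only startElement() and endElement(); the class,
// the fields, characters(), and getText() are filled in here as assumptions.
public class EmployeeHandler extends DefaultHandler
{
    // Collects the character data of the element currently being read
    private final CharArrayWriter text = new CharArrayWriter();

    private String firstName;
    private String lastName;
    private String office;
    private String telephone;

    // The parser may deliver one element's text in several chunks, so append them all
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException
    {
        text.write(ch, start, length);
    }

    // Text gathered since the last startElement(), without surrounding whitespace
    private String getText()
    {
        return text.toString().trim();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
    {
        text.reset();

        if (qName.equals("company"))
        {
            String name = attributes.getValue("name");
            System.out.println("Employee Listing For " + name);
            System.out.println();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        if (qName.equals("first"))
        {
            firstName = getText();
        }
        else if (qName.equals("last"))
        {
            lastName = getText();
        }
        else if (qName.equals("office"))
        {
            office = getText();
        }
        else if (qName.equals("telephone"))
        {
            telephone = getText();
        }
        else if (qName.equals("employee"))
        {
            // One tab-delimited line per employee
            System.out.println(office + "\t" + firstName + "\t" + lastName + "\t" + telephone);
        }
    }
}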


The SAX handler above merely prints the XML file's data to standard output: a header line containing the company name, followed by one tab-delimited line per employee.

As Listing 2 shows, even a simple XML file can require a significant amount of parsing code. SAX's event-driven (as opposed to document-driven) nature also makes that code difficult to maintain and debug, because you must constantly keep track of the parser's state: in Listing 2, for example, the handler has to hold each employee's name, office, and telephone in fields until the closing employee tag arrives. Writing a SAX parser for complex document definitions can prove even more demanding; see Resources for challenging real-life examples.
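One general way to make that state explicit is to keep a stack of the elements that are currently open, so every piece of text can be reported together with its full path. The sketch below illustrates that technique under the same JAXP assumption as before; it is not the schema-based generator this article builds, and PathTrackingHandler and parse are hypothetical names. It also assumes data-oriented XML like Listing 1, where text appears only in leaf elements, so any text that precedes a child element is discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: tracks the open-element path and reports "path=value" for leaf text
class PathTrackingHandler extends DefaultHandler
{
    private final Deque<String> path = new ArrayDeque<>();   // root element first
    private final StringBuilder text = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
    {
        path.addLast(qName);
        text.setLength(0);   // drop text that came before this child element
    }

    @Override
    public void characters(char[] ch, int start, int length)
    {
        text.append(ch, start, length);   // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
        String value = text.toString().trim();
        if (!value.isEmpty())
        {
            values.add(String.join("/", path) + "=" + value);
        }
        text.setLength(0);
        path.removeLast();
    }

    static List<String> parse(String xml)
    {
        PathTrackingHandler handler = new PathTrackingHandler();
        try
        {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        }
        catch (ParserConfigurationException | SAXException | IOException e)
        {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.values;
    }

    public static void main(String[] args)
    {
        String xml = "<company><employee><name><first>John</first></name>"
                   + "<office>1-50</office></employee></company>";
        for (String value : parse(xml))
        {
            System.out.println(value);
        }
    }
}
```

Applied to Listing 1, the first values reported are company/employees/employee/name/first=John, company/employees/employee/name/last=Dole, company/employees/employee/office=1-50, and company/employees/employee/telephone=123456. The stack removes the need to remember which element is open, but the handler must still assemble related values, such as one employee's fields, into a record itself.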

Resources
  • "Programming XML in Java," Mark Johnson (JavaWorld):