A Simple API for XML (SAX) parser is an invaluable tool for parsing XML files, especially large XML input files that do not fit into main memory. A SAX parser also helps when you have a slow input stream, such as an Internet connection, and need to process the data as soon as it arrives instead of waiting for the complete input. As a bonus, a well-designed SAX parser is generally faster than processing a DOM (Document Object Model) tree: you need only one pass over the XML data, as opposed to the two passes needed with a DOM tree (one to build the tree and one to do the processing).
Unfortunately, a SAX parser can be difficult to develop because of its event-driven nature. In this article, I create a source code generator that will help you easily develop a SAX parser.
Note: I don't explain SAX in detail here; see Resources below for some excellent references.
SAX is a standard API that parses an XML input stream, like a file or network connection, and triggers events in an event-handler class. Many different SAX parser implementations are available for Java. In my examples here, I use Xerces from the Apache XML Project, one of the most popular parser implementations.
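To make the event model concrete, here is a minimal sketch that records each callback as the parser fires it. It assumes the JAXP parser bundled with the JDK rather than a standalone Xerces download, and the class name SaxEventTrace and its trace method are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: appends one line to a trace for every SAX event.
class SaxEventTrace extends DefaultHandler {
    private final StringBuilder trace = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        trace.append("start:").append(qName).append('\n');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver an element's text in more than one chunk.
        String chunk = new String(ch, start, length).trim();
        if (!chunk.isEmpty()) {
            trace.append("text:").append(chunk).append('\n');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        trace.append("end:").append(qName).append('\n');
    }

    // Parses the given XML string and returns the recorded event trace.
    static String trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.trace.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(trace("<office>1-50</office>"));
    }
}
```

The handler never sees the document as a whole; it only receives a start event, some character data, and an end event, in document order.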
Listings 1 and 2 below show an XML file and a SAX event handler, respectively. (You can download all source code and examples for this article from Resources.)
Listing 1. Example XML
<company name="My Widgets Inc.">
  <employees>
    <employee>
      <name>
        <first>John</first>
        <last>Dole</last>
      </name>
      <office>1-50</office>
      <telephone>123456</telephone>
    </employee>
    <employee>
      <name>
        <first>Jane</first>
        <last>Dole</last>
      </name>
      <office>1-51</office>
      <telephone>123457</telephone>
    </employee>
  </employees>
</company>
Listing 2. SAX handler
// Note: the text buffer, getText(), and the firstName, lastName, office, and
// telephone fields belong to the enclosing handler class and are not shown here.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
{
    // Discard character data collected for the previous element.
    text.reset();
    if (qName.equals("company"))
    {
        String name = attributes.getValue("name");
        String header = "Employee Listing For " + name;
        System.out.println(header);
        System.out.println();
    }
}

public void endElement(String uri, String localName, String qName) throws SAXException
{
    if (qName.equals("first"))
    {
        firstName = getText();
    }
    if (qName.equals("last"))
    {
        lastName = getText();
    }
    if (qName.equals("office"))
    {
        office = getText();
    }
    if (qName.equals("telephone"))
    {
        telephone = getText();
    }
    if (qName.equals("employee"))
    {
        // All fields of the current employee are known; print one tab-delimited line.
        System.out.println(office + "\t" + firstName + "\t" +
            lastName + "\t" + telephone);
    }
}
The SAX handler above merely prints the XML file's data to the standard output device. It prints a header line containing the company name followed by tab-delimited employee data.
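Listing 2 relies on a text buffer and a getText() helper that are not shown. The following is a self-contained sketch of how such a handler might be completed, under these assumptions: text is a java.io.CharArrayWriter filled by the characters() callback, getText() returns the trimmed buffer content, the JDK's bundled JAXP parser stands in for Xerces, and output is collected in a string (via a hypothetical listing method) instead of being printed directly so the result can be checked.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical completion of the Listing 2 handler.
class EmployeeListingHandler extends DefaultHandler {
    // Assumed: character data of the current element accumulates here.
    private final CharArrayWriter text = new CharArrayWriter();
    private final StringBuilder out = new StringBuilder();
    private String firstName, lastName, office, telephone;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.reset();
        if (qName.equals("company")) {
            out.append("Employee Listing For ")
               .append(attributes.getValue("name")).append("\n\n");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.write(ch, start, length);
    }

    // Assumed helper: the buffered text without surrounding whitespace.
    private String getText() {
        return text.toString().trim();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "first": firstName = getText(); break;
            case "last": lastName = getText(); break;
            case "office": office = getText(); break;
            case "telephone": telephone = getText(); break;
            case "employee":
                out.append(office).append('\t').append(firstName).append('\t')
                   .append(lastName).append('\t').append(telephone).append('\n');
                break;
            default: break;
        }
    }

    // Parses the XML string and returns the formatted listing.
    static String listing(String xml) throws Exception {
        EmployeeListingHandler handler = new EmployeeListingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company name=\"My Widgets Inc.\"><employees><employee>"
                + "<name><first>John</first><last>Dole</last></name>"
                + "<office>1-50</office><telephone>123456</telephone>"
                + "</employee></employees></company>";
        System.out.print(listing(xml));
    }
}
```

Run against the first employee of Listing 1, this sketch produces the header line, an empty line, and then the office, first name, last name, and telephone number separated by tabs.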
As you can see from Listing 2, parsing even a simple XML file can produce a significant amount of source code. SAX's event-driven (as opposed to document-driven) nature also makes the source code difficult to maintain and debug because you must be constantly aware of the parser's state when writing SAX code. Writing a SAX parser for complex document definitions can prove even more demanding; see Resources for challenging real-life examples.
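The state problem becomes visible as soon as the same element name appears in two contexts. The sketch below uses a hypothetical variant of the example document in which both the company and each employee have a name element, and it keeps a stack of open elements so the handler can tell them apart. The class name PathAwareHandler, the employeeNames method, and the document layout are assumptions made for illustration, and the stack is only one of several ways to track parser state by hand.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that tracks which elements are currently open.
class PathAwareHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        open.push(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
        // After the pop, the top of the stack is the parent of the closed element.
        String parent = open.peek();
        if (qName.equals("name") && "employee".equals(parent)) {
            names.add(text.toString().trim());
        }
    }

    // Returns only the names that sit directly inside an employee element.
    static List<String> employeeNames(String xml) throws Exception {
        PathAwareHandler handler = new PathAwareHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><name>My Widgets Inc.</name>"
                + "<employee><name>John</name></employee>"
                + "<employee><name>Jane</name></employee></company>";
        System.out.println(employeeNames(xml));
    }
}
```

A handler that checked only qName.equals("name") would also pick up the company name here; the extra bookkeeping is exactly the kind of state the text above warns about.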