A Simple API for XML (SAX) parser is an invaluable tool for parsing XML files, especially when the input is too large to fit in main memory. A SAX parser also helps when you have a slow input stream, such as an Internet connection, and you need to process bytes as soon as they arrive instead of waiting for the complete input. As a bonus, a well-designed SAX parser is generally faster than processing a DOM (Document Object Model) tree: you need only one pass over the XML data, as opposed to the two passes needed with a DOM tree (one to build the tree and one to do the processing).
Unfortunately, a SAX parser can be difficult to develop because of its event-driven nature. In this article, I create a source code generator that will help you easily develop a SAX parser.
Note: I don't explain SAX in detail here; see Resources below for some excellent references.
SAX is a standard API that parses an XML input stream, like a file or network connection, and triggers events in an event-handler class. Many different SAX parser implementations are available for Java. In my examples here, I use Xerces from the Apache XML Project, one of the most popular parser implementations.
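To make the event model concrete, here is a minimal sketch that records the order in which the parser calls back into a handler. It assumes the JAXP entry points (SAXParserFactory and SAXParser) that ship with the JDK rather than a separately installed Xerces; the class name SaxEventTrace and its trace method are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the SAX callbacks in the order they fire.
class SaxEventTrace {

    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();

        // The parser pushes events into this handler as it reads the input.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<company><employees/></company>")) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the document as a whole; it only sees one callback at a time, which is why any context it needs must be stored in its own fields.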
Listings 1 and 2 below show an XML file and a SAX event handler, respectively. (You can download all source code and examples for this article from Resources.)
Listing 1. Example XML
<company name="My Widgets Inc.">
  <employees>
    <employee>
      <name>
        <first>John</first>
        <last>Dole</last>
      </name>
      <office>1-50</office>
      <telephone>123456</telephone>
    </employee>
    <employee>
      <name>
        <first>Jane</first>
        <last>Dole</last>
      </name>
      <office>1-51</office>
      <telephone>123457</telephone>
    </employee>
  </employees>
</company>
Listing 2. SAX handler
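// Note: the text buffer, getText(), and the firstName, lastName, office,
// and telephone fields are members of the handler class not shown here.
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes) throws SAXException
{
    // Discard any character data collected before this element.
    text.reset();
    if (qName.equals("company"))
    {
        // The root element carries the company name; print the header.
        String name = attributes.getValue("name");
        String header = "Employee Listing For " + name;
        System.out.println(header);
        System.out.println();
    }
}

public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws SAXException
{
    // Leaf elements: remember their text until the employee is complete.
    if (qName.equals("first"))
    {
        firstName = getText();
    }
    if (qName.equals("last"))
    {
        lastName = getText();
    }
    if (qName.equals("office"))
    {
        office = getText();
    }
    if (qName.equals("telephone"))
    {
        telephone = getText();
    }
    // End of an employee record: print the collected fields as one row.
    if (qName.equals("employee"))
    {
        System.out.println(office + "\t " + firstName + "\t" +
                           lastName + "\t" + telephone);
    }
}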
The SAX handler above merely prints the XML file's data to the standard output device. It prints a header line containing the company name followed by tab-delimited employee data.
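Listing 2 shows only the two element callbacks. As a rough sketch of how the rest of such a handler might look, the class below adds a characters() callback and a text buffer. The CharArrayWriter buffer, the trimming in getText(), the Writer passed to the constructor, and the names EmployeeListingHandler and listing are all assumptions made for illustration; they are not taken from the article's downloadable source.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical complete handler modeled on Listing 2.
class EmployeeListingHandler extends DefaultHandler {
    // Assumed buffer for character data between tags.
    private final CharArrayWriter text = new CharArrayWriter();
    private final PrintWriter out;
    private String firstName;
    private String lastName;
    private String office;
    private String telephone;

    EmployeeListingHandler(Writer out) {
        this.out = new PrintWriter(out, true);
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        text.reset(); // forget text that belonged to an earlier element
        if (qName.equals("company")) {
            out.println("Employee Listing For " + attributes.getValue("name"));
            out.println();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.write(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "first":     firstName = getText(); break;
            case "last":      lastName = getText();  break;
            case "office":    office = getText();    break;
            case "telephone": telephone = getText(); break;
            case "employee":
                out.println(String.join("\t", office, firstName, lastName, telephone));
                break;
            default:
                break;
        }
    }

    private String getText() {
        return text.toString().trim();
    }

    // Convenience method: parse a document and return the listing as a string.
    static String listing(String xml) {
        StringWriter buffer = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                       new EmployeeListingHandler(buffer));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return buffer.toString();
    }
}
```

Writing to a Writer instead of directly to System.out is a design choice of this sketch: it keeps the handler easy to exercise from a test, and passing new PrintWriter(System.out) restores the console behavior described above.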
As you can see from Listing 2, parsing even a simple XML file can produce a significant amount of source code. SAX's event-driven (as opposed to document-driven) nature also makes the source code difficult to maintain and debug because you must be constantly aware of the parser's state when writing SAX code. Writing a SAX parser for complex document definitions can prove even more demanding; see Resources for challenging real-life examples.
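One way to see why parser state matters: if the same element name appears in two contexts, the handler must remember where it is to tell them apart. The sketch below tracks the open elements on a stack. The input format, in which both the company and each employee have a name child element, is a hypothetical variation on Listing 1, and the class and method names are likewise invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects <name> text only when it sits under <employee>.
class PathAwareHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>(); // currently open elements
    private final StringBuilder text = new StringBuilder();
    private final List<String> employeeNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        path.push(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();                  // leave the element that just closed
        String parent = path.peek(); // null once the root element has closed
        if (qName.equals("name") && "employee".equals(parent)) {
            employeeNames.add(text.toString().trim());
        }
    }

    static List<String> employeeNames(String xml) {
        PathAwareHandler handler = new PathAwareHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return handler.employeeNames;
    }
}
```

Every additional context the document format distinguishes adds another condition of this kind, and that bookkeeping is what makes hand-written handlers hard to maintain.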