XML documents on the run, Part 3

How do SAX2 parsers perform compared to new XMLPull parsers?

In Parts 1 and 2 of this three-part series, I explained both push-style XML parsers (Simple API for XML 2, or SAX2) and pull-style XML parsers. The pull-side story continues to change rapidly, so, as promised, I'll update you on the latest developments. These include the new Common API for XML Pull Parsing, or XMLPull, announced earlier this month. (Talk about hot off the presses!)

But that's not all: In Part 2 I left loyal readers hanging on performance differences. Pull parsers offer some big ease-of-use advantages compared to SAX2, but can they measure up to SAX2's industrial-strength performance? You'll find out in this article's second half, in which I show performance tests pitting five top SAX2 parsers against two new XMLPull parsers.

XMLPull

Just this month the ringleaders from the two leading pull-parser implementations announced XMLPull. Stefan Haustein from the kXML project and Aleksander Slominski from XPP3 (XML Pull Parser), both feeling that the lack of a common API hindered wider pull parsing adoption, began work on XMLPull in December 2001. The resulting API reflects their substantial experience, drawing from their respective projects to produce an approach that works well for a wide range of applications.

XMLPull supports everything from J2ME (Java 2 Platform, Micro Edition) to J2EE (Java 2 Platform, Enterprise Edition). The J2ME requirement forced them to create a simple interface with the minimal number of classes necessary to function well in limited-memory environments. In J2EE situations, by contrast, memory usually isn't an issue, but flexibility and performance are key. Accommodating both extremes with a single interface is tough. Does XMLPull succeed? I tackle that question below. Let's start by looking at the basic interface.

The all-in-one approach

The XMLPull API consists of a single interface, org.xmlpull.v1.XmlPullParser, along with two supporting classes: org.xmlpull.v1.XmlPullParserException and org.xmlpull.v1.XmlPullParserFactory. The XmlPullParser interface defines XMLPull's interesting parts, so let's examine it and set the two support classes aside.

Think of the XmlPullParser interface as defining a special kind of iterator. That iterator delivers an XML document's components to you one at a time. It's up to you, in your program, to decide when you're done with the current component and ready to move to the next one.

The parser always holds a particular state that matches the current component type. Many of XmlPullParser's methods prove meaningful only when the parser is in a particular state, identified by a set of constant definitions in the interface. When you begin parsing a document, the parser always resides in the START_DOCUMENT state.

How do you determine the parser's state once you begin parsing? There are two ways. The first is the value returned by a call to the interface's next() or nextToken() method, either of which advances the parser to the next document component. The second is the value returned by getEventType(), which just gives you the current state without advancing.
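
To make that loop concrete, here is a minimal sketch of the drive-it-yourself style. One caveat up front: so that it runs on a stock JDK with no extra jars, it uses StAX (javax.xml.stream.XMLStreamReader), a different pull API, as a stand-in rather than XMLPull itself. The shape is the same: getEventType() reports the current state, and next() advances the parser and returns the new state. In XMLPull the corresponding constants are named START_TAG, END_TAG, and TEXT rather than StAX's START_ELEMENT, END_ELEMENT, and CHARACTERS. The class name PullLoopSketch and the trace() helper are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: StAX stands in for XMLPull so the sketch
// compiles against the JDK alone.
class PullLoopSketch {

    /** Walks the document and records each state the parser passes through. */
    static List<String> trace(String xml) {
        List<String> states = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver adjacent character data as a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader parser =
                factory.createXMLStreamReader(new StringReader(xml));

            // Way 1: ask for the current state without advancing.
            int event = parser.getEventType();
            states.add(describe(parser, event));

            // Way 2: advance, and use the returned value as the new state.
            while (parser.hasNext()) {
                event = parser.next();
                states.add(describe(parser, event));
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Document is not well-formed", e);
        }
        return states;
    }

    // Only some accessors are meaningful in each state, so branch on it first.
    private static String describe(XMLStreamReader parser, int event) {
        switch (event) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START " + parser.getLocalName();
            case XMLStreamConstants.CHARACTERS:     return "TEXT " + parser.getText();
            case XMLStreamConstants.END_ELEMENT:    return "END " + parser.getLocalName();
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            default:                                return "OTHER";
        }
    }

    public static void main(String[] args) {
        for (String state : trace("<note><to>Ann</to></note>")) {
            System.out.println(state);
        }
    }
}
```

Notice that the calling code, not the parser, decides when to move on: nothing happens between calls to next(), which is exactly what separates this style from SAX2's callbacks.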

Resources
  • Visit my XMLBench homepage for the latest parser performance updates and to check out Java document-model performance comparisons
    http://www.sosnoski.com/opensrc/xmlbench
  • For full details of the SAX specification, currently at version 2.0.1, go to
    http://www.saxproject.org
  • For the five SAX2 parsers included in the performance tests, see:
  • For full details of the XMLPull specification, currently at version 1.0.7, go to
    http://www.xmlpull.org
  • Here are the XMLPull parsers tested: