XML documents on the run, Part 1

SAX speeds through XML documents with parse-event streams


One of the oldest approaches to processing XML documents in Java also proves one of the fastest: parse-event streams. That approach became standardized in Java with the SAX (Simple API for XML) interface specification, later revised as SAX2 to include support for XML Namespaces.


Event-stream processing offers advantages beyond speed. Because the parser processes the document on the fly, your program can start working with it as soon as the first part has been read. Other approaches generally require you to parse the complete document before you start working with it -- fine if the document comes off a local disk drive, but if the document is sent from another system, waiting for the complete document can cause significant delays.

Event-stream processing also eliminates any document size limits. In contrast, approaches that store the document's representation in memory can run out of space with very large documents. Setting a hard limit on a real-world document's size is often difficult, and potentially a major problem in many applications.
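
As a rough illustration of both points, here is a minimal sketch of a handler that counts elements while the document streams past. It is not taken from the article's stock.jar or option.jar examples; the ElementCounter class and its count method are hypothetical names, and it assumes the default JAXP parser, which is not namespace-aware and therefore reports each element's name in the qName argument. The handler keeps only a single counter, so its memory use should not grow with the size of the document.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts start tags with a given name.
public class ElementCounter extends DefaultHandler {
    private final String target;
    private long count = 0;

    public ElementCounter(String target) {
        this.target = target;
    }

    // Called once per start tag, as soon as the parser has read it.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: default (non-namespace-aware) parser, so qName holds the tag name.
        if (target.equals(qName)) {
            count++;
        }
    }

    public long getCount() {
        return count;
    }

    // Parses the stream and returns how many elements named 'name' it contains.
    public static long count(InputStream in, String name) throws Exception {
        ElementCounter handler = new ElementCounter(name);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.getCount();
    }

    // Usage: java ElementCounter <file> <element-name>
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(count(in, args[1]));
        }
    }
}
```

The same count method works with any InputStream, including one read from a network connection, because the handler never needs the whole document at once.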

A note on the source code

This article features two example source code files: stock.jar and option.jar, both found in a downloadable zip file in Resources. Each jar file includes full example implementations, along with sample documents and test driver programs. To try an example, create a new directory, then extract the jar's files to that directory with jar xvf stock.jar or jar xvf option.jar. The readme.txt file gives instructions for setting up and running the test drivers.

The event view

Parsers with event-stream interfaces deliver a document one piece at a time. Think of the document's text as spread out in time, as it would be if read from a stream. The parser looks for significant document components (start and end tags, character data, and so on) in the text, generating parse events for each.

For example, here's a simple document:

<author>
  <first-name>Dennis</first-name>
  <last-name>Sosnoski</last-name>
</author>


The table shows the parse-event sequence a SAX2 parser would generate for this document (though the parser can divide up the character data reported by characters events differently than I've shown, as I discuss when I get to the actual code).

Parse events for document

Text processed     Parse event
-----------------  --------------------------
""                 startDocument()
"<author>"         startElement("author")
"\n "              characters("\n ")
"<first-name>"     startElement("first-name")
"Dennis"           characters("Dennis")
"</first-name>"    endElement("first-name")
"\n "              characters("\n ")
"<last-name>"      startElement("last-name")
"Sosnoski"         characters("Sosnoski")
"</last-name>"     endElement("last-name")
"\n"               characters("\n")
"</author>"        endElement("author")

Notice in the table that the parse events include both start of element and end of element notifications -- important information for your program because it lets you track the document's nested structure. Without the end notifications, you couldn't know which elements or character data are part of the content of some earlier element. Also note that the parse events include all the character data in the document, even the whitespace sequences most people would consider unimportant.
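
To make the event sequence concrete, the following sketch records the events a parser reports for a document and uses the start and end notifications to track nesting. It is an illustration only, not the article's own code: EventLogger, run, getEvents, and getPaths are hypothetical names, and the sketch assumes the default (non-namespace-aware) JAXP parser, so element names arrive in qName. Because a parser may split character data across several characters calls, the handler buffers text and records it as a single entry when the next tag arrives. A SAX parser also reports endDocument() after the last end tag, so the log includes it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: logs parse events and element nesting.
public class EventLogger extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final List<String> paths = new ArrayList<>();
    private final Deque<String> open = new ArrayDeque<>();   // currently open elements, root first
    private final StringBuilder text = new StringBuilder();  // buffered character data

    @Override
    public void startDocument() {
        events.add("startDocument()");
    }

    @Override
    public void endDocument() {
        events.add("endDocument()");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();            // pending text belongs to the parent element
        open.addLast(qName);
        events.add("startElement(\"" + qName + "\")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();            // pending text belongs to the element being closed
        open.removeLast();
        events.add("endElement(\"" + qName + "\")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one run of text in several pieces, so just buffer it.
        text.append(ch, start, length);
    }

    // Records buffered text as one event and, if it is not pure whitespace, its element path.
    private void flushText() {
        if (text.length() == 0) {
            return;
        }
        String s = text.toString();
        text.setLength(0);
        events.add("characters(\"" + s.replace("\n", "\\n") + "\")");
        if (!s.trim().isEmpty()) {
            paths.add(String.join("/", open) + "=" + s);
        }
    }

    public List<String> getEvents() {
        return events;
    }

    public List<String> getPaths() {
        return paths;
    }

    // Parses the given XML text and returns the populated logger.
    public static EventLogger run(String xml) throws Exception {
        EventLogger logger = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), logger);
        return logger;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<author>\n  <first-name>Dennis</first-name>\n"
                + "  <last-name>Sosnoski</last-name>\n</author>";
        EventLogger logger = run(xml);
        logger.getEvents().forEach(System.out::println);
        logger.getPaths().forEach(System.out::println);
    }
}
```

The stack of open elements is what the end notifications make possible: each piece of character data can be attributed to the element that contains it, which is how the sketch builds entries such as author/first-name=Dennis.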

Resources
  • For full details of the SAX specification, currently at version 2.0.1, go to
    http://www.saxproject.org
  • The "Links" page within the SAX Project site features links to numerous related areas, including an assortment of SAX2 parsers
    http://www.saxproject.org/?selected=links
  • Sun Microsystems's JAXP page gives links to downloads, documentation, and other resources
    http://java.sun.com/xml/jaxp/index.html
  • For another take on working with the SAX2 APIs, check out Robert Hustead's "Mapping XML to Java" JavaWorld series in which he describes a class library for working with SAX2: