Newsletter sign-up
View all newsletters

Enterprise Java Newsletter
Stay up to date on the latest tutorials and Java community news posted on JavaWorld

Sponsored Links

Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs

Programming XML in Java, Part 2

Experience the joy of SAX, LAX, and DTDs

  • Print
  • Feedback
If you read last month's article, you already understand how you can use SAX (the Simple API for XML) to process XML documents. (If you haven't read it yet, you may want to start there; see "Read the Whole Series!" below). In that article, I explained how application writers implement the SAX DocumentHandler interface, which takes a specific action when a particular condition (such as the start of a tag) occurs during the parsing of an XML document. But what good is that function? Read on.

TEXTBOX: TEXTBOX_HEAD: Programming XML in Java: Read the whole series!

:END_TEXTBOX

You'll also remember that an XML parser checks that the document is well formed (meaning that roughly all of the open and close tags match and don't overlap in nonsensical ways). But even well-formed documents can contain meaningless data or have a senseless structure. How can such conditions be detected and reported?

This article answers both questions through an illustrative example. I'll start first with the latter question: once the document is parsed, how do you ensure that the XML your program is processing actually makes sense? Then I'll demonstrate an extension to XML that I call LAX (the Lazy API for XML), which makes writing handlers for SAX events even easier. Finally, I'll tie all of the themes together and demonstrate the technology's usefulness with a small example that produces both formatted recipes and shopping lists from the same XML document.

Garbage in, garbage out

One thing you may have heard about XML is that it lets the system developer define custom tags. With a nonvalidating parser (discussed in Part 1 of this series), you certainly have that ability. You can make up any tag you want and, as long as you balance your open and close tags and don't overlap them in absurd ways, the nonvalidating SAX parser will parse the document without any problems. For example, a nonvalidating SAX parser would correctly parse and fire events for the document in Listing 1.

Listing 1. A well-formed, meaningless document



A nonvalidating SAX parser would produce a valid event stream for the document in Listing 1 because the input document is well formed. It's really stupid input, but it is well formed. Every opening tag has a corresponding close tag, and the tags don't overlap (meaning there are no combinations of tags like <A><B></A></B>). So a nonvalidating SAX parser will have no problem with Listing 1.

  • Print
  • Feedback

Resources
  • Download the source code and class files for this article
Additional resources