Programming XML in Java, Part 2

Experience the joy of SAX, LAX, and DTDs

If you read last month's article, you already understand how you can use SAX (the Simple API for XML) to process XML documents. (If you haven't read it yet, you may want to start there; see "Read the Whole Series!" below.) In that article, I explained how application writers implement the SAX DocumentHandler interface, whose methods take a specific action when a particular condition (such as the start of a tag) occurs during the parsing of an XML document. But what good is that ability? Read on.
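As a reminder of what such a handler looks like, here is a minimal sketch. It is not the code from last month's article: DocumentHandler is the SAX 1 interface, which SAX 2 deprecates in favor of ContentHandler, usually implemented by extending org.xml.sax.helpers.DefaultHandler. The sketch uses the JAXP parser that ships with the JDK and takes one action (recording the tag name) each time the parser reports the start of a tag. The class name StartTagLogger, the method collectStartTags, and the sample recipe document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StartTagLogger {

    // Hypothetical handler: records the name of every element whose start tag
    // the parser reports, in document order.
    static class StartTagHandler extends DefaultHandler {
        final List<String> startTags = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // The parser calls this once per start tag; the handler decides what to do.
            startTags.add(qName);
        }
    }

    // Parses an XML string and returns the start-tag names the handler collected.
    public static List<String> collectStartTags(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StartTagHandler handler = new StartTagHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.startTags;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<recipe><title>Toast</title><step>Heat bread</step></recipe>";
        System.out.println(collectStartTags(xml)); // prints [recipe, title, step]
    }
}
```

The parser drives the processing and the handler only reacts to the events it cares about; every other callback falls back to DefaultHandler's do-nothing defaults.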

Programming XML in Java: Read the whole series!

You'll also remember that an XML parser checks that the document is well formed (meaning, roughly, that all of the open and close tags match and don't overlap in nonsensical ways). But even well-formed documents can contain meaningless data or have a senseless structure. How can such conditions be detected and reported?

This article answers both questions through an illustrative example. I'll start with the latter question: once the document is parsed, how do you ensure that the XML your program is processing actually makes sense? Then I'll demonstrate an extension to XML that I call LAX (the Lazy API for XML), which makes writing handlers for SAX events even easier. Finally, I'll tie all of the themes together and demonstrate the technology's usefulness with a small example that produces both formatted recipes and shopping lists from the same XML document.

Garbage in, garbage out

One thing you may have heard about XML is that it lets the system developer define custom tags. With a nonvalidating parser (discussed in Part 1 of this series), you certainly have that ability. You can make up any tag you want and, as long as you balance your open and close tags and don't overlap them in absurd ways, the nonvalidating SAX parser will parse the document without any problems. For example, a nonvalidating SAX parser would correctly parse and fire events for the document in Listing 1.

Listing 1. A well-formed, meaningless document



A nonvalidating SAX parser would produce a valid event stream for the document in Listing 1 because the input document is well formed. It's really stupid input, but it is well formed. Every opening tag has a corresponding close tag, and the tags don't overlap (meaning there are no combinations of tags like <A><B></A></B>). So a nonvalidating SAX parser will have no problem with Listing 1.
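To see this behavior for yourself, the following minimal sketch (not the article's code) runs the JDK's JAXP SAX parser in nonvalidating mode over two inputs. The first is a hypothetical nonsense document of my own, not Listing 1: its tags are balanced and properly nested, so the parser fires all of its events without complaint even though a quantity containing a recipe means nothing. The second contains the overlapping tags <A><B></A></B>, which break well-formedness, so even a nonvalidating parser rejects it. The names NonvalidatingDemo, EventCounter, and parse are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NonvalidatingDemo {

    // Counts element events; it knows nothing about what the tags are supposed to mean.
    static class EventCounter extends DefaultHandler {
        int starts;
        int ends;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            starts++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            ends++;
        }
    }

    // Parses with a nonvalidating parser. Returns the counter if the document is
    // well formed, or null if the parser reports a fatal well-formedness error.
    public static EventCounter parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // the default, made explicit: no DTD validation
        SAXParser parser = factory.newSAXParser();
        EventCounter counter = new EventCounter();
        try {
            parser.parse(new InputSource(new StringReader(xml)), counter);
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, so malformed input ends up here.
            System.err.println("Rejected: " + e.getMessage());
            return null;
        }
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical nonsense: balanced and properly nested, but meaningless.
        String nonsense =
            "<quantity><recipe>purple</recipe><ingredient><ingredient/></ingredient></quantity>";
        EventCounter c = parse(nonsense);
        System.out.println(c.starts + " start tags, " + c.ends + " end tags");
        // prints "4 start tags, 4 end tags"

        // Overlapping tags are not well formed, so even a nonvalidating parser refuses them.
        System.out.println(parse("<A><B></A></B>") == null); // prints "true"
    }
}
```

Note that the empty element <ingredient/> produces both a start event and an end event, which is why the counts match. Nothing in this code can tell that the first document is garbage; that takes the additional checking this article goes on to describe.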
