DocumentHandler interface, which takes a specific action when a particular condition (such as the start of a tag) occurs during the parsing of an XML document. But what good is that function? Read on.
You'll also remember that an XML parser checks that the document is well formed (meaning, roughly, that all of the open and close tags match and don't overlap in nonsensical ways). But even well-formed documents can contain meaningless data or have a senseless structure. How can such conditions be detected and reported?
This article answers both questions through an illustrative example. I'll start with the latter question: once the document is parsed, how do you ensure that the XML your program is processing actually makes sense? Then I'll demonstrate an extension to XML that I call LAX (the Lazy API for XML), which makes writing handlers for SAX events even easier. Finally, I'll tie all of the themes together and demonstrate the technology's usefulness with a small example that produces both formatted recipes and shopping lists from the same XML document.
One thing you may have heard about XML is that it lets the system developer define custom tags. With a nonvalidating parser (discussed in Part 1 of this series), you certainly have that ability. You can make up any tag you want and, as long as you balance your open and close tags and don't overlap them in absurd ways, the nonvalidating SAX parser will parse the document without any problems. For example, a nonvalidating SAX parser would correctly parse and fire events for the document in Listing 1.
<?xml version="1.0"?>
<Art CENTURY="20">
  <Dada>
    <Author CENTURY="18" NOMDEPLUME="Voltaire">
      François-Marie Arouet
    </Author>
    <Tree SPECIES="Maple">
      <Yes/>
      <Book AUTHOR="Musashi, Miyamoto">
        <Title LANG="English">The Book of Five Rings</Title>
        <Title LANG="Nihongo">Go Rin No Sho</Title>
        <Filter POLY="Chebyshev" POLES="2"/>
        <Title LANG="Espanol">El Libro de Cinco Anillos</Title>
        <Title LANG="Francais">Le Livre de Cinq Bagues</Title>
      </Book>
      <Bahrain FORMAT="MP3">
        <Cathedral CITTA="Firenze">
          <Nome>Santa Maria del Fiore</Nome>
          <Architetto>Brunelleschi, Filippo (1377-1466)</Architetto>
          <Ora FORMAT="DMY24">22032000134591</Ora>
        </Cathedral>
      </Bahrain>
      <Phobias>
        <Herbs NAME="Ma Huang"/>
        <Appliance COLOR="Harvest Gold">Yuck</Appliance>
      </Phobias>
    </Tree>
  </Dada>
</Art>
A nonvalidating SAX parser would produce a valid event stream for the document in Listing 1 because the input document is well formed. It's really stupid input, but it is well formed. Every opening tag has a corresponding close tag, and the tags don't overlap (meaning there are no combinations of tags like <A><B></A></B>). So a nonvalidating SAX parser will have no problem with Listing 1.
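To make the distinction concrete, here is a minimal sketch of a nonvalidating parse that accepts nonsense as long as it is well formed and rejects overlapping tags. It rests on assumptions that are not part of this article: it uses the JAXP SAXParserFactory and the SAX 2 DefaultHandler class that ship with current JDKs rather than the SAX 1 DocumentHandler interface discussed in this series, and the class name WellFormedDemo and the method startedElements are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the article's own code.
class WellFormedDemo {

    /**
     * Parses the text with a nonvalidating SAX parser and returns the element
     * names in the order their start tags were reported, or an empty Optional
     * if the parser rejects the text as not well formed.
     */
    static Optional<List<String>> startedElements(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // well-formedness checks only
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName); // react to the "start of a tag" event
                }
            });
            return Optional.of(names);
        } catch (SAXException e) {
            // The parser reports a fatal error for text that is not well formed.
            return Optional.empty();
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Senseless structure, but every tag is balanced and properly nested.
        String silly = "<Art CENTURY=\"20\"><Dada><Tree SPECIES=\"Maple\">"
                     + "<Yes/></Tree></Dada></Art>";
        System.out.println(startedElements(silly));
        // prints Optional[[Art, Dada, Tree, Yes]]

        // Overlapping tags: not well formed, so the parse fails.
        System.out.println(startedElements("<A><B></A></B>").isPresent());
        // prints false
    }
}
```

The parser never asks whether a Tree element belongs inside Dada. It only checks that the tags balance and nest, which is why the first document yields a complete event stream while the second is rejected.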