Programming XML in Java, Part 3
DOMination: Take control of structured documents with the Document Object Model
By Mark Johnson, JavaWorld.com, 07/07/00
The Simple API for XML (SAX) is an excellent interface for many XML applications. It is intuitive, extremely easy to learn,
and, as its name implies, simple. In just an hour or two, any Java programmer can learn SAX well enough to develop an
application with it. SAX is especially useful when the data in an XML file is already structurally similar to the desired
output. For instance, the recipe example in Part 2 of this series formatted Recipe XML into an HTML representation of a recipe
page and a shopping list. The structure of the output HTML closely mirrored the structure of the input XML: the ingredients in
the Recipe XML were grouped together in an <Ingredients> element, and the ingredients in the output HTML were grouped together
in an unordered list (<ul>). The tags differed, but the basic structure was the same.
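When input and output share a structure like this, a SAX handler can translate each event directly into output as it arrives. The following is a minimal sketch of that idea using the JAXP SAX API that ships with current JDKs. It is not the Part 2 code: the <Ingredient> child element name, the class name IngredientListHandler, and the toHtml helper are hypothetical, and the handler does no HTML escaping.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps <Ingredients> to <ul> and each <Ingredient> to <li>.
// Because the input and output structures match, each SAX event is translated
// straight into output; nothing needs to be stored or reordered.
class IngredientListHandler extends DefaultHandler {
    private final StringBuilder html = new StringBuilder();
    private boolean inIngredient = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Ingredients")) {
            html.append("<ul>");
        } else if (qName.equals("Ingredient")) {
            html.append("<li>");
            inIngredient = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy text only while inside <Ingredient>; no HTML escaping is done.
        if (inIngredient) {
            html.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Ingredients")) {
            html.append("</ul>");
        } else if (qName.equals("Ingredient")) {
            html.append("</li>");
            inIngredient = false;
        }
    }

    // Parses an XML string with a default SAX parser and returns the HTML fragment.
    static String toHtml(String xml) {
        IngredientListHandler handler = new IngredientListHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse recipe XML", e);
        }
        return handler.html.toString();
    }
}
```

For example, IngredientListHandler.toHtml("<Recipe><Ingredients><Ingredient>flour</Ingredient><Ingredient>eggs</Ingredient></Ingredients></Recipe>") returns "<ul><li>flour</li><li>eggs</li></ul>".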
In real data-processing situations, however, the structure of the input data often differs greatly from the eventual output
structure. Because a SAX parser passes events to a programmer-defined handler in the order in which they appear in the input
XML, you, the programmer, are responsible for any restructuring or reordering of the data. Also, if the same data is to be used in more
than one place in the output, you must either perform multiple passes over the XML or arrange for the handler to "remember"
that data while producing output. One example of this was the recipe title in Part 2, which the handler maintained in an internal
variable for use both in the browser title bar and in the Webpage.
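A minimal sketch of that "remember it for later" pattern follows. It assumes a hypothetical <Title> element and hypothetical class and method names; it is not the Part 2 handler. The handler saves the title text in a field, then writes the same value into both the <title> element (the browser title bar) and an <h1> heading.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps the recipe title in an internal variable
// so it can be reused in more than one place in the output.
class TitleRememberingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTitle = false;
    private String title = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Title")) {
            inTitle = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may split one text node across several calls, so accumulate.
        if (inTitle) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Title")) {
            title = buffer.toString().trim();  // remember the title for later
            inTitle = false;
        }
    }

    // Writes the remembered title twice: in the title bar and in the page body.
    String renderPage() {
        return "<html><head><title>" + title + "</title></head>"
                + "<body><h1>" + title + "</h1></body></html>";
    }

    static String render(String xml) {
        TitleRememberingHandler handler = new TitleRememberingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse recipe XML", e);
        }
        return handler.renderPage();
    }
}
```

The single title field is trivial here, but each additional piece of data that must appear out of order or more than once needs similar bookkeeping.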
For tasks of low and intermediate complexity, SAX works just fine. As an application's complexity (and functionality) increases,
however, the SAX handler code can become extremely difficult to understand. Much of that code ends up devoted to storing information
from the input in an internal form that can later be used to produce the desired output. When using SAX, you are generally responsible
for creating an internal object model of your application's information.
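To make that burden concrete, here is a small sketch, assuming hypothetical <Title> and <Ingredient> elements and hypothetical Recipe and RecipeModelBuilder classes. The handler does nothing but fill in a hand-written object model; only after parsing can the program freely reorder the data, here by sorting the ingredients into an alphabetical shopping list.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical application object model, written by hand because SAX provides none.
class Recipe {
    String title = "";
    final List<String> ingredients = new ArrayList<>();
}

// Hypothetical handler whose only job is to populate a Recipe from SAX events.
class RecipeModelBuilder extends DefaultHandler {
    private final Recipe recipe = new Recipe();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);  // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (qName.equals("Title")) {
            recipe.title = value;
        } else if (qName.equals("Ingredient")) {
            recipe.ingredients.add(value);
        }
        text.setLength(0);
    }

    static Recipe build(String xml) {
        RecipeModelBuilder handler = new RecipeModelBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse recipe XML", e);
        }
        return handler.recipe;
    }

    // Once the model exists, reordering is easy: sort a copy of the ingredients.
    static String shoppingList(Recipe recipe) {
        List<String> sorted = new ArrayList<>(recipe.ingredients);
        Collections.sort(sorted);
        return String.join(", ", sorted);
    }
}
```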
DOM to the rescue
The Document Object Model, or DOM, is a standardized object model for XML documents. DOM is a set of interfaces describing
an abstract structure for an XML document. Programs that access document structures through the DOM interface can arbitrarily
insert, delete, and rearrange the nodes of an XML document programmatically.
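As an illustration, the sketch below uses the standard org.w3c.dom interfaces (parsed through JAXP's DocumentBuilderFactory) to delete, insert, and move nodes. The element names and the DomEditing class are hypothetical. Note the "move" step: because a DOM node has only one parent, calling insertBefore on a node that is already in the tree relocates it rather than copying it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomEditing {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text of each child element of parent, in document order.
    static List<String> childTexts(Element parent) {
        List<String> texts = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                texts.add(child.getTextContent());
            }
        }
        return texts;
    }

    // Assumes the root holds exactly three <Ingredient> children: flour, eggs, salt.
    static List<String> edit(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();
        // getElementsByTagName returns a live list, so take references before editing.
        NodeList items = root.getElementsByTagName("Ingredient");
        Node flour = items.item(0);
        Node eggs = items.item(1);
        Node salt = items.item(2);

        root.removeChild(salt);                          // delete a node
        Element milk = doc.createElement("Ingredient");  // create a new node
        milk.appendChild(doc.createTextNode("milk"));
        root.appendChild(milk);                          // insert it at the end
        root.insertBefore(eggs, flour);                  // move eggs ahead of flour
        return childTexts(root);
    }
}
```

Calling DomEditing.edit("<Ingredients><Ingredient>flour</Ingredient><Ingredient>eggs</Ingredient><Ingredient>salt</Ingredient></Ingredients>") returns the list [eggs, flour, milk].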
DOM and SAX parsers work in different ways. A SAX parser processes the XML document as it parses the XML input stream, passing
SAX events to a programmer-defined handler method. A DOM parser, on the other hand, parses the entire input XML stream and
returns a Document object. Document is the programmatic, language-neutral interface that represents a document. The Document returned by the DOM parser has an API that lets you manipulate a (virtual) tree of Node objects; this tree represents the structure of the input XML. Figures 1 and 2 illustrate this difference between the APIs.
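The following sketch shows the DOM side of that difference: the parser hands back a complete Document, and the program can then walk the Node tree in any order it likes. The DomOutline class and the sample element names are hypothetical; the parsing calls are the standard JAXP ones. The recursive method prints an indented outline of element names, two spaces per level.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomOutline {
    // The DOM parser reads the whole input and returns the entire tree at once.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Recursively appends element names, indented two spaces per nesting level.
    // Text nodes are skipped; they have no children, so recursion stops there.
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            depth++;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth, out);
        }
    }

    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        outline(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }
}
```

Unlike the SAX handlers, nothing here runs during parsing; all of the work happens on the finished tree, so the same Document could be traversed again or restructured before producing output.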
Resources
- For the source code to this article in jar format
http://www.javaworld.com/jw-07-2000/xmldom/xmldom.jar
- For the source code to this article in gzipped tarball format
http://www.javaworld.com/jw-07-2000/xmldom/xmldom.tgz
- For the source code to this article in zip format
http://www.javaworld.com/jw-07-2000/xmldom/xmldom.zip
- The World Wide Web Consortium (W3C) Webpage on the DOM, which contains information about all of the versions (or "levels")
of DOM
http://www.w3.org/DOM/
- The current W3C recommendation for DOM Level 1
http://www.w3.org/TR/REC-DOM-Level-1/
- Find out about SAX
http://www.megginson.com/SAX/index.html
- I covered the DOM API previously in my introductory article on XML, "XML for the absolute beginner," (JavaWorld, April 1999)
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml.html
- Sun's tutorial introduction to JAXP (the Java API for XML Parsing) includes a section on DOM
http://java.sun.com/xml/tutorial_intro.html
- A large set of references about DOM appears on the About.com XML site
http://xml.about.com/compute/xml/msubdom.htm
- An entertaining, if vertigo-inducing, multimedia introduction to using DOM in JavaScript
http://www.webcoder.com/howto/15/index2.html
- "XML APIs for databases," by Ramnivas Laddad (JavaWorld, January 2000) gives an example of using DOM in a database application
http://www.javaworld.com/javaworld/jw-01-2000/jw-01-dbxml.html
- There is a new alternative to DOM, called JDOM. In many cases, JDOM is much easier to use than DOM, and is likely to be more
memory efficient. Read "Easy Java/XML Integration with JDOM, Part 1," (JavaWorld, May 2000) by Jason Hunter and Brett McLaughlin, the creators of JDOM
http://www.javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html
- Once you've read the JDOM article, find out more about the open source JDOM at its Website
http://www.jdom.org/
- See Elliotte Rusty Harold's presentation slides on JDOM from this month's XML DevCon
http://metalab.unc.edu/xml/slides/nypc/jdom/
- "Should I Use SAX or DOM?" at Developerlife.com summarizes when to use SAX and when to use DOM
http://developerlife.com/saxvsdom/default.htm
- "New standards orbit XML" by Tom Yager (InfoWorld, June 2000)
http://www.infoworld.com/articles/mt/xml/00/07/03/000703mtxml.xml