In just a few years, XML's importance to Java developers has grown by leaps and bounds, especially with Web services on the scene. XML, however, is an evolving technology with numerous subtechnologies emerging to solve common problems. With that in mind, how can a Java developer hope to keep up? This glossary of important XML acronyms will help you dip your feet into the XML pond. I assume some XML knowledge, such as that you understand attributes and elements and know how an XML document looks. To further aid your knowledge, for each technology introduced here, I've included a list of related Websites in Resources.
As for structure, I've split the acronyms into four topical sections:
- Develop with XML
- XML building blocks
- Communicate with XML
- Display XML
Within each section, rather than a mere alphabetical list, I've listed the terms in order of importance or relevance to other terms within the section.
Develop with XML
By "Develop with XML," I mean writing Java code. You'd use each of the technologies below, all brought in by typing import ...;, to read or write XML documents. I've started with the most important, although certainly not the most interesting or even useful.
DOM (Document Object Model) is the standard in-memory XML representation. DOM proves flexible in that you can access any part of the document whenever you want, but it can be memory hungry, so developers commonly use it to build client applications where memory is not an issue. DOM suffers from being old, from being language-neutral (so it is a lowest-common-denominator solution), and from having to work with ill-formed HTML documents, all of which add up to a fairly unfriendly API. Many developers upgrade to JDOM or DOM4J.
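As a rough illustration of DOM's load-everything-then-navigate style, the sketch below parses a small document into memory and reads its <title> elements. The class name DomTitles, the element names, and the sample XML are invented for this example; the parser is obtained through JAXP, and getTextContent() assumes a DOM Level 3 implementation such as the one bundled with current JDKs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of any library named in this article.
final class DomTitles {
    /** Builds the whole tree in memory, then collects the text of every <title>. */
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Once parsed, any part of the tree can be visited in any order.
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<albums><album><title>First</title></album>"
                   + "<album><title>Second</title></album></albums>";
        System.out.println(titles(xml));
    }
}
```

Note how much ceremony surrounds a simple lookup: a factory, a builder, an index-based NodeList loop. That verbosity is part of what JDOM and DOM4J set out to remove.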
SAX (Simple API for XML) differs from DOM in that it is event-driven: the document flashes before your eyes while the parser notifies you of elements and attributes. You pick out the bits you want as it goes by. SAX is lightweight and simple, but working out your location in a document can be challenging. Developers generally use SAX in server applications where memory can be tight.
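The location problem mentioned above can be made concrete with a small sketch. The parser only reports events, so the handler below keeps its own stack of open elements to know where it is. The class name SaxPaths and the path format are assumptions made for this example, not part of SAX itself.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records a slash-separated path for each element seen.
final class SaxPaths {
    static List<String> elementPaths(String xml) {
        final List<String> paths = new ArrayList<>();
        final Deque<String> open = new ArrayDeque<>(); // elements currently open

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.addLast(qName);
                paths.add("/" + String.join("/", open));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.removeLast();
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The document streams past; nothing is retained unless the handler keeps it.
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(elementPaths("<a><b/><c><b/></c></a>"));
    }
}
```

Only the stack of open element names stays in memory, which is why this style suits large documents on busy servers.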
JAXP (Java API for XML Parsing) is not really a technology separate from DOM or SAX, but simply an extension to both that makes them easier to use in Java. Both DOM and SAX are language-neutral, so neither answers questions like, "How do we create a parser?" JAXP answers the creation question and also provides a standard API for XSLT (Extensible Stylesheet Language Transformations) (see below).
To those struggling with DOM, JDOM will seem like a breath of fresh air. JDOM fixes some of DOM's more arcane areas. For example, unlike with DOM, in JDOM elements and attributes are objects, so you can call new Element("name"); and so on. JDOM uses Java collections and helps you write files, another area where DOM falls short. JDOM is being standardized under JSR 102 (Java Specification Request), which indicates Sun Microsystems' conviction that JDOM is a good solution.
JDOM's greatest strengths include its ease of use and specification compliance, but some developers criticize its performance. At the time of this writing, JDOM has not yet officially reached version 1.0; however, it is already stable and fairly bug-free, so don't let the version number put you off.
If you are considering JDOM, also consider DOM4J.
Because DOM4J fixes the same DOM problems as JDOM, they have similar APIs. In fact, DOM4J was originally a fork from JDOM. They differ most in that DOM4J, like DOM, uses interfaces in some places where JDOM uses objects; consequently, in DOM4J, you need a factory to create elements, attributes, and so on. While that makes DOM4J slightly harder to use, it does make DOM4J more flexible, since there can be several
Sun used DOM4J in the JAXM (Java APIs for XML Messaging) reference implementation, so Sun clearly sees DOM4J as a viable and sensible solution. One DOM4J advantage over JDOM is that DOM4J includes Jaxen, letting you use XPath expressions to select nodes from the tree. While Jaxen also works with JDOM, the two are not well integrated.
JAXB (Java Architecture for XML Binding) offers a fresh method to parse XML documents, so fresh that the ink has yet to dry, so to speak. JAXB, not due for release until the end of 2002, is an in-memory model like the DOM variants, but the similarity with DOM ends there. With JAXB, you compile your DTD (document type definition) (or soon XML Schema) into Java classes. You can write instructions to the compiler to help it create exactly the Java classes you want, or you can let it create defaults.
From that point on, the only APIs you need are your newly created Java classes. The current JAXB early access release creates Java classes with methods called marshal() and unmarshal(), which save to and load from disk; from then on you use getters and setters just like in any other JavaBean.
JAXB represents a promising way to ease XML editing. It's a technology worth watching.
XML building blocks
XML building block technologies represent the foundation upon which the rest of the XML world is built. An understanding of many of these key technologies proves vital to employing the technologies found in this glossary's other sections.
Namespaces let you mix tags from different sources without confusing their origin. Each element in a namespace acquires two extra bits of information.
First, and most importantly, is a unique identifier, generally resembling a URL, that distinguishes elements from different sources. While unique identifiers resemble URLs, you're not guaranteed a response if you type an identifier into a Web browser. For example, XSL (Extensible Stylesheet Language) uses "http://www.w3.org/1999/XSL/Transform" as its unique identifier, which every element using the specified namespace includes.
However, if you had to type that whole string each time you used an element, things would quickly become unreadable. In response, namespaces' second extra bit of information is a shortcut. For XSL, developers generally use the string xsl, but any string would do.
You must associate the shortcut with the unique identifier, which you do by placing an xmlns attribute on a parent element. For example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The above code announces that, for this element and all its children, the prefix xsl refers to the namespace http://www.w3.org/1999/XSL/Transform. Whenever you use an element from this namespace, you should prefix it with the shortcut string:
<xsl:template match="data">
That way a processor that works with elements from a certain namespace can look for xmlns attributes, find the shortcut, then work with the shortcut's elements.
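To see that the unique identifier, not the shortcut, is what a processor really works with, consider this small sketch. It asks a namespace-aware DOM parser for the root element's namespace and local name. The class name NamespaceLookup and the {uri}name output format are choices made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration.
final class NamespaceLookup {
    /** Returns "{namespace-uri}local-name" for the document's root element. */
    static String describeRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/1999/XSL/Transform";
        // Two different shortcuts bound to the same identifier.
        System.out.println(describeRoot("<xsl:stylesheet xmlns:xsl='" + ns + "'/>"));
        System.out.println(describeRoot("<t:stylesheet xmlns:t='" + ns + "'/>"));
    }
}
```

Both calls describe the same element, which is why any shortcut string would do.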
XML lets you store arbitrary data in an organized manner, and it lets you design how to lay it out. Sometimes, however, you may wish to explain exactly how you laid things out.
You can do so with a DTD (document type definition), which further lets your XML parser check that the data you are reading is a well-formed XML document and conforms to your specified layout. A DTD is a contract that specifies an XML document's layout.
However, DTD suffers from not being XML: its roots lie in SGML (Standard Generalized Markup Language), forcing you to learn another language. Moreover, DTDs do not work well with namespaces, and they are not particularly good at nailing down exactly what you can put where. A general push is on to replace DTDs with XML Schema.
XML Schema lets you define an XML document's contents similarly to DTDs, but XML Schema does not suffer from DTD's shortcomings.
Most importantly, XML Schema works well with namespaces. XML Schema can specify your document's appearance far more accurately—letting you specify numbers of elements and the legal strings they can contain better than DTDs. Finally, unlike DTDs, an XML Schema document is an XML document, so the same tools that edit and test your XML can edit and test your XML Schema.
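The point about specifying numbers of elements can be sketched with a tiny schema that allows an <artist> to hold one or two <song> elements. The schema, the element names, and the class name SchemaCheck are invented for this example, and the javax.xml.validation API it relies on is assumed to be available (it ships with current JDKs but postdates the tools discussed here).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration.
final class SchemaCheck {
    // The schema is itself an XML document: one to two <song> children per <artist>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='artist'><xs:complexType><xs:sequence>"
        + "<xs:element name='song' type='xs:string' minOccurs='1' maxOccurs='2'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema itself is broken", e);
        }
    }

    /** True if the document obeys the schema; false if validation reports an error. */
    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no custom ErrorHandler, validation errors are thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<artist><song>a</song></artist>"));
        System.out.println(isValid("<artist><song>a</song><song>b</song><song>c</song></artist>"));
    }
}
```

A DTD could say "one or more songs" but not "at most two," which is the kind of precision the paragraph above refers to.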
XLink (XML Linking Language), like HTML's <a href="..."> on steroids, lets you link to more than one item and have labels and meanings for each link.
The simplest example resembles the HTML version, with some extra attributes:
<a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.artist.com"/>
The example above uses <a> to signify a link, but you can use XLink in a variety of XML documents, so you are not restricted to using <a>. The xmlns:xlink attribute gives meaning to the other attributes by announcing that this is XLink. The xlink:type="simple" attribute tells the XLink interpreter to keep things simple, and xlink:href="..." represents the real link, just like in HTML.
A more involved sample showing multiple links looks like this:
<artist xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <album xlink:type="locator" xlink:label="album" xlink:href="a.html"/>
  <song xlink:type="locator" xlink:label="single" xlink:href="b1.html"/>
  <song xlink:type="locator" xlink:label="single" xlink:href="b2.html"/>
</artist>
In this example, one <artist> links to one album and two singles.
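For Java code that has to consume such an extended link, one possible approach is simply to read the XLink attributes with a namespace-aware DOM parser and group the targets by label. This is only a sketch of attribute reading, not a full XLink processor; the class name XLinkLocators and the grouping-by-label design are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration.
final class XLinkLocators {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    /** Maps each xlink:label to the xlink:href values of the locators carrying it. */
    static Map<String, List<String>> locatorsByLabel(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getAttributeNS to work
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            Map<String, List<String>> result = new LinkedHashMap<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element)) {
                    continue; // skip whitespace text nodes
                }
                Element child = (Element) n;
                if ("locator".equals(child.getAttributeNS(XLINK_NS, "type"))) {
                    result.computeIfAbsent(child.getAttributeNS(XLINK_NS, "label"),
                                           key -> new ArrayList<>())
                          .add(child.getAttributeNS(XLINK_NS, "href"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<artist xmlns:xlink='" + XLINK_NS + "' xlink:type='extended'>"
                   + "<album xlink:type='locator' xlink:label='album' xlink:href='a.html'/>"
                   + "<song xlink:type='locator' xlink:label='single' xlink:href='b1.html'/>"
                   + "<song xlink:type='locator' xlink:label='single' xlink:href='b2.html'/>"
                   + "</artist>";
        System.out.println(locatorsByLabel(xml));
    }
}
```

Because the lookup goes through the XLink namespace rather than the element names, the same code works whether the linking elements are called <album>, <song>, or anything else.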
XLink uses XML Base (see below).
The XML Base attribute (xml:base) is analogous to the HTML <BASE> element. In HTML, you can use <BASE HREF="http://www.foo.com/"> to instruct the browser to resolve relative URLs using the base given in the href. XML Base resembles the cd command under DOS or Unix. Here's an example of XML Base working with XLink:
<doc xml:base="http://foo.com/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <link xlink:type="simple" xlink:href="new.xml">new</link>
</doc>
In the code above, XLink looks for http://foo.com/new.xml when it must resolve the link.
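The resolution step itself is ordinary relative-URI resolution, which Java's java.net.URI class can demonstrate. The sketch below does not read xml:base from a document; it only shows the arithmetic a processor would perform once it knows the base. The class name BaseResolver is invented for this example.

```java
import java.net.URI;

// Hypothetical helper for illustration.
final class BaseResolver {
    /** Resolves a possibly relative href against a base URI, as xml:base prescribes. */
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        // The example from the text: base http://foo.com/ and link new.xml.
        System.out.println(resolve("http://foo.com/", "new.xml")); // prints http://foo.com/new.xml
        // Like "cd ..": a relative path can climb out of the base directory.
        System.out.println(resolve("http://foo.com/docs/", "../new.xml")); // prints http://foo.com/new.xml
    }
}
```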
XPath, a language for selecting an XML document's parts, lets you treat an XML document like a filesystem. XPath queries start with a current element or attribute (much like a current directory within a filesystem) and let you specify other nodes relative to your location.
For example, the path ".." takes you to the parent element. "aaa" takes you to a child node called <aaa> and the path "/aaa/*" jumps to the root element called <aaa> and selects the elements inside. While that resembles a filesystem, XPath gets much cleverer. For example:
"ccc" selects the fifth
<eee> elements with ancestors (however distant) called
<fff> elements with the attribute
"//ggg/following::*" selects the elements following the first
Xalan, Saxon, Jaxen, and DOM4J let you select nodes using XPath expressions. XPath is particularly valuable here if you use XML to configure an application: rather than digging around your document using DOM or SAX, you can simply request the element you want to read. XPath is the platform upon which XSLT, XPointer, and other technologies are built.
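As a sketch of "simply requesting the element you want," the following uses the javax.xml.xpath API bundled with current JDKs (an assumption; the tools named above each have their own entry points) to evaluate a few expressions and list the names of the selected elements. The class name XPathSelect and the sample document are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration.
final class XPathSelect {
    /** Evaluates the expression against the document and returns the matched node names. */
    static List<String> names(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<aaa><bbb><ccc/></bbb><ddd/></aaa>";
        System.out.println(names(xml, "/aaa/*"));             // children of the root <aaa>
        System.out.println(names(xml, "//ccc/.."));           // parent of every <ccc>
        System.out.println(names(xml, "//bbb/following::*")); // elements after <bbb> closes
    }
}
```

Compare this with the DOM and SAX approaches: the navigation logic collapses into a single string, which is easy to keep in a configuration file or constant.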
XPointer closely resembles XPath in that it lets you specify an XML document's parts, and the two share a similar syntax. However, XPointer differs from XPath in that XPointer specifies only a single location or contiguous region of the original document, whereas XPath can select many unconnected elements. Compared to XPath, XPointer allows finer control over what you select, down to selecting parts of a text node.
However, XPointer remains controversial because Sun holds a key XPointer patent which the company refuses to freely license. Instead, Sun licenses it on the condition that XPointer improvements go to the W3C (World Wide Web Consortium). With the license, Sun aims to keep XPointer free, or, to read between the lines, to keep Microsoft from embracing and extending this one.
Unfortunately, XML documents can quickly become lengthy. XML is quite verbose, and sometimes we just want to store a lot of data. Editing 20 MB files can be memory hungry, and finding your way around them, nearly impossible. XInclude solves the problem by enhancing XML with an <xi:include> element that lets you build composite documents.
XInclude lets you specify a file or URI (Uniform Resource Identifier) that should be treated as if it replaced the <xi:include> element.
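A minimal sketch of the replacement behavior, assuming a JAXP parser that supports XInclude (the one bundled with current JDKs does when setXIncludeAware(true) is set): the code writes two small files to a temporary directory, parses the outer one, and counts <item> elements in the merged result. The file names, element names, and the class name XIncludeDemo are invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration.
final class XIncludeDemo {
    /** Parses a composite document and counts <item> elements after inclusion. */
    static int countItemsAfterInclude() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            Path part = dir.resolve("part.xml");
            Path main = dir.resolve("main.xml");

            // The included file holds two items; the outer file holds one of its own.
            Files.write(part, "<part><item/><item/></part>".getBytes(StandardCharsets.UTF_8));
            Files.write(main, ("<doc xmlns:xi='http://www.w3.org/2001/XInclude'>"
                    + "<item/><xi:include href='part.xml'/></doc>")
                    .getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XInclude processing relies on namespaces
            factory.setXIncludeAware(true);  // replace <xi:include> with the target's content
            // Parsing from a File gives the parser a base location for resolving href.
            Document doc = factory.newDocumentBuilder().parse(main.toFile());
            return doc.getElementsByTagName("item").getLength();
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("XInclude demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countItemsAfterInclude());
    }
}
```

To the code that walks the resulting tree, the composite document looks as if it had been written in one file.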
XSLT (Extensible Stylesheet Language Transformations) converts any XML document into another XML, HTML, or plaintext document. Some developers find XSLT difficult to grasp because it is a rule-based language, not primarily a procedural or object-based language (although with effort you can make it work in a procedural way).
With XSLT, you specify a set of rules—called templates—that describe how the output document should be created, for example:
<xsl:template match="data">
  <table border="1">
    <xsl:apply-templates/>
  </table>
</xsl:template>
The above template tells the XSLT processor to find elements in the source document called <data>, then create an HTML table. You would define other templates to create the table's content.
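From Java, a stylesheet like this is typically run through JAXP's transformation API. The sketch below wraps the <data> template in a complete stylesheet and adds a second, invented template for <row> elements to fill in the table's content; the <row> element, the sample input, and the class name TableTransform are assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for illustration.
final class TableTransform {
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Rule 1: a <data> element becomes a table; its children are handled by other rules.
        + "<xsl:template match='data'>"
        + "<table border='1'><xsl:apply-templates/></table>"
        + "</xsl:template>"
        // Rule 2 (invented): each <row> becomes a one-cell table row.
        + "<xsl:template match='row'>"
        + "<tr><td><xsl:value-of select='.'/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the output as a string. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<data><row>a</row><row>b</row></data>"));
    }
}
```

Notice that the Java code never says in what order to visit the elements. The processor walks the source document and picks whichever rule matches, which is the rule-based style described above.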