Use XML data binding to do your laundry

Explore JAXB and Castor from the ground up

Let's face it; XML by itself is just another data format that is annoying to access from your Java programs. Don't get me wrong; I appreciate XML for its portability, its separation of data and presentation, and its human and computer readability. However, I just can't be bothered writing DOM (Document Object Model), SAX (Simple API for XML), or even JDOM code to programmatically work with XML data. I have better things to do with my precious programming time.

What I need is a summer intern: someone who can get me lunch and do my dirty work for me -- like writing Java classes that correspond to every XML document type I work with. These Java classes could turn XML documents into my program's objects. Each XML tag would map to an object attribute, and a tag's contents would be the attribute value. Then, for a real challenge, I'd ask my intern to provide a marshal method in each Java class.

"Who's Marshal?" the intern might ask.

"A military officer who marshals things, arranges things in methodical order," I'd explain smugly.

Marshaling a Java object means converting it to XML format for storage or for sending. It's like when you fold your socks to put them neatly away -- which, come to think of it, is something else my intern could do. Later, when you or someone else wears those socks again, it's like turning an XML document back into usable Java objects -- unmarshaling. And you'll agree that, when you wake up in the morning, it's nice to find a basket full of neatly folded socks.

This is the classic marshal and unmarshal diagram, but for fun, this one illustrates my socks analogy.

Marshaling and unmarshaling

The ability to work with unmarshaled XML documents would be great because I could use and maintain regular Java objects much more easily and naturally than I could with a bunch of XML parsing code. My unmarshaled Java objects could even validate attributes based on the original XML Schema constraints. This validation would include type checking and verifying range values. I would have a way to programmatically construct a Java object that could save itself as an XML document valid against a certain XML Schema. Now we're talking! But what about performance? Using these unmarshaled objects, my application would be faster than SAX parsing and require less memory than DOM parsing.
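To see what that would mean in practice, here is a minimal hand-written sketch of one such class. Everything in it is an assumption made for illustration: the Sock class, its fields (number, name, price, smell), and the 0-to-10 smell range are hypothetical stand-ins, not the output of any framework.

```java
import java.math.BigDecimal;

// Hand-written illustration only; a data-binding framework would generate
// something along these lines from a schema. All names are hypothetical.
final class Sock {
    private final String number;   // identifying attribute
    private final String name;
    private final BigDecimal price;
    private int smell;             // assumed range: 0 (clean) to 10 (terrible)

    Sock(String number, String name, BigDecimal price, int smell) {
        this.number = number;
        this.name = name;
        this.price = price;
        setSmell(smell);
    }

    // Validating accessor: the range check runs at assignment time,
    // so an invalid object can never be marshaled.
    void setSmell(int smell) {
        if (smell < 0 || smell > 10) {
            throw new IllegalArgumentException("smell must be between 0 and 10: " + smell);
        }
        this.smell = smell;
    }

    int getSmell() {
        return smell;
    }

    // Marshal: write this object's state out as an XML fragment.
    String marshal() {
        return "<sock number=\"" + escape(number) + "\">"
             + "<name>" + escape(name) + "</name>"
             + "<price>" + price.toPlainString() + "</price>"
             + "<smell>" + smell + "</smell>"
             + "</sock>";
    }

    // Escape the characters that are special in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Sock sock = new Sock("1", "black socks", new BigDecimal("9.99"), 7);
        System.out.println(sock.marshal());
        try {
            sock.setSmell(11);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Even this toy version needs escaping logic and a range check for a single field, and it doesn't unmarshal anything yet. Multiply that by every element in every document type and the appeal of generated code becomes obvious.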

Do you want to work as my intern this summer? I know what you're thinking: writing the marshal, unmarshal, and validating-accessor methods based on schema constraints would be a long and tedious task. In addition, every time I changed my XML Schema, you'd have to update this code. Doing my laundry wouldn't be much fun either. Doesn't matter; I don't want to baby-sit and entertain an intern all summer anyway. I've got better things to do!

What if I told you that there already are XML data-binding frameworks that can generate this type of marshaling and unmarshaling code for you? Just feed in a DTD (document type definition) or an XML Schema and -- presto! -- you have Java classes that can marshal, unmarshal, and check data constraints. And like many Java XML tools, these frameworks are mostly free. In this article, we'll examine two such frameworks: Sun's Java Architecture for XML Binding (JAXB) and Castor from the Exolab Group.

XML data constraints: DTD versus XML Schema

A valuable XML concept is the ability to define your own XML vocabulary. An XML vocabulary is an industry-specific XML information model or document type that you define for XML data sharing. In other words, you define constraints that specify what a particular group of XML documents should always look like. Document creators, programmers, graphic designers, and database specialists use a constrained document type as the basis for creating compatible application pieces. This parallel collaboration around a document type is easy because everyone knows ahead of time what the constrained XML documents will look like.

You can define an XML vocabulary by constraining XML in two different ways. The original method from the XML specification uses a DTD. The new and improved approach uses the recently formalized W3C (World Wide Web Consortium) Recommendation, XML Schema.

For example, in this article, I will define an XML vocabulary for describing socks. I'm a sock expert, since I wear socks almost every day, and so I know exactly what information is needed to fully describe a sock collection. Every sock in my definition has the following descriptive properties: number, name, image, color, price, and smell.

Thus, I can create the following DTD to formally describe a sock collection:

Listing 1. socks.dtd

<!ELEMENT socks ( sock* ) >
<!ELEMENT sock (name, image, color, price, smell) >
<!ATTLIST sock number CDATA #REQUIRED >
<!ELEMENT name (#PCDATA) >
<!ELEMENT image (#PCDATA) >
<!ELEMENT color EMPTY >
<!ELEMENT price (#PCDATA) >
<!ELEMENT smell (#PCDATA) >
<!ATTLIST color value (white|black) #REQUIRED >

Listing 1 says simply that an XML document conforming to my socks.dtd constraints must have a root socks element containing zero or more socks. Each sock has exactly one name, image, color, price, and smell -- in that order. The color element's value attribute can only be white or black. Each sock has a required attribute called number. Do you see why we say DTDs constrain conformant XML documents? (For more on DTDs, see Resources.)

An XML document that is valid against this DTD -- that is, one that follows the constraints correctly -- might look like this:

Listing 2. socks.xml

<?xml version="1.0"?>
<!DOCTYPE socks SYSTEM "socks.dtd">
<socks>
  <sock number="1">
    <name>black socks</name>
    <image>blacksocks.jpg</image>
    <color value="black"/>
    <price>9.99</price>
    <smell>7</smell>
  </sock>
  <sock number="2">
    <name>white socks</name>
    <image>whitesocks.jpg</image>
    <color value="white"/>
    <price>5.34</price>
    <smell>2</smell>
  </sock>
  <sock number="3">
    <name>old white socks</name>
    <image>oldwhitesocks.jpg</image>
    <color value="white"/>
    <price>2.20</price>
    <smell>9</smell>
  </sock>
</socks>

DTDs have garnered some complaints, especially from programmers. The problem: DTDs really constrain only the document's structure, not the data it contains. All the elements and attributes are strings, and you can't specify allowed value ranges. The best data constraining you can do with a DTD is to require that attributes be strings from a constant list. Furthermore, DTDs are not in XML format themselves, so they don't seem to fit in too well.
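A small experiment makes the complaint concrete. The sketch below uses the JDK's standard JAXP parser in validating mode, with the declarations from Listing 1 pasted in as an internal DTD subset so that no files are needed. The class and method names (DtdLimitsDemo, sock, isValid) are my own inventions for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdLimitsDemo {
    // Listing 1, inlined as an internal DTD subset.
    static final String DTD =
          "<!DOCTYPE socks [\n"
        + "<!ELEMENT socks ( sock* ) >\n"
        + "<!ELEMENT sock (name, image, color, price, smell) >\n"
        + "<!ATTLIST sock number CDATA #REQUIRED >\n"
        + "<!ELEMENT name (#PCDATA) >\n"
        + "<!ELEMENT image (#PCDATA) >\n"
        + "<!ELEMENT color EMPTY >\n"
        + "<!ELEMENT price (#PCDATA) >\n"
        + "<!ELEMENT smell (#PCDATA) >\n"
        + "<!ATTLIST color value (white|black) #REQUIRED >\n"
        + "]>\n";

    // Build a one-sock document with the given color and smell.
    static String sock(String color, String smell) {
        return "<?xml version=\"1.0\"?>\n" + DTD
             + "<socks><sock number=\"1\"><name>black socks</name>"
             + "<image>blacksocks.jpg</image><color value=\"" + color + "\"/>"
             + "<price>9.99</price><smell>" + smell + "</smell></sock></socks>";
    }

    // Returns true if the document is well formed and valid against its DTD.
    static boolean isValid(String xml) {
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity errors arrive here
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return false; // not even well formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println("black, smell 7:      " + isValid(sock("black", "7")));
        System.out.println("green, smell 7:      " + isValid(sock("green", "7")));
        System.out.println("black, smell banana: " + isValid(sock("black", "banana")));
    }
}
```

The parser rejects a green sock because the color value comes from a constant list, but it happily accepts a smell of "banana": as far as the DTD is concerned, the content of smell is just text.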

The answer to these deficiencies: XML Schema. XML Schemas are in XML format; they allow data typing, user-defined types, and range value constraints. XML Schema's popularity and software support are growing because it is now a final W3C Recommendation. Here you'll find an example of constraining the same document type using XML Schema:

Listing 3. socks.xsd

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="socks">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="sock" minOccurs="0" maxOccurs="unbounded">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="name" type="xsd:string"/>
                <xsd:element name="image" type="imageType"/>
                <xsd:element name="color" type="colorType"/>
                <xsd:element name="price" type="money"/>
                <xsd:element name="smell" type="smellType"/>
              </xsd:sequence>
              <xsd:attribute name="number" type="xsd:string" use="required"/>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:simpleType name="imageType">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="(.)+\.(gif|jpg|jpeg|bmp)"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="colorType">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="black"/>
        <xsd:enumeration value="white"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="money">
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="smellType">
      <xsd:annotation>
        <xsd:documentation>0=clean and 10=smells terrible</xsd:documentation>
      </xsd:annotation>
      <xsd:restriction base="xsd:nonNegativeInteger">
        <xsd:minInclusive value="0"/>
        <xsd:maxInclusive value="10"/>
      </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>

That looks a little more complicated, but believe me, the complexity is worth it. In addition to the constraints we specified in the DTD, XML Schema lets us add the following:

  • The contents of the <image> tag must end with an image type extension (like .gif).
  • The <price> must be a decimal number with at most two fractional digits (like 5.34).
  • The <smell> must be an integer between 0 and 10, inclusive. There is also a documentation comment stating 0=clean and 10=smells terrible.
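You can watch these constraints at work with a few lines of Java. This sketch uses the JDK's javax.xml.validation API and a trimmed-down schema of my own: it reuses the imageType, money, and smellType definitions from Listing 3, but declares image, price, and smell as top-level elements so that one-element test documents can be checked. That arrangement, and the names SchemaLimitsDemo and isValid, are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaLimitsDemo {
    // The three simple types from Listing 3, each exposed as a top-level
    // element (an assumption made to keep the test documents tiny).
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='image' type='imageType'/>"
        + "<xsd:element name='price' type='money'/>"
        + "<xsd:element name='smell' type='smellType'/>"
        + "<xsd:simpleType name='imageType'><xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='(.)+\\.(gif|jpg|jpeg|bmp)'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='money'><xsd:restriction base='xsd:decimal'>"
        + "<xsd:fractionDigits value='2'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='smellType'>"
        + "<xsd:restriction base='xsd:nonNegativeInteger'>"
        + "<xsd:minInclusive value='0'/><xsd:maxInclusive value='10'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    // Returns true if the document validates against the schema above.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // With no error handler installed, validate() throws on the
            // first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<image>blacksocks.jpg</image>", "<image>blacksocks.txt</image>",
            "<price>5.34</price>", "<price>5.345</price>",
            "<smell>7</smell>", "<smell>11</smell>"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + isValid(doc));
        }
    }
}
```

Note that fractionDigits is an upper bound: a price of 5.3 passes, while 5.345 does not.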

For more on XML Schemas, see the schema tutorial in Resources.

To associate socks.xml with our XML Schema, we'll drop the DOCTYPE declaration and add two attributes to the root (socks) tag:

Listing 4. socks.xml for use with the XML Schema

<?xml version="1.0"?>
<socks xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="socks.xsd">
  <sock number="1">
    <name>black socks</name>
...

How do we check to see if an XML document is valid against a DTD or XML Schema? We use an XML parser like Apache Xerces to verify that an XML document conforms to the socks.dtd or socks.xsd constraints. I've provided a Windows batch file in my sample code called validate.bat to do that:

validate.bat <path_to_socks>\socks.xml

This invokes the Xerces parser in validation mode, which asks it to please check socks.xml against its stated document type. Xerces now supports both DTD and XML Schema validation.

I'm a teacher, so I always like to include programming exercises. As an exercise, run validate on socks.xml, change socks.xml to make it invalid, and run validate again. Xerces should produce helpful error messages.

Onward to the data-binding frameworks!

Now we'll use JAXB and Castor to generate Java classes based on DTDs and XML Schemas. Again, schema-based code generation enables us to represent conformant XML documents as objects in our programs. JAXB and Castor perform essentially the same task, but we'll see that Castor is a more mature and full-featured package. Two other similar, young frameworks worth noting are Enhydra Zeus and Arborealis from Beautiful Code BV, but we won't examine them here (see Resources for more information).

Use JAXB to turn your socks black

Let's first look at the Sun framework for XML data binding, JAXB. The JAXB API automates the mapping between XML documents and Java objects. It is currently an early access release, meaning that we can download and use a bare-bones working version. For now, JAXB supports only Java class creation from DTDs; future releases will add XML Schema support.

Let's begin our tour of generating classes with JAXB based on socks.dtd with the following summary of steps:

  1. Start with our DTD, socks.dtd, and define a JAXB binding schema
  2. Invoke the JAXB schema compiler
  3. Compile our newly generated classes
  4. Examine a test program using these classes; the test program follows these steps:
    1. Unmarshal an existing XML document
    2. Change the content tree
    3. Validate
    4. Marshal
  5. Compile and run the test program

Step 1. Start with our DTD, socks.dtd, and define a JAXB binding schema

To begin, we'll need to write one more thing: a JAXB binding schema -- that is, a JAXB-specific document that helps JAXB convert the DTD to Java classes. It makes up for the DTD data typing deficiencies and separates the programming-specific information from the schema information. For our example, it looks like this:

Listing 5. socks.xjs

<?xml version="1.0"?>
<xml-java-binding-schema version="1.0ea">
<!-- Register a type.  This specifies that we want to use this type instead of String somewhere in our document. --> 
<conversion name="BigDecimal" type="java.math.BigDecimal" />
<element name="socks" type="class" root="true" />
<element name="price" type="value" convert="BigDecimal"/>
<!-- To restrict the sock color to white or black we create an enumeration
     with the allowed values and make the color attribute the new enumeration type -->
<element name="color" type="class">
    <attribute name="value" convert="SockColor"/>
</element>
<enumeration name="SockColor" members="white black"/>
</xml-java-binding-schema>
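To make the intent of Listing 5 concrete, here is a hand-written Java sketch of the two conversions it asks for: reading the price text as a java.math.BigDecimal and restricting the color value to the members white and black. This is not the code the JAXB schema compiler emits; the names SockColorSketch, BindingSketch, parse, and parsePrice are hypothetical and exist only for this illustration.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for the SockColor enumeration named in Listing 5.
enum SockColorSketch {
    WHITE("white"),
    BLACK("black");

    private final String xmlValue;

    SockColorSketch(String xmlValue) {
        this.xmlValue = xmlValue;
    }

    // The text that would appear in the color element's value attribute.
    String xmlValue() {
        return xmlValue;
    }

    // Convert attribute text to a member, rejecting anything not listed.
    static SockColorSketch parse(String text) {
        for (SockColorSketch color : values()) {
            if (color.xmlValue.equals(text)) {
                return color;
            }
        }
        throw new IllegalArgumentException("not a sock color: " + text);
    }
}

// Hypothetical helper showing the price conversion.
final class BindingSketch {
    private BindingSketch() {
    }

    // Convert the text content of <price> to a BigDecimal instead of a String.
    static BigDecimal parsePrice(String text) {
        return new BigDecimal(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("9.99"));
        System.out.println(SockColorSketch.parse("black"));
        try {
            SockColorSketch.parse("green");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point of the binding schema is that we never have to write conversions like these by hand; we only declare which elements and attributes should get them.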