A flood of mail from excited JavaWorld readers indicates that with my previous column I've hit a nerve: there's a great deal of interest in XML in the Java community. Readers are also particularly interested in the marriage of JavaBeans with XML, because it's a concrete example of using a platform-neutral component technology (JavaBeans) with a generalized document format (XML). The result of this combination is a component technology that is network-mobile, standards-based, and potentially interoperable with new and legacy systems alike.
I've received some great ideas from readers on how to improve XML JavaBeans, so I've expanded this topic from two installments to three. This month, we're going to develop a class, XMLBeanWriter, which writes a JavaBean in XML format. Then next month, we'll look at some of the improvements suggested by readers, and implement them (in Java, of course).
Last month's sample program, XMLBeanReader, reads XML and produces a running JavaBean in memory. If you're not yet up to speed on exactly what XML is, or if you're not familiar with the World Wide Web Consortium's (W3C's) Document Object Model (DOM), you'll probably want to read last month's cover story, "XML JavaBeans, Part 1." Once you've read and understood that article, you'll be ready for this month's topic, XMLBeanWriter.
Flattening object structures
In most nontrivial programs, information is processed using data structures. In object-oriented programming, these data structures are usually composed of objects that contain the data of interest to the application. It's often desirable to use the data in the application objects in more than one work session. (Just try to imagine a word processor without a Save command: turn off your computer, and all of your documents would die with the editor!) Persistence is the common term for how a software system arranges for such data to "survive" the death of the process in which they are running, so that they may live to run again another day (or on another machine).
Persistence has been around longer than object-oriented programming -- at least in practice, if not in name. Word processor files, data requests moving through a network, and punch card decks are all examples of persistence. In fact, data persistence is the whole reason for all of the various types of data media we have: from the ancient and lowly clay tablets and papyrus, to the slightly more modern paper tapes, magnetic reels, and punch cards of yesteryear, to today's CD-ROMs and DVDs. The basic idea behind persistence is to encode a software structure into a stream of bytes (a process sometimes called "flattening") in such a way that the stream can later be used to recreate the identical structure.
When object-oriented system design and programming began to appear on the scene, it was immediately clear (to anyone who knew about such things) that it would be very convenient to be able to create persistent objects. An object running inside a program could be converted to a series of bytes, and then stored or transmitted, and that series of bytes could be used later and/or elsewhere to recreate the object. This is precisely what Java serialization does. The Java core has a set of built-in methods (in package java.io) that allows a programmer to arrange for an object to write itself to a byte stream, or to read itself from a byte stream. (To find out more about Java serialization, see the JavaWorld articles on those topics, linked in Resources below.) Persistence in the context of object-oriented systems is commonly called object persistence; or, said another way, objects that have been "persisted" (note that "persist" has suddenly become a transitive verb) are called "persistent objects".
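To make the idea of flattening concrete, here is a minimal sketch of a serialization round trip using java.io. The Course class and the helper names flatten and revive are hypothetical and exist only for this illustration; they are not part of the article's source code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {

    // Hypothetical bean used only for this illustration.
    public static class Course implements Serializable {
        private static final long serialVersionUID = 1L;
        private String title;
        public Course() { }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }

    // "Flatten" an object into a series of bytes.
    static byte[] flatten(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Recreate an equivalent object from those bytes.
    static Object revive(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Course original = new Course();
        original.setTitle("Compilers");
        Course copy = (Course) revive(flatten(original));
        System.out.println(copy.getTitle()); // prints "Compilers"
    }
}
```

The byte array could just as easily be written to a file or a socket; the in-memory buffer simply keeps the sketch self-contained.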
Practically every object-oriented system has some object persistence mechanism, usually involving a persistence format (the layout of the bytes in the data stream) that is "standard", at least for that application. It's been said that the nice thing about standards is that there are so many to choose from, and object persistence formats are no exception. Just take a look at the list of document "filters" in a commercial word processor if you don't believe it: every one of those filters is trying to convert from one proprietary persistence format to the one used internally by the word processor. The Java serialization mechanism was, in fact, designed (via the java.io.Externalizable interface) to allow a programmer to read and write the object persistence formats of other applications.
A standard standard
What does all this have to do with XML and JavaBeans? Well, in these XML JavaBeans articles, we're using XML as a persistence format for JavaBeans components. The nice thing about XML, though, is that it's a true standard, as the explosion of interest in and products available for XML demonstrates. The current XML standard is called the Extensible Markup Language (XML) 1.0 W3C Recommendation, and it is available for you to read in full (see the link in Resources). Any application that is "XML-enabled" (meaning simply that the application can read and write its objects in XML) potentially will be able to interoperate with other systems more easily than before XML was available, because all XML systems conform to the standard (or they're not XML-compliant, by definition.)
For example, a new custom markup language called RDF, for Resource Description Framework, is currently being standardized. You can read the RDF specification, which is a work in progress (see Resources). RDF is a "dialect" (technically called an application) of XML that is being defined for general descriptions of metadata (data about data). Once this format is standardized, every application that uses the standard will be able to manipulate and use data and metadata from other compliant systems, because the applications will all be making the same assumptions about what the data mean.
Now, this doesn't necessarily mean that every application will always understand every other application's markup tags, as anyone who's tried to write browser-neutral HTML can tell you. There's always tension between making a system extensible on one hand, and maintaining compatibility on the other. (Of course, there also are spoilsports who deliberately "extend" existing standards to try to wrest control away from the standards bodies and muddy the water for their competition. End of editorial.) But XML provides a common ground for systems developers to structure object persistence and to publish the interfaces to their systems. Custom markup languages are already appearing for application domains as varied as molecular chemistry, graphical user interfaces, and business forms. As these standards become widespread, system interoperation becomes easier for everyone.
Notice from last month that the Player object we were reading from an XML file wasn't simply one object. It was three objects: a Player object, which held references to a Statistics object and a PersonName object. That entire data structure was encoded in the XML, and XMLBeanReader was able to create the corresponding structure in memory by examining the Document Object Model (DOM) object structure created by the XML parser. And, remember, we didn't need to write any parsing code. We simply gave the XML parser the name of the XML file, and it returned the entire data structure that the XML file represented. Not bad for one line of code!
XMLBeanReader gives us the ability to take a "flat" representation of an object structure (that is, a JavaBean and its properties represented as a text stream that just happens to be XML) and create the corresponding structure in memory. Now, let's have a look at XMLBeanWriter, which can take almost any JavaBean instance and represent it in our XML dialect.
Before we dive into writing a class to convert a JavaBean into XML, there's an issue I want to clarify. The particular XML file format we're using for our XML JavaBeans is not generic XML. It's an application of XML -- that is, a dialect of XML that we defined ourselves, simply by making it up. No other system I know of currently uses it, though judging from my reader mail, some people may have already written little applications that do so.
Our little XML JavaBeans language is also very simple: a <JavaBean> element contains a <Properties> element, which itself can contain multiple <Property> elements. A <Property> element may contain either a text element, indicating the value of the property, or a <JavaBean> element, if the property's value is a JavaBean. That's our custom JavaBean markup language. So, you can't (currently) use XMLBeanReader to read generic XML. The input XML has to conform to the "little language" we've defined.
A class that reads a JavaBean's properties and writes the JavaBean to a file needs to be able to do several things. Here are the tasks it needs to perform, along with some discussion about how they need to be done:
- Identify the class of a JavaBean. The class of a JavaBean can be identified simply by calling the bean's getClass() method, which every Java object must have. (It's defined in java.lang.Object, which every object inherits.)
- Identify the JavaBean's property names, types, and values. Getting a JavaBean's property names, types, and values is fairly easy because most of that work comes with the Java core distribution in the class java.beans.Introspector and its associated classes.
- Represent each property value as XML. Representing each property's value as XML is a little more difficult, because not every property can be represented as text. If a particular property has no text representation, but the value of the property is a JavaBean, then we can represent the property's value as an XML-encoded JavaBean. This fact indicates that the code that encodes a JavaBean as XML should be a standalone method, so that it can be called recursively if the value of a JavaBean's property is also a JavaBean.
- Write the resulting XML somewhere. The simplest place to write the output XML would be to a file, but that solution isn't very general. What if you wanted to write the XML to a network, or even to a memory buffer? Both of these seem pretty likely scenarios. Fortunately, the Java platform comes to the rescue again with the Writer class, which we'll cover when we discuss the code in-depth below.
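The first two tasks can be sketched with nothing but the java.beans package. The following is an illustration only, not the article's code: the Course bean and the describe method are hypothetical names chosen for the example.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class IntrospectionSketch {

    // Hypothetical bean, not one of the article's classes.
    public static class Course {
        private String title = "Compilers";
        private int credits = 3;
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public int getCredits() { return credits; }
        public void setCredits(int credits) { this.credits = credits; }
    }

    // Returns one "name : type = value" line per readable property.
    static List<String> describe(Object bean) throws Exception {
        // The bean's class comes from getClass(); the Introspector does the rest.
        BeanInfo info = Introspector.getBeanInfo(bean.getClass());
        List<String> lines = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            if (getter == null) {
                continue; // write-only property: no value to read
            }
            Class<?> type = pd.getPropertyType();
            String typeName = (type == null) ? "?" : type.getName();
            Object value = getter.invoke(bean);
            lines.add(pd.getName() + " : " + typeName + " = " + value);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describe(new Course())) {
            System.out.println(line);
        }
    }
}
```

Note that the output includes a property named class, because every object has a getClass() method; any code that walks the property list has to decide what to do with it.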
Now that we've identified what we're going to do, let's get into the details.
To begin with, I've replanted XMLBeanReader from the default package into a package called com.javaworld.JavaBeans.XMLBeans, to which I've added XMLBeanWriter, and to which we'll be adding other classes and interfaces later.
You can see the entire source for XMLBeanWriter at XMLBeanWriter.java, but I've placed the code sections we're discussing in scrolling text boxes below for your convenience. Use whichever works best for you.
At the highest level of abstraction, we want to define a method that writes the XML representation of a JavaBean to a file, given the bean and a filename. I actually implemented three overloaded methods for added flexibility. The Java Core API provides an abstract class called java.io.Writer, which is an abstraction for writing character data. The concrete class java.io.FileWriter writes data to a file, but other subclasses of Writer allow writing to String objects, buffers, print streams, pipes, and so on. This is sufficiently general that I put the actual logic in a method called writeXMLBean, which takes the bean instance and a Writer as arguments. By using the Writer abstraction, XMLBeanWriter can write to any subclass of java.io.Writer -- for example, one wrapped around a network connection. Two overloaded methods (meaning they have the same name, but different arguments) accept a File object or a String filename as an indication of where the data are to go. These methods are just for convenience, since our example program will write to a file. Let's take a look at the code.
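The delegation among the three overloads might look roughly like the sketch below. The argument order and the placeholder body of the Writer version are assumptions made for illustration; the real logic, discussed in the rest of this article, builds and prints a DOM tree.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class WriterOverloadSketch {

    // Convenience overload: a filename becomes a File.
    static void writeXMLBean(String filename, Object bean) throws IOException {
        writeXMLBean(new File(filename), bean);
    }

    // Convenience overload: a File becomes a FileWriter.
    static void writeXMLBean(File file, Object bean) throws IOException {
        try (Writer writer = new FileWriter(file)) {
            writeXMLBean(writer, bean);
        }
    }

    // The overload that does the work. The body here is only a placeholder
    // that writes an empty <JavaBean> element naming the bean's class.
    static void writeXMLBean(Writer writer, Object bean) throws IOException {
        writer.write("<JavaBean CLASS=\"" + bean.getClass().getName() + "\"/>");
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Any Writer will do: here, an in-memory buffer instead of a file.
        StringWriter buffer = new StringWriter();
        writeXMLBean(buffer, "hello");
        System.out.println(buffer);
    }
}
```

Because only the Writer version contains logic, a caller that wants the XML in memory or on a socket never has to touch the filesystem.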
Building and printing a document tree
There are three methods called XMLBeanWriter.writeXMLBean(), and two of them simply call the third one. The code for all three XMLBeanWriter.writeXMLBean() methods appears in the figure below.
Figure 1. The XMLBeanWriter.writeXMLBean() methods
The first version of the method (lines 287 to 290) takes a String filename, then simply creates a File object, and passes it to the third version of the method. The third version (lines 325 to 329) uses its File argument to construct a FileWriter, and passes the result to the second version.
It's the second version of writeXMLBean (lines 302 to 317) that actually does the work. This writeXMLBean method could very easily write its XML directly to the Writer, keeping track of indentation and so forth as it goes. But think about this for a moment: an XML document in a program can be represented as a tree of Document Object Model (DOM) nodes, right? And we want to flatten that tree of DOM nodes and write it to a text file. Well, why not have one method that actually builds the DOM tree representing the JavaBean, and then another that prints the DOM tree as XML? We could expose methods (which you'll see below) that convert a JavaBean to a tree, and then convert the tree to text, which is then written to a file.
For example, imagine we had a JavaBean that looked like Figure 2.
The JavaBean of class Grade has two properties: a float called "average", and a JavaBean of class Course called "course". (Remember that the value of a JavaBean property may itself be a JavaBean.) The XMLBeanWriter class will internally build a DOM tree for this JavaBean that will look something like Figure 3. Figure 3 is a DOM tree that represents our JavaBean.
The XML corresponding to the DOM tree in Figure 3 appears in Figure 4.
Figure 4. The XML representing the DOM tree in Figure 3
There are three excellent reasons for building a tree and then printing the tree, instead of just printing XML as we analyze the JavaBean.
- First, if we have a method that converts a JavaBean to a DOM tree, then any other "cut-and-paste" Java program that uses DOM trees can use our class to get a DOM representation of the JavaBean. It makes the XMLBeanWriter class more generally useful.
- The second reason to create a DOM tree and then print it is that most DOM implementations include a method that prints the DOM tree as XML, so that part of the work is already done.
- The third reason is that creating a DOM tree gives me something to write about, which gives you an opportunity to learn how to manipulate XML documents in Java.
Now that we have an idea of what the code's doing, take a look at the source code itself. The code for all versions of XMLBeanWriter.writeXMLBean() appears in Figure 1 above.
Going back to the second version of writeXMLBean (the one that does the work), notice that first we create a document object called doc, like this:
TXDocument doc = new TXDocument();
TXDocument is a specific implementation of the interface org.w3c.dom.Document (or just Document). Document is just an interface defined by the W3C. It has no particular implementation, but it does have defined behavior, which pretty much sums up what an interface is. TXDocument is a class in IBM's xml4j package that implements the Document interface. It's the root of the DOM document tree. The Document interface (you can read about it at http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#i-Document) defines an API that allows a programmer to add, delete, and iterate on the child elements of the DOM document tree.
After we create the Document that we're going to print, we call the method getAsDOM(), which builds a DOM document tree based on the properties of the JavaBean.
309 DocumentFragment df = getAsDOM(doc, bean);
310 doc.appendChild(df);
There are two interesting things here. The first is that getAsDOM() returns a DocumentFragment, which requires a little explanation. Remember above, where I discussed cutting and pasting pieces of document trees? Well, a DocumentFragment is just what its name implies: it's a little piece of a document that can be added to the list of children of any DOM Node. A DocumentFragment is a lightweight container object that, when added to a DOM Node's children, gives all of its children to that Node, but doesn't appear as a child of the Node itself. So, in the second line above, when the DocumentFragment is appended to the list of the Document's children, the Document doesn't have a child that is a DocumentFragment; rather, it has whatever children the DocumentFragment had. (Under the W3C DOM, the children are moved rather than copied: a node can have only one parent, so the DocumentFragment is left empty afterward.) We use DocumentFragment objects extensively in XMLBeanWriter.
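The behavior of a DocumentFragment is easy to try out. The sketch below uses the DOM implementation bundled with the JDK (obtained through javax.xml.parsers) rather than IBM's TXDocument, which is an assumption made so the example runs without extra libraries; the helper name beanSkeleton is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class FragmentSketch {

    // Stand-in for "new TXDocument()": asks the JDK for an empty Document.
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Builds a fragment holding <JavaBean CLASS="..."><Properties/></JavaBean>.
    static DocumentFragment beanSkeleton(Document doc, String className) {
        DocumentFragment fragment = doc.createDocumentFragment();
        Element bean = doc.createElement("JavaBean");
        bean.setAttribute("CLASS", className);
        bean.appendChild(doc.createElement("Properties"));
        fragment.appendChild(bean);
        return fragment;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        doc.appendChild(beanSkeleton(doc, "Player"));

        // The Document's child is the <JavaBean> element, not the fragment.
        Node first = doc.getFirstChild();
        System.out.println(first.getNodeName());
        System.out.println(first.getNodeType() == Node.ELEMENT_NODE);
    }
}
```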
The second interesting thing about the call to getAsDOM() is that we're passing the Document object to the method. We do this for a very specific reason: the only objects that can be added to a Document object are objects that the Document itself has created. You'll notice that a few lines ago, I created an instance of TXDocument. This is a specific implementation of a DOM object implemented by IBM. The other objects that may appear in a DOM tree -- Element, Text, and so forth -- must also be created. Instead of peppering new statements everywhere, each one of which would have to be changed if the classes implementing DOM were changed, Document defines a list of methods that perform the creation of the sub-objects of the tree, on request. So, if you want a Text node object to add to some node of the DOM tree you're building, instead of saying:
Text tx = new TXText("FiddleFaddle");
you ask the Document to create one for you:
Text tx = doc.createTextNode("FiddleFaddle");
The other DOM object types are created similarly, and we'll see examples of that below.
These create methods are an excellent example of the Factory design pattern, wherein creation of an object is deferred to another object. If we change to some other DOM implementation, only the new TXDocument() call will have to be changed. All of the other objects will be created by the new Document class. If you have trouble remembering which interface creates DOM objects, just remember the little-known rhyme:
Programs are made by fools like me,
But only Document can make a tree
(See Resources below for an explanation of this pathetic stab at humor.)
Using this Factory design pattern provides other benefits that are powerful but that are also outside of the scope of this article.
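As a small illustration of the point, the sketch below confines the choice of DOM implementation to a single method and creates every other node through the Document interface. It assumes the JDK's bundled DOM implementation in place of xml4j, and the method name labelled is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class FactorySketch {

    // The only place that decides which DOM implementation is in use.
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Everything else asks the Document to create nodes, so this method
    // works unchanged with any implementation of org.w3c.dom.Document.
    static Element labelled(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        Text content = doc.createTextNode(text);
        element.appendChild(content);
        return element;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element snack = labelled(doc, "Snack", "FiddleFaddle");
        System.out.println(snack.getFirstChild().getNodeValue()); // prints "FiddleFaddle"
    }
}
```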
The final line of the writeXMLBean method calls the method I told you about before, in which the TXDocument object writes its tree to the Writer in XML format.
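For readers without xml4j at hand, roughly the same effect can be had with the standard javax.xml.transform API, which serializes any org.w3c.dom.Document to a Writer. This is a substitute technique, not the TXDocument call the article uses, and the names print and sampleDocument are hypothetical.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomPrintSketch {

    // Writes the whole DOM tree to the given Writer as XML text.
    static void print(Document doc, Writer out) throws Exception {
        // A Transformer with no stylesheet performs an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        identity.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Builds a tiny document to print: <JavaBean CLASS="Player"><Properties/></JavaBean>
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element bean = doc.createElement("JavaBean");
        bean.setAttribute("CLASS", "Player");
        bean.appendChild(doc.createElement("Properties"));
        doc.appendChild(bean);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        StringWriter buffer = new StringWriter();
        print(sampleDocument(), buffer);
        System.out.println(buffer);
    }
}
```

The exact whitespace of the output depends on the transformer implementation, so the sketch makes no promise about indentation.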
The basic framework is out of the way; now, let's have a look at how getAsDOM() creates a DocumentFragment.
Anatomy of a bean tree
There are two versions of getAsDOM(), the first of which appears in Figure 5 below.
Figure 5. getAsDOM() converts a bean to a DOM document
The first thing getAsDOM() does, other than get the bean's class for later use, is to check to see if the bean defines a method called DocumentFragment getAsDOM(org.w3c.dom.Document) (lines 42 to 49). If so, it invokes that method on the bean, and sets the resulting DocumentFragment (the subdocument we're building) to whatever that method returns. It does this to allow a developer to override the standard way of representing this object instance as XML. This is a convention that XMLBeanWriter defines in order to add flexibility. XMLBeanWriter.getAsDOM() is "asking" the JavaBean, "Hey, you! Do you know how to represent yourself as a DOM document?" The answer is "yes" if the bean class defines that method. This makes XMLBeanWriter more flexible. For example, if you're writing a JavaBean that you want to turn into XML, but you want some control over how that bean is represented in XML (you don't like how I did it, let's say), you can still use XMLBeanWriter. You simply define your own getAsDOM() method in your JavaBean class, and XMLBeanWriter.getAsDOM() returns whatever DocumentFragment you return. We'll see an example of this again in the section "Representing properties as XML" below.
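The "asking" can be done with ordinary reflection. Here is a minimal sketch of such a check; the class SelfDescribing and the helper askBean are hypothetical names, and the JDK's bundled DOM implementation stands in for xml4j.

```java
import java.lang.reflect.Method;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;

class CustomDomHookSketch {

    // Hypothetical bean that knows how to represent itself as a DOM fragment.
    public static class SelfDescribing {
        public DocumentFragment getAsDOM(Document doc) {
            DocumentFragment df = doc.createDocumentFragment();
            df.appendChild(doc.createComment("custom representation"));
            df.appendChild(doc.createElement("JavaBean"));
            return df;
        }
    }

    // Returns the bean's own fragment, or null if the bean defines no
    // public "DocumentFragment getAsDOM(Document)" method.
    static DocumentFragment askBean(Document doc, Object bean) throws Exception {
        Method hook;
        try {
            hook = bean.getClass().getMethod("getAsDOM", Document.class);
        } catch (NoSuchMethodException e) {
            return null; // the bean doesn't know how to describe itself
        }
        if (!DocumentFragment.class.isAssignableFrom(hook.getReturnType())) {
            return null; // same name, but not the signature we're looking for
        }
        return (DocumentFragment) hook.invoke(bean, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        System.out.println(askBean(doc, new SelfDescribing()) != null); // prints "true"
        System.out.println(askBean(doc, "a plain String") != null);     // prints "false"
    }
}
```

A caller that gets null back would fall through to the standard, introspection-based representation.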
As an example, I've modified the test class Player.java to define its own getAsDOM(). Let's say the hypothetical baseball player bean (in Player.java) from our example last month tracked a player's grade point average, but for privacy reasons, we want this number excluded from the resulting DOM document. The code that does that appears in Figure 7.
Figure 7. Class Player defines getAsDOM(), leaving out the new property "gradePointAverage"
In Figure 7, the new method getAsDOM() actually builds a DOM subdocument in code, and returns it as a DocumentFragment. Note that it ignores the gradePointAverage property that the standard mechanism would have output. It also calls PersonName.getAsDOM(). The XML that results from printing the entire document tree appears in Figure 8 below. (Bean1.xml, which is provided with the source in Resources below, is the input for this run.)
Figure 8. The XML that results from printing the entire document tree
Note how the comment included in the custom getAsDOM() resulted in a comment in the output. Note also the absence of any grade point average. This technique is powerful because it gives the programmer using XMLBeanWriter complete control over the structure of the output. A more elegant way of doing this will appear in next month's column.
Let's get back to looking at the code. Referring to Figure 5 again, lines 54 to 55 introspect the bean by getting its BeanInfo object, and asking the BeanInfo for the bean's list of PropertyDescriptor objects. At this point, the program knows what properties it is going to add to the DocumentFragment. Lines 58 to 64 are our first example of creating DOM document objects and adding them to the tree:
058 // Add an Element indicating that this is a JavaBean.
059 // The element has a single attribute, which is that Element's class.
060 Element eBean = doc.createElement("JavaBean");
061 dfResult.appendChild(eBean);
062 eBean.setAttribute("CLASS", classOfBean.getName());
063 Element eProperties = doc.createElement("Properties");
064 eBean.appendChild(eProperties);
Line 60 creates an element that looks like <JavaBean> in XML. The next line (line 61) adds it to the resulting tree. The method then sets the CLASS attribute of the <JavaBean> element to the classname of the bean (line 62), so the resulting XML would look like <JavaBean CLASS="classname">. The next two lines (lines 63 to 64) create a <Properties> element and append it to the <JavaBean> element. So at this point, the tree looks like this:
<JavaBean CLASS="classname">
  <Properties>
  </Properties>
</JavaBean>
Of course, the document doesn't "really" look like this. In reality, it's just a data structure. The XML above is what you'd see if you printed the document fragment at this point.
The loop at the bottom of the method (lines 69-86) iterates through the list of property descriptors, and for each property, it:
- gets the XML for the property by calling the overloaded getAsDOM()
- creates a <Property> element for the property
- sets the NAME attribute to the property name
- appends the DOM subtree for the property to the <Property> element
- adds the new <Property> element to the <Properties> element
The resulting structure is the DOM representation of the JavaBean. Now, all that's left is to understand how a property is represented as XML.
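Put together, the bean-level method might be sketched as follows. This is a simplified illustration, not the article's code: it uses the JDK's bundled DOM implementation, the Course bean is hypothetical, and every property value is rendered with String.valueOf() instead of the several strategies the real class tries.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

class BeanToDomSketch {

    // Hypothetical bean, not one of the article's classes.
    public static class Course {
        private String title = "Compilers";
        private int credits = 3;
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public int getCredits() { return credits; }
        public void setCredits(int credits) { this.credits = credits; }
    }

    static DocumentFragment getAsDOM(Document doc, Object bean) throws Exception {
        DocumentFragment dfResult = doc.createDocumentFragment();

        // <JavaBean CLASS="..."> with an empty <Properties> child.
        Element eBean = doc.createElement("JavaBean");
        eBean.setAttribute("CLASS", bean.getClass().getName());
        dfResult.appendChild(eBean);
        Element eProperties = doc.createElement("Properties");
        eBean.appendChild(eProperties);

        PropertyDescriptor[] descriptors =
            Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors();
        for (PropertyDescriptor pd : descriptors) {
            // Skip the synthetic "class" property and anything unreadable.
            if ("class".equals(pd.getName()) || pd.getReadMethod() == null) {
                continue;
            }
            Object value = pd.getReadMethod().invoke(bean);
            if (value == null) {
                continue;
            }
            Element eProperty = doc.createElement("Property");
            eProperty.setAttribute("NAME", pd.getName());
            // Simplification (assumption): plain text for every value.
            eProperty.appendChild(doc.createTextNode(String.valueOf(value)));
            eProperties.appendChild(eProperty);
        }
        return dfResult;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(getAsDOM(doc, new Course()));
        System.out.println(doc.getElementsByTagName("Property").getLength()); // prints "2"
    }
}
```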
Representing properties as XML
The code for the second version of getAsDOM() appears in Figure 9. This method generates the XML for the JavaBean property indicated by the method's third argument, a PropertyDescriptor. This method may return null, though, to indicate that it couldn't figure out how to represent the property as XML.
Figure 9. getAsDOM() determines the XML for properties
Let's go through this three-argument version of getAsDOM() and see what it does:
- Check to see if the property is the bean's Class -- The very first thing this getAsDOM() does (lines 114 to 116) is to check to see if the property name is class and the property type is java.lang.Class, and if it is, it returns null. This is because the Introspector (unless it's told otherwise by BeanInfo) notices that the JavaBean has a method called getClass() that returns a Class. So, by definition, it treats the object's class as a property. For most beans, this method will be asked to create XML for the object's class. This is not a valid property for our purposes (we deal with the property's class by asking the JVM about it, not by reading it from the XML), so we tell the caller to ignore this property by returning null.
- Check to see if the bean provides custom XML for a property -- Next, lines 130 to 136 check to see if the bean has a method called getPropertynameAsDOM(). This is just like the test we mentioned above for getAsDOM(), except it tests an individual property instead of an entire bean. Looking for a method with a particular name in this way defines a naming convention that XMLBeanWriter uses to make programming both easy and flexible. If a bean has a Price property, for example, a programmer may define void setPrice(int price) as usual, but then also define getPriceAsDOM(), and return an arbitrarily complex DOM fragment when this chunk of code requests it. We've created a new type of property accessor, which gets a property not as a value, but as a DOM tree. Of course, XMLBeanReader needs to be modified to handle a setPropertynameAsDOM accessor. We'll tackle that next month.
- Get the property value -- If the bean doesn't define an accessor for the property as XML (let's call it an "XML property accessor"), then we've got to figure out how to represent it ourselves. In Figure 9, lines 157 to 159 use the property's getter method (which came from the PropertyDescriptor) to get the property's value (in the bean we're processing) as a java.lang.Object. Hereafter, any representation of this property must depend on having the property's value -- because otherwise, what are you formatting?
- See if the property class knows how to represent itself as XML -- Lines 130 to 148 ask the PropertyDescriptor for the class of the property, and then use reflection to determine if the property's class defines a getAsDOM() method. In the first version of getAsDOM() (Figure 5, the one that gets the DOM fragment for a bean, rather than a property), we looked for a getAsDOM() method in the bean class; here, we're looking for one in the property class. As an example of this, I added a getAsDOM() method to PersonName.java from last month. Now, when a Player is being converted to XML, this version of getAsDOM() notices that PersonName.getAsDOM() exists, and uses whatever that method returns as the representation of the name property. You can see the results of this substitution in the comments and formatting of the PersonName object in Figure 8.
- Try to represent the property as a string using the PropertyEditor for the property -- Lines 190 to 220 try to get a string representation of the property. The idea here is to represent the property as a single string, which can be appended to the <Property> element as a Text node. The PropertyEditor for a property, found in the PropertyDescriptor, has methods to set and get the property as text. This means we can use the PropertyEditor to convert the object to and from a string. So, we ask the PropertyDescriptor for a PropertyEditor, and if that doesn't work, we ask the system (via PropertyEditorManager) for a default editor for the property's type. If we get a property editor in either of these two ways, we set its value (PropertyEditor.setValue()) to the value of the property, and then get its value as text. If all of that succeeds, lines 215 to 219 will create a Text node with the property text value inside, and return that as the DOM representation of the property.
- Try to represent the property as a JavaBean -- If all of these other methods fail, we assume that the object returned is a JavaBean. We recursively call the first version of getAsDOM() on the property's value and hope for the best. Note that there's no guarantee that just any class will be a bean. What that means is, if you are going to define properties that return nonprimitive types, and have no property editor, and you don't want those properties to be lost, you'll have to define either getAsDOM() for the property, getPropertynameAsDOM() for the property, or getAsDOM() for the whole bean. If the property is a bean, however, these last few lines (lines 229 to 231) will correctly create the DOM document for the property, which is then (as you remember from the discussion of TXDocument above) used to write the XML to the output file. Voilà!
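The property-editor step in the list above can be sketched with the standard java.beans classes. This is an illustration under assumptions, not the article's code: the Course and Opaque classes and the asText helper are hypothetical, and the sketch relies on the default editors that the JDK registers for primitive types.

```java
import java.beans.PropertyDescriptor;
import java.beans.PropertyEditor;
import java.beans.PropertyEditorManager;

class PropertyTextSketch {

    // Hypothetical type with no registered property editor.
    public static class Opaque { }

    // Hypothetical bean with one text-friendly and one opaque property.
    public static class Course {
        private int credits = 3;
        private Opaque owner = new Opaque();
        public int getCredits() { return credits; }
        public void setCredits(int credits) { this.credits = credits; }
        public Opaque getOwner() { return owner; }
        public void setOwner(Opaque owner) { this.owner = owner; }
    }

    // Returns the property's value as text, or null if no editor can be found.
    static String asText(PropertyDescriptor pd, Object value) throws Exception {
        PropertyEditor editor = null;

        // First choice: an editor class named explicitly for this property.
        Class<?> editorClass = pd.getPropertyEditorClass();
        if (editorClass != null) {
            editor = (PropertyEditor) editorClass.getDeclaredConstructor().newInstance();
        }
        // Second choice: the system's default editor for the property's type.
        if (editor == null) {
            editor = PropertyEditorManager.findEditor(pd.getPropertyType());
        }
        if (editor == null) {
            return null; // no text representation available
        }
        editor.setValue(value);
        return editor.getAsText();
    }

    public static void main(String[] args) throws Exception {
        Course course = new Course();
        PropertyDescriptor credits = new PropertyDescriptor("credits", Course.class);
        PropertyDescriptor owner = new PropertyDescriptor("owner", Course.class);
        System.out.println(asText(credits, course.getCredits())); // prints "3"
        System.out.println(asText(owner, course.getOwner()));     // prints "null"
    }
}
```

A null result is the cue to fall through to the last strategy in the list and treat the value as a JavaBean in its own right.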
This month, we've created a class that turns a JavaBean into XML. Next month, we'll look at (and fix) some of the problems that our new code will cause for XMLBeanReader, and discuss some reader feedback. Please keep writing: it's gratifying to know that this column is helping people to better understand JavaBeans, XML, and how they work together. Please feel free to suggest improvements to the code you see here.
Learn more about this topic
- Source code for this article is available for download in ZIP format at http://www.javaworld.com/jw-03-1999/beans/XMLBeansMarch99.zip
- Get the source code in gzipped tar format at http://www.javaworld.com/jw-03-1999/beans/XMLBeansMarch99.tar.gz
- You can also get a classfile-only jar file at http://www.javaworld.com/jw-03-1999/beans/XMLBeansMarch99.jar
- The poem alluded to in the article is "Trees" by Joyce Kilmer, and can be found at http://www.venus.net/~nwashel2/poem.trees.html
- The baseball player in our sample XML file is named Jonas Grumby. Extra credit for anyone who can tell me who that is without doing a Web search.
- Previous columns on serialization mentioned above include:
- "Do it the "Nescafé" way -- with freeze-dried JavaBeans" at http://www.javaworld.com/jw-01-1998/jw-01-beans.html
- "Serialization and the JavaBeans Specification" at http://www.javaworld.com/jw-02-1998/jw-02-beans.html
- "Serialization grab bag" at http://www.javaworld.com//jw-04-1998/jw-04-beans.html
- You can read the complete XML standard, called Extensible Markup Language (XML) 1.0 W3C Recommendation, here http://www.w3.org/TR/1998/REC-xml-19980210
- Read the RDF (Resource Description Framework) standard -- a work in progress -- at http://www.w3.org/RDF/
- One of the better "one-stop shopping" sources for XML information is at XML.com. It has links to just about everything in the XML world. One of the more interesting things at this site is, believe it or not, the commentary on XML technology http://www.xml.com
- A current version of the XML FAQ by Peter Flynn, et al. This is the version of the FAQ recommended by the W3C http://www.ucc.ie/xml/
- The parser from IBM's xml4j package is available free for noncommercial use. It's even free for commercial use, but be sure to read the license agreement http://www.alphaWorks.ibm.com/formula/XML
- In a note unrelated to JavaBeans, but still too cool for words, check out Jikes, IBM's new open source Java compiler! Find out about it at the alphaWorks site at http://www.alphaWorks.ibm.com/formula/JikesOS
- For IBM's Bean Markup Language (BML), see http://www.alphaWorks.ibm.com/formula/BML
- If you're interested in the fine details of the current Document Object Model (Level 1) specification, you can find it at the W3C's Web site at http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html
- Microsoft has a good set of tutorials on XML at http://www.microsoft.com/xml/tutorial/default.asp
- There's also a whole XML "workshop" area. Don't try to access the workshop in Netscape, though: the table of contents doesn't work! These documents are free training, and are well written (though the examples don't always work, even in IE5beta). Just don't be fooled into thinking that everything there is open standard. Some of the tutorials and many of the articles are about Microsoft-only technology that won't work with all browsers or platforms http://www.microsoft.com/xml/default.asp