XML and Java: A potent partnership, Part 1

Find out why XML and Java have captured the minds of enterprise application developers

I'm happy to say that I've returned from my brief sabbatical all in one piece. While I was away, I had the opportunity to work on an interesting e-commerce application. On that project, one of my responsibilities involved specifying the interface between the server and a bevy of external systems. One facet of that task involved defining the format of the messages to be exchanged. The client's sole requirement was that the message format utilize XML.

Prior to this opportunity, I had always thought of XML as a "new and improved" HTML. I learned, however, that what began life as a "new and improved" HTML had application in domains far removed from Web publishing.

I intend, over the next several months, to explore these "other" applications of XML, with an eye toward the places where those applications intersect with Java. Of course, my column will remain true to its origin as a source of practical instruction (the "how to" aspect) by providing guidance on how to use Java to facilitate the development of these applications. In the process, I hope to help you gain a deeper understanding of how Java and XML work together.

This month, I'll examine XML's role in the exchange of data.

A brief history of computing in the enterprise

At first glance, it may seem odd that XML, a language designed for document markup, has garnered such attention from the enterprise application development community. It's not so odd, however, once you understand the historical background of computing in the enterprise.

The automation of formerly manual business processes was one of the first important tasks for which computers were employed. Since business processes were typically segmented along departmental lines, the systems that automated those business processes were also segmented by department. The resulting systems were characterized by narrow scope -- they often did little more than automate the same steps and procedures that comprised the manual business process -- and lack of interoperability, as they seldom integrated with other systems. These arrangements became known as stovepipe systems: systems oriented toward the needs of a specific group of people or toward a specific purpose with little or no horizontal integration.

Computing, therefore, looked a lot like the illustration in Figure 1.

Figure 1. Independent islands of computing

A telecommunications company, for example, might have had separate systems for plain-old telephone service (POTS) customers, inter-exchange carrier (IXC) customers, and wireless customers.

This created two significant problems. First, the systems didn't interoperate. Second, they duplicated critical business data. These problems made it difficult to create a single, comprehensive view of customers, their behavior, and their value to the company. The business computing model had to change.

The forces of change

Three forces drove the shift away from the "islands of automation" computing model:

  1. The desire to automate business processes across the enterprise and across existing boundaries. This naturally called for the integration of existing systems -- rebuilding from scratch wasn't an option.

  2. The growing understanding that the customer information locked within stovepipe systems had value -- especially when viewed as a whole.

  3. The desire to integrate key systems with vendors and customers.

In short, the new model needed to look a lot like the integrated enterprise illustrated in Figure 2.

Figure 2. The integrated enterprise

Data exchange

In the heyday of the stovepipe system, few developers thought much about data exchange. As a result, the interfaces they developed (if they developed them at all) were often ad hoc and usually differed from system to system.

Every method for exchanging information between two systems -- RMI, RPC, CORBA, and COM included -- has much in common with the other methods. To wit, they all revolve around passing information that has been reduced to a block of data.

Typically, a block of data is not self-describing. The key to what a block of data means -- where fields begin and end, and how information is formatted -- is incorporated into code. The code is responsible for parsing the block of data and validating the information contained therein.

Figure 3, for example, illustrates a simple order. It defines the position of the fields in the order, the size in bytes of each field and its format, and the overall size in bytes of the order itself.

Order
  Field        Format
  OrderID      Unsigned 32-bit integer
  CustomerID   Eight 8-bit characters
  ProductID    Eight 8-bit characters
  Quantity     Unsigned 16-bit integer
Figure 3. Illustration of a simple order

Figure 4 below contains an instance of such an order.

0x00 0x00 0x00 0x0F (15)
"RULDS   "
"DC123_44"
0x00 0x05 (5)
Figure 4. Instance of a simple order

The following Java code reads data in the format defined in Figure 3 from an input stream into an object, and writes such an object back to an output stream:

import java.io.InputStream;
import java.io.OutputStream;
import java.io.IOException;

public class Class1 {
    public static void read(InputStream inputstream, Struct1 struct1) throws IOException {
        // nlOrderID
        long nl1 = inputstream.read();
        long nl2 = inputstream.read();
        long nl3 = inputstream.read();
        long nl4 = inputstream.read();

        struct1.nlOrderID = (nl1 << 24) | (nl2 << 16) | (nl3 << 8) | nl4;

        // rgcCustomerID
        for (int i = 0; i < struct1.rgcCustomerID.length; i++) {
            struct1.rgcCustomerID[i] = (char)inputstream.read();
        }

        // rgcProductID
        for (int i = 0; i < struct1.rgcProductID.length; i++) {
            struct1.rgcProductID[i] = (char)inputstream.read();
        }

        // nQuantity
        int n1 = inputstream.read();
        int n2 = inputstream.read();

        struct1.nQuantity = (n1 << 8) | n2;
    }

    public static void write(OutputStream outputstream, Struct1 struct1) throws IOException {
        // nlOrderID
        outputstream.write((int)(struct1.nlOrderID >>> 24) & 0xFF);
        outputstream.write((int)(struct1.nlOrderID >>> 16) & 0xFF);
        outputstream.write((int)(struct1.nlOrderID >>> 8) & 0xFF);
        outputstream.write((int)struct1.nlOrderID & 0xFF);

        // rgcCustomerID
        for (int i = 0; i < struct1.rgcCustomerID.length; i++) {
            outputstream.write(struct1.rgcCustomerID[i]);
        }

        // rgcProductID
        for (int i = 0; i < struct1.rgcProductID.length; i++) {
            outputstream.write(struct1.rgcProductID[i]);
        }

        // nQuantity
        outputstream.write((struct1.nQuantity >>> 8) & 0xFF);
        outputstream.write(struct1.nQuantity & 0xFF);
    }
}

You'll notice that using Java didn't bring much benefit to the table. The code is completely ad hoc and largely nonreusable. It is also weak in the validation department -- almost any pattern of bytes will satisfy the logic. In a real application, you'd need to add considerably more validation logic (translation: more code).
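To give a feel for how much extra code that validation means, here is a minimal sketch of a stricter reader for the 22-byte simple order. The class name OrderRecord, the method parse, and the two rules it enforces (IDs must be printable ASCII, quantity must be at least 1) are assumptions made for illustration; the format definition itself says nothing about them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical value class for the simple order: 4 + 8 + 8 + 2 bytes.
final class OrderRecord {
    static final int SIZE = 4 + 8 + 8 + 2; // 22 bytes in total

    final long orderId;      // unsigned 32-bit integer
    final String customerId; // eight 8-bit characters, space padded
    final String productId;  // eight 8-bit characters, space padded
    final int quantity;      // unsigned 16-bit integer

    private OrderRecord(long orderId, String customerId, String productId, int quantity) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.productId = productId;
        this.quantity = quantity;
    }

    // Parses exactly one record and rejects input that a naive reader would accept.
    static OrderRecord parse(byte[] data) {
        if (data == null || data.length != SIZE) {
            throw new IllegalArgumentException("expected exactly " + SIZE + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        long orderId = buffer.getInt() & 0xFFFFFFFFL;
        String customerId = readId(buffer, "CustomerID");
        String productId = readId(buffer, "ProductID");
        int quantity = buffer.getShort() & 0xFFFF;
        if (quantity == 0) { // assumed business rule
            throw new IllegalArgumentException("Quantity must be at least 1");
        }
        return new OrderRecord(orderId, customerId, productId, quantity);
    }

    // Reads an eight-character ID and strips the padding.
    private static String readId(ByteBuffer buffer, String fieldName) {
        byte[] raw = new byte[8];
        buffer.get(raw);
        for (byte b : raw) {
            if (b < 0x20 || b > 0x7E) { // assumed rule: printable ASCII only
                throw new IllegalArgumentException(fieldName + " contains a non-printable byte");
            }
        }
        return new String(raw, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) {
        // The order instance from Figure 4.
        byte[] figure4 = {
            0x00, 0x00, 0x00, 0x0F,
            'R', 'U', 'L', 'D', 'S', ' ', ' ', ' ',
            'D', 'C', '1', '2', '3', '_', '4', '4',
            0x00, 0x05
        };
        OrderRecord order = parse(figure4);
        // prints "15 RULDS DC123_44 5"
        System.out.println(order.orderId + " " + order.customerId + " "
                + order.productId + " " + order.quantity);
    }
}
```

Even this small sketch spends more lines on checking than on reading, and every rule in it lives in code rather than in the data.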

Data exchange: Round two

Of course, the example above wasn't really that convincing. Now imagine what would happen if the order became more complex and consisted of not just the fields in the example above, but also fields that described options for each product being ordered -- some of these options might not be necessary, some might be mandatory, and some might depend on other options.

Figure 5 below describes a more complicated format. None of the three options are mandatory and each may appear in any order.

Order
  Field        Format
  OrderID      Unsigned 32-bit integer
  CustomerID   Eight 8-bit characters
  ProductID    Eight 8-bit characters
  Quantity     Unsigned 16-bit integer
  Options...

Option 1
  Type         Unsigned 8-bit integer
  Field 1A     Signed 32-bit integer

Option 2
  Type         Unsigned 8-bit integer
  Field 2A     Unsigned 16-bit integer
  Field 2B     Ten 8-bit characters

Option 3
  Type         Unsigned 8-bit integer
Figure 5. A more complex format

Figure 6 below contains an instance of such an order.

0x00 0x00 0x00 0x0F (15)
"RULDS   "
"DC123_44"
0x00 0x05 (5)
0x03 (3)
0x02 (2)
0x00 0x03 (3)
"black     "
0x02 (2)
0x00 0x02 (2)
"white     "
Figure 6. An instance of a more complex order

The following Java code reads data in the format defined in Figure 5 from an input stream into an object, and writes such an object back to an output stream:

import java.io.InputStream;
import java.io.OutputStream;
import java.io.IOException;

import java.util.Enumeration;

public class Class2 {
    public static void read(InputStream inputstream, Struct2 struct2) throws IOException {
        long nl1, nl2, nl3, nl4;
        int n1, n2, n3, n4;

        // nlOrderID
        nl1 = inputstream.read();
        nl2 = inputstream.read();
        nl3 = inputstream.read();
        nl4 = inputstream.read();

        struct2.nlOrderID = (nl1 << 24) | (nl2 << 16) | (nl3 << 8) | nl4;

        // rgcCustomerID
        for (int i = 0; i < struct2.rgcCustomerID.length; i++) {
            struct2.rgcCustomerID[i] = (char)inputstream.read();
        }

        // rgcProductID
        for (int i = 0; i < struct2.rgcProductID.length; i++) {
            struct2.rgcProductID[i] = (char)inputstream.read();
        }

        // nQuantity
        n1 = inputstream.read();
        n2 = inputstream.read();

        struct2.nQuantity = (n1 << 8) | n2;

        // options...
        while (true) {
            int nType = inputstream.read();

            if (nType < 0) break;

            switch (nType) {
                case 1:
                    Option1 option1 = new Option1();

                    // nField1A
                    n1 = inputstream.read();
                    n2 = inputstream.read();
                    n3 = inputstream.read();
                    n4 = inputstream.read();

                    option1.nField1A = (n1 << 24) | (n2 << 16) | (n3 << 8) | n4;

                    struct2.vectorOptions.addElement(option1);

                    break;

                case 2:
                    Option2 option2 = new Option2();

                    // nField2A
                    n1 = inputstream.read();
                    n2 = inputstream.read();

                    option2.nField2A = (n1 << 8) | n2;

                    // nField2B
                    for (int i = 0; i < option2.rgcField2B.length; i++) {
                        option2.rgcField2B[i] = (char)inputstream.read();
                    }

                    struct2.vectorOptions.addElement(option2);

                    break;

                default:
                    Option3 option3 = new Option3();

                    struct2.vectorOptions.addElement(option3);

                    break;
            }
        }
    }

    public static void write(OutputStream outputstream, Struct2 struct2) throws IOException {
        // nlOrderID
        outputstream.write((int)(struct2.nlOrderID >>> 24) & 0xFF);
        outputstream.write((int)(struct2.nlOrderID >>> 16) & 0xFF);
        outputstream.write((int)(struct2.nlOrderID >>> 8) & 0xFF);
        outputstream.write((int)struct2.nlOrderID & 0xFF);

        // rgcCustomerID
        for (int i = 0; i < struct2.rgcCustomerID.length; i++) {
            outputstream.write(struct2.rgcCustomerID[i]);
        }

        // rgcProductID
        for (int i = 0; i < struct2.rgcProductID.length; i++) {
            outputstream.write(struct2.rgcProductID[i]);
        }

        // nQuantity
        outputstream.write((struct2.nQuantity >>> 8) & 0xFF);
        outputstream.write(struct2.nQuantity & 0xFF);

        // options...
        Enumeration enumeration = struct2.vectorOptions.elements();

        while (enumeration.hasMoreElements()) {
            Object object = enumeration.nextElement();

            if (object instanceof Option1) {
                Option1 option1 = (Option1)object;

                outputstream.write(1);

                // nField1A
                outputstream.write((option1.nField1A >>> 24) & 0xFF);
                outputstream.write((option1.nField1A >>> 16) & 0xFF);
                outputstream.write((option1.nField1A >>> 8) & 0xFF);
                outputstream.write(option1.nField1A & 0xFF);
            } else if (object instanceof Option2) {
                Option2 option2 = (Option2)object;

                outputstream.write(2);

                // nField2A
                outputstream.write((option2.nField2A >>> 8) & 0xFF);
                outputstream.write(option2.nField2A & 0xFF);

                // nField2B
                for (int i = 0; i < option2.rgcField2B.length; i++) {
                    outputstream.write(option2.rgcField2B[i]);
                }
            } else {
                outputstream.write(3);
            }
        }
    }
}

You'll notice that the business rules guiding the handling of the options have to be embedded in the code. That's system integration in a nutshell.

A large component of the system-integration task consists of writing translators, adapters, or agents (they answer to many names) and debugging them by hand. Many times, writing the translator and getting it right (especially in light of the poor documentation that often accompanies legacy code) is one of the most difficult challenges any system integration effort faces.

Data exchange: XML to the rescue

Now consider how an order might be specified in XML. If this is your first contact with XML, I suggest you take a look at Mark Johnson's April 1999 JavaWorld article, "XML for the absolute beginner." (See Resources.)

<?xml version="1.0" ?>

<!DOCTYPE Order [
  <!ELEMENT Order (OrderID, CustomerID, ProductID, Quantity, Options*) >
  <!ELEMENT OrderID (#PCDATA) >
  <!ELEMENT CustomerID (#PCDATA) >
  <!ELEMENT ProductID (#PCDATA) >
  <!ELEMENT Quantity (#PCDATA) >
  <!ELEMENT Options (Option1 | Option2 | Option3)+ >
  <!ELEMENT Option1 (Field1A) >
  <!ELEMENT Option2 (Field2A, Field2B) >
  <!ELEMENT Option3 EMPTY >
  <!ELEMENT Field1A (#PCDATA) >
  <!ELEMENT Field2A (#PCDATA) >
  <!ELEMENT Field2B (#PCDATA) >
]>

<Order>
  <OrderID>15</OrderID>
  <CustomerID>RULDS</CustomerID>
  <ProductID>DC123_44</ProductID>
  <Quantity>5</Quantity>
  <Options>
    <Option3/>
    <Option2>
      <Field2A>3</Field2A>
      <Field2B>black</Field2B>
    </Option2>
    <Option2>
      <Field2A>2</Field2A>
      <Field2B>white</Field2B>
    </Option2>
  </Options>
</Order>

Figure 7. An XML-specified order

The XML in Figure 7 consists of two parts. First is the Document Type Definition (DTD). The DTD defines the structural relationship between the tags that comprise a document. This information allows a parser to rigorously and unambiguously validate a document.

The second part is an order that has been marked up with XML tags. You should immediately notice how easy it is to read and understand. Even in the absence of a DTD, the use of XML markup allows an XML parser to determine whether or not a document has the correct general form (or is "well-formed" in XML parlance).
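A well-formedness check of this kind can be sketched in a few lines. The example below is only an illustration and makes some assumptions: it uses the JAXP SAXParserFactory API bundled with current JDKs rather than any parser named in this article, and the class name WellFormedCheck and the method isWellFormed are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
final class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            // A factory is non-validating unless asked otherwise, so no DTD is needed.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser reports violations of well-formedness as fatal errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested tags.
        System.out.println(isWellFormed("<Order><OrderID>15</OrderID></Order>")); // true
        // The <OrderID> element is never closed.
        System.out.println(isWellFormed("<Order><OrderID>15</Order>"));           // false
    }
}
```

Note that no knowledge of orders is involved here: the parser checks only that tags nest and close correctly.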

Now consider the benefits:

  • Using XML results in less custom development. A validating XML parser can use a supplied DTD to automatically check the structure of a document and enforce those business rules the DTD can express, such as which elements are mandatory, optional, or repeatable. The application has only to validate the character data between the tags.

  • XML documents are self-documenting. The textual nature of XML tags and the inclusion of a well-defined DTD greatly reduce the amount of guesswork involved in developing a translator.

  • XML allows developers to create open, standardized interfaces for existing systems based on robust and widely available tools.
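The first benefit can be illustrated with a minimal sketch in which the parser, not the application, rejects an order whose Option2 lacks a required field. The sketch assumes the JAXP SAXParserFactory API bundled with current JDKs (not the parser used elsewhere in this article); the class name DtdValidationSketch, the method isValid, and the constants are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DtdValidationSketch {
    // The DTD from Figure 7, embedded in the document as an internal subset.
    static final String DTD =
            "<!DOCTYPE Order ["
            + "<!ELEMENT Order (OrderID, CustomerID, ProductID, Quantity, Options*)>"
            + "<!ELEMENT OrderID (#PCDATA)>"
            + "<!ELEMENT CustomerID (#PCDATA)>"
            + "<!ELEMENT ProductID (#PCDATA)>"
            + "<!ELEMENT Quantity (#PCDATA)>"
            + "<!ELEMENT Options (Option1 | Option2 | Option3)+>"
            + "<!ELEMENT Option1 (Field1A)>"
            + "<!ELEMENT Option2 (Field2A, Field2B)>"
            + "<!ELEMENT Option3 EMPTY>"
            + "<!ELEMENT Field1A (#PCDATA)>"
            + "<!ELEMENT Field2A (#PCDATA)>"
            + "<!ELEMENT Field2B (#PCDATA)>"
            + "]>";

    // The fixed fields shared by both test documents.
    static final String HEADER =
            "<Order><OrderID>15</OrderID><CustomerID>RULDS</CustomerID>"
            + "<ProductID>DC123_44</ProductID><Quantity>5</Quantity>";

    static boolean isValid(String document) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // ask for DTD validation
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(document)),
                    new DefaultHandler() {
                        // Validity errors are recoverable by default; treat them as failures.
                        @Override
                        public void error(SAXParseException e) throws SAXException {
                            throw e;
                        }
                    });
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String complete = DTD + HEADER
                + "<Options><Option2><Field2A>3</Field2A><Field2B>black</Field2B></Option2></Options></Order>";
        String incomplete = DTD + HEADER
                + "<Options><Option2><Field2A>3</Field2A></Option2></Options></Order>";
        System.out.println(isValid(complete));   // true
        System.out.println(isValid(incomplete)); // false: Option2 requires Field2B
    }
}
```

The rule "Option2 needs both fields" appears once, in the DTD, and nowhere in the Java code.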

Wrapping it up

Before we go, let's look briefly at code that could be used to parse our XML order. We'll use SAX (Simple API for XML) and IBM's XML parser. (See Resources.)

import org.xml.sax.Parser;
import org.xml.sax.Locator;
import org.xml.sax.DocumentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.AttributeList;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

import org.xml.sax.helpers.ParserFactory;

public class TheParser {
    // This is the parser we will use to parse the XML.
    // It will be loaded dynamically.
    private static final String _stringParserClass = "com.ibm.xml.parsers.ValidatingSAXParser";

    public static void main(String [] rgstring) {
        try {
            // Create the parser.
            Parser parser = ParserFactory.makeParser(_stringParserClass);

            HandlerBase handlerbase = new HandlerBase() {
                public void startElement(String stringTagName, AttributeList attributelist) {
                    if (stringTagName.equals("Order")) {
                        // handle the <Order> tag
                    } else if (stringTagName.equals("OrderID")) {
                        // handle the <OrderID> tag
                    } else if (stringTagName.equals("CustomerID")) {
                        // handle the <CustomerID> tag
                    } else if (stringTagName.equals("ProductID")) {
                        // handle the <ProductID> tag
                    } else if (stringTagName.equals("Quantity")) {
                        // handle the <Quantity> tag
                    } else if (stringTagName.equals("Option1")) {
                        // handle the <Option1> tag
                    } else if (stringTagName.equals("Option2")) {
                        // handle the <Option2> tag
                    } else if (stringTagName.equals("Option3")) {
                        // handle the <Option3> tag
                    } else if (stringTagName.equals("Field1A")) {
                        // handle the <Field1A> tag
                    } else if (stringTagName.equals("Field2A")) {
                        // handle the <Field2A> tag
                    } else if (stringTagName.equals("Field2B")) {
                        // handle the <Field2B> tag
                    }
                }

                public void endElement (String stringTagName) {
                    if (stringTagName.equals("Order")) {
                        // handle the </Order> tag
                    } else if (stringTagName.equals("OrderID")) {
                        // handle the </OrderID> tag
                    } else if (stringTagName.equals("CustomerID")) {
                        // handle the </CustomerID> tag
                    } else if (stringTagName.equals("ProductID")) {
                        // handle the </ProductID> tag
                    } else if (stringTagName.equals("Quantity")) {
                        // handle the </Quantity> tag
                    } else if (stringTagName.equals("Option1")) {
                        // handle the </Option1> tag
                    } else if (stringTagName.equals("Option2")) {
                        // handle the </Option2> tag
                    } else if (stringTagName.equals("Option3")) {
                        // handle the </Option3> tag
                    } else if (stringTagName.equals("Field1A")) {
                        // handle the </Field1A> tag
                    } else if (stringTagName.equals("Field2A")) {
                        // handle the </Field2A> tag
                    } else if (stringTagName.equals("Field2B")) {
                        // handle the </Field2B> tag
                    }
                }

                public void characters(char [] rgc, int nStart, int nLength) {
                    // handle character data
                }

                public void error(SAXParseException saxparseexception) throws SAXException {
                    throw saxparseexception;
                }
            };

            parser.setDocumentHandler((DocumentHandler)handlerbase);
            parser.setErrorHandler((ErrorHandler)handlerbase);

            for (int i = 0; i < rgstring.length; i++) {
                parser.parse(rgstring[i]);
            }
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
}

The code in the example above illustrates how we would set up and use a SAX-compliant parser to parse and validate XML. Note that the code doesn't actually do anything with the parsed XML. You should notice, however, the lack of complicated validation logic. The XML parser, using the DTD, takes care of that for us.
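As a rough idea of what "doing something" with the parsed XML might look like, the sketch below collects the character data between the tags into a plain Java object. It is an illustration under stated assumptions, not the approach this column will necessarily take: it uses the newer SAX2 DefaultHandler through the JAXP API bundled with current JDKs, and the names OrderSaxReader, Order, and parse are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class OrderSaxReader {
    // Hypothetical holder for the fixed fields plus the names of the options seen.
    static final class Order {
        long orderId;
        String customerId;
        String productId;
        int quantity;
        final List<String> options = new ArrayList<>();
    }

    static Order parse(String xml) {
        final Order order = new Order();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // start collecting text for this element
                if (qName.startsWith("Option") && !qName.equals("Options")) {
                    order.options.add(qName);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                switch (qName) {
                    case "OrderID":    order.orderId = Long.parseLong(value); break;
                    case "CustomerID": order.customerId = value; break;
                    case "ProductID":  order.productId = value; break;
                    case "Quantity":   order.quantity = Integer.parseInt(value); break;
                    default:           break; // other elements are ignored in this sketch
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalArgumentException("could not parse order", e);
        }
        return order;
    }

    public static void main(String[] args) {
        Order order = parse("<Order><OrderID>15</OrderID><CustomerID>RULDS</CustomerID>"
                + "<ProductID>DC123_44</ProductID><Quantity>5</Quantity>"
                + "<Options><Option3/><Option2><Field2A>3</Field2A>"
                + "<Field2B>black</Field2B></Option2></Options></Order>");
        // prints "15 RULDS DC123_44 5 [Option3, Option2]"
        System.out.println(order.orderId + " " + order.customerId + " "
                + order.productId + " " + order.quantity + " " + order.options);
    }
}
```

The only checks left to the application are on the character data itself, for example that OrderID and Quantity really are numbers.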

Conclusion

I hope you've gained a better understanding of why XML is important in the enterprise -- it supports the enterprise application integration effort by providing a common, standardized platform upon which to build an integration infrastructure.

Next month, I'll continue my exploration of the space at the intersection of Java, XML, and the enterprise. It's fertile ground for the fruitful mind. Join me then!

Learn more about this topic
