Choosing an approach to sending XML across process boundaries requires understanding the time and space costs of each option. First, I'll present the results of my experiments measuring the space used by an XML document stored as text versus the same document stored as a serialized DOM. Then I'll present the results of my timing tests for each case.
The space required to represent an XML document depends on the structure implied by the DTD and the amount of data in the document. An XML document with one simple tag and megabytes of textual data will have almost identical space requirements in both representations, because the text dominates and there is almost no structure to encode.
I am interested in highly structured data. The performance tests compared the XML textual representation of a purchase order with the serialized DOM representation of it. The performance tests used IBM's XML parser and DOM implementation, version 2.0.13.
The tests used a purchase order since it is a typical business-to-business application of XML with a fairly rich XML structure. It represents information extracted from several tables in a relational database. Here is the DTD for the purchase order:
<?xml encoding="US-ASCII"?>
<!ELEMENT orders (order)*>
<!ELEMENT order (header,item+,total)>
<!ELEMENT header (billing_info,shipping_info)>
<!ELEMENT billing_info (name,address,credit_card)>
<!ELEMENT shipping_info (name,address)>
<!ELEMENT name (given,family)>
<!ELEMENT address (street,city,state,zipcode,country,phone)>
<!ELEMENT item (product_id,product_name,quantity,price)>
<!ELEMENT credit_card (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT family (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zipcode (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT product_id (#PCDATA)>
<!ELEMENT product_name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT total (#PCDATA)>
Here's an XML document with a single purchase order:
<?xml version="1.0"?>
<!DOCTYPE orders SYSTEM "orders.dtd">
<orders>
  <order>
    <header>
      <billing_info>
        <name>
          <given>John</given>
          <family>Doe</family>
        </name>
        <address>
          <street>555 Main Street</street>
          <city>Mill Valley</city>
          <state>California</state>
          <zipcode>94520</zipcode>
          <country>USA</country>
          <phone>707 555-1000</phone>
        </address>
        <credit_card>5555 5555 5555 5555</credit_card>
      </billing_info>
      <shipping_info>
        <name>
          <given>John</given>
          <family>Doe</family>
        </name>
        <address>
          <street>555 Main Street</street>
          <city>Mill Valley</city>
          <state>California</state>
          <zipcode>94520</zipcode>
          <country>USA</country>
          <phone>707 555-1000</phone>
        </address>
      </shipping_info>
    </header>
    <item>
      <product_id>5555555</product_id>
      <product_name>Widget</product_name>
      <quantity>100</quantity>
      <price>.25</price>
    </item>
    <total>25.00</total>
  </order>
</orders>
Notice that the DTD allows multiple orders to be given in a single XML document. For my performance tests, I enlarged the XML document by including the same order n times. Table 2 presents the space results.
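The enlargement step described above could be done in several ways. Here is a minimal sketch that assumes the standard JAXP and DOM APIs shipped with current JDKs, not the IBM parser used in the tests. The class and method names (OrderReplicator, replicateFirstOrder) are hypothetical, and the sample document omits the DOCTYPE so that no orders.dtd file is needed to run it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class OrderReplicator {

    /** Parses XML text into a DOM, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /**
     * Appends n - 1 deep copies of the first order element to the root,
     * so a single-order document ends up holding n identical orders.
     */
    static void replicateFirstOrder(Document doc, int n) {
        Element root = doc.getDocumentElement();
        Node first = root.getElementsByTagName("order").item(0);
        if (first == null || n < 1) {
            throw new IllegalArgumentException("need at least one order and n >= 1");
        }
        for (int i = 1; i < n; i++) {
            root.appendChild(first.cloneNode(true)); // true = copy the whole subtree
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the purchase order shown earlier.
        Document doc = parse("<orders><order><total>25.00</total></order></orders>");
        replicateFirstOrder(doc, 5);
        System.out.println(doc.getElementsByTagName("order").getLength()); // prints 5
    }
}
```

Cloning in the DOM rather than pasting text keeps the document well formed no matter how many copies are added.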
| Number of orders | Bytes to represent as XML text | Bytes to represent as serialized DOM | Ratio (serialized/text) |
| 1 | 1,048 | 7,278 | 6.9 |
| 5 | 4,900 | 29,310 | 6.0 |
| 10 | 9,715 | 56,850 | 5.9 |
| 100 | 96,385 | 552,570 | 5.7 |
| 500 | 481,585 | 2,755,770 | 5.7 |
Table 2. Space results
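The numbers in Table 2 grow linearly with the number of orders, which is what you would expect when the same order is repeated. The following sketch works through that arithmetic: it derives a per-order cost and a fixed overhead from the first and last rows, then checks the model against every row. The class name SpaceModel and its helpers are illustrative only; the data is copied from the table.

```java
final class SpaceModel {

    // Rows of Table 2: {number of orders, bytes as XML text, bytes as serialized DOM}.
    static final long[][] TABLE_2 = {
        {1, 1_048, 7_278},
        {5, 4_900, 29_310},
        {10, 9_715, 56_850},
        {100, 96_385, 552_570},
        {500, 481_585, 2_755_770},
    };

    static final int TEXT = 1;       // column index of the XML text sizes
    static final int SERIALIZED = 2; // column index of the serialized DOM sizes

    /** Bytes added per extra order, taken from the first and last rows. */
    static long bytesPerOrder(int column) {
        long[] first = TABLE_2[0];
        long[] last = TABLE_2[TABLE_2.length - 1];
        return (last[column] - first[column]) / (last[0] - first[0]);
    }

    /** Size that does not depend on the number of orders. */
    static long fixedOverhead(int column) {
        return TABLE_2[0][column] - bytesPerOrder(column) * TABLE_2[0][0];
    }

    /** Size the linear model predicts for a document with the given number of orders. */
    static long predictedBytes(int column, long orders) {
        return fixedOverhead(column) + bytesPerOrder(column) * orders;
    }

    public static void main(String[] args) {
        System.out.println("text:       " + fixedOverhead(TEXT)
                + " + " + bytesPerOrder(TEXT) + " * n");
        System.out.println("serialized: " + fixedOverhead(SERIALIZED)
                + " + " + bytesPerOrder(SERIALIZED) + " * n");
        System.out.println("limiting ratio: "
                + (double) bytesPerOrder(SERIALIZED) / bytesPerOrder(TEXT));
    }
}
```

With the table's figures, the text size works out to 85 + 963n bytes and the serialized DOM size to 1,770 + 5,508n bytes, and both formulas reproduce every row exactly. The per-order ratio 5,508/963 is about 5.72, which is why the ratio column settles at 5.7 for large documents. The single-order ratio is higher because the serialized form carries a much larger fixed overhead (1,770 bytes versus 85).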
I now compare the translation times for the purchase order. In particular, I give the time it takes to externalize a DOM representation to XML text and then reparse it into a DOM, as well as the time it takes to serialize and deserialize the DOM representation. This effectively compares the performance of the senders and receivers, without considering the actual communication of the data.
I performed the tests on a Compaq Pentium III system running at 450 MHz with 128 MB of RAM. The system was running Microsoft Windows NT 4.0, Sun's Java 1.2 Virtual Machine, and IBM's XML parser and DOM implementation, version 2.0.13. Each reported time is the average of 50 trials.
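To make the two measured round trips concrete, here is a rough sketch of a timing harness. It is not the code used for these tests: it assumes the JAXP parser and identity transformer bundled with current JDKs in place of IBM's parser, and the names RoundTripTimer, toXmlText, and averageMillis are hypothetical. The org.w3c.dom.Document interface does not itself extend java.io.Serializable, so the serialization path works only if the DOM implementation's node classes are serializable, as they were in the implementation tested here. The harness does no warm-up, so early trials include class-loading and JIT costs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class RoundTripTimer {

    /** Externalizes a DOM as XML text using an identity transform. */
    static String toXmlText(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not write XML text", e);
        }
    }

    /** Reparses XML text into a DOM. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Java object serialization; fails unless the whole object graph is Serializable. */
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize", e);
        }
    }

    /** Rebuilds an object from the bytes produced by serialize. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (Exception e) {
            throw new IllegalStateException("could not deserialize", e);
        }
    }

    /** Average wall-clock milliseconds per run of the task over the given number of trials. */
    static double averageMillis(int trials, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < trials; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1e6 / trials;
    }
}
```

Given a Document named doc, the text round trip would be timed with averageMillis(50, () -> parse(toXmlText(doc))) and the serialization round trip with averageMillis(50, () -> deserialize(serialize(doc))). Both run entirely in memory, which matches the intent of measuring sender and receiver work without any communication cost.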
| Number of orders | Milliseconds to write DOM as XML text and reparse it to DOM | Milliseconds to serialize and deserialize DOM | Ratio (serialize/parse) |
| 1 | 191 | 478 | 2.5 |
| 5 | 199 | 534 | 2.7 |
| 10 | 231 | 603 | 2.6 |
| 100 | 862 | 2,228 | 2.6 |
| 500 | 4,707 | 10,107 | 2.1 |
Table 3. Timing results