Accelerate WSS applications with VTD-XML

Virtual Token Descriptor-XML can accelerate parsing for applications based on Web Services Security


Benchmark results

The purpose of this section is to help readers get a quantitative feel for the performance characteristics of the essential operations in WSS applications. The first part presents baseline performance numbers (parsing and reserialization) for WSS. The second part goes one step further and measures the combined latency of parsing, XPath evaluation, and outputting XML. The benchmark code used in this article is available as part of the VTD-XML 1.9 release, which can be downloaded from Resources.

The environment for the benchmark has the following setup:

  • Hardware: A Sony VAIO notebook with a 1.7-GHz Pentium M processor, 2 MB of integrated cache, 512 MB of DDR2 RAM, and a 400-MHz front-side bus.
  • OS/JVM setting: The notebook runs Windows XP, and the test applications run on version 1.506 of the JDK/JVM.
  • XML parsers: The benchmark tests Xerces DOM version 2.7.1 and VTD-XML version 1.8.5. The DOM tests are run with both deferred node expansion (the default) and full node expansion. The VTD-XML tests are run in both normal mode and buffer-reuse mode.
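The two DOM configurations can be reproduced with standard JAXP. This sketch assumes the JDK's built-in Xerces-derived parser (or Apache Xerces on the classpath), which recognizes the Xerces feature URI http://apache.org/xml/features/dom/defer-node-expansion. The names DomModes, newBuilder, and parse are hypothetical helpers, not part of the benchmark code, and VTD-XML's buffer-reuse mode is not shown.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class DomModes {
    // Xerces feature URI controlling deferred node expansion (on by default in Xerces).
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    // Returns a DocumentBuilder with deferred node expansion switched on or off.
    static DocumentBuilder newBuilder(boolean deferred) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature(DEFER_NODE_EXPANSION, deferred);
            return dbf.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            // Thrown if the underlying parser does not recognize the Xerces feature.
            throw new IllegalStateException(e);
        }
    }

    // Parses an in-memory document with the given builder.
    static Document parse(DocumentBuilder builder, byte[] xml) {
        try {
            return builder.parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both builders should yield equivalent trees; only the timing of node creation differs.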

To reduce timing variation due to disk I/O, the benchmark programs read the XML files into memory buffers before the test runs and write the output XML into an in-memory byte array output stream. The server JVM is used to obtain peak performance. All input/output streams are reused whenever possible.
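A minimal sketch of that methodology, using the JDK's own DOM parser and a Transformer for reserialization. The input is already a byte array, the output goes to a single ByteArrayOutputStream that is reset rather than reallocated, and a warm-up loop runs before timing starts. The class name RoundTripTimer and the warm-up and run counts are illustrative assumptions; this is not the benchmark code shipped with VTD-XML 1.9.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class RoundTripTimer {
    // Reused across iterations instead of being reallocated per run.
    private ByteArrayOutputStream out = new ByteArrayOutputStream(1 << 16);
    private DocumentBuilder builder;
    private Transformer serializer;

    RoundTripTimer() {
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            serializer = TransformerFactory.newInstance().newTransformer();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the in-memory document and writes it back into the reused output stream.
    byte[] roundTrip(byte[] xml) {
        try {
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            out.reset();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Average latency in milliseconds, measured after a warm-up phase for the JIT.
    double averageMillis(byte[] xml, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) roundTrip(xml);
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) roundTrip(xml);
        return (System.nanoTime() - start) / 1e6 / runs;
    }
}
```

In practice the input would be loaded once, before any timing, with Files.readAllBytes, and the program launched with java -server on JVMs that still offer a separate client VM.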

For the first part of the benchmark, a random collection of XML files is chosen and divided into three groups by file size: small files are less than 30 KB, mid-sized files are between 30 KB and 1 MB, and files larger than 1 MB are considered big. The benchmark code first parses an in-memory XML file, then immediately writes it back out into a byte array output stream. The results include both parsing-only performance and round-trip performance (parsing plus reserialization).
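The size thresholds can be stated as a small classifier. This sketch assumes binary units (1 KB = 1,024 bytes, 1 MB = 1,048,576 bytes), which the article does not specify; with decimal units, the files in Table 1 land in the same groups. SizeGroups is a hypothetical name.

```java
class SizeGroups {
    enum Group { SMALL, MID, BIG }

    // Assumption: binary units; the article does not say which convention it uses.
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    // Small: under 30 KB; mid-sized: 30 KB up to 1 MB; big: over 1 MB.
    static Group classify(long sizeInBytes) {
        if (sizeInBytes < 30 * KB) return Group.SMALL;
        if (sizeInBytes <= MB) return Group.MID;
        return Group.BIG;
    }
}
```

For example, cd.xml (30,831 bytes) falls just above the 30-KB line and counts as mid-sized, while po_big.xml (1,060,823 bytes) is just over 1 MB.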

For the second part of the benchmark, three XML purchase orders of similar structure but different sizes are chosen. The benchmark code parses an XML file in the buffer, evaluates a single precompiled XPath expression, removes the matching nodes from the document, and writes the output into a byte array output stream. The five chosen XPath expressions are:

  1. /*/*/*[position() mod 2 = 0]
  2. /purchaseOrder/items/item[USPrice<100]
  3. /*/*/*/quantity/text()
  4. //item/comment
  5. //item/comment/../quantity
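The DOM side of this parse, evaluate, remove, and reserialize cycle can be sketched with the JAXP XPath API (javax.xml.xpath). The article does not say which XPath engine the DOM tests used, so treat this choice as an assumption; XPathRemover is a hypothetical name. Matches are copied out of the result NodeList before any node is detached, so removal cannot disturb the node set being iterated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XPathRemover {
    // Compile once and reuse the expression across runs.
    static XPathExpression compile(String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().compile(xpath);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse, evaluate, remove every matching node, then reserialize into a byte array.
    static byte[] removeMatches(byte[] xml, XPathExpression expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            NodeList hits = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            // Snapshot the matches first so detaching nodes cannot disturb the node set.
            List<Node> matches = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                matches.add(hits.item(i));
            }
            for (Node n : matches) {
                if (n instanceof Attr) {
                    Attr attr = (Attr) n; // attributes have an owner element, not a parent
                    attr.getOwnerElement().removeAttributeNode(attr);
                } else if (n.getParentNode() != null) {
                    n.getParentNode().removeChild(n);
                }
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With //item/comment, for instance, the per-item comments disappear while the order-level comment element survives, because it is not a child of an item.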

To give you some idea of the XML file structure, here is the beginning of the purchase order:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name> Robert Smith </name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity></quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity><![CDATA[1]]></quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
               ...
    </items>
</purchaseOrder>

Part 1. Parsing vs. parsing and reserializing

Each cell of Table 1 contains two numbers: the left one is the parsing latency; the right one is the combined latency of parsing and reserialization.

Table 1. Latency comparisons of parsing vs. parsing and reserialization

| File | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 0.547 / 1.604 | 0.388 / 0.934 | 0.134 / 0.134 | 0.121 / 0.121 |
| form.xml (15,845 bytes) | 1.071 / 2.951 | 0.946 / 1.845 | 0.234 / 0.238 | 0.217 / 0.223 |
| book.xml (22,996 bytes) | 3.711 / 11.444 | 3.238 / 7.312 | 0.381 / 0.392 | 0.361 / 0.373 |
| cd.xml (30,831 bytes) | 7.951 / 17.238 | 9.082 / 12.454 | 0.612 / 0.640 | 0.59 / 0.616 |
| bioInfo.xml (34,759 bytes) | 7.811 / 14.502 | 8.674 / 10.904 | 0.553 / 0.567 | 0.534 / 0.566 |
| po_medium.xml (112,238 bytes) | 13.095 / 28.552 | 18.268 / 26.766 | 2.069 / 2.199 | 2.023 / 2.081 |
| po_big.xml (1,060,823 bytes) | 104.688 / 237.903 | 144.821 / 266.94 | 21.956 / 26.826 | 21.556 / 23.408 |
| blog.xml (1,334,455 bytes) | 68.486 / 156.337 | 89.289 / 138.75 | 20.517 / 22.97 | 20.253 / 22.195 |
| soap.xml (2,716,834 bytes) | 313.26 / 808.86 | 480.989 / 835.1 | 61.69 / 72.01 | 55.58 / 61.69 |
| ORTCA.xml (8,029,319 bytes) | 749.88 / 1667.1 | 1056.32 / 1483.53 | 210.41 / 235.43 | 195.88 / 212.81 |
| address.xml (15,981,592 bytes) | 2790.92 / 4120.719 | 2217.79 / 4963.64 | 334.18 / 379.14 | 304.74 / 335.580 |

Part 2. Update performance comparison

Table 2. Combined update latency comparison for /*/*/*[position() mod 2 = 0]

| File | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 6.645 | 3.827 | 0.184 | 0.171 |
| po_medium.xml (112,238 bytes) | 42.248 | 35.125 | 3.052 | 2.751 |
| po_big.xml (1,060,823 bytes) | 324.38 | 286.613 | 35.801 | 29.475 |

Table 3. Combined update latency comparison for /purchaseOrder/items/item[USPrice<100]

| File | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 6.921 | 4.134 | 0.229 | 0.204 |
| po_medium.xml (112,238 bytes) | 42.724 | 37.591 | 3.052 | 3.392 |
| po_big.xml (1,060,823 bytes) | 314.585 | 354.175 | 41.743 | 37.035 |

Table 4. Combined update latency comparison for /*/*/*/quantity/text()

| File | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 4.762 | 4.113 | 0.212 | 0.192 |
| po_medium.xml (112,238 bytes) | 41.912 | 38.848 | 3.542 | 3.132 |
| po_big.xml (1,060,823 bytes) | 376.808 | 367.495 | 39.991 | 35.635 |

Table 5. Combined update latency comparison for //item/comment

| File | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 4.951 | 4.278 | 0.307 | 0.282 |
| po_medium.xml (112,238 bytes) | 47.06 | 41.344 | 4.809 | 4.373 |
| po_big.xml (1,060,823 bytes) | 358.783 | 395.401 | 56.898 | 52.058 |

Table 6. Combined update latency comparison for //item/comment/../quantity

| File | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 4.978 | 4.326 | 0.306 | 0.291 |
| po_medium.xml (112,238 bytes) | 46.522 | 42.466 | 5.112 | 4.716 |
| po_big.xml (1,060,823 bytes) | 405.683 | 401.293 | 53.46 | 49.003 |

Figure 1. Normalized latency comparison for /*/*/*[position() mod 2 = 0].

Figure 2. Normalized latency comparison for /purchaseOrder/items/item[USPrice < 100].

Figure 3. Normalized latency comparison for /*/*/*/quantity/text().

Figure 4. Normalized latency comparison for //item/comment.

Figure 5. Normalized latency comparison for //item/comment/../quantity.
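Tables 2 through 6 can be summarized by dividing each DOM latency by the corresponding VTD-XML (buffer reuse) latency. The sketch below hard-codes those three columns from the tables; comparing against the buffer-reuse column rather than normal-mode VTD-XML is an assumption about how the overall range is best read, and UpdateSpeedup is a hypothetical name.

```java
class UpdateSpeedup {
    // Rows from Tables 2-6: {DOM deferred, DOM full, VTD-XML with buffer reuse}, in ms.
    static final double[][] ROWS = {
        {6.645, 3.827, 0.171}, {42.248, 35.125, 2.751}, {324.38, 286.613, 29.475},   // Table 2
        {6.921, 4.134, 0.204}, {42.724, 37.591, 3.392}, {314.585, 354.175, 37.035},  // Table 3
        {4.762, 4.113, 0.192}, {41.912, 38.848, 3.132}, {376.808, 367.495, 35.635},  // Table 4
        {4.951, 4.278, 0.282}, {47.06, 41.344, 4.373}, {358.783, 395.401, 52.058},   // Table 5
        {4.978, 4.326, 0.291}, {46.522, 42.466, 4.716}, {405.683, 401.293, 49.003},  // Table 6
    };

    // Returns {min, max} of DOM latency / VTD-XML (buffer reuse) latency over both DOM modes.
    static double[] speedupRange(double[][] rows) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] r : rows) {
            for (int dom = 0; dom < 2; dom++) {
                double ratio = r[dom] / r[2];
                min = Math.min(min, ratio);
                max = Math.max(max, ratio);
            }
        }
        return new double[] {min, max};
    }
}
```

Over both DOM modes the ratio runs from about 6.9 (po_big.xml in Table 5) to about 38.9 (po_small.xml in Table 2).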

Observations

The data in Table 1 makes it clear that DOM's reserialization is quite costly, particularly for small XML files, for which reserialization can take nearly twice as long as parsing. For small files, both the parsing and reserialization performance of DOM drop precipitously under Xerces's default setting (deferred node expansion). As file sizes increase, reserialization's share of DOM's cost declines but still accounts for roughly half of the total in a typical case. For big files, deferred node expansion speeds up parsing at the expense of slower reserialization. VTD-XML, with or without buffer reuse, consistently outperforms Xerces DOM regardless of file size, often by an order of magnitude. In some cases, VTD-XML with buffer reuse is about 30 times as fast as Xerces DOM with deferred node expansion.
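The ratios in these observations follow from simple arithmetic on Table 1: reserialization time is the right-hand number minus the left-hand one, and a speedup is one latency divided by another. The helper names below are hypothetical; the inputs in the usage note come straight from Table 1.

```java
class Table1Math {
    // Reserialization latency = round-trip latency minus parse-only latency.
    static double reserializeMillis(double parseMs, double roundTripMs) {
        return roundTripMs - parseMs;
    }

    // How long reserialization takes relative to parsing.
    static double reserializeToParse(double parseMs, double roundTripMs) {
        return reserializeMillis(parseMs, roundTripMs) / parseMs;
    }

    // Fraction of the round trip spent reserializing.
    static double reserializeShare(double parseMs, double roundTripMs) {
        return reserializeMillis(parseMs, roundTripMs) / roundTripMs;
    }

    // How many times faster the second latency is than the first.
    static double speedup(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }
}
```

For po_small.xml with deferred expansion, reserializeToParse(0.547, 1.604) is about 1.93. For soap.xml, reserializeShare(313.26, 808.86) is about 0.61. For book.xml round trips, speedup(11.444, 0.373), deferred DOM against VTD-XML with buffer reuse, is about 30.7.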

The second part of the benchmark shows that, regardless of file size, VTD-XML keeps its performance edge even after XPath evaluation is added to the mix, outperforming Xerces by a factor of seven to 38. Since both VTD-XML and DOM support random access, one would expect XPath evaluation to perform roughly the same on both. Surprisingly, this is not the case, particularly for small XML files. For Xerces, XPath evaluation on po_small.xml takes at least twice as long as parsing and reserialization combined, dragging the combined throughput down to a mere 1.7 MB/second (around one-tenth of the parsing throughput). To me, it seems that small XML files flowing through the network are more likely to create choke points within a DOM-based WSS infrastructure, a problem for which VTD-XML should provide a reasonable solution.
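Throughput here is document size divided by latency. The article does not state which DOM configuration produced the 1.7 MB/second figure; the full-expansion numbers for po_small.xml (3.827 ms in Table 2, 0.388 ms parse-only in Table 1) reproduce it when 1 MB is taken as 1,048,576 bytes, which is an assumption. Throughput is a hypothetical helper name.

```java
class Throughput {
    // Assumption: binary megabytes (1 MB = 1,048,576 bytes).
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    // Megabytes processed per second for a document of the given size and latency.
    static double mbPerSecond(long bytes, double latencyMs) {
        return bytes / BYTES_PER_MB / (latencyMs / 1000.0);
    }
}
```

mbPerSecond(6_780, 3.827) comes to about 1.69 MB/second, and mbPerSecond(6_780, 0.388) to about 16.7 MB/second, so the update path runs at roughly one-tenth of the parse-only rate.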

The issues of XML encryption and XML signing

But VTD-XML only offers a new starting point. Other design issues in the WSS family of specs become obvious only after the parsing and reserialization overhead is removed from WSS applications. One of the common complaints about XML signature and XML encryption is that they are deadly slow. Judging by the names of the two specs, you would think their cost is mostly cryptographic. Not so. The cryptographic operations themselves account for only a small percentage of CPU cycles. The lion's share of the overhead comes from parsing, reserializing, and canonicalizing SOAP messages. Of these, the most troubling is XML canonicalization, which converts an XML Infoset into a unique byte pattern. The original goal of XML canonicalization was to check the logical equivalence of two documents. To canonicalize an XML document, one must apply a transformation process consisting of 13 steps (see "Performance of Web Services Security," Hongbin Liu, Shrideep Pallickara, and Geoffrey Fox).

But in the context of WSS, XML canonicalization introduces too much processing overhead. Even worse, it introduces that overhead without accomplishing anything necessary or useful. For one thing, signing or encrypting XML is quite different from checking two XML documents for logical equivalence. For another, XML signature and cipher values, like those computed when signing or encrypting any other data type, should always have been computed from the byte content of the XML itself, not from the Infoset.
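The gap between byte content and Infoset is easy to demonstrate. In this sketch, two serializations of the same element differ in attribute order, quote style, and empty-element form. They parse to DOM trees that Node.isEqualNode reports as equal, yet their SHA-256 digests over the raw bytes differ. That gap is what canonicalization was designed to bridge; the sketch only illustrates it and does not implement XML canonicalization. BytesVsInfoset is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class BytesVsInfoset {
    // SHA-256 over the raw bytes, the way any non-XML payload would be hashed.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the text and returns its document element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the UTF-8 bytes of both strings hash to the same digest.
    static boolean sameDigest(String a, String b) {
        return Arrays.equals(sha256(a.getBytes(StandardCharsets.UTF_8)),
                             sha256(b.getBytes(StandardCharsets.UTF_8)));
    }

    // True if both strings parse to structurally equal DOM elements.
    static boolean sameTree(String a, String b) {
        return root(a).isEqualNode(root(b));
    }
}
```

For example, <item partNum="872-AA" qty="1"/> and <item qty='1'  partNum='872-AA'></item> give different digests but equal trees.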

And don't forget that the technology world, along with its underlying assumptions, is relentlessly marching forward. Network speeds have gone from 10 Mbits/second a decade ago to 10 Gbits/second today, with 100 Gbits/second on the horizon. Given that the volume of XML/SOAP data is increasing rapidly due to the proliferation of SOA and Web services applications, what is the point of introducing artificial choke points in the network? The misery seems entirely self-imposed.

The challenge is for someone to go back to the drawing board and come up with a replacement for XML canonicalization, one that can be completely described in one or two pages and understood by anyone in five minutes. Performance-wise, it should at least not become the bottleneck; matching parsing performance should not be that difficult. Of the 13 steps described in XML canonicalization, it seems that only the transcoding (to UTF-8) step should be retained.
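To make the suggestion concrete, here is a sketch of a digest whose only normalization step is transcoding to UTF-8. It illustrates the idea above and is not a proposed spec: Utf8Digest is a hypothetical name, the caller is assumed to know the source encoding, and a real scheme would also have to decide how to treat the encoding pseudo-attribute in the XML declaration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class Utf8Digest {
    // The only normalization: decode from the source encoding, re-encode as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset sourceEncoding) {
        return new String(raw, sourceEncoding).getBytes(StandardCharsets.UTF_8);
    }

    // SHA-256 over the transcoded bytes; everything else in the document is left as-is.
    static byte[] digest(byte[] raw, Charset sourceEncoding) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(toUtf8(raw, sourceEncoding));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same characters encoded as ISO-8859-1 and as UTF-16 produce different raw bytes but identical digests, while any other difference in the bytes still changes the digest.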

Conclusion

This article investigated some of the practical implementation issues in a DOM-based WSS infrastructure. As a next-generation XML parser beyond DOM and SAX, VTD-XML fundamentally and completely solves the problem of DOM's wasteful parsing and reserialization. But more challenges lie ahead. The XML canonicalization spec is unnecessarily complex and inefficient, making it ill-suited for a reasonably high-performance implementation of XML signature and encryption. We need something much simpler and faster.

Jimmy Zhang is the founder of XimpleWare, a provider of high-performance XML-processing solutions. He has experience in the fields of electronic design automation and voice-over IP with numerous Silicon Valley technology companies. He graduated from the University of California, Berkeley with both an MS and a BS from the department of EECS.

Learn more about this topic

Download VTD-XML version 1.9

"Performance of Web Services Security," Hongbin Liu et al. (Indiana University)

"XML Canonicalization," Bilal Siddiqui (XML.com, September 2002)

"Cut, Paste, Split, and Assemble XML Documents with VTD-XML," Jimmy Zhang (JavaWorld, July 2006)

"Simplify XML Processing with VTD-XML," Jimmy Zhang (JavaWorld, March 2006)

For more articles on working with XML, browse JavaWorld's Java and XML Research Center

For more articles on Web services, browse JavaWorld's Web Services and SOAs Research Center

Also browse through the articles in JavaWorld's Security & Testing Research Center
