jzhang
newbie
Reged: 12/02/05
Posts: 26
|
|
Interesting comment. We will think about how to address this in our future releases of VTD-XML...
|
ollyx
Unregistered
|
|
I would also welcome the ability to manipulate the in-memory XML data through the VTD-XML API in a future release!
|
jzhang
newbie
Reged: 12/02/05
Posts: 26
|
|
Yes, we are looking for feedback and gathering requirements. Help us extend the features of VTD-XML; any suggestions are welcome.
|
Anonymous
Unregistered
|
|
Benchmark weaknesses in v1.6
1. JVM runtime optimizer might optimize on fixed data.
Observation: Benchmark.java reads the data file once into a byte array, runs the parser over and over on that byte array for 30 seconds, and then times how long it takes to "parse" the data for a fixed number of repetitions.
Critique: Since it is repeatedly parsing the same array a large number of times, in theory HotSpot could optimize the parser for that array. The reader can't tell whether the benchmark data is reliable.
Suggestion: A better benchmark might vary the data during the warmup phase so that the optimizations are not specific to the test data.
(Such fixed-data optimizations are a selling point for cases where a Java version of a program runs faster than the C version. If the FAQ answer [saying the C version is 5% slower] is based on this benchmark, that might be what is happening here.) http://vtd-xml.sourceforge.net/faq.html#How%20does%20the%20C%20version%20of%20VTD-XML%20compare%20with%20the%20Java%20Version
2. JVM runtime optimizer (or VTD) might eliminate unused computations.
Observation: Benchmark.java does not extract any of the data from the file.
Critique: The optimizer could in theory notice that the body of the loop does no I/O and no results are used, so it can be eliminated. The reader can't tell whether the benchmark data is reliable.
Critique: VTD could simply be delaying parsing and construction of objects until the caller navigates to them. This is a good idea, but it would make the benchmark unfair, because the timed loop would skip work that the caller eventually has to pay for. The reader can't tell whether the benchmark data is reliable.
Suggestion: A better benchmark would actually show that the parser must have parsed a significant part of the data, maybe by navigating to the last leaf (or every leaf), extracting the text, computing a checksum of the text (or its last character), and writing the checksum to console after the loop is over.
(Benchmark.java is from benchmark_1.6.zip; the current version is at http://www.ximpleware.com/code/benchmark.java)
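For what it's worth, here is a minimal sketch of how both suggestions might be combined in one harness. It is not taken from Benchmark.java. The names BenchSketch, makeDoc, parseAndChecksum and run are made up for illustration, and the JDK SAX parser is only a stand-in for the parser under test; a VTD-XML version would navigate to the leaves and feed their text into the checksum instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness, not the Benchmark.java from benchmark_1.6.zip.
class BenchSketch {

    // Stand-in for the parser under test (assumption: JDK SAX instead of VTD-XML).
    // Parses the document and checksums every text character, so the parse
    // result is actually consumed. Parser creation is included in the cost here;
    // a real benchmark would probably hoist it out of the timed loop.
    static long parseAndChecksum(byte[] doc) throws Exception {
        final CRC32 crc = new CRC32();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(doc), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                for (int i = start; i < start + length; i++) {
                    crc.update(ch[i] & 0xFF);   // low byte of the char
                    crc.update(ch[i] >>> 8);    // high byte of the char
                }
            }
        });
        return crc.getValue();
    }

    // Builds a small synthetic document whose size and text depend on the arguments.
    static byte[] makeDoc(int seed, int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">")
              .append(seed * 31 + i).append("</item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // Warms up on varying documents, then times repeated parses of testDoc.
    // Returns { elapsed nanoseconds, accumulated checksum }.
    static long[] run(byte[] testDoc, int warmupRounds, int timedRounds) throws Exception {
        long sink = 0;
        for (int i = 0; i < warmupRounds; i++) {
            // Different content and size each round, so the JIT is not
            // warmed up on the exact bytes that are timed later.
            sink += parseAndChecksum(makeDoc(i, 50 + i % 50));
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRounds; i++) {
            sink += parseAndChecksum(testDoc);
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { elapsed, sink };
    }

    public static void main(String[] args) throws Exception {
        int timedRounds = 500;
        long[] r = run(makeDoc(12345, 1000), 2000, timedRounds);
        System.out.println("ns per parse: " + r[0] / timedRounds);
        // Printing the checksum after the loop keeps the parse results live.
        System.out.println("checksum sink: " + r[1]);
    }
}
```

The checksum is accumulated in every round and printed only after timing stops, so the optimizer cannot drop the loop body as unused, and a lazy parser is forced to do the work of reaching the text. The round counts and document sizes are arbitrary placeholders.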
Has anyone run a benchmark without these problems?
|