Java performance geek Aleksey Shipilёv doesn't just work on tools that help developers to write and run better benchmarks; he also takes on the meta side of performance measurement. A good benchmark shouldn't be used to "prove a point in a holy war" or as a marketing tool, he says. Instead, good benchmarks strive to understand the whole system:
The exact numbers don’t usually matter there; it is important to see through them and understand why those numbers are arranged in that particular way [...] Benchmarks, in this parlance, are the tools which isolate and quantify the behavior in a particular lab environment.
-- Interview with Martijn Verburg, jClarity, May 2014
As a demonstration, Shipilёv recently investigated a performance difference between Java and Scala, where Scala was consistently performing slower than Java in for-loop computations. His careful benchmarking doesn't reveal much about which language is faster, but it does showcase the strengths of his Java Microbenchmark Harness (JMH). It also winds up being an excellent tutorial on the pitfalls of cross-language benchmarking:
Comparing two implementations in otherwise the same environment is hard on its own. Comparing two implementations coming from two different ecosystems, when both implementations have experienced quite a few transformations before reaching the execution units on your CPU, is very hard. The minute differences the code encounters while going through its own life-cycle can drastically affect its performance, ruining the comparison.

That is why cross-language benchmarks should include a thorough analysis of what is going on. Superficial conclusions almost always feed on existing biases, and are almost always wrong. Blindly believing some optimizations work the way you expect them to work is unwarranted. The juxtapositions and interference of optimizations may lead to completely surprising outcomes.
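To give a flavor of the harness, here is a minimal JMH benchmark for the kind of for-loop summation at issue. This is an illustrative sketch, not code from Shipilёv's post; the class name, array size, and measurement settings are assumptions.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LoopSum {
    int[] data;

    @Setup
    public void setup() {
        // Hypothetical workload: sum a modest array, as in the
        // for-loop computations the article describes.
        data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
    }

    @Benchmark
    public int sum() {
        int acc = 0;
        for (int v : data) {
            acc += v;
        }
        // Returning the result hands it to JMH's Blackhole,
        // which keeps dead-code elimination from deleting the loop.
        return acc;
    }
}
```

Note the details JMH handles for you here: warmup iterations so the JIT compiler has settled, and consumption of the return value so the optimizer cannot discard the measured work entirely. These are exactly the life-cycle effects Shipilёv warns can silently ruin a comparison.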
See the JMH benchmarking demo at Java vs. Scala: Divided We Fail ...