As an industry, we are obsessed with benchmarking, almost to a fault. Projects regularly allocate significant chunks of time to running vendor "beauty contests" for hardware and software (JDKs, IDEs, and the like). In my opinion, we do this for three main reasons.
With the current proliferation of ever-more-parallel hardware, benchmarking is a hot topic again. In Part 1 of this article, I detailed the basic precepts behind parallel hardware. In Part 2, I want to move on and examine how to measure how effectively this hardware is being utilized. This is a process fraught with difficulty: benchmarks, by their nature, are more often pilloried for what they don't measure, or what they measure inaccurately, than lauded for their impartiality. My benchmark will suffer the same fate. Nonetheless, after reading this article, you will see how I designed and implemented the framework, what I believe it measures (and doesn't measure), and its potential uses.
Before moving into the crux of the article, I first want to define what a benchmark is, and then describe the scope of this article and its associated code base (available for download from Resources) as measured against that definition. A benchmark is a test that allows multiple implementations targeted at a common purpose to be compared with one another in a repeatable, objective, and transparent manner.
Given this definition, I believe that there are two logical components to any benchmark test, and furthermore that these distinct concerns are not separated in existing benchmarks. These components are:

- The benchmark plumbing: the machinery that spawns and times the work, handles failures, and collates the results.
- The workload: the code that is actually executed and measured, which varies with the application or purpose being benchmarked.

The assumption that these two components can, and should, be separated is key to the rest of the article.
This article describes a software framework that facilitates the construction of multithreaded Java benchmarks. The primary focus of the framework is to enable multicore CPUs from multiple vendors to be benchmarked against a specific application or purpose, which you achieve by extending the framework with a workload of your own.
Before describing the benchmark framework I built for this article, I wanted to see what was already available. In a nutshell, nothing was suitable. The Resources section provides links to the projects I found so you can evaluate them independently. The central problem with the existing frameworks is that they do not distinguish between the benchmark plumbing and the test it runs. Put another way, the potential audience for a benchmark is the entire Java development population, and it is impossible to design a single workload that is applicable to such a diverse constituency. But if we separate what the benchmark runs from the plumbing of the benchmark itself, then the framework remains applicable to anyone prepared to extend it.
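To make that separation concrete, here is a minimal sketch, independent of my framework, of the idea in plain Java: the plumbing knows only about a workload interface, and any workload can be swapped in without touching the harness. The class and method names below are purely illustrative and are not part of the downloadable source.

import java.util.concurrent.Callable;

// Illustrative only: the workload is expressed as a Callable, so the
// harness never needs to know what it is actually measuring.
public class SeparationSketch {

    // The "plumbing": runs any workload a fixed number of times and
    // reports the total elapsed time in milliseconds.
    static long timeWorkload(Callable<?> workload, int iterations) throws Exception {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            workload.call();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        // The "workload": swap in anything else that implements Callable
        // without changing timeWorkload() at all.
        Callable<Double> workload = new Callable<Double>() {
            public Double call() {
                double sum = 0;
                for (int i = 1; i <= 100000; i++) {
                    sum += Math.sqrt(i);
                }
                return sum;
            }
        };
        System.out.println("Elapsed (ms): " + timeWorkload(workload, 1000));
    }
}

My framework applies exactly this split, but with a thread pool in the plumbing and richer result handling, as the rest of the article shows.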
Figure: Schematic detailing the major components of the benchmark framework.
As the diagram above illustrates, I have deliberately kept the framework simple. Specifically, I want the framework to be easy to understand and extend, and I want to minimize the overhead it adds to any test of the target hardware and software platform.
The benchmark is made significantly simpler through the use of Java Platform, Standard Edition 6.0 and higher features, in particular the additions to the concurrency API. For more details on concurrency in the Java platform, check out the Java Concurrency in Practice Website (and book) link in the Resources section. It is the seminal work in its field, in my opinion.
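If you haven't used those additions before, the following self-contained example shows the java.util.concurrent pieces my framework leans on: a fixed-size thread pool, Callable tasks, and Future.get() with a timeout. It is illustrative only and is not part of the framework source.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ConcurrencyBasics {
    public static void main(String[] args) throws Exception {
        // A pool sized to the number of processors the JVM reports
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Submit some Callable tasks; each returns its result through a Future
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (int i = 0; i < 8; i++) {
            futures.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    for (int j = 0; j < 1000000; j++) {
                        sum += j;
                    }
                    return sum;
                }
            }));
        }

        // Collect the results, refusing to wait more than five seconds for any one task
        for (Future<Long> f : futures) {
            System.out.println(f.get(5, TimeUnit.SECONDS));
        }
        pool.shutdown();
    }
}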
No platform-specific features are used, although if you extend the framework, you are free to add them to squeeze your target platform to the limit. The framework should run "out of the box" on all platforms, specifically Windows, Unix, and Linux. As a quick side point, if DTrace (see Resources) were a platform-independent tool, it would be a contender for use as a component of the implementation.
You can download, build, and run the benchmark framework source code from the Resources section.
The following code snippet details how simple the framework is. Once you understand this main method, you basically understand the framework itself.
public static void main(String[] args) throws Exception {
    LoadTestEngine te = new LoadTestEngine();

    // Holds all data relating to how we want the load test executed
    LoadTestMetaData ltmd = new LoadTestMetaData();

    // Object to analyze the output of the load test
    Analyser ra = new LinpackRunAnalyser();
    ltmd.setAnalyser(ra);

    // Figure out how many threads we want to run;
    // use the number of processors the JVM reports, plus one
    int numThreads = Runtime.getRuntime().availableProcessors() + 1;
    ltmd.setNumThreads(numThreads);

    // Now set the class we want to form the basis of each individual Task
    //ltmd.setTaskClass(LinpackTask.class);
    ltmd.setTaskClass(SimpleTask.class);

    // How many times we want the task executed
    ltmd.setNumIterations(1000);

    // We need to be able to survive any tasks that fail to complete,
    // so we set a cutoff point after which we will not wait any longer for completion
    ltmd.setTaskTimeout(10000);

    te.setLoadTestMetaData(ltmd);

    long startTime = System.currentTimeMillis();
    RunResult rr = te.runLoadTest();
    long elapsedTime = System.currentTimeMillis() - startTime;

    System.out.println(rr);
    System.out.println("Engine output BEGIN");
    System.out.println("Total elapsed time (ms): " + elapsedTime);
    System.out.println("Engine output END");

    te.stop();
}
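The main method references SimpleTask (and, commented out, LinpackTask), both of which ship with the downloadable source. If you want to plug in a workload of your own, a class of roughly the following shape should fit: judging by the engine code shown next, a Task needs a no-argument constructor (it is created reflectively) and must produce a TaskResult when run. The exact Task and TaskResult APIs live in the downloadable source, so treat the details below as assumptions rather than the framework's actual contract.

// A sketch of a custom workload. Task and TaskResult are framework types from
// the downloadable source; whether Task is an interface or a base class, and
// how a TaskResult is populated, are assumptions made for illustration.
public class PrimeCountTask implements Task {

    // The engine instantiates tasks reflectively, so the implicit
    // no-argument constructor is all that is required here.

    public TaskResult call() throws Exception {
        // The workload proper: count the primes below a fixed bound
        int count = 0;
        for (int candidate = 2; candidate < 50000; candidate++) {
            boolean prime = true;
            for (int d = 2; d * d <= candidate; d++) {
                if (candidate % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                count++;
            }
        }
        // Package whatever measurements your Analyser expects (a timing, the
        // count above, and so on); the real constructor and accessors of
        // TaskResult are in the downloadable source.
        return new TaskResult();
    }
}

You would then point the metadata at it with ltmd.setTaskClass(PrimeCountTask.class) and pair it with an Analyser that understands its results.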
But the real work is done in LoadTestEngine.java, specifically this snippet:
// First, fire off the tasks
for (int i = 0; i < numTasks; i++) {
    try {
        Task t = (Task) ltmd.getTaskClass().newInstance();
        // Allow framework extenders to return their own subtype of TaskResult
        Future<? extends TaskResult> f = pool.submit(t);
        futures[i] = f;
    } catch (InstantiationException e1) {
        throw new LoadTestException(e1);
    } catch (IllegalAccessException e) {
        throw new LoadTestException(e);
    }
}

// Now collate the results
for (int i = 0; i < numTasks; i++) {
    try {
        TaskResult tr = futures[i].get(ltmd.getTaskTimeout(), TimeUnit.MILLISECONDS);
        ra.addTaskResult(tr);
    } catch (TimeoutException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

return ra.analyse();
This code uses a pool of threads managed by an ExecutorService to execute the workload I have defined, then gathers each result via Future.get(), bounded by the configured task timeout, and passes it to the Analyser for final analysis.
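The Analyser (a LinpackRunAnalyser in the main method above) is where per-task measurements become a single benchmark figure, returned as the RunResult. The real implementations are in the downloadable source; purely as an illustration, an analyser that averaged a per-task timing might look something like this, where getElapsedMillis() is a hypothetical accessor and the no-argument RunResult constructor is an assumption.

import java.util.ArrayList;
import java.util.List;

// Illustration only: the real Analyser interface and its implementations live
// in the downloadable source. This sketch assumes the interface matches the two
// calls LoadTestEngine makes (addTaskResult and analyse) and that a TaskResult
// exposes an elapsed time via a hypothetical getElapsedMillis() method.
public class MeanTimeAnalyser implements Analyser {

    private final List<TaskResult> results = new ArrayList<TaskResult>();

    public void addTaskResult(TaskResult tr) {
        results.add(tr);
    }

    public RunResult analyse() {
        long total = 0;
        for (TaskResult tr : results) {
            total += tr.getElapsedMillis(); // hypothetical accessor
        }
        double mean = results.isEmpty() ? 0 : (double) total / results.size();
        System.out.println("Mean task time (ms): " + mean);
        // How a RunResult is actually built is framework-specific; see the source download
        return new RunResult(); // assumes a no-argument constructor
    }
}

You would install such an analyser with ltmd.setAnalyser(new MeanTimeAnalyser()), just as the main method installs the LinpackRunAnalyser.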
This micro-benchmark point was made at JavaOne by David Dagastine, Brian Doherty, and Paul Hohensee at their BOF "Java Technology-Based Performance on Multithreaded Hardware Systems." Unfortunately, the slides for JavaOne BOFs aren't available online. Nonetheless, the Resources section contains links to the presenters' blogs, with a lot of useful information on benchmarking and also performance tuning.