Is your code ready for the next wave in commodity computing? Part 2

Measure the speed-up (or slow-down!) delivered by parallel hardware for your application

As an industry, we are obsessed with benchmarking, almost to a fault. Projects regularly allocate significant chunks of time to running vendor "beauty contests" for hardware and software (JDKs, IDEs, and so on). In my opinion, we do this for three main reasons:

  1. The landscape changes so often that assertions we hold to be self-evident need to be revalidated about every 18 months.
  2. It can never be too fast. If I cut corners on application code, having fast hardware is an insurance policy to help meet performance or scalability service-level agreements that the nice salesman promised the customer, including the service credit clawback clause.
  3. We like doing it, especially when it means getting a new kit to test for free.

With the current proliferation of ever-more parallelized hardware, benchmarking is a hot topic again. In Part 1 of this article, I detailed the basic precepts behind parallel hardware. In Part 2, I want to move on and examine how to measure how effectively that hardware is actually being utilized. This is a process fraught with difficulty—benchmarks, by their nature, are more often pilloried for what they don't measure, or what they measure inaccurately, than lauded for their impartiality. My benchmark will suffer the same fate. Nonetheless, after reading this article, you will see how I designed and implemented the framework, what I believe it measures (and doesn't), and its potential uses.

Benchmark definition

Before moving into the crux of the article, I want to first define what a benchmark is and then describe the scope of this article and associated code base (available for download from Resources) as measured against that definition: A benchmark is a test that allows multiple implementations targeted at a common purpose to be compared with each other in a repeatable, objective, and transparent manner.

Given this definition, I believe that there are two logical components to any benchmark test and, furthermore, that existing benchmarks do not keep these distinct components separate. These components are:

  • The infrastructure required to successfully execute a benchmark test
  • The test itself—this is always application-specific

This assumption is key to the rest of the article.

What does this article describe?

This article describes a software framework that facilitates the construction of multithreaded Java benchmarks. The primary focus of the framework is to enable the benchmarking of multicore CPUs from multiple vendors against a specific application or purpose by allowing the framework user to extend the framework.

Existing benchmarks

Before describing the benchmark framework I built for this article, I wanted to see what was already available. In a nutshell, nothing was suitable. The Resources section provides links to the projects I found so you can evaluate them independently. The problems I found with existing frameworks boil down to this list:

  • They are old/unmaintained
  • They are battlegrounds for vendors to compete in and hence must be considered unreliable
  • They tend to focus on scientific/engineering points of interest (floating point performance, for example)
  • They tend to focus on benchmarking JVMs against each other
  • They are very specific (the Volano benchmark, for example, is a chat server—great if you're building a chat server, inapplicable if you aren't)

These frameworks also fail to distinguish between the benchmark infrastructure and the test itself. The potential audience for a benchmark is the entire Java development population, and it is impossible to design a single test that is applicable to such a diverse constituency. But if we can separate what the benchmark runs from the plumbing of the benchmark itself, then the framework becomes applicable to anyone willing to extend it.
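
To make that separation concrete, here is a minimal sketch of the contract between the two components, inferred from the code snippets shown later in this article; the exact types and signatures in the downloadable source may differ. The plumbing (engine, metadata, run result) stays fixed, while Task, TaskResult, and Analyser are the points you implement for your own application.

    // Sketch only: shapes inferred from the snippets later in this article.
    import java.util.concurrent.Callable;

    // Application-specific: the unit of work the benchmark repeats.
    // The engine submits tasks to a thread pool, so a Callable fits naturally.
    interface Task extends Callable<TaskResult> { }

    // Application-specific: whatever a single task run produces
    // (timings, checksums, return codes, and so on).
    interface TaskResult { }

    // Application-specific: collates TaskResults into a final verdict.
    interface Analyser {
       void addTaskResult(TaskResult result);
       RunResult analyse();
    }

    // Framework plumbing: stays the same no matter what you benchmark.
    interface RunResult { }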

Benchmark design

Schematic detailing the major components of the benchmark framework.

As the diagram above illustrates, I have deliberately kept the framework simple. Specifically, I want the framework to be easy to understand and extend, and I want to minimize the overhead the framework itself adds when testing the target hardware and software platform.

Benchmark implementation

The benchmark is made significantly simpler through the use of Java Platform, Standard Edition 6.0 and higher features, in particular the additions to the concurrency API. For more details on concurrency in the Java platform, check out the Java Concurrency in Practice website (and book) linked in the Resources section. It is the seminal work in its field, in my opinion.

No platform-specific features are utilized, although if you extend the framework, you are free to use them to squeeze your target platform to the limit. The framework should run "out of the box" on all platforms, specifically Windows, Unix, and Linux. As a quick side point, if DTrace (see Resources) were a platform-independent tool, it would be a contender for use as a component of the implementation.

You can download, build, and run the benchmark framework source code from the Resources section.

How the framework works

The following code snippet details how simple the framework is. Once you understand this main method, you basically understand the framework itself.

    public static void main(String[] args) throws Exception {

      LoadTestEngine te = new LoadTestEngine();

      // Holds all data relating to how we want the load test executed
      LoadTestMetaData ltmd = new LoadTestMetaData();

      // Object to analyze the output of the load test
      Analyser ra = new LinpackRunAnalyser();
      ltmd.setAnalyser(ra);

      // Figure out how many threads we want to run;
      // use the reported number of cores plus one
      int numThreads = Runtime.getRuntime().availableProcessors() + 1;
      ltmd.setNumThreads(numThreads);

      // Now set the class we want to form the basis of each individual Task
      ltmd.setTaskClass(SimpleTask.class);
      //ltmd.setTaskClass(LinpackTask.class);

      // How many times we want the task executed
      ltmd.setNumIterations(1000);

      // We need to be able to survive any tasks that fail to complete,
      // so we set a cutoff point after which we will not wait any longer
      ltmd.setTaskTimeout(10000);

      te.setLoadTestMetaData(ltmd);

      long startTime = System.currentTimeMillis();
      RunResult rr = te.runLoadTest();
      long elapsedTime = System.currentTimeMillis() - startTime;

      System.out.println(rr);

      System.out.println("Engine output BEGIN");
      System.out.println("Total elapsed time (ms): " + elapsedTime);
      System.out.println("Engine output END");

      te.stop();
    }
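
The source of SimpleTask isn't listed here, but based on how the engine instantiates and submits tasks, an application-specific task looks roughly like the following hypothetical example, written against the interface shapes sketched earlier; the real SimpleTask and LinpackTask classes ship with the downloadable source and may differ in detail.

    // Hypothetical task; not part of the framework download.
    public class HypotheticalCpuTask implements Task {

       // Minimal result carrier for this sketch; the framework's own
       // TaskResult implementations will be richer.
       static class CpuTaskResult implements TaskResult {
          final long elapsedNanos;
          final double checksum;
          CpuTaskResult(long elapsedNanos, double checksum) {
             this.elapsedNanos = elapsedNanos;
             this.checksum = checksum;
          }
       }

       // The engine creates tasks reflectively via newInstance(),
       // so a public no-argument constructor is required
       public HypotheticalCpuTask() { }

       public TaskResult call() throws Exception {
          long start = System.nanoTime();
          double acc = 0.0;
          // The workload you actually want to measure goes here;
          // this loop just burns a predictable amount of CPU
          for (int i = 1; i <= 1000000; i++) {
             acc += Math.sqrt(i);
          }
          return new CpuTaskResult(System.nanoTime() - start, acc);
       }
    }

Plugging such a class in is then a one-line change in the main method above: ltmd.setTaskClass(HypotheticalCpuTask.class).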

But the real work is done in LoadTestEngine.java, specifically this snippet:

      // First, fire off the tasks
      for (int i = 0; i < numTasks; i++) {
         try {
            Task t = (Task) ltmd.getTaskClass().newInstance();
            // Allow framework extenders to return their own subtype of TaskResult
            Future<? extends TaskResult> f = pool.submit(t);
            futures[i] = f;
         } catch (InstantiationException e1) {
            throw new LoadTestException(e1);
         } catch (IllegalAccessException e) {
            throw new LoadTestException(e);
         }
      }

      // Now collate the results
      for (int i = 0; i < numTasks; i++) {
         try {
            TaskResult tr = futures[i].get(ltmd.getTaskTimeout(),
                  TimeUnit.MILLISECONDS);
            ra.addTaskResult(tr);
         } catch (TimeoutException e) {
            e.printStackTrace();
         } catch (ExecutionException e) {
            e.printStackTrace();
         } catch (InterruptedException e) {
            e.printStackTrace();
         }
      }
      return ra.analyse();

This code uses a pool of threads managed by an ExecutorService to execute the workload I have defined, then gathers the results via the Future.get() method for final analysis.
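
If you strip away the framework classes, the underlying pattern is standard java.util.concurrent: size a fixed thread pool, submit Callable tasks, and collect each Future with a timeout. The following self-contained sketch, which is not part of the framework, shows that same pattern in isolation:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class PoolPatternSketch {
       public static void main(String[] args) throws Exception {
          // Size the pool the same way the framework does: cores plus one
          int numThreads = Runtime.getRuntime().availableProcessors() + 1;
          ExecutorService pool = Executors.newFixedThreadPool(numThreads);

          // Fire off the work: each Callable returns its own elapsed time
          List<Future<Long>> futures = new ArrayList<Future<Long>>();
          for (int i = 0; i < 100; i++) {
             futures.add(pool.submit(new Callable<Long>() {
                public Long call() {
                   long start = System.nanoTime();
                   double acc = 0.0;
                   for (int j = 1; j <= 500000; j++) {
                      acc += Math.sqrt(j);
                   }
                   // Use acc so the loop cannot simply be optimized away
                   return acc > 0 ? System.nanoTime() - start : -1L;
                }
             }));
          }

          // Collate the results, refusing to wait forever for any one task
          long totalNanos = 0;
          for (Future<Long> f : futures) {
             try {
                totalNanos += f.get(10, TimeUnit.SECONDS);
             } catch (TimeoutException e) {
                f.cancel(true);
             }
          }
          System.out.println("Sum of per-task times (ns): " + totalNanos);
          pool.shutdown();
       }
    }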

Strengths of the framework

  • The framework is very simple: you should be able to understand how it operates, right down to the code level, in 10 minutes or less.
  • The framework is designed to be extended by you. If you want to test Spring versus Struts/WebWork, for example, extending the framework gives you the ability to do so objectively.
  • The framework allows you to test for application correctness in a multithreaded environment.
  • Once extended, the framework can be integrated into a continuous integration environment and used to ensure that an application in development continues to meet stability, scalability, and performance criteria (see the sketch after this list).
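
As a sketch of that last point, the hypothetical JUnit 4 test below (assuming JUnit 4 is on the classpath; it is not part of the framework download) drives the engine exactly as the main method above does and fails the build if the run blows an agreed time budget:

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class LoadTestRegressionTest {

       @Test
       public void applicationStillMeetsItsTimingBudget() throws Exception {
          LoadTestEngine te = new LoadTestEngine();
          LoadTestMetaData ltmd = new LoadTestMetaData();

          ltmd.setAnalyser(new LinpackRunAnalyser());  // or your own Analyser
          ltmd.setNumThreads(Runtime.getRuntime().availableProcessors() + 1);
          ltmd.setTaskClass(SimpleTask.class);         // or your own Task
          ltmd.setNumIterations(1000);
          ltmd.setTaskTimeout(10000);
          te.setLoadTestMetaData(ltmd);

          long start = System.currentTimeMillis();
          RunResult rr = te.runLoadTest();
          long elapsed = System.currentTimeMillis() - start;
          System.out.println(rr);
          te.stop();

          // Fail the build if the run regresses past an agreed budget; a real
          // test would also interrogate rr for the correctness data gathered
          // by the Analyser
          assertTrue("Load test exceeded its 60s budget: " + elapsed + "ms",
                elapsed < 60000);
       }
    }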

Failings of the framework

  • The framework is simple. If you're tasked with understanding how a specific release of a specific framework (e.g., Spring R1.2) performs, then at best, this framework is a shell for you to use.
  • The framework (by default) measures exactly what it has been coded to measure. While this may sound obvious, the devil is in the details. Two benchmarks might both accurately be described as testing the ability of a given application to scale in a multithreaded Java environment on a specific hardware profile, yet in reality measure completely different aspects of a system. Put another way, both are micro-benchmarks, and the results of a micro-benchmark taken in isolation are always misleading.

This micro-benchmark point was made at JavaOne by David Dagastine, Brian Doherty, and Paul Hohensee at their BOF "Java Technology-Based Performance on Multithreaded Hardware Systems." Unfortunately, the slides for JavaOne BOFs aren't available online. Nonetheless, the Resources section contains links to the presenters' blogs, with a lot of useful information on benchmarking and also performance tuning.

Resources