JVM performance optimization, Part 2: Compilers

Use the right Java compiler for your Java application

Listing 3. Caller method

int whenToEvaluateZing(int y) {
   return daysLeft(y) + daysLeft(0) + daysLeft(y+1);
}

Listing 4. Called method

int daysLeft(int x){
   if (x == 0)
      return 0;
   else
      return x - 1;
}

Listing 5. Inlined method

int whenToEvaluateZing(int y){
   int temp = 0;
   
   if(y == 0) temp += 0; else temp += y - 1;
   if(0 == 0) temp += 0; else temp += 0 - 1;
   if(y+1 == 0) temp += 0; else temp += (y + 1) - 1;
   
   return temp; 
}

In Listings 3 through 5, the calling method makes three calls to a small method. For the sake of this example, we assume that inlining the small method is more beneficial than jumping to it three times.

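Inlining a method that is rarely called might not make much difference, but inlining a so-called "hot" method that is called frequently could make a huge difference in performance. Inlining also frequently makes way for further optimizations: once the method body is visible at the call site, the compiler can, for example, resolve the constant test 0 == 0 and simplify (y + 1) - 1 to y. Listing 6 shows the result.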

Listing 6. After inlining, more optimizations can be applied

int whenToEvaluateZing(int y){
   if(y == 0) return y;
   else if (y == -1) return y - 1;
   else return y + y - 1;
}
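As a quick sanity check, the three versions of the caller can be compared side by side. The following is a minimal sketch, not part of the original listings: the class name InliningCheck, the method names original, inlined and simplified, and the helper allAgree are hypothetical and chosen only for illustration. It confirms that the hand-inlined and simplified forms return the same values as the original for a range of inputs.

```java
// Sketch: compares Listings 3, 5 and 6 for equal results.
// All class and method names here are illustrative assumptions.
final class InliningCheck {

    // Listing 4: the small method being called.
    static int daysLeft(int x) {
        if (x == 0)
            return 0;
        else
            return x - 1;
    }

    // Listing 3: the caller with three real calls.
    static int original(int y) {
        return daysLeft(y) + daysLeft(0) + daysLeft(y + 1);
    }

    // Listing 5: each call replaced by the body of daysLeft.
    static int inlined(int y) {
        int temp = 0;
        if (y == 0) temp += 0; else temp += y - 1;
        if (0 == 0) temp += 0; else temp += 0 - 1;
        if (y + 1 == 0) temp += 0; else temp += (y + 1) - 1;
        return temp;
    }

    // Listing 6: the inlined code after further simplification.
    static int simplified(int y) {
        if (y == 0) return y;
        else if (y == -1) return y - 1;
        else return y + y - 1;
    }

    // Returns true if all three versions agree for every y in [from, to].
    static boolean allAgree(int from, int to) {
        for (int y = from; y <= to; y++) {
            int expected = original(y);
            if (inlined(y) != expected || simplified(y) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree(-1000, 1000)); // prints "true"
    }
}
```

Note that a JIT compiler performs these steps on compiled code at runtime; writing them out by hand in source form only illustrates the idea.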

Loop optimization

Loop optimization plays a big role in reducing the overhead of executing loops. Overhead in this case means expensive jumps, repeated checks of the loop condition, and a non-optimal instruction pipeline (i.e., an order of instructions that causes no-operations or extra cycles in the CPU). There are many kinds of loop optimizations. Notable ones include:

  • Combining loops: When two nearby loops iterate the same number of times, the compiler can try to combine their bodies so that they are executed together (in parallel), provided that nothing in one body references the other, i.e., the bodies are fully independent of each other.
  • Inverting loops: A regular while loop is replaced with a do-while loop that is set within an if clause. This replacement leads to two fewer jumps. However, it duplicates the condition check and hence increases the code size. This optimization is an excellent example of how using slightly more resources can lead to more efficient code, a cost-gain balance the compiler has to evaluate and decide on dynamically at runtime.
  • Tiling loops: Reorganizes the loop so that it iterates over blocks of data that are sized to fit in the cache.
  • Unrolling loops: Reduces the number of times the loop condition has to be evaluated and also the number of jumps. You can think of this as "inlining" several iterations of the body so that they execute without crossing the loop condition. Unrolling loops comes with risk, as it might decrease performance by impairing the pipeline and causing multiple redundant instruction fetches. Again, this is a judgment call for the compiler to make at runtime: if the gain is large enough, the cost may be worth it.
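The loop optimizations listed above can be sketched by hand in Java source. This is only an illustration under stated assumptions: a JIT compiler applies such transformations to compiled code at runtime, not to source, and the class name LoopSketches, the method names, the unroll factor of 4 and the tile parameter are all hypothetical choices made for this sketch.

```java
// Hand-written sketches of the loop transformations described above.
// Names, the unroll factor and the tile size are illustrative assumptions.
final class LoopSketches {

    // Before inversion: test at the top, jump back after every iteration.
    static int sumWhile(int[] a) {
        int i = 0, sum = 0;
        while (i < a.length) {
            sum += a[i];
            i++;
        }
        return sum;
    }

    // After inversion: one guarding if, then a do-while that tests at the bottom.
    static int sumInverted(int[] a) {
        int i = 0, sum = 0;
        if (i < a.length) {
            do {
                sum += a[i];
                i++;
            } while (i < a.length);
        }
        return sum;
    }

    // Unrolled by 4: the condition is checked once per four elements;
    // a cleanup loop handles any leftover elements.
    static int sumUnrolled(int[] a) {
        int sum = 0, i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Combined loops: two independent bodies share a single pass.
    // Assumes a and b have the same length.
    static void doubleAndIncrement(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] *= 2; // body of the first original loop
            b[i] += 1; // body of the second original loop
        }
    }

    // Tiled transpose of a square matrix: works on tile-by-tile blocks
    // so that each block is more likely to stay in the cache.
    static int[][] transposeTiled(int[][] m, int tile) {
        int n = m.length;
        int[][] t = new int[n][n];
        for (int ii = 0; ii < n; ii += tile) {
            for (int jj = 0; jj < n; jj += tile) {
                for (int i = ii; i < Math.min(ii + tile, n); i++) {
                    for (int j = jj; j < Math.min(jj + tile, n); j++) {
                        t[j][i] = m[i][j];
                    }
                }
            }
        }
        return t;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        // prints "28 28 28"
        System.out.println(sumWhile(data) + " " + sumInverted(data) + " " + sumUnrolled(data));
    }
}
```

Each transformed method returns the same result as its plain counterpart; whether it actually runs faster depends on the platform, which is exactly the cost-gain decision the compiler has to make.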

This has been an overview of what a compiler does at the bytecode level (and below) to improve an application's execution performance on a target platform. The optimizations discussed are common and popular, but they are only a brief sampling of the available options. These have been very simple and broad explanations, which hopefully serve to pique your interest in more in-depth exploration. See Resources for further reading.

In conclusion: Reflection points and highlights

Use different compilers for different needs.

  • Interpretation is the simplest form of bytecode translation to machine instructions, and works based on an instruction lookup table.
  • Compilers allow for optimization based on performance counters, but require some additional resources (code cache, optimization threads, etc.).
  • Client-side compilers improve the performance of executing code by an order of magnitude (5 to 10 times better) when compared to interpreted code.
  • Server-side compilers improve application performance by 30 percent to 50 percent over client-side compilers, but utilize more resources.
  • Tiered compilation provides the best of both worlds. Enable client compilation to get your code performing well quickly, and server compilation over time to make frequently called code execute even better.
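When choosing between compilers, it can help to confirm which one a running JVM actually reports. The following is a minimal sketch using the standard java.lang.management API; the class name JitInfo and the method describeJit are hypothetical. The reported name is JVM-specific, and on HotSpot the choice of compiler is typically influenced by launcher options such as -client, -server or -XX:+TieredCompilation, which is an assumption about your JVM rather than something this code controls.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch: reports the JIT compiler of the running JVM, if any.
// JitInfo and describeJit are illustrative names.
final class JitInfo {

    static String describeJit() {
        // Returns null if the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported";
        }
        StringBuilder sb = new StringBuilder(jit.getName());
        // Total compilation time is optional; check support before reading it.
        if (jit.isCompilationTimeMonitoringSupported()) {
            sb.append(", approximate compile time (ms): ")
              .append(jit.getTotalCompilationTime());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
    }
}
```

The output varies between JVM vendors, versions and launch options, so treat it as a diagnostic hint rather than a stable identifier.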

There are many possible code optimizations. An important task for the compiler is to analyze all possibilities and weigh the cost of using an optimization against the execution speed benefit of the output machine code.

About the author

Eva Andreasson has been involved with Java virtual machine technologies, SOA, cloud computing, and other enterprise middleware solutions for 10 years. She joined the startup Appeal Virtual Solutions (later acquired by BEA Systems) in 2001 as a developer of the JRockit JVM. Eva has been awarded two patents for garbage collection heuristics and algorithms. She also pioneered Deterministic Garbage Collection which later became productized through JRockit Real Time. Eva has worked closely with Sun and Intel on technical partnerships, as well as various integration projects of JRockit Product Group, WebLogic, and Coherence (post Oracle acquisition in 2008). In 2009 Eva joined Azul Systems as product manager for the new Zing Java Platform. Recently she switched gears and joined the team at Cloudera as senior product manager for Cloudera's Hadoop distribution, where she is engaged in the exciting future and innovation path of highly scalable, distributed data processing frameworks.

Resources
  • "JVM performance optimization, Part 1: A JVM technology primer" (Eva Andreasson, JavaWorld, August 2012) launches the JVM performance optimization series with an overview of how a classic Java virtual machine works, including Java's write-once, run-anywhere engine, garbage collection basics, and some common GC algorithms.
  • See "Watch your HotSpot compiler go" (Vladimir Roubtsov, JavaWorld.com, April 2003) for more about the mechanics of hotspot optimization and why it pays to warm up your compiler.
  • If you want to learn more about bytecode and the JVM, see "Bytecode basics" (Bill Venners, JavaWorld, 1996), which takes an initial look at the bytecode instruction set of the Java virtual machine, including primitive types operated upon by bytecodes, bytecodes that convert between types, and bytecodes that operate on the stack.
  • The Java compiler javac is fully discussed in the formal Java platform documentation.
  • For more of the basics of JVM (JIT) compilers, see the IBM Research Java JIT compiler page.
  • Also see Oracle JRockit's "Understanding Just-In-Time Compilation and Optimization" (Oracle® JRockit Introduction Release R28).
  • Dr. Cliff Click gives a complete tutorial on tiered compilation in his Azul Systems blog (July 2010).
  • Learn more about using performance counters for JVM performance optimization: "Using Platform-Specific Performance Counters for Dynamic Compilation" (Florian Schneider and Thomas R. Gross; Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, published by ACM Digital Library).
  • Oracle JRockit: The Definitive Guide (Marcus Hirt, Marcus Lagergren; Packt Publishing, 2010): A complete guide to the JRockit JVM.