Watch your HotSpot compiler go

Don't start profiling your Java application too early

April 11, 2003

Q: In "My Kingdom for a Good Timer!," what is the purpose of this part of the sample code:

// JIT/HotSpot warmup:
for (int i = 0; i < 3000; ++ i)
{
    timer.start ();
    timer.stop ();
    timer.getDuration ();
    timer.reset ();
}

A: The short answer is the above code snippet "warms up" the JVM with respect to the ITimer implementation. Let's investigate what I mean by that in more detail.

It's easier to illustrate my point with a specially crafted code snippet:

public class Main
{
    public static void main (String [] args)
    {
        // Create an ITimer using the Factory class:
        final ITimer timer = TimerFactory.newTimer ();
        
        for (int repeat = 0; repeat < 25; ++ repeat)
        {
            timer.start ();
            sum (100);
            timer.stop ();
            
            System.out.println (repeat + ": " + timer.getDuration ());
            timer.reset ();
        }
    }
    
    public static int sum (int n)
    {
        if (n <= 1)
            return 1;
        else
            return n + sum (n - 1);
    }
} // End of class

As you can see, I use the ITimer high-resolution API once again, this time to measure the performance of a test method called sum(): a deliberately naïve recursive implementation of the sum of numbers 1+2+...+n (an arithmetic progression). You should never use such an implementation in a real application, but it suits my discussion precisely because of its high method call overhead.
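For contrast, a real application would use the closed-form formula n(n+1)/2 instead of recursion; a minimal sketch (the class name is mine):

```java
public class SumClosedForm
{
    // closed form of the arithmetic progression 1 + 2 + ... + n:
    public static int sum (int n)
    {
        return n * (n + 1) / 2;
    }

    public static void main (String [] args)
    {
        System.out.println (sum (100)); // prints 5050
    }
}
```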

The above code snippet invokes sum(100) 25 times and logs the cost of every call. The results from running in Java 2 Platform, Standard Edition (J2SE) 1.4.1 for Win32 are quite interesting:

0: 0.08888
1: 0.04755
2: 0.03448
3: 0.03462
4: 0.03432
5: 0.03437
6: 0.03440
7: 0.03460
8: 0.03441
9: 0.03445
10: 0.0344
11: 0.0343
12: 0.0343
13: 0.0345
14: 0.8865
15: 0.0444

16: 0.0107
17: 0.0103
18: 0.0102
19: 0.0102
20: 0.0102
21: 0.0103
22: 0.0101
23: 0.0103
24: 0.0103

Notice that something strange happens around the 15th repeat: the method slows down for a moment and then speeds up by a factor of about 3.5. "It's just a random garbage collection (GC) delay," I hear some seasoned Java programmers mutter. Except it is not: a sporadic GC run would not explain why sum() gets faster afterwards and stays that way (you can verify this by making the upper limit of the repeat loop much larger than 25). Furthermore, you can repeat this experiment numerous times, and the phenomenon always occurs at about the same point in the program's execution.

What is going on?

The riddle has a simple explanation. A call to sum(n) implies n total calls to the method; n-1 of them are recursive. After 15 calls to sum(100), the method body executes a grand total of 1,500 times. As it happens, my code executes in the client HotSpot JVM by default, and 1,500 is the default method invocation threshold for when a method is considered for compilation into native code. Execution slows down while HotSpot spends extra CPU cycles to compile sum() into native code and then speeds up because the JVM switches to the method's natively compiled, and thus noticeably faster, version.
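The arithmetic is easy to check with a few lines of code (the 100 calls per repeat and the 1,500 threshold come straight from the discussion above):

```java
public class ThresholdMath
{
    public static void main (String [] args)
    {
        final int callsPerRepeat = 100; // each sum(100) call invokes sum() 100 times
        final int threshold = 1500;     // default client HotSpot compile threshold

        // find the repeat during which the cumulative invocation
        // count reaches the compile threshold:
        int total = 0;
        for (int repeat = 0; repeat < 25; ++ repeat)
        {
            total += callsPerRepeat;
            if (total >= threshold)
            {
                System.out.println ("threshold crossed during repeat " + repeat);
                break;
            }
        }
    }
}
```

This prints "threshold crossed during repeat 14", which is exactly where the timing output above shows the slowdown.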

For further proof you can add the -XX:+PrintCompilation HotSpot option (see Resources) to your JVM command line and watch the HotSpot engine tell you when it compiles various methods:

... other methods from core libraries get compiled ...
0: 0.08118
1: 0.03642
2: 0.03460
3: 0.03468
4: 0.03470
5: 0.03439
6: 0.03440
7: 0.03463
8: 0.03441
9: 0.03460
10: 0.0348
11: 0.0345
12: 0.0342
  9   b   java.lang.String::indexOf (74 bytes)
13: 0.0350
 10   b   Main::sum (16 bytes)
14: 1.4324
15: 0.0469

16: 0.0103
17: 0.0102
18: 0.0101
19: 0.0101
20: 0.0100
21: 0.0104
22: 0.0102
23: 0.0104
24: 0.0104

At this point, I can at last give the deeper reason behind the code snippet in the question that started this article: in a loop of 3,000 iterations (greater than the default method invocation threshold of a client HotSpot JVM), I exercise all ITimer methods so that they get compiled into native code and/or inlined. Although I cheat HotSpot a little, this ultimately gives me a small boost in timing resolution and improves profiling accuracy when measuring short time intervals. If you remove the call to sum() from the above sample code and change the loop repeat count to, say, 2,000, so that timer.start() and timer.stop() execute back to back many times, you will observe the expected phenomenon around the 1,500th line of the output:

...
1492: 0.00283
1493: 0.00271
1494: 0.00271
1495: 0.00267
1496: 0.00271
1497: 0.00272
1498: 0.00284
1499: 2.84138
1500: 0.00724

1501: 0.00207
1502: 0.00219
1503: 0.00196
1504: 0.00203
1505: 0.00203
1506: 0.00203
1507: 0.00203
...
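If you do not have the article's native ITimer implementation at hand, you can approximate this experiment with a self-contained sketch; substituting System.nanoTime() for timer.start()/timer.stop() is my assumption here (that method did not exist in J2SE 1.4):

```java
public class TimerWarmup
{
    public static void main (String [] args)
    {
        // start and stop back to back: we measure the timer's own overhead
        for (int repeat = 0; repeat < 2000; ++ repeat)
        {
            final long start = System.nanoTime (); // stands in for timer.start()
            final long stop  = System.nanoTime (); // stands in for timer.stop()

            System.out.println (repeat + ": " + ((stop - start) / 1.0E6));
        }
    }
}
```

The absolute numbers will differ from the article's, but the drop in per-call overhead once the loop body gets compiled should still be visible.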

The improvement in timer resolution is about 30 percent: not mind-bogglingly large, but easy to achieve. Therefore, I make it a rule to always condition my ITimer implementation this way when I use it for very fine measurements.

Lessons learned

Together we have seen that a given piece of Java code will execute at very different speeds during an application's lifetime. Initially, it might execute in interpreted mode; if it executes enough times, it becomes a candidate for compilation to native code.

For a Java program with high uptime, such as an application server, most methods will be translated into native code sooner or later. In this state, profiling gives a realistic picture of the application's performance. Profiling data collected early in the application's lifetime should either be discarded or not collected at all (unless you are specifically profiling your application's startup behavior). Unfortunately, I suspect that most programmers do not take this into proper consideration. Sun Microsystems' server HotSpot JVM has a larger default method invocation threshold than the client JVM (10,000 instead of 1,500; see Resources), and it might take minutes, if not hours, to properly warm up the JVM.

As an experiment, you might consider shortening the warm-up period by manipulating the -XX:CompileThreshold JVM option, although you will soon discover that making this threshold too small will delay your application's startup, as HotSpot begins compiling just about every method it discovers into native code, including methods in the core Java libraries.
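On JVMs later than the one used in this article, you can also query the effective threshold at runtime through the HotSpot diagnostic MXBean (an API that postdates J2SE 1.4; the class name below is mine, and the sketch assumes a HotSpot JVM):

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class ShowCompileThreshold
{
    public static void main (String [] args)
    {
        // read the effective -XX:CompileThreshold value from the running JVM:
        final HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean (HotSpotDiagnosticMXBean.class);

        System.out.println (bean.getVMOption ("CompileThreshold").getValue ());
    }
}
```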

Vladimir Roubtsov has programmed in a variety of languages for more than 13 years, including Java since 1995. Currently, he develops enterprise software as a senior developer for Trilogy in Austin, Texas.
