Diagnose common runtime problems with hprof

Track down the culprits behind your application failures

Memory leaks and deadlocks and CPU hogs, oh my! Java application developers often face these runtime problems. They can be particularly daunting in a complex application with multiple threads running through hundreds of thousands of lines of code -- an application you can't ship because it grows in memory, becomes inactive, or gobbles up more CPU cycles than it should.

It's no secret that Java profiling tools have had a long way to go to catch up with their counterparts for other languages. Many powerful tools now exist to help us track down the culprits behind those common problems. But how do you develop confidence in your ability to use these tools effectively? After all, you're using the tools to diagnose complex behavior you don't understand. To compound your plight, the data these tools provide is fairly complex, and it isn't always clear what you're looking at -- or looking for.

When faced with similar problems in my previous incarnation as an experimental physicist, I created control experiments with predictable results. This helped me gain confidence in the measurement system I used in experiments that generated less predictable results. Similarly, this article uses the hprof profiling tool to examine three simple control applications that exhibit the three common problem behaviors listed above. While not as user-friendly as some commercial tools on the market, hprof is included with the Java 2 JDK and, as I'll demonstrate, can effectively diagnose these behaviors.

Run with hprof

Running your program with hprof is easy. Simply invoke the Java runtime with the following command-line option, as described in the JDK tool documentation for the Java application launcher:

java -Xrunhprof[:help][:<suboption>=<value>,...] MyMainClass

A list of suboptions is available with the [:help] option shown. I generated the examples in this article using the Blackdown port of the JDK 1.3-RC1 for Linux with the following launch command:

java -classic -Xrunhprof:heap=sites,cpu=samples,depth=10,monitor=y,thread=y,doe=y MemoryLeak

The following list explains the function of each suboption used in the previous command:

  • heap=sites: Tells hprof to generate stack traces indicating where memory was allocated
  • cpu=samples: Tells hprof to use statistical sampling to determine where the CPU spends its time
  • depth=10: Tells hprof to show stack traces 10 levels deep, at most
  • monitor=y: Tells hprof to generate information on monitor contention -- the monitors threads use to synchronize their work
  • thread=y: Tells hprof to identify threads in stack traces
  • doe=y: Tells hprof to produce a dump of profiling data upon exit

If you use JDK 1.3, you need to turn off the default HotSpot compiler with the -classic option. HotSpot has its own profiler, invoked through an -Xprof option, that uses an output format different from the one I'll describe here.

Running your program with hprof will leave a file called java.hprof.txt in your working directory; this file contains the profiling information collected while your program runs. You can also generate a dump at any time while your program is running by pressing Ctrl-\ in your Java console window on Unix or Ctrl-Break on Windows.

Anatomy of an hprof output file

An hprof output file includes sections describing various characteristics of the profiled Java program. It starts with a header that describes its format, which the header claims is subject to change without notice.

The output file's Thread and Trace sections help you figure out what threads were active when your program ran and what they did. The Thread section provides a list of all threads started and terminated during the program's life. The Trace section includes a list of numbered stack traces for some threads. These stack trace numbers are cross-referenced in other file sections.

The Heap Dump and Sites sections help you analyze memory usage. Depending on the heap suboption you choose when you start the virtual machine (VM), you can get a dump of all live objects in the Java heap (heap=dump) and/or a sorted list of allocation sites that identifies the most heavily allocated objects (heap=sites).

The CPU Samples and CPU Time sections help you understand CPU utilization; the section you get depends on your cpu suboption (cpu=samples or cpu=time). CPU Samples provides a statistical execution profile. CPU Time includes measurements of how many times a given method was called and how long each method took to execute.

The Monitor Time and Monitor Dump sections help you understand how synchronization affects your program's performance. Monitor Time shows how much time your threads experience contention for locked resources. Monitor Dump is a snapshot of monitors currently in use. As you'll see, Monitor Dump is useful for finding deadlocks.
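To see what feeds the Monitor Time section, here is a minimal sketch of my own (not one of this article's three test programs) in which two threads contend for a single lock; run with -Xrunhprof:monitor=y, the contention on this one monitor would show up attributed to the synchronized block:

```java
/** Minimal monitor-contention sketch: two threads share one lock. */
public class MonitorDemo {
    private static final Object lock = new Object();
    private static int counter = 0;

    /** Runs two threads that both increment counter under the same monitor. */
    static int runContention() throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                synchronized (lock) {   // contended monitor: hprof can attribute wait time here
                    counter++;
                }
            }
        };
        Thread a = new Thread(work, "A");
        Thread b = new Thread(work, "B");
        a.start();
        b.start();
        a.join();
        b.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("counter = " + runContention());  // 2000
    }
}
```

Because every increment happens inside the synchronized block, the final count is deterministic even though the interleaving is not.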

Diagnose a memory leak

In Java, I define a memory leak as a (usually) unintentional failure to release references to discarded objects, so that the garbage collector cannot reclaim the memory they use. The MemoryLeak program in Listing 1 is simple:

Listing 1. MemoryLeak program

01 import java.util.Vector;
02 
03 public class MemoryLeak {
04 
05    public static void main(String[] args) {
06 
07       int    MAX_CONSUMERS        = 10000;
08       int    SLEEP_BETWEEN_ALLOCS = 5;
09 
10       ConsumerContainer objectHolder = new ConsumerContainer();
11 
12       while(objectHolder.size() < MAX_CONSUMERS) {
13           System.out.println("Allocating object " + 
14                              Integer.toString(objectHolder.size())
15                             );
16           objectHolder.add(new MemoryConsumer());
17           try {
18               Thread.sleep(SLEEP_BETWEEN_ALLOCS);
19           } catch (InterruptedException ie) {
20               // Do nothing.
21           }
22       } // while.
23    } // main.
24 
25 } // End of MemoryLeak.
26 
27 /** Named container class to hold object references. */
28 class ConsumerContainer extends Vector {}
29 
30 /** Class that consumes a fixed amount of memory. */
31 class MemoryConsumer {
32   public static final int MEMORY_BLOCK        = 1024;
33   public byte[]           memoryHoldingArray;
34 
35   MemoryConsumer() {
36       memoryHoldingArray = new byte[MEMORY_BLOCK];
37   }
38 } // End MemoryConsumer.

When the program runs, it creates a ConsumerContainer object, then repeatedly creates MemoryConsumer objects -- each holding a 1 KB byte array -- and adds them to that ConsumerContainer. Keeping the objects reachable makes them unavailable for garbage collection, simulating a memory leak.
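For contrast, a hypothetical "fixed" variant (my own sketch, not part of the article's test programs) releases its references when it is done with them, making the arrays eligible for garbage collection:

```java
import java.util.Vector;

/** Hypothetical leak-free variant: references are released after use. */
public class MemoryLeakFixed {

    /** Allocates count 1 KB arrays, then releases them; returns live reference count. */
    static int allocateAndRelease(int count) {
        Vector<byte[]> holder = new Vector<byte[]>();
        for (int i = 0; i < count; i++) {
            holder.add(new byte[1024]);   // allocate, as in Listing 1
        }
        holder.clear();   // release the references: the arrays are now collectible
        return holder.size();
    }

    public static void main(String[] args) {
        System.out.println("live references: " + allocateAndRelease(100));  // 0
    }
}
```

Profiled with heap=sites, this version would show far fewer live bytes than allocated bytes for [B, because the garbage collector is free to reclaim the cleared arrays.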

Let's look at select parts of the profile file. The Sites section's first few lines clearly show what is happening:

SITES BEGIN (ordered by live bytes) Mon Sep  3 19:16:29 2001
          percent         live     alloc'ed       stack class
 rank   self  accum    bytes objs     bytes  objs trace name
    1 97.31% 97.31% 10280000 10000 10280000 10000  1995 [B
    2  0.39% 97.69%    40964     1    81880    10  1996 [L<Unknown>;
    3  0.38% 98.07%    40000 10000    40000 10000  1994 MemoryConsumer
    4  0.16% 98.23%    16388     1    16388     1  1295 [C
    5  0.16% 98.38%    16388     1    16388     1  1304 [C
    ...

There are 10,000 objects of type byte[] ([B in VM-speak) as well as 10,000 MemoryConsumer objects. The byte arrays take up 10,280,000 bytes -- 1,028 bytes per array, or 4 bytes of overhead beyond the 1,024 bytes of raw data each array holds. Since the number of objects allocated equals the number of live objects, we can conclude that none of these objects has been garbage collected. This is consistent with our expectations.
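The per-array overhead can be checked with quick arithmetic on the Sites figures (the 1,024-byte payload comes from MEMORY_BLOCK in Listing 1):

```java
/** Quick check of per-array overhead from the hprof Sites figures. */
public class ArrayOverhead {

    /** Bytes of per-object overhead, given total live bytes, object count, and payload size. */
    static long overheadPerArray(long liveBytes, long arrayCount, long payloadBytes) {
        return liveBytes / arrayCount - payloadBytes;
    }

    public static void main(String[] args) {
        // From the Sites section: 10,280,000 live bytes across 10,000 arrays of 1,024 bytes each.
        long overhead = overheadPerArray(10_280_000L, 10_000L, 1_024L);
        System.out.println(overhead + " bytes of overhead per array");  // 4
    }
}
```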

Another interesting point: the memory reported to be consumed by the MemoryConsumer objects does not include the memory consumed by the byte arrays. This shows that our profiling tool does not expose hierarchical containment relationships, but rather class-by-class statistics. This is important to understand when using hprof to pinpoint a memory leak.

Now, where did those leaky byte arrays come from? Notice that the MemoryConsumer objects and the byte arrays reference traces 1994 and 1995 in the following Trace section. Lo and behold, these traces tell us that the MemoryConsumer objects were created in the MemoryLeak class's main() method and that the byte arrays were created in the constructor (<init>() method in VM-speak). We've found our memory leak, line numbers and all:

TRACE 1994: (thread=1)
        MemoryLeak.main(MemoryLeak.java:16)
TRACE 1995: (thread=1)
        MemoryConsumer.<init>(MemoryLeak.java:36)
        MemoryLeak.main(MemoryLeak.java:16)

Diagnose a CPU hog

In Listing 2, a BusyWork class has each thread call a method that regulates how much the thread works by varying its sleep time in between bouts of performing CPU-intensive calculations:

Listing 2. CPUHog program

01 /** Main class for control test. */
02 public class CPUHog {
03    public static void main(String[] args) {
04 
05       Thread slouch, workingStiff, workaholic;
06       slouch       = new Slouch();
07       workingStiff = new WorkingStiff();
08       workaholic   = new Workaholic();
09 
10       slouch.start();
11       workingStiff.start();
12       workaholic.start();
13    }
14 }
15 
16 /** Low CPU utilization thread. */
17 class Slouch extends Thread {
18   public Slouch() {
19      super("Slouch");
20   }
21   public void run() {
22      BusyWork.slouch();
23   }
24 }
25 
26 /** Medium CPU utilization thread. */
27 class WorkingStiff extends Thread {
28   public WorkingStiff() {
29       super("WorkingStiff");
30   }
31   public void run() {
32      BusyWork.workNormally();
33   }
34 }
35 
36 /** High CPU utilization thread. */
37 class Workaholic extends Thread {
38   public Workaholic() {
39       super("Workaholic");
40   }
41   public void run() {
42      BusyWork.workTillYouDrop();
43   }
44 }
45 
46 /** Class with static methods to consume varying amounts
47  *  of CPU time. */
48 class BusyWork {
49 
50   public static int callCount = 0;
51 
52   public static void slouch() {
53     int SLEEP_INTERVAL = 1000;
54     computeAndSleepLoop(SLEEP_INTERVAL); 
55   }
56 
57   public static void workNormally() {
58     int SLEEP_INTERVAL = 100;
59     computeAndSleepLoop(SLEEP_INTERVAL); 
60   }
61 
62   public static void workTillYouDrop() {
63     int SLEEP_INTERVAL = 10;
64     computeAndSleepLoop(SLEEP_INTERVAL); 
65   }
66   
67   private static void computeAndSleepLoop(int sleepInterval) {
68      int MAX_CALLS = 10000;
69      while (callCount < MAX_CALLS) {
70         computeAndSleep(sleepInterval);
71      }
72   } 
73 
74   private static void computeAndSleep(int sleepInterval) {
75         int    COMPUTATIONS = 1000;
76         double result;
77         
78         // Compute.
79         callCount++;
80         for (int i = 0; i < COMPUTATIONS; i++) {
81            result = Math.atan(callCount * Math.random());         
82         }
83 
84         // Sleep.
85         try {
86            Thread.sleep(sleepInterval);
87         } catch (InterruptedException ie) {
88            // Do nothing.
89         }
90 
91   } // End computeAndSleep.
92 } // End BusyWork.

There are three threads -- Workaholic, WorkingStiff, and Slouch -- whose work ethics vary by orders of magnitude: they sleep for 10, 100, and 1,000 milliseconds, respectively, between bouts of CPU-intensive calculation. Examine the profile's CPU Samples section shown below. The three highest-ranked traces show that the CPU spent most of its time calculating random numbers and arc tangents, as we would expect:

CPU SAMPLES BEGIN (total = 935) Tue Sep  4 20:44:49 2001
rank   self  accum   count trace method
   1 39.04% 39.04%     365  2040 java/util/Random.next
   2 26.84% 65.88%     251  2042 java/util/Random.nextDouble
   3 10.91% 76.79%     102  2041 java/lang/StrictMath.atan
   4  8.13% 84.92%      76  2046 BusyWork.computeAndSleep
   5  4.28% 89.20%      40  2050 java/lang/Math.atan
   6  3.21% 92.41%      30  2045 java/lang/Math.random
   7  2.25% 94.65%      21  2051 java/lang/Math.random
   8  1.82% 96.47%      17  2044 java/util/Random.next
   9  1.50% 97.97%      14  2043 java/util/Random.nextDouble
  10  0.43% 98.40%       4  2047 BusyWork.computeAndSleep
  11  0.21% 98.61%       2  2048 java/lang/StrictMath.atan
  12  0.11% 98.72%       1  1578 java/io/BufferedReader.readLine
  13  0.11% 98.82%       1  2054 java/lang/Thread.sleep
  14  0.11% 98.93%       1  1956 java/security/PermissionCollection.setReadOnly
  15  0.11% 99.04%       1  2055 java/lang/Thread.sleep
  16  0.11% 99.14%       1  1593 java/lang/String.valueOf
  17  0.11% 99.25%       1  2052 java/lang/Math.random
  18  0.11% 99.36%       1  2049 java/util/Random.nextDouble
  19  0.11% 99.47%       1  2031 BusyWork.computeAndSleep
  20  0.11% 99.57%       1  1530 sun/io/CharToByteISO8859_1.convert
  ...

Note that calls to the BusyWork.computeAndSleep() method take up 8.13 percent, 0.43 percent, and 0.11 percent for the Workaholic, WorkingStiff, and Slouch threads, respectively. We can tell which threads these are by examining the traces referenced in the CPU Samples section's trace column above (ranks 4, 10, and 19) in the following Trace section:
