Diagnose common runtime problems with hprof

Track down the culprits behind your application failures

1 2 Page 2
Page 2 of 2
TRACE 2040: (thread=8)
        java/util/Random.next(Random.java:Unknown line)
        java/util/Random.nextDouble(Random.java:373)
        java/lang/Math.random(Math.java:557)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workTillYouDrop(CPUHog.java:64)
        Workaholic.run(CPUHog.java:42)
TRACE 2046: (thread=8)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workTillYouDrop(CPUHog.java:64)
        Workaholic.run(CPUHog.java:42)
TRACE 2047: (thread=7)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workNormally(CPUHog.java:59)
        WorkingStiff.run(CPUHog.java:32)
TRACE 2031: (thread=6)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.slouch(CPUHog.java:54)
        Slouch.run(CPUHog.java:22)

Perhaps just as interesting as what the profiler shows us is what it doesn't clearly show. Note that thread 8 (the Workaholic) is responsible for the top CPU hog call to the Random.next() method. The workTillYouDrop() method ultimately calls the Random.next() method, but that does not mean that it shows up as a top CPU hog on a line by itself, as we might expect.

The profiler flattens the hierarchy of method calls the way it flattened the hierarchy of object containment in the MemoryLeak example. It does not count CPU utilization by internal methods toward the total of a method lower on the stack trace. The calls to the BusyWork.computeAndSleep() method show up because this method is called many times by each thread. BusyWork.computeAndSleep() is often part of the sampled stack traces, so it frequently gets caught in the act. As was the case in the Sites section, here we can only discern what is really happening when we look at the stack traces.

Note: hprof's cpu=time mode should show CPU usage hierarchically, but this feature appears to have a bug that leads to a segmentation fault in the 1.3 VMs I worked with.

Diagnose a deadlock

The third control program exhibits a classic deadlock, as shown in Listing 3:

Listing 3. Deadlock program

01 /** Main class for the control test. */
02 public class ClassicDeadlock {
03 
04    public static void main(String[] args) {
05       SynchronizedObject left;
06       SynchronizedObject right;
07 
08       LeftToRightGrabber leftToRight;
09       RightToLeftGrabber rightToLeft;
10 
11       left  = new SynchronizedObject("left");
12       right = new SynchronizedObject("right");
13 
14       leftToRight = new LeftToRightGrabber(left,right);
15       rightToLeft = new RightToLeftGrabber(left,right);
16 
17       leftToRight.start();
18       rightToLeft.start();
19    }
20 
21 } // End ClassicDeadlock.
22 
23 /** An object containing a synchronized method. */
24 class SynchronizedObject {
25   public String name;
26   public SynchronizedObject(String aName) {
27      name = aName;
28   }
29   public synchronized void synchronizedMethod() {
30       String message = name                        + 
31                        " lock acquired by thread " +
32                        Thread.currentThread().getName(); 
33       System.out.println(message);
34       System.out.flush();
35   }
36 } // End of SynchronizedObject.
37 
38 /** Base class for the Left and Right Grabber. */
39 abstract class GrabberBase extends Thread {
40    protected SynchronizedObject leftObject;
41    protected SynchronizedObject rightObject;
42 
43    public GrabberBase(SynchronizedObject left,
44                       SynchronizedObject right,
45                       String             name
46                      ) {
47      super(name);
48      leftObject  = left;
49      rightObject = right; 
50    } 
51 } // End of GrabberBase.
52 
53 /** Class to grab locks right and then left. */
54 class RightToLeftGrabber extends GrabberBase {
55       public RightToLeftGrabber( SynchronizedObject left,
56                                  SynchronizedObject right) {
57           super(left, right, "RightToLeft");
58       }
59 
60       public void run() {
61          while(true) {
62            grabRightToLeft();
63          }
64       }
65 
66       private void grabRightToLeft() {
67           synchronized (rightObject) {
68              leftObject.synchronizedMethod();
69           }
70       }
71 } // End of RightToLeftGrabber.
72 
73 /** Class to grab locks left and then right. */
74 class LeftToRightGrabber extends GrabberBase {
75       public LeftToRightGrabber( SynchronizedObject left,
76                                  SynchronizedObject right) {
77           super(left, right, "LeftToRight");
78       }
79 
80       public void run() {
81          while(true) {
82             grabLeftToRight();
83          }
84       }
85  
86       private void grabLeftToRight() {
87           synchronized (leftObject) {
88              rightObject.synchronizedMethod();
89           }
90       }
91 } // End of LeftToRightGrabber.

This program creates two equivalent objects, left and right, each having its own synchronization monitor. Two threads then try to grab locks on these objects in a particular order (by entering a block synchronized on one object and then calling a synchronized method on the other). If the LeftToRightGrabber thread can enter its synchronized block and the RightToLeftGrabber thread is scheduled to run before LeftToRightGrabber completes its work, a deadlock will occur. How long the program must run to hit a deadlock depends on the JVM's thread scheduling algorithm. For me, this typically occurred during the third switch between thread contexts.

Once the deadlock occurs, hitting Ctrl-\ generates a profile file containing a Monitor Dump section, like so:

MONITOR DUMP BEGIN
    THREAD 8, trace 2018, status: CW
    THREAD 7, trace 2027, status: MW
    THREAD 6, trace 2016, status: MW
    THREAD 5, trace 2036, status: MW
    THREAD 4, trace 2037, status: CW
    THREAD 3, trace 2038, status: CW
    THREAD 2, trace 2039, status: R
    MONITOR java/lang/ref/Reference$Lock(80ddd80) unowned
        waiting to be notified: thread 3
    MONITOR SynchronizedObject(81ece98)
        owner: thread 7, entry count: 1
        waiting to enter: thread 6
    MONITOR SynchronizedObject(81ecd40)
        owner: thread 6, entry count: 1
        waiting to enter: thread 7
    MONITOR java/lang/ref/ReferenceQueue$Lock(80e7010) unowned
        waiting to be notified: thread 4
    ...

We can see that threads 6 and 7 (LeftToRight and RightToLeft from the Thread section) are both in waiting states. Thread 7 holds one of the SynchronizedObject instance locks while thread 6 waits for that lock, and thread 6 holds the other lock while thread 7 waits for it. The traces below confirm that each thread dutifully waits to obtain the lock held by the other thread -- exactly where we expect them to wait:

TRACE 2027: (thread=7)
        RightToLeftGrabber.grabRightToLeft(ClassicDeadlock.java:68)
        RightToLeftGrabber.run(ClassicDeadlock.java:62)
TRACE 2016: (thread=6)
        LeftToRightGrabber.grabLeftToRight(ClassicDeadlock.java:88)
        LeftToRightGrabber.run(ClassicDeadlock.java:82)

hprof has shown us what stopped this program in its tracks.

Tackle uncharted territory

By writing simple control experiment programs and running them under your chosen profiling tool, you can gain experience with the tool and confidence to use it when you venture into uncharted territory. The programs examined in this article model three common problems. You can use them as-is with other profilers and let them serve as a starting point for your own experimentation.

hprof may not be the easiest tool with which to work, but it's freely available to anyone using the Java 2 Platform and can effectively track down runtime problems like these.

Bill Pierce is a senior software engineer at Seattle-based Vixel Corp., where he has helped ship eight releases of an all-Java network management application called SAN InSite. Previously, he spent 10 years pulling results from measurement systems as an experimental physicist.

Learn more about this topic

Related:
1 2 Page 2
Page 2 of 2