Programming Java threads in the real world, Part 2

The perils of race conditions, deadlock, and other threading problems

In his recent " Design for thread safety" article (part of the Design Techniques column), Bill Venners introduced the notion of a race condition, a situation whereby two threads simultaneously contend for the same object and, as a consequence, leave the object in an undefined state. Bill pointed out that Java's synchronized keyword is in the language to help avoid these problems, and he provided a straightforward example of its use. Last month's article was the first in a series on threads. The present article continues that series, covering race conditions in depth. It also discusses various deadlock scenarios, which are related to race conditions but are much harder to find and debug. This month, we'll focus primarily on the problems encountered when programming Java threads. Subsequent Java Toolbox columns will focus entirely on solutions.

Monitors, mutexes, and bathrooms

Just to make sure we're all starting from the same place, a little review of terms is in order. The central concept for synchronization in the Java model is the monitor, developed some 20 years ago by C. A. R. Hoare. A monitor is a body of code guarded by a mutual-exclusion semaphore (or, to use a term coined at Digital Equipment Corp., a mutex). The central notion of a mutex concerns ownership. Only one thread can own the mutex at a time. If a second thread tries to "acquire" ownership, it will block (be suspended) until the owning thread "releases" the mutex. If several threads are all waiting to acquire the same mutex, they will all be released simultaneously when the owning thread releases the mutex. The released threads will then have to sort out amongst themselves who gains ownership. (Typically, priority order, FIFO order, or some combination thereof is used to determine which thread gains control.) You guard a block of code by acquiring a mutex at the top of the block and releasing it at the bottom. The code comprising the monitor does not have to be contiguous: several discontiguous code blocks can all acquire and release the same mutex, in which case all of this code is considered to be in the same monitor because it uses a common mutex.

The best analogy I've heard for a monitor is an airplane bathroom. Only one person can be in the bathroom at a time (we hope). Everybody else is queued up in a rather narrow aisle waiting to use it. As long as the door is locked, the bathroom is unaccessible. Given these terms, in our analogy the object is the airplane, the bathroom is the monitor (assuming there's only one bathroom), and the lock on the door is the mutex.

In Java, every object has one and only one monitor and mutex associated with it. The single monitor has several doors into it, however, each indicated by the synchronized keyword. When a thread passes over the synchronized keyword, it effectively locks all the doors. Of course, if a thread doesn't pass across the synchronized keyword, it hasn't locked the door, and some other thread is free barge in at any time.

Bear in mind that the monitor is associated with the object, not the class. Several threads can all be executing the same method in parallel, but the receiving objects (as identified by the this references) have to be different. For example, several instances of a thread-safe queue could be in use by several threads. These threads could simultaneously enqueue objects to different queues, but they could not enqueue to the same queue at the same time. Only one thread at a time can be in the monitor for a given queue.

To refine the earlier analogy, the airplane is still the object, but the monitor is really all the bathrooms combined (each synchronized chunk of code is a bathroom). When a thread enters any bathroom, the doors on all the bathrooms are locked. Different instances of a class are different airplanes, however, and if the bathroom doors in your airplane are unlocked, you needn't really care about the state of the doors in the other airplanes.

Why is stop() deprecated in JDK 1.2?

The fact that the monitor is built into every Java object is actually somewhat controversial. Some of the problems associated with bundling the condition variable and mutex together inside every Java object have been fixed in JDK 1.2, by the simple expediency of deprecating the most problematic methods of the Thread class: stop() and suspend(). You can deadlock a thread if you call either of these methods from inside a synchronized method of your own. Look at the following method, remembering that the mutex is the lock on the door, and the thread that locks the door "owns" the monitor. That's why the stop() and suspend() methods are deprecated in JDK 1.2. Consider this method:

class some_class
{   //...
    synchronized void f()
    {   Thread.currentThread().stop();
    }
}

Now consider what happens when a thread calls f(). The thread acquires the lock when entering the monitor but then stops. The mutex is not released by stop(). This is the equivalent of someone going into the bathroom, locking the door, and flushing himself down the toilet! Any other thread that now tries to call f() (or any other synchronized method of the class) on the same object is blocked forever. The suspend() method (which is also deprecated) has the same problem. The sleep() method (which is not deprecated) can be just as tricky. (Someone goes into the bathroom, locks the door, and falls asleep). Also remember that Java objects, even those that extend Thread or implement Runnable, continue to exist, even if the associated thread has stopped. You can indeed call a synchronized method on an object associated with a stopped thread. Be careful.

Race conditions and spin locks

A race condition occurs when two threads try to access the same object at the same time, and the behavior of the code changes depending on who wins. The following diagram shows a single (unsynchronized) object accessed simultaneously by multiple threads. A thread can be preempted in fred() after modifying one field but before modifying the other. If another thread comes along at that point and calls any of the methods shown, the object will be left in an unstable state, because the initial thread will eventually wake up and modify the other field.

Usually, you think of objects sending messages to other objects. In multithreaded environments, you must think about message handlers running on threads. Think: this thread causes this object to send a message to that object. A race condition can occur when two threads cause messages to be sent to the same object at the same time.

Working with wait() and notify()

The following code demonstrates a blocking queue used for one thread to notify another when some event occurs. (We'll see a more realistic version of these in a future article, but for now I want to keep things simple.) The basic notion is that a thread that tries to dequeue from an empty queue will block until some other thread puts something in the queue.

class Notifying_queue 
{
    private static final queue_size = 10;
    private Object[]  queue = new Object[ queue_size ];
    private int       head  = 0;
    private int       tail  = 0;
    public void synchronized enqueue( Object item )
    {
        queue[++head %= queue_size ]= item; 
        this.notify();                  // The "this" is there only to
    }                                   // improve readability.
    public Object synchronized dequeue( )
    {
        try
        {   if( head == tail ) // <=== This is a bug
                this.wait();
        }
        catch( InterruptedException e )
        {
            // If we get here, we were not actually notified.
            // returning null doesn't indicate that the
            // queue is empty, only that the waiting was
            // abandoned.
            return null;
        }
        return queue[++tail %= queue_size ]; 
    }
}

Starting with an empty queue, let's follow the sequence of operations in play if one thread does a dequeue and then another (at some later time) enqueues an item.

  1. The dequeueing thread calls dequeue(), entering the monitor (and locking out other threads) until the monitor is released. The thread tests for an empty queue (head==tailwait(). (I'll explain why this is a bug in a moment.)

  2. The wait() releases the lock. (The current thread temporarily leaves the monitor.) It then blocks on a second synchronization object called a condition variable. (The basic notion of a condition variable is that a thread blocks until some condition becomes true. In the case of Java's built-in condition variable, the thread will wait until the notified condition is set to true by some other thread calling notify.) It's important to realize that the waiting thread has released the monitor at this juncture.

  3. A second thread now comes along and enqueues an object, eventually calling notify(), thereby releasing the waiting (dequeueing) thread.

  4. The dequeueing thread now has to wait to reacquire the monitor before wait() can return, so it blocks again, this time on the lock associated with the monitor.

  5. The en queueing thread returns from the enqueue() method releasing the monitor.

  6. The dequeueing thread acquires the monitor, wait() returns, an object is dequeued, and dequeue() returns, releasing the monitor.

Race conditions following a wait()

Now, what sorts of problems can arise? The main ones are unexpected race conditions. First, what if you replaced the notify() call with a call to notifyAll()? Imagine that several threads were simultaneously trying to dequeue something from the same, empty, queue. All of these threads are blocked on a single condition variable, waiting for an enqueue operation to be executed. When enqueue() sets the condition to true by calling notifyAll(), the threads are all released from the wait (moved from a suspended to a runnable state). The released threads all try to acquire the monitor at once, but only one would "win" and get the enqueued object. The problem is that the others would then get the monitor too, in some undefined sequence, after the winning thread had dequeued the object and returned from dequeue(). Since these other threads don't retest for an empty queue, they will dequeue garbage. (Looking at the code, the dequeue() method will advance the tail pointer to the next slot, the contents of which are undefined since the queue is empty.)

Fix the problem by replacing the if statement with a while loop (called a spin lock). Now the threads that don't get the object will go back to waiting:

    public Object synchronized dequeue( )
    {
        try
        {   while( head == tail )  // used to be an if
                this.wait();
        }
        catch( InterruptedException e )
        {   return null;
        }
        return queue[++tail %= queue_size ]; 
    }

That while loop also solves another, less obvious problem. What if we leave the notify() statement in place, and don't put in a notifyAll()? Since notify() releases only one waiting thread, won't that solve the problem? It turns out that the answer is no. Here's what can happen:

  1. notify() is called by the enqueueing thread, releasing the condition variable.

  2. The dequeueing thread is then preempted, after being released from the wait on the condition variable, but before it tries to reacquire the monitor's lock.

  3. A second thread calls dequeue() at exactly this point, and successfully enters the monitor because no other threads are officially waiting. This second thread successfully dequeues the object; wait() is never called since the queue isn't empty.

  4. The original thread is now allowed to run, it acquires the monitor, doesn't test for empty a second time, and then dequeues garbage.

This second scenario is easily fixed the same way as the first: replace the if statement with a while loop.

Note that the Java specification does not require that wait() be implemented as an atomic operation (that is, one that can't be preempted while moving from the condition variable to the monitor's lock). This means that using an if statement instead of a while loop might work in some Java implementations, but the behavior is really undefined. Using a spin lock instead of a simple if statement is cheap insurance against implementations that don't treat wait() as atomic.

Threads are not objects

Now let's move on to harder-to-find problems. The first difficulty is the commonplace confusion of threads and objects: Methods run on threads, objects do not. Put another way, the only way to get a method to run on a given thread is to call it (either directly or indirectly) from that thread's run() method. Simply putting it into the Thread derivative is not enough. For example, look at the following simple thread (which just prints its fields every so often):

class My_thread extends Thread
{
    private int field_1 = 0;
    private int field_2 = 0;
    public void run()
    {   
        setDaemon(true);  // this thread will not keep the app alive
        while( true )
        {
            System.out.println( " field_1=" + field_1
                                    " field_2=" + field_2 );
            sleep(100);
        }
    }
    synchronized public void modify( int new_value )
    {   field_1 = new_value;
            field_2 = new_value;
    }
}

You could start up the thread and send it a message like this:

My_thread test = new My_thread;
test.start();
//...
test.modify(1);

The only functions that run on the new thread are run() itself and println() (which run() calls). The modify() method never runs on the same thread as the println call; rather, it runs on whatever thread was running when the call was made. (In this case, it runs on whatever thread main() is running on.) Depending on timing, the earlier fragment could print:

field_1=0, field2=0

But it could just as easily print:

field_1=0, field2=1

or

field_1=1, field2=1

There's just no telling. (In the first and last cases, the thread would have been outside the println statement when modify() was called. In the second example, the thread would have been halfway through evaluating the arguments to println(), having fetched field_1, but not field_2. It prints the unmodified field_1 and the modified field_2.

The main issue here is that there is no simple solution to this problem. The modify() method is indeed synchronized in the earlier example, but run() can't be. Were it synchronized, you'd enter the monitor (and lock the object), when you started up the thread. Thereafter, any other thread that called any synchronized method on the object (such as modify()) would block until the monitor was released. Since run() doesn't return (as is often the case), the release will never happen, and the thread will act like a black hole, sucking up any other thread that calls any of its synchronized methods. In the current example, the main thread would be suspended, and the program would hang. So just using the synchronized keyword in a naive way gets us nowhere.

The deadlock scenario

Synchronizing run() is a good example of a simple deadlock scenario, where a thread is blocked forever, waiting for something to happen that can't. Let's look at a few examples that are more realistic than this.

The most common deadlock scenario occurs when two threads are both waiting for each other to do something. The following (admittedly contrived) code snippet makes what's going on painfully obvious:

class Flintstone
{
    int field_1;    private Object lock_1 = new int[1];
    int field_2;    private Object lock_2 = new int[1];
    public void fred( int value )
    {   synchronized( lock_1 )
        {   synchronized( lock_2 )
            {
                field_1 = 0;
                field_2 = 0;
            }
        }
    }
    public void barney( int value )
    {   synchronized( lock_2 )
        {   synchronized( lock_1 )
            {
                field_1 = 0;
                field_2 = 0;
            }
        }
    }
}

Now, imagine a scenario whereby one thread (call it Wilma) calls fred(), passes through the synchronization of lock_1, and is then preempted, allowing another thread (call it Betty) to execute. Betty calls barney(), acquires lock_2, and tries to acquire lock_1, but can't because Wilma has it. Betty is now blocked, waiting for lock_1 to become available, so Wilma wakes up and tries to acquire lock_2 but can't because Betty has it. Wilma and Betty are now deadlocked. Neither one can ever execute.

(Note that lock_1 and lock_2 have to be one-element arrays rather than simple ints, because only objects have monitors in Java; the argument to synchronized must be an object. An array is a first-class object in Java; a primitive-type such as int is not. Consequently, you can synchronize on it. Moreover, a one-element array is efficient to bring into existence compared to a more elaborate object (like an Integer) since it's both small and does not require a constructor call. Also, note that I can keep the reference to the lock as a simple Object reference, since I'll never access the array elements.)

As I mentioned above, Wilma and Betty are a contrived example, but the multiple-lock situation comes up frequently. I'll give a more detailed example next month.

Get out the magnifying glass

If all deadlock scenarios were as easy to recognize as Wilma and Betty, deadlock wouldn't be a problem. Consider the following code, though:

class Boss
{
    private Side_kick robin;
    public synchronized void set_side_kick( Side_kick kid_in_tights )
    {   robin = kid_in_tights;
    };
    public synchronized void to_the_bat_cave()
    {   robin.get_in_the_car(); 
    }
    public synchronized void okay()     // sent to us by robin
    {   //...
    }
    public synchronized void hold_on()  // sent to us by robin
    {   //...
    }
}
//-------------------------------------------------------
class Side_kick
{
    private Boss batman;
    public synchronized void set_boss( Boss guy_in_cape )
    {   batman = guy_in_cape;
    }
    public synchronized void get_in_the_car()   // sent by batman
    {   batman.okay();
    }
    public synchronized void sock_bam_pow()     // sent from outside
    {   batman.hold_on();
    }
}
//-------------------------------------------------------
class Gotham_city
{   static Boss      batman = new Boss();
    static Side_kick robin  = new Side_kick();
    public static void main( String[] args )
    {
        batman.set_side_kick( robin );
        robin.set_boss( batman );
        // spawn off a bunch of threads that use batman and robin.
    }
}

Now imagine the following:

  1. One thread (call it Alfred) issues a to_the_bat_cave() request to the batman object passed to it from main().

  2. The batman object starts to process the method, but is preempted just before it calls robin.get_in_the_car(). At this juncture, Alfred has acquired the lock for the batman object.

  3. Now along comes a second thread (call it Joker), which issues a sock_bam_pow() message to the robin object that it got from main().

  4. The robin object (whose sock_bam_pow() method is running on the Joker thread) tries to send a hold_on() message to batman, but can't because Alfred owns the lock on batman. So the Joker thread is now blocked, waiting for Alfred to release the lock on batman.

  5. Now Alfred gets a chance to run, and it tries to send a get_in_the_car() message to the robin object, but it can't because the Joker thread owns the lock on robin. Both threads are now deadlocked (sound familiar?)

Remember: threads own the locks and methods execute on threads, not objects.

This situation is, of course, much harder to see than the Wilma-and-Betty problem because the locks in the batman-robin example are the natural locks associated with individual objects. There are no standalone synchronized statements in the batman-robin code, and the locks are associated with two completely distinct objects.

Multithreaded programmers tear their hair out looking for the causes of these sorts of problems, and there are only two solutions. The first is to thoroughly design the code before implementing it, and then to really study both the design and the implementation before you even think about running it the first time. When I do multithreaded programs, I spend much more time on code and design reviews than I spend on coding.

The other solution is to use an architecture that tends to minimize these sorts of problems. (Various threading architectures will be the subject of the remaining articles in this series.)

There is no magic bullet there. I've seen ads for a couple of products that instrument a VM in a way that, in theory, will detect deadlock and race conditions. The problem, though, is a classic Heisenberg-uncertainty dilemma: there's no way to observe the process without impacting it. If the problem is timing-related, adding a print statement or changing to a debugging version of the VM will change the timing, perhaps eliminating the problem. I haven't actually used any of these products yet, but I remain skeptical.

Nested-monitor lockout

Another important form of deadlock is not discussed much in the Java-language books: nested-monitor lockout. This problem usually occurs when you call a blocking function from within a synchronized method, and the only way to release the block is to call another synchronized method. The following code demonstrates the problem.

class Notifying_queue 
{   // Same class as was described earlier, blocks on dequeue from
    // an empty queue.
    //...
}
class Black_hole
{
    private Notifying_queue queue = new Notifying_queue(5);
    public synchronized void put( Object thing )
    {   queue.enqueue( thing );
    }
    public synchronized Object get( )
    {   return queue.dequeue();
    }
}

Consider what happens when you try to dequeue something from an empty queue:

  1. You call get() to get the item from the queue.

  2. get() is synchronized, so the Black_hole is now locked.

  3. get() calls dequeue(), which blocks, waiting for some other thread to enqueue something. get() does not return.

  4. Another thread tries to enqueue something, but the only way to enqueue something is by calling put, which we can't do because the Black_hole is locked. That is, any thread that tries to put() will block because the first thread has not returned from get() yet.

The Black_hole now sucks up all threads that try to put() or get() anything. They all block forever.

Depending on where this occurs, the black hole could suck up every thread in your program. Also bear in mind that this problem can occur anytime you have a blocking call (such as a file read) inside a synchronized method.

The only cures are:

  1. Don't make blocking calls in synchronized methods

  2. Make sure there's a way to talk to the blocking object via another class or a nonsynchronized method

In the current situation, you could just not synchronize put(), but that wouldn't work in a more realistic situation where put() accessed fields of the class that were accessed by other methods.

This problem has been known since Java 1.0 was in the early prerelease stage, and several people complained vociferously about it. (The problem is a direct result of the way Java's synchronization works -- the condition variable and mutex are both part of the object and not separate entities --- compounded by the fact that you have to acquire the mutex to wait on the condition.) But as Doug Lea pointed out in a recent e-mail to me:

[the complaints] boiled down to "you tend to like best what you are most used to." Java makes some things that are painful in POSIX easy, and vice-versa. In any case, it is pretty easy to simulate one set of primitives with the other.

That's life, I guess.

The next several articles in this series on threads will present a solution to the problem that decouples the semaphores from the things they guard, but that solution introduces a whole set of additional problems.

Conclusion

Hopefully, I've demonstrated by now that programming in a multithreaded environment isn't as easy as the evangelist types would have you believe. Java provides platform-independent ways to use the two essential synchronization mechanisms: exclusion semaphores and condition variables. It does it in an awkward way, however, that doesn't help much when you're trying to do more than blink your logo in a simple applet. All is not lost, however. Over the next few months, I'll present a library of classes that solve many common threading problems, including some of the ones I've just discussed. Sometimes, I even think of it as fun, but maybe that's because I've been programming too long.

Allen Holub has been working in the computer industry since 1979. He is widely published in magazines (Dr. Dobb's Journal, Programmers Journal, Byte, MSJ, among others). He has seven books to his credit, and is currently working on an eighth that will present the complete sources for a Java compiler written in Java. Allen abandoned C++ for Java in early 1996 and now looks at C++ as a bad dream, the memory of which is mercifully fading. He's been teaching programming (first C, then C++ and MFC, now OO-Design and Java) both on his own and for the University of California Berkeley Extension since 1982. Allen offers both public classes and in-house training in Java and object-oriented design topics. He also does object-oriented design consulting. Get more information and contact Allen via his Web site http://www.holub.com.

Learn more about this topic

  • Sun's Technical Articles page has several articles on multithreading http://developer.javasoft.com/developer/technicalArticles/#thread
  • Prashant Jain and Douglas C. Schmidt have a good article contrasting C++ to Java that discusses many of the thread-related problems inherent in the language. The article can be found at http://www.cs.wustl.edu/%7Eschmidt/C++2java.html
  • Doug Lea has a bunch of Mutex and Condition-variable classes in his util.concurrent package. See http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html
  • Doug Schmidt's Ace Framework is a good, though complex, attempt at a truly platform-independent threading system http://www.cs.wustl.edu/~schmidt/
  • There are several good books that discuss the Java threading issues mentioned in the first article in this series. For convenience, I've listed them again here:
  • For a great in-depth look at multithreading in general and the implementation of multithreading both in and with Java in particular, this is a must. It's required reading if you're using threads heavilyDoug Lea, Concurrent Programming in JavaDesign Principles and Patterns (ReadingAddison Wesley, 1997) http://java.sun.com/docs/books/cp/
  • For a book on Java threading that is less technical but more readable than Lea's, seeScott Oaks and Henry Wong, Java Threads (Sebastopol, Calif.O'Reilly, 1997) http://www.oreilly.com/catalog/jthreads/
  • This book is good for the general subject of multithreading but doesn't have a Java slantBill Lewis and Daniel J. Berg, Threads PrimerA Guide to Multithreaded Programming (Englewood CliffsPrentice Hall/SunSoft Press, ISBN 0-13-443698-9) http://www.sun.com/books/books/Lewis/Lewis.html
Join the discussion
Be the first to comment on this article. Our Commenting Policies