Deadlock anti-patterns #2: Worker Aggregation

In the first installment of this three part series, I described the problem of deadlock; focusing on the logical conditions that are necessary for a deadlock to occur. I also introduced the deadlock-prone No Arbitration anti-pattern which is evident in applications where interdependent processes are allowed to contend for resources without effective arbitration to determine priority of access at critical junctures. An example of this was provided in the form of a train simulator.

This installment will examine yet another deadlock-prone anti-pattern, which we shall refer to by the moniker ‘Worker Aggregation’. Note that this installment also involves working through code samples, so please feel free to download this series' companion source code archive and load the files into your preferred IDE.

Worker Aggregation

An interesting experiment to try is as follows:

  1. divert all voice calls from phone A to B
  2. simultaneously divert all calls from B to A
  3. attempt a call to phone A and observe the result

What happened to your call? Did it simply end up in your voicemail inbox? Did you get some auto generated error-message threatening you with the wrath of the evil Emperor Zurg for your audacity? Or did it go through? Not surprisingly, this is a scenario that telecoms switch designers had to anticipate quite early on in the evolution of voice networks. Various manufacturers and operators came up with rather ingenious ways around it. Some simply, and rather unceremoniously, ruled out second-hop diverts; others chose to forward all second-level diverts to voicemail. A few even went further, and permitted larger divert chains under very specific and strict conditions, but also implemented algorithms to test for chain cyclicity. All are valid approaches, which undoubtedly saved some poor designers the few strands of hair they had remaining!

Have no fear, we are not about to write a telecoms switch simulator! The above example only serves to exemplify the conditions one must take into account when building systems that incorporate autonomous processes capable of self aggregation. Examples of self aggregating software include, but are not limited to, systems where processes can: cooperate with each other to break-down and perform composite tasks; divert portions of their tasks to other processes; subjugate pooled processes to perform sub-units of work. Process aggregation inevitably leads to chains of workers performing tasks in concert. In such architectures, a problem which becomes clear sooner or later is how to enforce acyclicity of process chains. Where cycles do occur, there is every possibility that the chains in question will never terminate, and in some cases may even deadlock.

Consider processes P1 ,P2, and Pn, such that they form a cyclic task chain of the form { P1,P2,...,Pn, P1}, where Pn holds a resource which is required by P1, and Pn cannot release the resource until P1 has terminated. In such a scenario neither process will be able to proceed further; the chain will simply deadlock.

Let us simplify the above example by examining the same concept expressed in code. To do so we will use the compilation units contained in the package example.workeraggregation, which illustrates what happens when worker thread dependencies become cyclic. The package consists of the following source files:


+example.workeraggregation
	-Worker.java
	-Worker2.java
	-WorkerRunner.java
	-WorkerRunner.java

The Worker class, shown in listing 6, models a worker process. Worker recruitment is encapsulated in the field coWorker. The worker also maintains an internal lock which is held whilst it is in service. Thus when a co-worker is involved in performing a task, its lock is held by the recruiting thread. This can be seen more clearly by examining the run() method.

Listing 6: example.workeraggregation.Worker


public class Worker extends Thread
{
	protected Lock lock;
	protected Worker coWorker;
	protected int workerIndex;
	
	public Worker(int workerIndex)
	
	public void setCoWorker(Worker coWorker)	
	public void lock(){lock.lock();}
	public void unlock(){lock.unlock();}
	
	@Override
	public void run()
	{		
		lock();
		System.out.println("Thread " + getName()+ " started ...");
				
		try
		{
			//simulate some work
			Thread.sleep(1000 *workerIndex);
			if (coWorker!=null)
			{
				coWorker.lock();
				System.out.println("Thread " + getName()+ 
							" co-opted co-worker " +
							coWorker.getName());
			}
		}
		catch (InterruptedException exce)
		{
			return;
		}
		finally
		{
			coWorker.unlock();
			unlock();
		}
	}
	
}

Note that a staggered sleep is used to simulate work before the co-worker is engaged. Now consider what would happen if we created two worker threads worker1 and worker2 such that the dependencies between them is as follows: worker1 -> worker2 -> worker1? This question is answered by executing the class WorkerRunner which is detailed in listing 7.

Listing 7: example.workeraggregation.WorkerRunner


public class WorkerRunner
{
	public static void main(String[] args)
	{
		Worker worker1 = new Worker(1);
		Worker worker2 = new Worker(2);
		
		worker1.setCoWorker(worker2);
		worker2.setCoWorker(worker1);
		
		worker1.start();
		worker2.start();		
	}
}

Doing so, produces the following output, and promptly deadlocks.


Thread Thread-0 started ...
Thread Thread-1 started ...

Examining, or stepping through the code, you will notice that worker2 tries to acquire a lock already held by worker1, which in turn cannot terminate and release the lock until worker2 has completed. Voila, deadlock!

There are of course many ways to go about solving this particular conundrum. A brute force approach would simply be to adopt the same strategy applied by some telecoms engineers: prevent the dependency chain from exceeding a pre-determined size. A more complex solution is to build and enforce an acyclic worker-dependency graph in real-time as the application progresses.

To simplify the illustration, we will use the former approach to show how this situation can be prevented from occurring. Listing 8 details an extension Worker2 of the class Worker which includes a chainSize attribute that is used to track the size of the dependency chain. This attribute is also used to ensure that all second-hop dependencies are ignored.

Listing 8: example.workeraggregation.Worker2


package example.workeraggregation;

public class Worker2 extends Worker
{
	private int chainSize;
	
	public void setCoWorker(Worker coWorker, 
					int chainSize)
	
	@Override
	public void run()
	{
		lock();
		System.out.println("Thread " + getName()+ 
						" started ...");
		
		boolean coOptedWorker=false;
		try
		{
			//simulate some work			
			Thread.sleep(1000 *workerIndex);
			if (coWorker!=null && chainSize<2)
			{
				coWorker.lock();
				coOptedWorker=true;
				System.out.println("Thread " + getName()+ 
							" co-opted co-worker " + 
							coWorker.getName());
			}
			else System.out.println(getName() 
						+ " working solo today...");
		}
		catch (InterruptedException exce)
		{
			return;
		}
		finally
		{
			if (coOptedWorker)coWorker.unlock();			
			unlock();
		}
	}
}

WorkerRunner

is also replaced with

WorkerRunner2

(shown in listing 9) which is responsible for specifying the dependency chain size. Executing this class, produces the following output:


Thread Thread-0 started ...
Thread Thread-1 started ...
Thread-1 working solo today...
Thread Thread-0 co-opted co-worker Thread-1

Notice that thread#1 does not bother trying to co-opt its co-worker because it is aware that the dependency chain has grown to a point where circular waits are possible. Note that this is not the only solution to this problem, but is intended to serve as a simple example of how this concurrency pattern can be modified so that it does not lead to circular waits. Readers as always are strongly encouraged to contribute alternatives, however inspired.

Listing 9: example.workeraggregation.WorkerRunner2


public class WorkerRunner2
{
	public static void main(String[] args)
	{
		Worker2 worker1 = new Worker2(1);
		Worker2 worker2 = new Worker2(2);
		
		int chainSize = 0;
		worker1.setCoWorker(worker2,++chainSize);
		worker2.setCoWorker(worker1,++chainSize);
		
		worker1.start();
		worker2.start();	
	}
}

Summary

In this installment, we have looked at the problems involved in worker aggregation; specifically focusing on how cyclic worker chains can lead to application deadlock. Code examples were provided to demonstrate this, as well as to show how this anti-pattern can be modified so as to reduce the risk of deadlock.

The next, and final, installment of this series will examine the incremental locking anti-pattern.