Book excerpt: Executing tasks in threads

When creating threads to perform tasks, look to the Executor framework

1 2 3 4 Page 3
Page 3 of 4

Example: Sequential page renderer

The simplest approach is to process the HTML document sequentially. As text markup is encountered, render it into the image buffer; as image references are encountered, fetch the image over the network and draw it into the image buffer as well. This is easy to implement and requires touching each element of the input only once (it doesn't even require buffering the document), but is likely to annoy the user, who may have to wait a long time before all the text is rendered.

A less annoying but still sequential approach involves rendering the text elements first, leaving rectangular placeholders for the images, and after completing the initial pass on the document, going back and downloading the images and drawing them into the associated placeholder. This approach is shown in SingleThreadRenderer in Listing 10.

Listing 10. Rendering page elements sequentially

                        public class SingleThreadRenderer {
   void renderPage(CharSequence source) {
      renderText(source);
            List<ImageData> imageData = new ArrayList<ImageData>();
      for (ImageInfo imageInfo : scanForImageInfo(source))
         imageData.add(imageInfo.downloadImage());
      for (ImageData data : imageData)
         renderImage(data);
   }
}
                   

Downloading an image mostly involves waiting for I/O to complete, and during this time the CPU does little work. So the sequential approach may underutilize the CPU, and also makes the user wait longer than necessary to see the finished page. We can achieve better utilization and responsiveness by breaking the problem into independent tasks that can execute concurrently.

Result-bearing tasks: Callable and Future

The Executor framework uses Runnable as its basic task representation. Runnable is a fairly limiting abstraction; run cannot return a value or throw checked exceptions, although it can have side effects such as writing to a log file or placing a result in a shared datastructure.

Many tasks are effectively deferred computations�executing a database query, fetching a resource over the network, or computing a complicated function. For these types of tasks, Callable is a better abstraction: it expects that the main entry point, call, will return a value and anticipates that it might throw an exception. (To express a non-value-returning task with Callable, use Callable<Void>.) Executor includes several utility methods for wrapping other types of tasks, including Runnable and java.security.PrivilegedAction, with a Callable.

Both Runnable and Callable describe abstract computational tasks. Tasks are usually finite: they have a clear starting point and they eventually terminate. The lifecycle of a task executed by an Executor has four phases: created, submitted, started, and completed. Since tasks can take a long time to run, we also want to be able to cancel a task. In the Executor framework, tasks that have been submitted but not yet started can always be cancelled, and tasks that have started can sometimes be cancelled if they are responsive to interruption. Cancelling a task that has already completed has no effect.

The Future represents the lifecycle of a task and provides methods to test whether the task has completed or been cancelled, retrieve its result, and cancel the task. Callable and Future are shown in Listing 11. Implicit in the specification of Future is that task lifecycle can only move forwards, not backwards�just like the ExecutorService lifecycle. Once a task is completed, it stays in that state forever.

The behavior of get varies depending on the task state (not yet started, running, completed). It returns immediately or throws an Exception if the task has already completed, but if not it blocks until the task completes. If the task completes by throwing an exception, get rethrows it wrapped in an ExecutionException; if it was cancelled, get throws CancellationException. If get throws ExecutionException, the underlying exception can be retrieved with getCause.

There are several ways to create a Future to describe a task. The submit methods in ExecutorService all return a Future, so that you can submit a Runnable or a Callable to an executor and get back a Future that can be used to retrieve the result or cancel the task. You can also explicitly instantiate a FutureTask for a given Runnable or Callable. (Because FutureTask implements Runnable, it can be submitted to an Executor for execution or executed directly by calling its run method.)

Listing 11. Callable and Future interfaces

                        

public interface Callable { V call() throws Exception;

public interface Future { boolean cancel(boolean mayInterruptIfRunning); boolean isCancelled(); boolean isDone(); V get() throws InterruptedException, ExecutionException, CancellationException; V get(long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, CancellationException, TimeoutException; }

As of Java 6, ExecutorService implementations can override newTaskFor in AbstractExecutorService to control instantiation of the Future corresponding to a submitted Callable or Runnable. The default implementation just creates a new FutureTask, as shown in Listing 12.

Listing 12. Default implementation of newTaskFor in ThreadPoolExecutor

                        protected   RunnableFuture  newTaskFor(Callable  task) {
   return new FutureTask(task);
}
                   

Submitting a Runnable or Callable to an Executor constitutes a safe publication of the Runnable or Callable from the submitting thread to the thread that will eventually execute the task. Similarly, setting the result value for a Future constitutes a safe publication of the result from the thread in which it was computed to any thread that retrieves it via get.

Example: Page renderer with Future

As a first step towards making the page renderer more concurrent, let's divide it into two tasks, one that renders the text and one that downloads all the images. (Because one task is largely CPU-bound and the other is largely I/O-bound, this approach may yield improvements even on single-CPU systems.)

Both Callable and Future can help us express the interaction between these cooperating tasks. In FutureRenderer in Listing 13, we create a Callable to download all the images, and submit it to an ExecutorService. This returns a Future describing the task's execution; when the main task gets to the point where it needs the images, it waits for the result by calling Future.get. If we're lucky, the results will already be ready by the time we ask; otherwise, at least we got a head start on downloading the images.

Listing 13. Waiting for image download with Future

                        

public class FutureRenderer { private final ExecutorService executor = ...; void renderPage(CharSequence source) { final List<ImageInfo> imageInfos = scanForImageInfo(source); Callable<List<ImageData>> task = new Callable<List<ImageData>>() { public List<ImageData> call() { List<ImageData> result= new ArrayList<ImageData>(); for (ImageInfo imageInfo : imageInfos) result.add(imageInfo.downloadImage()); return result;

} }

Future<List<ImageData>> future = executor.submit(task); renderText(source);

try { List<ImageData> imageData =

future.get();

for (ImageData data : imageData) renderImage(data); } catch (InterruptedException e) { // Re-assert the thread's interrupted status Thread.currentThread().interrupt(); // We don't need the result, so cancel the task too future.cancel(true); } catch (ExecutionException e) { throw launderThrowable(e.getCause()); } } }

The state-dependent nature of get means that the caller need not be aware of the state of the task, and the safe publication properties of task submission and result retrieval make this approach thread-safe. The exception handling code surrounding Future.get deals with two possible problems: that the task encountered an Exception, or the thread calling get was interrupted before the results were available.

The FutureRenderer allows the text to be rendered concurrently with downloading the image data. When all the images are downloaded, they are rendered onto the page. This is an improvement in that the user sees a result quickly and it exploits some parallelism, but we can do considerably better. There is no need for users to wait for all the images to be downloaded; they would probably prefer to see individual images drawn as they become available.

Limitations of parallelizing heterogeneous tasks

In the last example, we tried to execute two different types of tasks in parallel� downloading the images and rendering the page. But obtaining significant performance improvements by trying to parallelize sequential heterogeneous tasks can be tricky.

Two people can divide the work of cleaning the dinner dishes fairly effectively: one person washes while the other dries. However, assigning a different type of task to each worker does not scale well; if several more people show up, it is not obvious how they can help without getting in the way or significantly restructuring the division of labor. Without finding finer-grained parallelism among similar tasks, this approach will yield diminishing returns.

A further problem with dividing heterogeneous tasks among multiple workers is that the tasks may have disparate sizes. If you divide tasks A and B between two workers but A takes ten times as long as B, you've only speeded up the total process by 9 percent. Finally, dividing a task among multiple workers always involves some amount of coordination overhead; for the division to be worthwhile, this overhead must be more than compensated by productivity improvements due to parallelism.

The FutureRenderer uses two tasks: one for rendering text and one for downloading the images. If rendering the text is much faster than downloading the images, as is entirely possible, the resulting performance is not much different from the sequential version, but the code is a lot more complicated. And the best we can do with two threads is speed things up by a factor of two. Thus, trying to increase concurrency by parallelizing heterogeneous activities can be a lot of work, and there is a limit to how much additional concurrency you can get out of it.

Note
The real performance payoff of dividing a program's workload into tasks comes when there are a large number of independent, homogeneous tasks that can be processed concurrently.

CompletionService: Executor meets BlockingQueue

If you have a batch of computations to submit to an Executor and you want to retrieve their results as they become available, you could retain the Future associated with each task and repeatedly poll for completion by calling get with a timeout of zero. This is possible, but tedious. Fortunately there is a better way: a completion service.

The CompletionService combines the functionality of an Executor and a BlockingQueue. You can submit Callable tasks to it for execution and use the queuelike methods take and poll to retrieve completed results, packaged as Futures, as they become available. ExecutorCompletionService implements CompletionService, delegating the computation to an Executor.

The implementation of ExecutorCompletionService is quite straightforward. The constructor creates a BlockingQueue to hold the completed results. FutureTask has a done method that is called when the computation completes. When a task is submitted, it is wrapped with a QueueingFuture, a subclass of FutureTask that overrides done to place the result on the BlockingQueue, as shown in Listing 14. The take and poll methods delegate to the BlockingQueue, blocking if results are not yet available.

Listing 14. QueueingFuture class used by ExecutorCompletionService

                        

private class QueueingFuture<V> extends FutureTask<V> { QueueingFuture(Callable<V> c) { super(c); } QueueingFuture(Runnable t, V r) { super(t, r); }

protected void done() { completionQueue.add(this); } }

Example: Page renderer with CompletionService

We can use a CompletionService to improve the performance of the page renderer in two ways: shorter total runtime and improved responsiveness. We can create a separate task for downloading each image and execute them in a thread pool, turning the sequential download into a parallel one: this reduces the amount of time to download all the images. And by fetching results from the CompletionService and rendering each image as soon as it is available, we can give the user a more dynamic and responsive user interface. This implementation is shown in Renderer in Listing 15.

Listing 15. Using CompletionService to render page elements as they become available

Related:
1 2 3 4 Page 3
Page 3 of 4