Make room for JavaSpaces, Part 5

Make your compute server robust and scalable with Jini and JavaSpaces

In "Make Room for JavaSpaces, Part 2: Build a compute server with JavaSpaces" you saw how to build a simple general-purpose compute server using the JavaSpaces technology. Recall that a compute server is a powerful, all-purpose computing engine that accepts tasks, computes them, and returns results. In that design, a master process breaks down a large, compute-intensive problem into smaller tasks -- entries that describe the task and contain a method to perform the necessary computation -- and writes them into a space. In the meantime, worker processes watch the space and retrieve tasks as they become available, compute them, and write the results back to the space, from which the master will retrieve them at some point.
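As a rough illustration of the flow just described, here is a minimal sketch in plain Java. It is a toy model, not the code from Part 2: two in-memory queues stand in for the space, and the names MasterWorkerSketch, Task, and SquareTask are made up for this example. Unlike a real worker, which blocks on the space indefinitely, these workers stop once the task queue is empty so that the program terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the master/worker flow. In-memory queues stand in for the
// JavaSpace; nothing here uses the Jini or JavaSpaces API.
class MasterWorkerSketch {

    // A task carries its own computation, like the task entries in the article.
    interface Task {
        Integer execute();
    }

    // Hypothetical task: squares one number.
    static final class SquareTask implements Task {
        private final int n;
        SquareTask(int n) { this.n = n; }
        public Integer execute() { return n * n; }
    }

    // Runs `workers` threads over `count` tasks and returns the sum of the results.
    static int run(int count, int workers) {
        BlockingQueue<Task> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();

        // Master: break the problem into small tasks and "write" them.
        for (int i = 1; i <= count; i++) {
            tasks.add(new SquareTask(i));
        }

        // Workers: repeatedly "take" a task, compute it, and "write" the result.
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                Task task;
                while ((task = tasks.poll()) != null) {
                    results.add(task.execute());
                }
            });
            t.start();
            threads.add(t);
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Master: retrieve the results and combine them.
        int sum = 0;
        for (int r : results) {
            sum += r;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3)); // sum of the squares of 1..10
    }
}
```

The master never addresses a particular worker; it only writes tasks and reads results, which is what lets workers be added or removed freely.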


Simple as it is, the compute server has some impressive qualities. First, it is general purpose: You can send it new tasks whenever you want and know they will be computed. There's no need to bring the compute server down or install new task code on various machines, since executable content is built into the task entries themselves. Second, the compute server scales automatically: As new workers come along, they can pick up tasks and join in the computation, thereby speeding up the overall solution. Workers can come and go from the compute server without requiring code changes or reconfiguration. Finally, the compute server is well suited to load balancing: Workers pick up tasks whenever they can, so the workload is balanced naturally among workers on slower and faster machines.
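The load-balancing point can be seen in a small simulation. The sketch below is an assumption-laden toy, not part of the compute server: a shared in-memory queue stands in for the space, a sleep simulates per-task compute time, and LoadBalanceSketch and its parameters are hypothetical names. Each worker pulls its next task only when it has finished the previous one, so the faster worker tends to end up with more of the work without anyone assigning it.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy simulation: one fast and one slow worker pull from the same task pool.
class LoadBalanceSketch {

    // Returns {tasks completed by the fast worker, tasks completed by the slow worker}.
    static int[] run(int taskCount, long fastMillis, long slowMillis) {
        ConcurrentLinkedQueue<Integer> tasks = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(i);
        }

        AtomicInteger fastDone = new AtomicInteger();
        AtomicInteger slowDone = new AtomicInteger();
        Thread fast = new Thread(() -> drain(tasks, fastMillis, fastDone));
        Thread slow = new Thread(() -> drain(tasks, slowMillis, slowDone));
        fast.start();
        slow.start();
        try {
            fast.join();
            slow.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { fastDone.get(), slowDone.get() };
    }

    // Pull tasks until none remain; the sleep simulates per-task compute time.
    private static void drain(ConcurrentLinkedQueue<Integer> tasks,
                              long millisPerTask, AtomicInteger done) {
        while (tasks.poll() != null) {
            try {
                Thread.sleep(millisPerTask);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            done.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        int[] counts = run(40, 1, 10);
        System.out.println("fast=" + counts[0] + " slow=" + counts[1]);
    }
}
```

The exact split varies from run to run, but the two counts always add up to the number of tasks written.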

Despite its admirable properties, the compute server isn't quite ready for the real world. In this article, I'll show you how to make two big improvements that prepare the system for real-world use. If you read Part 4 of this series, you've probably guessed that one of the compute server's weaknesses is that it fails to account for partial failure. Armed with what you know about Jini transactions from that article, I'll now revisit the compute server code and show you how to make it more fault tolerant. Another potential shortcoming of the compute server is its use of a single JavaSpace, running on a single CPU. Under some circumstances, reliance on one space may introduce a bottleneck, so I'll also revisit the code to show how you can use multiple spaces to allow for greater scalability.

Adding transactions to the worker

Take another look at the original worker code from the compute server and see why it's not fault tolerant:

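import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace;

// TaskEntry and SpaceAccessor are helper classes defined elsewhere in this series.
public class Worker {
    private JavaSpace space;
    public static void main(String[] args) {
        Worker worker = new Worker();
        worker.startWork();
    }
    public Worker() {
        space = SpaceAccessor.getSpace();
    }
    public void startWork() {
        // Template used to match task entries in the space.
        TaskEntry template = new TaskEntry();
        for (;;) {
            try {
                // Block until a task is available; null means no transaction.
                TaskEntry task = (TaskEntry)
                    space.take(template, null, Long.MAX_VALUE);
                Entry result = task.execute();
                if (result != null) {
                    // Write the result under a null transaction
                    // with a 10-minute lease.
                    space.write(result, null, 1000*60*10);
                }
            } catch (Exception e) {
                // Report the cause rather than swallowing it.
                System.out.println("Task cancelled: " + e);
            }
        }
    }
}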


After gaining access to a space and calling the startWork method, the worker repeatedly takes a task entry from the space, computes the task, and writes the result to the space. Note that take and write are both performed under a null transaction, which means each of those operations is a single indivisible action (the operation itself). Now step back and think about one scenario that can occur in networked environments, which are prone to partial failure. Consider the case in which a worker removes a task and begins executing it, and then a failure occurs (maybe the worker dies unexpectedly or gets disconnected from the network). In this scenario, the task entry is lost for good, and as a result the overall computation will never complete.
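To make the failure scenario concrete, here is a minimal sketch in plain Java. It is a toy model under stated assumptions: ToySpace and ToyTxn are hypothetical classes invented for this illustration and are not the JavaSpaces or Jini transaction API. The point is only to contrast a take whose removal is final the moment it returns with a take whose removal is provisional and can be rolled back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: ToySpace and ToyTxn are hypothetical and do not mirror
// the real JavaSpaces or Jini transaction interfaces.
class LostTaskSketch {

    // Remembers which entries were taken provisionally.
    static final class ToyTxn {
        final List<String> taken = new ArrayList<>();
    }

    static final class ToySpace {
        private final Deque<String> entries = new ArrayDeque<>();

        void write(String entry) {
            entries.addLast(entry);
        }

        // txn == null: the removal is final as soon as take returns.
        // txn != null: the removal is provisional until commit or abort.
        String take(ToyTxn txn) {
            String entry = entries.pollFirst();
            if (entry != null && txn != null) {
                txn.taken.add(entry);
            }
            return entry;
        }

        // Commit makes the provisional removals permanent.
        void commit(ToyTxn txn) {
            txn.taken.clear();
        }

        // Abort puts every provisionally taken entry back in the space.
        void abort(ToyTxn txn) {
            for (String entry : txn.taken) {
                entries.addLast(entry);
            }
            txn.taken.clear();
        }

        int size() {
            return entries.size();
        }
    }

    // A worker takes one task and fails before writing a result.
    // Returns how many tasks remain in the space afterwards.
    static int tasksLeftAfterCrash(boolean useTxn) {
        ToySpace space = new ToySpace();
        space.write("task-1");
        ToyTxn txn = useTxn ? new ToyTxn() : null;
        try {
            space.take(txn);
            throw new IllegalStateException("worker died mid-task");
        } catch (IllegalStateException crash) {
            // The explicit abort stands in for whatever mechanism rolls
            // back an unfinished transaction in a real system.
            if (txn != null) {
                space.abort(txn);
            }
        }
        return space.size();
    }

    public static void main(String[] args) {
        System.out.println("null transaction: " + tasksLeftAfterCrash(false)); // prints 0
        System.out.println("toy transaction:  " + tasksLeftAfterCrash(true));  // prints 1
    }
}
```

With no transaction, the task is gone after the crash. With the toy transaction, the aborted take returns the task to the space, where another worker could pick it up.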
