Java grid computing

An introduction to Java-based frameworks for grid and cluster computing


Grids and clusters: Purpose and context

As computing power increased and prices dropped over the last part of the 20th century, it became clear that if large numbers of low-cost computers could be used in concert, they could provide supercomputing power at a much lower cost than purpose-built, high-performance supercomputers. This isn't entirely true anymore; my co-worker just purchased a consumer computer that, according to the government's definition, qualifies as a supercomputer. Apple makes this computer, and it costs less than $3,000, including delivery.

As always in engineering, if you optimize one variable, in this case cost, you must pay for that gain with a loss in some other aspect of the system. Compared with a supercomputer, a cluster pays for its low cost in communication overhead and RAM size. Because processes must communicate with one another over the network, rather than through hardware on the motherboard, communication is much slower. Also, the high-speed RAM a process can use is limited by the memory installed on the individual hosts in the cluster. Despite these constraints, clusters have proven invaluable in high-performance computing for solving problems that can easily be broken into many smaller tasks and distributed to workers. Ideal problems require little communication between workers, and their work product can be combined or processed in some way after the tasks have been completed.
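To make the "break into tasks, distribute, then combine" pattern concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not a grid or cluster framework: threads in a local pool stand in for hosts in a cluster, and the class name SplitAndCombine and the method sumOfSquares are hypothetical names chosen for this example. Each task works on its own slice of the range and never talks to another task, which is the property that makes a problem a good fit for a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitAndCombine {

    // Sums the squares of 1..n by splitting the range into independent tasks.
    // Assumption: local threads play the role of cluster workers.
    public static long sumOfSquares(long n, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            long chunk = (n + workers - 1) / workers; // ceiling division
            for (long start = 1; start <= n; start += chunk) {
                final long lo = start;
                final long hi = Math.min(n, start + chunk - 1);
                // Each task needs only its own bounds: no worker-to-worker traffic.
                tasks.add(() -> {
                    long partial = 0;
                    for (long i = lo; i <= hi; i++) {
                        partial += i * i;
                    }
                    return partial;
                });
            }
            // Combine step: runs only after every task has completed.
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(1000, 4));
    }
}
```

On a real cluster the partial results would travel over the network, so the less data each task has to send back, the smaller the communication penalty described above.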

Grids can certainly solve these sorts of problems, but they might have a supercomputer available for tasks that cannot be broken up so easily. The grid would provide a way to match this supercomputer with your problem, reserve it, authenticate your task and authorize its use of the supercomputer. It would execute the task and provide a way to monitor progress on that supercomputer. When the supercomputer completes your task, it would send the results to you. This supercomputer might even be in a different hemisphere and owned by a different institution. Finally, the grid might even debit your account for using this service.
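The sequence of services just described can be read as a simple job lifecycle. The sketch below models it as an ordered Java enum. Everything in it is hypothetical: the stage names are lifted from the prose, the class GridJobLifecycle is invented for illustration, and no real grid toolkit's API is being shown. Monitoring is assumed to happen while the job is in the RUNNING stage, and billing is treated as the last step even though the text presents it as optional.

```java
import java.util.ArrayList;
import java.util.List;

public class GridJobLifecycle {

    // Hypothetical stages, in the order the article lists them.
    public enum Stage {
        MATCHED,          // grid pairs the task with a suitable resource
        RESERVED,         // resource is reserved for the task
        AUTHENTICATED,    // submitter's identity is verified
        AUTHORIZED,       // task is cleared to use the resource
        RUNNING,          // task executes; progress can be monitored
        COMPLETED,        // resource finishes the task
        RESULTS_RETURNED, // results are sent back to the submitter
        BILLED;           // account is debited (optional in practice)

        public boolean isLast() {
            return this == BILLED;
        }

        public Stage next() {
            if (isLast()) {
                throw new IllegalStateException("job already billed");
            }
            return values()[ordinal() + 1];
        }
    }

    // Walks one job through every stage and records the path it took.
    public static List<Stage> run() {
        List<Stage> history = new ArrayList<>();
        Stage stage = Stage.MATCHED;
        history.add(stage);
        while (!stage.isLast()) {
            stage = stage.next();
            history.add(stage);
        }
        return history;
    }

    public static void main(String[] args) {
        for (Stage stage : run()) {
            System.out.println(stage);
        }
    }
}
```

A strict linear order is a simplification. A real system would also need failure and cancellation paths, and the stages may be handled by different institutions.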

By way of contrast, a cluster might provide some of these services. It might even be a cluster of supercomputers, but the cluster would probably belong entirely to your institution, and it probably wouldn't bill you. In addition, your institution probably would have a consistent policy and method of authenticating your credentials and authorizing your use of the cluster. More important, the cluster would probably exercise complete and centralized control over its resources.

Grids initially emerged to solve resource-sharing problems across academic and research institutions, where funding for researching a broad topic might be distributed across a variety of institutions that employed researchers focusing on particular aspects of that topic. Often, researchers need to share experimental data generated by expensive sensors. For example, the particle colliders that provide experimental data for high-energy particle physics are extremely complex and expensive beasts. Modern colliders generate massive amounts of data that must be managed efficiently and replicated to move the data closer to computational resources. When complete, the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) will generate petabytes of data every year. Sharing this experimental data among many geographically distributed research organizations and researchers requires sophisticated resource-sharing technology that can expose those resources in an open, standard, and secure way. These requirements far exceed those of a cluster intended to provide high-performance computing for a given institution.

Ian Foster, widely regarded as the father of grid computing, provides three criteria that must be met in order for something to be categorized as a grid:

  1. Coordinates resources that are not subject to centralized control;
  2. Uses standard, open, general-purpose protocols and interfaces; and
  3. Delivers nontrivial qualities of service.

Of these three, in my mind, the first most clearly defines a grid: it establishes a software requirement for resource sharing that crosses organizational boundaries. The second makes more sense in the context of the grid, rather than a grid. If you envision the grid as something akin to the Internet (pervasive, yet providing access to abstract computing resources), the need for standard, open protocols becomes clearer. Finally, "nontrivial" qualities of service (QoS) distinguish a grid from a cluster. Although clusters typically guarantee some level of QoS, those guarantees are trivial compared with delivering QoS across resources under decentralized control. Refer to "What Is the Grid?" for a complete explanation of these criteria and grid definitions as they have evolved over the last decade.
