J2EE clustering, Part 2

Migrate your application from a single machine to a cluster, the easy way

Within the J2EE framework, clusters provide an infrastructure for high availability (HA) and scalability. A cluster is a group of application servers that transparently run your J2EE application as if the group were a single entity. However, Web applications behave differently when they are clustered as they must share application objects with other cluster members through serialization. Moreover, you'll have to contend with the extra configuration and setup time.

To avoid major Web application rework and redesign, you should from the very beginning of your development process consider cluster-related programming issues, as well as critical setup and configuration decisions in order to support intelligent load balancing and failover. Finally, you will need to have a management strategy to handle failures.

Read the whole "J2EE clustering" series:

Building on the information in Part 1, I'll impart an applied understanding of clustering. Further, I'll examine clustering-related issues and their possible solutions, as well as the advantages and disadvantages of each choice. I'll also demonstrate programming guidelines for clustering. Finally, I'll show you how to prepare for outages. (Note that, due to licensing constraints, this article will not cover benchmarking.)

Set up your cluster

During cluster setup, you need to make important decisions. First, you have to choose a load balancing method. Second, you must decide how to support server affinity. Finally, you need to determine how you will deploy the server instances among clustered nodes.

Load balancing

You can choose between two generally recognized options for load balancing a cluster: DNS (Domain Name Service) round robin or hardware load balancers.

DNS round robin

DNS is the process by which a logical name (i.e., www.javaworld.com) is converted to an IP address. In DNS round-robin load balancing, a single logical name can return any IP address of the machines in a cluster.

DNS round-robin load balancing's advantages include:

  • Cheap and easy setup
  • Simplicity

Its disadvantages include:

  • No server affinity support. When a user receives an IP address, it is cached on the browser. Once the cache expires, the user makes another request for the IP address associated with a logical name. That second request can return the IP address of any other machine in the cluster, resulting in a lost session.
  • No HA support. Imagine a cluster of n servers. If one of those servers goes down, every nth request to the DNS server will go to the dead server.
  • Changes to the cluster take time to propagate to the rest of the Internet. Many corporations' and ISPs' DNS servers cache DNS lookups from their clients. Even if your DNS list of servers in the cluster could change dynamically, it would take time for the cached entries on other DNS servers to expire. For example, after a downed server is removed from your cluster's DNS list, AOL clients could still attempt to hit the downed server if AOL's DNS servers cached entries to the downed server. As a result, AOL users would not be able connect to your site even if other machines in the cluster were available.
  • No guarantee of equal client distribution across all servers in the cluster. If you don't configure cooperating DNS servers to support DNS load balancing, they could take only the first IP address returned from the initial lookup and use that for their client requests. Imagine a partner corporation with thousands of employees all pinned to a single server in your cluster!

Hardware load balancers

In contrast, a hardware load balancer (like F5's Big IP) solves most of these problems through virtual IP addressing. A load balancer presents to the world a single IP address for the cluster. The load balancer receives each request and rewrites headers to point to other machines in the cluster. If you remove any machine in the cluster, the changes take effect immediately.

Hardware load balancers' advantages include:

  • Server affinity when you're not using SSL
  • HA services (failover, monitoring, and so on)
  • Metrics (active sessions, response time, and so on)
  • Guaranteed equal client distribution across cluster

However, hardware load balancers exhibit disadvantages:

  • High cost -- 0,000 to 0,000, depending on features
  • Complex setup and configuration

Once you have picked your load balancing scheme, you must decide how your cluster will support server affinity.

Server affinity

Server affinity becomes a problem when using SSL without Web server proxies. (Server affinity directs a user to a particular server in the cluster for the duration of her session.) Hardware load balancers rely on cookies or URL readings to determine where requests are directed. If requests are SSL encrypted, hardware load balancers cannot read the header, cookie, or URL information. To solve the problem, you have two choices: Web server proxies or SSL accelerators.

Web server proxies

In this scenario, a hardware load balancer acts like a DNS load balancer for the Web server proxies, except that it acts through a single IP address. The Web servers decrypt SSL requests and pass them to the Web server plug-in (Web server proxy). Once the plug-in receives a decrypted request, it can parse the cookie or URL information and redirect the request to the application server where the user's session state resides.

With Web server proxies, the major advantages include:

  • Server affinity with SSL
  • No additional hardware required (only the hardware load balancer is needed)

The disadvantages are:

  • The hardware load balancer cannot use metrics to direct requests
  • Extensive SSL use puts an additional strain on the Web servers
  • Web server proxies need to support server affinity

If a large portion of the transactions your site processes must be secure, SSL accelerators, explained in the next section, can add flexibility to your cluster topology while supporting server affinity.

SSL accelerators

SSL accelerator networking hardware processes SSL requests to the cluster. It sits in front of the hardware load balancer, allowing the hardware load balancer to read decrypted information in cookies, headers, and URLs. The hardware load balancer can then use its own metrics to direct requests. With this setup, you can avoid Web proxies if you choose and still achieve server affinity through SSL.

With SSL accelerators, you benefit from:

  • A flexible topology layout (with Web proxies or without) that supports server affinity and SSL
  • Off-loaded SSL processing to the SSL accelerator, which increases scalability
  • Centralized SSL certificate management in a single box

The disadvantages comprise:

  • A high cost when you buy two accelerators to achieve HA
  • Added setup and configuration complexity

Once you have decided on your server affinity setup, you need to tactically place your application server instances throughout the cluster nodes.

Application server distribution

When distributing application server instances throughout your cluster, you must decide whether or not you want multiple application server instances on single nodes in the cluster, and determine the total number of nodes in your cluster.

The number of application server instances on a single node depends on the number of CPUs on the box, CPU utilization, and available memory. Consider multiple instances on a single box in any of three situations:

  • You have three or more CPUs not fully saturated under load
  • The instance heap size is set too large, causing garbage collection times to increase
  • The application is not I/O bound

Determining the optimal number of nodes in your cluster is an iterative process. First, profile and optimize the application. Second, use load-testing software to simulate your expected peak usage. Finally, add additional surplus nodes to handle the load when failures occur.

Ideally, it would be best to push out development releases to a staging cluster to catch clustering issues as they occur. Unfortunately, developers create most applications for a single machine, then migrate them to a clustered environment, a situation that can break the application.

Session-storage guidelines

To minimize the breakage, follow these general guidelines for application servers that use in-memory or database session persistence:

  • Make sure all objects and those they reference recursively in the HttpSession are serializable. As a rule of thumb, all objects should implement java.io.Serializable as a part of their canonical form.
  • Whenever you change an object's state in the HttpSession, call session.setAttribute(...) to flag the object as changed and save the changes to a backup server or database:

       AccountModel am = (AccountModel)session.getAttribute("account");
       //You need this so the AccountModel object on the backup receives the 
       //Credit card
  • The ServletContext is not serializable, so do not use it as an instance variable (unless it is marked as transient) for any object directly or indirectly stored within the HttpSession. Getting a reference to the ServletContext proves easier in a Servlet 2.3 container when the HttpSessionBindingEvent holds a reference to the ServletContext.
  • EJB remotes may not be serializable. When they are not serializable, you need to override the default serialization mechanism as follows (this class does not implement java.io.Serializable because AccountModel, its superclass, does):

    public class AccountWebImpl extends AccountModel 
       implements ModelUpdateListener, HttpSessionBindingListener { 
               transient private Account acctEjb;
       private void writeObject(ObjectOutputStream s) {
          try {
             Handle acctHandle = acctEjb.getHandle();
                          } catch (IOException ioe) {
             throw new GeneralFailureException(ioe);
                          } catch (RemoteException re) {
                               throw new GeneralFailureException(re);
               private void readObject(ObjectInputStream s) {
                          try {
             Handle acctHandle = (Handle)s.readObject()
                      Object ref = acctHandle.getEJBObject();
                       acctEjb = (Account) PortableRemoteObject.narrow(ref,Account.class);
                          } catch (ClassNotFoundException cnfe) {
                                     throw new GeneralFailureException(cnfe);
                          } catch (RemoteException re) {
                                     throw new GeneralFailureException(re);
                          } catch (IOException ioe) {
             throw new GeneralFailureException(ioe);
  • HttpSessionBindingListener's valueBound(HttpSessionBindingEvent event) method is called after the session is restored from disk and after every call to HttpSession's setAttribute(...) method. valueBound(HttpSessionBindingEvent event), however, is not called during failover.

In-memory session state replication

In-memory session state replication proves more complicated than database persistence because individual objects in the HttpSession are serialized to a backup server as they change. With database session persistence, the objects in the session are serialized together when any one of them changes. As a side effect, in-memory session state replication clones all HttpSession objects stored directly under a session key. This has virtually no effect if each object stored under a session key is independent of the other objects stored under different session keys. However, if the objects stored under session keys are highly dependent on other objects within the HttpSession, copies will be made on the backup machine. After failing over, your application will continue to run but some features may not work -- a shopping cart may refuse to accept more items, for instance. The problem stems from different parts (updating the shopping cart and displaying the shopping cart) of your application referring to their own copies of the shopping cart object. The class responsible for updating the cart is making changes to its copy of the shopping cart while the JSPs attempt to display their copy of the shopping cart. In a single-server environment, this problem would not arise because both parts of your application would point to the same shopping cart.

Here is an example of indirectly copying objects:

import java.io.*;
public class Aaa implements Serializable {
   String name;
   pubic void setName (String name) {
                      this.name = name;
   public String getName( ) {
                      return name;
import java.io.*;
public class Bbb implements Serializable {
   Aaa a;
   public Bbb (Aaa a) {
                      this.a = a;
   pubic void setName (String name) {
   public String getName( ) {
                      return a.getName();
In first.jsp on server1:
Aaa a = new Aaa();
Bbb b = new Bbb(a);
// a is copied to backup machine under key "a"
// b is copied to backup machine under key "b"
In second.jsp on server1:
Bbb b = (Bbb)session.getAttribute("b");
// b is copied to backup machine under key "b"
// but object Aaa under key "a" has the name "Abe"
// and "b"'s Aaa has the name "Bob"
---> Failure trying to get to server1's third.jsp
----->Failover to server2's (backup machine) third.jsp
In third.jsp on server2:
The name associated with Object Aaa is:
The name associated with Object Aaa through Object Bbb is:
//End of third.jsp

The first expression tag outputs "Abe", while the second expression tag outputs "Bob". On a single server, both expressions would output "Bob".

To see invalid session state copies, create a Cluster Object Relationship Diagram (CORD) like the figure below.

JavaPetStore CORD. Click on thumbnail to view full-size image.

In the diagram, the ovals represent objects directly stored under a key value in the HttpSession. The squares represent objects indirectly referenced by an object stored under an HttpSession key. The filled-in arrows point to objects that are the source object's instance variables. The open arrows indicate an inheritance relationship.

Counting the number of arrows pointing into an object usually tell you the number of copies of that object will be generated. If the arrow points to an oval, you have to add one to the total number of arrows because the oval is stored in the HttpSession under a session key. For example, look at the ContactInformation oval. It has one arrow pointing into it, indicating two copies: one stored directly under a session key and another stored as an instance variable of AccountWebImpl.

I say usually because complex relationships tend to bend this rule of thumb. Look at ShoppingClientControllerWebImpl; it has one arrow pointing into it, so there are two copies, right? Wrong! There are three copies: one through ModelManager, one through AccountWebImpl, and one through RequestProcessor. You cannot count ShoppingClientControllerWebImpl itself because it is not directly stored under an HttpSession key.

Let's try another example. How many copies of ModelManager are there? If you answered four, you are right. There is a copy from ModelManager itself because it is stored under an HttpSession key, AccountWebImpl, ShoppingCartWebImpl, and RequestProcessor. I didn't count the arrow from RequestToEventTranslator because it is not directly stored under an HttpSession key, and its reference to ModelManager is the same as the reference from RequestProcessor.

Note: With in-memory replication, always remember: any object stored under an HttpSession key will be cloned and sent to the backup machine.

To keep multiple object copies to a minimum, store primitive object types, simple objects, or independent objects under session keys in HttpSession. If you can't avoid having complex interrelated objects in the HttpSession, use session.setAttribute(...) to remove copies once the user has failed over. For example, to replace the clone of ModelManager in AccountWebImpl with the real ModelManager, you need to call session.setAttribute("account",accountWebImplWithRealModelManager). In most cases, you will focus your analysis to copied objects with shared state. If the object doesn't have any shared instance variables, you can leave it alone. The single side effect: increased memory usage for every client that fails over. After testing proves that everything fails over properly in your cluster, you need to develop a cluster management strategy.

A cluster management strategy document defines what can go wrong and how to solve the problems. Generally, there are four categories of problems:

  1. Hardware
  2. Software
  3. Network
  4. Database

Since this is JavaWorld, I will focus on software. You will probably want to discuss the other issues with your local sysadmin, network admin, and DBA, respectively.

If software on a machine in a cluster fails, other cluster members must assume responsibility for the downed server, a process known as failover. In JSP applications, failover occurs at the Web proxy or hardware load balancer. For example, a request comes into the Web proxy, which notices the server it is trying to reach is down, so it sends the request to another application server. This application server activates the users' backup session state and fulfills the request. In an EJB client application, the remote references try different servers in the cluster until they receive a response.

This description makes failover seem trivial, but we have covered only the best-case scenario in which a failure occurs between requests. Failover proves more difficult when a failure occurs during a request. Let's take a look at examples involving JSPs and EJBs.

For our JSP example, imagine a user placing an order on your Website. When the user clicks the Submit button, the page seems to hang, and then he receives a message that reads "Response contained no data." In this situation, one of two things could have happened. The order may have completed but the server failed before delivering the response page. On the other hand, the server may have failed before saving the order and sending the response. Does the proxy automatically call the same URL on another server and reinitiate the transaction, causing possible duplicates? Or does it return an error and force the user to contact a member of your support team?

In our EJB example, imagine a user modifying inventory on a standalone Java application. Once the user clicks the Save button, a request reaches an EJB, but the application returns a RemoteException. Again, one of two things could have happened. The inventory could have been saved, but the server may have failed before delivering a reply. Or, the server could have failed before making inventory changes. Should the EJB remote automatically reattempt the same remote call on another server in the cluster and risk duplicates or inconsistent data?

After trying different failover techniques, I concluded that responsibility for transparent failover rests with the application server. You can try to use transaction tables or transaction identifiers in cookies, but the pain-to-gain ratio doesn't justify either. The simplest solution: after a server failure, run a set of stored procedures that find inconsistencies within the database and notify the interested parties.

Wrap it all up

This article has given you an applied understanding of clustering. You've learned how to set up, program for, and maintain J2EE clusters. You've seen the issues related to clustering, as well as possible solutions. With this practical knowledge, you can set up a working J2EE cluster.

But your work is not done: you have to develop the network, hardware, and database infrastructure to ensure HA. The easiest way to develop these services is through an enterprise hosting service. Most enterprise hosting services will set up your redundant networking, database, and hardware services. You just provide the clusterable code. Good luck and happy clustering.

Learn more about this topic

Join the discussion
Be the first to comment on this article. Our Commenting Policies