Enterprises are choosing Java 2, Enterprise Edition (J2EE) to deliver their mission-critical applications over the Web. Within the J2EE framework, clusters provide mission-critical services to ensure minimal downtime and maximum scalability. A cluster is a group of application servers that transparently run your J2EE application as if it were a single entity. To scale, you should include additional machines within the cluster. To minimize downtime, make sure every component of the cluster is redundant.
In this article we will gain a foundational understanding of clustering, clustering methods, and important cluster services. Because clustering approaches vary across the industry, we will examine the benefits and drawbacks of each approach. Further, we will discuss the important cluster-related features to look for in an application server.
To apply our newly acquired clustering knowledge to the real world, we will see how HP Bluestone Total-e-Server 7.2.1, Sybase Enterprise Application Server 3.6, SilverStream Application Server 3.7, and BEA WebLogic Server 6.0 each implement clusters.
In Part 2 of this series, we will cover programming and failover strategies for clusters, as well as test our four application server products to see how they scale and failover.
J2EE application server vendors define a cluster as a group of machines working together to transparently provide enterprise services (support for JNDI, EJB, JSP, HttpSession and component failover, and so on). They leave the definition purposely vague because each vendor implements clustering differently. At one end of the spectrum rest vendors who put a dispatcher in front of a group of independent machines, none of which has knowledge of the other machines in the cluster. In this scheme, the dispatcher receives an initial request from a user and replies with an HTTP redirect header to pin the client to a particular member server of the cluster. At the other end of the spectrum reside vendors who implement a federation of tightly integrated machines, with each machine totally aware of the other machines around it along with the objects on those machines.
In addition to machines, clusters can comprise redundant and failover-capable:
- Load balancers: Single points of entry into the cluster and traffic directors to individual Web or application servers
- Web servers
- Gateway routers: Exit points out of an internal network
- Multilayer switches: Packet and frame filters to ensure that each machine in the cluster receives only information pertinent to that machine
- Firewalls: Cluster protectors from hackers by filtering port-level access to the cluster and internal network
- SAN (Storage Area Networking) switches: Connect the application servers, Web servers, and databases to a backend storage medium; manage which physical disk to write data to; and failover
Regardless of how they are implemented, all clusters provide two main benefits: scalability and high availability (HA).
Scalability refers to an application's ability to support increasing numbers of users. Clusters allow you to provide extra capacity by adding extra servers, thus ensuring scalability.
HA can be summed up in one word: redundancy. A cluster uses many machines to service requests. Therefore, if any machine in a cluster fails, another machine can transparently take over.
A cluster only provides HA at the application server tier. For a Web system to exhibit true HA, it must be like Noah's ark in containing at least two of everything, including Web servers, gateway routers, switching infrastructures, and so on. (For more on HA, see the HA Checklist.)
J2EE clusters usually come in two flavors: shared nothing and shared disk. In a shared-nothing cluster, each application server has its own filesystems with its own copy of applications running in the cluster. Application updates and enhancements require updates in every node in the cluster. With this setup, large clusters become maintenance nightmares when code pushes and updates are released.
In contrast, a shared-disk cluster employs a single storage device that all application servers use to obtain the applications running in the cluster. Updates and enhancements occur in a single filesystem and all machines in the cluster can access the changes. Until recently, a downside to this approach was its single point of failure. However, SAN gives a single logical interface into a redundant storage medium to provide failover, failback, and scalability. (For more on SAN, see the Storage Infrastructure sidebar.)
When comparing J2EE application servers' cluster implementations, it's important to consider:
- Cluster implementation
- Cluster and component failover services
- HttpSession failover
- Single points of failure in a cluster topology
- Flexible topology layout
Later on we'll look at how four popular application servers compare in various areas. But first, let's examine each item in more detail.
J2EE application servers implement clustering around their implementation of JNDI (Java Naming and Directory Interface). Although JNDI is the core service J2EE applications rely on, it is difficult to implement in a cluster because it cannot bind multiple objects to a single name. Three general clustering methods exist in relation to each application server's JNDI implementation:
- Shared global
Independent JNDI tree
HP Bluestone Total-e-Server and SilverStream Application Server utilize an independent JNDI tree for each application server. Member servers in an independent JNDI tree cluster do not know or care about the existence of other servers in the cluster. Therefore, failover is either not supported or provided through intermediary services that redirect HTTP or EJB requests. These intermediary services are configured to know where each component in the cluster resides and how to get to an alternate component in case of failure.
One advantage of the independent JNDI tree cluster: shorter cluster convergence and ease of scaling. Cluster convergence measures the time it takes for the cluster to become fully aware of all the machines in the cluster and their associated objects. However, convergence is not an issue in an independent JNDI tree cluster because the cluster achieves convergence as soon as two machines start up. Another advantage of the independent JNDI tree cluster: scaling requires only the addition of extra servers.
However, several weaknesses exist. First, failover is usually the developer's responsibility. That is, because each application server's JNDI tree is independent, the remote proxies retrieved through JNDI are pinned to the server on which the lookup occurred. Under this scenario, if a method call to an EJB fails, the developer has to write extra code to connect to a dispatcher, obtain the address of another active server, do another JNDI lookup, and call the failed method again. Bluestone implements a more complicated form of the independent JNDI tree by making every request go through an EJB proxy service or Proxy LBB (Load Balance Broker). The EJB proxy service ensures that each EJB request goes to an active UBS instance. This scheme adds extra latency to each request but allows automatic failover in between method calls.
Centralized JNDI tree
Sybase Enterprise Application Server implements a centralized JNDI tree cluster. Under this setup, centralized JNDI tree clusters utilize CORBA's CosNaming service for JNDI. Name servers house the centralized JNDI tree for the cluster and keep track of which servers are up. Upon startup, every server in the cluster binds its objects into its JNDI tree as well as all of the name servers.
Getting a reference to an EJB in a centralized JNDI tree cluster is a two-step process. First, the client looks up a home object from a name server, which returns an interoperable object reference (IOR). An IOR points to several active machines in the cluster that have the home object. Second, the client picks the first server location in the IOR and obtains the home and remote. If there is a failure in between EJB method invocation, the CORBA stub implements logic to retrieve another home or remote from an alternate server listed in the IOR returned from the name server.
The name servers themselves demonstrate a weakness of the centralized JNDI tree cluster. Specifically, if you have a cluster of 50 machines, of which five are name servers, the cluster becomes useless if all five name servers go down. Indeed, the other 45 machines could be up and running but the cluster will not serve a single EJB client while the naming servers are down.
Another problem arises from bringing an additional name server online in the event of a total failure of the cluster's original name servers. In this case, a new centralized name server requires every active machine in the cluster to bind its objects into the new name server's JNDI tree. Although it is possible to start receiving requests while the binding process takes place, this is not recommended, as the binding process prolongs the cluster's recovery time. Furthermore, every JNDI lookup from an application or applet really represents two network calls. The first call retrieves the IOR for an object from the name server, while the second retrieves the object the client wants from a server specified in the IOR.
Finally, centralized JNDI tree clusters suffer from an increased time to convergence as the cluster grows in size. That is, as you scale your cluster, you must add more name servers. Keep in mind that the generally accepted ratio of name server machines to total cluster machines is 1:10, with a minimum number of two name servers. Therefore, if you have a 10-machine cluster with two name servers, the total number of binds between a server and name server is 20. In a 40-machine cluster with four name servers, there will be 160 binds. Each bind represents a process wherein a member server binds all of its objects into the JNDI tree of a name server. With that in mind, the centralized JNDI tree cluster has the worst convergence time among all of the JNDI cluster implementations.
Shared global JNDI tree
Finally, BEA WebLogic implements a shared global JNDI tree. With this approach, when a server in the cluster starts up it announces its existence and JNDI tree to the other servers in the cluster through IP (Internet Protocol) multicast. Each clustered machine binds its objects into the shared global JNDI tree as well as its own local JNDI tree.
Having a global and local JNDI tree within each member server allows the generated home and remote stubs to failover and provides quick in-process JNDI lookups. The shared global JNDI tree is shared among all machines within the cluster, allowing any member machine to know the exact location of all objects within the cluster. If an object is available at more than one server in the cluster, a special home object is bound into the shared global JNDI tree. This special home knows the location of all EJB objects with which it is associated and generates remote objects that also know the location of all EJB objects with which it is associated.
The shared global approach's major downsides: the large initial network traffic generated when the servers start up and the cluster's lengthy convergence time. In contrast, in an independent JNDI tree cluster, convergence proves not to be an issue because no JNDI information sharing occurs. However, a shared global or centralized cluster requires time for all of the cluster's machines to build the shared global or centralized JNDI tree. Indeed, because shared global clusters use multicast to transfer JNDI information, the time required to build the shared global JNDI tree is linear in relation to the number of subsequent servers added.
The main benefits of shared global compared with centralized JNDI tree clusters center on ease of scaling and higher availability. With shared global, you don't have to fiddle with the CPUs and RAM on a dedicated name server or tune the number of name servers in the cluster. Rather, to scale the application, just add more machines. Moreover, if any machine in the cluster goes down, the cluster will continue to function properly. Finally, each remote lookup requires a single network call compared with the two network calls required in the centralized JNDI tree cluster.
All of this should be taken with a grain of salt because JSPs, servlets, EJBs, and JavaBeans running on the application server can take advantage of being co-located in the EJB server. They will always use an in-process JNDI lookup. Keep in mind that if you run only server-side applications, little difference exists among the independent, centralized, or shared global cluster implementations. Indeed, every HTTP request will end up at an application server that will do an in-process JNDI lookup to return any object used within your server-side application.
Next, we turn our attention to the second important J2EE application server consideration: cluster and failover services.