Open source Java projects: Terracotta

A unique approach to clustering conquers scalability and fail-over

By distributing application load among multiple redundant servers, clustering maintains performance and keeps users blissfully unaware of single-server failures. In this Open source Java projects installment, Steven Haines introduces Terracotta, an enterprise Java clustering solution. Find out why Terracotta, unlike traditional clustering solutions, doesn't make you sacrifice an iota of reliability in the name of performance. Level: Intermediate

Terracotta is an open source solution for enterprise Java clustering that boasts near linear scalability and 100 percent reliability. Terracotta supports standard HTTP session clustering in Apache Tomcat and Oracle WebLogic, as well as open source projects such as Struts, Spring, and Hibernate. I'll start by explaining what's unique about Terracotta clustering. Then, after taking you through installation, I'll show you how to configure Terracotta to cluster a sample Web application that uses HTTP sessions, and how to deploy Terracotta clustering in a production environment.

Clustering solves two fundamental problems for mission-critical applications: scalability and fail-over. Scalability measures how well an application can maintain its performance under increasing load. Clustering addresses scalability by letting you distribute the load among several physical servers or server instances. Theoretically perfect scalability is linear: as new servers are added to a cluster, each server adds support for a constant number of users. For example, if one server can support 500 users, then two servers can support 1,000 users and three servers can support 1,500.

Performance vs. scalability

The concepts of performance and scalability are often interwoven, but they're distinct. Performance measures whether an application can respond to a request within its defined service-level agreement (SLA). Scalability measures how well an application can maintain its performance under increasing load. Horizontal clustering distributes load across multiple machines; you can think of it as "scaling out." You can think of vertical clustering as "scaling up," meaning that load is distributed to multiple application server instances running on the same physical server. Vertical clustering can sometimes better utilize all of a server's resources than a single JVM instance.

The other side of clustering is fail-over in the event of server failure. Successful fail-over makes outages transparent to users while maintaining their state within the application. It requires a strategy for replicating a user's state to one or more secondary servers and then, if the first server goes down, redirecting all subsequent requests to the secondary server(s). Deciding how, when, and where to send the data are fundamental challenges to implementing this strategy effectively.

Serialization (the process of converting a Java object into a stream of bytes) is the traditional approach to how the data is sent. Application servers typically identify whether a change has been made to stateful objects, serialize those objects, and send them to the replicated servers. This strategy is inefficient because the serialization process is "all or nothing." In many applications, such as those powered by portals, stateful information can be measured in megabytes. Even if a user changes only a single byte, such as changing a preference from "true" to "false," the application server must construct a serialized version of a potentially multi-megabyte object and send all of that data across the network to its replicated servers. This inefficiency hinders linear scalability.

To overcome the limitations of Java serialization, Terracotta uses bytecode instrumentation (BCI) in the Terracotta client to identify the exact properties within stateful objects that change and then replicate only those properties across the cluster.
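To make the difference concrete, here is a minimal sketch that compares the size of a fully serialized state object with the size of a single changed field. The UserState class, its field names, and the "field=value" delta encoding are hypothetical and chosen only for illustration; they do not represent Terracotta's actual wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SerializationCost {

    /** Hypothetical session state: one small preference plus a large payload. */
    static class UserState implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean showTips = true;
        byte[] portalData = new byte[1024 * 1024]; // 1 MB of stateful data
    }

    /** Number of bytes produced when the whole object is serialized. */
    static int fullSerializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Number of bytes in an illustrative "field=value" delta (an assumed encoding). */
    static int fieldDeltaSize(String fieldName, Object newValue) {
        return (fieldName + "=" + newValue).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        UserState state = new UserState();
        state.showTips = false; // the user flips a single preference

        System.out.println("Full serialization: " + fullSerializedSize(state) + " bytes");
        System.out.println("Field-level delta:  "
                + fieldDeltaSize("showTips", state.showTips) + " bytes");
    }
}
```

The full serialization is always larger than the 1 MB payload, while the delta for the changed flag is a handful of bytes. That gap is the cost that field-level replication avoids.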

Bytecode instrumentation

Bytecode instrumentation is a process through which an application's behavior can be modified at runtime. Bytecode is the format Java is compiled to and that the JVM knows how to interpret. The JVM provides "hooks" through which a process can examine and modify bytecode of objects before they are returned to the application that is using them. In the performance-monitoring space, this feature is exploited to mark the time that a method starts and the time that it ends in order to measure its response time. Terracotta uses BCI to intercept changes made to objects so that it can identify those changes and send them to the Terracotta server.
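As a rough illustration of the JVM hook described above, the following sketch uses the standard java.lang.instrument API to observe classes as they are loaded. It only logs class names and leaves the bytecode untouched; it is not how Terracotta itself is implemented. The LoggingAgent class name and the com/geekcap/ package filter are assumptions made for this example.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class LoggingAgent {

    /** Sees the bytecode of every class before the JVM defines it. */
    static class LoggingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform( ClassLoader loader, String className,
                                 Class<?> classBeingRedefined,
                                 ProtectionDomain protectionDomain,
                                 byte[] classfileBuffer )
        {
            // Class names arrive in internal form, with slashes instead of dots
            if( className != null && className.startsWith( "com/geekcap/" ) )
            {
                System.out.println( "Loading " + className + " ("
                        + classfileBuffer.length + " bytes of bytecode)" );
            }

            // Returning null tells the JVM to use the original bytecode.
            // A real instrumenting agent would return modified bytes here.
            return null;
        }
    }

    /** Called by the JVM before main() when started with -javaagent. */
    public static void premain( String agentArgs, Instrumentation inst )
    {
        inst.addTransformer( new LoggingTransformer() );
    }
}
```

To try it, package the class in a JAR whose manifest contains a Premain-Class: LoggingAgent entry and start the JVM with the -javaagent option pointing at that JAR.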

The other fail-over challenge is determining when and where to send clustered information. In traditional solutions, data is typically sent to each replicated server as soon as the user's request is completed. The level of redundancy has a direct impact on clustering performance. In an ideal world, any application server can fail over to any other server. But because this would require network communication to all other servers in the cluster, the performance cost is prohibitive when you have more than a few servers. Most application administrators instead accept a lower level of reliability in exchange for better performance. For example, they might define at most two secondary servers to which a server can fail over. The idea is that the chance of two or three servers going down simultaneously is low, and the performance overhead of replicating stateful information to those servers is manageable.

Terracotta, in contrast, replicates data to multiple servers without compromising reliability. It does so by introducing a new server that hosts all stateful information. When an application makes changes to a stateful object, those changes are sent to the Terracotta server. Then if another server needs access to that data, it is injected into that server on demand. Thus, all servers in the cluster have access to the same data, but the data is pushed to an individual server only when that server accesses it. So the overhead required to replicate stateful information to 100 servers is the same overhead required to replicate to one server. Furthermore, the Terracotta server, not a server in the cluster, is the one receiving the request, so the overhead is much less than even replicating to one additional clustered server. And the Terracotta server can be clustered itself to support the redundancy of your stateful objects in case it ever goes down.
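The hub model can be sketched in a few lines of plain Java. This is a simplified, in-memory illustration with no networking, no invalidation of stale local copies, and no Terracotta code at all; the CentralStore and AppServer classes are hypothetical stand-ins for the Terracotta server and the clustered application servers.

```java
import java.util.HashMap;
import java.util.Map;

public class HubReplicationSketch {

    /** Stand-in for the central server that hosts all stateful information. */
    static class CentralStore {
        private final Map<String, Object> state = new HashMap<>();
        int writesReceived = 0;
        int readsServed = 0;

        void write( String key, Object value ) {
            state.put( key, value );
            writesReceived++;
        }

        Object read( String key ) {
            readsServed++;
            return state.get( key );
        }
    }

    /** Stand-in for one application server in the cluster. */
    static class AppServer {
        private final CentralStore store;
        private final Map<String, Object> local = new HashMap<>();

        AppServer( CentralStore store ) {
            this.store = store;
        }

        void set( String key, Object value ) {
            local.put( key, value );
            // One message to the hub, no matter how large the cluster is
            store.write( key, value );
        }

        Object get( String key ) {
            // Pull the value from the hub only on first access
            if( !local.containsKey( key ) ) {
                local.put( key, store.read( key ) );
            }
            return local.get( key );
        }
    }

    public static void main( String[] args ) {
        CentralStore store = new CentralStore();
        AppServer[] cluster = new AppServer[ 100 ];
        for( int i = 0; i < cluster.length; i++ ) {
            cluster[ i ] = new AppServer( store );
        }

        cluster[ 0 ].set( "name", "Steven" );            // server 0 handles the request
        Object afterFailover = cluster[ 42 ].get( "name" ); // server 42 takes over

        System.out.println( "Value seen after fail-over: " + afterFailover );
        System.out.println( "Writes sent to hub: " + store.writesReceived );
        System.out.println( "Reads served by hub: " + store.readsServed );
    }
}
```

With 100 servers in the cluster, the single change still costs one write to the hub, and only the one server that actually takes over pays for a read.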

Downloading and installing Terracotta

You can download Terracotta from the project site. The download is free (as is the source code), but you need to register first. At the time of this writing, the latest version is 2.7.2. Pick the version for your operating system (either Windows or "All Platforms"); for this example I picked the latter. Decompress the file somewhere locally, such as C:\terracotta-2.7.2. On my Linux box I installed it in my home directory: /home/shaines/lib/terracotta-2.7.2.

Open source licenses

Each of the open source Java projects covered in this series is subject to a license, which you should understand before integrating the project with your own projects. Terracotta is subject to the Terracotta Public License.

The installation creates the following directories:

  • bin contains all of Terracotta's executables, including the files you'll use to start and stop the Terracotta server.
  • config-examples contains sample configurations for clustering Plain Old Java Objects (POJOs), Tomcat, Spring, and WebLogic.
  • docs contains Terracotta's documentation (which comprises HTML documents that link to the Terracotta Web site) and the XML configuration file reference.
  • lib contains the Terracotta compiled JAR files and all of its dependencies.
  • modules contains prebuilt JAR files for integrating with several technologies and projects, such as the Commons collections, Spring, Struts, and GlassFish.
  • samples contains examples that demonstrate how to cluster POJOs, RIFE applications, HTTP sessions, and Spring applications.
  • schema contains the XML schema for the Terracotta configuration file and accompanying documentation.
  • tools contains tools to help you build your Terracotta configuration file. These include the Sessions Configurator, which helps you configure Terracotta for HTTP session clustering.
  • vendor contains external applications. In this version the only application it includes is a version of Tomcat that is configured to work with Terracotta. The Sessions Configurator uses it to help you build HTTP session clustering.

The welcome.sh or welcome.bat file, found in the root of the Terracotta installation, executes a Java Swing application that is a launching pad for exploring POJO and Spring examples. It also launches the Sessions Configurator to help you build a configuration file for HTTP session clustering. It is a good starting point for familiarizing yourself with Terracotta.

Configuring and using Terracotta

Applications can store stateful information in different ways, such as by maintaining data in a cache, by persisting data to a database and accessing it through an object-relational mapping tool such as Hibernate, or by using HTTP sessions. Terracotta makes it easy to implement any of these solutions, but for the purposes of this example I'll assume that you want to configure and use Terracotta to cluster a Web application's sessions.

The sample application

The sample application is purposely simple. It contains:

  • A servlet that manages a single variable called name in an HttpSession. If name's value is null, then the user is forwarded to a JSP file that prompts for a name. If name has a non-null value, then the user is forwarded to a JSP file that greets the user by name.
  • A JSP file that prompts the user for his or her name
  • A JSP file that displays the user's name

The point of this example is to illustrate how to configure Terracotta to cluster an HttpSession object. If you can cluster the single String that the example is storing in the HttpSession, you can just as easily cluster a complex object graph stored in an HttpSession.

Listing 1 shows the source code for the SessionServlet.

Listing 1. SessionServlet.java

package com.geekcap.terracottaexample;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class SessionServlet extends HttpServlet
{
  public void service( HttpServletRequest req, HttpServletResponse res )
  {
    try
    {
      // See if there is a name in the request
      String name = req.getParameter( "name" );

      // Load session
      HttpSession session = req.getSession();

      if( name != null )
      {
        // Put the name into the session
        session.setAttribute( "name", name );
      }
      else
      {
        // See if we already have it in the session
        name = ( String )session.getAttribute( "name" );
        if( name == null )
        {
          // Forward the user to the name entry form, then stop: a second
          // forward on a committed response throws IllegalStateException
          req.getRequestDispatcher( "/enterYourName.jsp" ).forward( req, res );
          return;
        }
      }

      // We have a name in the session so forward the user to the hello page
      req.getRequestDispatcher( "/hello.jsp" ).forward( req, res );
    }
    catch( Exception e )
    {
      e.printStackTrace();
    }
  }
}

The servlet in Listing 1 first checks to see if the request has a name in it. If it does, then it either sets or overwrites the name in the session. If it does not, then it checks to see if there is a name in the session. Whenever a name is available, from either the request or the session, it forwards the user to the hello.jsp file; otherwise it forwards the user to the enterYourName.jsp file. It is a contrived and superficial application, but it demonstrates how to store a String in a session.

Listing 2 shows the source code for the enterYourName.jsp file.

Listing 2. enterYourName.jsp

<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <title>Hello, Terracotta!</title>
   </head>
   <body>
      <form action="SessionServlet" method="post">
         Enter your name: <input type="text" name="name"/><input type="submit">
      </form>
   </body>
</html>

Listing 2's enterYourName.jsp constructs a form that posts the user's name to the SessionServlet URL (which we'll later map to the SessionServlet class in the web.xml file).

Listing 3 shows the source code for the hello.jsp file.

Listing 3. hello.jsp

<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hello, Terracotta</title>
</head>
<body>
Hello, <%= session.getAttribute( "name" ) %>
<p>
<form action="SessionServlet" method="post">
Update your name: <input type="text" name="name"/><input type="submit">
</form>
</body>
</html>