J2EE technology has been wildly successful in recent years. But provisioning, deploying, and managing J2EE applications has become a problem for many enterprises—the sheer number of server systems that must be configured and managed can tax the human and facilities resources available to today's data centers. A new technology called network attached processing can help tackle the consolidation and management challenges associated with J2EE deployment.
Victims of our own success
Management guru Peter Senge once observed that "today's problems come from yesterday's solutions." While Senge had business and governmental policy decisions in mind when he coined this saying, it applies equally well to software technology, and the rapid rate of change in software development practices makes it all the easier to see the truth in this observation. For example, the client-server revolution of the early 1990s was intended to alleviate the difficulty and cost of developing mainframe applications, and make it more practical to tackle larger software problems.
But the world doesn't stand still in the wake of innovation. As the demand for ever more complex enterprise applications continued to increase, the limits of the client-server approach soon became apparent. Distributed objects, Web applications, application servers, service-oriented architectures (SOA)—each of these generations of enterprise software development technology addressed some of its predecessors' limitations and enabled the development of more sophisticated applications. But with each generation of technology, new problems surfaced in areas such as scalability, manageability, and development complexity.
As developers, we welcome these rapid changes in technology—they feed us with interesting new ideas to ponder and new techniques to master, increase the scope and complexity of problems we can solve with a given resource budget, and, of course, keep our skills in high demand.
The technology model of virtual machines and application servers has reached a similar point in its evolution: While it has become the dominant form of enterprise software development (Gartner estimates that by the end of 2008, 80 percent of all new e-business application development will be based on virtual machines), we are struggling with the consequences of this model's success—deployment complexity, challenges in capacity management and planning, and the difficulty of developing applications that scale well across a variety of clustering topologies. Application servers alleviate some of these problems, but they contribute to the complexity as well. Java technology has more than proven its value in high-throughput, transactional enterprise applications, but it may well have reached the point where it has become the victim of its own success.
Today's challenges in application deployment
Processing hardware today is incredibly cheap—for a few thousand dollars, you can buy commodity, rack-mount, fault-tolerant systems that can outperform yesterday's mainframes. But today's applications are far more resource-hungry than yesterday's applications. The technologies we use to accelerate software development, improve software reliability, and manage software complexity consume much of the computational surplus; increased user expectations consume the rest and then some. As a result, most enterprise applications are too big to run on a single low-end server and are often distributed across clusters of inexpensive server hosts. The software technology for clustering J2EE applications is readily available, but it, in turn, imposes a cost in terms of additional resource consumption and in development, deployment, and management complexity.
While clustering can provide benefits such as high availability and fault tolerance, clustering an application across 100 servers because that's the only way to obtain the desired throughput has a definite negative impact on the design, development, and deployment process. (Clustering support in application servers is constantly improving, but no matter how much it improves, developing an application that runs in a single JVM will still be easier than developing a clustered application.) So, if we were confident we could run our application in a single JVM, even as the load increases, we could save a lot of effort in development, deployment, and management, which we could instead use to build more powerful applications.
Clustering also poses a significant operational challenge. Today's data centers often house hundreds of application server hosts (so-called "server sprawl"), which means that system administrators must manage and provision these servers while providing a high level of availability and utilization. As processing demand increases, operational staff must quickly add capacity and reconfigure applications without disrupting service. Or, if the deployment time for new capacity exceeds the reprovisioning time requirements, the application may instead be over-provisioned up-front to allow for unexpected growth. And, of course, management would like to minimize the cost of support staff such as system and database administrators.
Because the individual server systems have a comparatively low computing capacity relative to the demand of a typical enterprise application, server hosts are not commonly shared across applications; an application will consume one or more boxes, but not fractional boxes. This approach has a significant cost for many enterprises in that resources allocated to each application are sized based on the application's peak demand (multiplied by the desired margin of safety), even if the average demand is much lower than the peak demand. If an enterprise runs multiple applications whose periods of expected peak load are uncorrelated, which is true of most enterprises, the result is significant wasted capacity—and attendant cost—across applications. Utilization often looks like the diagrams shown in Figure 1.
These common provisioning practices, deliberate over-provisioning and provisioning for peak demand on a per-application basis, contribute to higher hardware costs, higher staffing costs, and lower resource utilization—reducing return on IT investment.
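To see how provisioning for peak demand depresses utilization, consider a back-of-the-envelope calculation. The numbers below are invented for illustration and do not come from the article's figures:

```java
// Illustrative per-application provisioning math (the numbers are made up,
// not taken from Figure 1).
public class PeakProvisioning {
    // Fraction of provisioned capacity actually used, on average, when
    // capacity is sized to peak demand times a safety margin.
    static double utilization(double avgDemand, double peakDemand, double safetyMargin) {
        return avgDemand / (peakDemand * safetyMargin);
    }

    public static void main(String[] args) {
        // An app that averages 10 CPUs but peaks at 40, provisioned with 50% headroom:
        double u = utilization(10.0, 40.0, 1.5);
        System.out.printf("Provisioned: %.0f CPUs, average utilization: %.1f%%%n",
                40.0 * 1.5, u * 100);   // 60 CPUs provisioned, ~16.7% utilized
    }
}
```

Even a modest safety margin leaves more than 80 percent of the purchased capacity idle, on average, for this hypothetical application.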
Reduce management and capacity-planning complexity with network attached processing
Enterprises have long since recognized the value of network attached storage (NAS) and storage area network (SAN) technologies as a means of reducing cost and improving quality of service and reliability. By managing storage capacity across the enterprise instead of on a per-system basis, storage utilization is improved; capacity can be easily added, allocated, or reallocated as needed; critical data can be more easily replicated and backed up; and demand can be more easily monitored and centrally managed.
NAS and SAN were disruptive technologies in 1995; now they are mainstream, accepted solutions to the problems of storage management. These problems are, in fact, quite similar to the problems I've already discussed with server provisioning—large and growing numbers of boxes to manage; multiple boxes required per application; boxes allocated exclusively to specific applications, impairing efficient utilization; and high costs associated with reprovisioning resources should demand patterns change. Network attached processing, a new technology that can do the same for computing resources that NAS does for storage, seems just what the doctor ordered for today's data centers.
Just as NAS starts with a huge (and expandable) pool of storage and then lets administrators carve the storage into virtual storage devices and dynamically set resource utilization policies, network attached processing starts with a large pool of computing resources that can be statically or dynamically allocated to applications according to customizable resource management policies.
Azul's network attached processing solution: Compute appliances
The first (and, to date, only) company to deliver a network attached processing solution is Azul Systems of Mountain View, California. The solution, a 384-processor, 11-U rack-mountable compute appliance with 256 GB of coherent uniform-access memory, utilizes a custom processor explicitly designed for running virtual machine technology, with hardware features to accelerate garbage collection and concurrency.
As with SAN technology, the compute appliance can be logically partitioned into multiple virtual hosts, and CPU and memory resources can be dynamically allocated among virtual hosts. These virtual hosts can be thought of as "mountable compute pools," enabling multiple applications to draw compute power from a huge, enterprise-wide pool. A single 384-way compute appliance can handle 50 or more virtual machine-based applications with thousands of threads and heaps as large as 100 GB.
Since so many applications will rely on them, compute pool servers need to provide high reliability and fault tolerance, and the Azul box has all the expected RAS (reliability, availability, serviceability) features—ECC (error correcting codes) on main memory and on caches, failure detection, and hot-pluggable redundant power supplies and fans. And with pauseless, concurrent, parallel garbage collection, enabled by hardware features such as efficient read barriers and write barriers, response times can be more predictable. Of course, cool technology is only a means to an end—the goal of all this technology is to reduce management and deployment costs for J2EE applications.
Simplified capacity planning
The compute-pool approach turns the capacity planning process upside-down, enabling capacity planning on an enterprise-wide basis. Rather than planning hardware acquisition and server management on a per-application basis, applications can be allocated to virtual servers within the compute pool, and as demand grows, the resources allocated to a virtual server can be dynamically increased. If multiple applications share a single compute appliance and one grows large enough that the aggregate demand exceeds the appliance's capacity, it (or one of the other applications) can be seamlessly shifted to another compute appliance with spare capacity, with no application-level reconfiguration. Azul's Compute Pool Manager software allows resource allocation and usage policies to be set across all compute appliances.
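The placement logic described above can be sketched as a toy allocator. All class and method names here are hypothetical illustrations of the idea, not Azul's Compute Pool Manager API:

```java
import java.util.List;

// Toy sketch of compute-pool placement (hypothetical names, not Azul's API).
public class PoolSketch {
    static class Appliance {
        final String name;
        final int capacity;     // total processors in the appliance
        int allocated = 0;      // processors currently assigned to workloads
        Appliance(String name, int capacity) { this.name = name; this.capacity = capacity; }
        int spare() { return capacity - allocated; }
    }

    // Place a workload on whichever appliance has enough spare capacity;
    // the application itself needs no reconfiguration to land elsewhere.
    static Appliance placeWorkload(List<Appliance> pool, int demand) {
        for (Appliance a : pool) {
            if (a.spare() >= demand) { a.allocated += demand; return a; }
        }
        return null;   // pool exhausted: time to add another appliance
    }

    public static void main(String[] args) {
        List<Appliance> pool =
                List.of(new Appliance("azul-1", 384), new Appliance("azul-2", 384));
        Appliance first = placeWorkload(pool, 300);   // lands on azul-1
        Appliance second = placeWorkload(pool, 200);  // azul-1 is full, so azul-2
        System.out.println(first.name + " / " + second.name);
    }
}
```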
Better resource utilization
Because peak load times generally differ across applications (payroll utilization is highest at the beginning of the month, benefits utilization is highest during open enrollment, etc.), managing capacity on an enterprise-wide basis allows greater hardware utilization because you no longer need to purchase and provision enough extra capacity to service each application at its peak load—only the peak aggregate load across all applications, which is generally much lower than the sum of individual peak loads. (Consolidating application-tier processing into a compute pool also simplifies the process of monitoring usage and planning capacity.)
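The arithmetic behind this claim is easy to demonstrate. The load figures below are invented; the point is only that staggered peaks make the aggregate peak far smaller than the sum of the individual peaks:

```java
// Sum of individual peaks vs. peak of the aggregate (illustrative numbers).
public class AggregatePeaks {
    // Capacity needed if each application is provisioned for its own peak.
    static int sumOfPeaks(int[][] load) {
        int sum = 0;
        for (int[] app : load) {
            int peak = 0;
            for (int v : app) peak = Math.max(peak, v);
            sum += peak;
        }
        return sum;
    }

    // Capacity needed if all applications share one pool.
    static int aggregatePeak(int[][] load) {
        int peak = 0;
        for (int hour = 0; hour < load[0].length; hour++) {
            int total = 0;
            for (int[] app : load) total += app[hour];
            peak = Math.max(peak, total);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Hourly demand for three applications whose peaks don't coincide.
        int[][] load = {
            {10, 40, 10, 10},   // app A peaks in hour 1
            {10, 10, 40, 10},   // app B peaks in hour 2
            {10, 10, 10, 40},   // app C peaks in hour 3
        };
        // Per-application provisioning needs 120 units; a shared pool needs 60.
        System.out.println(sumOfPeaks(load) + " vs " + aggregatePeak(load));
    }
}
```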
Figure 2 shows how utilization would look for Figure 1's five applications, plus another ten similar applications, when run in a compute pool environment. The result: Utilization improves by a factor of 3 to 5, as does the flexibility to reallocate resources across applications.
Migrating to network attached processing
A great deal of effort has already gone into configuring and deploying existing applications, and while the compute-pool approach might offer greater scalability, migrating an application to another platform always carries a cost. With network attached processing, that cost is similar to the cost of migrating from local storage to network attached storage—the existing hardware and software configuration does not have to change. Rather than moving the entire application to the compute appliance, only the computation is moved—the application, storage, database, and security configuration remains on the existing server host, unchanged. The difference is that the native JVM on the server host is replaced with a proxy JVM, which finds an available compute appliance by consulting Azul's Compute Pool Manager and then ships the bytecodes it would have executed to the chosen compute appliance for remote execution.
Any I/O performed by the Java application is handled by a remote callback to the server host, so the compute appliance need not know about any of the resources used by the application—files, database connections, security credentials, or other configuration information. The only difference here is that the application consumes computing power from the compute appliance instead of from the server hosts. Other than network interfaces (there are four gigabit Ethernet interfaces in the 384-way Azul box), the compute appliance has no I/O peripherals—and doesn't need them. I/O can be handled by existing server hosts for existing applications and by cheap, lower-power servers (or even VMware instances, for low-I/O applications) as new applications are configured and deployed.
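The remote I/O callback idea can be illustrated with a minimal sketch. The types, names, and configuration path below are hypothetical stand-ins for the pattern, not Azul's actual proxy protocol:

```java
import java.util.function.Function;

// Conceptual sketch of the remote I/O callback pattern (hypothetical types;
// this is not Azul's actual proxy implementation).
public class CallbackSketch {
    // The proxy on the server host exposes local resources to the remote JVM.
    interface IoCallback {
        String readFile(String path);   // runs on the server host, where the file lives
    }

    // The compute appliance runs the application's logic but owns no I/O
    // peripherals, so every I/O operation bounces back to the host.
    static String runOnAppliance(IoCallback hostIo, Function<String, String> logic) {
        String data = hostIo.readFile("/etc/app.conf");  // I/O via callback to the host
        return logic.apply(data);                        // computation stays on the appliance
    }

    public static void main(String[] args) {
        IoCallback hostIo = path -> "threads=128";       // stand-in for a real file read
        System.out.println(runOnAppliance(hostIo, s -> s.toUpperCase()));
    }
}
```

Because the appliance sees only bytecodes and callback results, the host keeps its files, database connections, and security credentials exactly where they were.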