Tackle Java server capacity problems

Improve the capacity of your Java server application through load testing and analysis

Engineers and their managers are familiar with organizing a set of concrete tasks and driving them to completion. Simple performance problems, which can be isolated by a single developer on a personal machine, are straightforward to manage and remedy. However, large capacity problems, occurring when the system is under load, are common, and handling them requires a completely different approach. These problems require an isolated test environment, a simulated load, and careful analysis and tracking of changes.

In this article, I create a test environment using some easily obtainable tools and equipment. I then walk through the analysis of two capacity problems, focusing on memory and synchronization issues, which can be difficult to expose using a simple profiler. By walking through a concrete example, I hope to make tackling complex capacity problems less daunting and provide insight into the general process.

Improving server capacity

Server capacity improvements are inherently data driven. Making any application or environment changes without reliable data will generally yield poor results. Profilers provide valuable information about Java server applications, but they are frequently inaccurate because data derived from a single application user may look entirely different from the data derived from dozens or even hundreds of application users. Utilizing profilers to optimize application performance during development is a good place to start, but augmenting this common approach by analyzing the application under load yields far better overall results.

Analyzing a server application under load requires a few basic elements:

  1. A controlled environment to load-test the application
  2. A controlled synthetic load to drive the application to full capacity
  3. Data collection from monitors, applications, and the load-testing software itself
  4. The tracking of capacity changes

Underestimating this last requirement, the tracking of capacity, is a mistake because, if you fail to track the capacity, you have no way of actually managing the project. It is unlikely that a 10 or 20 percent gain in capacity will make any noticeable difference when only a single person is using the application, but this is not necessarily obvious to everyone supporting the project. A 20 percent improvement is significant, and, by tracking the capacity improvements, you can provide important feedback and keep the project on track.
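A simple way to make this tracking concrete is to record the throughput and failure rate of every test run and compute gains against a baseline. The following is an illustrative sketch; the class name and numbers are hypothetical, not data from this project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a capacity log: one entry per load test run,
// with percent gain computed against the first (baseline) run.
public class CapacityLog {
    public record Run(String label, double requestsPerSecond, double failureRate) {}

    private final List<Run> runs = new ArrayList<>();

    public void record(String label, double rps, double failureRate) {
        runs.add(new Run(label, rps, failureRate));
    }

    /** Percent throughput change of the latest run versus the baseline run. */
    public double percentGain() {
        double first = runs.get(0).requestsPerSecond();
        double last = runs.get(runs.size() - 1).requestsPerSecond();
        return 100.0 * (last - first) / first;
    }
}
```

Even a record this small gives everyone supporting the project a shared, objective view of progress.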

As important as tracking capacity is, it is sometimes necessary to invalidate previous test results in order to make future results more accurate. Over the course of a capacity project, improving the load test's accuracy may require changes to the simulation and environment. These changes are necessary, and by holding the application constant (load testing before and after each such change) you can carefully record the transition.

A controlled environment

A controlled environment requires, at a minimum, three dedicated machines: one to generate the load; a controller, which communicates with the load machine to set up the test scenario and receive feedback on the test; and a third machine to run your application. In addition, the network between the load and application machines should be isolated from the rest of the LAN. The controller receives feedback from the loaded application machines about OS metrics, hardware utilization, and application metrics, especially, in this case, VM metrics.

Load simulation

The most accurate simulations are constructed using actual user datasets and, in the case of Web servers, access logs. If you either have not deployed yet or lack access to actual user data, then you can do well enough by constructing likely scenarios, querying sales and product management teams for specifics, and making a few educated guesses. Reconciling the discrepancies between the load test and actual user experience is an ongoing process.

Several user scenarios are generally necessary in simulation. For instance, in a common address book application, you would have separate scenarios for users updating the address book and users querying it. In the simple GrinderServlet class that serves as my test application, I have only one scenario: a single user accesses the servlet 10 times in succession, pausing briefly between each request. Though this application is trivial, I wanted to replicate a couple of common attributes. Users do not make requests of a server continuously without pause; without allowing for brief pauses, I would have an inaccurate understanding of the number of active users the server can support.
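The scenario above can be sketched as a loop with think time between requests. This is an illustrative sketch only: the Runnable stands in for the HTTP request that the load-testing tool would actually issue, and the names and pause length are my assumptions.

```java
// Hypothetical sketch of one simulated user: repeat a request a fixed
// number of times, pausing briefly ("think time") between requests.
public class SimulatedUser {
    /** Runs one scenario; returns the number of requests completed. */
    public static int runScenario(Runnable request, int count, long pauseMillis) {
        int completed = 0;
        for (int i = 0; i < count; ++i) {
            request.run();  // one HTTP GET against the servlet in a real test
            ++completed;
            try {
                Thread.sleep(pauseMillis);  // think time between requests
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed;
    }
}
```

In practice the load-testing software implements this loop for you; the point is that the pause is part of the scenario, not an accident of tooling.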

The other reason for stringing 10 requests together is that a real application is unlikely to consist of a single HTTP request. Single, separate requests could affect numerous elements in the environment: specifically, Tomcat may create a separate session for each request, and the HTTP protocol allows consecutive requests to reuse connections. I am shaping my load test somewhat to avoid such confounding artifacts.

The GrinderServlet does not operate on data of any sort, but data access is at the core of most real applications. For those applications, when composing a load test, you will need to create a simulated dataset and then construct usage scenarios parameterized with the simulated data.

For example, if your scenario involves a user logging into a Web application, selecting a user at random from a list of possible users is more accurate than using only one. Otherwise, you may mistakenly invoke caching systems, other optimizations, or some subtle and unlikely element of your application that completely distorts your results.

Load-testing software

Load-testing software allows you to construct scenarios and drive a load against your test server. OpenSTA is the load-testing software I use in the following examples. It is fairly simple and quick to learn, archives data in an easily exported format, supports scripts parameterized with user data, and monitors a variety of information. Its main drawback is that it is Windows based, but that was not a problem for my environment. Many other solutions are available, for example, Apache's JMeter and Mercury's LoadRunner. All three of these solutions can spread load generation across a cluster of servers and collect the results on a central control server. Your tests will be more accurate if you use separate, dedicated servers to generate the load and ensure they do not exhaust their hardware resources.

The GrinderServlet

The GrinderServlet class, shown in Listing 1, and the Grinder class, shown in Listing 2, make up my test application.

Listing 1


package pub.capart;

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class GrinderServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        Grinderv1 grinder = Grinderv1.getGrinder();
        long t1 = System.currentTimeMillis();
        grinder.grindCPU(13);
        long t2 = System.currentTimeMillis();

        PrintWriter pw = res.getWriter();
        pw.print("<html>\n<body>\n");
        pw.print("Grind Time = " + (t2 - t1));
        pw.print("</body>\n</html>\n");
    }
}

Listing 2


package pub.capart;

/**
 * This is a simple class designed to simulate an application consuming
 * CPU and memory, and contending for a synchronization lock.
 */
public class Grinderv1 {
    private static Grinderv1 singleton = new Grinderv1();
    private static final String randstr =
        "this is just a random string that I'm going to add up many many times";

    public static Grinderv1 getGrinder() {
        return singleton;
    }

    public synchronized void grindCPU(int level) {
        StringBuffer sb = new StringBuffer();
        String s = randstr;
        for (int i = 0; i < level; ++i) {
            sb.append(s);
            s = getReverse(sb.toString());
        }
    }

    public String getReverse(String s) {
        StringBuffer sb = new StringBuffer(s);
        return sb.reverse().toString();
    }
}

These listings are brief, but interesting to study because they reproduce two common problems. The most glaring is probably the bottleneck caused by the synchronized modifier on the grindCPU() method, but the memory consumption will actually prove to be an even worse problem. The results of my first load test, displayed in Figure 1, show a modest load gently ramped up against version one of the GrinderServlet. Ramping up the load is important; applying the full load at once simulates a vastly larger initial burst than real users would produce. It is also more accurate to "warm up" your application and avoid artifacts such as JSP (JavaServer Pages) compilation. I generally run a single simulated user through the application before beginning the load test.

Figure 1

I use the same capacity summary plot throughout this article. Much more information is available when performing a load test, but this provides a useful summary. The top panel contains throughput, the number of completed requests per second, and request-duration information from the load-testing software. Throughput most accurately quantifies capacity. The second panel contains the number of active users and a failure rate. I consider timeouts, bad server responses, and any requests taking more than five seconds to be failures. The third panel contains JVM memory statistics and CPU utilization. The CPU is an average of user time across all processors. All machines used in my load testing have two processors. The memory statistics contain a graph of garbage collections and the rate of garbage collections per second.
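The failure rule described above (timeouts, bad server responses, or any request over five seconds) can be expressed as a small predicate. This is my own illustrative formulation; the class name, and treating HTTP status codes of 400 and above as "bad server responses," are assumptions.

```java
// Hypothetical sketch of the failure classification used in the summary
// plots: a request fails if it timed out, returned an error status, or
// took longer than five seconds.
public class FailureRule {
    static final long MAX_MILLIS = 5000;

    public static boolean isFailure(int statusCode, long durationMillis,
                                    boolean timedOut) {
        return timedOut || statusCode >= 400 || durationMillis > MAX_MILLIS;
    }
}
```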

The two most obvious features of Figure 1 are the 50 percent CPU utilization (this test was run on a dual-CPU machine) and the enormous amount of memory being consumed and immediately released. The reasons for both should be readily apparent after examining Listing 2. The synchronized modifier serializes all processing, restricting execution to a single CPU, and the algorithm itself consumes enormous amounts of memory in local variables.

CPU is frequently the limiting resource, and it is tempting to assume in this test that if I can utilize both processors without adding extra overhead, then I will double capacity. The garbage collector is so active that it is impossible to see individual collections. The memory deallocated per second is 100 megabytes for the majority of the load test, and this will turn out to be the limiting factor. The number of failures is also striking and may actually render the application completely unusable.
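To illustrate one direction the data suggests (not necessarily the fix applied later in this article): because grindCPU() operates only on local variables, the synchronized modifier could simply be dropped, allowing requests to run on both processors. The class below is a hypothetical sketch; I also return the buffer length purely so the work is observable, and note that the memory pressure, the worse of the two problems, remains untouched.

```java
// Hypothetical revision of Grinderv1 with the lock contention removed.
// grindCPU() uses only locals, so no synchronization is actually needed.
public class GrinderUnsync {
    private static final GrinderUnsync singleton = new GrinderUnsync();
    private static final String randstr =
        "this is just a random string that I'm going to add up many many times";

    public static GrinderUnsync getGrinder() {
        return singleton;
    }

    // No synchronized modifier: concurrent requests can now use both CPUs.
    // Returns the final buffer length so callers can observe the work done;
    // the heavy temporary allocation is deliberately left in place.
    public int grindCPU(int level) {
        StringBuffer sb = new StringBuffer();
        String s = randstr;
        for (int i = 0; i < level; ++i) {
            sb.append(s);
            s = new StringBuffer(sb.toString()).reverse().toString();
        }
        return sb.length();
    }
}
```

The buffer doubles on each iteration, so grindCPU(3) produces a buffer four times the length of grindCPU(1); the allocation rate, not the lock, is what the load test ultimately exposes.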
