Java EE (Java Platform, Enterprise Edition) applications, regardless of the application server they are deployed to, tend to experience the same sets of problems. As a Java EE tuner, I have been exposed to a variety of environments and have made some observations about common problems. In this capacity, I see my role as similar to that of an automobile mechanic: you tell your mechanic that the engine is chirping; then he asks you a series of questions that guide you in quantifying the nature, location, and circumstances of the chirp. From this information, he forms a good idea about a handful of possible causes of the problem.
In much the same way, I spend the first day of a tuning engagement interviewing my clients. During this interview, I look for known problems as well as architectural decisions that may negatively affect the performance of the application. With an understanding of the application architecture and the symptoms of the problem, I greatly increase my chances of resolving the problem. In this chapter, I share some of the common problems that I have encountered in the field and their symptoms. Hopefully, this chapter can serve as a troubleshooting manual for your Java EE environment.
One of the most common problems that plague enterprise applications is the dreaded OutOfMemoryError. The error is typically followed by one of the following:
- An application server crash
- Degraded performance
- A seemingly endless loop of repeated garbage collections that nearly halts processing and usually leads to an application server crash
Regardless of the symptoms, you will most likely need to reboot the application server before performance returns to normal.
Causes of out-of-memory errors
Before you attempt to resolve an out-of-memory error, it is beneficial to first understand how one can occur. If the JVM runs out of memory anywhere in its process memory space, including all regions in the heap as well as the permanent memory space, and a thread attempts to create a new object instance, the garbage collector executes to try to free enough memory to allow the new object's creation. If the garbage collector cannot free enough memory to hold the new object, then it throws an OutOfMemoryError.
Out-of-memory errors most commonly result from Java memory leaks. Recall from previous discussions that a Java memory leak is the result of maintaining a lingering reference to an unused object: you are finished using an object, but because one or more other objects still reference that object, the garbage collector cannot reclaim its memory. The memory occupied by that object is thus lost from the usable heap. These types of memory leaks typically occur during Web requests, and while one or two leaked objects may not crash your application server, 10,000 or 20,000 requests might. Furthermore, most leaked objects are not simple objects such as Doubles, but rather represent subgraphs within the heap. For example, you may inadvertently hold on to a Person object, and that Person object has a Profile object that has several PerformanceReview objects that each maintain sets of data. Rather than losing the 100 bytes of memory that the Person object occupies, you lose the entire subgraph, which might account for 500 KB or more of memory.
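This kind of subgraph leak can be sketched in a few lines. The Person, Profile, and PerformanceReview classes below are hypothetical stand-ins, and the static cache is just one common way a lingering reference survives a request:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain classes illustrating a leaked object subgraph.
class PerformanceReview {
    byte[] data = new byte[50 * 1024]; // ~50 KB of review data
}

class Profile {
    List<PerformanceReview> reviews = new ArrayList<>();
}

class Person {
    Profile profile = new Profile();
}

public class LeakExample {
    // A static cache that is populated but never cleared: the classic
    // lingering reference. Every Person added here stays reachable,
    // along with its entire Profile/PerformanceReview subgraph.
    static final Map<Integer, Person> CACHE = new HashMap<>();

    static void handleRequest(int requestId) {
        Person p = new Person();
        for (int i = 0; i < 10; i++) {
            p.profile.reviews.add(new PerformanceReview());
        }
        CACHE.put(requestId, p); // leaked: nothing ever removes this entry
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequest(i);
        }
        // 100 requests, each retaining ~500 KB of reviews, even though no
        // request needs its Person object after it completes.
        System.out.println("Leaked persons: " + CACHE.size());
    }
}
```

Each request here leaks one small Person object, but the retained size is dominated by the subgraph hanging off it, which is exactly why leak analysis must consider retained (deep) size rather than shallow size.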
In order to identify the root of this problem, you need to determine whether a real memory leak exists or whether something else is manifesting as an OutOfMemoryError. I use the following two techniques when making this determination:
- Analyze deep memory statistics
- Inspect the growth pattern of the heap
The tuning process is not the same for all JVMs; Sun's and IBM's implementations, for example, manage memory quite differently, though some commonalities exist.
Sun JVM memory management
The Sun JVM is generational, meaning that objects are created in one space and given several chances to die before they are tenured into a long-term space. Specifically, the Sun JVM is broken into the following spaces:
- Young generation, including Eden and two survivor spaces (the From space and the To space)
- Old generation
- Permanent generation
Figure 1 illustrates the breakdown of the Sun heap's generations and spaces.
Objects are created in Eden. When Eden is full, the garbage collector iterates over all objects in Eden, copies live objects to the first survivor space, and frees memory for any dead objects. When Eden again becomes full, it repeats the process by copying live objects from Eden to the second survivor space, and then copying live objects from the first survivor space to the second survivor space. If the second survivor space fills and live objects remain in Eden or in the first survivor space, then these objects are tenured (that is, they are copied to the old generation). When the garbage collector cannot reclaim enough memory by executing this type of minor collection, also known as a copy collection, then it performs a major collection, also known as a stop-the-world collection. During the stop-the-world collection, the garbage collector suspends all threads and performs a mark-and-sweep collection on the entire heap, leaving the entire young generation empty and ready to restart this process.
Figures 2 and 3 illustrate how minor collections run.
Figure 4 illustrates how a major collection runs.
From Sun's implementation of garbage collection, you can see that objects in the old generation can be collected only by a major collection. Long-lived objects are expensive to clean up, so you want to ensure that short-lived objects die in a timely manner before they have a chance to be tenured, and hence require a major garbage collection to reclaim their memory.
All of this background prepares us to identify memory leaks. Memory is leaked in Java when an object maintains an unwanted reference to another object, hence stopping the garbage collector from reclaiming its memory. In light of the architecture of the Sun JVM, objects whose references are never released will make their way through Eden and the survivor spaces into the old generation. Furthermore, in a multiuser Web-based environment, if multiple requests are being made to leaky code, we will see a pattern of growth in the old generation.
Figure 5 highlights potential candidates for leaked objects: objects that survive multiple major collections in the tenured space. Not all objects in the tenured space represent memory leaks, but all leaked objects will eventually end up in the tenured space. If a true memory leak exists, the tenured space will begin filling up with leaked objects until it runs out of memory.
Therefore, we want to track the effectiveness of garbage collection in the old generation: each time that a major garbage collection runs, how much memory is it able to reclaim? Is the memory use in the old generation growing according to any discernable pattern?
Some of this information is available through monitoring APIs, and detailed information is available through verbose garbage collection logs. The level of logging affects the performance of the JVM, and as with almost any monitoring technology, the more detailed (and useful) information you want, the more expensive it is to obtain. For the purposes of determining whether a memory leak exists, I use relatively standard settings that show the overall change in generational memory between garbage collections and draw conclusions from that. Sun reports the overhead for this level of logging at approximately 5 percent, and many of my clients run with these settings enabled all the time to ensure that they can manage and tune garbage collection. The following settings usually give you enough information to analyze:
-verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Observable trends in the heap overall can point to a potential memory leak, but looking specifically at the growth rate of the old generation can be more definitive. But remember that none of this investigation is conclusive: in order to conclusively determine that you have a memory leak, you need to run your application off-line in a memory profiler.
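In addition to parsing verbose garbage collection logs, the standard java.lang.management API exposes per-pool memory statistics that you can sample from inside the running application. In this sketch the pool-name matching is deliberately loose because the old generation's name varies by JVM and collector ("Tenured Gen", "PS Old Gen", "G1 Old Gen", and so on); getCollectionUsage() reports the pool's occupancy as of the last collection, which is the number you want to trend over time:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sample the old (tenured) generation through the standard
// java.lang.management monitoring API.
public class OldGenMonitor {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            if (name.contains("old") || name.contains("tenured")) {
                // getCollectionUsage() reflects occupancy just after the last
                // collection of this pool; trending this value over time
                // reveals whether major collections are losing ground.
                MemoryUsage afterGc = pool.getCollectionUsage();
                MemoryUsage now = pool.getUsage();
                System.out.println(pool.getName()
                        + ": used=" + now.getUsed()
                        + " afterLastGc="
                        + (afterGc == null ? "n/a" : afterGc.getUsed()));
            }
        }
    }
}
```

Logging this value on a schedule (for example, once a minute) produces the same growth-pattern data as the verbose GC log, with the advantage that you can feed it directly into your own monitoring.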
IBM JVM memory management
The IBM JVM works a little differently. Rather than starting with a large generational heap, it maintains all objects in a single space and frees memory as the heap grows. It runs different levels of garbage collection. The main behavior of this heap is that it starts relatively small, fills up, and at some point executes a mark-sweep-compact garbage collection to clean up dead objects as well as to compact live objects at the bottom of the heap. As the heap grows, long-lived objects get pushed to the bottom of the heap. So your best bet for identifying potential memory leaks is to observe the behavior of the heap in its entirety: is the heap trending upward?
Resolving memory leaks
Memory leaks are elusive, but if you can identify the request causing the memory leak, then your work is much easier. Take your application to a development environment and run it inside a memory profiler, performing the following steps:
1. Start your application inside the memory profiler
2. Execute your use-case (make the request) once to allow the application to load all of the objects that it needs in memory to satisfy the request; this reduces the amount of noise that you have to sift through later
3. Take a snapshot of the heap to capture all objects in the heap before the use-case has been executed
4. Execute your use-case again
5. Take another snapshot of the heap to capture all objects in the heap after the use-case has been executed
6. Compare the two snapshots and look for objects that should not remain in the heap after executing the use-case
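Lacking a profiler, the snapshot-comparison idea above can be approximated crudely with the Runtime API. Real profilers diff full object graphs; this sketch only compares gross heap usage around a hypothetical leaky use-case, so treat it as an illustration of the warm-up/before/after pattern rather than a substitute for a profiler:

```java
// A crude stand-in for profiler snapshots: measure heap usage before and
// after repeating a use-case.
public class SnapshotDiff {
    static final java.util.List<byte[]> LEAK = new java.util.ArrayList<>();

    // Hypothetical use-case that leaks roughly 1 MB per execution.
    static void useCase() {
        LEAK.add(new byte[1024 * 1024]);
    }

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // request a collection so the measurement is more stable
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        useCase();                 // warm up: load whatever the request needs
        long before = usedHeap();  // "snapshot" 1
        useCase();                 // execute the use-case again
        long after = usedHeap();   // "snapshot" 2
        System.out.println("Growth after one execution: "
                + (after - before) / 1024 + " KB");
    }
}
```

Note that System.gc() is only a hint to the JVM, which is one of several reasons a real profiler's heap snapshots are far more trustworthy than this measurement.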
At this point, you will need access to developers involved in coding the request you are testing so that they can make a determination about whether an object is, in fact, being leaked or if it is supposed to remain in memory for some purpose.
If nothing screams out as a leaked object after performing this exercise, one trick I sometimes use is to perform Step 4 a distinctive number of times. For example, I might configure my load tester to execute the request 17 times, in hopes that my leak analysis might show 17 instances of something (or some multiple of 17). This technique is not always effective, but it has greatly helped me when each execution of a request leaks objects.
If you cannot isolate the memory leak to a specific request, then you have two options:
- Profile each suspected request until you find the memory leak
- Configure a monitoring tool with memory capabilities
The first option is feasible in a small application, or if you were lucky enough to partially isolate the problem, but not very feasible for large applications. The second option is more effective if you can gain access to the monitoring tools. These tools track object creation and destruction counts through bytecode instrumentation and typically report the number of objects held in predefined or user-defined classes, such as the Collections classes, as a result of individual requests. For example, a monitoring tool might report that the /action/login.do request left 100 objects in a HashMap after it completed. This report does not tell you where the memory leak is in the code or the specific object that it leaks, but it tells you, with very low overhead, which requests you need to look at inside a memory profiler. Finding memory leaks in a production environment without crashing your application server is tricky, but tools with these monitoring capabilities make your job much easier!
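The measurement such a tool reports can be illustrated by hand. This sketch is purely hypothetical: it tracks the net growth of a single map across a simulated request boundary, which is exactly the per-request, per-collection delta that instrumentation-based tools compute transparently for every tracked collection in the application:

```java
import java.util.HashMap;
import java.util.Map;

// Hand-rolled illustration of what collection-monitoring tools measure:
// the net growth of a tracked collection across one request.
public class CollectionGrowthTracker {
    static final Map<String, Object> SESSION_CACHE = new HashMap<>();

    static int sizeBefore;

    static void requestBegin() {
        sizeBefore = SESSION_CACHE.size();
    }

    static int requestEnd() {
        return SESSION_CACHE.size() - sizeBefore; // objects left behind
    }

    public static void main(String[] args) {
        requestBegin();
        // Simulate a leaky request handler that leaves entries behind
        // after the response has been sent.
        for (int i = 0; i < 100; i++) {
            SESSION_CACHE.put("user-" + i, new Object());
        }
        int leaked = requestEnd();
        System.out.println("Request left " + leaked + " objects in the map");
    }
}
```

Real tools inject equivalent bookkeeping via bytecode instrumentation, which is why they can produce these per-request counts with low enough overhead to run in production.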
Artificial memory leaks
A few issues can appear to be memory leaks that in actuality are not. I refer to these as artificial memory leaks, and they may appear in the following situations:
- Premature analysis
- Leaky sessions
- Permanent space anomalies
This section examines each artificial memory leak, describing how to detect it and how to work around it.
To avoid a false positive when searching for memory leaks, you need to ensure that you are observing and analyzing the heap at the appropriate time. The danger is that, because a certain number of long-lived objects need to be in the heap, a trend may look deceiving until the heap reaches a steady state and contains its core objects. Wait until your application reaches this steady state prior to performing any trend analysis on the heap.
To detect whether or not you are analyzing the heap prematurely, continue monitoring it for a couple of hours after your analysis snapshot to see whether the upward heap trend levels off or continues indefinitely. If the trend levels off, then capture a new memory recording at that point. If the trend continues upward, then analyze the memory session you already have.