Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs
Page 7 of 7
In general, if you have excessively large sessions, the true resolution is to refactor your application to reduce session memory overhead. The following two workaround solutions can minimize the impact of excessively large sessions:
A larger heap will spend more time in garbage collection, which is not an ideal situation, but a better one than an OutOfMemoryError. Increase the size of your heap to be able to support your sessions for the duration of your time-out value; this means that
you need enough memory to hold all active user sessions as well as all sessions for users who abandon your Website within
the session time-out interval. If the business rules permit, decreasing the session time-out will cause session data to time
out earlier and lessen the impact on the heap memory it is occupying.
In summary, here are the steps to perform, prioritized from most desirable to least desirable:
However, unwanted object references maintained from application-scoped variables, static variables, and long-lived classes are, in fact, memory leaks that need to be analyzed in a memory profiler.
Permanent space anomalies
The purpose of the permanent space in the JVM process memory is typically misunderstood. The heap itself only contains class
instances, but before the JVM can create an instance of a class on the heap, it must load the class bytecode (.class file) into the process memory. It can then use that class bytecode to create an instance of the object in the heap. The space
in the process memory that the JVM uses to store the bytecode versions of classes is the permanent space. Figure 6 illustrates
the relationship between the permanent space and the heap: it exists inside the JVM process memory, but is not part of the
heap itself.

Figure 6. The relationship between the permanent space and the heap
In general, you want the permanent space to be large enough to hold all classes in your application, because reading classes from the file system is obviously more expensive than reading them from memory. To help you ensure that classes are not unloaded from the permanent space, the JVM has a tuning option:
–noclassgc
This option tells the JVM not to perform garbage collection on (and unload) the class files in the permanent space. This tuning option is very intelligent, but it raises a question: what does the JVM do if the permanent space is full when it needs to load a new class? In my observation, the JVM examines the permanent space and sees that it needs memory, so it triggers a major garbage collection. The garbage collection cleans up the heap, but cannot touch the permanent space, so its efforts are fruitless. The JVM then looks at the permanent space again, sees that it is full, and repeats the process again, and again, and again.
When I first encountered this problem, the customer was complaining of very poor performance and an eventual OutOfMemoryError after a certain amount of time. After examining verbose garbage collection logs in conjunction with heap utilization and
process memory utilization charts, I soon discovered that the heap was running well, but the process was running out of memory.
This customer maintained literally thousands of JSPs, and as such, each one was translated to Java code, compiled to bytecode,
and loaded in the permanent space before creating an instance in the heap. Their environment was running out of permanent
space, but because of the –noclassgc tuning option on the heap, the JVM was unable to unload classes to make room for new ones. To correct this out-of-memory
error, I configured their heap with a huge permanent space (512 MB) and disabled the –noclassgc JVM option.
As Figure 7 illustrates, when the permanent space becomes full, it triggers a full garbage collection that cleans up Eden and the survivor spaces, but does not reclaim any memory from the permanent space.
Figure 7. Garbage collection behavior when the permanent space becomes full. Click on thumbnail to view full-sized image.
| Note |
|---|
| When sizing the permanent space, consider using 128 MB, unless your applications have a large number of classes, in which case, you can consider using 256 MB. If you have to configure the permanent space to use anything more, then you are only masking the symptoms of a significant architectural issue. Configuring the permanent space to 512 MB is OK while you address your architectural issues, but just realize that it is only a temporary solution to buy you time while you address the real problems. Creating a 512 MB permanent space is analogous to getting painkillers from your doctor for a broken foot. True, the painkillers make you feel better, but eventually they will wear off, and your foot will still be broken. The real solution is to have the doctor set your foot and put a cast on it to let it heal. The painkillers can help while the doctor sets your foot, but they are used to mask the symptoms of the problem while the core problem is resolved. |
As a general recommendation, when configuring the permanent space, make it large enough to hold all of your classes, but allow
the JVM to unload classes when it needs to. Size it large enough so that hopefully it will not unload classes, but a minor
slowdown to load classes from the file system is far more preferable than a JVM OutOfMemoryError crash!
The main entry point into any Web or application server is a process that receives a request and places it into a request queue for an execution thread to process. After tuning memory, the tuning option with the biggest impact in an application server is the size of the execution thread pool. The size of the thread pool controls the number of simultaneous requests that can be processed at one time. If the pool is sized too small, then requests will wait in the queue for processing, and if the pool is sized too large, then the CPU will spend too much time switching contexts between the various threads.
Each server has a socket it listens on. A process that receives an incoming request places the request into an execution queue, and the request is subsequently removed from the queue by an execution thread and processed. Figure 8 illustrates the components that make up the request processing infrastructure inside a server.
Figure 8. The request processing infrastructure inside a server. Click on thumbnail to view full-sized image.
When my clients complain of degraded performance at relatively low load that worsens measurably as the load increases, I first check the thread pools. Specifically, I am looking for the following information:
When the thread pool is 100 percent in use and requests are pending, the response time degrades substantially, because requests that otherwise would be serviced quickly spend additional time inside a queue waiting for an execution thread. During this time, CPU utilization is usually low, because the application server is not doing enough work to keep the CPU busy. At this point, I increase the size of the thread pool in steps, monitoring the throughput of the application until it begins to decrease. You need consistent load or, even better, an accurate load tester to ensure your measurements' accuracy. Once you observe a dip in the throughput, lower the thread pool size down one step, to the size where throughput was maximized.
Figure 9 illustrates the behavior of a thread pool that is sized too small.
Figure 9. When all threads are in use, requests back up in the execution queue. Click on thumbnail to view full-sized image.
Every time I read performance tuning documents, one thing that bothers me is that they never recommend specific values for the size of your thread pools. Because these values depend so much on what your application is doing, the documents are completely accurate to generalize their recommendations; but it would greatly benefit the reader if they presented best-practice starting values or ranges of values. For example, consider the following two applications:
However, most applications do not exhibit this extreme dynamic in functionality. Most do similar things, but do them for different domains. Therefore, my recommendation is for you to configure between 50 and 75 threads per CPU. For some applications, this number may be too low, and for others it may be too high, but as a best practice, I start with 50 to 75 threads per CPU, monitor the CPU performance along with application throughput, and make adjustments.
In addition to having thread pools that are sized too small, environments can be configured with too many threads. When load increases in these environments, the CPU is consistently high, and response time is poor, because the CPU spends too much time switching contexts between threads and little time allowing the threads to perform their work.
The main indication that a thread pool is too large is a consistently high CPU utilization rate. Many times, high CPU utilization is associated with garbage collection, but high CPU utilization during garbage collection differs in one main way from that of thread pool saturation: garbage collection causes CPU spikes, while saturated thread pools cause consistently high CPU utilization.
When this occurs, requests may be pending in the queue, but not always, because pending requests do not affect the CPU as processing requests do. Decreasing the thread pool size may cause requests to wait, but having requests waiting is better than processing them if processing the requests saturates the CPU utilization. A saturated CPU results in abysmal performance across the board, and performance is better if a request arrives, waits in a queue, and then is processed optimally. Consider the following analogy: many highways have metering lights that control the rate that traffic can enter a crowded highway. In my opinion, the lights are ineffective, but the theory is sound. You arrive, wait in line behind the light for your turn, and then enter the highway. If all of the traffic entered the highway at the same time, we would be in complete gridlock, with no one able to move, but by slowing down the rate that new cars are added to the highway, the traffic is able to move. In practice, most metropolitan areas have so much traffic that the metering lights do not help, and what they really need is a few more lanes (CPUs), but if the lights could actually slow down the rate enough, then the highway traffic would flow better.
To fix a saturated thread pool, reduce the thread pool size in steps until the CPU is running between 75 and 85 percent during normal user load. If the size of the queue becomes too unmanageable, then you need to do one of the following two things:
If your user load has exceeded the capacity of your environment, you need to either change what you are doing (refactor and tune code) to lessen the CPU impact or add CPUs.
Most Java EE applications connect to a backend data source, and often these applications communicate with that backend data source through a JDBC (Java Database Connectivity) connection. Because database connections can be expensive to create, application servers opt to pool a specific number of connections and share them among processes running in the same application server instance. If a request needs a database connection when one is unavailable in the connection pool, and the connection pool is unable to create a new connection, then the request must wait for a connection to become available before it can complete its operation. Conversely, if the database connection pool is too large, then the application server wastes resources, and the application has the potential to force too much load on the database. As with all of our tuning efforts, the goal is to find the most appropriate place for a request to wait to minimize its impact on saturated resources; having a request waiting outside the database is best if the database is under duress.
An application server with an inadequately sized connection is characterized by the following:
If you observe these characteristics, increase the size of the connection pool until database connection pool utilization is running at 70 to 80 percent utilization during average load and threads are rarely observed waiting for a connection. Be cognizant of the load on the database, however, because you do not want to force enough load to the database to saturate its resources.
Another important tuning aspect related to JDBC is the correct sizing of JDBC connection prepared statement caches. When your application executes a SQL statement against the database, it does so by passing through three phases:
During the preparation phase, the database driver may ask the database to compute an execute plan for the query. During the execution phase, the database executes the query and returns a reference to a result set. During the retrieval phase, the application iterates over the result set and obtains the requested information.
The database driver optimizes this process: the first time you prepare a statement, it asks the database to prepare an execution plan and caches the result. On subsequent preparations, it loads the already prepared statement from the cache without having to go back to the database.
When the prepared statement cache is sized too small, the database driver is forced to prepare noncached statements again, which incurs additional processing time as well as network time if the database connection goes back to the database. The primary symptom of an inadequately sized prepared statement cache is a significant amount of JDBC processing time spent repeatedly preparing the same statement. The breakdown of time that you would expect is for the preparation time to be high initially and then begin to diminish on subsequent calls.
To complicate things ever so slightly, prepared statements are cached on a per-connection basis, meaning that a cached statement can be prepared for each connection. The impact of this complication is that if you have 100 statements that you want to cache, but you have 50 database connections in your connection pool, then you need enough memory to hold 5,000 prepared statements.
Through performance monitoring, determine how many unique SQL statements your application is running, and from those unique statements, consider how many of them are executed very frequently.
While stateless objects can be pooled, stateful objects like entity beans and stateful session beans need to be cached, because each bean instance is unique. When you need a stateful object, you need a specific instance of that object, and a generic instance will not suffice. As an analogy, consider that when you check out of a supermarket, which cashier you use doesn't matter; any cashier will do. In this example, cashiers can be pooled, because your only requirement is a cashier, not Steve the cashier. But when you leave the supermarket, you want to bring your children with you; other peoples' children will not suffice: you need your own. In this example, children need to be cached.
The benefit to using a cache is that you can serve requests from memory rather than going across the network to load an object from a database. Figure 10 illustrates this benefit. Because caches hold stateful information, they need to be configured at a finite size. If they were able to grow without bound, then your entire database would eventually be in memory! The size of the cache and the number of unique, frequently accessed objects dictate the performance of the cache.
Figure 10. The application requests an object from the cache that is in the cache, so a reference to that object is returned without making a network trip to the database. Click on thumbnail to view full-sized image.
When a cache is sized too small, the cache management overhead can dramatically affect the performance of the cache. Specifically, when a request queries for an object that is not present in a full cache, then the following steps, illustrated in Figure 11, must be performed:
Figure 11. Because the requested object is not in the cache, an object must be selected for removal from the cache and removed from it. Click on thumbnail to view full-sized image.
If these steps must be performed for the majority of requested objects, then using a cache would not be the best idea in the first place! When this process occurs frequently, the cache is said to thrash. Recall that removing an object from the cache is called passivation, and loading an object loaded from persistent storage into the cache is called activation. The percentage of requests that are served by the cache is the hit ratio, and the percentage that are not served is the miss ratio.
While the cache is being initialized, its hit ratio will be zero, and its activation count will be high, so you need to observe the cache performance after it is initialized. To work around the initialization phase, you can monitor the passivation count as compared to the total requests for objects in the cache, because passivations will only occur after the cache has been initialized. But in general, we are mostly concerned with the cache miss ratio. If the miss ratio is greater than 25 percent, then the cache is probably too small. Furthermore, if the miss count is above 75 percent, then either the cache is too small or the object probably should not be cached.
Once you determine that your cache is too small, try increasing its size and measure the improvement. If the miss ratio comes down to less than 20 percent, then your cache is well sized, but if increasing the size of the cache does not have much of an effect, then you need to work with the application technical owner to determine whether the object should be cached or whether the application needs to be refactored with respect to that object.
Stateless session beans and message-driven beans implement business processes, and as such, do not maintain their states between invocations. When your application needs access to these beans' business functionality, it obtains a bean instance from a pool, calls one or more of its methods, and then returns the bean instance to the pool. If your application needs the same bean type later, it obtains another one from the pool, but receiving the same instance is not guaranteed.
Pools allow an application to share resources, but they present another potential wait point for your application. If there is not an available bean in the pool, then requests will wait for a bean to be returned to the pool before continuing. These pools are tuned pretty well by default in most applications servers, but I have seen environments where customers have introduced problems by sizing them too small. Stateless bean pools should generally be sized the same as your execution thread pool, because a thread can use only one instance at a time; anything more would be wasteful. Furthermore, some application servers optimize pool sizes to match the thread count, but as a safety precaution, you should configure them this way yourself.
One of the benefits to using enterprise Java is its inherent support for transactions. By adding an annotation to methods in a Java EE 5 EJB (Enterprise JavaBeans), you can control how the method participates in transactions. A transaction can complete in one of the following two ways:
When a transaction is committed, it has completed successfully, but when it rolls back, something went wrong. Rollbacks come in the following two flavors:
An application rollback is usually the result of a business rule. Consider a Web application that asks users to take a survey to enter a drawing for a prize. The application may ask the user to enter an age, and a business rule might state that users need to be 18 years of age or older to enter the drawing. If a 16-year-old submits information, the application may throw an exception that redirects the user to a Webpage informing that user that he or she is not eligible to enter the drawing. Because the application threw an exception, the transaction in which the application was running rolled back. This rollback is a normal programming practice and should be alarming only if the number of application rollbacks becomes a measurable percentage of the total number of transactions.
A nonapplication rollback, on the other hand, is a very bad thing. The three types of nonapplication rollbacks follow:
A system rollback means that something went very wrong in the application server itself, and the chances of recovery are slim. A time-out rollback indicates that some process within the application server timed out while processing a request; unless your time-outs are set very low, this constitutes a serious problem. A resource rollback means that when the application server was managing its resources internally, it had a problem with one of them. For example, if you configure your application server to test database connections by executing a simple SQL statement, and the database becomes unavailable to the application server, then anything interacting with that resource will receive a resource rollback.
Nonapplication rollbacks are always serious issues that require immediate attention, but you do need to be cognizant of the frequency of application rollbacks. Many times people overreact to the wrong types of exceptions, so knowing what each type means to your application is important.
While each application and each environment is different, a common set of issues tends to plague most environments. This article focused not on application code issues, but on the following environmental issues that can manifest poor performance:
In order to effectively diagnose performance problems, you need to understand how problem symptoms map the root cause of the underlying problem. If you can triage the problem to application code, then you need to forward the problem to the application support delegate, but if the problem is in the environment, then resolving it is within your control.
The root of a problem depends on many factors, but some indicators can increase your confidence when diagnosing problems and completely eliminate others. I hope this article can serve as a beginning troubleshooting guide for your Java EE environment that you can customize to your environment as issues arise.
Archived Discussions (Read only)