A major issue stands in the way of Java scalability today, and Cloudera's Eva Andreasson wants Java developers to stop ignoring it. Here she makes the case for why true Java application scalability will require a dramatic overhaul in how we think about Java virtual machines, and how developers and vendors build them.
Most developers approach JVM performance issues as they surface, which means spending a lot of time fine-tuning application-level bottlenecks. If you've been reading this series so far then you know that I see the problem more systemically. I say it's JVM technology that limits the scalability of enterprise Java applications. Before we go further, let me point out some leading facts:
- Modern hardware servers offer huge amounts of memory.
- Distributed systems require huge amounts of memory, the demand is always increasing.
- A normal heap size for a Java application instance currently is between 1 and 4 GB -- far below what most servers can manage and what most distributed applications require. This is sometimes known as the Java application memory wall, as illustrated in Figure 1.
The time graph in Figure 1, created by Gil Tene, shows a history of memory usage on Java application servers and what was a normal heap size for Java applications at each point in time. (See Resources.)
Image copyright Azul Systems.
This brings us to the JVM performance conundrum, which goes something like this:
- If you provide too little memory to an application it will run out of memory. The JVM will not be able to free up memory space at the rate that your application needs it. Push too hard and eventually the JVM will throw an
OutOfMemoryErrorand shut down completely. So you have to provide more memory to your applications.
- If you increase the heap size for a response-time-sensitive application, the heap will eventually become fragmented. This is unavoidable if you don't restart your system or custom-architect your application. When fragmentation happens, an application can hang for anywhere from 100 millisecond to 100 seconds depending on the application, the heap size, and other JVM tuning parameters.
Most discourse about JVM pauses focuses on average or target pauses. What isn't discussed as much is the worst-case pause time that happens when the whole heap needs to be compacted. A worst-case pause time in a production environment is around one second per gigabyte of live data in the heap.
JVM performance optimization: Read the series
A two- to four-second pause is not acceptable for most enterprise applications, so Java application instances are stalled out at 2 to 4 GB, despite their need for more memory. On some 64-bit systems, with lots of JVM tuning for scale, it is possible to run 16 GB or even 20 GB heaps and meet typical response-time SLAs. But compared to where Java heap sizes should be today, we're still way off. The limitation lies in the JVM's inability to handle fragmentation without a stop-the-world GC. As a result, Java application developers are stuck doing two tasks that most of us deplore:
- Architecting or modeling deployments in chopped-up large instance pools, even though it leads to a horrible operations monitoring and management situation.
- Tuning and re-tuning the JVM configuration, or even the application, to "avoid" (meaning postpone) the worst-case scenario of a stop-the-world compaction pause. The most that developers can hope for is that when a pause happens, it won't be during a peak load time. This is what I call a Don Quixote task, of chasing an impossible goal.
Now let's dig a little deeper into Java's scalability problem.
Over-provisioning and over-instancing Java deployments
To be able to utilize all of the hardware memory available in a large box, many Java teams choose to scale their application deployment through multiple instances instead of within a single instance or a few larger instances. While running 16 application instances on a single box is a good way to use all of the available memory, it doesn't address the cost of managing and monitoring that many instances, especially when you've deployed multiple servers the same way.
Another problem designed as a solution is the dramatic precautions that teams take to stay up (and not go down) during peak load times. This means configuring heap sizes for worst-case peak loads. Most of that memory isn't needed for everyday loads, so it becomes an expensive waste of resources. In some cases, teams will go even further, deploying not more than two or three instances per box. This is exceptionally wasteful, both economically and in terms of environmental impact, particularly during non-peak hours.
Now let's compare architectures. On the left side of Figure 2 we see many small-instance clusters, which are harder to manage and maintain. To the right are fewer, larger instances handling the same load. Which solution is more economical?
Image copyright Azul Systems.
As I discussed in my last article, concurrent compaction is a truly viable solution to this problem. It makes fewer, larger instances possible and removes the scalability limitations commonly associated with the JVM. Currently only Azul's Zing JVM offers concurrent compaction, and Zing is a server-side JVM, not yet in the hands of developers. It would be terrific to see more developers take on Java's scalability challenge at the JVM level.
Since performance tuning is still our primary tool for managing Java scalability issues, let's look at some of the main tuning parameters and what we're actually able to achieve with them.
Tuning parameters -- some examples
The most well-known JVM performance option, which most Java developers specify on their command-line when launching a Java application, is
-Xmx. This option lets you specify how much memory is allocated to your Java heap, although results will vary by JVM.
Some JVMs include the memory needed for internal structures (such as compiler threads, GC structures, code cache, and so on) in the
-Xmx setting, while others add more memory outside of it. As a result, the size of your Java processes may not always reflect your
Once you've reached your maximum memory threshold, you can't return that memory to the system. You also can't grow beyond it; it is a fixed upper limit. If you don't get your
-Xmx setting "right" -- that is, if the application's object-allocation rate, the lifetime of your objects, or the size of your objects exceeds your JVM's memory configuration settings -- you will run out of memory. Then the garbage collector will throw an out-of-memory error and your application will shut down.
If your application is struggling with memory availability, you currently don't have many other options than to restart it with a larger
-Xmx size. To avoid downtime and frequent restarts, most enterprise production environments tend to tune for the worst-case load, hence over-provisioning.
Tip: Tune for production load
A common error for Java developers is to tune heap memory settings in a lab environment, forgetting to re-tune it for production load. The two loads can differ significantly, so always re-tune for your production load.
Tuning generational garbage collectors
Some other common tuning options for the JVM are
-XX:NewSize. These options are used to tune the size of young generation (or nursery) in your JVM. They specify the amount of the heap that should be dedicated for new allocation in generational garbage collectors.
Most Java developers will try to tune the nursery size based on a lab environment, which means risking a production-load fail. It's common to set a third or half of the heap as nursery, which is a bit of glue that seems to work most of the time. It isn't based on a real rule, however, as the "right" size is actually application dependent. You're better off investigating the actual promotion rate and the actual size of your long-living objects, and then setting the nursery size to be as large as possible without causing promotion failure in the old space of the heap. (Promotion failure is a sign that the old space is too small, and will trigger a number of garbage collection actions that could result in an out-of-memory error. See JVM performance optimization, Part 2 for an in-depth discussion of generational garbage collection and heap sizing.)
Another nursery-related option is
-XX:SurvivorRatio. This option lets you set the promotion rate, meaning the life-length an object has to survive before getting promoted to old space. To set this option "right" you will have to know the frequency of young-space collection and be able to estimate for how long new objects will be referenced in your application. Note that the "right" setting for these options depends on the allocation rate, so setting them based on the lab environment will cost you in production.
Tuning concurrent garbage collectors
If you have a pause-sensitive application, your best bet is to use a concurrent garbage collector -- at least until someone invents something better. Although parallel approaches give excellent throughput benchmark scores and are often used for JVM comparative publications, parallel GC does not benefit response times. Concurrent GC is currently the only way to achieve some kind of consistency and the least number of stop-the-world interruptions. Different JVMs supply different options for setting a concurrent garbage collector. For the Oracle JVM it's
-XX:+UseConcMarkSweepGC. More recently G1 has become the default for the Oracle JVM, which uses a concurrent approach.
GC algorithms compared
See JVM performance optimization, Part 2 to learn about the strengths and weaknesses of different garbage collection algorithms for various application scenarios.
Performance tuning is not the solution
You might have noticed that I put quotation marks around each use of the word right in my discussion about tuning parameters. That's because my experience shows that when it comes to JVM performance tuning, there is no right setting. Assigning a value to one of your JVM parameters means setting it for a specific application scenario. Given that the application scenario changes, JVM performance tuning is a stop-gap solution at best.
Take the heap size as an example: If 2 GB looks good with 200,000 concurrent users, it probably won't be enough with 400,000 users. Or take the survivor ratio, or the promotion rate: A setting that looked good for a test case had a continuously growing load up to 10,000 transactions per millisecond, but what happens when the pressure increases to 50,000 for the same timeframe in production?
Most enterprise application loads are dynamic, which is why Java is indeed an excellent language for the enterprise. Java's dynamic memory management and compilation means that the more dynamic you keep your configuration environment, the better it is for language execution. Consider the two listings below. Do either of these example look similar to your production environment's startup command line?
Listing 1. Startup options for an enterprise Java application (1)
>java -Xmx12g -XX:MaxPermSize=64M -XX:PermSize=32M -XX:MaxNewSize=2g -XX:NewSize=1g -XX:SurvivorRatio=16 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12 -XX:LargePageSizeInBytes=256m …
Listing 2. Startup options for an enterprise Java application (2)
>java --Xms8g --Xmx8g --Xmn2g -XX:PermSize=64M -XX:MaxPermSize=256M -XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC --Xnoclassgc …
The values are very different for these two real-world JVM startup configurations, which makes sense because they're for two different Java applications. Both are "right" in the sense that they have been tuned to suit particular enterprise application characteristics. Both achieved great performance in the lab but eventually failed in production. The config in Listing 1 failed immediately because the test cases didn't represent the dynamic load in production. The config in Listing 2 failed following an update because new features changed the application's production usage pattern. In both cases, the development team was blamed. But were they really to blame?
Do 'workarounds' work?
Some enterprises have gone to the extreme of "recycling" object space by exactly measuring how large transaction objects are and "cutting" their architecture to fit that size. This is one way to reduce fragmentation and survive perhaps for an entire day of trading without compaction. Another option is to use an application design that assures objects are only referenced for a very short while, so that they never are promoted, thus preventing old-space collection and full compaction scenarios. Both approaches work, but put unnecessary strain on the developers and the application design.