Accelerate your Java apps!

Where does the time go? Find out with these speed benchmarks

As a Java programmer, knowing the performance characteristics of different Java environments running on different operating systems is crucial. Having this information at hand can prepare you for potential bottlenecks, and it can save you from building bottlenecks into your apps accidentally. This article tests six different Java environments -- some with a just-in-time (JIT) compiler, some without -- running on four OSs and provides valuable benchmarks that can help you out with your Java development efforts.

The testing process

To understand performance characteristics and therefore where to expect bottlenecks, I ran benchmark tests on the following typical Java language constructs: method call, try/catch set-up, object creation, array creation, and array accessing. I didn't run tests of network I/O, disk I/O, or AWT performance -- the focus was purely on Java language performance. The tests were designed to avoid paging to disk.

Most of the tests required no garbage collection, so general system performance cannot be inferred by simply adding the results from the various tests together. I ran no general computation tests like "Tower of Hanoi" or "Sieve of Eratosthenes." I omitted general computational tests because they're dedicated to showing relative speeds on differing platforms, and rarely show where bottlenecks are.

Target systems and environments

The benchmark tests I ran for this article were performed on a range of hardware systems and Java environments. The Java environments were:

Java Environments
Description | JIT
Netscape Navigator 4.05 for Windows NT/95 | Symantec Java! ByteCode Compiler Version 210.065
Netscape Navigator 4.05 for Power Macintosh | Yes
Internet Explorer 4.0 for Windows NT/95 | Yes
Symantec Visual Cafe PDE 2.1a for Windows NT/95 JDK 1.1.4 | Symantec Java! ByteCode Compiler Version i300.009
Netscape Navigator 4.05 for SPARC | No
Netscape Navigator 4.05 for Linux | No

The hardware/OS platforms were:

Hardware/OS Systems
OS | CPU (MegaHertz = MHz) | RAM (megabytes = MB)
Windows NT SP3 | Pentium Pro 200 MHz | 128 MB
Macintosh 7.6.1 | PowerPC 604e 180 MHz |
Solaris 2.5.1 | UltraSPARC-1 167 MHz | 128 MB
Red Hat Linux 5.1 | Pentium-II 266 MHz | 128 MB
Windows NT | Dual Pentium Pro 180 MHz | 32 MB

To compare the various systems, I converted the time it took to perform the various operations into clock cycles. Why? This conversion makes it possible to compare machines running CPUs at different speeds. In general, comparing different CPUs to each other in such a crude way can be dangerous, because the amount of work that can be done in a single clock cycle can vary a lot from CPU to CPU. The 80486, for example, averages about 2 clock cycles per instruction, while the Pentium executes closer to 1. Fortunately, the PowerPC 604e, UltraSPARC, Pentium Pro, and Pentium-II are roughly comparable. While cache behavior could be different between the various systems, this seems not to affect the performance much. All the tests ran without paging to disk.
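The conversion described above is simple arithmetic: clock cycles equal elapsed time multiplied by clock rate. Here is a minimal sketch, assuming the measured time per operation is available in nanoseconds. The class and method names are hypothetical and are not taken from the benchmark program.

```java
// Hypothetical helper, not from the benchmark program: converts a measured
// time per operation into clock cycles so that machines running at
// different clock rates can be compared.
final class ClockCycles {
    private ClockCycles() {}

    /**
     * cycles = seconds * (cycles per second)
     *        = (nanos / 1e9) * (mhz * 1e6)
     *        = nanos * mhz / 1000
     */
    static double fromNanos(double nanosPerOperation, double cpuMhz) {
        return nanosPerOperation * cpuMhz / 1000.0;
    }

    public static void main(String[] args) {
        // 1.2 microseconds (1200 ns) per operation on a 200-MHz CPU
        System.out.println(fromNanos(1200.0, 200.0) + " cycles");
    }
}
```

For example, the 1.2-microsecond object-creation time quoted later for a 200-MHz Pentium Pro works out to 240 clock cycles.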

Method calls

The ability to write and call methods (or functions) is a critical tool for building and maintaining large systems. Methods allow programs to be broken into smaller, more easily handled chunks. However, if method calls slow down a running program, programmers will design systems with bigger parts and fewer method calls.

Object-oriented programming increases the number of method calls when compared to equivalent procedural programs because it encourages more data encapsulation (among other things). Compare these two lines of code and notice the extra method call in the line showing encapsulation:

Without encapsulation: int x = someObject.x;
With encapsulation: int x = someObject.getX();

Encapsulation increases the number of method calls in a program, so it is essential that those method calls execute quickly. If method calls don't execute quickly, programmers often attempt to speed up their programs by avoiding encapsulating the data in their programs. Examples of this lack of encapsulation can be seen in some of the standard Java classes. The class java.awt.Dimension, for example, is written with both of its data members public. A better design would have hidden the data members by making them private and providing public accessor methods:

private int height;
private int width;
public int getHeight()
{
    return height;
}
public int getWidth()
{
    return width;
}

Because the early Java environments shipped without JIT compilers, method calls were much slower than they are in current Java environments. The encapsulation shown above may have been unacceptably slow in those early environments, which would explain why the data members are public.

Fortunately, today's JIT-enabled Java environments perform method calls much faster than earlier non-JIT-enabled environments. There is less of a need to make speed-versus-encapsulation tradeoffs in these environments. With the best JIT, static methods returning nothing and taking no arguments execute in 2 clock cycles. Non-static method calls returning integer quantities execute in 7 clock cycles. Non-static method calls returning floating-point numbers execute in 8 clock cycles.

By making these accessor methods final, you can expect to reduce these times by one clock cycle. When running in a Java environment without a JIT, method calls take anywhere between 280 and 500 clock cycles. A good JIT can speed up method calls by a factor of more than 100 -- so in target environments with a good JIT, you can have both encapsulation and speed. In environments without a JIT or with a poor JIT, programmers must decide on a case-by-case basis whether speed or encapsulation is more important. A good JIT can make this decision unnecessary.
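As an illustration of the final accessors mentioned above, here is a minimal sketch of an encapsulated class. The names Size and SizeDemo are hypothetical; this is not the source of java.awt.Dimension, and any speedup from final depends on the runtime.

```java
// Hypothetical class for illustration; it is not java.awt.Dimension.
class Size {
    private int width;
    private int height;

    Size(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // final: subclasses cannot override these accessors, so the
    // runtime always knows which method body a call will reach.
    public final int getWidth()  { return width; }
    public final int getHeight() { return height; }
}

class SizeDemo {
    // Callers go through the accessors and never touch the fields directly.
    static int area(Size s) {
        return s.getWidth() * s.getHeight();
    }

    public static void main(String[] args) {
        System.out.println(area(new Size(4, 3)));
    }
}
```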

The graph below shows the effect adding parameters has on the time a method call takes under various JIT-enabled Java runtimes. While the time a method call takes to execute varies considerably from one runtime to another, adding parameters to a method call frequently increases the time required to execute the method call. Often, adding one parameter does not increase the time required to execute a method call. Only rarely does adding a parameter speed up a method call. Also note that, regardless of the number of parameters, there is some general overhead in setting up a method call. Once a decision has been made to call a method, adding a few parameters will have little impact on the time it takes to make the call.
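The article does not reproduce its benchmark source, so the following is only a sketch of how the cost of calls with different parameter counts might be timed. All names are hypothetical, and System.nanoTime() is a later API than the runtimes tested here. A current JIT may inline calls this small, so treat any numbers it prints as illustrative only.

```java
// Hypothetical micro-benchmark sketch; not the article's benchmark program.
final class CallCost {
    static int sink; // loop results are stored here so the loops are not dead code

    static int zeroParams() { return 1; }
    static int threeParams(int a, int b, int c) { return a + b + c; }

    /** Nanoseconds spent making `calls` calls to zeroParams(). */
    static long timeZeroParams(int calls) {
        int sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sum += zeroParams();
        }
        long elapsed = System.nanoTime() - start;
        sink = sum;
        return elapsed;
    }

    /** Nanoseconds spent making `calls` calls to threeParams(). */
    static long timeThreeParams(int calls) {
        int sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sum += threeParams(i, 1, 0); // int overflow simply wraps here
        }
        long elapsed = System.nanoTime() - start;
        sink = sum;
        return elapsed;
    }

    public static void main(String[] args) {
        int calls = 10_000_000;
        timeZeroParams(calls);  // warm-up passes give a JIT a chance to compile
        timeThreeParams(calls);
        double ns0 = (double) timeZeroParams(calls) / calls;
        double ns3 = (double) timeThreeParams(calls) / calls;
        System.out.println("0 params: " + ns0 + " ns/call (includes loop overhead)");
        System.out.println("3 params: " + ns3 + " ns/call (includes loop overhead)");
    }
}
```

Multiplying the nanoseconds per call by the clock rate in MHz and dividing by 1000 gives clock cycles, the unit used throughout this article.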

Notice that the JIT for Netscape Navigator on the Macintosh runs 25 percent as fast as the JITs on Windows. I have no numbers for JIT-enabled runtimes on Solaris. If you expect to support Macintosh and Windows clients, be sure to do your performance benchmarking on Macintoshes as well as on Windows clients.

Several popular environments do not yet come with a JIT. The graph below shows the effects of adding parameters to the time a method call takes under two non-JIT-enabled Java runtimes. You'll see that the cost of adding parameters is still mostly monotonic but that the general overhead of setting up a function call is very high.

Both Linux and Solaris have Java environments with JITs, but not under Navigator. I did not have access to these environments and have no data for them.

Finally, this graph compares the best Java time with C/C++.

Java seems to be only about 1 clock cycle slower than C++.

Recommendations

If you expect to run with a reasonable JIT, method calls are no more expensive in Java than they are in C or C++. If you expect to run on a system without a JIT or without a very good one, this is something you'll have to pay attention to in the speed-critical portions of your application.

Object creation

Modern microprocessors run at speeds of up to 600 MHz. Unfortunately, modern DRAM runs considerably slower. In burst mode, a modern SDRAM runs at about 100 MHz. If programs accessed memory in a truly random fashion, CPUs would spend most of their time waiting for DRAM. Fortunately, programs don't access memory in a random fashion. If a memory location has been accessed recently, it is quite likely to be accessed again soon. This property is called locality of reference.

Unfortunately, the locality of reference for Java programs can be worse than it is in equivalent C or C++ programs. This is due to object creation. Object creation in a Java program is fundamentally different than that in an equivalent C or C++ program. Many of the small temporary objects that C or C++ would create on the stack are created on the heap in Java.

In C and C++, when these objects are discarded at the end of a method call, the space is available for more temporary objects, and the stack area is almost always in the on-board Level 1 cache.

In Java, the objects are discarded, but the space typically is not reclaimed until the next garbage collection -- which usually doesn't happen until the heap memory is exhausted. The space for the next temporary object in a Java program always comes off the heap. The space for the new temporary object is rarely in a cache, and so the initial use of the temporary object should run slower than the initial use of a temporary object in a C or C++ program.

With the current Java runtimes, performance is also affected because creating a new object is roughly as costly as a malloc in C or a new operation in C++. Creating a new object in any of these ways takes hundreds or even thousands of clock cycles. In C and C++, creating an object on the stack takes about 20 clock cycles. C and C++ programs create many temporary objects on the stack, but Java programs don't have this option.

The graph below shows the cost of creating objects of various sizes under different JIT-enabled Java runtimes. The fastest time is 1.2 microseconds running under Microsoft Internet Explorer (IE) 4.0 on a 200-MHz Pentium Pro. The time would likely be cut to about 0.6 microseconds on a 400-MHz Pentium-II.

Netscape Navigator on Macintosh, IE on Windows NT, and Symantec on Windows NT all show reasonable profiles: the time required to create an object increases as the object size increases.

Navigator on Windows, however, shows very different behavior. The time it takes to create objects of different sizes cycles up and down. Objects containing 4 integers ("ints") and objects containing 12 ints are created fastest. A similar profile appears under Navigator for Solaris as well. I have no explanation for this profile, but there is a solution. When running applets under Navigator, padding object sizes should result in a faster-running applet if object creation proves to be a performance bottleneck.

The graph below shows object-creation times for objects of different sizes when running under non-JIT-enabled systems.

One important thing to notice is that a JIT does not speed up object creation time nearly as much as it speeds up method calls. The bottlenecks for the same program running under a JIT-enabled environment and running under a non-JIT-enabled environment might well be different -- if only subtly -- because of this. While profiling a Java application, you should bear in mind whether or not the profiler has performance characteristics similar to the expected runtime environment or environments.

Finally, the following graph shows the Java object-creation time in the best environment, compared to C/C++ object creation on the stack. The speed difference is about 10 to 12 times. Certain shortcuts can speed up C/C++ object creation on the stack slightly more than these already-fast times, but the shortcuts can reduce the maintainability of the C/C++ code. Java doesn't give you the option of taking these shortcuts.

Recommendations

In all Java runtime environments, object creation can become a performance bottleneck. Here JITs don't help very much! Object creation is something you must pay attention to for all Java programs. This doesn't mean you should avoid object creation at all cost -- simply be aware of where you are creating lots of objects, and watch for bottlenecks there.
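One common way to reduce allocation in a hot loop is to reuse a single scratch object instead of creating a temporary per iteration. The article does not prescribe this technique; the sketch below only shows what such a change might look like, and the names Vec2 and ObjectReuse are hypothetical.

```java
// Hypothetical example: reuse one scratch object instead of allocating
// a new temporary on every loop iteration.
final class Vec2 {
    double x, y;

    void set(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double lengthSquared() {
        return x * x + y * y;
    }
}

final class ObjectReuse {
    /** Allocates a new temporary Vec2 for every element. */
    static double sumAllocating(double[] xs, double[] ys) {
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            Vec2 v = new Vec2();           // one heap allocation per element
            v.set(xs[i], ys[i]);
            total += v.lengthSquared();
        }
        return total;
    }

    /** Allocates one Vec2 up front and overwrites it for every element. */
    static double sumReusing(double[] xs, double[] ys) {
        double total = 0.0;
        Vec2 scratch = new Vec2();         // single allocation
        for (int i = 0; i < xs.length; i++) {
            scratch.set(xs[i], ys[i]);
            total += scratch.lengthSquared();
        }
        return total;
    }
}
```

The reused object is mutable shared state, so this trades some clarity and safety for fewer allocations. It is worth doing only where profiling shows object creation to be the bottleneck.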

Synchronization

Correct, nontrivial, multithreaded environments require some degree of synchronization. The chart below shows the times required for various monitor operations in clock cycles.

Cost of synchronized operation both with and without monitor on object and class
System | Object synch w/o monitor | Object synch w/ monitor | Class synch w/o monitor | Class synch w/ monitor
Navigator 4 under Windows NT | 589 | 259 | 571 | 289
Navigator 4 under MacOS | 1209 | 1116 | 1195 | 1193
Internet Explorer 4 under Windows NT | 39 | 41 | 255 | 236
Symantec with JIT under Windows NT | 92 | 36 | 88 | 32
Navigator 4 under Solaris | 371 | 258 | 465 | 352
Navigator 4 under Linux | 488 | 417 | 519 | 429

Several things are worth noting:

First, synchronizing on an already-acquired monitor is usually, but not always, cheaper than acquiring and releasing the monitor.

Second, the time required for the same operation when executed by different Java runtimes can vary by a ratio of almost 40 to 1! The bottlenecks on various systems may well be different if there is a significant amount of synchronization. Profiling for performance bottlenecks should, if at all possible, be done on a runtime similar to the target runtime.

Third, on some systems the cost of synchronizing a class is significantly higher than the cost of synchronizing an object. This suggests using a single global object instead of static class methods.

Fourth, the overhead of synchronization makes the cost of calling a synchronized method significantly more expensive than calling a nonsynchronized method. This can make synchronization prohibitive for accessor functions. A nonsynchronized accessor that takes 7 to 10 clock cycles could take 500 clock cycles if it is synchronized. This does not suggest writing thread-unsafe code, but does suggest that designing the program to require little or no synchronization might be a good strategy for speed-critical portions of the program.
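The third point above, replacing synchronized static methods with a single global object, can be sketched as follows. Both counter classes are hypothetical. The difference is only in which monitor is taken: the Class object in the first case, an ordinary instance in the second.

```java
// Hypothetical counters for illustration.
final class StaticCounter {
    private static int count;

    // A static synchronized method locks the Class object
    // (the "class synch" columns in the table).
    static synchronized int next() {
        return ++count;
    }
}

final class InstanceCounter {
    // The single shared instance that stands in for the static methods.
    static final InstanceCounter GLOBAL = new InstanceCounter();

    private int count;

    private InstanceCounter() {}

    // An instance synchronized method locks this object
    // (the "object synch" columns in the table).
    synchronized int next() {
        return ++count;
    }
}
```

Callers write InstanceCounter.GLOBAL.next() instead of StaticCounter.next(). Whether this is faster depends on the runtime; the table shows the gap is large on some systems and negligible on others.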

Recommendations

With a good Java runtime environment, synchronizing is more expensive than method calls but much less expensive than object creation. In this case, the only thing to worry about is synchronization operations in tight inner loops. When running in a poor Java runtime, synchronization can become more of a bottleneck in general.
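To illustrate keeping synchronization out of a tight inner loop, the sketch below does the loop's work on a local variable and takes the monitor once at the end. The Tally class is hypothetical.

```java
// Hypothetical accumulator shared between threads.
final class Tally {
    private long total;

    /** Acquires and releases the monitor once per element. */
    void addEach(int[] values) {
        for (int i = 0; i < values.length; i++) {
            synchronized (this) {
                total += values[i];
            }
        }
    }

    /** Sums into a local variable first, then takes the monitor once. */
    void addBatch(int[] values) {
        long local = 0;
        for (int i = 0; i < values.length; i++) {
            local += values[i];
        }
        synchronized (this) {
            total += local;
        }
    }

    synchronized long total() {
        return total;
    }
}
```

The two methods leave the same final total, but they are not identical in behavior: with addEach other threads can observe partial sums, while addBatch publishes the whole batch at once. That difference has to be acceptable before making this change.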

Exception overhead

Since exceptions are the chosen mechanism for handling errors in Java, it makes sense to ask how expensive setting up a try/catch block is. The answer is: 0 to 3 clock cycles on JIT-enabled systems, and 30 to 40 on non-JIT-enabled systems. The runtime cost of try/catch blocks is close to zero. If try/catch blocks don't appear inside speed-critical inner loops, they should have no impact on application performance.

The primary alternative to exceptions is to use status-return codes. Because using exceptions reduces the number of parameters needed in method calls, using exceptions in Java programs can actually speed up the program when compared to an equivalent program using status codes.

The cost of throwing an exception is higher than the cost of setting up a try/catch block. Since exception handling shouldn't be done on a speed-critical path, this doesn't bother me and I haven't benchmarked it.

Recommendations

The use of exceptions to handle error conditions in a Java program should run the non-error case faster than using status return codes! Setting up try/catch blocks is fast.
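The point about parameter counts can be seen by writing the same operation both ways. In this hypothetical sketch, the status-code version needs an extra out parameter to carry the result, while the exception version returns the result directly.

```java
// Hypothetical example contrasting the two error-handling styles.
final class Parse {
    static final int OK = 0;
    static final int BAD_DIGIT = 1;

    private Parse() {}

    /** Status-code style: the result comes back through an extra out parameter. */
    static int parseDigit(char c, int[] out) {
        if (c < '0' || c > '9') {
            return BAD_DIGIT;
        }
        out[0] = c - '0';
        return OK;
    }

    /** Exception style: the return value carries the result; errors are thrown. */
    static int parseDigit(char c) {
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("not a digit: " + c);
        }
        return c - '0';
    }
}
```

In the non-error case the exception version passes one argument instead of two and needs no status check at the call site.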

Arrays (updated 9/3/98)

Java, unlike C and C++, implements arrays as true objects with defined behavior, including range checking. This language difference imposes overhead on a Java application using arrays that is missing in equivalent C and C++ programs. Creating an array is object creation. Further, using an array involves the overhead of checking the array object for null and then range-checking the array index. C and C++ omit the checks for null and the range checks, and thus should run faster.

The benchmark program measures the cost of accessing array elements for various sized arrays. Unfortunately, a bug in the benchmark program reports the results for short arrays incorrectly, so only the results for the largest array size should be considered valid. (The bug also affects the other results, but only by a few percentage points at most, and this is as accurate as I expect for tests as simple as these.) Because I don't have easy access to most of the systems on which the tests were run, I can't rerun the tests with the bug fixed.

This table shows the cost of accessing each array element for the largest array size, which is relatively unaffected by the error.

The times in clock cycles are:

Runtime | Member variable | int[512000]
Navigator 4 on NT | 6 | 16
Navigator 4 on Macintosh | 25 | 40
IE 4 | 7 | 11
Symantec 2.1 JIT on NT | 2 | 2

In general, array access is 2 or 3 times as expensive as accessing non-array elements, even for large arrays when the range checking can be amortized over many accesses. Only the Symantec JIT compiler is successful in optimizing most of this overhead, and this may occur only in the case of large arrays.

Array access in non-JIT-enabled environments is about one-tenth as fast as in JIT-enabled environments.

Finally, because Java doesn't have arrays of objects but only arrays of object references, the relative cost of creating a Java array of objects compared to a C or C++ array of objects goes up as the array gets longer.

Recommendations

Array access is slower than accessing atomic (non-array) variables. If you expect to access a few elements in an array many times each, copying them to atomic variables can speed things up.
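A minimal sketch of this recommendation, with hypothetical names: two array elements that are read on every iteration are copied into local variables once, before the loop starts.

```java
// Hypothetical example: evaluate a*x + b for x = 0..n-1 and sum the results.
final class ArrayLocals {
    private ArrayLocals() {}

    /** Reads coeff[0] and coeff[1] from the array on every iteration. */
    static long evalIndexed(int[] coeff, int n) {
        long sum = 0;
        for (int x = 0; x < n; x++) {
            sum += (long) coeff[0] * x + coeff[1];
        }
        return sum;
    }

    /** Copies the two elements to local variables once, then uses the locals. */
    static long evalLocals(int[] coeff, int n) {
        int a = coeff[0];
        int b = coeff[1];
        long sum = 0;
        for (int x = 0; x < n; x++) {
            sum += (long) a * x + b;
        }
        return sum;
    }
}
```

Both methods return the same value. The second avoids the null check and range check on each access, which matters only on runtimes that do not already optimize those checks away.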

Casting

The Java programming language does not have templates. Because of this, Java collection classes -- queues, stacks, dictionaries, and so on -- typically store references to type java.lang.Object. Before using a reference retrieved from a collection class, the reference must be converted from a reference to a java.lang.Object into a reference to the underlying object's type. Converting a reference to a Java object from one type to another is called casting. Casting from a type at the top of an inheritance hierarchy (java.lang.Object is at the very top) to a type further down the inheritance hierarchy is called downcasting. Casting is more necessary in Java programs than in equivalent C++ programs, and therefore the speed of casting is more important, too.
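The downcast described above looks like this in practice. ObjectStack is a hypothetical, minimal stand-in for a collection class that stores Object references; it is not one of the standard library collections.

```java
// Hypothetical minimal stack that, like the collection classes described
// above, stores references of type Object.
final class ObjectStack {
    private Object[] items = new Object[8];
    private int size;

    void push(Object o) {
        if (size == items.length) {
            // Grow the backing array when it is full.
            Object[] bigger = new Object[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size++] = o;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("empty stack");
        }
        Object top = items[--size];
        items[size] = null; // drop the reference so it can be collected
        return top;
    }
}

final class CastDemo {
    static int popLength(ObjectStack stack) {
        // The stack hands back an Object; the downcast recovers the String.
        String s = (String) stack.pop();
        return s.length();
    }
}
```

If the object on top of the stack is not a String, the cast throws a ClassCastException at run time. That run-time check is the work the cast benchmarks measure.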

Another major use of casting is for accessing interfaces. Using interfaces in function signatures makes for more flexible program architectures, but applications using interfaces generally run slightly slower than applications using classes in function signatures.

The times for various cast operations are shown below.

Cost of cast operations in clock cycles
System | Shallow downcast | Deep downcast | Accessing 1st interface | Accessing 18th interface
Navigator 4 under Windows NT | 46 | 45 | 160 | 84
Navigator 4 under MacOS | 603 | 658 | 779 | 803
Internet Explorer 4 under Windows NT | 14 | 14 | 305 | 152
Symantec with JIT under Windows NT | 2 | 2 | 2 | 2
Navigator 4 under Solaris | 56 | 39 | 91 | 61
Navigator 4 under Linux | 79 | 75 | 230 | 111

Symantec has the best cast performance and has reduced the cost of casts to almost nothing. The Macintosh executes casts especially slowly. Finally, for most environments, casting an object reference to the last declared interface is faster than casting to the first declared interface. Declaring the most commonly used interface last will speed things up marginally with little or no loss of program clarity.
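Declaring the most commonly used interface last only changes the order of the implements clause. In this hypothetical sketch, Often is the interface expected to be used most. The speed benefit was measured only on the runtimes in the table and may not hold elsewhere.

```java
// Hypothetical interfaces and class for illustration.
interface Rarely {
    int rare();
}

interface Often {
    int often();
}

// The interface expected to be used most (Often) is declared last.
final class Widget implements Rarely, Often {
    public int rare()  { return 1; }
    public int often() { return 2; }
}

final class InterfaceOrderDemo {
    static int callOften(Object o) {
        // Cast an object reference to an interface type, then call through it.
        return ((Often) o).often();
    }
}
```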

Recommendations

With a good Java runtime, casting runs very fast. If you expect to run in an environment with a good Java runtime, you can use interfaces with almost no performance penalty. You can also use collection classes with almost no performance penalty.

Conclusions

The results of the benchmarking tests discussed above do not lead to any grand, unifying conclusion. But it is instructive to list a number of unrelated conclusions.

  • Object creation time is quite a bit more expensive in Java than it is in C or C++. With a good JIT and runtime, it is 10 to 12 times as expensive; with a poor runtime, it is about 25 times as expensive. JITs improve this situation, but not nearly as much as they improve things like method-call speed. For performance in Java programming, the most important thing to pay attention to -- but not panic over -- usually is the amount of object creation. Netscape Navigator has very strange object-creation behavior on some platforms.

  • try/catch blocks are cheap -- a lot cheaper than I expected.

  • Declaring the most commonly-used interface for a class last will result in a small speed increase. Symantec, which doesn't care, and Navigator under the Macintosh, which is already very slow anyway, are exceptions to this rule.

  • With a JIT, method calls are about as fast in Java as in C or C++.

  • Synchronization is several times more expensive than a method call, but still not as expensive as object creation. The cost of synchronization varies a lot between the various Java runtimes.

  • Array access is 2 or 3 times as expensive as member variable access. A good JIT can almost eliminate this overhead for large arrays.

  • The relative speed of different activities varies a lot from platform to platform and between JIT-enabled and non-JIT-enabled systems. To find the bottlenecks in your program you must profile your application on a system with performance characteristics similar to the systems on which you expect your application to run. The bottlenecks when running on a Macintosh may differ from the bottlenecks when running on Linux.

Acknowledgments

Generating and analyzing the data in this article involved valuable effort on the part of many people. I would like to thank:

  • Forrest Bennett, Stanford University, who contributed conjectures for some of the odd things I found.

  • Greg Dougherty, Molecular Software, who helped find a Macintosh environment that would run all the tests without crashing.

  • Eric Enders, Netscape, who provided Solaris data.

  • Dara Golden, my favorite wife, who provided Windows data.

  • Dee and Rayna Golden, T-Square Design, who provided more Macintosh support.

  • Eric Roulo, who contributed Windows data.

  • Bob Scott, Broderbund, who contributed the Linux numbers.

  • Bernie Su, UC San Diego, who provided Windows data.

  • Sal Zaragoza, who provided Solaris data.

Mark programmed using C and C++ for eight years. He has been using Java professionally for the last two years. His main programming interests are portable, distributed, concurrent software as well as software engineering.

Learn more about this topic

  • There isn't much information about HotSpot available (as of mid-July 1998), but one Web page with some data claims "Short lived object benchmark: 18 secs classic VM, 8 sec's on Jview, 6 secs malloc/free C, 2 secs HotSpot". If these numbers are representative, we should expect to see object creation time drop to about 3 times greater than C/C++ stack allocation, down from 10 to 12 times greater. http://java.sun.com/javaone/javaone98/keynotes/panel/bullets.htm
  • An interesting theoretical paper on object creation is "Garbage collection can be faster than stack allocation" by Andrew Appel. The paper is about object creation in Lisp, not Java, but still is relevant. It provides some ammunition for the position that in theory, Java object allocation on the heap can be faster than C/C++ object creation on the stack. http://www.cs.princeton.edu/fac/appel/papers/45.ps
  • A really good book on garbage collection is Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins. I recommend this book to all serious Java programmers. http://www.cs.ukc.ac.uk/people/staff/rej/gcbook/gcbook.html
  • The article "Optimizing NET Compilers for Improved Java Performance" in the June 1997 issue of Computer compares some actual programs under various Java environments and also under C/C++. It provides a good starting point for answering the question, How fast is Java compared to C++? (as of mid-1997).
  • SPEC announced its JVM98 benchmark suite August 19 (just after we published this article) http://www.spec.org/osg/jvm98/
  • A "FAQs (Frequently Asked Questions) About the Multimedia Benchmarking Committee (MBC)" document notes in part: "SPECjava will benchmark Java performance, which means essentially benchmarking the computing platform, as well as the Java virtual machine, the browser, and perhaps the just-in-time compiler." http://www.specbench.org/gpc/Dec97/mbc.static/mbcfaq~1.html
  • See JavaWorld's article "HotSpot: A new breed of virtual machine" http://www.javaworld.com/jw-03-1998/jw-03-hotspot.html