Accelerate your Java apps!

Where does the time go? Find out with these speed benchmarks

Third, on some systems the cost of synchronizing on a class is significantly higher than the cost of synchronizing on an object. This suggests locking a single global object rather than using synchronized static class methods.
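
As a rough sketch of the difference, the class below guards a counter either with the Class object's monitor (via a static synchronized method) or with the monitor of a single shared instance. The names LockStyles, globalLock, and the counter fields are illustrative, not taken from the benchmark code.

    class LockStyles {
        private static int classCounter = 0;
        private int instanceCounter = 0;

        // Locks on the Class object itself (LockStyles.class); on some runtimes
        // this is measurably slower than locking an ordinary object.
        static synchronized void incrementViaClassLock() {
            classCounter++;
        }

        // A single global object used purely as the lock.
        private static final LockStyles globalLock = new LockStyles();

        static void incrementViaObjectLock() {
            synchronized (globalLock) {
                globalLock.instanceCounter++;
            }
        }
    }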

Fourth, the overhead of synchronization makes calling a synchronized method significantly more expensive than calling a nonsynchronized method. This can make synchronization prohibitive for accessor functions. A nonsynchronized accessor that takes 7 to 10 clock cycles could take 500 clock cycles if it is synchronized. This does not suggest writing thread-unsafe code, but it does suggest that designing the program to require little or no synchronization might be a good strategy for speed-critical portions of the program.
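
For example, the hypothetical accessor pair below differs only in the synchronized keyword; on the runtimes measured here, the second form pays the full monitor-acquisition cost on every call.

    class Counter {
        private int value;

        // Plain accessor: a handful of clock cycles with a JIT.
        int getValue() {
            return value;
        }

        // Synchronized accessor: every call acquires and releases a monitor,
        // which can cost hundreds of cycles on some runtimes.
        synchronized int getSynchronizedValue() {
            return value;
        }
    }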

Recommendations

With a good Java runtime environment, synchronizing is more expensive than method calls but much less expensive than object creation. In this case, the only thing to worry about is synchronization operations in tight inner loops. When running in a poor Java runtime, synchronization can become more of a bottleneck in general.
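
One way to keep synchronization out of a tight inner loop is to accumulate into a local variable and synchronize once for the whole batch. The sketch below (the class and method names are mine, not the benchmark's) shows the idea:

    class Totals {
        private int total;

        // Acquires the monitor on every iteration of the inner loop.
        void addAllSlow(int[] values) {
            for (int i = 0; i < values.length; i++) {
                synchronized (this) {
                    total += values[i];
                }
            }
        }

        // Accumulates locally, then acquires the monitor once.
        void addAllFast(int[] values) {
            int sum = 0;
            for (int i = 0; i < values.length; i++) {
                sum += values[i];
            }
            synchronized (this) {
                total += sum;
            }
        }
    }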

Exception overhead

Since exceptions are the chosen mechanism for handling errors in Java, it makes sense to ask how expensive setting up a try/catch block is. The answer is 0 to 3 clock cycles on JIT-enabled systems, and 30 to 40 on non-JIT-enabled systems. The runtime cost of try/catch blocks is close to zero. As long as try/catch blocks don't appear inside speed-critical inner loops, they should have no impact on application performance.

The primary alternative to exceptions is to use status-return codes. Because using exceptions reduces the number of parameters needed in method calls, using exceptions in Java programs can actually speed up the program when compared to an equivalent program using status codes.
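
The sketch below contrasts the two styles for a simple lookup; Registry, lookupOrThrow, and NotFoundException are hypothetical names, not part of the benchmark. Note that the status-code version needs an extra out-parameter (a one-element array here) to carry the result alongside the success flag.

    import java.util.Hashtable;

    class NotFoundException extends Exception {
        NotFoundException(String key) {
            super(key);
        }
    }

    class Registry {
        private final Hashtable table = new Hashtable();

        // Exception style: the return value carries only the result, and the
        // non-error path pays only the (nearly free) try/catch setup in the caller.
        Object lookupOrThrow(String key) throws NotFoundException {
            Object value = table.get(key);
            if (value == null) {
                throw new NotFoundException(key);
            }
            return value;
        }

        // Status-code style: success or failure comes back as the return value,
        // so the result itself must travel through an extra parameter.
        boolean lookup(String key, Object[] result) {
            result[0] = table.get(key);
            return result[0] != null;
        }
    }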

The cost of throwing an exception is higher than the cost of setting up a try/catch block. Since exception handling shouldn't be done on a speed-critical path, this doesn't bother me and I haven't benchmarked it.

Recommendations

The use of exceptions to handle error conditions in a Java program should run the non-error case faster than using status return codes! Setting up try/catch blocks is fast.

Arrays (updated 9/3/98)

Java, unlike C and C++, implements arrays as true objects with defined behavior, including range checking. This language difference imposes overhead on a Java application using arrays that is missing in equivalent C and C++ programs. Creating an array is object creation. Further, using an array involves the overhead of checking the array object for null and then range-checking the array index. C and C++ omit the checks for null and the range checks, and thus should run faster.

The benchmark program measures the cost of accessing array elements for arrays of various sizes. Unfortunately, a bug in the benchmark program reports the results for short arrays incorrectly, so only the results for the largest array size should be considered valid. (The bug also affects the other results, but only by a few percentage points at most, which is as accurate as I expect for tests this simple.) Because I don't have easy access to most of the systems on which the tests were run, I can't rerun the tests with the bug fixed.

The table below shows the cost of accessing each array element for the largest array size, which is relatively unaffected by the error.

The times in clock cycles are:

Runtime                     Member variable   int[512000]
Navigator 4 on NT                  6              16
Navigator 4 on Macintosh          25              40
IE 4                               7              11
Symantec 2.1 JIT on NT             2               2

In general, array access is 2 or 3 times as expensive as accessing non-array elements, even for large arrays when the range checking can be amortized over many accesses. Only the Symantec JIT compiler is successful in optimizing most of this overhead, and this may occur only in the case of large arrays.

Array access in non-JIT-enabled environments is about one-tenth as fast as in JIT-enabled environments.

Finally, because Java doesn't have arrays of objects but only arrays of object references, the relative cost of creating a Java array of objects compared to a C or C++ array of objects goes up as the array gets longer.
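
The sketch below makes this concrete: allocating the array creates one object holding n null references, and each element still has to be created individually (Point and makePoints are hypothetical names).

    class Point {
        int x, y;
    }

    class ArrayCreation {
        static Point[] makePoints(int n) {
            Point[] points = new Point[n];  // one array object, n null references
            for (int i = 0; i < n; i++) {
                points[i] = new Point();    // n separate object creations
            }
            return points;
        }
    }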

Recommendations

Array access is slower than accessing atomic (non-array) variables. If you expect to access a few elements of an array many times each, copying them into atomic variables can speed things up.
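
As an illustration (the class and method names are mine), the inner loop below would otherwise re-read factors[row] and m[row], paying the null and range checks each time; copying them into locals first pays those costs once per row.

    class Scale {
        static void scaleRows(double[][] m, double[] factors) {
            for (int row = 0; row < m.length; row++) {
                double factor = factors[row];  // copied out of the array once
                double[] r = m[row];           // likewise for the row reference
                for (int col = 0; col < r.length; col++) {
                    r[col] *= factor;          // no repeated factors[row] access
                }
            }
        }
    }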

Casting

The Java programming language does not have templates. Because of this, Java collection classes -- queues, stacks, dictionaries, and so on -- typically store references of type java.lang.Object. Before a reference retrieved from a collection class can be used, it must be converted from a reference to java.lang.Object into a reference to the underlying object's type. Converting a reference to a Java object from one type to another is called casting. Casting from a type at the top of an inheritance hierarchy (java.lang.Object is at the very top) to a type further down the hierarchy is called downcasting. Casting is more necessary in Java programs than in equivalent C++ programs, and therefore the speed of casting is more important, too.
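
A typical example is pulling strings back out of a java.util.Vector, which stores java.lang.Object references; the downcast on retrieval is the kind of operation measured below (CastDemo and totalLength are hypothetical names).

    import java.util.Vector;

    class CastDemo {
        static int totalLength(Vector strings) {
            int total = 0;
            for (int i = 0; i < strings.size(); i++) {
                // elementAt() returns java.lang.Object; the downcast to String
                // is checked at runtime.
                String s = (String) strings.elementAt(i);
                total += s.length();
            }
            return total;
        }
    }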

Another major use of casting is for accessing interfaces. Using interfaces in function signatures makes for more flexible program architectures, but applications using interfaces generally run slightly slower than applications using classes in function signatures.
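
For instance, a method declared against a hypothetical Shape interface accepts any implementing class, at the cost of interface dispatch on each call:

    interface Shape {
        double area();
    }

    class Circle implements Shape {
        private final double radius;

        Circle(double radius) {
            this.radius = radius;
        }

        public double area() {
            return Math.PI * radius * radius;
        }
    }

    class Geometry {
        // Accepting the interface rather than a concrete class keeps this method
        // flexible; calls through the interface reference cost slightly more on
        // most of the runtimes measured here.
        static double totalArea(Shape[] shapes) {
            double total = 0.0;
            for (int i = 0; i < shapes.length; i++) {
                total += shapes[i].area();
            }
            return total;
        }
    }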

The times for various cast operations are shown below.

Cost of cast operations in clock cycles
System                                 Shallow downcast   Deep downcast   Accessing 1st interface   Accessing 18th interface
Navigator 4 under Windows NT                  46                45                  160                         84
Navigator 4 under MacOS                      603               658                  779                        803
Internet Explorer 4 under Windows NT          14                14                  305                        152
Symantec with JIT under Windows NT             2                 2                    2                          2
Navigator 4 under Solaris                     56                39                   91                         61
Navigator 4 under Linux                       79                75                  230                        111

Symantec has the best cast performance, reducing the cost of casts to almost nothing. The Macintosh executes casts especially slowly. Finally, for most environments, casting an object reference to the last declared interface is faster than casting to the first declared interface. Declaring the most commonly used interface last will speed things up marginally with little or no loss of program clarity.
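
As an illustration (Record is a hypothetical class), if Comparable is the interface this class is most often cast to or accessed through, listing it last in the implements clause is the marginal win the table suggests:

    class Record implements java.io.Serializable, Comparable {
        private final int key;

        Record(int key) {
            this.key = key;
        }

        // Comparable is declared last because it is assumed to be the most
        // commonly used interface for this class.
        public int compareTo(Object other) {
            int otherKey = ((Record) other).key;
            return key < otherKey ? -1 : (key == otherKey ? 0 : 1);
        }
    }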

Recommendations

With a good Java runtime, casting runs very fast. If you expect to run in an environment with a good Java runtime, you can use both interfaces and collection classes with almost no performance penalty.

Conclusions

The results of the benchmarking tests discussed above do not lead to any grand, unifying conclusion. But it is instructive to list a number of unrelated conclusions.

  • Object creation time is quite a bit more expensive in Java than it is in C or C++. With a good JIT and runtime, it is 10 to 12 times as expensive; with a poor runtime, it is about 25 times as expensive. JITs improve this situation, but not nearly as much as they improve things like method-call speed. For performance in Java programming, the most important thing to pay attention to -- but not panic over -- usually is the amount of object creation. Netscape Navigator has very strange object-creation behavior on some platforms.

  • try/catch blocks are cheap -- a lot cheaper than I expected.

  • Declaring the most commonly-used interface for a class last will result in a small speed increase. Symantec, which doesn't care, and Navigator under the Macintosh, which is already very slow anyway, are exceptions to this rule.

  • With a JIT, method calls are about as fast in Java as in C or C++.

  • Synchronization is several times more expensive than a method call, but still not as expensive as object creation. The cost of synchronization varies a lot between the various Java runtimes.

  • Array access is 2 or 3 times as expensive as member variable access. A good JIT can almost eliminate this overhead for large arrays.

  • The relative speed of different activities varies a lot from platform to platform and between JIT-enabled and non-JIT-enabled systems. To find the bottlenecks in your program you must profile your application on a system with performance characteristics similar to the systems on which you expect your application to run. The bottlenecks when running on a Macintosh may differ from the bottlenecks when running on Linux.

Acknowledgments

Generating and analyzing the data in this article involved valuable effort on the part of many people. I would like to thank:

  • Forrest Bennett, Stanford University, who contributed conjectures for some of the odd things I found.

  • Greg Dougherty, Molecular Software, who helped find a Macintosh environment that would run all the tests without crashing.

  • Eric Enders, Netscape, who provided Solaris data.

  • Dara Golden, my favorite wife, who provided Windows data.

  • Dee and Rayna Golden, T-Square Design, who provided more Macintosh support.

  • Eric Roulo, who contributed Windows data.

  • Bob Scott, Broderbund, who contributed the Linux numbers.

  • Bernie Su, UC San Diego, who provided Windows data.

  • Sal Zaragoza, who provided Solaris data.

Mark programmed using C and C++ for eight years. He has been using Java professionally for the last two years. His main programming interests are portable, distributed, and concurrent software, as well as software engineering.

Learn more about this topic

  • There isn't much information about HotSpot available (as of mid-July 1998), but one Web page with some data claims "Short lived object benchmark: 18 secs classic VM, 8 secs on Jview, 6 secs malloc/free C, 2 secs HotSpot". If these numbers are representative, we should expect to see object creation time drop to about 3 times greater than C/C++ stack allocation, down from 10 to 12 times greater. http://java.sun.com/javaone/javaone98/keynotes/panel/bullets.htm
  • An interesting theoretical paper on object creation is "Garbage collection can be faster than stack allocation" by Andrew Appel. The paper is about object creation in Lisp, not Java, but still is relevant. It provides some ammunition for the position that in theory, Java object allocation on the heap can be faster than C/C++ object creation on the stack. http://www.cs.princeton.edu/fac/appel/papers/45.ps
  • A really good book on garbage collection is Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins. I recommend this book to all serious Java programmers. http://www.cs.ukc.ac.uk/people/staff/rej/gcbook/gcbook.html
  • The article "Optimizing NET Compilers for Improved Java Performance" in the June 1997 issue of Computer compares some actual programs under various Java environments and also under C/C++. It provides a good starting point for answering the question "How fast is Java compared to C++?" (as of mid-1997).
  • SPEC announced its JVM98 benchmark suite on August 19 (just after we published this article). http://www.spec.org/osg/jvm98/
  • A "FAQs (Frequently Asked Questions) About the Multimedia Benchmarking Committee (MBC)" document notes in part"SPECjava will benchmark Java performance, which means essentially benchmarking the computing platform, as well as the Java virtual machine, the browser, and perhaps the just-in-time compiler." http://www.specbench.org/gpc/Dec97/mbc.static/mbcfaq~1.html
  • See JavaWorld's article "HotSpot: A new breed of virtual machine" http://www.javaworld.com/jw-03-1998/jw-03-hotspot.html