Put Java in the fast lane

Follow a performance improvement process to locate bottlenecks in your application

In my consulting practice, one of the most common complaints I hear about Java is that it's "too slow," "too resource intensive," or that "performance may be a problem." These statements are often made without any basis in fact; many assume that Java's poor performance is a given.

Of course, Java programs can be slow, memory intensive, and painful to use; but so can programs written in C++, Visual Basic, Smalltalk, Pascal, Ada, or C#. That doesn't necessarily mean that the language or the runtime environment is bad; it might just mean that a developer wrote the code without considering performance side effects. A well-written Java program usually performs just as well as a program written in any other programming language, and it often performs better, especially with the performance improvements in JDK 1.3 and 1.4.

One important note: Perceived performance is the critical measure. It doesn't matter how fast, efficient, or elegant your code is if it doesn't appear to be running or displaying fast enough. Likewise, ugly code can sometimes be blazingly fast—but a maintenance nightmare. We should really strive for code that combines the best of both worlds: it's elegant and performs well enough to satisfy our clients.

To move you a step closer to that goal, in this article, I walk you through a performance improvement process. To illustrate the process, I include code that uses classes from the new java.nio package to provide you with additional clues about how to use them more effectively.

Performance improvement process

Suppose you've written or inherited a Java application that just isn't performing up to snuff. Your boss is breathing down your neck, and the delivery deadline is next week. What do you do?

I'm assuming you don't want to take the "run-away-as fast-as-you-can" approach. You want to fix the performance problems. Where do you start?

I recommend the following performance improvement process:

  1. Decide what performance level is "good enough"
  2. Test on all your target platforms
  3. If performance is good enough on all target platforms, stop; do not pass go, do not collect 00
  4. Profile your application to find the bottlenecks
  5. Re-architect or rewrite your code to fix the bottlenecks
  6. Return to Step 2

To illustrate the process, let's look at an example. Along the way, I'll fill in more details on the individual steps. The examples include mostly AWT (Abstract Windowing Toolkit) graphics, because performance improvements in graphical programs are easier to see and more interesting to write.

Apply the process

The following program reads a series of files and counts the occurrences of the letters a to z in the files. It displays a histogram of the letter frequency; the histogram updates after each file is read.

Examine the code below and note to yourself where you could improve it. Then follow along to see if your intuition was correct:

import java.awt.Color;
import java.awt.Component;
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.Frame;
import java.io.IOException;
import java.io.FileInputStream;
public class Letters extends Component { 
    long[] countArray = new long[26];
    static char[] letterArray = 
    {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o',
     'p','q','r','s','t','u','v','w','x','y','z'};
    /**
     * Find the number of occurrences of each letter of the
     * alphabet in the named file.  The result is returned
     * as a 26-element array of long elements.
     * Of course, this will only work for the English alphabet.
     */
    void countCharacters (String filename) 
    throws IOException {
        System.out.println ("...reading " + filename);
        FileInputStream fis =
           new FileInputStream (filename);
        int tmp;
        while ((tmp = fis.read()) != -1) {
            char c = Character.toLowerCase((char)tmp);
            int pos = c - 'a';
            if ((pos >= 0) && (pos <= 25)) {
               ++countArray[pos];
            }
        }
        fis.close();
    }
    /**
      * Draw a histogram of the letter frequency.
      * This method is triggered by repaint(), or by 
      * window manager repaint events.
      */
    public void paint (Graphics g) {
        long maxCount = 0;
        for (int i=0; i<countArray.length; ++i) {
            if (countArray[i] > maxCount) maxCount = countArray[i];
        }
        Dimension d = getSize();
        double yScale = ((double)d.height) / ((double)maxCount);
        int barWidth = (int)d.width / countArray.length;
        int x = 0;
        for (int j=0; j<countArray.length; ++j) {
            g.setColor (Color.blue);
            int barHeight = (int)(countArray[j]*yScale);
            g.fillRect (x, d.height-barHeight, 
                        barWidth, barHeight);
            g.setColor (Color.white);
            g.drawRect (x, d.height-barHeight, 
                        barWidth, barHeight);
            g.setColor (Color.black);
            g.drawChars (letterArray, j, 1, x, 10);
            x += barWidth;
        }
    }
    /**
     * The main method.  Files to analyze are specified
     * on the command line; their names are in argv.
     */
    public static void main (String[] argv) 
    throws IOException {
        Letters letters = new Letters();
        Frame f = new Frame ("Letter Count");
        f.add (letters);
        f.setSize (300, 200);
        f.setVisible (true);
        long startTime = System.currentTimeMillis();
        for (int i=0; i<argv.length; ++i) {
            letters.countCharacters (argv[i]);
            letters.repaint();
        }
        long endTime = System.currentTimeMillis();
        System.out.println ();
        System.out.println 
            ("Elapsed time: " + (endTime - startTime) + " msec");
        System.out.println();
                
        for (int i=0; i<letters.countArray.length; ++i) {
            System.out.print 
               (letterArray[i] + "=" + letters.countArray[i] + ",");
        }
        System.out.println();
    }
}

What's wrong with this program, and how do we fix it? Let's follow our improvement process.

Step 1: Decide what performance level is good enough

I'll arbitrarily say that performance is good enough when the program can read and process all HTML documents from package javax.swing in about 10 seconds. In the real world, external, and often political, factors often determine performance targets, which might still be arbitrary, of course.

Step 2: Test on all your target platforms

I have two workstations handy and run JDK 1.4 on both: a fast Pentium running Windows (Platform A), and a slower one running Linux (Platform B). Testing on these two platforms shows that Platform A reads and processes the HTML documents in about 140 seconds; Platform B does the same in 153 seconds. Ouch. Not nearly good enough.

Step 3: If performance is good enough, stop

We didn't meet the performance targets, so we continue with optimization.

Step 4: Profile your application

You can profile your application any number of ways. Many profilers have GUI (graphical user interface) front ends; prices range from free to thousands of dollars. See Resources for a selection. Sun Microsystems' JDK includes the hprof tool for profiling, which I'll use to illustrate the profiling process.

The hprof profiler allows you to track many interesting things about an application, including object creation statistics and method profiles. I'll use method profiling for this example. Object creation information proves useful for locating memory leaks and garbage collection problems, which can offer another valuable exercise. To obtain a list of hprof options, type java -Xrunhprof:help. For this example's purposes, I used java -Xrunhprof:cpu=samples,depth=15 Letters. That generates an output file (java.hprof.txt) containing information about thread lifetime and an estimate of time spent in each method. The estimate generates at program exit by default; you can also receive an intermediate snapshot by typing <ctrl-Break> on Windows or <ctrl-\> on Unix.

Running the program generates the following output in java.hprof.txt:

CPU SAMPLES BEGIN (total = 3245) Sun Jul 14 18:18:03 2002
rank   self  accum   count trace method
   1 52.30% 52.30%    1697    25 sun.awt.windows.WToolkit.eventLoop
   2 45.30% 97.60%    1470    41 java.io.FileInputStream.read
   3  0.52% 98.12%      17    45 java.awt.EventQueue.postEventPrivate
...
CPU SAMPLES END

The samples rank in order, starting with the methods used most often. You can usually focus on the first 5 to 10 entries. We'll ignore the first entry for now. We can find information about the second entry by looking for the string TRACE 41 in the profiling output:

TRACE 41:
   java.io.FileInputStream.read(FileInputStream.java:Native method)
   Letters.countCharacters(Letters.java:27)
   Letters.main(Letters.java:71)

Hmm...we're spending an awful lot of time calling FileInputStream.read() in countCharacters(). That seems like a good place to make a change.

If you find a suspicious method with method-profiling techniques, you have a few options:

  1. Optimize the method—perhaps use a better algorithm
  2. Call the method less often
  3. Don't call the method at all

Step 5: Re-architect or rewrite your code to fix the bottlenecks

Looking at the code, you see that the input is unbuffered. Buffering could help quite a bit. Adding buffering is easy; just use a BufferedInputStream:

...import java.io.BufferedInputStream;
...
    void countCharacters (String filename)
    throws IOException {
        BufferedInputStream bis = 
           new BufferedInputStream(new FileInputStream (filename));
        int tmp;
        while ((tmp = bis.read()) != -1) {
            char c = Character.toLowerCase((char)tmp);
            int pos = c - 'a';
            if ((pos >= 0) && (pos <= 25)) {
                ++countArray[pos];
            }
        }
        bis.close();
...

Step 6: Return to Step 2

Testing the modified code yields the following:

  • Platform A: Times between 3.3 and 5.8 seconds
  • Platform B: Times between 18.6 and 19.2 seconds

Not good enough, but better. Let's take another pass at optimization.

Continue optimizing code

At the above profiling output, the top entry originates from the GUI event loop. The event loop starts as soon as we make a GUI object visible (that is, call setVisible(true)), and the loop runs for the application's lifetime. Also notice that the application redraws the graph after every file is read; this proves more significant for large file numbers. Assuming we don't care about viewing the intermediate graph (that is, it isn't a firm requirement), we can speed up the application by moving the graphics display to main()'s end. Then the event loop won't start until the last minute in the application, and paint() will probably only be called a couple of times.

At this point, we might question whether the graphical display is required at all and whether it needs to run simultaneously with the application. Depending on the answer, we might be able to move the graphics to a different thread, a different application, or even a different machine. For this discussion's purposes, though, we'll assume immediate graphics display is required.

main()'s modified version looks like this:

    public static void main (String[] argv) 
    throws IOException {
        Letters letters = new Letters();
        long startTime = System.currentTimeMillis();
        ...
      for (int i=0; i<letters.countArray.length; ++i) {
          System.out.print 
            (letterArray[i] + "=" + letters.countArray[i] + ",");
      }
      System.out.println();
        Frame f = new Frame ("Letter Count");
        f.add (letters);
        f.setSize (300, 200);
        f.setVisible (true);
    }

After making the change, testing shows:

  • Platform A: Times between 2.4 and 5 seconds
  • Platform B: Times between 11.6 and 12.8 seconds

We've improved performance, but still haven't met our performance guidelines.

Additional profiling shows the following:

CPU SAMPLES BEGIN (total = 149) Sun Jul 14 19:53:50 2002
rank   self  accum   count trace method
   1 28.86% 28.86%      43    12 java.io.FileInputStream.readBytes
   2 16.78% 45.64%      25    29 sun.awt.windows.WToolkit.init
   3 16.11% 61.74%      24    30 sun.awt.windows.WToolkit.eventLoop

That qualitatively differs from what we saw before. Most of our time is spent in readBytes(). What can we do about readBytes()?

JDK 1.4 introduced a new set of classes in the java.nio package and its subpackages for buffering I/O (input/output). One class in particular looks useful: MappedByteBuffer. We can use a MappedByteBuffer object to efficiently represent the contents of a file in memory; the class manages all the details of buffering and memory management.

To connect the MappedByteBuffer object to the file, we use a FileChannel object. FileChannel represents a thread-safe connection between a buffer and a file that can read, write, map, and manipulate the file. The FileChannel object maintains information about the current position in the file, and also provides for low-level OS-specific optimizations. The JDK 1.4 documentation includes examples of how to use FileChannel and MappedByteBuffer.

We modify our code to use FileChannel and MappedByteBuffer:

1 2 Page 1