Java performance programming, Part 3: Managing collections

See how collections alternatives measure up in performance, and find out how to get the most out of each type

In Java performance programming, Part 2, we looked into the basics of casting -- what it is, what it costs, and some ways you can reduce your use of it. This month, we'll cover the related issue of collections in Java, looking first at the basic collections support supplied in the original versions of Java, then moving on to the vastly improved support added in JDK 1.2. Looking ahead, we'll also examine the current state of the Java Community Process proposal for generic types in Java -- including both the benefits we're likely to see from this proposal and the areas in which it perhaps falls short.

We'll then take a look at ways to implement type-specific collections in Java, including an approach that uses a common base class to eliminate some of the duplicated code for these collections. Finally, we'll finish off this month's performance tips with the results of timing tests to see how our alternatives measure up.

(Note: See the sidebar below for my views on the effectiveness of Sun's Java Community Process.)

Vintage collections: Vector and Hashtable

If you're developing for JDK 1.1.x, your standard collections support includes just two choices: java.util.Vector and java.util.Hashtable. (There is also java.util.Stack, but this is only a minor variation on Vector). To make matters even worse, the two choices you do have available are noticeably handicapped in the performance area. Both classes use generic java.lang.Object references, creating the need for casting -- with the overhead associated with it -- on virtually every access, and both use synchronized methods for all accesses.

You need synchronized access methods for modifiable collections when they're used by multiple threads. However, in many cases a single thread owns and exclusively accesses a collection. In such a situation, synchronized access methods are overkill, and result in a substantial performance penalty.

We'll discuss the issue of synchronization further in the next article in this series, along with other threading issues. If you can't wait until then, the performance comparison table near the end of this article includes figures for the legacy Vector class that should give you a feeling for how heavy use of synchronization can affect performance.

Given the performance limitations, these old-time collection workhorses are generally best avoided in heavily used code. Just remember that they're still useful when you need only a simple collection and when performance is not a major issue.

There is a way to get improved standard collections support in JDK 1.1.x, but it comes with a price in terms of code compatibility. We'll discuss this further in the next section of this article, but for now let's limit our discussion to these two basic collections classes provided as part of the standard Java class library.

Change your Vector

If you must use JDK 1.1.x collections in heavily executed code, you can often still work around the limitations of these classes. An opportunity to do so arises when you want to dynamically create a collection of some type by accumulating the elements one at a time, as long as you won't be adding to the collection after you've finished creating it. In such a situation, java.util.Vector provides a simple means of accumulating the elements, although the need to access the elements while doing so leaves you at a performance disadvantage.

To get around this, it's easy to convert a Vector into a type-specific array; once you've turned the collection into an array, access is very fast. For the JDK 1.1.x version of Vector, you can take a direct approach to the conversion:

    // Convert a vector of Integer into an array of Integer.
    Vector vector = ...
    Integer[] array = new Integer[vector.size()];
    for (int i = 0; i < array.length; i++) {
        array[i] = (Integer)vector.elementAt(i);
    }

You can also extend Vector to implement a generic form of conversion:

    public class ConvertibleVector extends Vector
    {
        public Object buildArray(Class type) {
            Object copy = Array.newInstance(type, elementCount);
            System.arraycopy(elementData, 0, copy, 0, elementCount);
            return copy;
        }
        // Assumes there's at least one element and it's not null!
        public Object buildArray() {
            if (elementCount > 0 && elementData[0] != null) {
                return buildArray(elementData[0].getClass());
            } else {
                throw new IllegalArgumentException("cannot convert to array");
        }
}

This form assumes that all elements of the Vector are of the same type. The buildArray() method builds an array of elements of that type and returns it. The caller still needs to cast the result to an array of the specific type, though:

    // Convert a vector of Integer into an array of Integer.
    ConvertibleVector vector = ...
    Integer[] array = (Integer[])vector.buildArray();

Customize Hashtable

java.util.Hashtable is more problematic. In addition to the issues involved in casting and synchronization, Hashtable also requires that the keys be objects (rather than primitive types), and then creates more objects to link the keys and values. This is not a major concern in the standard use of a hashtable, when lookups are much more frequent than adds, but can lead to a lot of unnecessary object creation in cases that don't fit this usual pattern.

You can avoid these potential performance problems by using a custom hashtable implementation in place of the standard implementation. On my Website, I've provided simplified example implementations of such custom hashtables that work with int and String key values for objects implementing appropriate interfaces (see the Resources section below for the URL). These examples provide considerably better performance than the standard Hashtable implementation for simple accumulate-only hashtables, and can be extended to include additional functionality.

The next generation: The collections framework

Fortunately, Sun recognized the limitations of Java's original collection classes fairly early. For JDK 1.2, the company designed the collections framework, a greatly enhanced architecture for representing and manipulating collections. Based on six interrelated collection interfaces, the framework provides a variety of efficient implementations, as well as some extremely useful utility classes. The collections are unsynchronized by default, but the framework provides a simple wrapper class to build synchronized versions of the collections when necessary.

Unsynchronized equivalents of the legacy collection classes java.util.Vector and java.util.Hashtable exist in the collections framework (these classes are java.util.ArrayList and java.util.HashMap, respectively), while the legacy classes themselves remain synchronized but add some additional operations. All of the collections classes include support for converting the collection to an array of values via toArray() methods similar to the buildArray() method shown above.

The utility classes include the java.util.Collections class, which implements a variety of useful algorithms for collections, and the java.util.Arrays class, which sorts, searches, compares, and fills arrays of both primitive and generic reference types. Having efficient standard implementations available for these common operations on arrays is especially useful when working with primitive values, since they allow you to directly use efficient arrays of primitives and do not require as much custom code.

The Resources section below lists several articles and documents for more details on the collections framework.

The collections classes are also available for JDK 1.1.x, but only in a modified form (see Resources for a link to this, as well). Since it's generally not possible to add classes to the standard java.xxx packages in a JVM, the form used for JDK 1.1.x supplies the collections framework as a set of classes in the com.sun.java.util.collections package.

These classes can be used by JDK 1.1.x programs to provide the same functionality as their JDK 1.2 equivalents, but, because of the different class paths, they are not assignment or serialization compatible. They are also only licensed for use in JDK 1.1.x JVMs, so code that needs to run on both JDK 1.1.x and JDK 1.2 JVMs cannot use this alternative. Converting source code from the JDK 1.1.x form to the JDK 1.2 form requires only a simple search-and-replace operation, but this still means that the same class files cannot be used for both JDKs.

If you're supporting only JDK 1.1.x, or if you don't mind distributing separate versions of your class files for JDK 1.1.x and JDK 1.2, the add-on collections framework for JDK 1.1.x is well worth a look.

Parameterized types in Java

One of the most requested enhancements in the Java Developer Connection's bug database is support for templates or some form of parameterized types in Java. The idea here is to allow code to be written in a general form that can be applied to a variety of different data types. When you want an instance of the parameterized class that's specific to a particular type of value, you just create one by specifying the actual value type as a parameter.

This issue is especially relevant to collections, since their use is one of the common cases in which parameterized types would be helpful. With parameterized types, you could define a collection as containing objects of a specific type and have the type checked by the compiler whenever the collection was used. Most developers agree that this would be an improvement over the current approach of detecting collection usage errors by means of runtime exceptions -- which usually occur either during demonstrations to upper management or after the system is put into full production!

There are different ways in which support for parameterized types could be implemented in Java. The choices arouse strong emotions for many developers -- as can be seen in the comments to the request for enhancement linked in the Resources section -- with some adamantly demanding support for C++-style templates and others vehemently opposed to any such "perversion" of Java.

With C++ templates, the compiler constructs the code to follow a model or template provided by the developer. Within the constructed code, a specific type is substituted for the placeholder type used in the template definition. This allows a type-specific implementation of an algorithm to be generated automatically from the generic algorithm template to suit the needs of the program.

The upside of this approach is that it allows multiple highly efficient type-specific implementations to be generated from a single source template. You can write a binary search algorithm, for instance, which can then be used with any primitive or (in C++) object type that supports the comparison operations used by the algorithm. It also could be implemented as a compiler change for Java, without necessarily requiring any JVM changes.

The downside? This approach creates multiple copies of the generated code -- one copy for each specific type used with the template. This is the main reason that many Java developers object to a template approach of this type, since it can potentially lead to the proliferation of class files, or at least to multiple specialized copies of class files within a JVM.

Another approach to providing a form of parameterized types is to specify for the compiler the actual type of the objects used in a particular case, while still writing the implementation for generic objects as with the current collections framework. The compiler could then check the types of values passed in to ensure that they match the specified type, and could transparently generate code to cast returned generic values to the specified type.

This approach has the advantage of eliminating multiple copies of the base code, but the disadvantages of poorer performance (due to the casting) and a lack of support for primitive types. It would also require extensions to the class file format, to support specifying parameterized types, and probably to the JVMs as well, to support checking for proper usage of these types.

Community process to the rescue?

Sun has initiated a specification proposal -- JSR-000014, "Add Generic Types to the Java Programming Language" -- under the Java Community Process (JCP) to investigate adding support for some form of parameterized types to Java (see Resources for a link). The recent formation of an Expert Group indicates that the specification proposal, which was approved in May, has started moving forward. (See the sidebar below for my thoughts on the JCP.)

Sun's Gilad Bracha serves as the contact person for this particular JSR. In a telephone interview, he provided some details of the direction being taken by this process and the time frame in which we might see some results.

Related:
1 2 3 Page 1