Java performance programming, Part 3: Managing collections

See how collections alternatives measure up in performance, and find out how to get the most out of each type

1 2 3 Page 2
Page 2 of 3

The proposal was written with the second approach discussed above in mind, with a compiler that can check types and generate casts as necessary to provide type-specific interfaces to code written for generic objects. The focus is on code clarity and compile-time error checking, rather than on performance, though the proposal does not foresee any significant effects on performance. It also explicitly does not require any support for primitives as parameterized types. (For an in-depth discussion of one group's work on this problem, see "Behold the Power of Parametric Polymorphism in this month's JavaWorld.)

The formation of the Expert Group is the first step in the process of turning the proposal into any actual change to the Java specifications. From this point, the process will continue with a discussion among the experts, leading eventually to a draft specification change. This draft will go through one or more cycles of internal review among the participant companies before it is released as a public proposal. Finally, a reference implementation and reference test suite will be developed and released, and at that point the recommended specification change can be adopted for inclusion into a version of the JDK.

There is no guarantee that the proposal will actually result in a change to the Java specifications, and even if it does, the entire process will take at least one year -- and quite possibly two or more -- to complete. All things considered, it seems likely that we won't actually be able to make use of parameterized types in standard Java until sometime in 2002 or 2003. Even when (or if) this change comes about, it's unlikely to include support for primitive types, so there will likely be a use for type-specific collections in high usage code for some time to come.

Type-specific collections

Although the standard Java libraries don't include support for type-specific collections, it's possible to build template implementations of collections that can be specialized to a particular type by searching and replacing. Here's a (fairly lengthy) example of a collection class, similar to Vector, implemented specifically for java.lang.Integer:

public class IntegerArray
{
    protected int countPresent;
    protected int maximumGrowth;
    protected Integer[] baseArray;
    public IntegerArray(int size, int growth) {
        maximumGrowth = growth;
        baseArray = new Integer[size];
    }
        public IntegerArray(int size) {
        this(size, Integer.MAX_VALUE);
    }
    // Implementation method to increase the underlying array size.
    protected void growArray(int required) {
        int size = Math.max(required,
            baseArray.length + Math.min(baseArray.length, maximumGrowth));
        Integer[] grown = new Integer[size];
        System.arraycopy(baseArray, 0, grown, 0, baseArray.length);
        baseArray = grown;
    }
    // Append a value to the collection.
    public int add(Integer value) {
        int index = countPresent++;
        if (countPresent > baseArray.length) {
            growArray(countPresent);
        }
        baseArray[index] = value;
        return index;
    }
    
    // Insert a value into the collection.
    public void add(int index, Integer value) {
        if (index >= 0 && index <= countPresent) {
            if (++countPresent > baseArray.length) {
                growArray(countPresent);
            }
            if (index < countPresent) {
                System.arraycopy(baseArray, index, baseArray, index + 1,
                    countPresent - index - 1);
            }
            baseArray[index] = value;
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Remove a value from the collection.
    public void remove(int index) {
        if (index >= 0 && index < countPresent) {
            if (index < --countPresent){
                System.arraycopy(baseArray, index + 1, baseArray, index,
                    countPresent - index);
                baseArray[countPresent] = null;
            }
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Make sure we have at least a specified capacity.
    public void ensureCapacity(int min) {
        if (min > baseArray.length) {
            growArray(min);
        }
    }
    // Set the collection empty.
    public void clear() {
        countPresent = 0;
    }
    // Get number of values in collection.
    public int size() {
        return countPresent;
    }
    // Set the size of the collection.
    public void setSize(int count) {
        if (count > baseArray.length) {
            growArray(count);
        } else if (count < countPresent) {
            for (int i = count; i < countPresent; i++) {
                baseArray[i] = null;
            }
        }
        countPresent = count;
    }
    // Get value from the collection.
    public Integer get(int index) {
        if (index < countPresent) {
            return baseArray[index];
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Set the value at a position in the collection.
    public void set(int index, Integer value) {
        if (index < countPresent) {
            baseArray[index] = value;
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Convert to an array.
    public Integer[] buildArray() {
        Integer[] copy = new Integer[countPresent];
        System.arraycopy(baseArray, 0, copy, 0, countPresent);
        return copy;
    }
}

In C++, it would be easy to make this a template and generate an implementation specific to work with any type we wanted to use. The equivalent process in Java requires a search and replace (and continued maintenance of multiple copies of the code). We can convert this collection to any type of element we want by substituting the type name for Integer (though for primitive types we'd probably want to handle the class name separately, to get something like IntArray rather than intArray).

Factoring the base

If widely used, however, this approach definitely contributes to code bloat (along with maintenance problems). To mitigate this problem, with a little effort we can define a base class -- ArrayBase in the code below -- for type-specific collections of this type. Our newly defined class will allow a portion of the code to be shared between specific types:

import java.lang.reflect.Array;
public abstract class ArrayBase
{
    protected int countPresent;
    protected int countLimit;
    protected int maximumGrowth;
    public ArrayBase(int size, int growth, Class type) {
        Object array = Array.newInstance(type, size);
        countLimit = size;
        maximumGrowth = growth;
        setArray(array);
    }
    public ArrayBase(int size, Class type) {
        this(size, Integer.MAX_VALUE, type);
    }
    protected abstract Object getArray();
    protected abstract void setArray(Object array);
    protected abstract void discardValues(int from, int to);
    // Implementation method to increase the underlying array size.
    protected void growArray(int required) {
        Object base = getArray();
        int size = Math.max(required,
            countLimit + Math.min(countLimit, maximumGrowth));
        Class type = base.getClass().getComponentType();
        Object grown = Array.newInstance(type, size);
        System.arraycopy(base, 0, grown, 0, countLimit);
        countLimit = size;
        setArray(grown);
    }
    // Get next add position for appending, increasing size if needed.
    protected int getAddIndex() {
        int index = countPresent++;
        if (countPresent > countLimit) {
            growArray(countPresent);
        }
        return index;
    }
    // Make room to insert a value at a specified index.
    protected void makeInsertSpace(int index) {
        if (index >= 0 && index <= countPresent) {
            if (++countPresent > countLimit) {
                growArray(countPresent);
            }
            if (index < countPresent - 1) {
                Object array = getArray();
                System.arraycopy(array, index, array, index + 1,
                    countPresent - index - 1);
            }
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Remove a value from the collection.
    public void remove(int index) {
        if (index >= 0 && index < countPresent) {
            if (index < --countPresent){
                Object base = getArray();
                System.arraycopy(base, index + 1, base, index,
                    countPresent - index);
                discardValues(countPresent, countPresent + 1);
            }
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Make sure we have at least a specified capacity.
    public void ensureCapacity(int min) {
        if (min > countLimit) {
            growArray(min);
        }
    }
    // Set the collection empty.
    public void clear() {
        setSize(0);
    }
    // Get number of values in collection.
    public int size() {
        return countPresent;
    }
    // Set the size of the collection.
    public void setSize(int count) {
        if (count > countLimit) {
            growArray(count);
        } else if (count < countPresent) {
            discardValues(count, countPresent);
        }
        countPresent = count;
    }
    // Convert to an array of specified type.
    protected Object buildArray(Class type) {
        Object copy = Array.newInstance(type, countPresent);
        System.arraycopy(getArray(), 0, copy, 0, countPresent);
        return copy;
    }
}

ArrayBase factors out a lot of the common code that would otherwise be present in each of the type-specific collections. By using some helpful methods to construct arrays from runtime parameters, we're able to avoid copying the code that sizes and reallocates the underlying array. In addition, by using a set of abstract methods in the base class to call out to the subclass (as discussed in "Java Performance Programming, Part 2"), we're able to avoid a lot of casting in the subclasses while still allowing the base class to manipulate the array used for the collection without knowledge of the specific type of the elements.

IntBaseArray below serves as an example of a type-specific collection class extending our new base class, this time demonstrating its use for primitive int values:

public class IntBaseArray extends ArrayBase
{
    protected int[] baseArray;
    
    public IntBaseArray(int size, int growth) {
        super(size, growth, Integer.TYPE);
    }
    public IntBaseArray(int size) {
        super(size, Integer.TYPE);
    }
    // Implementation of callout to get the underlying array.
    protected Object getArray() {
        return baseArray;
    }
    // Implementation of callout to set the underlying array.
    protected void setArray(Object array) {
        baseArray = (int[]) array;
    }
    // Implementation of callout to initialize a portion of the array.
    protected void discardValues(int from, int to) {
        for (int i = from; i < to; i++) {
            baseArray[i] = 0;
        }
    }
    // Append a value to the collection.
    public int add(int value) {
        int index = getAddIndex();
        baseArray[index] = value;
        return index;
    }
    // Insert a value into the collection.
    public void add(int index, int value) {
        makeInsertSpace(index);
        baseArray[index] = value;
    }
    // Get value from the collection.
    public int get(int index) {
        if (index < countPresent) {
            return baseArray[index];
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Set the value at a position in the collection.
    public void set(int index, int value) {
        if (index < countPresent) {
            baseArray[index] = value;
        } else {
            throw new ArrayIndexOutOfBoundsException("Invalid index value");
        }
    }
    // Convert to an array.
    public int[] buildArray() {
        return (int[]) buildArray(Integer.TYPE);
    }
}

This is not only much shorter than the type-specific collection class with which we started; it also has been pared down to a bare minimum in terms of complexity. Most of the methods remaining are nothing but error checkers and a simple operation or two, so they're much less likely to create any maintenance problems than the original code. The real meat of the implementation is now in the base class, and even if we create a dozen type-specific subclasses, the bulk of the code can easily be modified or fixed.

The performance tests

All this code for type-specific collections has been fun to write, and hopefully has been interesting to glance over. Since this series is about performance, though, we need to see how the different implementation approaches compare in order to get a feeling for when it may be worthwhile to use a type-specific collection.

Performance test 1: Bubble sort

Table 1 below shows how several alternatives perform for a typical case in which values in a collection are accessed more often than new ones are added or existing ones are deleted. This test performs a simple bubble sort on the values in a collection. For the base case, it uses a simple fixed-size array of int values to give us a line on the best possible performance for this type of operation with current JVMs. The test then tries the same operation using various types of resizable collections.

 JRE 1.1.8 (Sun)JRE 1.1.8 (IBM)JRE 1.2.2 (Classic)JRE 1.2.2 (HotSpot 2.0 beta)
int[]

8

9

7

10

IntArray

36

39

36

20

IntBaseArray

34

40

36

21

IntegerArray

72

125

74

36

ObjectArray

98

142

102

45

Vector

451

148

292

90

Table 1. Bubble sort performance (typical time in seconds, 20,000 values)
Related:
1 2 3 Page 2
Page 2 of 3