Easier ways to handle arrays -- plus a little string manipulation

Find the middle ground between primitive arrays and Vectors, and string manipulation methods that help suppress homesickness for Perl

Primitive arrays in Java are not very object-oriented. You can get around this shortcoming by using a Vector, but, as I'll explain in more detail below, this is often overkill. The first tool I will introduce is a simple object-oriented wrapper for a primitive array. After discussing arrays, I will move on to the second tool, a pair of little string manipulation routines that I use for splitting and joining, and that have come in handy for me many times.

Dealing with arrays can be a hairy business. You want to use primitive arrays for fast access and low memory overhead, but you don't always know how big an array has to be until it's already populated. You can use a Vector, but doing so means that each primitive value must be wrapped in an object. For example, any int has to be wrapped in an Integer object. But don't lose heart -- there is a happy medium, and that is this month's first tool.

Object wrappers for dynamic arrays

Here's a scenario with which I am commonly confronted: I need to populate an array of primitive values (usually ints). However, I don't know how many values there are until after I've already cycled through them -- thus, I don't know in advance at which size I should initialize the array.

For the sake of the following code examples, let us assume that rset is the object from which I will be extracting the integer values. However, rset does not provide any size, length, or count methods that would tell me how many values it contains.

Problems with primitives

The first solution that comes to mind, from my old days of C hacking, is to declare a very large array, populate it with the values, and then trim it down. Here's how that would look in Java:

    public int[] method01( SomeContainer rset )
    {
        int[] bigArray = new int[ 1024 ];
        int count = 0;
        while( rset.next() )
        {   
            bigArray[ count ] = rset.getInt();
            count += 1;
        }
        int[] smallArray = new int[ count ];
        System.arraycopy( bigArray, 0, smallArray, 0, smallArray.length );
        return smallArray;
    }

If you chose to implement this code, the obvious drawback would be that the actual number of values could exceed your initial estimate. That would result in an ArrayIndexOutOfBoundsException, and you'd be in big trouble. Such an exception is a runtime exception, which means the compiler does not force you to handle it. If you expect it, you can catch it in your code; more often than not, however, you wouldn't see such an error until the application was running.
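A minimal sketch of that failure mode follows. The class and method names are made up for illustration; the only library type involved is java.lang.ArrayIndexOutOfBoundsException, which is what the JVM throws when an index runs past the end of an array.

```java
class OverflowDemo
{
    // Hypothetical helper: reports whether writing 'count' values
    // into an array of 'capacity' slots runs off the end.
    static boolean overflows( int capacity, int count )
    {
        int[] bigArray = new int[ capacity ];
        try
        {
            for( int i = 0; i < count; i++ )
            {
                bigArray[ i ] = i; // throws once i reaches capacity
            }
            return false;
        }
        catch( ArrayIndexOutOfBoundsException e )
        {
            return true; // the initial estimate was too small
        }
    }
}
```

Catching the exception like this only tells you that the estimate was wrong; it does not recover the values that didn't fit.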

If you were dead set on using this method, you could make it safer, by which I mean that you would prevent the occurrence of the dreaded out-of-bounds exception. In order to do so, you would have to catch the array when it was about to overflow, and then increase the size of the array to accommodate the excess data. There isn't really a growing utility in Java per se, so what you would have to do is create a second, bigger array and copy over the full array. Here's how:

    public int[] method01( SomeContainer rset )
    {
        int[] bigArray = new int[ 1024 ];
        int count = 0;
        while( rset.next() )
        {
            if( count >= bigArray.length ) // time to grow!
            {
                int[] tmpArray = new int[ bigArray.length + 1024 ];
                System.arraycopy( bigArray, 0, tmpArray,
                  0, bigArray.length );
                bigArray = tmpArray;
            }  
            bigArray[ count ] = rset.getInt();
            count += 1;
        }
        int[] smallArray = new int[ count ];
        System.arraycopy( bigArray, 0, smallArray, 0, smallArray.length );
        return smallArray;
    }
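On Java 6 and later, java.util.Arrays.copyOf does the allocate-and-copy step in a single call. The sketch below is a variant of the listing above under a few assumptions: it reads from a java.util.Iterator of Integer values rather than SomeContainer (whose API isn't shown here), and the class name, method name, and parameters are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

class GrowingArrayDemo
{
    // Hypothetical helper: drains an iterator of unknown length into an int[].
    // Assumes growBy is at least 1.
    static int[] drain( Iterator<Integer> source, int initialSize, int growBy )
    {
        int[] bigArray = new int[ initialSize ];
        int count = 0;
        while( source.hasNext() )
        {
            if( count >= bigArray.length ) // time to grow!
            {
                // allocates a longer array and copies the old contents into it
                bigArray = Arrays.copyOf( bigArray, bigArray.length + growBy );
            }
            bigArray[ count ] = source.next();
            count += 1;
        }
        // trim to the number of values actually read
        return Arrays.copyOf( bigArray, count );
    }
}
```

The logic is the same as before; only the bookkeeping around System.arraycopy() goes away.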

Vexing Vectors

Of course, there is a cleaner and more object-oriented solution to dealing with arrays. Java has provided us with a dynamic array mechanism called the Vector. You simply put objects into the Vector, and it grows as needed. You can't directly place primitives into a Vector, however; the primitive values must be wrapped in objects. Thankfully, Java has provided us with wrapper objects for all of the primitives. Using the Integer object to wrap the int and storing it into a Vector dramatically cleans up our code:

    public int[] method02( SomeContainer rset )
    {
        Vector vector = new Vector();
        
        while( rset.next() )
        {
            int i = rset.getInt(); // get the value
            Integer wrapper = new Integer( i ); // wrap it up
            vector.addElement( wrapper ); // put it in the Vector
        }

But, as stated at the outset, the end product must be a primitive array, not a Vector. So after we have traversed all of the values and populated the Vector, we have to translate it back into a primitive array:

        int[] array = new int[ vector.size() ];
        int count = 0;
        
        Enumeration e = vector.elements();
        
        while( e.hasMoreElements() )
        {
            Integer wrapper = (Integer) e.nextElement();
            array[ count ] = wrapper.intValue();
            count += 1;
        }
        
        return array;
    }

This example is cleaner than the first, but lots of objects are being created and destroyed here, and that eats up time and memory. In most cases, this may be a negligible sacrifice, but there is a more elegant solution. The tool I've devised combines the best of both worlds: it produces primitive arrays with an object-oriented interface, by encapsulating the logic from the first example into an object interface that mimics the second.

Tool #1: SmartIntArray

I call my tool the SmartIntArray because it is smart enough to grow and shrink itself as necessary. The guts of the tool deal with primitive types and arrays for speed and memory efficiency. Before I explain the implementation, let's revisit the aforementioned problem one more time, using the SmartIntArray:

    public int[] method03( SomeContainer rset )
    {
        SmartIntArray smartArray = new SmartIntArray();
        
        while( rset.next() )
        {
            int i = rset.getInt(); // get the value
            smartArray.add( i ); // put it in the smart array
        }
        
        int[] intArray = smartArray.toArray(); // extract the primitive array
        
        return( intArray );
    }

That's it! In the above example, I added some unnecessary steps (the assignment of interim variables) for the sole purpose of demonstrating, as clearly as possible, how the process interacts with the SmartIntArray. It is possible to optimize the method down to a measly three lines of code:

    public int[] method04( SomeContainer rset )
    {
        SmartIntArray smartArray = new SmartIntArray();    
        while( rset.next() ) smartArray.add( rset.getInt() );     
        return( smartArray.toArray() );
    }

I don't recommend this coding style, especially using the dangerously deceiving while statement without any block delimiters, but it does help to demonstrate the power of the SmartIntArray tool.

Implementing the tool

As I have already mentioned, the internals of the SmartIntArray are modelled after the first code example, which was used as a solution to the problem stated at the opening of the article. The tool uses a primitive int array for storage, and then grows or shrinks the array, via the System.arraycopy() method, as needed.

Now let's build the tool one step at a time. First comes the constructor, which simply initializes the internal array. I have overloaded the constructor to allow you to specify a starting size and growth size for the array.

public class SmartIntArray
{
    int sp = 0; // "stack pointer" to keep track of position in the array
    private int[] array;
    private int growthSize;
    
    public SmartIntArray()
    {
        this( 1024 );
    }
    
    public SmartIntArray( int initialSize )
    {
        this( initialSize, Math.max( 1, initialSize / 4 ) ); // grow by at least one slot
    }
    
    public SmartIntArray( int initialSize, int growthSize )
    {
        this.growthSize = growthSize;
        array = new int[ initialSize ];
    }

Simple enough. Next comes the add() method. This is what you use to append values to the array. Notice the similarities between the code below and the original example:

    public void add( int i )
    {
        if( sp >= array.length ) // time to grow!
        {
            int[] tmpArray = new int[ array.length + growthSize ];
            System.arraycopy( array, 0, tmpArray, 0, array.length );
            array = tmpArray;
        }
        array[ sp ] = i;
        sp += 1;
    }

Finally, we need a way to get the primitive array out of its SmartIntArray container:

    public int[] toArray()
    {
        int[] trimmedArray = new int[ sp ];
        System.arraycopy( array, 0, trimmedArray, 0, trimmedArray.length );        
        return trimmedArray;
    }
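To watch the growth path in action, here is a condensed, self-contained assembly of the pieces above. It is a sketch for experimentation rather than the finished tool: the class is renamed SmartIntArraySketch to mark it as illustrative, only the two-argument constructor is kept, and the deliberately tiny sizes in main() are my own choice so that the array has to grow several times.

```java
class SmartIntArraySketch
{
    private int sp = 0; // index of the next free slot
    private int[] array;
    private final int growthSize;

    SmartIntArraySketch( int initialSize, int growthSize )
    {
        this.growthSize = growthSize;
        this.array = new int[ initialSize ];
    }

    void add( int i )
    {
        if( sp >= array.length ) // time to grow!
        {
            int[] tmpArray = new int[ array.length + growthSize ];
            System.arraycopy( array, 0, tmpArray, 0, array.length );
            array = tmpArray;
        }
        array[ sp ] = i;
        sp += 1;
    }

    int[] toArray()
    {
        // copy only the slots that were actually filled
        int[] trimmedArray = new int[ sp ];
        System.arraycopy( array, 0, trimmedArray, 0, sp );
        return trimmedArray;
    }

    public static void main( String[] args )
    {
        // start with 4 slots and grow by 2: 4 -> 6 -> 8 -> 10
        SmartIntArraySketch smart = new SmartIntArraySketch( 4, 2 );
        for( int i = 0; i < 10; i++ )
        {
            smart.add( i * i );
        }
        int[] result = smart.toArray();
        System.out.println( result.length ); // prints 10
        System.out.println( result[ 9 ] );   // prints 81
    }
}
```

Because toArray() hands back a trimmed copy, later calls to add() cannot disturb an array that has already been returned.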

Tool #2: String manipulation

Before I delved into the wonderful world of Java, I paid my bills by developing applications in Perl. I could summon up pages and pages of praise for Perl on command, but I'll spare you and stick to the bare basics. Perl is, among other things, incredibly adept at manipulating strings. As I code in Java, one of the things I miss from the Perl language is that ease of string manipulation.

(Note: If you are in the same boat as I, an old Perl programmer who now spends all his time in Java, you will definitely want to check out a great regular expressions library (written for Java, inspired by Perl), which is linked to in the Resources section at the bottom of this article.)

Two of the more common methods I used in Perl were split() and join(). After an explanation of each, I will show you how to implement them in Java. Each tool will be a static method in a class called StringTools.

split()

The split() method takes two arguments: a pattern and a string. It then splits the string up into smaller strings based on the occurrence of the specified pattern. For example, if the string were "hello world" and the pattern were a single space (" "), then the results of the split would be two new strings, "hello" and "world".

By now, some of you are probably saying, "But I can do that with a standard Java StringTokenizer object!" And indeed, with that example, you can. But what if the pattern were "Fi" and the string were "Five Finnish Fiddlers Finally Finished Fiddling"? The StringTokenizer treats every character of its delimiter string as a separate delimiter, so it would split on "F" and on "i" individually, but not on the combined "Fi" pattern. This is where my tool shines.
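To see the StringTokenizer behavior concretely, here is a small sketch. The class and method names are hypothetical; java.util.StringTokenizer and its hasMoreTokens() and nextToken() methods are the standard ones. A shortened version of the fiddler string keeps the output readable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo
{
    // Collects every token StringTokenizer produces for the given delimiter set.
    static List<String> tokens( String string, String delimiters )
    {
        List<String> result = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer( string, delimiters );
        while( st.hasMoreTokens() )
        {
            result.add( st.nextToken() );
        }
        return result;
    }

    public static void main( String[] args )
    {
        // "F" and "i" each act as a delimiter, so the lone "i" in "Finnish" also splits
        System.out.println( tokens( "Five Finnish", "Fi" ) ); // prints [ve , nn, sh]
    }
}
```

A split on the whole pattern "Fi" would instead keep "nnish" in one piece.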

The implementation (listed below in its entirety) is very straightforward. When provided with the pattern and the string, the tool simply traverses the string and breaks off substrings separated by the pattern. I use the indexOf method provided by Java's String class to find the next instance of the pattern. Then, I cut out the part of the string between pattern occurrences and put the piece into an array. But wait! That's not just any array; it's one of those smart arrays that we talked about in the first part of this article, this time holding Strings instead of ints.

    public static String[] split( String token, String string )
    {
        SmartStringArray ssa = new SmartStringArray();
        
        int previousLoc = 0;
        int loc = string.indexOf( token, previousLoc );
        
        // a while loop rather than do/while, so that a string which
        // does not contain the token at all is returned in one piece
        while( loc != -1 )
        {
            ssa.add( string.substring( previousLoc, loc ) );
            previousLoc = ( loc + token.length() );
            loc = string.indexOf( token, previousLoc );
        }
        
        ssa.add( string.substring( previousLoc ) );
        
        return( ssa.toArray() );
    }

Notice that after the loop ends, I add the remaining chunk of string to the array. The loop grabs only the string pieces that precede the pattern. This last operation ensures that we get everything after the last occurrence of the pattern.
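The split() listing relies on a SmartStringArray class that is not shown in this article. What follows is my guess at a minimal version, built by analogy with SmartIntArray; the default sizes and the guard on the growth size are assumptions of this sketch, not details taken from the original tool.

```java
class SmartStringArray
{
    private int sp = 0; // index of the next free slot
    private String[] array;
    private final int growthSize;

    public SmartStringArray()
    {
        this( 16, 16 ); // arbitrary defaults chosen for this sketch
    }

    public SmartStringArray( int initialSize, int growthSize )
    {
        this.growthSize = Math.max( 1, growthSize ); // always grow by at least one slot
        this.array = new String[ initialSize ];
    }

    public void add( String s )
    {
        if( sp >= array.length ) // time to grow!
        {
            String[] tmpArray = new String[ array.length + growthSize ];
            System.arraycopy( array, 0, tmpArray, 0, array.length );
            array = tmpArray;
        }
        array[ sp ] = s;
        sp += 1;
    }

    public String[] toArray()
    {
        // copy only the slots that were actually filled
        String[] trimmedArray = new String[ sp ];
        System.arraycopy( array, 0, trimmedArray, 0, sp );
        return trimmedArray;
    }
}
```

Since the stored values are already objects, the saving over a Vector is smaller here than in the int case; the main benefit is getting a String[] back without casting.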

join()

The join() method is the inverse of the split() method. Rather than splitting a string into an array of smaller strings, join() assembles an array of smaller strings into one big string. Each substring is joined to its partners via a specified pattern. For example, if we had the strings "one", "two", and "three" in an array, and we specified the pattern as "precedes", then the output of the call to join() would be "oneprecedestwoprecedesthree".

The implementation of join() (listed below) is ludicrously simple. Start with an empty StringBuffer and alternate between appending the next element of the string array and the pattern.

    public static String join( String token, String[] strings )
    {
        if( strings.length == 0 )
        {
            return( "" ); // nothing to join; avoids indexing element -1 below
        }
        StringBuffer sb = new StringBuffer();
        
        for( int x = 0; x < ( strings.length - 1 ); x++ )
        {
            sb.append( strings[x] );
            sb.append( token );
        }
        sb.append( strings[ strings.length - 1 ] );
        
        return( sb.toString() );
    }
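If you are on Java 8 or later (an assumption, since these tools were written for much earlier releases), the standard library offers close equivalents of both routines. The sketch below wraps String.split(), Pattern.quote(), and String.join() behind the same argument order used above; the class name is hypothetical.

```java
import java.util.regex.Pattern;

class BuiltInSplitJoinDemo
{
    // Pattern.quote() makes the token literal rather than a regular expression;
    // the -1 limit keeps trailing empty strings instead of discarding them.
    static String[] split( String token, String string )
    {
        return string.split( Pattern.quote( token ), -1 );
    }

    static String join( String token, String[] strings )
    {
        return String.join( token, strings );
    }

    public static void main( String[] args )
    {
        String[] parts = split( "Fi", "Five Finnish Fiddlers" );
        System.out.println( parts.length );        // prints 4: "", "ve ", "nnish ", "ddlers"
        System.out.println( join( "Fi", parts ) ); // prints Five Finnish Fiddlers
        System.out.println( join( "precedes", new String[] { "one", "two", "three" } ) );
        // prints oneprecedestwoprecedesthree
    }
}
```

Splitting and then joining with the same token gives back the original string, which makes a handy sanity check for any implementation of the pair.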

Conclusion

I have presented two useful tools here. The first is a smart array capable of resizing itself as necessary, which exposes a simple interface for the programmer. This gives you the speed and efficiency of a primitive array coupled with a nice object-oriented wrapper. The second is a pair of handy string manipulation routines to assist you in splitting or joining. As always, comments, criticisms, and, particularly, design improvement suggestions are welcome.

Thomas E. Davis is a Sun Certified Java Programmer. He lives in sunny South Florida, but spends every waking second indoors in front of the computer. He has spent the past 2+ years designing and building large-scale multitier distributed applications in Java.
