Mar 1, 1998 12:00 AM PT

Using threads with collections, Part 1

Threads add a new degree of complexity to otherwise straightforward design issues

Threads are an integral part of the Java language. Using threads, many algorithms, such as queue management systems, are easier to access than they are using polling and looping techniques. Recently, while writing a Java class, I found I needed to use threads while enumerating lists, and this uncovered some interesting issues associated with thread-aware collections.

This Java In Depth column describes the issues that I uncovered in my attempt to develop a thread-safe collection. A collection is called "thread-safe" when it can be used safely by multiple clients (threads) at the same time. "So what's the problem?" you ask. The problem is that, in typical usage, a program both changes a collection (called mutating), and reads it (called enumerating).

Some people simply do not register the statement, "The Java platform is multithreaded." Sure, they hear it, and they nod their heads. But they don't grasp that, unlike C or C++, in which threading was bolted on from the side through the OS, threads in Java are basic language constructs. This misunderstanding, or poor grasp, of the inherently threaded nature of Java, inevitably leads to two common flaws in programmers' Java code: Either they fail to declare a method as synchronized that should be (because the object is in an inconsistent state during the method's execution) or they declare a method as synchronized in order to protect it, which causes the rest of the system to operate inefficiently.

I came across this problem when I wanted a collection that multiple threads could use without needlessly blocking the execution of the other threads. None of the collection classes in the 1.1 version of the JDK are thread-safe. Specifically, none of the collection classes will allow you to enumerate with one thread while mutating with another.

Non-thread-safe collections

My basic problem was as follows: Assuming you have an ordered collection of objects, design a Java class such that a thread can enumerate all or part of the collection without worrying about the enumeration becoming invalid due to other threads that are changing the collection. As an example of the problem, consider Java's Vector class. This class is not thread-safe and causes many problems for new Java programmers when they combine it with a multithreaded program.

The Vector class provides a very useful facility for Java programmers: namely, a dynamically-sized array of objects. In practice, you might use this facility to store results where the final number of objects you will be dealing with isn't known until you are done with them all. I constructed the following example to demonstrate this concept.

01 import java.util.Vector;
02 import java.util.Enumeration;
03 public class Demo {
04     public static void main(String args[]) {
05         Vector digits = new Vector();
06         int     result = 0;
07 
08         if (args.length == 0) {
09             System.out.println("Usage is java demo 12345");
10             System.exit(1);
11         }
12 
13         for (int i = 0; i < args[0].length(); i++) {
14             char c = args[0].charAt(i);
15             if ((c >= '0') && (c <= '9'))
16                 digits.addElement(new Integer(c - '0'));
17             else
18                 break;
19         }
20         System.out.println("There are "+digits.size()+" digits.");
21         for (Enumeration e = digits.elements(); e.hasMoreElements();) {
22             result = result * 10 + ((Integer) e.nextElement()).intValue();
23         }
24         System.out.println(args[0]+" = "+result);
25         System.exit(0);
26     }
27 }

The simple class above uses a Vector object to collect digit characters from a string. The collection is then enumerated to compute the integer value of the string. There is nothing wrong with this class except that it is not thread-safe. If another thread happened to hold a reference to the digits vector, and that thread inserted a new character into the vector, the results of the loop in lines 21 through 23 above would be unpredictable. If the insertion occurred before the enumeration object had passed the insertion point, the thread computing result would process the new character. If the insertion happened after the enumeration had passed the insertion point, the loop would not process the character. The worst-case scenario is that the loop might throw a NoSuchElementException if the internal list was compromised.

This example is just that -- a contrived example. It demonstrates the problem, but what is the chance of another thread running during a short five- or six-digit enumeration? In this example, the risk is low. The amount of time that passes when one thread starts an operation at risk, which in this example is the enumeration, and then finishes the task is called the thread's window of vulnerability, or window. This particular window is known as a race condition because one thread is "racing" to finish its task before another thread uses the critical resource (the list of digits). However, when you start using collections to represent a group of several thousand elements, such as with a database, the window of vulnerability increases because the thread enumerating will spend much more time in its enumeration loop, and that makes the chance of another thread running much higher. You certainly don't want some other thread changing the list beneath you! What you want is an assurance that the Enumeration object you are holding is valid.

One way to look at this problem is to note that the Enumeration object is separate from the Vector object. Because they are separate, they are unable to retain control over each other once they are created. This loose binding suggested to me that perhaps a useful path to explore was an enumeration that was more tightly bound to the collection that produced it.

Creating collections

To create my thread-safe collection, I first needed a collection. In my case, a sorted collection was needed, but I didn't bother going the full binary tree route. Instead, I created a collection that I called a SynchroList. This month, I'll look at the core elements of the SynchroList collection and describe how to use it. Next month, in Part 2, I'll take the collection from a simple, easy-to-understand Java class to a complex multithreaded Java class. My goal is to keep the design and implementation of a collection distinct and understandable relative to the techniques used to make it thread-aware.

I named my class SynchroList. The name "SynchroList," of course, comes from the concatenation of "synchronization" and "list." The collection is simply a doubly-linked list as you might find in any college textbook on programming, though through the use of an inner class named Link, a certain elegance can be achieved. The inner class Link is defined as follows:

    class Link {
    private Object data;
    private Link nxt, prv;
    Link(Object o, Link p, Link n) {
        nxt = n; prv = p; data = o;
        if (n != null)
            n.prv = this;
        if (p != null)
            p.nxt = this;
    }
    Object getData() { return data; }
    Link next() { return nxt; }
    Link next(Link newNext) { Link r = nxt; nxt = newNext; return r;}
    Link prev() { return prv; }
    Link prev(Link newPrev) { Link r = prv; prv = newPrev; return r;}
    public String toString() { return "Link("+data+")"; }
}

As you can see in the code above, a Link object encapsulates the linking behavior that the list will use to organize its objects. To implement the doubly-linked list behavior, the object contains references to its data object, a reference to the next link in the chain, and a reference to the previous link in the chain. Further, the methods next and prev are overloaded to provide a means of updating the object's pointer. This is necessary as the parent class will need to insert and delete links into the list. The link constructor is designed to create and insert a link at the same time. This saves a method call in the implementation of the list.

Another inner class is used in the list -- in this case, an enumerator class named ListEnumerator. This class implements the java.util.Enumeration interface: the standard mechanism that Java uses to iterate over a collection of objects. By having our enumerator implement this interface, our collection will be compatible with any other Java classes that use this interface to enumerate the contents of a collection. The implementation of this class is shown in the code below.

        class LinkEnumerator implements Enumeration {
        private Link current, previous;
        LinkEnumerator( ) {
            current = head;
        }
        public boolean hasMoreElements() { return (current != null); }
        public Object nextElement() {
            Object result = null;
            Link tmp;
            if (current != null) {
                result = current.getData();
                current = current.next();
            }
            return result;
        }
    }

In its present incarnation, the LinkEnumerator class is pretty straightforward; it will become more complicated as we modify it. In this incarnation, it simply walks through the list for the calling object until it comes to the last link in the internal linked list. The two methods required to implement the java.util.Enumeration interface are hasMoreElements and nextElement.

Of course, one of the reasons we aren't using the java.util.Vector class is because I needed to sort the values in the collection. We had a choice: to build this collection to be specific to a particular type of object, thus using that intimate knowledge of the object type in order to sort it, or to create a more generic solution based on interfaces. I chose the latter method and defined an interface named Comparator to encapsulate the methods necessary to sort objects. That interface is shown below.

  public interface Comparator {
  public boolean lessThan(Object a, Object b);
  public boolean greaterThan(Object a, Object b);
  public boolean equalTo(Object a, Object b);
  void typeCheck(Object a);
}

As you can see in the above code, the Comparator interface is pretty simple. The interface requires one method for each of the three basic comparison operations. Using this interface, the list can compare the objects that are being added or removed with objects that are already on the list. The final method, typeCheck, is used to ensure type safety of the collection. When the Comparator object is used, the Comparator can be used to insure that objects in the collection are all of the same type. The value of this type checking is that it saves you from seeing object casting exceptions if the object in the list was not of the type you expected. I've got an example later on that uses a Comparator, but before we get to the example, let's look at the SynchroList class directly.

    public class SynchroList {
    class Link {
    ... this was shown above ...
    }
    class LinkEnumerator implements Enumeration {
        ... the enumerator class ...
    }
    /* An object for comparing our elements */
    Comparator cmp;
    Link head, tail;
    public SynchroList() { }
    public SynchroList(Comparator c) { cmp = c; }
    private void before(Object o, Link p) {
        new Link(o, p.prev(), p);
    }
    private void after(Object o, Link p) {
        new Link(o, p, p.next());
    }
    private void remove(Link p) {
        if (p.prev() == null) {
            head = p.next();
            (p.next()).prev(null);
        } else if (p.next() == null) {
            tail = p.prev();
            (p.prev()).next(null);
        } else {
            p.prev().next(p.next());
            p.next().prev(p.prev());
        }
    }
    public void add(Object o) {
        // if cmp is null, always add to the tail of the list.
        if (cmp == null) {
            if (head == null) {
                head = new Link(o, null, null);
                tail = head;
            } else {
                tail = new Link(o, tail, null);
            }
            return;
        } 
        cmp.typeCheck(o);
        if (head == null) {
            head = new Link(o, null, null);
            tail = head;
        } else if (cmp.lessThan(o, head.getData())) {
           head = new Link(o, null, head);
        } else {
            Link l;
            for (l = head; l.next() != null; l = l.next()) {
                if (cmp.lessThan(o, l.getData())) {
                    before(o, l);
                    return;
                }
            }
            tail = new Link(o, tail, null);
        }
        return;
    }
    public boolean delete(Object o) {
        if (cmp == null)
            return false;
        cmp.typeCheck(o);
        for (Link l = head; l != null; l = l.next()) {
            if (cmp.equalTo(o, l.getData())) {
               remove(l);
               return true;
            }
            if (cmp.lessThan(o, l.getData()))
               break;
        }
        return false;
    }
    public synchronized Enumeration elements() {
        return new LinkEnumerator();
    }
    public int size() {
        int result = 0;
        for (Link l = head; l != null; l = l.next()) result++;
        return result;
    }
}

The class is very straightforward; it has two basic constructors. The first creates an instance without a Comparator object. Using this constructor, the SynchroList behaves very much like a Vector, except that it stores its elements in a linked list rather than in an array. The class becomes more interesting with the second constructor. Using the second constructor, the class is initialized with an object that implements the Comparator interface. Using the Comparator object, the logic in the add method compares the object that is being added with the objects that are already on the list. This comparison is done with the line that reads if (cmp.lessThan(...)). That line of code checks each object until the Comparator's test returns true, then inserts the new object into the list just before the current Link. This algorithm results in a list that is sorted in ascending order. The delete method is only possible if you have a Comparator, because it is a requirement that you test for equality between the object passed and the objects in the list.

The last two methods, size and elements, mirror the methods of the same name in the Vector class. The size method is used by applications to get the total size of the list. This allows the application to open a file or perhaps allocate an array of the right size before iterating over all the elements in the list. The final method, elements, provides the Enumeration object that is used by the application to retrieve each object, in sorted order, from the list.

Tying it together

In order to put the list into practice, we must build an application or applet that uses it. For the basic list described here, I've written a simple example application that illustrates its use.

The application consists of two component classes -- the Comparator class and the application itself. The Comparator class is called IntCompare and is shown below.

    public class IntCompare implements Comparator {
    public boolean lessThan(Object a, Object b) {
        return ((Integer) a).intValue() < ((Integer) b).intValue();
    }
    public boolean greaterThan(Object a, Object b) {
        return ((Integer) a).intValue() > ((Integer) b).intValue();
    }
    public boolean equalTo(Object a, Object b) {
        return ((Integer) a).intValue() == ((Integer) b).intValue();
    }
    void typeCheck(Object a) {
        if (a instanceof Integer) return;
        throw new RuntimeException("Type violation!");
    }
}

As you can see in the code above, this comparator was built to compare two objects of type Integer. Now, take a look at the typeCheck method. The typeCheck method will throw an exception when the type of the object passed is not Integer. Further, if you go back and look at the add method of SynchroList, you will see that typeCheck is called before the object is inserted into the linked list. The check done in the add method means that the methods lessThan, greaterThan, and equalTo can cast the objects they are passed to Integer and not worry about Java throwing a class cast exception. Without this guarantee of type safety, the comparison methods in IntCompare would have to be written with checks like this:

   
    public boolean lessThan(Object a, Object b) {
        if (! (a instanceof Integer))
            return false;
        if (! (b instanceof Integer))
            return false;
        return ((Integer) a).intValue() < ((Integer) b).intValue();
    }

The example application is pretty trivial and is shown below.

     import java.util.Enumeration;
public class SynchroTest {
     static int data[] = {
            1, 0, 25, 20, 3, 45, 6, 2, 5, 17, 49, 24, 55, 12, 30, 31,
            32, 33, 34, 45, 46, 47, 10, 40 };
     public static void main(String args[]) {
         SynchroList sl = new SynchroList(new IntCompare());
         for (int i = 0; i < data.length; i++) {
             sl.add(new Integer(data[i]));
         }
         System.out.println("Total list size was "+sl.size()+" elements.");
         int lineNum = 1;
         for (Enumeration e = sl.elements(); e.hasMoreElements(); ) {
             Integer ii = (Integer)(e.nextElement());
             System.out.println("ELEMENT["+lineNum+"] - "+ii);
             lineNum++;
         }
     }
}

This application inserts a sequence of integers into a SynchroList object and then enumerates and prints the results to the console.

When considering a small number of elements, such as a few dozen values, this design would be an elegant, if simple, form for a collection class. However, as with the Vector class, our collection doesn't handle multiple threads very well. Further, when the collection gets to be very large, I would like to design it such that it could return only a piece of the collection instead of the whole thing.

Wrapping up

Now is a good place to pause and consider what we've covered. First, we looked at the issue of multiple threads and the collection classes that are available today in Java. Then we went through the design of a new collection class that provides some fundamental infrastructure on which we can build a much more powerful collection. Next month, we will start with the SynchroList collection and develop it in two ways. The first will be to increase the utility of the collection when its size gets quite large. The larger collection sizes, however, will increase the risk of multiple thread collisions because enumerating the collection will take so much longer. So, the next step will be to take our large-size collection and make it thread-aware. It's not easy to make it completely thread-safe, but we will be able to make it quite usable if we follow a few simple rules.

Chuck McManis currently is the director of system software at FreeGate Corp., a venture-funded start-up that is exploring opportunities in the Internet marketplace. Before joining FreeGate, Chuck was a member of the Java Group. He joined the Java Group just after the formation of FirstPerson Inc. and was a member of the portable OS group (the group responsible for the OS portion of Java). Later, when FirstPerson was dissolved, he stayed with the group through the development of the alpha and beta versions of the Java platform. He created the first "all Java" home page on the Internet when he did the programming for the Java version of the Sun home page in May 1995. He also developed a cryptographic library for Java and versions of the Java class loader that could screen classes based on digital signatures. Before joining FirstPerson, Chuck worked in the operating systems area of SunSoft, developing networking applications, where he did the initial design of NIS+.

Learn more about this topic

  • Generically Speaking -- a column on designing collections from December 1996 http://www.javaworld.com/jw-12-1996/jw-12-indepth.html
  • Java's new Collections Framework for 1.2 http://java.sun.com/products/jdk/1.2/docs/guide/collections/index.html
  • Concurrent Programming in Java by Doug Lea, published by Addison-Wesley ISBN 0-201-69581-2 -- An excellent source of information on issues surrounding the use of multiple threads in your Java applets and applications.
  • Design Patterns Elements of Reusable Object-Oriented Software by Gamma et al, published by Addison-Wesley, ISBN 0-201-63361-2 -- This book shows how valuable the separation of "what" and object does and "how" a particular class design does it.
  • "An in-depth look at Java's character type"
    Eight (bits) is not enough -- Java's character type adds another eight. http://www.javaworld.com/javaworld/jw-01-1998/jw-01-indepth.html
  • "A first look at Borland's JBuilder IDE" http://www.javaworld.com/jw-11-1997/jw-11-indepth.html
  • "A look at inner classes"
    Reduce class clutter in your Java designsUse inner classes. http://www.javaworld.com/javaworld/jw-10-1997/jw-10-indepth.html
  • "Take an in-depth look at the Java Reflection API"
    Learn about the new Java 1.1 tools for finding out information about classes. http://www.javaworld.com/javaworld/jw-09-1997/jw-09-indepth.html
  • "Take a look inside Java classes"
    Learn to deduce properties of a Java class from inside a Java program. http://www.javaworld.com/javaworld/jw-08-1997/jw-08-indepth.html
  • "Build an interpreter in Java -- Implement the execution engine"
    Here's how to take the interpreter classes and run with them. http://www.javaworld.com/javaworld/jw-07-1997/jw-07-indepth.html
  • "How to build an interpreter in Java, Part 2The structure"
    The trick to assembling the foundation classes for a simple interpreter. http://www.javaworld.com/javaworld/jw-06-1997/jw-06-indepth.html
  • "How to build an interpreter in Java, Part 1The BASICs"
    For complex applications requiring a scripting language, Java can be used to implement the interpreter, adding scripting abilities to any Java app. http://www.javaworld.com/javaworld/jw-05-1997/jw-05-indepth.html
  • "Lexical analysis, Part 2Build an application"
    How to use the StreamTokenizer object to implement an interactive calculator. http://www.javaworld.com/javaworld/jw-02-1997/jw-02-indepth.html
  • "Lexical analysis and JavaPart 1"
    Learn how to convert human-readable text into machine-readable data using the StringTokenizer and StreamTokenizer classes. http://www.javaworld.com/javaworld/jw-01-1997/jw-01-indepth.html
  • "Code reuse and object-oriented systems"
    Use a helper class to enforce dynamic behavior. http://www.javaworld.com/javaworld/jw-12-1996/jw-12-indepth.html
  • "Container support for objects in Java 1.0.2"
    Organizing objects is easy when you put them into containers. This article walks you through the design and implementation of a container. http://www.javaworld.com/javaworld/jw-11-1996/jw-11-indepth.html
  • "The basics of Java class loaders"
    The fundamentals of this key component of the Java architecture. http://www.javaworld.com/javaworld/jw-10-1996/jw-10-indepth.html
  • "Not using garbage collection"
    Minimize heap thrashing in your Java programs. http://www.javaworld.com/javaworld/jw-09-1996/jw-09-indepth.html
  • "Threads and applets and visual controls"
    This final part of the series explores reading multiple data channels. http://www.javaworld.com/javaworld/jw-07-1996/jw-07-mcmanis.html
  • "Using communication channels in applets, Part 3"
    Develop Visual Basic-style techniques to applet design -- and convert temperatures in the process. http://www.javaworld.com/javaworld/jw-06-1996/jw-06-mcmanis.html
  • "Synchronizing threads in Java, Part 2"
    Learn how to write a data channel class, and then create a simple example application that illustrates a real-world implementation of the class. http://www.javaworld.com/javaworld/jw-05-1996/jw-05-mcmanis.html
  • "Synchronizing threads in Java"
    Former Java team developer Chuck McManis walks you through a simple example illustrating how to synchronize threads to assure reliable and predictable applet behavior. http://www.javaworld.com/javaworld/jw-04-1996/jw-04-synch.html