Check out three collections libraries

Alternatives and additions to the Java Collections Framework

When Java was born, the only data structure support built into the core libraries involved arrays, vectors (essentially dynamically-sized arrays), and hash tables (for key-value pairs). Support for such key data structures as balanced trees, generic bags of objects, and priority queues didn't exist, but could certainly be created. Thus, many developers added their own structures to the libraries.

In this article, we review some of those early attempts, how they evolved with Sun Microsystems' release of the Java Collections Framework, and how the Java Collections Framework is changing with the evolution of the early and later attempts at filling users' needs for data structure support.

History of Java Collections Frameworks

Looking back at the initial alternative data structure collections available, Doug Lea, who serves on the faculty at the State University of New York at Oswego, probably created the first most widely used collection (see Resources). It (collections) was released into the public domain in the fall of 1995.

At the time, many early Java developers were transitioning from the C++ world. These developers had worked with the Standard Template Library (STL) and found the original Java core libraries severely lacking in algorithm and data structure support. Having come from the C++ and STL worlds, a company called ObjectSpace decided to port the STL to Java. The resulting Java Generic Library (JGL) was released in June 1996, but Sun's legal department didn't like the name. ObjectSpace consequently renamed it the Generic Collection Library for Java, but the JGL acronym stuck.

Though rather large, JGL was a popular offering for some time. Through various marketing efforts, ObjectSpace convinced 10 IDE tool vendors to package the library with their tools. ObjectSpace claimed their user base was more than 100,000. (At the time, the number of Java developers was nowhere near seven figures yet.) JGL became a de facto standard during Java's early days.

After a few revisions, including dividing a single package into subpackages and integrating with ObjectSpace's Voyager Object Request Broker (ORB) product, JGL 3.1 was released in the fall of 1997.

With the impending introduction of J2SE (Java 2 Platform, Standard Edition) 1.2 and its changes to the world of data structure support, Doug Lea abandoned his collections package and started work on a second library of utility classes. This new package was designed to improve concurrent or multithreaded access to various data structures. Released in July 1998, these classes were called util.concurrent and offered locking, pooling, and concurrent access support (see Resources).

Along comes the Java Collections Framework

J2SE 1.2 was released in December 1998 and included a set of classes called the Java Collections Framework for manipulating data collections. Early comparisons pitted JGL against the Collections Framework (see Resources), even though the two were meant to serve different purposes—JGL was an STL replacement, and the Collections Framework was not. Former C++ developers tended to use JGL because JGL's sheer size required a steep learning curve that came more natural to those already familiar with the STL. JGL had approximately 130 classes and interfaces while the Collections Framework was about 20 percent of that size.

Another alternative, the Jakarta Commons Collections component, was released in July 2001, which extended J2SE 1.2 APIs with specialized abstract data types and new methods to test set theory. A second version was released in April 2002, adding even more specialized implementations.

Not much changed between the initial release of the Collections Framework and J2SE 1.4's release beyond the addition of a few more classes and interfaces. Support for generic types (i.e., templates) was debated with JSR (Java Specification Request) 14 but didn't make it into the J2SE 1.4 release. JSR 14 released a working prototype for generic-types support, but it is only targeted at the developer community and is unsupported (see Resources).

JSR 166 was introduced this past January. Led by Doug Lea, this is a community effort to incorporate much of the high-level concepts in his previously mentioned util.concurrent library into the Java core libraries.

Little was heard from the folks at ObjectSpace with regard to JGL until the day before J2SE 1.4's release when a company named Recursion Software announced its acquisition of JGL and related ObjectSpace product lines. Then, in July 2002, Recursion Software released JGL 4.0, integrating JGL collections and algorithms with the standard Collections Framework.

That brings us to today. Now available are the following pieces:

  • The Collections Framework, a part of the Java core libraries that defines key interfaces to be shared by all data structure implementations
  • The Jakarta Commons Collections component, freely available under the Apache Software License
  • JGL 4.0
  • Generic-types support, available as a prototype for JSR 14
  • Two nonstandard libraries from Doug Lea, evolving into a core standard with JSR 166

Let's examine each of these and consider when you might use them.

The Collections Framework

For those unfamiliar with the Collections Framework, it offers a core set of six interfaces: Collection, Set, List, SortedSet, Map, and SortedMap.

A Collection is the base interface for sets and lists. It describes a generic group of elements with no particular characteristics. There are no direct implementations of Collection, only sub-interface implementations.

A Set is a collection of items that do not support duplicates. There are two standard Set implementations: HashSet and TreeSet; TreeSet is sorted and also implements SortedSet.

The List interfaces are for ordered collections, offering either indexed or sequential access. List implementations include ArrayList and LinkedList; ArrayList replaces the original Vector class.

Map describes collections of key-value pairs, similar to Hashtable. Available maps are HashMap and TreeMap; TreeMap is sorted and also implements SortedMap. J2SE 1.4 introduced two new implementations: LinkedHashSet and LinkedHashMap, for maintaining insertion order while needing constant times for operations like adding, searching, and removing. Another J2SE 1.4 implementation is IdentityHashMap, which uses == instead of equals() for equality checking operations. For those interested in weak references, yet another map, WeakHashMap, is available that uses WeakReference for the keys, therefore dropping the key-value pair if the only reference to the value is through the key.

These elements form the basic framework. By design, the framework is not thread-safe. Safe simultaneous access from multiple threads requires you to wrap the collection with a thread-safe wrapper. Similar wrappers exist for read-only versions of the collections.

While the Collections Framework is relatively small, it is not meant to provide support for every data structure needed. Instead, it is only the framework from which you can build more specialized implementations.

The Jakarta Commons Collections component

Many specialized implementations are well defined and understood, but they are not part of the core Collections Framework yet. Several of these implementations might be included with the concurrency utilities library (JSR 166) discussed in more detail later.

The Jakarta Commons Collections component is one example of a set of specialized implementations. Designed to work with J2SE 1.2, the Commons Collections component provides two List implementations and eight Map implementations. New base structures are also available and are the most interesting.

Let's look at the custom implementations in groups.

The simplest to explain are the FastArrayList, FastHashMap, and FastTreeMap implementations. These extend the standard ArrayList, HashMap, and TreeMap implementations that offer safe simultaneous read-write access from multiple threads. However, safety comes at the cost of slower write operations. While the read operations aren't synchronized, the write operations are performed on a data clone before the existing structure is replaced.

The Bag acts like a Set but permits duplicates. Two more implementations are available here: HashBag and TreeBag; the latter is sorted.

Another new interface is PriorityQueue. It supports comparable items and the use of a Comparator, so items can be maintained in a BinaryHeap based on priority and subsequently removed from the heap based on that priority.

The remaining implementations are a set of specialized Map implementations. They offer special-purpose, key-value pair lookup maps.

  • BeanMap uses reflection to provide key-value pairs based on JavaBean properties; the key is the property name, and the value is its value.
  • ExtendedProperties provides an extension to the java.util.Properties class that concatenates multiple values for a single property, instead of overwriting it.
  • LRUMap is a Map that lets you maintain a maximum capacity and uses a least-used (accessed) algorithm to determine which node to remove when full. This capability is already accessible from the standard LinkedHashMap class, but is only available with J2SE 1.4
  • .

  • SoftRefHashMap works like WeakHashMap but uses SoftReference instead of WeakReference for its keys.
  • MultiHashMap provides a multimap, where keys can have multiple associated values. Fetching the value for a key returns a Collection instead of the specific value for the key.
  • DoubleOrderedMap is the most interesting of the bunch. It provides a Map that supports fast lookup by value and by key. It has one requirement though: both keys and values must be unique. You can do this with two TreeMap objects, but DoubleOrderedMap only saves each key-value pair once.

Other classes and interfaces serve mostly supporting roles, but some specialized behavior is available. Besides utility methods to work with sets like isSubCollection and union, the framework includes additional Comparators and Iterators. The Comparators are used mostly for chaining or changing another Comparator's behavior. For instance, while java.util.Collections.reverseOrder() lets you reverse the natural order of elements, the ReverseComparator class lets you reverse the order from any Comparator. The Iterators support the process of passing each Collection element to a function (method) before getting the element back. This is a shared behavior of JGL and STL too.

JGL 4.0

Without any updates since 1997, JGL's new release caught me by surprise. I thought JGL was for all intents and purposes dead. Apparently, Recursion Software is giving it another go. The company claims a user base of more than 250,000, but I haven't heard of anyone using it for some time. The new release has three primary differences from the previous 3.1 version:

  • All the classes and interfaces are integrated into the Collections Framework, and like the Collections Framework, all implementations are unsynchronized by default
  • The package namespace has changed from com.objectspace.jgl to com.recursionsw.jgl
  • The Voyager-related capabilities are gone

If you are unfamiliar with JGL, see the JavaWorld article (June 1997) in Resources. Not much has changed with the new release. However, the library now works only with J2SE 1.3.1 and 1.4, and you can mix and match the standard Collections Framework with JGL capabilities. JGL provides several abstract data types and algorithm support as separate pieces. It supports more than 50 algorithms compared to the Collections Framework's 18. I'm not sure why anyone would want to learn two different frameworks from scratch, but one benefit is that those still using JGL can transition their applications to the Collections Framework in pieces, instead of all at once. JGL may still be more familiar to C++ developers learning Java than the Collections Framework, but since JGL is now part of that framework, this shouldn't matter.

Generic types

The oldest of the planned changes to the core Collections Framework is support for parameterized data types, also called generics, or templates. Right now, all collections are not type-safe. You can add any object into the collection, and, when you get an element out, you must cast it to the appropriate data type. You don't know until runtime if you are getting out of a collection the type you expect. JSR 14 proposes the ability to provide type-safe collection access.

In the simplest case, you just pass the data type along with the class name to make the collection the right type:

  public static void main(String args[]) {
    List<String> listOne = Arrays.asList(args);
    List<String> listTwo = new ArrayList<String>(listOne);
  }

In a more complicated case, you can pass types into class definitions so a data type internal to the class can be generic:

class Node<T> {
  T value;
  Node(T value) {
    this.value = value;
  }
  T getValue() {
    return value;
  }
  void setValue(T value) {
    this.value = value;
  }
}

The exact syntax for generic-types support and whether it will be incorporated into a future J2SE release (perhaps J2SE 1.5) is still to be determined. But you can generate class files today that will work with any J2SE 1.3 runtime environment. You don't have to wait for J2SE 1.5 to try the JSR 14 prototype (see Resources).

Related:
1 2 Page 1