Behold the power of parametric polymorphism

Adding generic types to Java could mean less coding and fewer

Suppose you want to implement a list class in Java. You start with an abstract class, List, and two subclasses, Empty and Cons, representing empty and nonempty lists, respectively. Since you plan to extend the functionality of these lists, you design a ListVisitor interface, and provide accept(...) hooks for ListVisitors in each of your subclasses. Furthermore, your Cons class has two fields, first and rest, with corresponding accessor methods.

What will the types of these fields be? Clearly, rest should be of type List. If you know in advance that your lists will always contain elements of a given class, the task of coding will be considerably easier at this point. If you know that your list elements will all be integers, for instance, you can assign first to be of type integer.

However, if, as is often the case, you don't know this information in advance, you must settle for the least common superclass that has all possible elements contained in your lists, which typically is the universal reference type Object. Hence, your code for lists of varying type elements has the following form:

abstract class List {
    public abstract Object accept(ListVisitor that);
}
interface ListVisitor {
    public Object _case(Empty that);
    public Object _case(Cons that);
}
class Empty extends List {
    public Object accept(ListVisitor that) {
      return that._case(this);
    }
}
class Cons extends List {
    private Object first;
    private List rest;
    
    Cons(Object _first, List _rest) {
      first = _first;
      rest = _rest;
    }
    
    public Object first() {return first;}
    public List rest() {return rest;}
    
    public Object accept(ListVisitor that) {
      return that._case(this);
    }
}

Although Java programmers often use the least common superclass for a field in this way, the approach has its disadvantages. Suppose you create a ListVisitor that adds all the elements of a list of Integers and returns the result, as illustrated below:

class AddVisitor implements ListVisitor {
    private Integer zero = new Integer(0);
    public Object _case(Empty that) {return zero;}
    public Object _case(Cons that) {
      return new Integer(((Integer) that.first()).intValue() + 
                     ((Integer) that.rest().accept(this)).intValue());
    }
}

Note the explicit casts to Integer in the second _case(...) method. You are repeatedly performing runtime tests to check properties of the data; ideally, the compiler should perform these tests for you as part of program type checking. But since you are not guaranteed that AddVisitor will only be applied to Lists of Integers, the Java type checker cannot confirm that you are, in fact, adding two Integers unless the casts are present.

You could potentially obtain more precise type checking, but only by sacrificing polymorphism and duplicating code. You could, for instance, create a special List class (with corresponding Cons and Empty subclasses, as well as a special Visitor interface) for each class of element you store in a List. In the example above, you would create an IntegerList class whose elements are all Integers. But if you wanted to store, say, Booleans in some other place in the program, you would have to create a BooleanList class.

Clearly, the size of a program written using this technique would increase rapidly. There are further stylistic issues, as well; one of the essential principles of good software engineering is to have a single point of control for each functional element of the program, and duplicating code in this copy-and-paste fashion violates that principle. Doing so commonly leads to high software development and maintenance costs. To see why, consider what happens when a bug is found: the programmer would have to go back and correct that bug separately in every copy made. If the programmer forgets to identify all of the duplicated sites, a new bug will be introduced!

But, as the example above illustrates, you will find it difficult to simultaneously keep a single point of control and use static type checkers to guarantee that certain errors will never occur when the program executes. In Java, as it exists today, you often have no choice but to duplicate code if you want precise static type checking. To be sure, you could never eliminate this aspect of Java entirely. Certain postulates of automata theory, taken to their logical conclusion, imply that no sound type system can determine precisely the set of valid inputs (or outputs) for all methods in a program. Consequently, every type system must strike a balance between its own simplicity and the expressiveness of the resulting language; the Java type system leans a bit too much in the direction of simplicity. In the first example, a slightly more expressive type system would have let you maintain precise type checking without having to duplicate code.

Such an expressive type system would add generic types to the language. Generic types are type variables that can be instantiated with an appropriately specific type for each instance of a class. For the purposes of this article, I will declare type variables in angle brackets above class or interface definitions. The scope of a type variable will then consist of the body of the definition at which it was declared (not including the extends clause). Within this scope, you can use the type variable anywhere that you can use an ordinary type.

For example, with generic types, you could rewrite your List class as follows:

abstract class List<T> {
    public abstract T accept(ListVisitor<T> that);
}
interface ListVisitor<T> {
    public T _case(Empty<T> that);
    public T _case(Cons<T> that);
}
class Empty<T> extends List<T> {
    public T accept(ListVisitor<T> that) {
      return that._case(this);
    }
}
class Cons<T> extends List<T> {
    private T first;
    private List<T> rest;
    
    Cons(T _first, List<T> _rest) {
      first = _first;
      rest = _rest;
    }
    public T first() {return first;}
    public List<T> rest() {return rest;}
    
    public T accept(ListVisitor<T> that) {
      return that._case(this);
    }
}

Now you can rewrite AddVisitor to take advantage of the generic types:

class AddVisitor implements ListVisitor<Integer> {
    private Integer zero = new Integer(0);
    
    public Integer _case(Empty<Integer> that) {return zero;}
    public Integer _case(Cons<Integer> that) {
      return new Integer((that.first()).intValue() + 
                 (that.rest().accept(this)).intValue());
    }
}

Notice that the explicit casts to Integer are no longer needed. The argument that to the second _case(...) method is declared to be Cons<Integer>, instantiating the type variable for the Cons class with Integer. Therefore, the static type checker can prove that that.first() will be of type Integer and that that.rest() will be of type List<Integer>. Similar instantiations would be made each time a new instance of Empty or Cons is declared.

In the example above, the type variables could be instantiated with any Object. You could also provide a more specific upper bound to a type variable. In such cases, you could specify this bound at the type variable's point of declaration with the following syntax:

 
<var> extends <bound> 

For instance, if you wanted your Lists to contain only Comparable objects, you could define your three classes as follows:

class List<T extends Comparable> {...}  
class Cons<T extends Comparable> {...}  
class Empty<T extends Comparable> {...}  

Although adding parameterized types to Java would give you the benefits shown above, doing so would not be worthwhile if it meant sacrificing compatibility with legacy code in the process. Fortunately, such a sacrifice is not necessary. It is possible to automatically translate code, written in an extension of Java that has generic types, to bytecode for the existing JVM. Several compilers already do this -- the Pizza and GJ compilers, written by Martin Odersky, are particularly good examples. Pizza was an experimental language that added several new features to Java, some of which were incorporated into Java 1.2; GJ is a successor to Pizza that adds only generic types. Since this is the only feature added, the GJ compiler can produce bytecode that works smoothly with legacy code. It compiles source to bytecode by means of type erasure, which replaces every instance of each type variable with that variable's upper bound. It also allows type variables to be declared for specific methods, rather than for whole classes. GJ uses the same syntax for generic types that I use in this article.

Work in progress

At Rice University, the programming languages technology group in which I work is implementing a compiler for an upward-compatible version of GJ, called NextGen. The NextGen language was jointly developed by Professor Robert Cartwright of Rice's computer science department and Guy Steele of Sun Microsystems; it adds the ability to perform runtime checks of type variables to GJ.

Another potential solution to this problem, called PolyJ, was developed at MIT. It is being extended at Cornell. PolyJ uses a slightly different syntax than GJ/NextGen. It also differs slightly in the use of generic types. For example, it does not support type parameterization of individual methods, and currently, doesn't support inner classes. But unlike GJ or NextGen, it does allow type variables to be instantiated with primitive types. Also, like NextGen, PolyJ supports runtime operations on generic types.

Sun has released a Java Specification Request (JSR) for adding generic types to the language. Unsurprisingly, one of the key goals listed for any submission is the maintenance of compatibility with existing class libraries. When generic types are added to Java, it is likely that one of the proposals discussed above will serve as the prototype.

There are some programmers who are opposed to adding generic types in any form, despite their advantages. I'll refer to two common arguments of such opponents as the "templates are evil" argument and the "it's not object oriented" argument, and address each of them in turn.

Are templates evil?

C++ uses templates to provide a form of generic types. Templates have earned a bad reputation among some C++ developers because their definitions are not type checked in parameterized form. Instead, the code is replicated at each instantiation, and each replication is type checked separately. The problem with this approach is that type errors might exist in the original code that don't show up in any of the initial instantiations. These errors can manifest themselves later if program revisions or extensions introduce new instantiations. Imagine the frustration of a developer using existing classes that type check when compiled by themselves, but not after he adds a new, perfectly legitimate subclass! Worse still, if the template is not recompiled along with the new classes, such errors will not be detected, but will instead corrupt the executing program.

Because of these problems, some people frown upon bringing templates back, expecting the drawbacks of templates in C++ to apply to a generic type system in Java. This analogy is misleading, because the semantic foundations of Java and C++ are radically different. C++ is an unsafe language, in which static type checking is a heuristic process with no mathematical foundation. In contrast, Java is a safe language, in which the static type checker literally proves that certain errors cannot occur when the code is executed. As a result, C++ programs involving templates suffer from myriad safety problems that cannot occur in Java.

Moreover, all of the prominent proposals for a generic Java perform explicit static type checking of the parameterized classes, rather than just doing so at each instantiation of the class. If you're worried that such explicit checking would slow down type checking, rest assured that, in fact, the opposite is true: since the type checker makes only one pass over the parameterized code, as opposed to a pass for each instantiation of the parameterized types, the type checking process is expedited. For these reasons, the numerous objections to C++ templates do not apply to the generic type proposals for Java. In fact, if you look beyond what has been widely used in the industry, there are many less popular but very well-designed languages, such as Objective Caml and Eiffel, that support parameterized types to great advantage.

Are generic type systems object oriented?

Finally, some programmers object to any generic type system on the grounds that, because such systems were originally developed for functional languages, they are not object oriented. This objection is spurious. Generic types fit very naturally into a object-oriented framework, as the examples and discussion above demonstrate. But I suspect that this objection is rooted in a lack of understanding of how to integrate generic types with Java's inheritance polymorphism. In fact, such integration is possible, and is the basis for our implementation of NextGen.

1 2 Page
Join the discussion
Be the first to comment on this article. Our Commenting Policies
See more