Reveal the magic behind subtype polymorphism

Behold polymorphism from a type-oriented point of view

The word polymorphism comes from the Greek for "many forms." Most Java developers associate the term with an object's ability to magically execute correct method behavior at appropriate points in a program. However, that implementation-oriented view leads to images of wizardry, rather than an understanding of fundamental concepts.

Polymorphism in Java is invariably subtype polymorphism. Closely examining the mechanisms that generate that variety of polymorphic behavior requires that we discard our usual implementation concerns and think in terms of type. This article investigates a type-oriented perspective of objects, and how that perspective separates what behavior an object can express from how the object actually expresses that behavior. By freeing our concept of polymorphism from the implementation hierarchy, we also discover how Java interfaces facilitate polymorphic behavior across groups of objects that share no implementation code at all.

Quattro polymorphi

Polymorphism is a broad object-oriented term. Though we usually equate the general concept with the subtype variety, there are actually four different kinds of polymorphism. Before we examine subtype polymorphism in detail, the following section presents a general overview of polymorphism in object-oriented languages.

Luca Cardelli and Peter Wegner, authors of "On Understanding Types, Data Abstraction, and Polymorphism," (see Resources for link to article) divide polymorphism into two major categories -- ad hoc and universal -- and four varieties: coercion, overloading, parametric, and inclusion. The classification structure is:

                                 |-- coercion
                 |-- ad hoc    --|
                                 |-- overloading
  polymorphism --|
                                 |-- parametric
                 |-- universal --|
                                 |-- inclusion

In that general scheme, polymorphism represents an entity's capacity to have multiple forms. Universal polymorphism refers to a uniformity of type structure, in which the polymorphism acts over an infinite number of types that have a common feature. The less structured ad hoc polymorphism acts over a finite number of possibly unrelated types. The four varieties may be described as:

  • Coercion: a single abstraction serves several types through implicit type conversion
  • Overloading: a single identifier denotes several abstractions
  • Parametric: an abstraction operates uniformly across different types
  • Inclusion: an abstraction operates through an inclusion relation

I will briefly discuss each variety before turning specifically to subtype polymorphism.

Coercion

Coercion represents implicit parameter type conversion to the type expected by a method or an operator, thereby avoiding type errors. For the following expressions, the compiler must determine whether an appropriate binary + operator exists for the types of operands:

  2.0 + 2.0
  2.0 + 2
  2.0 + "2"

The first expression adds two double operands; the Java language specifically defines such an operator.

However, the second expression adds a double and an int; Java does not define an operator that accepts those operand types. Fortunately, the compiler implicitly converts the second operand to double and uses the operator defined for two double operands. That is tremendously convenient for the developer; without the implicit conversion, a compile-time error would result or the programmer would have to explicitly cast the int to double.

The third expression adds a double and a String. Once again, the Java language does not define such an operator. So the compiler coerces the double operand to a String, and the plus operator performs string concatenation.
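The three expressions can be checked directly. The following is a minimal sketch; the class name CoercionDemo and its method names are illustrative and do not come from the article's example code:

```java
class CoercionDemo {
    // double + double: the operator the language defines directly.
    static double addDoubles() {
        return 2.0 + 2.0;
    }

    // double + int: the int operand is implicitly converted to double first.
    static double addDoubleAndInt() {
        return 2.0 + 2;
    }

    // double + String: the double operand is converted to the String "2.0",
    // and + then performs string concatenation.
    static String addDoubleAndString() {
        return 2.0 + "2";
    }

    public static void main(String[] args) {
        System.out.println(addDoubles());          // prints 4.0
        System.out.println(addDoubleAndInt());     // prints 4.0
        System.out.println(addDoubleAndString());  // prints 2.02
    }
}
```

Note that the third result is the text "2.02", not a number: the coercion changed which + operation was applied.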

Coercion also occurs at method invocation. Suppose class Derived extends class Base, and class C has a method with signature m(Base). For the method invocation in the code below, the compiler implicitly converts the derived reference variable, which has type Derived, to the Base type prescribed by the method signature. That implicit conversion allows the m(Base) method's implementation code to use only the type operations defined by Base:

  C c = new C();
  Derived derived = new Derived();
  c.m( derived );

Again, implicit coercion during method invocation obviates a cumbersome type cast or an unnecessary compile-time error. Of course, the compiler still verifies that all type conversions conform to the defined type hierarchy.
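To make the invocation above runnable, here is a small sketch. The bodies of Base, Derived, and C are assumptions made for illustration (the article gives only the signature m(Base)), and the methods name() and extra() are hypothetical:

```java
class InvocationCoercionDemo {
    static class Base {
        String name() { return "Base"; }
    }

    static class Derived extends Base {
        @Override
        String name() { return "Derived"; }

        // Declared only on Derived, so it is not visible inside C.m(Base).
        String extra() { return "only on Derived"; }
    }

    static class C {
        // Within m, the argument is known only as a Base:
        // base.extra() would not compile here.
        String m(Base base) {
            return "m(Base) received " + base.name();
        }
    }

    static String call() {
        C c = new C();
        Derived derived = new Derived();
        return c.m(derived);   // no explicit cast to Base is needed
    }

    public static void main(String[] args) {
        System.out.println(call());   // prints m(Base) received Derived
    }
}
```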

Overloading

Overloading permits the use of the same operator or method name to denote multiple, distinct program meanings. The + operator used in the previous section exhibited two forms: one for adding double operands, one for concatenating String objects. Other forms exist for adding two integers, two longs, and so forth. We call the operator overloaded and rely on the compiler to select the appropriate functionality based on program context. As previously noted, if necessary, the compiler implicitly converts the operand types to match the operator's exact signature. Though Java specifies certain overloaded operators, it does not support user-defined overloading of operators.

Java does permit user-defined overloading of method names. A class may possess multiple methods with the same name, provided that the method signatures are distinct. That means either the number of parameters must differ or at least one parameter position must have a different type. Unique signatures allow the compiler to distinguish between methods that have the same name. The compiler resolves each call to one specific method at compile time, using the method name together with its parameter types, so overloaded methods are effectively distinct methods that happen to share a name. In light of that, any apparent polymorphic behavior evaporates upon closer inspection.
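A short sketch shows how the compiler picks among overloaded methods. The class OverloadDemo and its describe methods are hypothetical names chosen for illustration:

```java
class OverloadDemo {
    // Four distinct methods that share one name; the signatures differ.
    static String describe(int value)    { return "describe(int)"; }
    static String describe(double value) { return "describe(double)"; }
    static String describe(String value) { return "describe(String)"; }
    static String describe(Object value) { return "describe(Object)"; }

    public static void main(String[] args) {
        System.out.println(describe(2));     // prints describe(int)
        System.out.println(describe(2.0));   // prints describe(double)
        System.out.println(describe("2"));   // prints describe(String)

        // The choice depends on the declared type of the argument,
        // not on the object it refers to at runtime.
        Object o = "2";
        System.out.println(describe(o));     // prints describe(Object)
    }
}
```

The last call is the telling one: the variable o refers to a String, yet describe(Object) runs, because the selection was made from the declared type when the code was compiled.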

Both coercion and overloading are classified as ad hoc because each provides polymorphic behavior only in a limited sense. Though they fall under a broad definition of polymorphism, these varieties are primarily developer conveniences. Coercion obviates cumbersome explicit type casts or unnecessary compiler type errors. Overloading, on the other hand, provides syntactic sugar, allowing a developer to use the same name for distinct methods.

Parametric

Parametric polymorphism allows the use of a single abstraction across many types. For example, a List abstraction, representing a list of homogeneous objects, could be provided as a generic module. You would reuse the abstraction by specifying the types of objects contained in the list. Since the parameterized type can be any user-defined data type, there are a potentially infinite number of uses for the generic abstraction, making this arguably the most powerful type of polymorphism.

At first glance, the above List abstraction may seem to be what the interface java.util.List provides. However, as of this writing, Java does not support true parametric polymorphism in a type-safe manner, which is why java.util.List and java.util's other collection types are written in terms of the primordial Java class, java.lang.Object. (See my article "A Primordial Interface?" for more details.) Java's single-rooted implementation inheritance offers a partial solution, but not the true power of parametric polymorphism. Eric Allen's excellent article, "Behold the Power of Parametric Polymorphism," describes the need for generic types in Java and the proposals to address Sun's Java Specification Request #000014, "Add Generic Types to the Java Programming Language." (See Resources for a link.)
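The cost of a collection written in terms of java.lang.Object can be sketched as follows. The class ObjectListDemo is a hypothetical name. The sketch assumes the raw List type, which is how the Object-based API described above appears in source code; on JDKs that include generic types, raw usage compiles with warnings, hence the @SuppressWarnings annotation:

```java
import java.util.ArrayList;
import java.util.List;

@SuppressWarnings({"rawtypes", "unchecked"})
class ObjectListDemo {
    static String describeFailure() {
        List list = new ArrayList();     // elements are plain Objects
        list.add("text");
        list.add(Integer.valueOf(42));   // nothing prevents an unrelated type

        // Every read needs a cast back to the type the caller expects.
        String first = (String) list.get(0);
        try {
            String second = (String) list.get(1);   // fails only at runtime
            return "no failure: " + second;
        } catch (ClassCastException e) {
            return first + ", then ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());   // prints text, then ClassCastException
    }
}
```

The list is homogeneous only by convention. The compiler accepts both add calls, and the mistake surfaces as a runtime exception rather than a compile-time type error.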

Inclusion

Inclusion polymorphism achieves polymorphic behavior through an inclusion relation between types or sets of values. For many object-oriented languages, including Java, the inclusion relation is a subtype relation. So in Java, inclusion polymorphism is subtype polymorphism.

As noted earlier, when Java developers generically refer to polymorphism, they invariably mean subtype polymorphism. Gaining a solid appreciation of subtype polymorphism's power requires viewing the mechanisms yielding polymorphic behavior from a type-oriented perspective. The rest of this article examines that perspective closely. For brevity and clarity, I use the term polymorphism to mean subtype polymorphism.

Type-oriented view

The UML class diagram in Figure 1 shows the simple type and class hierarchy used to illustrate the mechanics of polymorphism. The model depicts five types: four classes and one interface. Although the model is called a class diagram, I think of it as a type diagram. As detailed in "Thanks Type and Gentle Class," every Java class and interface declares a user-defined data type. So from an implementation-independent view (i.e., a type-oriented view) each of the five rectangles in the figure represents a type. From an implementation point of view, four of those types are defined using class constructs, and one is defined using an interface.

Figure 1. UML class diagram for the example code

The following code defines and implements each user-defined data type. I purposely keep the implementation as simple as possible:

/* Base.java */
public class Base
{
  public String m1()
  {
    return "Base.m1()";
  }
  public String m2( String s )
  {
    return "Base.m2( " + s + " )";
  }
}
/* IType.java */
interface IType
{
  String m2( String s );
  String m3();
}
/* Derived.java */
public class Derived
  extends Base
  implements IType
{
  public String m1()
  {
    return "Derived.m1()";
  }
  public String m3()
  {
    return "Derived.m3()";
  }
}
/* Derived2.java */
public class Derived2
  extends Derived
{
  public String m2( String s )
  {
    return "Derived2.m2( " + s + " )";
  }
  public String m4()
  {
    return "Derived2.m4()";
  }
}
/* Separate.java */
public class Separate
  implements IType
{
  public String m1()
  {
    return "Separate.m1()";
  }
  public String m2( String s )
  {
    return "Separate.m2( " + s + " )";
  }
  public String m3()
  {
    return "Separate.m3()";
  }
}

Using these type declarations and class definitions, Figure 2 depicts a conceptual view of the Java statement:

Derived2 derived2 = new Derived2();

Figure 2. Derived2 reference attached to Derived2 object

The above statement declares an explicitly typed reference variable, derived2, and attaches that reference to a newly created Derived2 class object. The top panel in Figure 2 depicts the Derived2 reference as a set of portholes, through which the underlying Derived2 object can be viewed. There is one hole for each Derived2 type operation. The actual Derived2 object maps each Derived2 operation to appropriate implementation code, as prescribed by the implementation hierarchy defined in the above code. For example, the Derived2 object maps m1() to implementation code defined in class Derived. Furthermore, that implementation code overrides the m1() method in class Base. A Derived2 reference variable cannot access the overridden m1() implementation in class Base. That does not mean that the actual implementation code in class Derived can't use the Base class implementation via super.m1(). But as far as the reference variable derived2 is concerned, that code is inaccessible. The mappings of the other Derived2 operations similarly show the implementation code executed for each type operation.
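The point about super.m1() can be sketched with a hypothetical variant of Derived. In the article's own code, Derived.m1() does not call super.m1(); the variant below, along with the class name SuperCallDemo, is an assumption made purely for illustration:

```java
class SuperCallDemo {
    static class Base {
        public String m1() { return "Base.m1()"; }
    }

    // Hypothetical variant of Derived whose m1() delegates to Base.
    static class Derived extends Base {
        @Override
        public String m1() {
            // Only code inside the subclass can reach the overridden method.
            return "Derived.m1() wrapping " + super.m1();
        }
    }

    static String callThroughBaseReference() {
        Base base = new Derived();
        // A caller holding a reference cannot select Base.m1() directly;
        // the object always maps m1() to the Derived implementation.
        return base.m1();
    }

    public static void main(String[] args) {
        System.out.println(callThroughBaseReference());
        // prints Derived.m1() wrapping Base.m1()
    }
}
```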

Now that you have a Derived2 object, you can reference it with any variable whose type Derived2 conforms to. The type hierarchy in Figure 1's UML diagram reveals that Derived, Base, and IType are all supertypes of Derived2. So, for example, a Base reference can be attached to the object. Figure 3 depicts the conceptual view of the following Java statement:

Base base = derived2;

Figure 3. Base reference attached to Derived2 object

There is absolutely no change to the underlying Derived2 object or any of the operation mappings, though methods m3() and m4() are no longer accessible through the Base reference. Calling m1() or m2(String) using either variable derived2 or base results in execution of the same implementation code:

String tmp;
// Derived2 reference (Figure 2)
tmp = derived2.m1();             // tmp is "Derived.m1()"
tmp = derived2.m2( "Hello" );    // tmp is "Derived2.m2( Hello )"
// Base reference (Figure 3)
tmp = base.m1();                 // tmp is "Derived.m1()"
tmp = base.m2( "Hello" );        // tmp is "Derived2.m2( Hello )"

Realizing identical behavior through both references makes sense because the Derived2 object does not know what calls each method. The object only knows that when called upon, it follows the marching orders defined by the implementation hierarchy. Those orders stipulate that for method m1(), the Derived2 object executes the code in class Derived, and for method m2(String), it executes the code in class Derived2. The action performed by the underlying object does not depend on the reference variable's type.
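For readers who want to run the comparison, the sketch below gathers the relevant types into one file as nested types. The wrapper class ReferenceViewDemo and the extra IType reference are additions for illustration; the method bodies repeat the article's definitions:

```java
class ReferenceViewDemo {
    static class Base {
        public String m1() { return "Base.m1()"; }
        public String m2(String s) { return "Base.m2( " + s + " )"; }
    }

    interface IType {
        String m2(String s);
        String m3();
    }

    static class Derived extends Base implements IType {
        @Override public String m1() { return "Derived.m1()"; }
        @Override public String m3() { return "Derived.m3()"; }
    }

    static class Derived2 extends Derived {
        @Override public String m2(String s) { return "Derived2.m2( " + s + " )"; }
        public String m4() { return "Derived2.m4()"; }
    }

    public static void main(String[] args) {
        Derived2 derived2 = new Derived2();
        Base base = derived2;     // same object, narrower view
        IType itype = derived2;   // same object, a different narrower view

        // Attaching another reference neither copies nor changes the object.
        System.out.println(base == derived2);                  // prints true
        System.out.println(base.getClass().getSimpleName());   // prints Derived2

        // Each operation runs the same code whichever reference is used.
        System.out.println(derived2.m1());       // prints Derived.m1()
        System.out.println(base.m1());           // prints Derived.m1()
        System.out.println(base.m2("Hello"));    // prints Derived2.m2( Hello )
        System.out.println(itype.m2("Hello"));   // prints Derived2.m2( Hello )
    }
}
```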

However, all is not equal when you use the reference variables derived2 and base. As depicted in Figure 3, a Base type reference can only see the Base type operations of the underlying object. So although Derived2 has mappings for methods m3() and m4(), variable base can't access those methods:

String tmp;
// Derived2 reference (Figure 2)
tmp = derived2.m3();             // tmp is "Derived.m3()"
tmp = derived2.m4();             // tmp is "Derived2.m4()"
// Base reference (Figure 3)
tmp = base.m3();                 // Compile-time error
tmp = base.m4();                 // Compile-time error

The runtime Derived2 object remains fully capable of accepting either the m3() or m4() method calls. The type restrictions that disallow those attempted calls through the Base
