Object equality

Writing equals and hashCode methods for data objects

Page 2 of 2

Since some parts of the equals method run faster than others, it makes sense to order the statements so the quicker ones are tested before the slower ones. For example, comparing primitive values is much faster than invoking an equals() method, so primitives are compared first. Similarly, if the object is of the wrong type, there is no point in comparing any of its fields, so the type check comes before the field comparisons.
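As a minimal sketch of that ordering, consider the class below. Reading, sensorId, and label are hypothetical names chosen for illustration and do not come from the article's listings.

```java
// Hypothetical data class used only to illustrate the ordering of checks.
final class Reading {
    private final int sensorId;   // primitive: cheapest field to compare
    private final String label;   // object: needs an equals() call, may be null

    Reading(int sensorId, String label) {
        this.sensorId = sensorId;
        this.label = label;
    }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;                    // 1. identity: one reference comparison
        if (other == null) return false;                   // 2. null check
        if (getClass() != other.getClass()) return false;  // 3. wrong type: skip every field check
        Reading that = (Reading) other;
        if (sensorId != that.sensorId) return false;       // 4. primitives before objects
        return label == null ? that.label == null          // 5. the slower equals() call comes last
                             : label.equals(that.label);
    }

    @Override
    public int hashCode() {
        return 31 * sensorId + (label == null ? 0 : label.hashCode());
    }
}
```

Each early return avoids all of the work below it, so a mismatch is reported at the cheapest point at which it can be detected.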

There are some situations when using floating-point data types that can cause some odd behavior; see the sidebar, "When All is Not Equal," for more information.
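The sidebar is not reproduced here. As a brief illustration of the kind of behavior involved, the sketch below contrasts the == operator with a bit-level comparison for NaN and signed zero. FloatEqualityDemo and sameDouble are hypothetical names, and whether these are the exact cases the sidebar covers is an assumption.

```java
class FloatEqualityDemo {
    // Compares two doubles the way Double.equals() does, without boxing.
    static boolean sameDouble(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(nan == nan);             // false: NaN is never == to anything
        System.out.println(sameDouble(nan, nan));   // true
        System.out.println(0.0 == -0.0);            // true
        System.out.println(sameDouble(0.0, -0.0));  // false: the sign bit differs
    }
}
```

A field compared with == would make an object holding NaN unequal to itself, which breaks reflexivity; the bit-level comparison avoids that.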

Read-only objects

If an object is read-only (immutable), then the hash code can be computed ahead of time. When the object is created, all of the values will be passed in via a constructor, and the hash code can be calculated on that data. The hashCode() method can then just return the pre-computed value:

public class Point {
  private final int x, y;
  private final int hashCode;
  public Point(int x, int y) {
    this.x = x;
    this.y = y;
    this.hashCode = x*31 ^ y*37;
  }
  public boolean equals(Object other) {
    ...
  }
  public int hashCode() {
    return hashCode;
  }
}

Of course, care must be taken to ensure that the variables are not changed after their initialization; the final keyword helps to ensure that cannot occur.

Using instanceof (or not)

In his otherwise excellent book Effective Java, Joshua Bloch recommends using instanceof to test the type of the other object. Whilst on the surface this may seem a good idea, it has a fatal flaw: an equals() method built on instanceof is not symmetric once subclasses are involved. Bloch's book recommends:

public class BadPoint {
  private int x, y;
  public boolean equals(Object other) {
    if (other == this) return true;
    if (!(other instanceof BadPoint)) return false; // BAD
    BadPoint point = (BadPoint)other;
    return (x == point.x && y == point.y);
  }
  public int hashCode() {
    return x + y;
  }
}

Unfortunately, because this code is shorter (it folds the other == null test and the getClass() comparison into a single statement, since null instanceof X is always false) and because it was recommended in Effective Java, it has become ingrained in many Java developers' style. It has become one of the most controversial points in the book, with many ongoing discussions; see the interview at Artima for more.

The big problem with using instanceof is that it isn't symmetric. It surfaces when creating subclasses:

public class BadPoint3D extends BadPoint {
  private int z;
  public boolean equals(Object other) {
    if (!super.equals(other)) return false;
    if (!(other instanceof BadPoint3D)) return false; // BAD
    BadPoint3D point = (BadPoint3D)other;
    return (z == point.z);
  }
}

The problem is that, given instances badPoint and badPoint3D, we have badPoint instanceof BadPoint3D == false, but badPoint3D instanceof BadPoint == true.

This example shows a point in 3 dimensions, subclassing the 2D point we've already seen. By using the instanceof implementation, we can break symmetry:

BadPoint p1 = new BadPoint(1,1);
BadPoint p2 = new BadPoint3D(1,1,1);
// p1.equals(p2) == true;  // incorrect
// p2.equals(p1) == false; // correct

The assumption that all subclasses of BadPoint are equal if they have the same x and y values causes this problem. Using instanceof hides the fact that we no longer compare like-for-like; instead, we compare like-for-subtype.
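The asymmetry can be reproduced with a runnable version of the two classes. The sketch below adds constructors, which the listings above omit, and a hypothetical SymmetryDemo driver; both are assumptions made so that the example compiles.

```java
class BadPoint {
    private final int x, y;
    BadPoint(int x, int y) { this.x = x; this.y = y; }     // constructor assumed for the demo
    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof BadPoint)) return false;    // accepts any subclass of BadPoint
        BadPoint point = (BadPoint) other;
        return x == point.x && y == point.y;
    }
    @Override public int hashCode() { return x + y; }
}

class BadPoint3D extends BadPoint {
    private final int z;
    BadPoint3D(int x, int y, int z) { super(x, y); this.z = z; }
    @Override public boolean equals(Object other) {
        if (!super.equals(other)) return false;
        if (!(other instanceof BadPoint3D)) return false;  // rejects a plain BadPoint
        return z == ((BadPoint3D) other).z;
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        BadPoint p1 = new BadPoint(1, 1);
        BadPoint p2 = new BadPoint3D(1, 1, 1);
        System.out.println(p1 instanceof BadPoint3D); // false
        System.out.println(p2 instanceof BadPoint);   // true
        System.out.println(p1.equals(p2));            // true: only x and y are compared
        System.out.println(p2.equals(p1));            // false: p1 is not a BadPoint3D
    }
}
```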

Using the getClass() implementation gives the correct answer of false in both cases. In general, it isn't possible to compare a 3D point with a 2D point, and this is the key factor in the test. Using the getClass() implementation ensures that you only ever compare like with like.
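A simplified sketch of the getClass() approach shows both comparisons returning false. These cut-down Point and Point3D classes carry only coordinates, and the GetClassDemo driver is a hypothetical name added for illustration.

```java
class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null) return false;
        if (getClass() != other.getClass()) return false; // exact runtime classes must match
        Point point = (Point) other;
        return x == point.x && y == point.y;
    }
    @Override public int hashCode() { return x ^ y; }
}

class Point3D extends Point {
    private final int z;
    Point3D(int x, int y, int z) { super(x, y); this.z = z; }
    @Override public boolean equals(Object other) {
        if (!super.equals(other)) return false; // also guarantees other is a Point3D
        return z == ((Point3D) other).z;
    }
    @Override public int hashCode() { return super.hashCode() ^ z; }
}

class GetClassDemo {
    public static void main(String[] args) {
        Point p1 = new Point(1, 1);
        Point p2 = new Point3D(1, 1, 1);
        System.out.println(p1.equals(p2));                   // false
        System.out.println(p2.equals(p1));                   // false
        System.out.println(p2.equals(new Point3D(1, 1, 1))); // true
    }
}
```

Because getClass() is evaluated on the receiver, the check in Point.equals() compares against Point3D when it is reached through super.equals(), which is why the subclass can cast without a second type test.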

Note: The weak argument that hiding subtype information is desirable is based on the assumption that you would want to create a subclass of the data type for the sole purpose of adding or changing some methods (but not data) of the superclass. Whilst this happens in classes like Applet, this is a false argument: an Applet is not a data object, it's a code object. In fact, you never need to create a subclass of a data structure for the purpose of embellishing any methods; this is a common mistake by those new to object-orientation, referred to as the "is-a/has-a" argument. In this case, you're not creating a subtype of the data structure, you just want to add or modify its behavior. The correct solution is to write a separate class that delegates to the contained object. The rest of that discussion reaches beyond this article's scope.
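One way the delegation approach might look is sketched below. PointMetrics, manhattanDistanceTo, and the getX()/getY() accessors are hypothetical names that do not appear in the article's listings.

```java
// Simplified data object: equality depends only on its data.
final class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    int getX() { return x; }   // accessors assumed for the demo
    int getY() { return y; }
    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || getClass() != other.getClass()) return false;
        Point point = (Point) other;
        return x == point.x && y == point.y;
    }
    @Override public int hashCode() { return x ^ y; }
}

// Hypothetical behavior class: it has-a Point instead of extending Point.
final class PointMetrics {
    private final Point point;
    PointMetrics(Point point) { this.point = point; }
    Point getPoint() { return point; }
    // New behavior lives here, so Point's equality is never affected.
    int manhattanDistanceTo(Point other) {
        return Math.abs(point.getX() - other.getX())
             + Math.abs(point.getY() - other.getY());
    }
}
```

Since PointMetrics is not a Point, it can never be passed to Point.equals() as a disguised subtype, and the data class keeps a single, unambiguous definition of equality.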

Practical problems with symmetry

Does breaking the equals() method's symmetry matter in practice? Well, as noted earlier, equals() is used in many of the low-level libraries that make up the collections classes, and they depend implicitly on this behavior. If we use the instanceof variant, many odd behaviors can occur during use:

Set<BadPoint> data = new HashSet<>();
data.add(new BadPoint3D(1,1,1));
data.add(new BadPoint(1,1));
data.add(new BadPoint3D(1,1,2));
data.add(new BadPoint3D(1,2,3));
data.size(); // gives 3, not 4
data.contains(new BadPoint(1,2)); // returns true, not false

Of course, this behavior occurs when BadPoint3D does not define its own hashCode(). However, it is not required to define one; because it inherits BadPoint's overridden hashCode() rather than the default from Object, it still fulfils the hashCode() method's contractual obligations.

Further, the order in which the BadPoint and BadPoint3D instances are added determines which ones are in the final set. So although it appears trivial that the symmetry is broken, it can actually cause some deep-rooted erroneous behavior. You can see this from the samples in Resources.
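The order dependence can be seen with just two elements. The sketch below assumes constructors for the two classes and uses a hypothetical InsertionOrderDemo driver; it relies on the documented behavior of HashSet.add(), which calls equals() on the element being added.

```java
import java.util.HashSet;
import java.util.Set;

class BadPoint {
    private final int x, y;
    BadPoint(int x, int y) { this.x = x; this.y = y; }     // constructor assumed for the demo
    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof BadPoint)) return false;
        BadPoint point = (BadPoint) other;
        return x == point.x && y == point.y;
    }
    @Override public int hashCode() { return x + y; }
}

class BadPoint3D extends BadPoint {                        // inherits hashCode() from BadPoint
    private final int z;
    BadPoint3D(int x, int y, int z) { super(x, y); this.z = z; }
    @Override public boolean equals(Object other) {
        if (!super.equals(other)) return false;
        if (!(other instanceof BadPoint3D)) return false;
        return z == ((BadPoint3D) other).z;
    }
}

class InsertionOrderDemo {
    // Adds the two points to a fresh set in the given order and reports its size.
    static int sizeAfterAdding(BadPoint first, BadPoint second) {
        Set<BadPoint> data = new HashSet<>();
        data.add(first);
        data.add(second);
        return data.size();
    }

    public static void main(String[] args) {
        BadPoint flat = new BadPoint(1, 1);
        BadPoint solid = new BadPoint3D(1, 1, 1);
        System.out.println(sizeAfterAdding(solid, flat)); // 1: flat.equals(solid) is true
        System.out.println(sizeAfterAdding(flat, solid)); // 2: solid.equals(flat) is false
    }
}
```

The same two objects therefore produce sets of different sizes depending only on the order of insertion.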

Total violation

In fact, it's not just symmetry that's broken in the comparisons. If the subclass also defines a hashCode() method, then you can end up with a situation in which two objects are equal according to equals() but give different hash codes, a total violation of the equality contract. Instead of just breaking one method, using instanceof can actually break two of them:

public class BadPoint3D extends BadPoint {
  private int z;
  public boolean equals(Object other) {
    if (!super.equals(other)) return false;
    if (!(other instanceof BadPoint3D)) return false; // BAD
    BadPoint3D point = (BadPoint3D)other;
    return (z == point.z);
  }
  public int hashCode() {
    return super.hashCode() + z;
  }
}
// BadPoint p1 = new BadPoint(1,2);
// BadPoint p2 = new BadPoint3D(1,2,3);
// p1.equals(p2) == true
// p1.hashCode() == 3;
// p2.hashCode() == 6;
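The effect on a hash-based collection can be sketched as follows. The constructors and the HashMismatchDemo driver are assumptions added so that the example runs.

```java
import java.util.HashSet;
import java.util.Set;

class BadPoint {
    private final int x, y;
    BadPoint(int x, int y) { this.x = x; this.y = y; }     // constructor assumed for the demo
    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof BadPoint)) return false;
        BadPoint point = (BadPoint) other;
        return x == point.x && y == point.y;
    }
    @Override public int hashCode() { return x + y; }
}

class BadPoint3D extends BadPoint {
    private final int z;
    BadPoint3D(int x, int y, int z) { super(x, y); this.z = z; }
    @Override public boolean equals(Object other) {
        if (!super.equals(other)) return false;
        if (!(other instanceof BadPoint3D)) return false;
        return z == ((BadPoint3D) other).z;
    }
    @Override public int hashCode() { return super.hashCode() + z; }
}

class HashMismatchDemo {
    public static void main(String[] args) {
        BadPoint p1 = new BadPoint(1, 2);
        BadPoint p2 = new BadPoint3D(1, 2, 3);
        Set<BadPoint> data = new HashSet<>();
        data.add(p2);
        System.out.println(p1.equals(p2));     // true
        System.out.println(p1.hashCode());     // 3
        System.out.println(p2.hashCode());     // 6
        System.out.println(data.contains(p1)); // false: the hash codes differ, so equals() is never consulted
    }
}
```

Here p1 claims to be equal to p2, yet a set containing p2 reports that it does not contain p1.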

Summary

Writing an implementation of equals() and hashCode() is usually a case of following a sample and adapting it to the fields defined in the data object. A correct implementation can provide performance benefits with many collections and other low-level libraries.

It has also been conclusively shown that using instanceof is bad practice because of the erroneous situations that can occur when using subclasses and any other objects that rely on the correct implementation of equals() and hashCode().

For completeness, the Point and Point3D classes and their full correct implementations are provided for comparison and convenience:

Point

public class Point {
  private static double version = 1.0;
  private transient double distance;
  private String name;
  private int x, y;
  public Point(String name, int x, int y) {
    this(x,y);
    this.name = name;
  }
  public Point(int x, int y) {
    this.x = x;
    this.y = y;
  }
  public boolean equals(Object other) {
    if (other == this) return true;
    if (other == null) return false;
    if (getClass() != other.getClass()) return false;
    Point point = (Point)other;
    return (
      x == point.x &&
      y == point.y &&
      (name == point.name || 
        (name != null && name.equals(point.name)))
    );
  }
  public int hashCode() {
    return x ^ y;
  }
}

Point3D

public class Point3D extends Point {
  private int z;
  public Point3D(String name, int x, int y, int z) {
    super(name,x,y);
    this.z = z;
  }
  public Point3D(int x, int y, int z) {
    super(x,y);
    this.z = z;
  }
  public boolean equals(Object other) {
    if (!super.equals(other)) return false;
    Point3D point = (Point3D)other;
    return (z == point.z);
  }
  public int hashCode() {
    return super.hashCode() ^ z;
  }
}
Alex Blewitt is CEO of International Object Solutions Limited and Chartered Engineer of the Institute of Electrical Engineers. He has been working with Java since 1995 and is currently working towards a PhD at the University of Edinburgh involving verification of design patterns in Java. He currently lives in Milton Keynes with his wife Amy and two dogs Milly and Kea. For more on Alex and Amy, see http://www.bandlem.com/.

