It's in the contract! Object versions for JavaBeans

Use object versioning to maintain serialization compatibility with your JavaBeans

Over the past two months, we've gone into some depth regarding how to serialize objects in Java. (See "Serialization and the JavaBeans Specification" and "Do it the 'Nescafé' way -- with freeze-dried JavaBeans.") This month's article assumes you've either already read these articles or you understand the topics they cover. You should understand what serialization is, how to use the Serializable interface, and how to use the java.io.ObjectOutputStream and java.io.ObjectInputStream classes.

Why you need versioning

What a computer does is determined by its software, and software is extremely easy to change. This flexibility, usually considered an asset, has its liabilities. Sometimes it seems that software is too easy to change. You've undoubtedly run into at least one of the following situations:

  • A document file you received via e-mail won't read correctly in your word processor, because yours is an older version with an incompatible file format

  • A Web page operates differently on different browsers because different browser versions support different feature sets

  • An application won't run because you have the wrong version of a particular library

  • Your C++ won't compile because the header and source files are of incompatible versions

All of these situations are caused by incompatible versions of software and/or the data the software manipulates. Like buildings, personal philosophies, and riverbeds, programs change constantly in response to the changing conditions around them. (If you don't think buildings change, read Stewart Brand's outstanding book How Buildings Learn, a discussion of how structures transform over time. See Resources for more information.) Without a structure to control and manage this change, any software system of any useful size eventually degenerates into chaos. The goal in software versioning is to ensure that the version of software you are currently using produces correct results when it encounters data produced by other versions of itself.

This month, we're going to discuss how Java class versioning works, so that we can provide version control of our JavaBeans. The versioning structure for Java classes permits you to indicate to the serialization mechanism whether a particular data stream (that is, a serialized object) is readable by a particular version of a Java class. We'll talk about "compatible" and "incompatible" changes to classes, and why these changes affect versioning. We'll go over the goals of the versioning structure, and how the java.io package meets those goals. And, we'll learn to put safeguards into our code to ensure that when we read object streams of various versions, the data is always consistent after the object is read.

Version aversion

There are various kinds of versioning problems in software, all of which pertain to compatibility between chunks of data and/or executable code:

  • Differing versions of the same software may or may not be able to handle each others' data storage formats

  • Programs that load executable code at runtime must be able to identify the correct version of the software object, loadable library, or object file to do the job

  • A class's methods and fields must maintain the same meaning as the class evolves, or existing programs may break in places where those methods and fields are used

  • Source code, header files, documentation, and build scripts must all be coordinated in a software build environment to ensure that binary files are built from the correct versions of the source files

This article on Java object versioning only addresses the first three -- that is, version control of binary objects and their semantics in a runtime environment. (There's a vast array of software available for versioning source code, but we're not covering that here.)

It's important to remember that serialized Java object streams don't contain bytecodes. They contain only the information necessary to reconstruct an object, assuming you have the class files available to build the object. But what happens if the two Java virtual machines (JVMs) involved, the writer and the reader, have different versions of the class files? How do we know if they're compatible?

A class definition can be thought of as a "contract" between the class and the code that calls the class. This contract includes the class's API (application programming interface). Changing the API is equivalent to changing the contract. (Other changes to a class may also imply changes to the contract, as we'll see.) As a class evolves, it's important to maintain the behavior of previous versions of the class so as not to break the software in places that depended upon given behavior.

A version change example

Imagine you had a method called getItemCount() in a class, which meant get the total number of items this object contains, and this method was used in a dozen places throughout your system. Then, imagine at some later time that you change getItemCount() to mean get the maximum number of items this object has ever contained. Your software will most likely break in most places where this method was used, because suddenly the method will be reporting different information. Essentially, you've broken the contract; so it serves you right that your program now has bugs in it.
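To make the broken contract concrete, here is a minimal sketch. The class ItemBin and the method name getItemCountRevised() are hypothetical, invented only for this illustration; the second method stands in for what getItemCount() would return after its meaning changed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container, invented for illustration only.
class ItemBin {
    private final List<String> items = new ArrayList<>();
    private int mostEverHeld = 0;

    void add(String item) {
        items.add(item);
        mostEverHeld = Math.max(mostEverHeld, items.size());
    }

    void removeLast() {
        if (!items.isEmpty()) {
            items.remove(items.size() - 1);
        }
    }

    // Original contract: the number of items held right now.
    int getItemCount() {
        return items.size();
    }

    // The changed meaning: the most items ever held. In the scenario
    // above this body would silently replace getItemCount(); it has a
    // separate (hypothetical) name here so both can be compared.
    int getItemCountRevised() {
        return mostEverHeld;
    }
}

class ContractDemo {
    public static void main(String[] args) {
        ItemBin bin = new ItemBin();
        bin.add("bolt");
        bin.add("nut");
        bin.removeLast();

        // Same signature and return type, different answers.
        System.out.println(bin.getItemCount());        // prints 1
        System.out.println(bin.getItemCountRevised()); // prints 2
    }
}
```

Nothing in the method's signature changes between the two meanings, so the compiler cannot flag callers that still rely on the old one.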

There's no way, short of disallowing changes altogether, to completely automate the detection of this sort of change, because it happens at the level of what a program means, not simply at the level of how that meaning is expressed. (If you do think of a way to do this easily and generally, you're going to be richer than Bill.) So, in the absence of a complete, general, and automated solution to this problem, what can we do to avoid getting into hot water when we change our classes (which, of course, we must)?

The easiest answer to this question is to say that if a class changes at all, it shouldn't be "trusted" to maintain the contract. After all, a programmer might have done anything to the class, and who knows if the class still works as advertised? This solves the problem of versioning, but it's an impractical solution because it's way too restrictive. If the class is modified to improve performance, say, there's no reason to disallow using the new version of the class simply because it doesn't match the old one. Any number of changes may be made to a class without breaking the contract.

On the other hand, some changes to classes practically guarantee that the contract is broken: deleting a field, for example. If you delete a field from a class, you'll still be able to read streams written by previous versions, because the reader can always ignore the value for that field. But think about what happens when you write a stream intended to be read by previous versions of the class. The value for that field will be absent from the stream, and the older version will assign a (possibly logically inconsistent) default value to that field when it reads the stream. Voilà! You've got a broken class.

Compatible and incompatible changes

The trick to managing object version compatibility is to identify which kinds of changes may cause incompatibilities between versions and which won't, and to treat these cases differently. In Java parlance, changes that don't cause compatibility problems are called compatible changes; those that may are called incompatible changes.

The designers of the serialization mechanism for Java had the following goals in mind when they created the system:

  1. To define a way in which a newer version of a class can read and write streams that a previous version of the class can also "understand" and use correctly

  2. To provide a default mechanism that serializes objects with good performance and reasonable size. This is the serialization mechanism we've already discussed in the two previous JavaBeans columns mentioned at the beginning of this article

  3. To minimize versioning-related work on classes that don't need versioning. Ideally, versioning information need only be added to a class when new versions are added

  4. To format the object stream so that objects can be skipped without loading the object's class file. This capability allows a client object to traverse an object stream containing objects it doesn't understand

Let's see how the serialization mechanism addresses these goals in light of the situation outlined above.

Reconcilable differences

Some changes made to a class file can be depended on not to change the contract between the class and whatever other classes may call it. As noted above, these are called compatible changes in the Java documentation. Any number of compatible changes may be made to a class file without changing the contract. In other words, two versions of a class that differ only by compatible changes are compatible classes: The newer version will continue to read and write object streams that are compatible with previous versions.

The classes java.io.ObjectInputStream and java.io.ObjectOutputStream don't trust you. They are designed to be, by default, extremely suspicious of any changes to a class file's interface to the world -- meaning, anything visible to any other class that may use the class: the signatures of public methods and interfaces and the types and modifiers of public fields. They're so paranoid, in fact, that you can scarcely change anything about a class without causing java.io.ObjectInputStream to refuse to load a stream written by a previous version of your class.

Let's look at an example of a class incompatibility, and then solve the resulting problem. Say you've got an object called InventoryItem, which maintains part numbers and the quantity of that particular part available in a warehouse. A simple form of that object as a JavaBean might look something like this:

import java.beans.*;
import java.io.*;

//
// Version 1: simply store quantity on hand and part number
//
// Printable is this article's own interface (it declares print()).
// It is assumed to live in the same default package, so it needs no
// import; importing a class from the default package is a compile
// error in current Java.
//

public class InventoryItem implements Serializable, Printable {

  // fields
  protected int iQuantityOnHand_;
  protected String  sPartNo_;

  // A quantity of -1 marks an item that has not been initialized yet
  public InventoryItem()
  {
    iQuantityOnHand_ = -1;
    sPartNo_ = "";
  }

  public InventoryItem(String _sPartNo, int _iQuantityOnHand)
  {
    setQuantityOnHand(_iQuantityOnHand);
    setPartNo(_sPartNo);
  }

  public int getQuantityOnHand()
  {
    return iQuantityOnHand_;
  }

  public void setQuantityOnHand(int _iQuantityOnHand)
  {
    iQuantityOnHand_ = _iQuantityOnHand;
  }

  public String getPartNo()
  {
    return sPartNo_;
  }

  public void setPartNo(String _sPartNo)
  {
    sPartNo_ = _sPartNo;
  }

  // ... implements Printable
  public void print()
  {
    System.out.println("Part: " + getPartNo() + "\nQuantity on hand: " +
                       getQuantityOnHand() + "\n\n");
  }
}

(We also have a simple main program, called Demo8a, which reads and writes InventoryItems to and from a file using object streams, and interface Printable, which InventoryItem implements and Demo8a uses to print the objects. You can find the source for these here.) Running the demo program produces reasonable, if unexciting, results:

C:\beans>java Demo8a w file SA0091-001 33
Wrote object:
Part: SA0091-001
Quantity on hand: 33
C:\beans>java Demo8a r file
Read object:
Part: SA0091-001
Quantity on hand: 33
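The source for Demo8a isn't reproduced in this article, so the following is only a rough sketch of the kind of round trip it performs, not the actual program. The names StockItem and RoundTripSketch are hypothetical stand-ins, and the sketch uses an in-memory byte array rather than a file so that it is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for InventoryItem, trimmed to its two fields.
class StockItem implements Serializable {
    final String partNo;
    final int quantityOnHand;

    StockItem(String partNo, int quantityOnHand) {
        this.partNo = partNo;
        this.quantityOnHand = quantityOnHand;
    }
}

class RoundTripSketch {
    // Serialize one object into a byte array.
    static byte[] write(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object; the class file must be available to this JVM.
    static Object read(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(new StockItem("SA0091-001", 33));
        StockItem copy = (StockItem) read(data);
        System.out.println("Part: " + copy.partNo);
        System.out.println("Quantity on hand: " + copy.quantityOnHand);
    }
}
```

Swapping the byte-array streams for FileOutputStream and FileInputStream would give the file-based behavior shown in the transcript above.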

The program serializes and deserializes the object correctly. Now, let's make a tiny change to the class file. The system users have done an inventory and have found discrepancies between the database and the actual item counts. They've requested the ability to track the number of items lost from the warehouse. Let's add a single public field to InventoryItem that indicates the number of items missing from the storeroom. We insert the following line into the InventoryItem class and recompile:

  // fields
  protected int     iQuantityOnHand_;
  protected String  sPartNo_;
  public int iQuantityLost_;

The file compiles fine, but look at what happens when we try to read the stream from the previous version:

C:\mj-java\Column8>java Demo8a r file
IO Exception:
InventoryItem; Local class not compatible
java.io.InvalidClassException: InventoryItem; Local class not compatible
        at java.io.ObjectStreamClass.setClass(ObjectStreamClass.java:219)
        at java.io.ObjectInputStream.inputClassDescriptor(ObjectInputStream.java:639) 
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:276)
        at java.io.ObjectInputStream.inputObject(ObjectInputStream.java:820)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:284)
        at Demo8a.main(Demo8a.java:56)

Whoa, dude! What happened?

java.io.ObjectOutputStream doesn't write class objects when it's creating a stream of bytes representing an object. Instead, it writes a java.io.ObjectStreamClass, which is a description of the class. The destination JVM's class loader uses this description to find and load the bytecodes for the class. The description also includes a 64-bit integer called a SerialVersionUID, which is a sort of key that uniquely identifies a class file version.
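You can look at this version key yourself. The sketch below asks java.io.ObjectStreamClass for the key of a class, then flips one bit of the key inside a serialized stream to imitate a stream written by a different class version. Widget and UidSketch are hypothetical names, and the byte offset arithmetic assumes the standard stream layout for a stream holding a single object (stream header, object tag, class-descriptor tag, class name, then the key). Treat it as an illustration rather than a tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical serializable class, used only for this demonstration.
class Widget implements Serializable {
    int count = 7;
}

class UidSketch {
    // The version key the serialization runtime associates with a class.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumed layout: 4-byte stream header, 1-byte object tag,
    // 1-byte class-descriptor tag, 2-byte name length, class name,
    // and then the 8-byte key.
    static int uidOffset(Class<?> cls) {
        int nameLength = cls.getName().getBytes(StandardCharsets.UTF_8).length;
        return 4 + 1 + 1 + 2 + nameLength;
    }

    // Returns "ok" if the stream loads, else the exception's simple name.
    static String tryRead(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return "ok";
        } catch (InvalidClassException e) {
            return "InvalidClassException";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Widget());
        System.out.println("Key for Widget: " + uidOf(Widget.class));
        System.out.println("Untouched stream: " + tryRead(data));

        // Flip the lowest bit of the key stored in the stream, so it no
        // longer matches the key computed for the local Widget class.
        data[uidOffset(Widget.class) + 7] ^= 1;
        System.out.println("Altered stream: " + tryRead(data));
    }
}
```

The untouched stream should load normally, while the altered one should be rejected with the same InvalidClassException seen in the stack trace above. The reader makes that decision by comparing the two keys, before any field data is examined.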

The SerialVersionUID is created by calculating a 64-bit secure hash of the following information about the class. The serialization mechanism wants to be able to detect change in any of the following things:
