Ensure proper version control for serialized objects

Java serialization and version control for release compatibility

Java's Serializable interface provides an easy-to-use programming interface for converting between a runtime object and a byte stream. Serialization involves mapping a runtime object or an object graph into an ObjectOutputStream, which can then be written to the filesystem or stored in a database. Conversely, deserialization reads the byte stream through the ObjectInputStream and then maps the byte stream into an object or an object graph.
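The round trip described above can be sketched in a few lines. This is a minimal illustration, not code from the article: the SessionState class and the RoundTripDemo helper names are hypothetical, and a byte array stands in for the filesystem or database.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical state object, used only for this illustration.
class SessionState implements Serializable {
    private static final long serialVersionUID = 1L;
    final String user;
    final int openDocuments;

    SessionState(String user, int openDocuments) {
        this.user = user;
        this.openDocuments = openDocuments;
    }
}

class RoundTripDemo {
    // Serialization: map the object graph into a byte stream.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserialization: read the byte stream back into an object graph.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = serialize(new SessionState("alice", 3));
        SessionState restored = (SessionState) deserialize(stored);
        System.out.println(restored.user + " " + restored.openDocuments);
    }
}
```

In a real application the byte array would typically be written to a file or a database column instead of being kept in memory.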

During object serialization, the default Java serialization mechanism writes the metadata about the object, which includes the class name, field names and types, and superclass. This class definition is stored as a part of the serialized object. This stored metadata enables the deserialization process to reconstitute the objects and map the stream data into the class attributes with the appropriate type.
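The class metadata that accompanies a serialized object can also be inspected from code. The sketch below assumes a hypothetical PurchaseOrder class and uses ObjectStreamClass.lookup to list the fields the default mechanism would record; static and transient fields should not appear in the result.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, used only for this illustration.
class PurchaseOrder implements Serializable {
    private static final long serialVersionUID = 1L;
    static String notSerialized = "static field";
    transient Thread worker; // transient, so not part of the serialized form
    int quantity;
    String item;
}

class DescriptorDemo {
    // List "type name" for every field in the class's serialized form.
    static List<String> describe(Class<?> type) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        List<String> fields = new ArrayList<>();
        for (ObjectStreamField f : desc.getFields()) {
            fields.add(f.getType().getSimpleName() + " " + f.getName());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(PurchaseOrder.class).getName());
        for (String line : describe(PurchaseOrder.class)) {
            System.out.println("  " + line);
        }
    }
}
```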

Java serialization supports two scenarios for data interchange. The first scenario is store-and-retrieve object usage: a Java-based tool or application needs to store runtime state information in one session and make it available for a subsequent session. Java serialization allows the state objects to be written to a filesystem or a database, then retrieved later and reconstituted into the proper state objects.

The second scenario involves two runtime environments interoperating through shared objects. In this case, one application, a catalog ordering system, for example, produces a business object, such as an order, which is then handled by another application, say, a fulfillment system, running on a different application server instance. Java serialization enables these business objects to be sent to another server instance by writing the object content as a byte stream. The application running on the other server then deserializes the byte stream back into the proper business objects.

Figure 1. Two common scenarios for Java serialization.

In this article, I use these two scenarios to illustrate best practices for preventing release compatibility from breaking when implementing Java serialization. I begin by discussing how Java serialization provides version control.

Java serialization and serialVersionUID

In anticipating the need to evolve a serializable class, Java serialization provides a serialVersionUID, also called suid, in the ObjectStreamClass for version control. suid is used to inform the Java serialization mechanism which version of the class is compatible with this serialized object. However, the importance of this field is often overlooked, resulting in release incompatibility.

 

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class MySerializableClass implements Serializable {
    static String notSerializableString = "static field not serialized";
    transient Thread T1; /* Transient field not serialized */

    public int aNum = 0;
    public String serializedString = "a serialized string";

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        // Followed by customized serialization code
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Followed by customized deserialization code
    }
}

The code snippet above shows a basic implementation of the Java Serializable interface. The trace listed below shows the object's use, resulting in an InvalidClassException. This trace reveals a mismatch of the serialVersionUID found in the local class, i.e., the MySerializableClass, and the one stored in the serialized object. In this example, the addition of a non-static field in MySerializableClass results in a different suid associated with this class. This exception illustrates the following:

  1. That changes to a serializable class result in a different computed suid.
  2. How the Java serialization mechanism matches the suid in the current version of the serializable class with the value saved in the object serialized with the class's prior version. This approach provides Java serialization its version control mechanism.
 java.io.InvalidClassException: MySerializableClass; Local class not compatible: stream classdesc serialVersionUID=7187368850772554122  local class serialVersionUID=4104486680721271886
at java.io.ObjectStreamClass.validateLocalClass(ObjectStreamClass.java:565)
at java.io.ObjectStreamClass.setClass(ObjectStreamClass.java:609)
at java.io.ObjectInputStream.inputClassDescriptor(ObjectInputStream.java:981)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:402)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:272)
at java.io.ObjectInputStream.inputObject(ObjectInputStream.java:1231)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:272)
at MyDeserializeClient.main(MyDeserializeClient.java:27)

The suid is set in one of two ways, with the first being the default mechanism described above. The Java serialization mechanism automatically computes a hash value. ObjectStreamClass's computeSerialVersionUID method passes the class name, sorted member names, modifiers, and interfaces to the secure hash algorithm (SHA), which returns a hash value. This computation technique ensures that most changes in an object's "shape" and attribute types result in different hash values. I elaborate on the significance of this default suid generation later when I discuss release compatibility.

In the second approach, the developer explicitly sets suid in the serializable class, as shown in the code below:

 

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class MySerializableClass implements Serializable {
    private static final long serialVersionUID = 1999L;

    static String notSerializableString = "static field not serialized";
    transient Thread T1; /* Transient field not serialized */

    public int aNum = 0;
    public String serializedString = "a serialized string";

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        // Followed by customized serialization code
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Followed by customized deserialization code
    }
}

A developer can set the suid to any long value. Alternatively, the developer can pass the class name to the serialver command, which computes a hash value using the same algorithm described above. The developer then pastes the returned value into the source code. In the following sections, I discuss why it is important for the developer of a serializable class to explicitly set the suid in the code.
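The effective suid of a class can also be read programmatically. The following sketch assumes two hypothetical classes, one that declares serialVersionUID and one that does not, and queries ObjectStreamClass for each. For the class without a declaration, the returned number is the computed default, which, to my understanding, is the same value serialver reports.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class that pins its suid explicitly.
class ExplicitSuid implements Serializable {
    private static final long serialVersionUID = 1999L;
    int aNum;
}

// Hypothetical class that relies on the computed default suid.
class DefaultSuid implements Serializable {
    int aNum;
}

class SuidLookupDemo {
    // Return the suid the serialization mechanism associates with the class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + suidOf(ExplicitSuid.class));
        System.out.println("default:  " + suidOf(DefaultSuid.class));
    }
}
```

Adding a non-static field to DefaultSuid and rerunning the program should print a different default value, while the explicit value stays at 1999 no matter how ExplicitSuid changes.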

Evolution of a serializable class

Changes to a serializable class can be either compatible or incompatible. A compatible change results in the evolved class being able to deserialize objects serialized by the class of a prior release. Conversely, the object serialized by the evolved class can be deserialized by the class's prior version. Compatible changes typically result from adding fields or objects to the object graph. With the addition of fields or objects, the default Java serialization read-mechanism simply provides default values for the corresponding attribute types. Examples of compatible changes are the following:

  • Add fields
  • Change a field from static to non-static
  • Change a field from transient to non-transient
  • Add classes to the object tree

Incompatible changes typically result from the removal of fields or objects from the object tree, or from changes that the default read-mechanism cannot map back onto the class. For example, when a class found in the prior version no longer exists in the evolved version, the default read-mechanism raises a ClassNotFoundException, whereas a field whose type no longer matches the stream, or a class whose suid no longer matches, results in an InvalidClassException.

Examples of incompatible changes include:

  • Delete fields
  • Change class hierarchy
  • Change non-static to static
  • Change non-transient to transient
  • Change type of a primitive field

For details on the compatible and incompatible Java type changes to a serializable class, see the Java Object Serialization Specification.

Having introduced the notion of a suid and the compatibility issues involving Java type changes, I am ready to discuss the importance of version control when using Java serialization. First, I define release compatibility; then I present how Java serialization supports version control.

Release compatibility

To preserve an application's stability, release compatibility requires that a new version of an application run interchangeably with that of a prior version, with both exhibiting the same expected behavior. When release compatibility breaks, the client application can no longer run without having to be recompiled or even re-implemented.

With respect to Java serialization, release compatibility requires the following:

  1. A new version of a serializable class observes its externalized interface so that its client does not need to change.
  2. The evolved class can deserialize objects serialized by a prior version. Conversely, the prior class can deserialize objects serialized by the evolved class.

This level of compatibility is critical in supporting interoperability across releases in a mixed-version environment or in store-and-retrieve usage scenarios previously described.

In the store-and-retrieve scenario, an application serializes an object and stores it in the database. After this class is enhanced and released in a new application version, the application should be able to retrieve and deserialize the stored object into a runtime object.

In the mixed-release scenario, the producer-consumer relationship is bidirectional. The newly evolved serializable class must be able to deserialize the data serialized by a prior implementation. Conversely, the previously released serializable class must be able to deserialize the object serialized by a newer release.

Breaking serialization compatibility has serious consequences, including:

  • Broken product migration: Software products may store the installation, configuration, and runtime state information as serialized objects. When serialization compatibility breaks, the new product version will not be able to read the data stored by the prior version, complicating the configuration of a new product release.
  • Broken mixed-version interoperability: When one or more products or applications share a serializable class, breaking serialization compatibility in such a class will result in these products or applications failing to exchange runtime objects between Java runtime instances.
  • Broken customer applications: Often, a customer application stores business transactions as serialized objects in a database. Subsequently, sometimes weeks or months later, the same application or another application retrieves these serialized objects from the database. If the serializable class used in an application is upgraded, resulting in incompatible serialization changes, the application will fail to deserialize these objects and generate a Java exception.

Java serialization and version control

The Java serialization mechanism provides an approach for version control. ObjectOutputStream writes objects to the stream, and the ObjectStreamClass descriptor it relies on implements the computeSerialVersionUID() method to compute the suid during object serialization. This computed suid is then written to the serialized stream as a part of the serialized object. This suid in the stream serves as a marker to denote the version of the serializable class with which it is compatible. See Figure 2 for a sample of metadata stored in a serialized object.

Figure 2. Metadata associated with a serialized object.

The suid is computed from the serializable class's name, its interfaces, and its sorted field and method names together with their modifiers. These values are passed through the secure hash algorithm to derive a long integer value for the suid. When the suid is not explicitly set in a serializable class's code, the Java default mechanism automatically and silently computes its value. This suid value is then placed in the serialized object. When a class changes, such as by the addition of a non-static field, the new class will have a suid value that differs from the prior version.

Before a serialized object is read, the ObjectInputStream class first computes the suid of the local class—the serializable class. It then matches this suid value against the one stored in the serialized object stream. When these two values match, ObjectInputStream reads off the fields in the stream and maps the values into the instantiated object. If these two values do not match, ObjectInputStream raises the InvalidClassException.
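The check described above can be provoked without maintaining two versions of a class. The sketch below is an illustration under stated assumptions: the Invoice class and the helper names are hypothetical, and the stream offsets rely on the layout given in the Java Object Serialization Specification for a stream whose first item is an ordinary object (magic number, version, TC_OBJECT, TC_CLASSDESC, class name, then the 8-byte suid). It alters the suid stored in the stream to imitate data written by a different class version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical business object, used only for this illustration.
class Invoice implements Serializable {
    private static final long serialVersionUID = 1999L;
    int total = 42;
}

class SuidMismatchDemo {
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Return a copy of the stream whose stored suid differs from the original.
    static byte[] withAlteredSuid(byte[] stream) {
        byte[] copy = stream.clone();
        // Assumed layout: magic(2) version(2) TC_OBJECT(1) TC_CLASSDESC(1)
        // nameLength(2) name(nameLength) suid(8)
        int nameLength = ((copy[6] & 0xFF) << 8) | (copy[7] & 0xFF);
        int suidOffset = 8 + nameLength;
        copy[suidOffset + 7] ^= 0x01; // flip the lowest bit of the stored suid
        return copy;
    }

    // True when deserialization is rejected with InvalidClassException.
    static boolean readFails(byte[] stream) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] original = serialize(new Invoice());
        System.out.println("original rejected: " + readFails(original));
        System.out.println("altered rejected:  " + readFails(withAlteredSuid(original)));
    }
}
```

The unmodified stream deserializes normally because the stored suid equals the local class's suid; the altered copy is rejected before any field data is read.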

As illustrated in Figure 3, the original serializable class has a suid of 1, which is stored with the serialized object. When a non-static field is added to this class, Java serialization computes a new suid value of 2. Though adding a data member to a serializable class is a compatible change, the evolved class's computed suid value differs from the one stored with the serialized object.
