Ensure proper version control for serialized objects

Java serialization and version control for release compatibility

Java's Serializable interface provides an easy-to-use programming interface for converting between a runtime object and a byte stream. Serialization involves mapping a runtime object or an object graph into an ObjectOutputStream, which can then be written to the filesystem or stored in a database. Conversely, deserialization reads the byte stream through the ObjectInputStream and then maps the byte stream into an object or an object graph.
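As a minimal sketch of that round trip, the following example serializes an object into an in-memory byte array and reads it back. The `CatalogOrder` class and the `RoundTripDemo` helper are hypothetical names chosen only for illustration; a file or a database column could stand in for the byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical business object used only for this illustration.
class CatalogOrder implements Serializable {
    private static final long serialVersionUID = 1L;
    final String item;
    final int quantity;

    CatalogOrder(String item, int quantity) {
        this.item = item;
        this.quantity = quantity;
    }
}

class RoundTripDemo {
    // Serialization: map the object graph into a byte stream.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialization: map the byte stream back into an object graph.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CatalogOrder copy =
            (CatalogOrder) fromBytes(toBytes(new CatalogOrder("widget", 3)));
        System.out.println(copy.item + " x " + copy.quantity);
    }
}
```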

During object serialization, the default Java serialization mechanism writes the metadata about the object, which includes the class name, field names and types, and superclass. This class definition is stored as a part of the serialized object. This stored metadata enables the deserialization process to reconstitute the objects and map the stream data into the class attributes with the appropriate type.
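The metadata described above can be inspected through the `java.io.ObjectStreamClass` descriptor. The sketch below, which uses a hypothetical `StateSnapshot` class and `DescriptorDemo` helper, lists the class name and the fields that default serialization would record; static and transient fields should not appear.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Hypothetical class used only for this illustration.
class StateSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    static String label = "static, not part of the stream";
    transient Thread worker;   // transient, not part of the stream
    int count;
    String owner;
}

class DescriptorDemo {
    // Builds a one-line summary: class name followed by type:name pairs
    // for every field that default serialization writes.
    static String describe(Class<?> type) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        StringBuilder sb = new StringBuilder(desc.getName());
        for (ObjectStreamField f : desc.getFields()) {
            sb.append(' ')
              .append(f.getType().getSimpleName())
              .append(':')
              .append(f.getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(StateSnapshot.class));
    }
}
```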

Java serialization supports two scenarios for data interchange. The first scenario is store-and-retrieve object usage: a Java-based tool or application stores its runtime state information in one session and makes it available to a subsequent session. Java serialization allows the state objects to be written into a filesystem or a database, then retrieved later and reconstituted into the proper state objects.

The second scenario involves two runtime environments interoperating through shared objects. In this case, one application, a catalog ordering application, for example, produces a business object, such as an order, which is then handled by another application, say, a fulfillment system, running on a different application server instance. Java serialization enables these business objects to be sent to another server instance by writing the object content as a byte stream. The application running on the other server then deserializes the byte stream back into the proper business objects.

Figure 1. Two common scenarios for Java serialization.

In this article, I use these two scenarios to illustrate best practices for preventing release compatibility from breaking when implementing Java serialization. I begin by discussing how Java serialization provides version control.

Java serialization and SerialVersionUID

In anticipating the need to evolve a serializable class, Java serialization provides a serialVersionUID, also called suid, in the ObjectStreamClass for version control. suid is used to inform the Java serialization mechanism which version of the class is compatible with this serialized object. However, the importance of this field is often overlooked, resulting in release incompatibility.

 

import java.io.*;

public class MySerializableClass implements Serializable {
    static String notSerializableString = "static field not serialized";
    transient Thread T1; /* Transient field not serialized */

    public int aNum = 0;
    public String serializedString = "a serialized string";

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        // Followed by customized serialization code
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Followed by customized deserialization code
    }
}

The code snippet above shows a basic implementation of the Java Serializable interface. The trace listed below shows the object's use, resulting in an InvalidClassException. This trace reveals a mismatch of the serialVersionUID found in the local class, i.e., the MySerializableClass, and the one stored in the serialized object. In this example, the addition of a non-static field in MySerializableClass results in a different suid associated with this class. This exception illustrates the following:

  1. That changes to a serializable class result in a different computed suid.
  2. How the Java serialization mechanism matches the suid in the current version of the serializable class with the value saved in the object serialized with the class's prior version. This approach provides Java serialization its version control mechanism.

java.io.InvalidClassException: MySerializableClass; Local class not compatible: stream classdesc serialVersionUID=7187368850772554122 local class serialVersionUID=4104486680721271886
    at java.io.ObjectStreamClass.validateLocalClass(ObjectStreamClass.java:565)
    at java.io.ObjectStreamClass.setClass(ObjectStreamClass.java:609)
    at java.io.ObjectInputStream.inputClassDescriptor(ObjectInputStream.java:981)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:402)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:272)
    at java.io.ObjectInputStream.inputObject(ObjectInputStream.java:1231)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:272)
    at MyDeserializeClient.main(MyDeserializeClient.java:27)

The suid is set in one of two ways, with the first being the default mechanism described above. The Java serialization mechanism automatically computes a hash value. ObjectStreamClass's computeSerialVersionUID method passes the class name, sorted member names, modifiers, and interfaces to the secure hash algorithm (SHA), which returns a hash value. This computation technique ensures that most changes in an object's "shape" and attribute types result in different hash values. I elaborate on the significance of this default suid generation later when I discuss release compatibility.

In the second approach, the developer explicitly sets suid in the serializable class, as shown in the code below:

 

import java.io.*;

public class MySerializableClass implements Serializable {
    private static final long serialVersionUID = 1999L;

    static String notSerializableString = "static field not serialized";
    transient Thread T1; /* Transient field not serialized */

    public int aNum = 0;
    public String serializedString = "a serialized string";

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        // Followed by customized serialization code
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Followed by customized deserialization code
    }
}

A developer can set the suid to any long integer value. Alternatively, a developer can pass the class name to the JDK's serialver command, which computes a hash value using the same algorithm described above. The developer then pastes the returned value into the source code. In the following sections, I discuss why it is important for the developer of a serializable class to explicitly set the suid in the code.
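The suid that the runtime associates with a class can also be read programmatically through `ObjectStreamClass.lookup`, which should agree with what serialver reports for the same class. In this sketch, `ExplicitSuid`, `ImplicitSuid`, and `SuidLookupDemo` are hypothetical names; the first class declares its suid and the second leaves the computation to the default mechanism.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Declares its own suid, as in the listing above.
class ExplicitSuid implements Serializable {
    private static final long serialVersionUID = 1999L;
    int aNum;
}

// Declares no suid, so the runtime computes a hash-based default.
class ImplicitSuid implements Serializable {
    int aNum;
}

class SuidLookupDemo {
    // Returns the declared suid if present, otherwise the computed default.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + suidOf(ExplicitSuid.class));
        System.out.println("implicit: " + suidOf(ImplicitSuid.class));
    }
}
```

The value printed for `ImplicitSuid` depends on the class's shape, so adding a field to it and recompiling would be expected to change that number, while the explicit value stays at 1999.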

Evolution of a serializable class

Changes to a serializable class can be either compatible or incompatible. A compatible change results in the evolved class being able to deserialize objects serialized by the class of a prior release. Conversely, the object serialized by the evolved class can be deserialized by the class's prior version. Compatible changes typically result from adding fields or objects to the object graph. With the addition of fields or objects, the default Java serialization read-mechanism simply provides default values for the corresponding attribute types. Examples of compatible changes are the following:

  • Add fields
  • Change a field from static to non-static
  • Change a field from transient to non-transient
  • Add classes to the object tree

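Incompatible changes typically result from the removal of fields or objects from the object tree, or from changes that alter how existing stream data must be interpreted. Some of these changes fail loudly. For example, when a class found in the prior version no longer exists in the evolved version, the default read-mechanism raises a ClassNotFoundException, and a change to a primitive field's type raises an InvalidClassException. Others fail quietly. According to the Java Object Serialization Specification, a deleted field raises no exception: an earlier version of the class that reads the stream simply sees the field's default value, which may prevent that version from fulfilling its contract.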

Examples of incompatible changes include:

  • Delete fields
  • Change class hierarchy
  • Change non-static to static
  • Change non-transient to transient
  • Change type of a primitive field

For details on the compatible and incompatible Java type changes to a serializable class, see the Java Object Serialization Specification.

Having introduced the notion of a suid and the compatibility issues involving Java type changes, I am ready to discuss the importance of version control when using Java serialization. First, I define release compatibility. Then I present how Java serialization supports version control.

Release compatibility

To preserve an application's stability, release compatibility requires that a new version of an application run interchangeably with that of a prior version, with both exhibiting the same expected behavior. When release compatibility breaks, the client application can no longer run without having to be recompiled or even re-implemented.

With respect to Java serialization, release compatibility requires the following:

  1. A new version of a serializable class observes its externalized interface so that its client does not need to change.
  2. The evolved class can deserialize objects serialized by a prior version. Conversely, the prior class can deserialize objects serialized by the evolved class.

This level of compatibility is critical in supporting interoperability across releases in a mixed-version environment or in store-and-retrieve usage scenarios previously described.

In the store-and-retrieve scenario, an application serializes an object and stores it in the database. After this class is enhanced and released in a new application version, the application should be able to retrieve and deserialize the stored object into a runtime object.

In the mixed-release scenario, the producer-consumer relationship is bidirectional. The newly evolved serializable class must be able to deserialize the data serialized by a prior implementation. Conversely, the previously released serializable class must be able to deserialize the object serialized by a newer release.

Breaking serialization compatibility has serious consequences, including:

  • Broken product migration: Software products may store the installation, configuration, and runtime state information as serialized objects. When serialization compatibility breaks, the new product version will not be able to read the data stored by the prior version, complicating the configuration of a new product release.
  • Broken mixed-version interoperability: When one or more products or applications share a serializable class, breaking serialization compatibility in such a class will result in these products or applications failing to exchange runtime objects between Java runtime instances.
  • Broken customer applications: Often, a customer application stores business transactions as serialized objects in a database. Subsequently, sometimes weeks or months later, the same application or another application retrieves these serialized objects from the database. If the serializable class used in an application is upgraded, resulting in incompatible serialization changes, the application will fail to deserialize these objects and generate a Java exception.

Java serialization and version control

The Java serialization mechanism provides an approach for version control. At the heart of this approach is the ObjectStreamClass class. Its computeSerialVersionUID() method computes the suid during object serialization when the class does not declare one. The suid is then written to the serialized stream as a part of the serialized object. This suid in the stream serves as a marker to denote the version of the serializable class with which it is compatible. See Figure 2 for a sample of metadata stored in a serialized object.

Figure 2. Metadata associated with a serialized object.

The suid is computed based on the serializable class's name, its modifiers, the names of its interfaces, and the sorted signatures of its fields, constructors, and methods. These values are passed through the secure hash algorithm to derive a long integer value for the suid. When the suid is not explicitly set in a serializable class's code, the Java default mechanism automatically and silently computes its value. This suid value is then placed in the serialized object. When a class changes, such as through the addition of a non-static field, the new class will have a suid value that differs from the prior version.

Before a serialized object is read, the ObjectInputStream class first computes the suid of the local class—the serializable class. It then matches this suid value against the one stored in the serialized object stream. When these two values match, ObjectInputStream reads off the fields in the stream and maps the values into the instantiated object. If these two values do not match, ObjectInputStream raises the InvalidClassException.

As illustrated in Figure 3, the original serializable class has a suid of 1, which is stored with the serialized object. When a non-static field is added to this class, Java serialization computes a new suid value of 2. Though adding a data member to a serializable class is a compatible change, the evolved class's computed suid value differs from the one stored with the serialized object.

During deserialization of a serialized object, the Java serialization mechanism computes the suid based on the evolved class, yielding a value of 2, which differs from the suid stored in the serialized object. Since Java serialization fails to match the suid values, it throws an InvalidClassException. This scenario highlights the risk to release compatibility when version control is not stringently exercised.

Figure 3. Two types of changes to a serializable object.
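The suid mismatch described above can be reproduced inside a single JVM with a small simulation. The classes `ShipmentV1` and `ShipmentV2` are hypothetical stand-ins for two releases of one class, and the `relabel` helper is a test-only trick, not a production technique: it rewrites the class name inside the byte stream so that data written by one class is read as the other. The trick assumes both names have the same length and that the name appears nowhere else in the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Stand-in for the prior release (hypothetical).
class ShipmentV1 implements Serializable {
    private static final long serialVersionUID = 1L;
    int quantity = 5;
}

// Stand-in for the evolved release: one added field and a different suid.
class ShipmentV2 implements Serializable {
    private static final long serialVersionUID = 2L;
    int quantity;
    String note;
}

class SuidMismatchDemo {
    // Simulation only: swap the class name recorded in the stream.
    // ISO-8859-1 maps every byte to one char, so the conversion is lossless.
    static byte[] relabel(byte[] stream, Class<?> from, Class<?> to) {
        String text = new String(stream, StandardCharsets.ISO_8859_1);
        return text.replace(from.getName(), to.getName())
                   .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns the exception message, or null if deserialization succeeded.
    static String tryRead() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new ShipmentV1());
            }
            byte[] patched =
                relabel(buffer.toByteArray(), ShipmentV1.class, ShipmentV2.class);
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(patched))) {
                in.readObject();
                return null;
            }
        } catch (InvalidClassException e) {
            return e.getMessage();   // the suid check failed
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead());
    }
}
```

Although the added `note` field is a compatible change on its own, the differing suid values are enough for the stream to be rejected before any field is read.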

As a best practice, a class that implements the Serializable interface must explicitly set the suid, such as:

 private static final long serialVersionUID = <long value>;

This practice rightfully puts the control and responsibility on the developer to determine the version of the serializable class with which a serialized object is compatible.

When compatible changes are made to the class, the suid is set to the same value as the prior release. This practice ensures that objects serialized by the prior release's class will have the same suid value. Therefore, these objects can be deserialized in the newly evolved class.

Since compatible changes involve the addition of new fields or objects, the Java serialization mechanism initializes these additional fields with the default value for the corresponding Java types. In addition, a developer can specify initial values for these fields during deserialization. In the case of evolving a serializable class with incompatible changes, typically, the developer has no quick and easy way to "fix" the incompatible serialized objects.
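A compatible change with an unchanged suid can be sketched as follows. `AccountV1` and `AccountV2` are hypothetical stand-ins for the prior and evolved releases of one class. To exercise both in a single JVM, the example swaps the class name inside the byte stream, a simulation trick that assumes the two names have equal length. The `"en-US"` initial value is an arbitrary choice for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Stand-in for the prior release (hypothetical).
class AccountV1 implements Serializable {
    private static final long serialVersionUID = 1L;
    String owner = "Ada";
}

// Stand-in for the evolved release: fields added, suid kept the same.
class AccountV2 implements Serializable {
    private static final long serialVersionUID = 1L;
    String owner;
    String locale;     // added field, given an initial value in readObject
    int loginCount;    // added field, left at the type default of 0

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (locale == null) {
            locale = "en-US";   // old streams carry no locale
        }
    }
}

class CompatibleChangeDemo {
    // Writes an AccountV1, relabels the stream, and reads it as AccountV2.
    static AccountV2 readOldStream() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new AccountV1());
            }
            // Simulation only: swap the class name recorded in the stream.
            String text =
                new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
            byte[] patched = text
                .replace(AccountV1.class.getName(), AccountV2.class.getName())
                .getBytes(StandardCharsets.ISO_8859_1);
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(patched))) {
                return (AccountV2) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AccountV2 a = readOldStream();
        System.out.println(a.owner + " " + a.locale + " " + a.loginCount);
    }
}
```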

Mitigation approaches when making incompatible changes

When incompatible changes are made to a serializable class, this evolved class is no longer compatible with any objects serialized by the prior release's class. Hence, a new value must be assigned to this new release's suid.

In the store-and-retrieve scenario, all serialized objects stored in a filesystem or database must be migrated to a format compatible with the new release. The migration effort involves retrieving these stored objects using the serializable class's prior release, then removing the fields no longer supported or recasting their values to the Java types expected in the new serializable class. Finally, these objects are serialized using the evolved class. This generates a suid in the serialized objects that is compatible with the newly released serializable class. This multistep migration effort aims to correct the stored serialized objects to the format expected by the evolved serializable class. As can be expected, this migration is unidirectional, supporting only forward compatibility, not backward compatibility.
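The migration steps above can be sketched as a small conversion routine. This is an illustration under simplifying assumptions: `LegacyInvoice` and `Invoice` are hypothetical classes with different names so that both can coexist in one program, whereas a real migration would have to load the prior release's class, for example from the old JAR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for the prior release's class (hypothetical).
class LegacyInvoice implements Serializable {
    private static final long serialVersionUID = 1L;
    int amountCents;
    String faxNumber;   // no longer supported in the new release

    LegacyInvoice(int amountCents, String faxNumber) {
        this.amountCents = amountCents;
        this.faxNumber = faxNumber;
    }
}

// Stand-in for the evolved class after incompatible changes (hypothetical).
class Invoice implements Serializable {
    private static final long serialVersionUID = 2L;   // new suid
    long amountCents;   // type changed from int to long

    Invoice(long amountCents) {
        this.amountCents = amountCents;
    }
}

class InvoiceMigration {
    static byte[] write(Object obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object read(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // 1. Read with the prior class. 2. Drop or recast fields.
    // 3. Write with the evolved class.
    static byte[] migrate(byte[] legacyBytes) {
        LegacyInvoice old = (LegacyInvoice) read(legacyBytes);
        Invoice current = new Invoice(old.amountCents);   // int widens to long
        return write(current);
    }

    public static void main(String[] args) {
        byte[] stored = write(new LegacyInvoice(1250, "555-0100"));
        Invoice migrated = (Invoice) read(migrate(stored));
        System.out.println(migrated.amountCents);
    }
}
```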

In the mixed-version scenario for which the serialized object exchanged between applications has to be corrected dynamically, no easy mitigation solution is available, and its discussion reaches beyond this article's scope. Any successful mitigation effort will involve catching and using the InvalidClassException to load a fix that will read the byte stream off the serialized object and dynamically reconstruct an object compatible with the right version.

Design considerations for using Java serialization

While the Serializable interface is easy to use, it does not adapt well to structural changes that are incompatible. Because Serializable is designed for ease of use, the serialized object's data structure is opaque and not directly accessible to the developer. As a result, when an incompatible change is made to a serializable class, release compatibility is difficult to maintain.

Besides the Serializable interface, at least three alternate approaches can serialize Java objects:

  1. For object serialization, instead of implementing the Serializable interface, a developer can implement the Externalizable interface, which extends Serializable. By implementing Externalizable, a developer is responsible for implementing the writeExternal() and readExternal() methods. As a result, a developer has sole control over reading and writing the serialized objects. For a detailed description of this technique, see the Externalizable interface's details in the Java Object Serialization Specification.
  2. XML serialization is an often-used approach for data interchange. This approach lags Java serialization in runtime performance, both in terms of the size of the serialized object and the processing time. With a speedier XML parser, the performance gap with respect to the processing time narrows. Nonetheless, XML serialization provides a more malleable solution when faced with changes in the serializable object.
  3. Finally, consider a "roll-your-own" serialization approach. You can write an object's content directly via either the ObjectOutputStream or the DataOutputStream. While this approach is more involved in its initial implementation, it offers the greatest flexibility and extensibility. In addition, this approach can provide a performance advantage over Java serialization.
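To illustrate the first alternative, here is a minimal Externalizable sketch in which the class writes its own format version ahead of its data. The `VersionedPoint` class, the `FORMAT_VERSION` constant, and the rule that version 1 streams carry no label are all assumptions made for this example, not part of any API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class that controls its own stream layout.
class VersionedPoint implements Externalizable {
    private static final long serialVersionUID = 1L;
    private static final int FORMAT_VERSION = 2;   // assumed: v1 had no label

    int x;
    int y;
    String label = "";

    public VersionedPoint() { }   // Externalizable needs a public no-arg constructor

    VersionedPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(FORMAT_VERSION);   // developer-managed version marker
        out.writeInt(x);
        out.writeInt(y);
        out.writeUTF(label);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int version = in.readInt();
        x = in.readInt();
        y = in.readInt();
        // Older streams lack the label, so fall back to an empty one.
        label = (version >= 2) ? in.readUTF() : "";
    }
}

class ExternalizableDemo {
    static VersionedPoint roundTrip(VersionedPoint p) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buffer.toByteArray()))) {
                return (VersionedPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        VersionedPoint p = roundTrip(new VersionedPoint(1, 2, "origin"));
        System.out.println(p.x + "," + p.y + " " + p.label);
    }
}
```

Because the read logic branches on a version number the developer wrote, a later release can change the layout and still read older streams, at the cost of maintaining that branching by hand.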

Since mitigating incompatible changes to an object serialized via the Serializable interface proves difficult, designing a serializable class must be deliberate. Before implementing a class using Java serialization, consider the following characteristics of the class:

  • Stability: How likely is this class to be substantially changed over the next couple of releases? A class at an early stage of maturity will more likely undergo major and incompatible changes.
  • Visibility: Will this class be used internally by the implementer or within the same development team where the implementer has more control over the change process? Or will it be published externally for general customer use, where the implementer is less free to mitigate the impact of change?
  • Usage pattern: Will this class be used in a store-and-retrieve scenario, where serialized objects outlive the release that wrote them? Or will it be used only in a transient real-time scenario, sending a Java object from one server instance to another for immediate use? Finally, will this class be used in a mixed-version environment, where different versions of this class may be deployed while needing to interoperate?

If a Java class is in its early stage of maturity and will be involved in mixed-release usages, this class is not likely to be a good candidate for implementing the Serializable interface.

Conclusion

Java's Serializable interface lets a developer write a runtime object into a byte stream and, conversely, reconstitute a runtime object by reading the data back from a byte stream. However, this ease of use comes with a price.

Without observing the version control feature provided by Java serialization, inadvertent changes to a serializable class can break release compatibility. Furthermore, implementing Serializable results in a design that is brittle to change. This article offers best practices and design considerations for safeguarding against breaking release compatibility and interoperability when implementing a serializable class.

Alan Hui is a senior IT architect with the IBM Software Group in Durham, North Carolina. As a technical consultant, he has worked with international clients to design and implement business integration solutions. Hui has developed and delivered workshops on object-oriented design and team development methods. He received his Ph.D. from the University of South Carolina in software engineering. His research interests include enterprise process and information modeling, Web services, and component development methodology. His personal interests include yoga, motorcycling, and digital image capture and creation.
