Apr 1, 1998 12:00 AM PT

Serialization grab bag

Answers to reader questions about serialization

JavaWorld readers have responded to the past three months of JavaBeans columns with some interesting questions. This month, we'll go through a grab bag of serialization topics in response to these questions. We'll look into the details of initializing transient fields when a JavaBean (or any other object) is deserialized. We'll also revisit the writeObject() and readObject() methods, and look at a new example of how they may be used. And, we'll take a look at another feature of the Java Object Serialization Specification, object validation, which checks an object after deserialization to ensure that it's valid.

Deserialization and the initialization of transients

This month I received the following reader comment:

... transient initializers aren't instantiated during object deserialization; that is, if you have: transient int fred = 4;, you would expect that this value would not be serialized during the saving process (it isn't) but that when deserializing, it would assume a default value of 4. However, it doesn't. It assumes a value of 0 because that's the default value for things of type int. I have found that the sensible solution to this problem is:

  1. Don't have any initializers in the class definition
  2. Have a method called init(), which sets up the default values
  3. Invoke init() after the constructor has been called

However, I have had a number of problems with this depending on how the invocation of the constructor has been called, and vice versa.

I would be interested to hear if you have any other ways of achieving this.

So, the complaint is that each transient variable in a deserialized object is initialized to the default value for its type, even if its declaration in the source file includes an initializer expression (such as = 4). I agree that this seems counterintuitive.

Even though transient variables are (by definition) not part of an object's persistent state, it would be reasonable to expect that the class initializes them with their initializers when the instance is created. But such is not the case. Transient variables do exist in the deserialized object (despite what the letter says), but their initializers are never run during deserialization.

Let's explore this situation a bit further. We'll try to reproduce the problem by creating a serializable class containing a transient field, then serialize and deserialize it and see what results. Let's define a class called SerTest that contains an int instance field, a String instance field, and a private transient int instance field. Then, we'll call a print() method at specific places in the code to see what the variable values are.

Just for fun, we'll give SerTest a base class called SerTestBase, which is not serializable but which also contains a transient field. (You'll see why shortly.) SerTest's transient field has an initializer expression that sets it to 12, and SerTestBase initializes its transient field to 99. Here's the source code for the test class and its main routine:

import java.io.*;

// Note that the base class is not serializable
class SerTestBase extends java.lang.Object
{
    private transient int iTransient_ = 99;

    public SerTestBase() {
        System.out.println("Called SerTestBase no-arg constructor");
    }

    public void print() {
        System.out.println("Base: iTransient_ = " + iTransient_);
    }
}

// Serializable subclass
class SerTest extends SerTestBase implements java.io.Serializable {

    protected int a_ = 25;
    protected String sTitle_ = "";
    private transient int iNotSerialized_ = 12;

    public SerTest() {
        System.out.println("Default constructor");
        print();
    }

    public SerTest(int a, String s) {
        System.out.println("2-arg constructor");
        a_ = a;
        sTitle_ = s;
        print();
    }

    // How to load myself from a stream
    private void readObject(java.io.ObjectInputStream in)
        throws IOException, ClassNotFoundException {
        System.out.println("BEFORE READ");
        print();
        in.defaultReadObject();
        System.out.println("AFTER READ");
        print();
    }

    public void print() {
        super.print();
        System.out.println("a=" + a_ + ", s=" + sTitle_ +
            ", NotSerialized = " + iNotSerialized_);
    }

    // How to write myself to a stream
    private void writeObject(java.io.ObjectOutputStream out)
        throws IOException {
        out.defaultWriteObject();
    }

    // Properties
    public void setTitle(String sTitle) { sTitle_ = sTitle; }
    public String getTitle() { return sTitle_; }

    public void setA(int aa) { a_ = aa; }
    public int getA() { return a_; }
}

//
// main()
//
class Demo1 {

    private static void Usage() throws java.io.IOException {
        System.out.println("Usage:\n\tDemo1 w file a string\n\tDemo1 r file");
        throw new IOException("ERROR");
    }

    public static void main(String[] args)
    {
        try {
            // Check the argument count before touching args[0]
            if (args.length < 1) { Usage(); }
            String cmd = args[0];

            if (cmd.equals("w")) {
                if (args.length != 4) { Usage(); }

                int aa = Integer.parseInt(args[2]);
                String ss = args[3];

                SerTest bar = new SerTest(aa, ss);

                FileOutputStream f = new FileOutputStream(args[1]);
                ObjectOutputStream s = new ObjectOutputStream(f);

                System.out.println("Write: a=" + aa + ", s='" + ss + "'");

                s.writeObject(bar);
                s.close();      // flushes and closes the file
            } else if (cmd.equals("r")) {
                if (args.length != 2) { Usage(); }

                FileInputStream f = new FileInputStream(args[1]);
                ObjectInputStream s = new ObjectInputStream(f);

                System.out.println("Read SerTest:");

                SerTest bar = (SerTest) s.readObject();
                s.close();

                System.out.println("RECEIVED OBJECT:");
                bar.print();
            } else {
                System.err.println("Unknown command " + cmd);
                Usage();
            }
        }
        catch (Exception ex) {
            System.out.println("Exception: " + ex.getMessage());
            ex.printStackTrace();
        }
    }
}

The Demo1 class writes a SerTest object to a disk file or reads one back, and has the object print itself after it is constructed or read. So, let's run it and see what happens. First, we use the w command to serialize an object; here's the result:

C:\Java>java Demo1 w hal.ser 2001 Kubrick
Called SerTestBase no-arg constructor
2-arg constructor
Base: iTransient_ = 99
a=2001, s=Kubrick, NotSerialized = 12
Write: a=2001, s='Kubrick'

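Of course, the displayed values of 12 for iNotSerialized_ and 99 for iTransient_ weren't stored in the object stream, as both fields are marked transient. (iTransient_ wouldn't have been written anyway: SerTestBase isn't serializable, so none of its fields go into the stream.) Let's see what happens when we deserialize: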

001 C:\Java>java Demo1 r hal.ser
002 Read SerTest:
003 Called SerTestBase no-arg constructor
004 BEFORE READ
005 Base: iTransient_ = 99
006 a=0, s=null, NotSerialized = 0
007 AFTER READ
008 Base: iTransient_ = 99
009 a=2001, s=Kubrick, NotSerialized = 0
010 RECEIVED OBJECT:
011 Base: iTransient_ = 99
012 a=2001, s=Kubrick, NotSerialized = 0

Yikes! Something spooky is going on here. On lines 6, 9, and 12 of the output, the NotSerialized field is 0, not 12: its initializer never ran. What's more, the constructors for the SerTest class have System.out.println() calls in them, and yet we see no output from any SerTest constructor. You'll note that the constructor for SerTestBase was called, and the iTransient_ member was correctly initialized. But no constructor for the SerTest class is ever called! What's the story?

The Java Virtual Machine Specification states:

3.8 Special Initialization Methods

At the level of the Java Virtual Machine, every constructor (2.12) appears as an instance initialization method that has the special name <init>().

The purpose of a constructor is primarily to allow the programmer to initialize the fields of a new class instance. In the class file, each constructor is compiled into an instance initialization method named <init>(). Besides the constructor's own code, that method runs the superclass constructor and executes the class's instance variable initializers (such as the = 12 on iNotSerialized_). (Static initializers, by contrast, set up class variables; they run once per class, in a separate class initialization method called <clinit>().) (For more on object initialization, see Bill Venners' article, "Object initialization in Java" in last month's edition of JavaWorld.)

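Java's deserialization mechanism doesn't call the <init>() methods of serializable classes, so their fields are never initialized (except for the default values, which are set when the Java virtual machine allocates the object). Because the fields of a non-serializable superclass aren't in the stream at all, ObjectInputStream does run the no-arg constructor of the closest non-serializable superclass, which is why SerTestBase's constructor printed its message and iTransient_ came out as 99. Why doesn't the deserialization code initialize the serializable part of the object it's creating? Presumably, it's rather a waste of time to initialize variables that are about to be overwritten by values from the object stream, so the "deserializer" (ObjectInputStream) simply skips that part of the object creation process and "initializes" the object with the values in the object stream.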
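To see exactly which constructors run, here is a compact variant of the experiment that round-trips through an in-memory buffer instead of a file. The class names CtorDemo, CtorBase, and CtorDerived are made up for this sketch; each constructor records its call in a list, so we can check which ones deserialization actually invokes.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class CtorDemo {
    // Records which constructors have run, in order
    static final List<String> LOG = new ArrayList<>();
    // The most recently deserialized copy, kept for inspection
    static CtorDerived lastCopy;

    // Serialize obj to an in-memory buffer and read it back
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    // Returns only the constructor calls made during deserialization
    static List<String> deserializationLog() {
        CtorDerived original = new CtorDerived();  // logs CtorBase(), CtorDerived()
        LOG.clear();                               // forget the calls made by "new"
        lastCopy = (CtorDerived) roundTrip(original);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(deserializationLog());  // prints [CtorBase()]
        System.out.println(lastCopy.baseValue + " " + lastCopy.notSerialized);  // prints 99 0
    }
}

// Not serializable: its no-arg constructor and initializers run on deserialization
class CtorBase {
    int baseValue = 99;
    CtorBase() { CtorDemo.LOG.add("CtorBase()"); }
}

// Serializable: its constructor and field initializers are skipped
class CtorDerived extends CtorBase implements Serializable {
    transient int notSerialized = 12;
    CtorDerived() { CtorDemo.LOG.add("CtorDerived()"); }
}
```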

Because this initialization code (<init>()) is never called for SerTest when an object is being deserialized, the initializers for its transient fields are never executed; therefore, the transient variable retains its default value (in the case of our iNotSerialized_ variable above, 0).

What does it mean for a variable to be transient? The Java documentation defines transient variables as variables that are not part of the persistent state of an object. Sometimes a programmer will make a variable transient because the information it holds is only useful locally. An open file, for example, has a "descriptor" that would probably be different if calculated on another machine or opened in another process. It would make no sense to serialize an open file object, and then deserialize it elsewhere with an incorrect descriptor value: It simply wouldn't work. At other times, the transient keyword is used to keep a particular variable out of the serialization stream for security reasons. Perhaps a password is kept as cleartext (that is, decrypted, human-readable text) within an object, but it must be encrypted when it's placed in an output stream. In any case, neither transient nor static variables are written by default serialization.
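Here is a minimal sketch of the password case, assuming a hypothetical Account class: the cleartext field is transient, so default serialization skips it, and private writeObject() and readObject() methods (which we revisit below) write and read a scrambled copy instead. The XOR scramble() helper is a made-up placeholder that keeps the example self-contained; it is not encryption, and real code would use a proper cipher.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class PasswordDemo {
    static byte[] serialize(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    static Object deserialize(byte[] data) {
        try {
            return new ObjectInputStream(new ByteArrayInputStream(data)).readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Account("alice", "hunter2"));
        // ISO-8859-1 maps each byte to one char, so contains() is a raw byte search
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(raw.contains("hunter2"));  // prints false
        Account copy = (Account) deserialize(data);
        System.out.println(copy.getPassword());       // prints hunter2
    }
}

class Account implements Serializable {
    private String user_;
    private transient String password_;  // cleartext: kept out of the default form

    Account(String user, String password) {
        user_ = user;
        password_ = password;
    }

    String getPassword() { return password_; }

    // Toy reversible transformation (XOR each char with a constant).
    // A placeholder only: this is NOT encryption.
    private static String scramble(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] ^= 0x5A;
        }
        return new String(chars);
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();           // writes user_ only
        out.writeUTF(scramble(password_));  // then the scrambled password
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();             // restores user_
        password_ = scramble(in.readUTF()); // XOR again to undo the scrambling
    }
}
```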

A problem arises when the class requires a transient variable to be initialized to a non-default value after construction or deserialization. You'd think that using an initializer expression (like we did with iTransient_ and iNotSerialized_ in the examples above) would work, but, as we've seen, for Serializable classes it doesn't. That's because instance variable initializers are executed by the <init>() method, and ObjectInputStream never calls it for a serializable class. So, how do we solve the problem of initializing transient instance variables?

There are several ways to solve this problem, one of which is outlined by the reader who asked the original question: writing an init() method that is called after the constructor. The reader indicated that there were problems with this approach. I think the most reasonable solution here is to write such a method (called initTransients() below) and then call it both from within each constructor and from readObject(), like this:

...
class SerTest extends SerTestBase implements java.io.Serializable {
...
    // New method
    private void initTransients() {
        iNotSerialized_ = 12;
    }
...
    // Changed methods
    public SerTest() {
        initTransients();
        System.out.println("Default constructor");
        print();
    }

    public SerTest(int a, String s) {
        System.out.println("2-arg constructor");
        a_ = a;
        sTitle_ = s;
        initTransients();
        print();
    }

    // How to load myself from a stream
    private void readObject(java.io.ObjectInputStream in)
        throws IOException, ClassNotFoundException {
        System.out.println("BEFORE READ");
        print();
        in.defaultReadObject();
        initTransients();
        System.out.println("AFTER READ");
        print();
    }
...
}

Adding this method, and calls to it, produces the expected results (the modified program is called Demo2 here):

001 C:\Java> java Demo2 r x.ser
002 Read SerTest:
003 Called SerTestBase no-arg constructor
004 BEFORE READ
005 Base: iTransient_ = 99
006 a=0, s=null, NotSerialized = 0
007 AFTER READ
008 Base: iTransient_ = 99
009 a=2001, s=Kubrick, NotSerialized = 12
010 RECEIVED OBJECT:
011 Base: iTransient_ = 99
012 a=2001, s=Kubrick, NotSerialized = 12

You can see that on line 6, the fields of the subclass are at their default values. You can also see that after the call to defaultReadObject() (line 9), a and s have their deserialized values, and NotSerialized has been initialized to 12 by the call to initTransients() in readObject().

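Note that SerTestBase needs no such treatment here: because it isn't serializable, its no-arg constructor runs during deserialization, and that initializes iTransient_ to 99. If SerTestBase were serializable, however, its constructor would be skipped as well, and you'd want to do something similar in SerTestBase (its own initTransients() method, called from its constructor and from its own readObject()) to properly initialize iTransient_.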
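Putting the pieces together, here is a self-contained sketch of the initTransients() pattern that round-trips through memory rather than a file. SerTest2 and InitTransientsDemo are hypothetical, trimmed-down stand-ins for the column's SerTest and Demo2, not the downloadable code. In this version the transient field has no initializer at all, so initTransients() is the single place that sets it.

```java
import java.io.*;

class InitTransientsDemo {
    // Serialize obj to an in-memory buffer and read it back
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        SerTest2 copy = (SerTest2) roundTrip(new SerTest2(2001, "Kubrick"));
        System.out.println(copy);  // prints a=2001, s=Kubrick, NotSerialized = 12
    }
}

class SerTest2 implements Serializable {
    protected int a_;
    protected String sTitle_;
    private transient int iNotSerialized_;  // no initializer: see initTransients()

    SerTest2(int a, String s) {
        a_ = a;
        sTitle_ = s;
        initTransients();
    }

    // The one place that gives transient fields their starting values
    private void initTransients() {
        iNotSerialized_ = 12;
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();  // restores a_ and sTitle_
        initTransients();        // constructors don't run here, so do it explicitly
    }

    int getNotSerialized() { return iNotSerialized_; }

    @Override
    public String toString() {
        return "a=" + a_ + ", s=" + sTitle_ + ", NotSerialized = " + iNotSerialized_;
    }
}
```

Keeping every transient initialization inside initTransients() means the constructors and readObject() can't drift apart as fields are added.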

In my opinion, this is a design flaw in the Serialization spec. It could be argued that it's better to skip the initializers of transient fields, thereby avoiding unnecessary work when the programmer doesn't want the transient initialized. Unfortunately, the fact that transient fields aren't initialized during deserialization is a booby trap for the unwary Java programmer. If any reader can think of a good reason that the initializers of transient variables should not run when an object is being deserialized, please write (see About the author below for contact information) and I'll summarize responses in a future column.

Calling readObject() and writeObject()

One alert reader has detected what at first glance looks to be a violation of Java's protection rules:

I just read your February column, and noticed something that has me wondering. You say that the [object stream classes in java.io] call the writeObject and readObject methods of a serializable object.

My problem here is that the writeObject and readObject are private, which should mean that they cannot be called from outside the class. So how does an [object stream] access these methods?

This is a good question. Let's go over what the readObject() and writeObject() methods are and how to use them, and then we'll answer the question in that context.

The "official" signatures of the writeObject() and readObject() methods that a serializable class can define, and that ObjectOutputStream and ObjectInputStream look for, are:

private void writeObject(java.io.ObjectOutputStream out)
     throws IOException;
private void readObject(java.io.ObjectInputStream in)
     throws IOException, ClassNotFoundException;

When ObjectOutputStream is told to serialize an object, it checks, for each serializable class in the object's class hierarchy, whether that class defines a writeObject() method with exactly the signature shown above. If the method exists, ObjectOutputStream calls it; otherwise, it writes that class's fields with the default mechanism, just as ObjectOutputStream.defaultWriteObject() would. (ObjectInputStream works the same way with readObject() and ObjectInputStream.defaultReadObject().)
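The lookup is easy to observe. In this minimal sketch (the classes HookDemo, Hooked, and NotHooked are made up for illustration), nothing in our own code calls writeObject() or readObject(), yet the stream invokes the private pair in Hooked. NotHooked declares writeObject() as public, which doesn't match the required signature; on current JDKs such a method is silently ignored and default serialization is used instead.

```java
import java.io.*;

class HookDemo {
    // Serialize obj to an in-memory buffer and read it back
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        roundTrip(new Hooked());
        roundTrip(new NotHooked());
        // prints "Hooked: 2, NotHooked: 0"
        System.out.println("Hooked: " + Hooked.calls + ", NotHooked: " + NotHooked.calls);
    }
}

// Private, non-static, returns void, takes one stream parameter:
// the stream finds and calls these methods by itself
class Hooked implements Serializable {
    static int calls = 0;  // static, so not part of the serialized state

    private void writeObject(ObjectOutputStream out) throws IOException {
        calls++;
        out.defaultWriteObject();
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        calls++;
        in.defaultReadObject();
    }
}

// Same method, but public: not the required signature, so it is never called
class NotHooked implements Serializable {
    static int calls = 0;

    public void writeObject(ObjectOutputStream out) throws IOException {
        calls++;
        out.defaultWriteObject();
    }
}
```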

Most of the time, a programmer will use writeObject() to write additional information about the object or about relationships to other objects. What additional information might a programmer want to write? One common example might be objects within a Serializable object that are not themselves serializable. In this case, it's up to the programmer to come up with a way to represent the nonserializable object, either by subclassing the object and adding writeObject and readObject methods to the subclass (and making the subclass serializable), or by defining the writeObject and readObject methods in the containing class.

For example, imagine you had an object, Silly, that implements Serializable, but it contains an instance of java.io.RandomAccessFile, which is not Serializable. You could define writeObject() like this:

001       transient RandomAccessFile rafFile_;
002       String sFilename_;   // name of the file; written by defaultWriteObject()
003       // ... other fields and methods ...
004 
005       private void writeObject(java.io.ObjectOutputStream out)
006         throws IOException {
007 
008         // First write all non-transient fields
009         out.defaultWriteObject();
010
011         // Now, write the position of the random access file
012         if (rafFile_ == null)
013         {
014                 out.writeLong(-1);
015         } else {
016                 out.writeLong(rafFile_.getFilePointer());
017         }
018     }
019 
020     private void readObject(java.io.ObjectInputStream in)
021         throws IOException, ClassNotFoundException {
022             // Deserialize non-transient fields (including sFilename_)
023             in.defaultReadObject();
024 
025             // Now, get the position, open the rafFile, and set position
026             long lPos = in.readLong();
027             if (lPos >= 0) {
028                 rafFile_ = new RandomAccessFile(sFilename_, "r");
029                 rafFile_.seek(lPos);
030             }
031     }
032 

(For brevity, these methods don't handle exceptions themselves; they simply let them propagate to the caller.) First look at writeObject(). Notice that I'm calling defaultWriteObject() at line 9. This writes all of the non-transient fields, including the file name sFilename_, to the ObjectOutputStream. Note that (in line 1), the RandomAccessFile is declared transient. This tells the serialization machinery to ignore that field. If rafFile_ weren't declared transient, defaultWriteObject() would throw a NotSerializableException when it tried to serialize the RandomAccessFile.

After writing the non-transient fields, I have to save the RandomAccessFile's state by writing its position as a long (lines 11-18). Here I'm assuming that when the object deserializes, the file it opens will still be accessible and will not have changed. (If that assumption isn't valid, then the corresponding code will have to be more complex.) Upon deserialization, the readObject() method calls defaultReadObject() first (line 23) to get the non-transient fields, then reads the seek position of the RandomAccessFile, and, if the position is valid, opens the file and seeks to that position (lines 25-30). The complete class file Silly.java is available for download, along with the other examples from this column, in the Resources section below. Experiment with different ways of saving the RandomAccessFile, or try making it non-transient and see what happens.
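Since Silly.java itself isn't reproduced here, here is a runnable sketch along the same lines. It assumes the file name lives in a non-transient String field, sFilename_, that the file is opened read-only, and that a temporary file created with File.createTempFile() stands in for real data. The class SillyDemo and the helper methods seek(), position(), and close() are hypothetical additions for the demo.

```java
import java.io.*;

class SillyDemo {
    // Create a 100-byte temp file, seek to pos, round-trip a Silly through
    // an in-memory stream, and return the restored object's file position
    static long roundTripPosition(long pos) {
        try {
            File tmp = File.createTempFile("silly", ".dat");
            tmp.deleteOnExit();
            FileOutputStream fos = new FileOutputStream(tmp);
            fos.write(new byte[100]);  // something to seek within
            fos.close();

            Silly original = new Silly(tmp.getPath());
            original.seek(pos);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            Silly copy = (Silly) in.readObject();

            long result = copy.position();
            original.close();
            copy.close();
            return result;
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripPosition(37));  // prints 37
    }
}

class Silly implements Serializable {
    private String sFilename_;                    // serialized normally
    private transient RandomAccessFile rafFile_;  // not serializable; saved by hand

    Silly(String filename) throws IOException {
        sFilename_ = filename;
        rafFile_ = new RandomAccessFile(filename, "r");
    }

    void seek(long pos) throws IOException { rafFile_.seek(pos); }

    long position() throws IOException {
        return (rafFile_ == null) ? -1 : rafFile_.getFilePointer();
    }

    void close() throws IOException {
        if (rafFile_ != null) rafFile_.close();
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();  // writes sFilename_
        out.writeLong(position()); // -1 if no file is open
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();    // restores sFilename_
        long lPos = in.readLong();
        if (lPos >= 0) {           // reopen the file and restore the position
            rafFile_ = new RandomAccessFile(sFilename_, "r");
            rafFile_.seek(lPos);
        }
    }
}
```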

Now that we're clear on when and how to define these writeObject() and readObject() methods, let's return to the original question: The writeObject() and readObject() members are private, so how can ObjectOutputStream and ObjectInputStream use them? Let's take the question one step further: How does the stream class access protected and private fields? The answer is the same for both fields and methods: black magic.

If you look at the JDK 1.1 source code for java.io.ObjectOutputStream, you'll see this method declaration:

    private native boolean invokeObjectWriter(Object o, Class c)
    throws IOException;

This is a native method declaration: This method is not written in Java, but in another language (probably C, or perhaps C++). The protection mechanisms of protected and private simply don't apply to native methods: They can do anything they like. And in the case of this method, what it likes to do is to call writeObject() for Object o, private notwithstanding. There is an analogous native method in ObjectInputStream that runs readObject() (if it exists), and other methods that write out all class fields, regardless of their protection level. Where are these methods (invokeObjectWriter() and friends) actually defined? In the java executable, of course.
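Current JDKs no longer need a dedicated native method for this: as far as I know, java.io.ObjectStreamClass locates the private hooks with reflection and calls setAccessible(true) on them. Here is a minimal sketch of that same idea using only java.lang.reflect; the Secretive class and the callPrivate() helper are made up for illustration, and this is not the JDK's actual code. Note that setAccessible() is itself a guarded operation: under a security manager, untrusted code needs ReflectPermission("suppressAccessChecks") to use it.

```java
import java.lang.reflect.Method;

class ReflectDemo {
    // Find and invoke a private, no-argument instance method by name
    static Object callPrivate(Object target, String name) {
        try {
            Method m = target.getClass().getDeclaredMethod(name);
            m.setAccessible(true);  // suppress the "private" access check
            return m.invoke(target);
        } catch (ReflectiveOperationException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(callPrivate(new Secretive(), "secret"));  // prints 42
    }
}

class Secretive {
    private int secret() { return 42; }
}
```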

The existence of these native methods is not a security hole from the Java language point of view because they are trusted methods. They're part of the Java core. You lose all guarantee of security (and possibly a lot of portability) if you write your own native methods, because such methods essentially have unlimited power. (This is one of the major problems with some other component systems, which will remain unnamed for now. All of their methods essentially are native, and so the components can do virtually anything, including nasty things.) The trusted methods in the Java core can be called without fear of a security breach, since they're in the java executable. It's possible that someone could hack that executable, of course, so maintaining security system-wide is as important as always. Java doesn't solve all of the world's problems, only some of them.

This raises one final question. Why are the writeObject() and readObject() methods defined as private, anyway? The answer is straightforward: Allowing other objects to call writeObject() and readObject() would break data encapsulation. One of the hallmarks of object-oriented programming is that the implementation of the services an object provides is hidden from the user. The only access to an object should be its public interface (for other objects), or its public and protected interfaces (for subclasses). If you could call writeObject() at will, it would be possible to bust apart the resulting stream and find out what is inside the object. (Of course, you would never do such a thing, but I'll bet you could name co-workers who would, right?) Similarly, if you could call readObject() for an object, you could change an object's state any way you like.

Validating objects after deserialization

An additional topic in object deserialization that we haven't yet covered is object validation. The readObject() method that you write for your class can check the consistency of the object once its fields are deserialized, and throw an exception if the data within that one object are inconsistent. But the serialization mechanism in ObjectOutputStream "chases" object references and serializes entire object structures, and while readObject() is running, the objects it refers to may not be completely restored yet (for example, when the graph contains cycles). How can you check data correctness constraints that involve more than one object? The validateObject() method is where an application developer places code that checks consistency between objects.

The ObjectInputStream class has a method called registerValidation() that an object uses to tell the stream that its validateObject() method should be called after it is completely deserialized. "Completely deserialized" means that the entire object graph being read (all of the object's fields, all of the objects it references, and so on) has been deserialized; the callbacks run just before the outermost readObject() call returns. This allows the object to check information from referenced objects to make decisions about data consistency. If data are inconsistent, the validateObject() method can throw an InvalidObjectException, which is then thrown from that readObject() call.

Any object that wishes to perform validation after deserialization must implement the following interface:

package java.io;

public interface ObjectInputValidation {
    public void validateObject() throws InvalidObjectException;
}

The object must also call the ObjectInputStream.registerValidation() method to register for validation after the entire object graph is deserialized. That call is legal only while the stream is reading objects, so it is normally made from the class's readObject() method; otherwise, registerValidation() throws a NotActiveException. The first argument to registerValidation() is the object that implements validateObject(), usually a reference to the object itself. Note that any object that implements validateObject() may serve as an object validator, but usually the object validates its own references to other objects. The second argument to registerValidation() is an int priority that determines the callback order: callbacks registered with higher priorities are called before those with lower ones. The order of callbacks within a priority is undefined.

When an entire object graph has been deserialized, the ObjectInputStream goes through the list of objects, calling validateObject() on each object that registered, in descending order of priority.
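The priority rule is easy to check with a two-object graph. In this sketch (ValidatedNode and PriorityDemo are made-up names), the child is read, and registers, first; yet the root's callback runs first because it registered with the higher priority.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class PriorityDemo {
    // Names of validated objects, in callback order
    static final List<String> ORDER = new ArrayList<>();

    // Serialize obj to an in-memory buffer and read it back
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        roundTrip(new ValidatedNode("root", 5, new ValidatedNode("child", 0, null)));
        System.out.println(ORDER);  // prints [root, child]
    }
}

class ValidatedNode implements Serializable, ObjectInputValidation {
    String name;
    int priority;
    ValidatedNode child;

    ValidatedNode(String name, int priority, ValidatedNode child) {
        this.name = name;
        this.priority = priority;
        this.child = child;
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();                 // also reads (and registers) the child
        in.registerValidation(this, priority);  // legal only inside readObject()
    }

    public void validateObject() throws InvalidObjectException {
        PriorityDemo.ORDER.add(name);
    }
}
```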

You can use object validation to do all of your validation checking within an object as well as for verifying references between objects. For example, imagine you serialized a tree of DirectoryEntry objects, which represented a structure of directories and files that another part of your application (the part that deserializes the object) assumes to exist: a class file hierarchy, maybe. The next time your application starts, it loads the DirectoryEntry tree from storage (a file, a database, or whatever). You could write the deserialization code like this:

import java.io.*;
import java.util.Vector;

class DirectoryEntry implements Serializable {
   // An instance of this represents a file
   protected String sName_;   // assumed here: the name of this file or directory
   // ...other methods and fields...
}

class DirectorySubdir
     extends DirectoryEntry
     implements ObjectInputValidation {
  Vector vecFiles_;      // Files in this directory
  Vector vecSubdirs_;    // Subdirectories
  // ... other methods ...

  private void readObject(ObjectInputStream streamIn)
       throws ClassNotFoundException, IOException
  {
    // Deserialize this class's fields
    streamIn.defaultReadObject();
    // Register to validate self once the whole graph has been read
    streamIn.registerValidation(this, 0);
  }

  public void validateObject() throws InvalidObjectException
  {
    System.out.println("Validating object " + sName_);
    // Check that I have at least one file
    if (vecFiles_ == null || vecFiles_.isEmpty())
    {
      throw new InvalidObjectException(sName_ +
                                       ": Found a directory with no files");
    }
  }
}

When this directory structure is deserialized, each call to readObject() registers a validation callback on the object being read. The callbacks don't run as each node finishes; they run after the entire tree has been read, just before the outermost ObjectInputStream.readObject() call returns, at which point validateObject() is called back for every directory that registered.

Currently, the validateObject() for a directory simply throws an InvalidObjectException if any directory contains no files. The example program, TestDirs.java (download it from Resources below), creates, serializes, and deserializes a "directory structure" when run with no arguments. Specifying any argument to the program causes the bottom-level directory to be created empty, which causes validateObject() to fail for that directory. validateObject() might be modified to check that all files and directories exist on the machine where the object is deserialized, or any number of other checks.
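TestDirs.java isn't reproduced here, so here is a self-contained sketch in the same spirit. It assumes a simplified, hypothetical Dir class that stores file names as strings and subdirectories as Dir objects, and it reports the message of the InvalidObjectException that readObject() throws when a directory with no files turns up.

```java
import java.io.*;
import java.util.Vector;

class TreeValidationDemo {
    // Round-trip the tree; returns null on success, else the validation message
    static String roundTripError(Dir root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(root);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            in.readObject();
            return null;
        } catch (InvalidObjectException ex) {
            return ex.getMessage();  // thrown by some validateObject()
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    // root/ holds readme.txt and a subdirectory leaf/, which may be left empty
    static Dir buildTree(boolean emptyLeaf) {
        Dir leaf = new Dir("leaf");
        if (!emptyLeaf) leaf.vecFiles_.add("leaf.txt");
        Dir root = new Dir("root");
        root.vecFiles_.add("readme.txt");
        root.vecSubdirs_.add(leaf);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(roundTripError(buildTree(false)));  // prints null
        System.out.println(roundTripError(buildTree(true)));   // prints leaf: Found a directory with no files
    }
}

class Dir implements Serializable, ObjectInputValidation {
    String sName_;
    Vector<String> vecFiles_ = new Vector<>();  // file names in this directory
    Vector<Dir> vecSubdirs_ = new Vector<>();   // subdirectories

    Dir(String name) { sName_ = name; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        in.registerValidation(this, 0);  // runs after the whole tree is read
    }

    public void validateObject() throws InvalidObjectException {
        if (vecFiles_.isEmpty()) {
            throw new InvalidObjectException(sName_ + ": Found a directory with no files");
        }
    }
}
```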

A word of warning: The tree we've defined in this example is a DAG (directed acyclic graph), but in the general case an object graph can contain cycles. ObjectInputStream handles cycles correctly (it doesn't loop infinitely, because it keeps track of the objects it has already read). Your validateObject() code must be just as careful: if it walks references between objects, watch for cycles or it may never terminate. In fact, finding cycles in a deserialized object structure may be one application of validateObject().

Mark Johnson has a B.S. in computer and electrical engineering from Purdue University (1986). He is a fanatical devotee of the design pattern approach in object-oriented architecture, of software components in theory, and of JavaBeans in practice. Over the past several years, he worked for Kodak, Booz-Allen and Hamilton, and EDS in Mexico City, developing Oracle and Informix database applications for the Mexican Federal Electoral Institute and for Mexican Customs. He currently works as a designer and developer for Object Products in Fort Collins, CO.

Learn more about this topic