writeObject() and readObject() methods, and look at a new example of how they may be used. And, we'll take a look at another feature of the Java Object
Serialization Specification, object validation, which checks an object after deserialization to ensure that it's valid.
This month I received the following reader comment:
... transient initializers aren't instantiated during object deserialization; that is, if you have: transient int fred = 4;, you would expect that this value would not be serialized during the saving process (it doesn't) but that when deserializing, it would assume a default value of 4. However, it doesn't. It assumes a value of 0 because that's the default value for things of type int. I have found that the sensible solution to this plan is:
- Don't have any initializers in the class definition
- Have a method called init(), which sets up the default values
- Invoke init() after the constructor has been called
However, I have had a number of problems with this depending on how the invocation of the constructor has been called, and vice versa.
I would be interested to hear if you have any other ways of achieving this.
So, the complaint is that each transient variable in a deserialized object is initialized to the default value for its type, even if static initializers for those variables (initializer expressions such as the = 4 in the reader's declaration) appear in the source file. I agree that this seems counterintuitive.
Even though transient variables are (by definition) not part of an object's state, it would be reasonable to expect that the class initializes them with the static initializer when the instance is created. But such is not the case. Transient variables are instantiated (despite what the letter says), but their static initializations are never applied during deserialization.
Let's explore this situation a bit further. We'll try to reproduce the problem by creating a serializable class containing
a transient field, then serialize and deserialize it and see what results. Let's define a class called SerTest that contains an int instance field, a String instance field, and a private transient int instance field. Then, we'll put print() statements in the code at specific places to see what the variable values are.
Just for fun, we'll give SerTest a base class called SerTestBase, which is not serializable but which also contains a transient field. (You'll see why shortly.) SerTest's transient field has a static initializer expression that initializes it to 12, and SerTestBase statically initializes its transient to 99. Here's the source code for the test class and its main routine:
import java.io.*;

// Note that the base class is not serializable
class SerTestBase extends java.lang.Object
{
    private transient int iTransient_ = 99;

    public SerTestBase() {
        System.out.println("Called SerTestBase no-arg constructor");
    }

    public void print() {
        System.out.println("Base: iTransient_ = " + iTransient_);
    }
}

// Serializable subclass
class SerTest extends SerTestBase implements java.io.Serializable {

    protected int a_ = 25;
    protected String sTitle_ = new String("");
    private transient int iNotSerialized_ = 12;

    public SerTest() {
        System.out.println("Default constructor");
        print();
    }

    public SerTest(int a, String s) {
        System.out.println("2-arg constructor");
        a_ = a;
        sTitle_ = s;
        print();
    }

    // How to load myself from a stream
    private void readObject(java.io.ObjectInputStream in)
        throws IOException, ClassNotFoundException {
        System.out.println("BEFORE READ");
        print();
        in.defaultReadObject();
        System.out.println("AFTER READ");
        print();
    }

    public void print() {
        super.print();
        System.out.println("a=" + a_ + ", s=" + sTitle_ +
            ", NotSerialized = " + iNotSerialized_);
    }

    // How to write myself to a stream
    private void writeObject(java.io.ObjectOutputStream out)
        throws IOException {
        out.defaultWriteObject();
    }

    // Properties
    public void setTitle(String sTitle) { sTitle_ = sTitle; }
    public String getTitle() { return sTitle_; }

    public void setA(int aa) { a_ = aa; }
    public int getA() { return a_; }
}

//
// main()
//
class Demo1 {

    private static void Usage() throws java.io.IOException {
        System.out.println("Usage:\n\tDemo1 w file a string\n\tDemo1 r file");
        IOException ex = new IOException("ERROR");
        throw ex;
    }

    public static void main(String[] args)
    {
        String cmd = args[0];

        try {
            if (cmd.compareTo("w") == 0) {
                if (args.length != 4) { Usage(); }

                int aa = Integer.parseInt(args[2]);
                String ss = args[3];

                SerTest bar = new SerTest(aa, ss);

                FileOutputStream f = new FileOutputStream(args[1]);
                ObjectOutputStream s = new ObjectOutputStream(f);

                System.out.println("Write: a=" + aa + ", s='" + ss + "'");

                s.writeObject(bar);
                s.flush();
            } else if (cmd.compareTo("r") == 0) {
                if (args.length != 2) { Usage(); }

                FileInputStream f = new FileInputStream(args[1]);
                ObjectInputStream s = new ObjectInputStream(f);

                System.out.println("Read SerTest:");

                SerTest bar = (SerTest) s.readObject();

                System.out.println("RECEIVED OBJECT:");
                bar.print();
            } else {
                System.err.println("Unknown command " + cmd);
                Usage();
            }
        }

        catch (Exception ex) {
            System.out.println("Exception: " + ex.getMessage());
            ex.printStackTrace();
        }
    }
}
The Demo1 class writes or reads a SerTest object to or from a disk file, and tells the object to print itself after construction. So, let's run it and see what happens.
First, we tell the object to serialize; here's the result:
C:\Java>java Demo1 w hal.ser 2001 Kubrick
Called SerTestBase no-arg constructor
2-arg constructor
Base: iTransient_ = 99
a=2001, s=Kubrick, NotSerialized = 12
Write: a=2001, s='Kubrick'
Of course, the displayed values of 12 for iNotSerialized_ and 99 for iTransient_ weren't stored in the object stream, as both fields are marked transient. Let's see what happens when we deserialize:
001 C:\Java>java Demo1 r hal.ser
002 Read SerTest:
003 Called SerTestBase no-arg constructor
004 BEFORE READ
005 Base: iTransient_ = 99
006 a=0, s=null, NotSerialized = 0
007 AFTER READ
008 Base: iTransient_ = 99
009 a=2001, s=Kubrick, NotSerialized = 0
010 RECEIVED OBJECT:
011 Base: iTransient_ = 99
012 a=2001, s=Kubrick, NotSerialized = 0
Yikes! Something spooky is going on here. On lines 6, 9, and 12 of the output, the NotSerialized field is not being initialized. What's more, the constructors for the SerTest class have System.out.println() calls in them, and yet we see no output from any SerTest constructor. You'll note that the constructor for SerTestBase was called, and the iTransient_ member was correctly initialized. But no constructor for the SerTest class is ever called! What's the story?
The Java Virtual Machine Specification states:
3.8 Special Initialization Methods
At the level of the Java Virtual Machine, every constructor (2.12) appears as an instance initialization method that has the special name <init>().
The purpose of a constructor is primarily to allow the programmer to initialize the fields of a new class instance. Constructors
are actually implemented in a class file within a class's instance initialization method, called <init>(). This method is responsible both for executing the constructor code on an object when the object is first created and for executing static initializations on object fields. (For more on object initialization, see Bill Venners' article, "Object initialization in Java" in last month's edition of JavaWorld.)
Java's deserialization mechanism doesn't call the <init>() method of a serializable class, so the serializable part of the object is never initialized (except for the default values of the fields, which are set when the Java virtual machine allocates them). The one constructor it does run is the no-arg constructor of the nearest non-serializable superclass, which is why SerTestBase's constructor was called and iTransient_ was set to 99. Why doesn't the deserialization code initialize an object it's creating? Presumably, it's rather a waste of time to initialize variables that are about to be overwritten by a deserialized object stream, so the "deserializer" (ObjectInputStream) simply skips that part of the object creation process and "initializes" the object with the values in the object stream.
Because this initialization code (<init>()) is never called for SerTest when an object is being deserialized, the static initializations for its transient fields are never executed; therefore, the transient variable retains its default initialization value (in the case of our NotSerialized variable above, the value is 0).
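If you'd like to reproduce the effect without a disk file or command-line arguments, here is a minimal sketch that round-trips an object through a byte array. The class names Base, Child, and TransientDemo and the helper roundTrip() are made up for this illustration; they are not part of the column's download.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable base class. Its no-arg constructor runs
// during deserialization, so its field initializer is applied.
class Base {
    transient int baseValue = 99;
}

// Hypothetical serializable subclass. None of its constructors or field
// initializers run during deserialization.
class Child extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    int saved = 25;
    transient int notSaved = 12;
}

class TransientDemo {
    // Serialize the object to a byte array, then read it straight back.
    static Child roundTrip(Child original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Child) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Child original = new Child();
        original.saved = 2001;
        Child copy = roundTrip(original);
        // prints: saved=2001, notSaved=0, baseValue=99
        System.out.println("saved=" + copy.saved
                + ", notSaved=" + copy.notSaved
                + ", baseValue=" + copy.baseValue);
    }
}
```

The serialized field comes back with its stored value, the subclass's transient falls back to 0, and the base class's transient is 99 because the base class constructor really did run.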
What does it mean for a variable to be transient? transient variables are defined in the Java documentation as variables that are not part of the persistent state of an object. Sometimes
a programmer will make a variable transient because the information in that structure is only useful locally. An open file, for example, has a "descriptor" that would
probably be different if calculated on another machine or opened in another process. It would make no sense to serialize an
open file object, and then deserialize it elsewhere with an incorrect descriptor value: It simply wouldn't work. At other
times, the transient keyword is used to prevent some particular variable from being included in the serialization stream for security reasons.
Perhaps a password is kept as cleartext (that is, decrypted, human-readable text) within an object, but it must be encrypted
when it's placed in an output stream. In any case, transient and static variables are not included in the output stream.
A problem arises when the class requires a transient variable to be initialized to a non-default value after construction or deserialization. You'd think that using a static
initializer (like we did with iTransient_ and iNotSerialized_ in the examples above) would work, but, as we've seen, for Serializable classes, it doesn't. That's because instance variables are initialized by that <init>() method, and ObjectInputStream never calls it. So, how do we solve the problem of initializing transient instance variables?
There are several ways to solve this problem, one of which is outlined by the person who asked the original question: by writing
an init() function that can be called after the constructor is called. The reader indicated that there were problems with this approach.
I think the most reasonable solution here is to write the init() function, and then call it both from within the constructor and from readObject(), like this:
...
class SerTest extends SerTestBase implements java.io.Serializable {
    ...
    // New method
    private void initTransients() {
        iNotSerialized_ = 12;
    }
    ...
    // Changed methods
    public SerTest() {
        initTransients();
        System.out.println("Default constructor");
        print();
    }

    public SerTest(int a, String s) {
        System.out.println("2-arg constructor");
        a_ = a;
        sTitle_ = s;
        initTransients();
        print();
    }

    // How to load myself from a stream
    private void readObject(java.io.ObjectInputStream in)
        throws IOException, ClassNotFoundException {
        System.out.println("BEFORE READ");
        print();
        in.defaultReadObject();
        initTransients();
        System.out.println("AFTER READ");
        print();
    }
    ...
}
Adding this method, and calls to it, produces the expected results:
001 C:\Java> java Demo2 r x.ser
002 Read SerTest:
003 Called SerTestBase no-arg constructor
004 BEFORE READ
005 Base: iTransient_ = 99
006 a=0, s=null, NotSerialized = 0
007 AFTER READ
008 Base: iTransient_ = 99
009 a=2001, s=Kubrick, NotSerialized = 12
010 RECEIVED OBJECT:
011 Base: iTransient_ = 99
012 a=2001, s=Kubrick, NotSerialized = 12
You can see that on line 6, the fields of the subclass are at their default values. You can also see that after the call to
readObject(), a and s have their deserialized values, and NotSerialized has been initialized to 12 via the call to initTransients() embedded in readObject().
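Note that SerTestBase needs no such change as long as it stays non-serializable: its no-arg constructor runs during deserialization and initializes iTransient_, which is why the output shows 99 throughout. If you made SerTestBase serializable, you'd want to do something similar there to properly initialize the iTransient_ variable.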
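For readers who want the whole pattern in one compilable piece, here is a small sketch of the same idea, again using an in-memory round trip. The Settings and InitTransientsDemo classes are invented for the illustration; only the initTransients() idea comes from the listing above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class that sets its transient field in one place and calls
// that code from both the constructor and readObject().
class Settings implements Serializable {
    private static final long serialVersionUID = 1L;

    private int a;
    private transient int notSerialized;

    Settings(int a) {
        this.a = a;
        initTransients();
    }

    // Single home for the non-default values of transient fields
    private void initTransients() {
        notSerialized = 12;
    }

    // Called by ObjectInputStream in place of a constructor
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();  // restores the non-transient field a
        initTransients();        // restores the transient field
    }

    int getA() { return a; }
    int getNotSerialized() { return notSerialized; }
}

class InitTransientsDemo {
    // Serialize the object to a byte array, then read it straight back.
    static Settings roundTrip(Settings original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Settings) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Settings copy = roundTrip(new Settings(2001));
        // prints: a=2001, notSerialized=12
        System.out.println("a=" + copy.getA()
                + ", notSerialized=" + copy.getNotSerialized());
    }
}
```

Keeping the assignment in a private method means the constructor and readObject() can't drift apart, which is the main reason to prefer it over repeating the value in both places.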
In my opinion, this is a design flaw in the Serialization spec. It could be argued that it's better to leave the static initializations of transients undone, thereby avoiding unnecessary work when the programmer doesn't want the transient initialized. Unfortunately, the fact that transient static initialization doesn't happen during deserialization is a booby trap for the unwary Java programmer. If any reader can think of a good reason that transient variables should not evaluate static initializations when an object is being deserialized, please write (see About the author below for contact information) and I'll summarize responses in a future column.
One alert reader has detected what at first glance looks to be a violation of Java's protection rules:
I just read your February column, and noticed something that has me wondering. You say that the [object stream classes in java.io] call the writeObject and readObject methods of a serializable object.
My problem here is that the writeObject and readObject are private, which should mean that they cannot be called from outside the class. So how does an [object stream] access these methods?
This is a good question. Let's go over what the readObject() and writeObject() methods are and how to use them, and then we'll answer the question in that context.
The "official" signatures for the writeObject() and readObject() methods for ObjectOutputStream and ObjectInputStream are:
private void writeObject(java.io.ObjectOutputStream out)
throws IOException;
private void readObject(java.io.ObjectInputStream in)
throws IOException, ClassNotFoundException;
When ObjectOutputStream is told to serialize an object, it first checks to see if the object defines a writeObject() method with the exact signature that appears above. If the method exists, the ObjectOutputStream calls that method; otherwise, it calls ObjectOutputStream.defaultWriteObject(). (ObjectInputStream.defaultReadObject() works similarly with readObject().)
Most of the time, a programmer will use writeObject() to write additional information about the object or about relationships to other objects. What additional information might
a programmer want to write? One common example might be objects within a Serializable object that are not themselves serializable. In this case, it's up to the programmer to come up with a way to represent the
nonserializable object, either by subclassing the object and adding writeObject and readObject methods to the subclass (and making the subclass serializable), or by defining the writeObject and readObject methods in the containing class.
For example, imagine you had an object, Silly, that implements Serializable, but it contains an instance of java.io.RandomAccessFile, which is not Serializable. You could define writeObject() like this:
001 transient RandomAccessFile rafFile_;
002
003 // ... other fields and methods ...
004
005 private void writeObject(java.io.ObjectOutputStream out)
006     throws IOException {
007
008     // First write all serializable sub-objects
009     try { out.defaultWriteObject(); } catch (Exception ex) { }
010
011     // Now, write the filename and position of the random access file
012     if (rafFile_ == null)
013     {
014         out.writeLong(-1);
015     } else {
016         out.writeLong(rafFile_.getFilePointer());
017     }
018 }
019
020 private void readObject(java.io.ObjectInputStream in)
021     throws IOException {
022     // Deserialize serializable fields
023     try { in.defaultReadObject(); } catch (Exception ex) { }
024
025     // Now, get the position, open the rafFile, and set position
026     long lPos = in.readLong();
027     if (lPos >= 0) {
028         rafFile_ = new RandomAccessFile(sFilename_, "r");
029         rafFile_.seek(lPos);
030     }
031 }
(I'm ignoring exceptions here because they're not the focus of the exercise.) First look at writeObject(). Notice that I'm calling defaultWriteObject() at line 9. This writes all of the non-transient fields to the ObjectOutputStream. Note that (in line 1) the RandomAccessFile is defined as transient. This tells the serialization machinery to ignore that field. If RandomAccessFile weren't defined transient, defaultWriteObject() would throw a NotSerializableException when it tried to serialize the RandomAccessFile.
After writing the non-transient fields, I have to save RandomAccessFile by saving its position as a long (lines 11-18). Here I'm making the assumption that when the object deserializes, the file it opens will still be accessible and will not have changed. (If that assumption isn't valid, then the corresponding code will have to be more complex.)
Upon deserialization, the readObject() method calls defaultReadObject() first (line 23), to get the non-transient fields, then reads the seek position of the RandomAccessFile, and next opens and seeks to that position if the seek position is valid (lines 25-30). The complete class file
Silly.java is available for download, along with the other examples from this column, in the Resources section below. Experiment with different ways of saving the RandomAccessFile, or try making it non-transient and see what happens.
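If you don't have the download handy, the following is a rough, self-contained sketch of the same approach. It is not the Silly.java from the download: the field names, the helper methods, the SillyDemo class, and the use of a temporary file are all assumptions made for the illustration, and unlike the listing above it lets exceptions propagate instead of swallowing them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;

// Illustrative stand-in for the column's Silly class
class Silly implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String filename;            // written by defaultWriteObject()
    private transient RandomAccessFile file;  // skipped by defaultWriteObject()

    Silly(String filename) throws IOException {
        this.filename = filename;
        this.file = new RandomAccessFile(filename, "r");
    }

    void seek(long pos) throws IOException { file.seek(pos); }
    long position() throws IOException { return file.getFilePointer(); }
    void close() throws IOException { if (file != null) file.close(); }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Represent the open file by its current position (-1 if none)
        out.writeLong(file == null ? -1L : file.getFilePointer());
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        long pos = in.readLong();
        if (pos >= 0) {
            // Assumes the file still exists and has not changed
            file = new RandomAccessFile(filename, "r");
            file.seek(pos);
        }
    }
}

class SillyDemo {
    // Creates a 64-byte temp file, seeks, round-trips the object through a
    // byte array, and reports the file position of the deserialized copy.
    static long positionAfterRoundTrip(long seekTo)
            throws IOException, ClassNotFoundException {
        File temp = File.createTempFile("silly", ".dat");
        temp.deleteOnExit();
        try (RandomAccessFile writer = new RandomAccessFile(temp, "rw")) {
            writer.write(new byte[64]);
        }

        Silly original = new Silly(temp.getPath());
        original.seek(seekTo);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        original.close();

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Silly copy = (Silly) in.readObject();
            long pos = copy.position();
            copy.close();
            return pos;
        }
    }

    public static void main(String[] args) throws Exception {
        // prints: position after round trip = 5
        System.out.println("position after round trip = "
                + positionAfterRoundTrip(5));
    }
}
```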
Now that we're completely clear when and how to override these writeObject() and readObject methods, let's return to the original question: The writeObject() and readObject() members are private, so how can ObjectOutputStream and ObjectInputStream use them? Let's take the question one step further: How does the stream class access protected and private fields? The answer
is the same for both fields and methods: black magic.
If you look at the source code for java.io.ObjectOutputStream, you'll see this method declaration:
private native boolean invokeObjectWriter(Object o, Class c)
throws IOException;
This is a native method declaration: This method is not written in Java, but in another language (probably C, or perhaps C++). The protection
mechanisms of protected and private simply don't apply to native methods: They can do anything they like. And in the case of this method, what it likes to do is to call writeObject() for Object o, private notwithstanding. There is an analogous native method in ObjectInputStream that runs readObject() (if it exists), and other methods that write out all class fields, regardless of their protection level. Where are these
methods (invokeObjectWriter() and friends) actually defined? In the java executable, of course.
The existence of these native methods is not a security hole from the Java language point of view because they are trusted methods. They're part of the Java core. You lose all guarantee of security (and possibly a lot of portability) if you write
your own native methods, because such methods essentially have unlimited power. (This is one of the major problems with some other component
systems, which will remain unnamed for now. All of their methods essentially are native, and so the components can do virtually
anything, including nasty things.) The trusted methods in the Java core can be called without fear of a security breach, since
they're in the java executable. It's possible that someone could hack that executable, of course, so maintaining security system-wide is as important
as always. Java doesn't solve all of the world's problems, only some of them.
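Native code isn't the only way around private. As far as I know, later JDK releases locate and invoke these private methods from Java code through the reflection API, where a caller with sufficient privileges can switch off the access check. The sketch below shows the general idea only; it is not the JDK's implementation, and Secretive, whisper(), and PrivateCallDemo are names invented for the example.

```java
import java.lang.reflect.Method;

// Hypothetical class with a private method that ordinary callers can't reach
class Secretive {
    private String whisper(String name) {
        return "psst, " + name;
    }
}

class PrivateCallDemo {
    // Looks up the private whisper(String) method and invokes it anyway.
    static String callPrivate(Secretive target, String name)
            throws ReflectiveOperationException {
        Method m = Secretive.class.getDeclaredMethod("whisper", String.class);
        m.setAccessible(true);  // suppress the language-level access check
        return (String) m.invoke(target, name);
    }

    public static void main(String[] args) throws Exception {
        // prints: psst, reader
        System.out.println(callPrivate(new Secretive(), "reader"));
    }
}
```

Whether setAccessible() succeeds depends on the security and module configuration in force, so the same caveat about trusted code applies here as it does to native methods.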
This question raises one final interesting question and subsequent point. Why are the writeObject and readObject methods defined as private, anyway? The answer is straightforward: Allowing other objects to call writeObject() and readObject() would break data encapsulation. One of the hallmarks of object-oriented programming is that the implementation of the services
an object provides is hidden from the user. The only access to an object should be its public interface (for other objects),
or its public and protected interfaces (for subclasses). If you could call writeObject() at will, it would be possible to bust apart the resulting stream and find out what is inside the object. (Of course, you would never do such a thing, but I'll bet you could name co-workers who would, right?) Similarly, if you could call readObject() for an object, you could change an object's state any way you like.
An additional topic in object deserialization that we haven't yet covered is object validation. The readObject() method that you write for your class can check the consistency of the object once it is deserialized, and throw an exception
if the data in the object are inconsistent within that one object. The serialization mechanism in ObjectOutputStream "chases" object references and serializes entire object structures. How can you check data correctness constraints that involve
more than one object? The validateObject() method of the java.io.ObjectInputValidation interface is where an application developer places code that checks consistency between objects.
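As a taste of how that looks in practice, here is a small sketch. An object registers itself from within readObject() by calling ObjectInputStream.registerValidation(), and the stream calls its validateObject() after the entire object graph has been restored. The Invoice and Line classes, and the rule that the declared total must equal the sum of the lines, are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical line item: a separate object in the serialized graph
class Line implements Serializable {
    private static final long serialVersionUID = 1L;
    final int amount;
    Line(int amount) { this.amount = amount; }
}

// Hypothetical invoice whose total must agree with its line items
class Invoice implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L;
    final int declaredTotal;
    final List<Line> lines = new ArrayList<>();

    // Deliberately unchecked so the demo can build an inconsistent invoice
    Invoice(int declaredTotal, int... amounts) {
        this.declaredTotal = declaredTotal;
        for (int amount : amounts) {
            lines.add(new Line(amount));
        }
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.registerValidation(this, 0);  // ask for a callback, priority 0
        in.defaultReadObject();
    }

    // Called by the stream once the whole graph has been deserialized
    @Override
    public void validateObject() throws InvalidObjectException {
        int sum = 0;
        for (Line line : lines) {
            sum += line.amount;
        }
        if (sum != declaredTotal) {
            throw new InvalidObjectException(
                    "total " + declaredTotal + " != sum of lines " + sum);
        }
    }
}

class ValidationDemo {
    // Returns true if the invoice deserializes without a validation error.
    static boolean survivesRoundTrip(Invoice invoice)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(invoice);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();
            return true;
        } catch (InvalidObjectException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(survivesRoundTrip(new Invoice(30, 10, 20)));  // prints: true
        System.out.println(survivesRoundTrip(new Invoice(99, 10, 20)));  // prints: false
    }
}
```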