Do it the "Nescafé" way -- with freeze-dried JavaBeans

How to use object serialization for bean persistence

Up to this point in the JavaWorld column on JavaBeans, we've discussed beans from the vantage point of how they behave within a single running Java program. The JavaBeans we've discussed so far survive only as long as there's an active reference to them, and only as long as the program in which they run is executing. It would, however, be very useful for a software component to be able to survive the death of the program in which it runs, and to be "resurrected" and run again when that program is revitalized. Or we might want the component to move from machine to machine, gathering information or performing remote services. In either case, persistence is the key.

When the last reference to a bean goes out of scope, or when the program exits, all of a bean's "state" (the values of the bean's fields) is lost forever, unless we've saved enough information about what was inside the bean to reconstruct it later. Software object persistence is nothing more than saving information about an object so that it can be recreated at a different time and/or place. Object serialization is a means of implementing persistence by converting the object's state into a stream of bytes that can later be used to reconstruct a virtually identical copy of the original object.

In this article, we're going to take a look at some of the benefits that a persistence mechanism provides to a software component framework. We'll discuss the goals of the JavaBeans persistence approach, and then go over some introductory code examples of persistent JavaBeans.

A matter of simple storage

Though object persistence may seem like a new idea to you, you're probably more familiar with it than you know. Every file on your hard drive or floppy disk can be thought of as a persistent software object of one sort or another. For example, let's say you use a text editor to create a text file. At one point in time, there was a data structure in memory on some computer that contained the characters of your document. When you gave the editor the Save As command, you were really telling the program to persist the contents of its memory to a disk file. (Thank goodness the people who design user interfaces know better than to present information to users this way -- teaching my mom to use a word processor is hard enough without having to deal with menu items like Persist Memory State!)

The next time you run the text editor and you Load a file, the program reads the information in the text file and creates a structure in memory that is identical, more or less, to what was in memory when the file was last saved. The phrase "more or less" is significant because, typically, not all of the information about a software object is saved. For example, in the text editor there may be Undo information hanging around that goes away when the editor dies. The next time you start the editor, the file contents are there, but the Undo information is off in the Big Bit Bucket in the sky.

How object persistence and serialization work

Software object persistence works in precisely the same way as our file-saving example above: A software object in an object-oriented system can be serialized, or converted into a stream of bytes, which can be used to resurrect the object at some other place and/or time. If you're the developer who wrote the text editor discussed above, you may well have organized your program so that a document is represented by a single Document object, which might be a "monolithic" object that does everything or an aggregation of many smaller objects that perform specialized tasks. If you ask the Document object for its serialized state, it simply returns a (possibly very long) string, which you then squirrel away on disk. When the user asks that a file be opened, your program opens the file, reads the string, and hands it to the Document class (or some class that knows how to create Document objects), and, voilà! The Document object has risen from the dead.

Now, when the program told the Document object to serialize, the Document might have returned a long ASCII string with embedded newlines, which, when sent directly to a printer, would be readable. (Printing could actually be considered "serializing to paper," but only a geek would say it that way.) The Document also might have returned a string of compressed, illegible gibberish that you'd never be able to figure out in a thousand years, but which the Document object "understands" and can use to reconstruct that instance of the Document. This is an important point: Objects that serialize themselves into strings also know how to read those strings to restore themselves to their original state. An object of a certain class wrote that string, and given the same string later, that class had better know how to reconstruct the instance. Otherwise, persistence simply doesn't work.
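To make the "write a string, read it back" contract concrete, here is a minimal hand-rolled sketch. The `Document` class, its `serialize` and `deserialize` methods, and the one-line-per-field format are all hypothetical illustrations, not part of any JavaBeans API; the point is only that the class that writes the string is also the class that knows how to read it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Document class: not from any real library.
public class Document {
    private final String title;
    private final List<String> paragraphs;

    public Document(String title, List<String> paragraphs) {
        this.title = title;
        this.paragraphs = new ArrayList<>(paragraphs);
    }

    public String getTitle() { return title; }
    public List<String> getParagraphs() { return paragraphs; }

    // Serialize: the title on the first line, then one line per paragraph.
    // Assumes no field contains a newline (a real format would escape them).
    public String serialize() {
        StringBuilder sb = new StringBuilder(title);
        for (String p : paragraphs) {
            sb.append('\n').append(p);
        }
        return sb.toString();
    }

    // Deserialize: the same class that wrote the string knows how to read it.
    public static Document deserialize(String s) {
        String[] lines = s.split("\n", -1);
        List<String> paras = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            paras.add(lines[i]);
        }
        return new Document(lines[0], paras);
    }

    public static void main(String[] args) {
        Document original = new Document("Letter", List.of("First paragraph.", "Second paragraph."));
        String frozen = original.serialize();        // squirrel this string away on disk
        Document revived = Document.deserialize(frozen);
        System.out.println(revived.getTitle());      // prints "Letter"
    }
}
```

Hand-written code like this works, but every class needs its own pair of methods, which is exactly the tedium a standardized serialization mechanism is meant to remove.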

Precisely how an object is serialized is arbitrary, at least in the general case. (Specific component technology specifications spell out the format of a serialized object, and that is anything but arbitrary.) That's why any reasonable word processor gives you the choice of one of a dozen or more file formats. Every word processor serializes its document objects in a different way, but since they're all documents, and documents share general traits like characters, fonts, paragraphs, leading, and so on, it's possible for software objects (and, therefore, word processors) to read each other's formats and interoperate. Likewise, software objects can be written to read and write each other's serialization formats and "pretend" to be instances of one another. We'll discuss this more in the section below entitled "Interoperation: Works and plays well with others."

Objects that have been "freeze-dried" into strings can then be transmitted, stored, and otherwise manipulated as strings. The ability to store objects as strings gives system designers a lot of flexibility. Imagine you're designing the graphical user interface for a database application, and you've created a hot-looking component that manipulates the contents of a particular table or query result. You could serialize that customized component and save it in the database itself, say in a table called EDITORS, along with the name of the table it edits. You could then organize your database application's user interface around combining these editing components, each of which specializes in manipulating a particular set of data. To change the screen used to edit a particular table, for example, you need only replace the associated serialized editing component in the EDITORS table, and the end-user application would automatically use the new component for future access to that table. (This is just an example. Obviously, there's a lot to be said for organizing your database applications around workflow instead of around the underlying data model.)
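Here is a minimal sketch of the freeze-drying step, assuming Java's built-in object serialization plus `java.util.Base64` to turn the serialized bytes into text that an ordinary text column could hold. The `TableEditor` class and the `freeze`/`thaw` helpers are hypothetical names for illustration; the database access itself (for example, an INSERT into the EDITORS table) is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class FreezeDry {
    // Hypothetical editing component; any Serializable class would do.
    static class TableEditor implements Serializable {
        private static final long serialVersionUID = 1L;
        String tableName;
        int columnWidth;

        TableEditor(String tableName, int columnWidth) {
            this.tableName = tableName;
            this.columnWidth = columnWidth;
        }
    }

    // Freeze-dry: object -> bytes -> Base64 text.
    static String freeze(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } // closing the stream flushes everything into 'bytes'
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Thaw: Base64 text -> bytes -> a new, equivalent object.
    static Object thaw(String text) throws IOException, ClassNotFoundException {
        byte[] data = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TableEditor editor = new TableEditor("CUSTOMERS", 120);
        String stored = freeze(editor);                  // the text you would save
        TableEditor restored = (TableEditor) thaw(stored);
        System.out.println(restored.tableName + " " + restored.columnWidth); // prints "CUSTOMERS 120"
    }
}
```

The restored object is a new instance with the same field values, not the original object itself, which is the "virtually identical copy" described earlier.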

Interoperation: Works and plays well with others

So far, we've placed no restrictions or expectations on how objects serialize themselves. We've simply said that objects should be able to turn themselves into strings (serialization), and then turn strings into instances of themselves (deserialization). Component technologies, however, much like word processors, have specific formats (and rules) that make it easier to automate a lot of the details of how to perform the serialization and deserialization. These rules and formats are spelled out in the component technology specification document. (The serialization specification for JavaBeans can be found in the Resources section.) When serialization formats are standardized, software can manipulate the serialized data strings in more detail, since developers know what to expect from a well-formed string.

Standardizing serialization can also make programming easier. It's certainly possible for a programmer to go through every class that needs persistence in an application, writing functions that say "write this, write that" and "read this, read that," but this makes for awfully tedious work. A component technology specification provides detailed guidelines for how to perform the serialization, so much of the serialization code comes free of charge. (This frees programmers up for more interesting work, such as battling irreproducible bugs and narcoleptic operating systems.) We'll discuss how JavaBeans handles serialization in the section on JavaBeans serialization below.
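In Java, the "free of charge" part looks roughly like the following sketch: a class that implements `java.io.Serializable` gets default serialization of its fields with no per-field read or write code. The `EditorState` class is hypothetical. Its `transient` undo history is skipped during serialization, much like the text editor's Undo information that vanishes between sessions.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical editor state; no hand-written serialization code anywhere.
public class EditorState implements Serializable {
    private static final long serialVersionUID = 1L;

    // Saved automatically by default serialization.
    private String text = "";
    private int cursor;

    // 'transient' fields are skipped, like the editor's Undo history.
    private transient Deque<String> undo = new ArrayDeque<>();

    public void type(String s) {
        undo.push(text);
        text = text + s;
        cursor = text.length();
    }

    public String getText() { return text; }
    public int getCursor() { return cursor; }
    public Deque<String> getUndo() { return undo; }

    public static void main(String[] args) throws Exception {
        EditorState state = new EditorState();
        state.type("Hello");

        Path file = Files.createTempFile("editor", ".ser");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        }
        EditorState back;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            back = (EditorState) in.readObject();
        }
        Files.deleteIfExists(file);

        System.out.println(back.getText());  // prints "Hello"
        System.out.println(back.getUndo());  // prints "null": transient fields are not restored
    }
}
```

Because deserialization does not run this class's field initializers, the restored `undo` field is `null`. A real class would typically re-create such fields in a private `readObject` method; that is omitted here to keep the sketch short.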

One of the coolest things about standardized object serialization is that it allows different component technologies and even different languages to share and process each other's objects. If, for example, WordPerfect can read and write WordStar files (leaving aside the question of why it would want to), why couldn't a JavaBean read and write a file containing a serialized OLE or OpenDoc object? (The answer is: It can -- at least in theory.) If the OpenDoc serialization specification is openly available, a JavaBean could be written to read OpenDoc objects. The user could then manipulate the data in the object, and the JavaBean could write its internal data back to the file in OpenDoc object format. Later, when a "real" OpenDoc application opens the file, it would find the new state in its native format, never suspecting that a JavaBean had anything to do with it. As long as the standards remain open (and closed standards are arguably worse than no standards at all), components should be able to interoperate.

Beam me up, Scotty: Distributed systems

Years ago I spent a rainy week in Amsterdam feeling sorry for myself because I was stuck there, waiting for an insurance form to arrive in the mail. For reasons I still can't fathom, it hadn't occurred to me to have the document faxed to me for my signature. (Fax machines were less common then, but still...) It didn't really matter where I was when I signed the document; it just needed to be signed and returned to its point of origin for further processing. If the fax option had occurred to me, I could have asked dear old Dad to serialize the insurance form into electronic pulses (via a fax machine), after which it would have been deserialized into a duplicate document (by another fax machine) in Amsterdam, signed by me, and mailed or faxed back to the U.S.

Object serialization makes something analogous to the above insurance form example possible for software objects. Sometimes the resources necessary to perform a particular task aren't available locally. (In the case of my insurance form, the resource was my hand, which was, along with the rest of me, in The Netherlands.) Other times, it's computationally cheaper to pack up objects and ship them out for processing on other, less busy machines; spreading work around in this way is called load balancing. In still other situations, existing ("legacy") systems can be wrapped in new layers of software, effectively repackaging them as network services. Objects can be serialized and sent to the legacy-system "wrapper" code, which reconstitutes the objects, operates on them with the old system, and sends them back to the system from which they originated (or sends them on to other systems).

A distributed system can be loosely defined as a system in which data may be operated on by any of several processors. All of these applications are examples of distributed processing, in which software objects are freed from having to run on the individual computers on which they were, so to speak, born.

One approach to distributed processing uses an object request broker, or ORB, which serializes objects and method call arguments and ships them around for processing on remote systems. The best-known ORB standard is CORBA, the Common Object Request Broker Architecture. CORBA specifies object formats and operations so completely that objects can be created and processed by programs running on different computers on a network, even if the programs were originally written in different languages. We'll fool around with CORBA in a later column.

JavaBeans serialization

The JavaBeans API Specification spells out specifically the goals of the bean serialization mechanism and how those goals are achieved in the API. JavaBeans persistence is built on top of two Java 1.1 features: object serialization (primarily) and introspection (which is built on top of reflection).
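As a rough illustration of what introspection contributes, the sketch below uses the standard `java.beans.Introspector` to discover a bean's properties from its get/set method names and read their current values, without hard-coding any field names. The `ColorBean` class and the `snapshot` helper are hypothetical; this is not how Java object serialization itself is implemented, only a demonstration of the kind of information introspection makes available.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class IntrospectDemo {
    // Hypothetical bean following the getX/setX naming conventions.
    public static class ColorBean implements Serializable {
        private static final long serialVersionUID = 1L;
        private String color = "red";
        private int size = 10;

        public ColorBean() {}
        public String getColor() { return color; }
        public void setColor(String color) { this.color = color; }
        public int getSize() { return size; }
        public void setSize(int size) { this.size = size; }
    }

    // Collects every readable property's name and current value.
    static Map<String, Object> snapshot(Object bean)
            throws IntrospectionException, ReflectiveOperationException {
        // Stop at Object.class so the inherited "class" property is excluded.
        BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
        Map<String, Object> state = new TreeMap<>(); // sorted by property name
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            if (getter != null) {
                state.put(pd.getName(), getter.invoke(bean));
            }
        }
        return state;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(snapshot(new ColorBean())); // prints "{color=red, size=10}"
    }
}
```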

Several classes and interfaces were added to the java.io package to support object serialization. These classes know how to read and write all of Java's primitive data types, like byte, int, double, and so on, so that's taken care of for you. (Strings are written in a modified form of UTF-8.) The interfaces that describe how to read and write these data types, java.io.DataOutput and java.io.DataInput, predate serialization; they are implemented in various places, including the new java.io.ObjectOutputStream and java.io.ObjectInputStream classes. The object input and output streams are used (as we'll see in the coding examples below) to output object contents in accordance with rules specified by the Java Object Serialization Specification document (see Resources below).
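A short sketch of those primitive-level methods: because `ObjectOutputStream` and `ObjectInputStream` implement `DataOutput` and `DataInput` (through `ObjectOutput` and `ObjectInput`), they can write and read primitives and UTF strings directly. The class name `PrimitiveStreams` and the sample values are arbitrary choices for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class PrimitiveStreams {
    // Writes a few primitives and a string, then reads them back in the same order.
    static String roundTrip() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeByte(7);
            out.writeInt(42);
            out.writeDouble(3.14);
            out.writeUTF("freeze-dried"); // stored as modified UTF-8
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            // Reads must mirror the writes exactly: same types, same order.
            byte b = in.readByte();
            int i = in.readInt();
            double d = in.readDouble();
            String s = in.readUTF();
            return b + " " + i + " " + d + " " + s;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // prints "7 42 3.14 freeze-dried"
    }
}
```

Note that the stream carries no field names at this level; the reader has to know the order and types in advance, which is the same "the writer's class must know how to read it" rule discussed earlier.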
