Object persistence and Java

Get an in-depth look at the issues surrounding object persistence in object-oriented languages

Object durability, or persistence, is the term you often hear used in conjunction with the issue of storing objects in databases. Persistence is expected to operate with transactional integrity, and as such it is subject to strict conditions. (See the Resources section of this article for more information on transaction processing.) In contrast, language services offered through standard language libraries and packages are often free from transactional constraints.

As we'll see in this article, evidence suggests that simple Java persistence will likely stem from the language itself, while sophisticated database functionality will be offered by database vendors.

No object is an island

In the real world, you rarely find an object that lacks relations to other objects. Objects are components of object models. Once we observe that objects are interconnected by virtue of their relations to one another, the issue of object durability broadens into the issue of object model durability and distribution.

The relational approach to data storage tends to aggregate data by type. Rows in a table represent the physical aggregate of objects of the same type on disk. The relationships among objects are then represented by keys that are shared across many tables. Relational databases sometimes allow tables that are likely to be used together to be co-located (or clustered) in the same logical partition, such as a database segment, but they have no mechanism to store object relationships in the database. Hence, in order to construct an object model, these relationships are rebuilt from the existing keys at run time in a process referred to as table joins. This is a consequence of the well-known property of relational databases called data independence. Nearly all variants of object databases offer some mechanism that, compared with traditional relational databases, improves the performance of a system that involves complex object relationships.

To query or to navigate?

In storing objects on disk, we face a choice: co-locate related objects to better accommodate navigational access, store objects in table-like collections that aggregate objects by type to facilitate predicate-based access (queries), or both. The co-location of objects in persistent storage is an area where relational and object-oriented databases differ widely.

The choice of query language is another consideration. Structured Query Language (SQL) and its extensions have provided relational systems with a predicate-based access mechanism. Object Query Language (OQL) is an object variant of SQL, standardized by ODMG, but support for this language is currently scant.

Polymorphic methods offer unprecedented elegance in constructing a semantic query for a collection of objects. For example, imagine a polymorphic behavior for account called inGoodStanding. It may return the Boolean true for all accounts in good standing, and false otherwise. Now imagine the elegance of querying the collection of accounts, where inGoodStanding is implemented differently based on business rules, for all accounts in good standing. It may look something like:

setOfGoodCustomers = setOfAccounts.query(account.inGoodStanding());

While several of the existing object databases are capable of processing such a query style in C++ and Smalltalk, it is difficult for them to do so for larger (say, 500+ gigabytes) collections and more complex query expressions. Several of the relational database companies, such as Oracle and Informix, will soon offer other, SQL-based syntax to achieve the same result.
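To make the idea concrete, here is a minimal in-memory sketch of such a polymorphic query in plain Java (Java 8 or later). It is not the API of any object database: the Account hierarchy, the business rules, and the query helper are all hypothetical names invented for illustration, and an ordinary list stands in for the persistent collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical account hierarchy; names and business rules are illustrative only.
abstract class Account {
    final String owner;
    final double balance;

    Account(String owner, double balance) {
        this.owner = owner;
        this.balance = balance;
    }

    // Each subclass applies its own business rule.
    abstract boolean inGoodStanding();
}

class CheckingAccount extends Account {
    CheckingAccount(String owner, double balance) {
        super(owner, balance);
    }

    // Assumed rule: a checking account must not be overdrawn.
    @Override
    boolean inGoodStanding() {
        return balance >= 0.0;
    }
}

class CreditAccount extends Account {
    final double creditLimit;

    CreditAccount(String owner, double balance, double creditLimit) {
        super(owner, balance);
        this.creditLimit = creditLimit;
    }

    // Assumed rule: a credit account may run negative, up to its limit.
    @Override
    boolean inGoodStanding() {
        return balance >= -creditLimit;
    }
}

class AccountQuerySketch {
    // In-memory stand-in for a database query: keep the elements that satisfy the predicate.
    static <T> List<T> query(List<T> items, Predicate<T> predicate) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (predicate.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Account> setOfAccounts = new ArrayList<>();
        setOfAccounts.add(new CheckingAccount("ann", 120.0));
        setOfAccounts.add(new CheckingAccount("bob", -5.0));
        setOfAccounts.add(new CreditAccount("cy", -300.0, 500.0));

        // The caller never names a concrete account type; dispatch picks the rule.
        List<Account> setOfGoodCustomers = query(setOfAccounts, Account::inGoodStanding);
        for (Account account : setOfGoodCustomers) {
            System.out.println(account.owner);
        }
    }
}
```

The point of the sketch is that the query expression stays the same no matter how many account types exist; a real database would additionally have to evaluate the predicate efficiently against objects on disk, which is where the difficulty described above comes from.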

Persistence and type

An object-oriented language aficionado would say persistence and type are orthogonal properties of an object; that is, persistent and transient objects of the same type can be identical because one property should not influence the other. The alternative view holds that persistence is a behavior supported only by persistable objects and certain behaviors may apply only to persistent objects. The latter approach calls for methods that instruct persistable objects to store and retrieve themselves from persistent storage, while the former affords the application a seamless view of the entire object model -- often by extending the virtual memory system.

Canonicalization and language independence

Objects of the same type in a language should be stored in persistent storage with the same layout, regardless of the order in which their interfaces appear. The processes of transforming an object layout to this common format are collectively known as canonicalization of object representation. In compiled languages with static typing (not Java), objects written in the same language but compiled under different systems should be identically represented in persistent storage.
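As a rough illustration of the idea, the sketch below reduces an object to a textual form in which fields always appear in name order, so two classes that declare the same fields in a different order produce the same representation. Sorting by field name is only one possible canonical ordering, chosen here as an assumption; the class and method names are hypothetical, and a real storage engine would write a binary layout rather than a string.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

class CanonicalLayoutSketch {
    // Two hypothetical classes with the same fields declared in a different order.
    static class PointA {
        int x = 1;
        int y = 2;
    }

    static class PointB {
        int y = 2;
        int x = 1;
    }

    // Emits "name=value;" pairs sorted by field name, ignoring declaration order.
    static String canonicalForm(Object obj) throws IllegalAccessException {
        Field[] fields = obj.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        StringBuilder sb = new StringBuilder();
        for (Field field : fields) {
            // Skip static and compiler-generated fields; only instance state is stored.
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);
            sb.append(field.getName()).append('=').append(field.get(obj)).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(canonicalForm(new PointA()));
        System.out.println(canonicalForm(new PointB()));
    }
}
```

Both calls produce the same string, which is the property a canonical representation is meant to guarantee.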

An extension of canonicalization addresses language-independent object representation. If objects can be represented in a language-independent fashion, it will be possible for different representations of the same object to share the same persistent storage.

One mechanism to accomplish this task is to introduce an additional level of indirection through an interface definition language (IDL). Object database interfaces can be made through the IDL and the corresponding data structures. The downside of IDL-style bindings is twofold: First, the extra level of indirection always requires an additional level of translation, which impacts the overall performance of the system; second, it limits use of database services that are unique to particular vendors and that might be valuable to application developers.

A similar mechanism is to support object services through an extension of SQL. Relational database vendors and smaller object/relational vendors are proponents of this approach; however, how successful these companies will be in shaping the framework for object storage remains to be seen.

But the question remains: Is object persistence part of the object's behavior or is it an external service offered to objects via separate interfaces? How about collections of objects and methods for querying them? Relational, extended relational, and object/relational approaches tend to advocate a separation between language and persistence, while object databases -- and the Java language itself -- see persistence as intrinsic to the language.

Native Java persistence via serialization

Object serialization is the Java language-specific mechanism for the storage and retrieval of Java objects and primitives to streams. It is worth noting that although commercial third-party libraries for serializing C++ objects have been around for some time, C++ has never offered a native mechanism for object serialization. Here's how to use Java's serialization:

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// Writing "foo" to a stream (for example, a file)

// Step 1. Create an output stream;
// that is, create a bucket to receive the bytes
FileOutputStream out = new FileOutputStream("fooFile");

// Step 2. Create an ObjectOutputStream;
// that is, create a hose and put its head in the bucket
ObjectOutputStream os = new ObjectOutputStream(out);

// Step 3. Write a string and an object to the stream;
// that is, let the stream flow into the bucket
os.writeObject("foo");
os.writeObject(new Foo());

// Step 4. Flush the data to its destination
os.flush();

The writeObject method serializes foo and its transitive closure -- that is, all objects that can be referenced from foo within the graph. Within the stream only one copy of the serialized object exists. Other references to the objects are stored as object handles to save space and avoid circular references. The serialized object starts with the class followed by the fields of each class in the inheritance hierarchy.
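The handle mechanism is easiest to see with a small graph that contains both a shared object and a cycle. The following sketch uses only standard java.io classes; GraphNode and the roundTrip helper are hypothetical names, and an in-memory byte array stands in for the file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node type with two outgoing references.
class GraphNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    GraphNode left;
    GraphNode right;

    GraphNode(String name) {
        this.name = name;
    }
}

class SharedReferenceSketch {
    // Serializes the graph reachable from root to memory, then reads it back.
    static Object roundTrip(Object root) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(root);
        }
        try (ObjectInputStream ins =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return ins.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        GraphNode shared = new GraphNode("shared");
        GraphNode root = new GraphNode("root");
        root.left = shared;   // two references to the same object
        root.right = shared;
        shared.left = root;   // a cycle back to the root

        GraphNode copy = (GraphNode) roundTrip(root);
        System.out.println(copy.left == copy.right);  // true: one copy, referenced twice
        System.out.println(copy.left.left == copy);   // true: the cycle is preserved
    }
}
```

Because the second and third references are written as handles rather than as fresh copies, the restored graph has the same shape as the original, and writing the cycle does not loop forever.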

import java.io.FileInputStream;
import java.io.ObjectInputStream;

// Reading an object from a stream

// Step 1. Create an input stream
FileInputStream in = new FileInputStream("fooFile");

// Step 2. Create an object input stream
ObjectInputStream ins = new ObjectInputStream(in);

// Step 3. Got to know what you are reading:
// objects come back in the order they were written
String fooString = (String) ins.readObject();
Foo foo = (Foo) ins.readObject();

Object serialization and security

By default, serialization writes non-static and non-transient fields to the stream and reads them back from it. This characteristic can be used as a security mechanism by declaring fields that may not be serialized as private transient. If a class may not be serialized at all, writeObject and readObject methods should be implemented to throw NotSerializableException.
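Both techniques can be sketched in a few lines. The example below uses java.io.NotSerializableException, the exception the standard library provides for refusing serialization; the Login and SealedSecret classes and the helper methods are hypothetical, invented only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: the password is transient, so it never reaches the stream.
class Login implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String user;
    private transient String password;

    Login(String user, String password) {
        this.user = user;
        this.password = password;
    }

    String getUser() { return user; }
    String getPassword() { return password; }
}

// Hypothetical class that refuses serialization in both directions.
class SealedSecret implements Serializable {
    private static final long serialVersionUID = 1L;

    private void writeObject(ObjectOutputStream out) throws IOException {
        throw new NotSerializableException(getClass().getName());
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        throw new NotSerializableException(getClass().getName());
    }
}

class TransientFieldSketch {
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ins = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ins.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Login copy = (Login) deserialize(serialize(new Login("ann", "s3cret")));
        System.out.println(copy.getUser());      // the non-transient field survives
        System.out.println(copy.getPassword());  // the transient field comes back as null

        try {
            serialize(new SealedSecret());
        } catch (NotSerializableException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Note that transient only keeps a field out of the stream; it does nothing to protect the field while the object is in memory.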

Persistence with transactional integrity: Introducing JDBC

Modeled after X/Open's SQL CLI (Call Level Interface) and Microsoft's ODBC abstractions, Java Database Connectivity (JDBC) aims to provide a database connectivity mechanism that is independent of the underlying database management system (DBMS). To become JDBC-compliant, drivers need to support at least the ANSI SQL-2 entry-level API, which gives third-party tool vendors and applications enough flexibility for database access.

JDBC is designed to be consistent with the rest of the Java system. Vendors are encouraged to write an API that is more strongly typed than ODBC, which affords greater static type-checking at compile time.

Here's a description of the most important JDBC classes and interfaces:

  • java.sql.DriverManager handles the loading of drivers and provides support for new database connections.

  • java.sql.Connection represents a connection to a particular database.

  • java.sql.Statement acts as a container for executing an SQL statement on a given connection.

  • java.sql.ResultSet controls access to the result set.
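The four types listed above fit together roughly as follows. This is a minimal sketch, not a recipe for any particular database: the JDBC URL, the accounts table, and the owner column are hypothetical, and a real application would need a driver for its DBMS on the classpath. It uses try-with-resources (Java 7 or later) to close the connection, statement, and result set.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class JdbcSketch {
    // Returns the owner column of a hypothetical "accounts" table.
    static List<String> accountOwners(String jdbcUrl) throws SQLException {
        List<String> owners = new ArrayList<>();
        // DriverManager finds a registered driver that accepts the URL.
        try (Connection connection = DriverManager.getConnection(jdbcUrl);
             // Statement carries the SQL text to the database.
             Statement statement = connection.createStatement();
             // ResultSet is a cursor over the rows that come back.
             ResultSet rows = statement.executeQuery("SELECT owner FROM accounts")) {
            while (rows.next()) {
                owners.add(rows.getString("owner"));
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Placeholder URL; pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "jdbc:hypothetical:bank";
        try {
            System.out.println(accountOwners(url));
        } catch (SQLException e) {
            // Without a matching driver, getConnection fails here.
            System.out.println("Query failed: " + e.getMessage());
        }
    }
}
```

Only the URL ties the code to a specific DBMS, which is the independence JDBC is aiming for.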

You can implement a JDBC driver in several ways. The simplest would be to build the driver as a bridge to ODBC. This approach is best suited for tools and applications that do not require high performance. A more extensible design would introduce an extra level of indirection to the DBMS server by providing a JDBC network driver that accesses the DBMS server through a published protocol. The most efficient driver, however, would directly access the DBMS proprietary API.

Object databases and Java persistence

A number of ongoing projects in the industry offer Java persistence at the object level. However, as of this writing, Object Design's PSE (Persistent Storage Engine) and PSE Pro are the only fully Java-based, object-oriented database packages available (at least, that I am aware of). Check the Resources section for more information on PSE and PSE Pro.

Java development has led to a departure from the traditional development paradigm for software vendors, most notably in the development process timeline. For example, PSE and PSE Pro are developed in a heterogeneous environment. And because there isn't a linking step in the development process, developers have been able to create various functional components independent of each other, which results in better, more reliable object-oriented code.

PSE Pro has the ability to recover a corrupted database from an aborted transaction caused by system failure. The classes that are responsible for this added functionality are not present in the PSE release. No other differences exist between the two products. These products are what we call "dribbleware" -- software releases that enhance their functionality by plugging in new components. In the not-so-distant future, the concept of purchasing large, monolithic software will become a thing of the past. The new business environment in cyberspace, together with Java computing, enables users to purchase only those parts of the object model (object graph) they need, resulting in more compact end products.

PSE works by post-processing and annotating class files after they have been created by the developer. From PSE's point of view, classes in an object graph are either persistence-capable or persistence-aware. Persistence-capable classes may persist themselves, while persistence-aware classes can operate on persistent objects. This distinction is necessary because persistence may not be a desired behavior for certain classes. The class file post-processor makes the following modifications to classes:

  • Modifies the class to inherit from odi.Persistent or odi.util.HashPersistent.

  • Defines the initializeContents() method to load real values into hollow instances of your Persistent subclass. ObjectStore provides methods on the GenericObject class that retrieve each Field type.

    Be sure to call the correct methods for the fields in your persistent object. A separate method is available for obtaining each type of Field object. ObjectStore calls the initializeContents() method as needed. The method signature is:

    public void initializeContents(GenericObject genObj)
    
  • Defines the flushContents() method to copy values from a modified instance (active persistent object) back to the database. ObjectStore provides methods on the GenericObject class that set each Field type.

    Be sure to call the correct methods for the fields in your persistent object. A separate method is available for setting each type of Field object. ObjectStore calls the flushContents() method as needed. The method signature is:

    public void flushContents(GenericObject genObj)
    
  • Defines the clearContents() method to reset the values of an instance to the default values. This method must set all reference fields that referred to persistent objects to null. ObjectStore calls this method as needed. The method signature is:

    public void clearContents()
    
  • Modifies the methods that reference non-static fields to call the Persistent.fetch() and Persistent.dirty() methods as needed. These methods must be called before the contents of persistent objects can be accessed or modified, respectively. While this step is not mandatory, it does provide a systematic way to ensure that the fetch() or dirty() method is called prior to accessing or updating object content.

  • Defines a class that provides schema information about the persistence-capable class.

All these steps can be completed either manually or automatically.
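To give a feel for what the annotated class ends up doing, here is a toy model of the fetch-before-read and dirty-before-write pattern. It does not use the ObjectStore API: SketchPersistent, SketchStore, SketchAccount, and the commit method are hypothetical stand-ins, with a plain map playing the role of the database and of GenericObject. Only the method names initializeContents, flushContents, clearContents, fetch, and dirty are borrowed from the list above.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the database: maps a field name to its stored value.
class SketchStore {
    final Map<String, Object> fields = new HashMap<>();
}

// Stand-in for a persistent base class; this is NOT the real odi.Persistent.
abstract class SketchPersistent {
    private final SketchStore store;
    private boolean hollow = true;     // true until real values have been loaded
    private boolean modified = false;  // true once a write has been announced

    SketchPersistent(SketchStore store) {
        this.store = store;
    }

    protected abstract void initializeContents(SketchStore source);
    protected abstract void flushContents(SketchStore target);
    protected abstract void clearContents();

    // Must run before any field is read: loads values into a hollow instance.
    protected final void fetch() {
        if (hollow) {
            initializeContents(store);
            hollow = false;
        }
    }

    // Must run before any field is written: loads values, then marks the object dirty.
    protected final void dirty() {
        fetch();
        modified = true;
    }

    // Assumed commit step: write back changes and return the instance to a hollow state.
    final void commit() {
        if (modified) {
            flushContents(store);
            modified = false;
        }
        clearContents();
        hollow = true;
    }
}

// What a post-processed class might look like: accessors call fetch() or dirty() first.
class SketchAccount extends SketchPersistent {
    private double balance;

    SketchAccount(SketchStore store) {
        super(store);
    }

    double getBalance() {
        fetch();
        return balance;
    }

    void deposit(double amount) {
        dirty();
        balance += amount;
    }

    @Override
    protected void initializeContents(SketchStore source) {
        balance = (Double) source.fields.getOrDefault("balance", 0.0);
    }

    @Override
    protected void flushContents(SketchStore target) {
        target.fields.put("balance", balance);
    }

    @Override
    protected void clearContents() {
        balance = 0.0;
    }
}

class HollowObjectSketch {
    public static void main(String[] args) {
        SketchStore store = new SketchStore();
        SketchAccount account = new SketchAccount(store);
        account.deposit(50.0);
        account.commit();

        // A fresh, hollow instance sees the committed value on first access.
        System.out.println(new SketchAccount(store).getBalance());
    }
}
```

Writing these calls by hand in every accessor is tedious and easy to get wrong, which is why a post-processor that inserts them systematically is attractive.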

PSE's transaction semantics

You old-time users of ObjectStore probably will find the database and transaction semantics familiar. There is a system-wide ObjectStore object that initializes the environment and is responsible for system-wide parameters. The Database class offers methods (such as create, open, and close), and the Transaction class has methods to begin, abort, or commit transactions. As with serialization, you need to find an entry point into the object graph. The getRoot and setRoot methods of the Database class serve this function. I think a few examples would be helpful here. This first snippet shows how to initialize ObjectStore:

ObjectStore.initialize(serverName, null);

try {
    // Open the existing database for update
    db = Database.open(dbName, Database.openUpdate);
} catch (DatabaseNotFoundException exception) {
    // The database does not exist yet, so create it
    db = Database.create(dbName, 0664);
}

This next snippet shows how to start and commit a transaction:

Transaction transaction = Transaction.begin(Transaction.update);

try {
    // Look up the entry point into the object graph
    foo = (Foo) db.getRoot("fooHead");
} catch (DatabaseRootNotFoundException exception) {
    // No such root yet, so create one
    db.createRoot("fooHead", new Foo());
}

transaction.commit();

The three classes specified above -- Transaction, Database, and ObjectStore -- are fundamental classes for ObjectStore. PSE 1.0 does not support nested transactions, backup and recovery, clustering, large databases, object security beyond what is available in the language, or any type of distribution. What is exciting, however, is that all of this functionality will be incrementally added to the same foundation as the product matures.

Conclusion

Although it is still too early to establish which methodology for object persistence in general and Java persistence in particular will be dominant in the future, it is safe to assume that a myriad of such styles will co-exist. The shift to storing objects as objects without disassembly into rows and columns is sure to be slow, but it will happen. In the meantime, we are more likely to see object databases better utilized in advanced engineering and telecommunications applications than in banking and back-office financial applications.

Arsalan Saljoughy is a systems engineer specializing in object technology at Sun Microsystems. He earned his M.S. in mathematics from SUNY at Albany, and subsequently was a research fellow at the University of Berlin. Before joining Sun, he worked as a developer and as an IT consultant to financial services companies.

Learn more about this topic

  • This is a classic primer for anyone interested in database technology.
  • Rick Cattell, Object Data Management, 1994, Addison-Wesley Publishing Company, New York.
  • Rick wrote this book way before Java was en vogue and has an excellent treatment of various styles of object storage, from API to distribution to actual disk layout. Rick has gone on to play an important role in the definition of the JDBC API.
  • Won Kim (Editor), Modern Database Systems, 1995, ACM Press New York, New York.
  • This book is a collection of papers written on data storage, but has a bias toward demonstrating that object/relational systems will win out over pure object systems. Nonetheless, most anyone who is someone in this business has contributed to this book, so it's worth a look.
