Object durability, or persistence, is the term you often hear used in conjunction with the issue of storing objects in databases. Persistence is expected to operate with transactional integrity, and as such it is subject to strict conditions. (See the Resources section of this article for more information on transaction processing.) In contrast, language services offered through standard language libraries and packages are often free from transactional constraints.
As we'll see in this article, evidence suggests that simple Java persistence will likely stem from the language itself, while sophisticated database functionality will be offered by database vendors.
No object is an island
In the real world, you rarely find an object that lacks relations to other objects. Objects are components of object models. The issue of object durability transcends the issue of object model durability and distribution once we make the observation that objects are interconnected by virtue of their relations to one another.
The relational approach to data storage tends to aggregate data by type. Rows in a table represent the physical aggregate of objects of the same type on disk. The relationships among objects are then represented by keys that are shared across many tables. Although through database organization, relational databases sometimes allow tables that are likely to be used together to be co-located (or clustered) in the same logical partition, such as a database segment, they have no mechanism to store object relationships in the database. Hence, in order to construct an object model, these relationships are constructed from the existing keys at run time in a process referred to as table joins. This is the same well-known property of the relational databases called data independence. Nearly all variants of object databases offer some mechanism to enhance the performance of a system that involves complex object relationships over traditional relational databases.
To query or to navigate?
In storing objects on disk, we are faced with the choice of co-locating related objects to better accommodate navigational access, or to store objects in table-like collections that aggregate objects by type to facilitate predicate-based access (queries), or both. The co-location of objects in persistent storage is an area where relational and object-oriented databases widely differ. The choice of the query language is another area of consideration. Structured Query Language (SQL) and extensions of it have provided relational systems with a predicate-based access mechanism. Object Query Language (OQL) is an object variant of SQL, standardized by ODMG, but support for this language is currently scant. Polymorphic methods offer unprecedented elegance in constructing a semantic query for a collection of objects. For example, imagine a polymorphic behavior for
isInGoodStanding. It may return the Boolean true for all accounts in good standing, and false otherwise. Now imagine the elegance of querying the collection of accounts, where
inGoodStanding is implemented differently based on business rules, for all accounts in good standing. It may look something like:
setOfGoodCustomers = setOfAccounts.query(account.inGoodStanding());
While several of the existing object databases are capable of processing such a query style in C++ and Smalltalk, it is difficult for them to do so for larger (say, 500+ gigabytes) collections and more complex query expressions. Several of the relational database companies, such as Oracle and Informix, will soon offer other, SQL-based syntax to achieve the same result.
Persistence and type
An object-oriented language aficionado would say persistence and type are orthogonal properties of an object; that is, persistent and transient objects of the same type can be identical because one property should not influence the other. The alternative view holds that persistence is a behavior supported only by persistable objects and certain behaviors may apply only to persistent objects. The latter approach calls for methods that instruct persistable objects to store and retrieve themselves from persistent storage, while the former affords the application a seamless view of the entire object model -- often by extending the virtual memory system.
Canonicalization and language independence
Objects of the same type in a language should be stored in persistent storage with the same layout, regardless of the order in which their interfaces appear. The processes of transforming an object layout to this common format are collectively known as canonicalization of object representation. In compiled languages with static typing (not Java) objects written in the same language, but compiled under different systems, should be identically represented in persistent storage.
An extension of canonicalization addresses language-independent object representation. If objects can be represented in a language-independent fashion, it will be possible for different representations of the same object to share the same persistent storage.
One mechanism to accomplish this task is to introduce an additional level of indirection through an interface definition language (IDL). Object database interfaces can be made through the IDL and the corresponding data structures. The downside of IDL style bindings is two fold: First, the extra level of indirection always requires an additional level of translation, which impacts the overall performance of the system; second, it limits use of database services that are unique to particular vendors and that might be valuable to application developers.
A similar mechanism is to support object services through an extension of the SQL. Relational database vendors and smaller object/relational vendors are proponents of this approach; however, how successful these companies will be in shaping the framework for object storage remains to be seen.
But the question remains: Is object persistence part of the object's behavior or is it an external service offered to objects via separate interfaces? How about collections of objects and methods for querying them? Relational, extended relational, and object/relational approaches tend to advocate a separation between language, while object databases -- and the Java language itself -- see persistence as intrinsic to the language.
Native Java persistence via serialization
Object serialization is the Java language-specific mechanism for the storage and retrieval of Java objects and primitives to streams. It is worthy to note that although commercial third-party libraries for serializing C++ objects have been around for some time, C++ has never offered a native mechanism for object serialization. Here's how to use Java's serialization:
// Writing "foo" to a stream (for example, a file)
// Step 1. Create an output stream
// that is, create bucket to receive the bytes
FileOutputStream out = new FileOutputStream("fooFile");
// Step 2. Create ObjectOutputStream
// that is, create a hose and put its head in the bucket
ObjectOutputStream os = new ObjectOutputStream(out)
// Step 3. Write a string and an object to the stream
// that is, let the stream flow into the bucket
// Step 4. Flush the data to its destination
Writeobject method serializes foo and its transitive closure -- that is, all objects that can be referenced from foo within the graph. Within the stream only one copy of the serialized object exists. Other references to the objects are stored as object handles to save space and avoid circular references. The serialized object starts with the class followed by the fields of each class in the inheritance hierarchy.
// Reading an object from a stream
// Step 1. Create an input stream
FileInputStream in = new FileInputStream("fooFile");
// Step 2. Create an object input stream
ObjectInputStream ins = new ObjectInputStream(in);
// Step 3. Got to know what you are reading
String fooString = (String)ins.readObject();
Foo foo = (Foo)s.readObject();
Object serialization and security
By default, serialization writes and reads non-static and non-transient fields from the stream. This characteristic can be used as a security mechanism by declaring fields that may not be serialized as private transient. If a class may not be serialized at all,
readObject methods should be implemented to throw
Persistence with transactional integrity: Introducing JDBC
Modeled after X/Open's SQL CLI (Client Level Interface) and Microsoft's ODBC abstractions, Java database connectivity (JDBC) aims to provide a database connectivity mechanism that is independent of the underlying database management system (DBMS).To become JDBC-compliant, drivers need to support at least the ANSI SQL-2 entry-level API, which gives third-party tool vendors and applications enough flexibility for database access.
JDBC is designed to be consistent with the rest of the Java system. Vendors are encouraged to write an API that is more strongly typed than ODBC, which affords greater static type-checking at compile time.
Here's a description of the most important JDBC interfaces:
java.sql.Driver.Managerhandles the loading of drivers and provides support for new database connections.
java.sql.Connectionrepresents a connection to a particular database.
java.sql.Statementacts as a container for executing an SQL statement on a given connection.
java.sql.ResultSetcontrols access to the result set.
You can implement a JDBC driver in several ways. The simplest would be to build the driver as a bridge to ODBC. This approach is best suited for tools and applications that do not require high performance. A more extensible design would introduce an extra level of indirection to the DBMS server by providing a JDBC network driver that accesses the DBMS server through a published protocol. The most efficient driver, however, would directly access the DBMS proprietary API.
Object databases and Java persistence
A number of ongoing projects in the industry offer Java persistence at the object level. However, as of this writing, Object Design's PSE (Persistent Storage Engine) and PSE Pro are the only fully Java-based, object-oriented database packages available (at least, that I am aware of). Check the Resources section for more information on PSE and PSE Pro.
Java development has led to a departure from the traditional development paradigm for software vendors, most notably in the development process timeline. For example, PSE and PSE Pro are developed in a heterogeneous environment. And because there isn't a linking step in the development process, developers have been able to create various functional components independent of each other, which results in better, more reliable object-oriented code.
PSE Pro has the ability to recover a corrupted database from an aborted transaction caused by system failure. The classes that are responsible for this added functionality are not present in the PSE release. No other differences exist between the two products. These products are what we call "dribbleware" -- software releases that enhance their functionality by plugging in new components. In the not-so-distant future, the concept of purchasing large, monolithic software would become a thing of the past. The new business environment in cyberspace, together with Java computing, enable users to purchase only those parts of the object model (object graph) they need, resulting in more compact end products.
PSE works by post-processing and annotating class files after they have been created by the developer. From PSE's point of view, classes in an object graph are either persistent-capable or persistent-aware. Persistent-capable classes may persist themselves while persistent-aware classes can operate on persistent objects. This distinction is necessary because persistence may not be a desired behavior for certain classes. The class file post-processor makes the following modifications to classes: