"Everything should be made as simple as possible, but not simpler."
The need to persist data created at runtime is as old as computing, and the need to store object-oriented data arose when object-oriented programming became pervasive. Today, most modern, nontrivial applications use an object-oriented paradigm to model application domains. The database market, in contrast, is more divided: most database systems use the relational model, but object-based data stores prove indispensable in many applications. In addition, we often need to interface with legacy systems.
This article identifies the issues associated with data persistence in transactional middleware environments, such as J2EE (Java 2 Platform, Enterprise Edition), and shows how Java Data Objects (JDO) solves some of those issues. This article provides an overview, not a detailed tutorial, and is written from the viewpoint of an application developer, not a JDO implementation designer.
Those Java developers, designers, and J2EE architects who work on systems that must store data in relational or object databases, or other storage media should read this article. I assume you have a basic knowledge of Java and some familiarity with object-relational issues and terminology.
Transparent persistence: Why bother?
More than a decade of continuous attempts to bridge the object-oriented runtime and persistence points to several important observations (listed in order of importance):
- Abstracting away any persistence details and having a clean, simple, object-oriented API to perform data storage is paramount. We don't want to handle persistence details and internal data representation in data stores, be they relational, object-based, or something else. Why should we deal with low-level constructs of the data-store model, such as rows and columns, and constantly translate them back and forth? Instead, we need to concentrate on that complex application we were required to deliver by yesterday.
- We want to use the plug-and-play approach with our data stores: We want to use different providers/implementations without changing a line of the application source code -- and perhaps without modifying more than a few lines in the appropriate configuration file(s). In other words, we need an industry standard for accessing data based on Java objects, one that plays a role similar to the one JDBC (Java Database Connectivity) plays as an industry standard for accessing SQL-based data.
- We want to use the plug-and-play approach with different database paradigms -- that is, we want to switch from a relational database to an object-oriented one with minimal changes to the application code. Though nice to have, in practice, this capability is often not required.
One comment here: While relational databases enjoy the biggest market presence by far, providing a unified persistence API and allowing data-store providers to compete on implementation strengths makes sense, regardless of the paradigm these providers use. This approach might eventually help level the playing field between the two dominant database vendor groups: the well-entrenched relational camp and the struggling-for-market-share object-oriented camp.
The three observations listed above lead us to define a persistence layer: a framework that provides a high-level Java API that lets objects and relationships outlive the lifespan of the runtime environment (the JVM). Such a framework must feature the following qualities:
- Minimal intrusion
- Transparency, meaning the framework hides the data-store implementation
- Consistent, concise APIs for object storage/retrieval/update
- Transaction support, meaning the framework defines transactional semantics associated with persistent objects
- Support for both managed (e.g., application server-based) and unmanaged (standalone) environments
- Support for the necessary extras, such as caching, queries, primary key generation, and mapping tools
- Reasonable licensing fees -- not a technical requirement, but we all know that poor economics can doom an excellent project
I detail most of the above qualities in the following sections.
Simplicity rates high on my list of required traits for any software framework or library (see this article's opening quote). Developing distributed applications is already hard enough, and many software projects fail because of poor complexity (and, by extension, risk) management. Simple is not synonymous with simplistic; the software should have all the needed features that allow a developer to do his/her job.
Every persistent storage system introduces a certain amount of intrusion into the application code. The ideal persistence layer should minimize intrusion to achieve better modularity and, thus, plug-and-play functionality.
For the purpose of this article, I define intrusion as:
- The amount of persistence-specific code splattered across the application code
- The need to modify your application object model by either having to implement some persistence interface -- such as Persistable or the like -- or by postprocessing the generated code
Intrusion also applies to object-oriented database systems and, although usually less of an issue there compared to relational data stores, it can vary significantly among ODBMS (object-oriented database management system) vendors.
The persistence layer transparency concept is pretty simple: the application uses the same API regardless of the data-store type (data storage-type transparency) or the data-store vendor (data storage-vendor transparency). Transparency greatly simplifies applications and improves their maintainability by hiding data-store implementation details to the maximum extent possible. In particular, for the prevalent relational data stores, you don't need to hardcode SQL statements or column names, or remember the column order returned by a query, as you would with JDBC. In fact, you don't need to know SQL or relational algebra, because they're too implementation specific. Transparency is perhaps the persistence layer's most important trait.
Consistent, simple API
The persistence layer API boils down to a relatively small set of operations:
- Elementary CRUD (create, read, update, delete) operations on first-class objects
- Transaction management
- Management of application object and persistent object identities
- Cache management (i.e., refreshing and evicting)
- Query creation and execution
An example of such a persistence layer API:
    public void persist(Object obj);         // Save obj to the data store.
    public Object load(Class c, Object pK);  // Read obj with a given primary key.
    public void update(Object obj);          // Update the modified object obj.
    public void delete(Object obj);          // Delete obj from the database.
    public Collection find(Query q);         // Find objects that satisfy conditions of our query.
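To make these operations concrete, here is a minimal in-memory sketch. It is not JDO and not any vendor's implementation: the class name InMemoryStore is hypothetical, persist() returns the generated key (unlike the signature above) so the example can be followed end to end, and update() and find() are omitted because objects held in memory are already live references.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical, map-backed stand-in for a persistence layer (illustration only).
final class InMemoryStore {
    // One "table" per class, keyed by generated primary key.
    private final Map<Class<?>, Map<Object, Object>> tables = new HashMap<>();
    // Tracks which key was assigned to which object instance (identity, not equals()).
    private final Map<Object, Long> keys = new IdentityHashMap<>();
    private long nextKey = 1;

    // Saves obj and returns its primary key; persisting the same instance twice reuses the key.
    public Object persist(Object obj) {
        Long key = keys.get(obj);
        if (key == null) {
            key = nextKey++;
            keys.put(obj, key);
        }
        tables.computeIfAbsent(obj.getClass(), c -> new HashMap<>()).put(key, obj);
        return key;
    }

    // Reads the object of class c with the given primary key, or null if absent.
    public Object load(Class<?> c, Object pK) {
        Map<Object, Object> table = tables.get(c);
        return table == null ? null : table.get(pK);
    }

    // Removes obj from the store; a no-op if obj was never persisted.
    public void delete(Object obj) {
        Long key = keys.remove(obj);
        Map<Object, Object> table = tables.get(obj.getClass());
        if (key != null && table != null) {
            table.remove(key);
        }
    }
}
```

Note that the stored classes need no persistence-specific interface or base class, which is the kind of minimal intrusion the article asks for.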
A good persistence layer needs several elementary functions to start, commit, or roll back a transaction. Here is an example:
    // Transaction (tx) demarcation.
    public void startTx();
    public void commitTx();
    public void rollbackTx();

    // Choose to make a persistent object transient after all.
    public void makeTransient(Object o);
Note: Transaction demarcation APIs are primarily used in nonmanaged environments. In managed environments, the built-in transaction manager often assumes this functionality.
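In a nonmanaged environment, application code typically wraps its work between the start and commit calls and rolls back on failure. The following is a minimal sketch of that pattern. The names TxDemarcation, RecordingTx, and TxTemplate are hypothetical, and RecordingTx only records calls so the control flow can be observed; it does not talk to a data store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the demarcation calls shown above.
interface TxDemarcation {
    void startTx();
    void commitTx();
    void rollbackTx();
}

// Illustration-only implementation that records which calls were made.
final class RecordingTx implements TxDemarcation {
    final List<String> events = new ArrayList<>();
    public void startTx()    { events.add("start"); }
    public void commitTx()   { events.add("commit"); }
    public void rollbackTx() { events.add("rollback"); }
}

final class TxTemplate {
    // Runs work inside a transaction: commit on success, rollback on any runtime failure.
    static void runInTx(TxDemarcation tx, Runnable work) {
        tx.startTx();
        try {
            work.run();
        } catch (RuntimeException e) {
            tx.rollbackTx();
            throw e; // let the caller see the original failure
        }
        tx.commitTx();
    }
}
```

A call such as TxTemplate.runInTx(tx, () -> store.persist(company)) would then keep the whole object tree inside one transaction, assuming the store takes part in it.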
Managed environments support
Managed environments, such as J2EE application servers, have grown popular with developers. Who wants to write middle tiers from scratch these days when we have excellent application servers available? A decent persistence layer should be able to work within any major application server's EJB (Enterprise JavaBean) container and synchronize with its services, such as JNDI (Java Naming and Directory Interface) and transaction management.
The API should be able to issue arbitrary queries for data searches. It should include a flexible and powerful, but easy-to-use, language -- the API should use Java objects, not SQL tables or other data-store representations as formal query parameters.
Cache management can do wonders for application performance. A sound persistence layer should provide full data caching as well as appropriate APIs to set the desired behavior, such as locking levels, eviction policies, lazy loading, and distributed caching support.
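As one illustration of an eviction policy, the sketch below builds a count-limited cache that evicts the least recently used entry once a capacity is exceeded. It relies only on the standard java.util.LinkedHashMap; the class name CountLimitedCache is hypothetical, and a real persistence layer's cache would also have to deal with locking and transactional consistency, which are ignored here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical count-limited cache with least-recently-used eviction (not thread-safe).
final class CountLimitedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    CountLimitedCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.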
Primary key generation
Providing automatic identity generation for data is one of the most common persistence services. Every decent persistence layer should provide identity generation, with support for all major primary key-generation algorithms. Primary key generation is a well-researched issue and numerous primary key algorithms exist.
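As an example of one such algorithm, here is a minimal sketch of the HIGH-LOW approach: the generator fetches a "high" value from a shared source only once per block and then hands out keys from that block locally. The class name HighLowKeyGenerator and the use of a LongSupplier as the source of high values are assumptions made for illustration; in practice the high value would usually come from a database table or sequence.

```java
import java.util.function.LongSupplier;

// Hypothetical HIGH-LOW key generator (illustration only).
final class HighLowKeyGenerator {
    private final LongSupplier nextHigh; // shared source of block numbers, e.g. backed by the database
    private final int blockSize;         // number of keys handed out per fetched high value
    private long high;
    private int low;

    HighLowKeyGenerator(LongSupplier nextHigh, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.nextHigh = nextHigh;
        this.blockSize = blockSize;
        this.low = blockSize; // forces a fetch of the first high value on first use
    }

    // Returns the next key; contacts the shared source only when the current block is exhausted.
    synchronized long nextKey() {
        if (low == blockSize) {
            high = nextHigh.getAsLong();
            low = 0;
        }
        return high * blockSize + low++;
    }
}
```

The design trade-off is that keys stay unique across generators sharing the same source while round trips to that source drop by a factor of blockSize; unused keys in a block are simply lost when a generator shuts down.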
Mapping, for relational databases only
With relational databases, a data mapping issue arises: the need to translate objects into tables, and to translate relationships, such as dependencies and references, into additional columns or tables. This is a nontrivial problem in itself, especially with complex object models. The topic of object-relational model impedance mismatch reaches beyond this article's scope, but is well publicized. See Resources for more information.
The following extras related to mapping and/or relational data stores are not required in the persistence layer, but they make a developer's life much easier:
- A GUI (graphical user interface) mapping tool
- Code generators: Autogeneration of DDL (data definition language) to create database tables, or autogeneration of Java code and mapping files from DDL
- Primary key generators: Supporting multiple key-generation algorithms, such as UUID, HIGH-LOW, and SEQUENCE
- Support for binary large objects (BLOBs) and character-based large objects (CLOBs)
- Self-referential relations: An object of type Bar referencing another object of type Bar, for example
- Raw SQL support: Pass-through SQL queries
The following code snippet shows how to use the persistence layer API. Suppose we have the following domain model: A company has one or more locations, and each location has one or more users. The following could be an example application's code:
    PersistenceManager pm = PMFactory.initialize(..);

    Company co = new Company("MyCompany");
    Location l1 = new Location("Boston");
    Location l2 = new Location("New York");

    // Create users.
    User u1 = new User("Mark");
    User u2 = new User("Tom");
    User u3 = new User("Mary");

    // Add users. A user can only "belong" to one location.
    l1.addUser(u1);
    l1.addUser(u2);
    l2.addUser(u3);

    // Add locations to the company.
    co.addLocation(l1);
    co.addLocation(l2);

    // And finally, store the whole tree to the database.
    pm.persist(co);
In another session, you can look up companies employing the user Tom:
    PersistenceManager pm = PMFactory.initialize(...);
    Collection companiesEmployingToms = pm.find("company.location.user.name = 'Tom'");
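To show what such a query means in terms of Java objects rather than tables, the sketch below defines a bare-bones version of the domain model and evaluates the same condition in memory. The field and method names on Company, Location, and User are assumptions (the article shows only constructors, addUser, and addLocation), and CompanyQueries is a hypothetical helper, not part of any persistence API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Minimal, assumed versions of the domain classes: plain objects with no persistence code.
final class User {
    final String name;
    User(String name) { this.name = name; }
}

final class Location {
    final String name;
    final List<User> users = new ArrayList<>();
    Location(String name) { this.name = name; }
    void addUser(User u) { users.add(u); }
}

final class Company {
    final String name;
    final List<Location> locations = new ArrayList<>();
    Company(String name) { this.name = name; }
    void addLocation(Location l) { locations.add(l); }
}

final class CompanyQueries {
    // In-memory equivalent of "company.location.user.name = 'Tom'":
    // keep each company that has at least one location with a matching user.
    static List<Company> employing(List<Company> companies, String userName) {
        return companies.stream()
                .filter(c -> c.locations.stream()
                        .flatMap(l -> l.users.stream())
                        .anyMatch(u -> u.name.equals(userName)))
                .collect(Collectors.toList());
    }
}
```

A persistence layer would translate the same navigation (company to locations to users) into whatever its data store understands, such as joins across the Companies, Locations, and Users tables.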
For relational data stores, you must create an additional mapping file. It might look like this:
    <!DOCTYPE mapping PUBLIC ... >
    <mapping>
      <class name="com.numatica.example.Company" identity="companyID" key-generator="SEQUENCE">
        <cache-type type="count-limited" capacity="5"/>
        <description>Company</description>
        <map-to table="Companies"/>
        <field name="companyID" type="long">
          <sql name="companyID" type="numeric"/>
        </field>
        <field name="name" type="string">
          <sql name="name" type="varchar"/>
        </field>
        <field name="locations" type="com.numatica.example.Location" collection="arraylist">
        </field>
      </class>
      <class name="com.numatica.example.Location" identity="locationID" key-generator="SEQUENCE">
        <cache-type type="unlimited"/>
        <description>Locations</description>
        <map-to table="Locations"/>
        <field name="locationID" type="long">
          <sql name="locationID" type="numeric"/>
        </field>
        <field name="name" type="string">
          <sql name="name" type="varchar"/>
        </field>
        <field name="company" type="com.numatica.example.Company" required="true">
          <sql name="companyID"/>
        </field>
      </class>
      <class name="com.numatica.example.User" identity="userID" depends="com.numatica.example.Location">
        <cache-type type="count-limited" capacity="200"/>
        <description>User</description>
        <map-to table="Users"/>
        <field name="userID" type="integer">
          <sql name="userID" type="numeric"/>
        </field>
        <field name="location" type="com.numatica.example.Location" required="true">
          <sql name="locationID"/>
        </field>
        <field name="name" type="string">
          <sql name="username" type="varchar"/>
        </field>
      </class>
    </mapping>
The persistence layer takes care of the rest, which encompasses the following:
- Finding dependent object groups
- Managing application object identity
- Managing persistent object identities (primary keys)
- Persisting each object in the appropriate order
- Providing cache management
- Providing the proper transactional context (we don't want only a portion of the object tree persisted, do we?)
- Providing user-selectable locking modes
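"Persisting each object in the appropriate order" can be pictured as a dependency sort: a type that holds a required reference to another type (User to Location, Location to Company in the mapping above) must be stored after it so that foreign keys resolve. The sketch below is one simple way to compute such an order with a depth-first traversal. The class name PersistOrder and the map-based input are assumptions for illustration, and real persistence layers may work on object instances rather than type names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that orders types so each one comes after everything it depends on.
final class PersistOrder {
    // dependsOn maps a type name to the type names it requires to exist first.
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String type : dependsOn.keySet()) {
            visit(type, dependsOn, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String type, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(type)) {
            return;
        }
        if (!inProgress.add(type)) {
            // This simple sketch does not handle cycles or self-references.
            throw new IllegalStateException("Cyclic dependency at " + type);
        }
        for (String parent : dependsOn.getOrDefault(type, Collections.emptyList())) {
            visit(parent, dependsOn, done, inProgress, result);
        }
        inProgress.remove(type);
        done.add(type);
        result.add(type); // emitted only after all of its dependencies
    }
}
```

For the example model, the computed order places Company before Location and Location before User. Self-referential relations need extra handling (for example, inserting first and filling in the reference afterward), which this sketch deliberately leaves out.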