As a developer, you strive to provide both functionality and performance to your users. With that in mind, how do you ensure that your consideration for performance does not compromise data integrity?
Enterprise applications often try to balance application scalability with concurrency considerations. You can achieve scalability using many different techniques, including minimizing network traffic, which often manifests itself in the Value Object pattern. Using this pattern, however, raises data integrity issues, which we will address here.
This article describes the Optimistic Locking pattern. It begins with a transactional overview, then describes the data integrity problem in detail. Next, it considers a number of solutions, focusing on a specific solution illustrated with an example.
Transactions represent a fundamental unit of work for managing business complexity. At its most basic, a transaction is an execution of a program (updating a customer record in a database or making a reservation, for example). A typical business application has many associated transactions.
Transactions ensure the integrity of an application's business rules, a process that gives rise to transaction processing (TP) applications that execute multiple transactions simultaneously. A TP application creates, executes, and manages transactions, and also provides a scalable, distributed environment in which they can run.
As a TP application example, consider an airline reservation system. (One of the first TP applications was an airline reservation system called Sabre built by IBM for American Airlines.) In our example, when a passenger wants to book a seat on a flight, three things must happen:
- An open seat must exist on the flight
- A ticket must be issued
- The passenger must be billed via a credit card using Electronic Data Interchange (EDI)
This system quickly raises concurrency, scalability, and consistency concerns when multiple users try to make reservations on a distributed system.
To simplify the potential complexity, consider the entire flight booking process as one unit of work -- a transaction. A transaction comprises four critical properties:
- Atomic
- Consistent
- Isolated
- Durable
Together, the four critical properties produce the ACID acronym.
In the context of our example, consider a transaction atomic if it executes either completely or not at all. For example, if no seat exists on the plane, the entire transaction aborts or rolls back to the transaction's start. When a transaction rolls back, the database returns to the state it had prior to the transaction's inception.
Moving on, consider a transaction isolated if the transaction executes serially. In other words, it should appear as if the transaction runs alone with no other transaction occurring simultaneously. This guarantees data integrity.
A transaction must also be durable; a permanent record of the transaction must persist. This may sound obvious, but for optimization purposes transactional records are often kept in memory. However, the transaction cannot be considered ACID until the data is written to permanent storage.
Although second on the list, the last term of an ACID transaction to consider is consistent. A transaction ensures consistency if it is atomic, isolated, and durable. If an airplane possesses 10 seats and each seat sells for $100, then at the end of 10 successful transactions the airline's account should have $1,000 more than it did when it started. If this is the case, the database is in a consistent state.
Transactions in EJBs
EJBs specifically support distributed transactions. Two types of transaction handling exist: bean-managed and container-managed transactions. The two types simplify the deployment of transactionally aware EJBs.
Bean-managed transactions in EJBs are controlled via the Java Transaction API (JTA) and exposed through the javax.transaction.UserTransaction interface -- the visible part of JTA. The UserTransaction interface represents a contract between the client and the transaction coordinator. Explicitly, this involves three demarcation methods: begin(), commit(), and rollback().
You can demark EJB transactions in three ways:
- By the client
- By an EJB
- By a container
Demark by the client
A client using an EJB can explicitly demark the transaction. All the operations called between the begin() and commit() methods are involved in the transaction. If any one of these operations fails, the transaction rolls back and the database returns to its initial state. Otherwise, the transaction commits and the changes become permanent.
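Client-side demarcation of this kind might look roughly like the following sketch. To keep it compilable without an application server, it declares a small stand-in Tx interface with the same begin(), commit(), and rollback() method names as javax.transaction.UserTransaction; the real interface declares checked exceptions, and a real client would obtain it from the container. The InMemoryTx and BookingClient names, and the three recorded steps, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for javax.transaction.UserTransaction (assumption: only the three
// demarcation methods are modeled, without their checked exceptions).
interface Tx {
    void begin();
    void commit();
    void rollback();
}

// Hypothetical in-memory transaction: buffers work until commit.
class InMemoryTx implements Tx {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    public void begin() { pending.clear(); }
    public void commit() { committed.addAll(pending); pending.clear(); }
    public void rollback() { pending.clear(); }

    void record(String operation) { pending.add(operation); }
}

class BookingClient {
    // Runs the booking steps inside one client-demarcated transaction.
    static boolean book(InMemoryTx tx, boolean billingSucceeds) {
        tx.begin();
        try {
            tx.record("reserve seat");
            tx.record("issue ticket");
            if (!billingSucceeds) {
                throw new IllegalStateException("billing failed");
            }
            tx.record("bill credit card");
            tx.commit();      // all three steps become permanent together
            return true;
        } catch (RuntimeException e) {
            tx.rollback();    // the failure discards the steps already recorded
            return false;
        }
    }
}
```

When billing fails, the seat reservation and ticket recorded earlier in the same transaction are discarded as well, which is the all-or-nothing behavior the text describes.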
Demark by an EJB
As mentioned above, transactions in EJBs can be bean-managed or container-managed.
You can only use bean-managed transactions with session beans. Session beans use a technique similar to that of the client-managed transactions, described above, to explicitly demark transactional boundaries. Stateless session beans are restricted because they can demark transactions only within a single method, whereas stateful session beans can demark transactions across multiple methods.
Demark by a container
Both entity beans and session beans can demark transactions with container-managed transactions. In this case, you define a method's transaction attributes in an XML deployment file. This is the preferred way to define transactional behavior with EJBs. You can manage changes in transactional behavior by modifying deployment descriptors, thus minimizing propagation of code changes. There are six different transactional attributes that can define a method's transactional behavior:
- NotSupported
- Required
- Supports
- RequiresNew
- Mandatory
- Never
For a full description of each attribute, see the EJB Specification in Resources. For the purposes of this article, we'll explain the Required attribute. Bean methods marked with Required must be invoked within a transaction's scope. If a transaction already exists (that is, the method is called from a client as part of an existing transaction), the Required bean method is included in the scope of the current transaction. If it is not called within the context of a transaction, then a new transaction scope is created and the Required bean method is added to that new transaction.
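The join-or-create behavior of Required can be modeled in a few lines. The sketch below is not container code: it uses a ThreadLocal as a hypothetical stand-in for the container's notion of the current transaction, and the RequiredDemo, required, and activeTx names are invented for illustration.

```java
import java.util.function.Supplier;

class RequiredDemo {
    // Hypothetical stand-in for "the transaction associated with this call".
    private static final ThreadLocal<String> currentTx = new ThreadLocal<>();
    private static int counter = 0;

    // Mimics Required: join the caller's transaction, or start one for this call.
    static <T> T required(Supplier<T> method) {
        if (currentTx.get() != null) {
            return method.get();              // join the existing scope
        }
        currentTx.set("tx-" + (++counter));   // no caller transaction: create one
        try {
            return method.get();
        } finally {
            currentTx.remove();               // scope ends with the outermost call
        }
    }

    // Identifier of the active transaction, or null outside any transaction.
    static String activeTx() {
        return currentTx.get();
    }
}
```

A nested call to required() sees the same identifier as its caller, while a call made outside any transaction gets a fresh one that disappears when the call returns.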
Optimistic data locking relies on the idea that data remains unmodified while it is away from the server. As a simple example, consider how you'd update client details. Customer details live in a row of a database (to a first approximation) and are represented by an entity bean. If a client wants to update these details in a manner that cuts down on network traffic, all the client's details are packaged inside a state holder object and sent back as a serialized object. This is known as the Value Object pattern. The data is not locked and other clients can access it simultaneously, thus ensuring a scalable system.
The problem: the customer details are away from the source and can possibly become stale. For example, a second client can request the same customer details, modify them, and commit them back to the entity beans, which will flush the data to the database. The first client, unaware that it is dealing with a stale copy of the data, modifies and commits the data. Obviously, with no checking mechanism to detect this conflict, the first client's changes, which commit last, will be made permanent, thus overwriting the changes made by the second client.
For optimistic locking to work effectively, you must be able to detect these write-write conflicts and to make the client aware of them so they can be dealt with appropriately.
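The lost update described above can be reproduced with a minimal in-memory sketch. It is an illustration under stated assumptions, not the article's code: the CustomerDetails value object, its address and phone fields, and the UncheckedStore class are all hypothetical, and the store deliberately performs no conflict check.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Value object: a detached, serializable snapshot of one customer row.
class CustomerDetails implements Serializable {
    final String address;
    final String phone;

    CustomerDetails(String address, String phone) {
        this.address = address;
        this.phone = phone;
    }
}

// Stand-in for the entity bean and its table; writes are never checked.
class UncheckedStore {
    private final Map<Integer, CustomerDetails> rows = new HashMap<>();

    CustomerDetails read(int id) { return rows.get(id); }
    void write(int id, CustomerDetails details) { rows.put(id, details); }
}

class LostUpdateDemo {
    // Returns the row as it stands after both clients have committed.
    static CustomerDetails run() {
        UncheckedStore store = new UncheckedStore();
        store.write(1, new CustomerDetails("1 Main St", "555-0100"));

        CustomerDetails first = store.read(1);   // first client's copy
        CustomerDetails second = store.read(1);  // second client's copy

        // The second client changes the phone number and commits.
        store.write(1, new CustomerDetails(second.address, "555-0199"));
        // The first client, holding a stale copy, changes the address and commits.
        store.write(1, new CustomerDetails("9 Elm Rd", first.phone));

        return store.read(1);
    }
}
```

After run(), the row holds the first client's new address together with the old phone number: the second client's change has been silently overwritten.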
In designing distributed multiuser applications, sharing objects in real time is essential, but it can lead to resource sharing conflicts. In such conflicts, a user or process can change an object's state while another user or process is using the object. Database managers solve this problem using various locking strategies.
You can employ one of two types of locking strategies: pessimistic or optimistic locking. With pessimistic locking, data is locked on the database when read for update. This can lead to high lock contention. In contrast, optimistic locking lets different transactions read the same state concurrently and checks data integrity only at update time.
A long transaction performs a series of DBMS commands that extend over a long period -- hours, days, weeks, or even months. A short transaction, in contrast, resolves a group of DBMS commands within a few seconds. For applications built with EJBs, short transactions are the default model. Long transactions are possible using EJBs, but an application using them can no longer take advantage of EJB's built-in transaction management. Instead, the application must use client-initiated transactions and the UserTransaction interface to manage its own transactions, thus paying penalties in system scalability.
However, the short transaction model can simulate long transactions using either a pessimistic or optimistic locking strategy.
Transactions concerned only with simulating an exclusive lock on a shared data object use the pessimistic locking strategy. In contrast, optimistic locking involves checking data integrity only at update time. The second strategy proves more effective and acts as the basis for our discussion.
Optimistic locking support for entities is not part of the EJB specifications. Some application servers support optimistic locking, but it's a mistake to rely on any one vendor's mechanism. A portable solution is preferable.
So how does an application detect write-write conflicts? Before trying to commit changes, an application must reread the objects read earlier and determine whether they have been changed in the interim. At least three techniques determine whether objects have been changed:
- Timestamp
- Version count
- State comparison
If you use timestamps, an object's state information is expanded to contain a field for the timestamp. As part of the commit process, the application obtains a fresh timestamp and updates this field. So, when the object's state is committed into the database, the timestamp field indicates the commit time. To detect a write-write conflict, the application rereads the object and compares the timestamp in the newly read object with the timestamp in the originally read copy. If they are the same, no write-write conflict exists. If they are different, a conflict does exist and you must take remedial action. This works only if every application that can change the object also changes the timestamp.
Another technique: rather than rereading the object, add the timestamp field along with the primary key to the WHERE clause used in the UPDATE statement. If no rows are updated, then the row no longer matches the timestamp (or the row with that primary key was deleted). This takes one database round trip instead of two.
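With plain JDBC, the guarded UPDATE could be sketched as follows. The table and column names (CUSTOMER, ADDRESS, LAST_MODIFIED, ID) and the TimestampGuardedUpdate class are assumptions for illustration; the JDBC calls themselves are standard, and executeUpdate() returns the number of rows affected.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

class TimestampGuardedUpdate {
    // Hypothetical table and column names.
    static final String SQL =
        "UPDATE CUSTOMER SET ADDRESS = ?, LAST_MODIFIED = ? "
      + "WHERE ID = ? AND LAST_MODIFIED = ?";

    // Returns true if the row was updated, and false if it was changed or
    // deleted after the caller read it (no row matched the WHERE clause).
    static boolean update(Connection con, long id, String newAddress,
                          Timestamp readAt, Timestamp now) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setString(1, newAddress);
            ps.setTimestamp(2, now);     // fresh timestamp written with the change
            ps.setLong(3, id);
            ps.setTimestamp(4, readAt);  // timestamp from the originally read copy
            return ps.executeUpdate() == 1;
        }
    }
}
```

A false return signals a write-write conflict that the caller must handle, for example by rereading the row and asking the user to reapply the change.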
The version counter technique is similar to the timestamp technique. The object contains a field for the version. Each time the object successfully commits, the version is incremented. When preparing for the commit, the application rereads the objects and compares the current version value with the one in its original copy of the object. Again, all applications that can change the object must increment the version number when they do so.
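A minimal in-memory sketch of the version counter check follows. It is not the implementation developed in this article; the VersionedCustomer, StaleDataException, and VersionedStore names are hypothetical, and a map stands in for the entity bean and its table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object that carries the version it was read at.
class VersionedCustomer {
    final String address;
    final int version;

    VersionedCustomer(String address, int version) {
        this.address = address;
        this.version = version;
    }
}

// Signals a write-write conflict to the client.
class StaleDataException extends RuntimeException {
    StaleDataException(String message) { super(message); }
}

class VersionedStore {
    private final Map<Integer, VersionedCustomer> rows = new HashMap<>();

    synchronized void insert(int id, String address) {
        rows.put(id, new VersionedCustomer(address, 0));
    }

    synchronized VersionedCustomer read(int id) {
        return rows.get(id);
    }

    // Commits only if the caller's copy still carries the stored version,
    // then increments the version so later stale writers are rejected.
    synchronized void update(int id, VersionedCustomer edited) {
        VersionedCustomer current = rows.get(id);
        if (current.version != edited.version) {
            throw new StaleDataException("expected version " + current.version
                    + " but client copy has version " + edited.version);
        }
        rows.put(id, new VersionedCustomer(edited.address, current.version + 1));
    }
}
```

If two clients read version 0 and both try to commit, the first update succeeds and moves the row to version 1; the second is rejected with StaleDataException instead of overwriting the first.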
Of the three, the state comparison proves the most complex. The object is reread and then the state of the newly reread object is compared with the original copy. As this might involve comparing numerous fields, it is therefore more expensive than the previous techniques. Additionally, for state comparison to work, the application must keep its original copy of the object unchanged. This further complicates the application since it must now accumulate changes to the object in some other container and apply them at commit time. For these reasons, we recommend timestamps or counters for most situations.
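For comparison, state comparison might be sketched as below. The CustomerState class and its three fields are assumptions chosen for illustration; the point is that every field takes part in the check and that the application must hold on to an untouched original copy.

```java
import java.util.Objects;

// Hypothetical customer state; every field takes part in the comparison.
final class CustomerState {
    final String name;
    final String address;
    final String cardNumber;

    CustomerState(String name, String address, String cardNumber) {
        this.name = name;
        this.address = address;
        this.cardNumber = cardNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerState)) return false;
        CustomerState other = (CustomerState) o;
        return Objects.equals(name, other.name)
                && Objects.equals(address, other.address)
                && Objects.equals(cardNumber, other.cardNumber);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, address, cardNumber);
    }
}

class StateComparisonCheck {
    // original: the untouched copy taken at read time.
    // reread: a fresh copy fetched just before commit.
    static boolean conflicts(CustomerState original, CustomerState reread) {
        return !original.equals(reread);
    }
}
```

The cost grows with the number of fields, and pending edits have to live somewhere other than the original copy, which is why the text favors timestamps or counters.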
Our solution uses a version counter applied to the simple Customer capture and update example previously developed in "Add XML to Your J2EE Applications" (JavaWorld, February 2001).
A detailed example
To illustrate our solution we will use part of the "Add XML to Your J2EE Applications" case study in which we developed a software application that automates a car rental. We will again employ the excellent open source application server JBoss (now at version 2.2, with version 3.0 in production) with embedded Tomcat (version 3.2.1). Our objective: concentrate on the application's customer adding/updating section. This is a mini use case embedded in the reserve-car use case from the previous article.
Under the reserve-car use case, the customer first calls the reservation desk to make a rental reservation. The rental reservation agent (RRA) takes the customer's name and queries the system for a match. If the customer is new, the RRA takes the customer's information and enters it into the system. Otherwise, existing customer details are presented for update. If the user chooses to continue, existing details are added or updated as required. This getting and setting of the customer object, which involves the two dialogs, is illustrated in Figures 1 and 2.
Then enter and submit the customer information, including name, address, and credit card number.