The timestamp-based caching framework: Current data with peak performance

Build a dynamic LRU cache framework using standard Java utility classes

In his timeless masterpiece, The Art of War, Sun Tzu states: "...one who is skilled in warfare principles ...takes the enemy's walled city without attacking... His aim must be to take All-Under-Heaven intact. Therefore, weapons will not be blunted, and gains will be intact."

When we garbage-collect perfectly good objects, we "blunt our weapons" and waste precious system resources by re-creating these objects all over again. By implementing a caching framework, we hope that most of our "gains" resulting from our creation of complex objects will remain "intact." If we store completed objects in the cache, we avoid repeatedly re-creating these objects and thus enjoy a boost in application performance.

Time-to-live, or TTL, caching systems answer the needs of most applications that can afford to use slightly stale data. However, if the objects we cache contain time-sensitive pricing or availability information, they must always remain up to date and cannot be shown to the user as stale. These objects have no reliable time-to-live attribute, and to cache them effectively, we must employ different approaches. One successful approach, a timestamp-based caching framework (TBCF), is presented in this article.

Not caching Britney Spears

Imagine, if you will, a simple Website for selling CDs. On our site, users can search for CDs, check prices, and make purchases. For every user search, our application loads a CD cover picture, item details, price, and user ratings from the database. After all the information is loaded, the application creates CD value objects to massage and store the information from one or more JDBC (Java Database Connectivity) result sets. Then the application renders these CD objects as HTML and delivers the result to the user's browser. After the search results display, our application completes the user's request by allowing the JVM to garbage-collect the complex CD value objects it went to such great lengths to create. The same process then repeats for every other user who executes the same search for the same CD, for example, a Britney Spears CD.

While this typical design approach is straightforward and simple to implement, its performance is often suboptimal. With every user request, CD value objects undergo a "create-use-destroy" cycle, also known as object churning. This cycle does not typically cause problems for small, simple CD objects under low system loads. However, it does cause significant performance degradation if the CD objects are large and complex, requiring several JDBC calls and complex calculations. As the load on the application increases, object-churning problems rapidly intensify. Since objects are not reused, n users will always require the creation of n sets of CD objects and the use of n sets of system resources by our application. This problem is illustrated in Figure 1.

Figure 1. No cache: n users require n sets of objects, causing problems at higher loads

To address the object-churning problem, assembled CD value objects can be stored in a cache. A cache is typically implemented as a map, where CD objects are stored as values, with their IDs as keys. Each CD search request searches the cache first, before querying the database. If an existing CD value object is found in the cache, that object is used; a query to the database to create a new CD object is not required. Thus, with the cache in place, more CD objects are reused, and fewer objects are created and destroyed as part of the churning cycle. In a typical caching mechanism, all the objects that reside in the cache have a time-to-live (TTL) attribute. When the objects stay in the cache longer then their TTL, they become stale. Stale objects are periodically removed from the cache by a background thread (see "Develop a Generic Caching Service to Improve Performance" (JavaWorld, July 2001) for a good implementation of a TTL cache).

This approach works well for most applications, especially where objects have a well-defined TTL attribute or where users can accept somewhat stale data—for example, running a monthly report of product demand trends on data loaded nightly. Day-old data is perfectly acceptable in this case and can be safely used to create the desired report. We also know that TTL is exactly 24 hours, so our cache can be cleared and then refreshed nightly from the database, safely speeding up data delivery throughout the day.

The problem emerges when we try to cache time-sensitive objects. In most cases where product availability or pricing is concerned, we require an absolutely reliable way to determine staleness, as using stale data can have expensive consequences. Examples include product prices that change instantly with market demands, day-trader stock market quotes, mortgage rates, and any reports that must portray real-time results. In these complex applications, the TTL cache's background-thread refresh-scheme is simply not rigorous enough because no specific TTL exists for the cached objects, as pricing information changes randomly throughout the day. The challenges of TTL caching are shown in Figure 2.

Figure 2. Since TTL is not known, the object in the cache might be stale

Since we cannot serve a stale stock price to a day trader, we must first determine if the object in our cache is stale. In the famous mental experiment involving Schrodinger's Cat, the outside observer cannot reliably ascertain whether the cat is alive or dead without opening the sealed box. A TTL cache presents us with the same uncertainty: since TTL is not known, the object in the cache might be stale or it might still be fresh. The only way to know for sure is to open the proverbial box, i.e., look in the database. Thus, if we want to return only fresh data, we must go to the database to determine the staleness of every object we return, thereby negating any efficiency improvements we gained by using the TTL cache. In other words, the act of observing staleness destroys efficiency.

To guarantee the delivery of the most up-to-date information to the users, we need a more systematic approach to determine staleness. We must engineer some kind of a quick "staleness lookup" into every request addressed to our cache. The timestamp-based caching framework (TBCF) I present in this article does exactly that.

Timestamp-based caching framework requirements

We know enough about our Music Shop application to outline the software requirements of our new timestamp-based caching framework (TBCF):

  1. Our new caching framework should increase application performance by minimizing the object churning and maximizing object reuse.
  2. We want to display only the most up-to-date pricing and availability, thus TBCF must never return a stale object.
  3. We need to address compatibility with legacy systems. It would do us no good to have the world's greatest framework that can't be used in our application! Therefore, our new framework should have plug-and-play architecture: it should be easy to install and configure on any system. Ideally, TBCF should allow any existing application object to be cached, as long as it implements a simple Cacheable interface.
  4. We want to make it easy to plug any caching algorithm into our framework. Because one caching algorithm does not fit all applications, we should have the ability to plug in a least-frequently used (LFU), a least-recently used (LRU), or a first-in-first-out (FIFO) caching class, as long as it implements the required Cache interface.

Timestamp: The key to the cache kingdom

We are seemingly stuck with two main conflicting requirements: cache as much as we can and guarantee freshness of all the retrieved objects.

To satisfy these requirements we use Object's Timestamp attribute. Timestamp is simply the date and time the object was last modified by the system. Timestamp is typically stored as either Date or long. (On a related note, the Version attribute is related to Timestamp and can also be used in our framework to determine object freshness. Version is typically stored as an int.)

The easiest way to implement a Timestamp attribute is to use a database trigger. If any of the specified fields in a table row are modified in the database, the trigger automatically updates the timestamp column of the table with the latest SQL Sysdate. Database triggers are by nature application-independent, which renders them the ideal solution if multiple applications are accessing the same database tables. In our Music Shop application, we use a long representation of the Java Date attribute as our Timestamp. This long number is the count of milliseconds that separates "now" from the first moment of 1970—the year that launched the glorious decade that gave us Star Wars, Abba, punk, and Kermit the Frog.

A UML sequence diagram is worth 1,000 words

To help visualize how the Timestamp attribute is used, I created a UML sequence diagram found in Figure 3.

Figure 3. UML sequence diagram of loading objects from the cache and database using TBCF. Click on thumbnail to view full-sized image.

To summarize, a user request for Britney Spears CDs proceeds as follows:

  1. Obtain a list of IDs of "Britney" CDs
  2. Retrieve Timestamps for these IDs from the database
  3. Cache: Use the Id-Timestamp combination to collect the following:
    1. Fresh objects found in the cache
    2. IDs only for objects where the cache's timestamp differs from the database's timestamp (i.e., cached object is stale)
    3. IDs were not found in the cache (i.e., object is missing from the cache)
  4. Database: Load the data and create missing/stale objects
  5. Load all new and reloaded objects into cache
  6. Combine 3a's list with 4's list
  7. Sort the combined list
  8. Send CDs to users rendered as HTML

Object-oriented design ideas in action

To make our framework modular and extendable, we have used the best principles of object-oriented design to create our Java classes. For objects to participate in the TBCF, we only require them to implement the Cacheable interface (instead of extending some base class):

 

public interface Cacheable extends Timestamped {

public Collection loadFromIds(Collection idsToRefreshFromDb, Connection conn) throws ResourceLoadException;

}

For objects that lack a loadFromIds() method, we also provide an alternative static database-loading method based on the Java Reflection API. To use this loading method, objects need only implement a simpler Timestamped interface, shown in the code below. Access to a static loading method allows TBCF to be compatible with an even greater range of legacy systems.

 

public interface Timestamped {

public String getId(); public long getTimestamp(); }

To allow users the ability to plug in any caching scheme (LRU, LFU, etc.) a simple Cache interface (which extends java.util.Map) is used to denote the caching class:

 

public interface Cache extends java.util.Map {

/* Additional methods for Cache (not found in Map). */ //Returns the maximum size of the cache. public int getMaxSize();

}

The caching class is based on a popular LRU algorithm, called LRULInkedMapCache because it was created by extending the LinkedHashMap class, which has been provided in the standard java.util package since J2SE 1.4. Whenever possible, we use standard java.util interfaces (Collection, List, and Map) as return types for our framework methods.

To address multithreading and synchronization concerns, we use synchronized wrappers in the Sun Java naming convention (xxxTable wraps xxxMap).

We have developed our CacheManager using the Data Access Object (DAO) design pattern: the CacheManager automatically loads some objects from the cache and others from the database, insulating the caller from implementation. CacheManager is a singleton, implementing the static Cache initialization.

The entire timestamp-based caching framework has a dynamic Java runtime configuration that can be loaded from a properties file or entered on the command line. Whenever possible, we use immutable objects (like TimestampedId) for simplicity and reliability. We always throw layer-appropriate exceptions (like ResourceLoadException). To make our design intentions clear, we have created our Util class to be non-instantiatable and final, because it contains only static utility methods.

The resulting UML class diagram is shown in Figure 4.

Figure 4. UML class diagram of the timestamp-based caching framework. Click on thumbnail to view full-sized image.

A look at the code

Note that in the code snippets below, I have omitted error handling, comments, and many other essential details in the interest of moving the narrative along. I encourage you to look at the actual source code (downloadable from Resources) while you peruse this section.

Step 1. User's search: getCdIdsThatMatchUserSearch()

In this section, we load all the IDs/timestamps for CDs authored by Britney Spears. If we were not using a cache, our first step would be to create the complete list of CDs containing all of the required data. However, since our first TBCF requirement is to decrease object churning, we instead load a list of small immutable objects: TimestampedIds. As the name suggests, this object aggregates exactly two attributes: Id and Timestamp. The Id is used to retrieve CD objects from the cache or database, whereas the Timestamp attribute is used to determine the freshness of the cache's CD objects to ensure none of the delivered objects are stale. TimestampedId objects are designed to be light; they can be created quickly and put minimum load on our application:

1 2 3 Page
Join the discussion
Be the first to comment on this article. Our Commenting Policies
See more