Develop a generic caching service to improve performance

Build a timed cache with a background thread

Suppose a coworker asks you for a list of all the countries in the world. Because you are no geography expert, you surf over to the United Nations Website, download the list, and print it out for her. However, she only wishes to examine the list; she doesn't actually take it with her. Because the last thing you need is another piece of paper on your desk, you feed the list to the shredder.

A day later, another coworker requests the same thing: a list of every country in the world. Cursing yourself for not keeping the list, you again surf back to the United Nations Website. On this visit to the Website, you note that the UN updates its country list every six months. You download and print the list for your coworker. He looks at it, thanks you, and again, leaves the list with you. This time you file the list away with a message on an attached Post-it note that reminds you to discard it after six months.

Sure enough, over the next few weeks your coworkers continue to request the list again and again. You congratulate yourself for filing the document since you can extract the document from the filing cabinet more quickly than you can extract it from the Website. Your filing cabinet concept catches on; soon everyone starts putting items in your cabinet. To prevent the cabinet from growing disorganized, you set guidelines for using it. In your official capacity as filing cabinet manager, you instruct your coworkers to place labels and Post-it notes on all documents, which identify the documents and their discard/expiration date. The labels help your coworkers locate the document they're looking for, and the Post-it notes qualify whether the information is up to date.

The filing cabinet grows so popular that soon you can't file any new documents in it. You must decide what to throw out and what to keep. Although you throw out all expired documents, the cabinet still overflows with paper. How do you decide which nonexpired documents to discard? Do you discard the oldest document? You could discard the least frequently used or the least recently used; in both cases you would need a log that listed when each document was accessed. Or perhaps you could decide which documents to discard based on some other determinant; the decision is purely personal.

To relate the above real-world analogy to the computer world, the filing cabinet operates as a cache: a high-speed memory that occasionally needs maintenance. The documents in the cache are cached objects, all of which conform to the standards set by you, the cache manager. The process of cleaning out the cache is termed purging. Because cached items are purged after a certain amount of time has elapsed, the cache is called a timed cache.

In this article, you will learn how to create a 100 percent pure Java cache that uses an anonymous background thread to purge expired items. You will see how to architect such a cache while understanding the trade-offs involved with various designs.

Build the cache

Enough filing cabinet analogies: let's move on to Websites. Website servers need to deal with caching too. Servers repeatedly receive requests for information, which are identical to other requests. For your next task, you must build an Internet application for one of the world's largest companies. After four months of development, including many sleepless nights and far too many Jolt colas, the application goes into development testing with 1,000 users. A 5,000-user certification test and a subsequent 20,000-user production rollout follows the development testing. However, after you receive out-of-memory errors while only 200 users test the application, development testing halts.

To discern the source of the performance degradation, you use a profiling product and discover that the server loads multiple copies of database ResultSets, each of which has several thousand records. The records make up a product list. Moreover, the product list is identical for every user. The list does not depend on the user, as might have been the case if the product list had resulted from a parameterized query. You quickly decide that one copy of the list could serve all concurrent users, so you cache it.

However, a number of questions arise, which include such complexities as:

  • What if the product list changes? How can the cache expire the lists? How will I know how long the product list should remain in the cache before it expires?
  • What if two distinct product lists exist, and the two lists change at different intervals? Can I expire each list individually, or must they all have the same shelf life?
  • What if the cache is empty and two requesters try the cache at exactly the same time? When they both find it empty, will they create their own lists, and then both try to put their copies into the cache?
  • What if items sit in the cache for months without being accessed? Won't they eat up memory?

To address these challenges, you need to construct a software caching service.

In the filing cabinet analogy, people always checked the cabinet first when searching for documents. Your software must implement the same procedure: a request must check the caching service before loading a fresh list from the database. As a software developer, your responsibility is to access the cache before accessing the database. If the product list has already loaded into the cache, then you use the cached list, provided it is not expired. If the product list is not in the cache, then you load it from the database and cache it immediately.

Note: Before proceeding to the caching service's requirements and code, you might want to check out the sidebar below, "Caching Versus Pooling." It explains pooling, a related concept.

Requirements

In keeping with good design principles, I defined a requirements list for the caching service that we will develop in this article:

  1. Any Java application can access the caching service.
  2. Objects can be placed in the cache.
  3. Objects can be extracted from the cache.
  4. Cached objects can determine for themselves when they expire, thereby allowing maximum flexibility. Caching services that expire all objects using the same expiration formula fail to provide optimal use of cached objects. This approach is inadequate in large-scale systems since, for example, a product list may change daily, whereas a list of store locations might change only once a month.
  5. A background thread that runs under low priority removes expired cached objects.
  6. The caching service can be enhanced later through the use of a least-recently used (LRU) or least-frequently used (LFU) purging mechanism.

Implementation

To satisfy Requirement 1, we adopt a 100 percent pure Java environment. By providing public get and set methods in the caching service, we fulfill Requirements 2 and 3 as well.

Before proceeding with a discussion of Requirement 4, I'll briefly mention that we will satisfy Requirement 5 by creating an anonymous thread in the cache manager; this thread starts in the static block. Also, we satisfy Requirement 6 by identifying the points where code would later be added to implement the LRU and LFU algorithms. I will go into more detail about these requirements later in the article.

Now, back to Requirement 4, where things become interesting. If every cached object must determine for itself whether it is expired, then you must have a way to ask the object if it's expired. That means that objects in the cache must all conform to certain rules; you accomplish that in Java by implementing an interface.

Let's begin with the rules that govern the objects placed in the cache.

  1. All objects must have a public method called isExpired(), which returns a Boolean value.
  2. All objects must have a public method called getIdentifier(), which returns an object that distinguishes the object from all others in the cache.

Note: Before jumping straight into the code, you must understand that you can implement a cache in many ways. I have found more than a dozen different implementations. Enhydra and Caucho provide excellent resources that contain several cache implementations.

You'll find the interface code for this article's caching service in Listing 1.

Listing 1. Cacheable.java

/**
* Title:        Caching
Description:  This interface defines the methods, which must be implemented
by
all objects wishing to be placed in the cache.
*
* Copyright:    Copyright (c) 2001
* Company:  JavaWorld
* FileName:     Cacheable.java
@author Jonathan Lurie
@version 1.0
*/
public interface Cacheable
{
/* By requiring all objects to determine their own expirations, the
algorithm is abstracted from the caching service, thereby providing maximum
flexibility since each object can adopt a different expiration strategy.
*/
public boolean isExpired();
/* This method will ensure that the caching service is not responsible for
uniquely identifying objects placed in the cache.
*/
public Object getIdentifier();
}

Any object placed in the cache -- a String, for example -- must be wrapped inside an object that implements the Cacheable interface. Listing 2 is an example of a generic wrapper class called CachedObject; it can contain any object needed to be placed in the caching service. Note that this wrapper class implements the Cacheable interface defined in Listing 1.

Listing 2. CachedManagerTestProgram.java

/**
* Title:        Caching
* Description:  A Generic Cache Object wrapper.  Implements the Cacheable
interface
*               uses a TimeToLive stategy for CacheObject expiration.
* Copyright:    Copyright (c) 2001
* Company:  JavaWorld
* Filename: CacheManagerTestProgram.java
* @author Jonathan Lurie
* @version 1.0
*/
public class CachedObject implements Cacheable
{
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    /*  This variable will be used to determine if the object is expired.
    */
    private java.util.Date dateofExpiration = null;
    private Object identifier = null;
    /*  This contains the real "value".  This is the object which needs to
be
        shared.
    */
    public Object object = null;
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    public CachedObject(Object obj, Object id, int minutesToLive)
    {
      this.object = obj;
      this.identifier = id;
      // minutesToLive of 0 means it lives on indefinitely.
      if (minutesToLive != 0)
      {
        dateofExpiration = new java.util.Date();
        java.util.Calendar cal = java.util.Calendar.getInstance();
        cal.setTime(dateofExpiration);
        cal.add(cal.MINUTE, minutesToLive);
        dateofExpiration = cal.getTime();
      }
    }
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    public boolean isExpired()
    {
        // Remember if the minutes to live is zero then it lives forever!
        if (dateofExpiration != null)
        {
          // date of expiration is compared.
          if (dateofExpiration.before(new java.util.Date()))
          {
            System.out.println("CachedResultSet.isExpired:  Expired from
Cache! EXPIRE TIME: " + dateofExpiration.toString() + " CURRENT TIME: " +
(new java.util.Date()).toString());
            return true;
          }
          else
          {
            System.out.println("CachedResultSet.isExpired:  Expired not from
Cache!");
            return false;
          }
        }
        else // This means it lives forever!
          return false;
    }
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    public Object getIdentifier()
    {
      return identifier;
    }
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
}

The CachedObject class exposes a constructor method that takes three parameters:

public CachedObject(Object obj, Object id, int minutesToLive)

The table below describes those parameters.

Parameter descriptions of the CachedObject constructor
NameTypeDescription
ObjObjectThe object that is shared. It is defined as an object to allow maximum flexibility.
IdObjectId contains a unique identifier that distinguishes the obj parameter from all other objects residing in the cache. The caching service is not responsible for ensuring the uniqueness of the objects in the cache.
minutesToLiveIntThe number of minutes that the obj parameter is valid in the cache. In this implementation, the caching service interprets a value of zero to mean that the object never expires. You might want to change this parameter in the event that you need to expire objects in less than one minute.

The constructor method determines the expiration date of the object in the cache using a time-to-live strategy. As its name implies, time-to-live means that a certain object has a fixed time at the conclusion of which it is considered dead. By adding minutesToLive, the constructor's int parameter, to the current time, an expiration date is calculated. This expiration is assigned to the class variable dateofExpiration.

Now, the isExpired() method must simply determine if the dateofExpiration is before or after the current date and time. If the date is before the current time, and the cached object is deemed expired, the isExpired() method returns true; if the date is after the current time, the cached object is not expired, and isExpired() returns false. Of course, if dateofExpiration is null, which would be the case if minutesToLive was zero, then the isExpired() method always returns false, indicating that the cached object lives forever.

Related:
1 2 3 Page 1