Pattern your way to automated regression testing

Implement popular design patterns to overcome unit-testing hurdles

We have all heard the mantra: test, test, test. Unit test all mission-critical business logic. If possible, write the unit tests before you write the code. Test everything you can, and test it often.

Frequent testing is one of the fundamentals of extreme programming, a set of practices growing in popularity because it provides a sound approach to dealing with the inevitable complexity and change of today's programming projects. But the simple mantra to test is much easier said than done, and far more often preached than practiced. Deadlines approach, managers loom, and important testing gets postponed yet again.

In the past, a major obstacle to adequate unit testing was the startup cost of designing and implementing an acceptable unit-testing framework. Without a framework, unit tests were implemented haphazardly and were very difficult to aggregate into test suites that ran with the click of a button. Fortunately, Erich Gamma and Kent Beck have implemented Beck's Smalltalk unit-testing framework in Java and called it JUnit. This excellent framework, already the de facto standard for unit testing in Java, provides the basic plumbing developers need to quickly build whole test suites that validate their programs with the click of a button. With JUnit, a major excuse for eschewing unit tests has been obviated.

"Trade"-offs

However, there is a more pernicious obstacle to unit testing system parts that involve access to a database system. As an example, imagine a system that processes trade transactions in a stock market simulator. One or more external systems place trade orders into a database, which our simulator then reads and processes. Assume our goal is to implement this system as well as all the unit tests necessary to ensure its smooth run. You can easily test certain system parts with JUnit.

Consider, for example, the object that calculates the transaction's dollar value. The test could simply create a Trade object with the desired arguments, and then assert that the getPrice() method returns the expected amount. But what about the TradeProcessor object, which reads and processes the incoming database requests? Suppose myriad combinations of incoming trade orders have different outcomes in our simulator, and we wish to test those various combinations as part of our JUnit suite. Could you unit test that part of the system? (If you're not convinced you can by the end of this article, I haven't done my job!)

We could approach unit testing this core logic in several ways. We could set up multiple test databases (one for each test case), and between each unit test, change the database the program queries. We would have to ensure that those databases are immutable, since we want subsequent runs to produce identical results. If the system's nature is such that merely running a test causes the database to produce different results (for example, by setting the status of "pending" trades to "processed"), this approach is very problematic indeed. What's more, you would then have to somehow keep all those databases, with their precise state necessary for successful unit tests, stored with that source code version. For most source code revision-control tools, this requirement would present a major problem.

Another approach might start with an empty database, and have each unit test populate it with the requisite data before executing the code to test. We would need to clean out the database between each test, so that tests could run in any order. Besides being potentially resource-intensive, this approach still suffers from the revision-control problem noted above (i.e., what if the database schema changes and becomes incompatible between releases?). Again, we are back to managing different database versions to test different software versions. Unit testing should be more seamless than this.

Fortunately, a more elegant solution exists that lets you unit test code that accesses databases, and tightly couple the code version to the database version the code expects to see, allowing both to version together in your source code repository. The solution involves, as you might expect, the subject of another mantra frequent JavaWorld readers know well: design patterns. More on those in a moment. First, we must decide on a test data form.

The X factor

Ideally, our test data should be self-contained and easily version-controlled. We would need a text format compatible with most version-control systems, and a human-readable form, which would ease test data creation and debugging. Due to relational data's hierarchical nature, a flat text file is not expressive enough without some implicit assumptions. Fortunately, our good friend XML (Extensible Markup Language) fits the bill perfectly. It is text, human-readable (when formatted properly), and easily expresses hierarchical structures. If we encode the whole test data set for a particular test as a single XML file, we can construct that file with a simple text editor, and easily version-control it right alongside the code for which it provides test data.

So, how do we create an XML document that simulates a database? The following XML "database" format was created for a real-life testing scenario, and works for most common querying scenarios. Let's suppose we want to model a database having two common queries, which we'll label "query1" and "query2". "query1" takes one argument and returns two text columns. "query2" takes two arguments and returns one numeric column. The following XML document represents one possible database state:

<?xml version="1.0"?>
<dataSource>
  <query label="query1">
    <instance>
      <arg>value 1</arg>
      <resultSet>
        <result>
          <column>value 2</column>
          <column>value 3</column>
        </result>
      </resultSet>
    </instance>
    <instance>
      <arg>value 4</arg>
      <resultSet>
        <result>
          <column>value 5</column>
          <column>value 6</column>
        </result>
        <result>
          <column>value 7</column>
          <column>value 8</column>
        </result>
      </resultSet>
    </instance>
  </query>
  <query label="query2">
    <instance>
      <arg>value 9</arg>
      <arg>value 10</arg>
      <resultSet>
        <result>
          <column>11</column>
        </result>
      </resultSet>
    </instance>
  </query>
</dataSource>

When queried, this particular database returns a ResultSet with one row if "query1" executes with an argument of value 1. A call to getString(1) on the ResultSet (i.e., get the value of the first column as text) returns value 2. A call to getString(2) (the second column) returns value 3. A subsequent call to resultSet.next() returns false, since only one result exists in the result set. Similarly, executing "query1" with a value 4 argument returns a ResultSet with two hits. And, executing "query2" with value 9 and value 10 arguments returns a ResultSet with one hit. A call to getInt(1) on this ResultSet returns the integer value 11.
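
To make the lookup rule concrete, here is a minimal sketch of how such a document could be searched with the DOM parser that ships with the JDK. It is not the article's XMLDataSource; the class name XmlLookupSketch, its lookup() and children() helpers, and the shortened SAMPLE document are all assumptions made for illustration. A stored instance matches only when its arg elements equal the supplied arguments, in order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical lookup helper; not the article's XMLDataSource implementation. */
final class XmlLookupSketch {

    /** Shortened copy of the sample document: one query1 instance plus the query2 instance. */
    static final String SAMPLE =
        "<dataSource>"
        + "<query label=\"query1\">"
        + "<instance><arg>value 4</arg><resultSet>"
        + "<result><column>value 5</column><column>value 6</column></result>"
        + "<result><column>value 7</column><column>value 8</column></result>"
        + "</resultSet></instance>"
        + "</query>"
        + "<query label=\"query2\">"
        + "<instance><arg>value 9</arg><arg>value 10</arg><resultSet>"
        + "<result><column>11</column></result>"
        + "</resultSet></instance>"
        + "</query>"
        + "</dataSource>";

    /** Returns the direct child elements of parent that carry the given tag name. */
    static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    /** Returns the rows stored for a query label and argument list; empty if nothing matches. */
    static List<List<String>> lookup(String xml, String label, String... args) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse test data", e);
        }
        List<List<String>> rows = new ArrayList<>();
        for (Element query : children(root, "query")) {
            if (!label.equals(query.getAttribute("label"))) continue;
            for (Element instance : children(query, "instance")) {
                // Collect this instance's stored arguments and compare them in order.
                List<String> stored = new ArrayList<>();
                for (Element arg : children(instance, "arg")) stored.add(arg.getTextContent());
                if (!stored.equals(Arrays.asList(args))) continue;
                for (Element resultSet : children(instance, "resultSet")) {
                    for (Element result : children(resultSet, "result")) {
                        List<String> row = new ArrayList<>();
                        for (Element column : children(result, "column")) {
                            row.add(column.getTextContent());
                        }
                        rows.add(row);
                    }
                }
            }
        }
        return rows;
    }
}
```

With this sketch, lookup(SAMPLE, "query1", "value 4") yields two rows, and the single row for "query2" holds the text 11, which a getInt-style accessor would parse into an integer. An argument list that matches no instance simply produces an empty result.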

Although this XML database's structure is fairly simple, I've included a DataSource.dtd you can use to validate your XML data source documents for particularly large data sets.

Have I got a Bridge to sell ya

The first step toward achieving full unit-testability is to "abstract away" those system parts required to always act a certain way, but implemented differently in different circumstances. Let's assume we wish to retrofit a running application for testing. Here is TradeProcessor's core logic, as it runs currently:

public final void processPendingTrades()
  throws Exception
{
  final String query =
    "select id, account, transtype, symbol, quantity, status " +
    "from trades where status='" + Trade.STATUS_PENDING + "'";
  Connection con = null;
  Statement stmt = null;
  ResultSet resultSet = null;
  try
  {
    con = DriverManager.getConnection("jdbc:db2:tradeDB");
    stmt = con.createStatement();
    resultSet = stmt.executeQuery(query);
    while (resultSet.next())
    {
      final String id = resultSet.getString(1);
      final String account = resultSet.getString(2);
      final String transType = resultSet.getString(3);
      final String symbol = resultSet.getString(4);
      final int quantity = resultSet.getInt(5);
      final String status = resultSet.getString(6);
      final Trade trade =
        new Trade(id, account, transType, symbol, quantity, status);
      process(trade);
    }
  }
  finally
  {
    if (resultSet != null)
      resultSet.close();
    if (stmt != null)
      stmt.close();
    if (con != null)
    {
      con.rollback();
      con.close();
    }
  }
}

This fairly standard code executes a query and steps through the result set, processing each row. When we run the application for real, the TradeProcessor must retrieve a list of currently pending trades from the database. In test mode, the TradeProcessor should retrieve that list from an XML file constructed specifically to test a particular situation.

An elegant, object-oriented approach to this problem inserts an abstraction layer between the processing logic and the implementation. In other words, the processing logic continues to act on an object that behaves like a ResultSet, but which may or may not actually talk to a real Java Database Connectivity (JDBC) database. The processing logic cares only that the object it talks to responds to commands such as next() and getString(). This approach of creating an interface for which the implementation may vary is so common in object-oriented programming that it has earned the distinction of being identified as the Bridge design pattern.

The solution then creates a new set of abstract types that capture the abstract nature of querying a data source and retrieving results. In addition to the ResultSet type, we must also create a DataSource type (which represents either the database or the XML file), and a Query type (which, when passed to the DataSource, allows the DataSource to create a ResultSet). These abstract types are packaged under com.paulitech.query in the example code for this article. The figure below shows a UML representation.

Figure: UML diagram of the new abstract types.
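
As a rough sketch of how these three types relate, the following code defines cut-down stand-ins for Query, ResultSet, and DataSource, plus one in-memory implementation. Everything here is an assumption made for illustration: the wrapper class QueryTypesSketch, the InMemoryDataSource, the countRows() client, and in particular the plain text substitution of {0}-style placeholders are not taken from the com.paulitech.query source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class QueryTypesSketch {

    /** A labelled query whose text contains {0}, {1}, ... placeholders. */
    static final class Query {
        final String label;
        final String template;
        private String[] arguments = new String[0];

        Query(String label, String template) {
            this.label = label;
            this.template = template;
        }

        void setArguments(String[] arguments) {
            this.arguments = arguments.clone();
        }

        /** Replaces each {n} with the n-th argument (simple text replace; an assumption). */
        String toText() {
            String text = template;
            for (int i = 0; i < arguments.length; i++) {
                text = text.replace("{" + i + "}", arguments[i]);
            }
            return text;
        }
    }

    /** The only view of a result that client code sees. */
    abstract static class ResultSet {
        abstract boolean next();
        abstract String getString(int column);   // 1-based, as in JDBC
        void dispose() { }                        // nothing to release by default
    }

    /** Creates ResultSets; may be backed by a database or by test data. */
    abstract static class DataSource {
        abstract ResultSet getResultSet(Query query);
    }

    /** Test-side implementation: canned rows keyed by the substituted query text. */
    static final class InMemoryDataSource extends DataSource {
        private final Map<String, List<String[]>> canned = new HashMap<>();

        void add(String queryText, String[]... rows) {
            canned.put(queryText, Arrays.asList(rows));
        }

        @Override
        ResultSet getResultSet(Query query) {
            List<String[]> rows = canned.get(query.toText());
            if (rows == null) rows = Collections.emptyList();
            final Iterator<String[]> it = rows.iterator();
            return new ResultSet() {
                private String[] current;

                @Override boolean next() {
                    if (!it.hasNext()) return false;
                    current = it.next();
                    return true;
                }

                @Override String getString(int column) {
                    return current[column - 1];
                }
            };
        }
    }

    /** Client logic: it sees only the abstract types, never the implementation. */
    static int countRows(DataSource source, Query query) {
        ResultSet rs = source.getResultSet(query);
        try {
            int n = 0;
            while (rs.next()) n++;
            return n;
        } finally {
            rs.dispose();
        }
    }
}
```

The countRows() method plays the role of the processing logic: it compiles against DataSource, Query, and ResultSet only, so a JDBC-backed DataSource could be swapped in without touching it.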

Using these new abstract types, the core logic becomes:

private final DataSource tradeDB =
  DataSourceManager.get("tradeDB");
private static final Query getTradesByStatus =
  new Query(
    "getTradesByStatus",
    "select id, account, transtype, symbol, quantity, status " +
    "from trades where status='{0}'");
public final void processPendingTrades()
  throws QueryException
{
  getTradesByStatus.setArguments(new String[] {Trade.STATUS_PENDING});
  ResultSet resultSet = null;
  try
  {
    resultSet = tradeDB.getResultSet(getTradesByStatus);
    while (resultSet.next())
    {
      final String id = resultSet.getString(1);
      final String account = resultSet.getString(2);
      final String transType = resultSet.getString(3);
      final String symbol = resultSet.getString(4);
      final int quantity = resultSet.getInt(5);
      final String status = resultSet.getString(6);
      Trade trade =
        new Trade(id, account, transType, symbol, quantity, status);
      process(trade);
    }
  }
  finally
  {
    if (resultSet != null)
      resultSet.dispose();
  }
}

Other than the way we create and dispose of the ResultSet, the logic remains identical. The Bridge pattern's nature allows the type implementation to vary without affecting that type's clients. The types are specified as abstract base classes (or ABCs), which forward their calls to the implementing subclass. For example, here is the com.paulitech.query.ResultSet abstract class's next() method:

public final boolean next() throws QueryException
{
  try
  {
    return nextImpl();
  }
  catch (Exception e)
  {
    throw new QueryException("error getting next result", e);
  }
}
protected abstract boolean nextImpl() throws Exception;

When a client object asks a ResultSet instance to go to the next() result, it is the next() method in the ResultSet ABC that actually executes. This method forwards the call to the nextImpl() method (short for next implementation), which the concrete subclass must implement. Any exception that the subclass throws is then caught and repackaged in a com.paulitech.query.QueryException. This exception wrapping allows all implementations to adhere to a common interface. (For example, the implementation code might throw a java.sql.SQLException; the base class should not know about such implementation-specific details.) Notice that nextImpl() is declared abstract, which means that any concrete ResultSet subclass must implement it. Let's look at the standard SQL implementation in the class com.paulitech.query.jdbc.JDBCResultSetAdapter:

private java.sql.ResultSet resultSet;
protected boolean nextImpl() throws SQLException
{
  return resultSet.next();
}
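
To see what the wrapping buys the client, consider a second, non-JDBC implementation that fails with a completely different exception type. The sketch below is only illustrative: FlakyResultSet, causeSeenByClient(), and the QueryException constructor shape are assumptions, and the ResultSet class is a cut-down copy containing just the next()/nextImpl() pair shown above.

```java
final class ExceptionWrappingSketch {

    /** Stand-in for com.paulitech.query.QueryException; the constructor shape is assumed. */
    static final class QueryException extends Exception {
        QueryException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Cut-down base class holding only the forwarding method from the article. */
    abstract static class ResultSet {
        public final boolean next() throws QueryException {
            try {
                return nextImpl();
            } catch (Exception e) {
                throw new QueryException("error getting next result", e);
            }
        }

        protected abstract boolean nextImpl() throws Exception;
    }

    /** Hypothetical implementation that serves a few rows and then fails with an IOException. */
    static final class FlakyResultSet extends ResultSet {
        private int remaining;

        FlakyResultSet(int rowsBeforeFailure) {
            this.remaining = rowsBeforeFailure;
        }

        @Override
        protected boolean nextImpl() throws java.io.IOException {
            if (remaining-- > 0) return true;
            throw new java.io.IOException("backing store went away");
        }
    }

    /** Client code: it catches QueryException only; the original cause is still attached. */
    static String causeSeenByClient(ResultSet rs) {
        try {
            while (rs.next()) {
                // a real client would read columns here
            }
            return "none";
        } catch (QueryException e) {
            return e.getCause().getClass().getName();
        }
    }
}
```

The client needs a single catch clause no matter which implementation sits behind the ResultSet, yet the underlying java.io.IOException (or SQLException, in the JDBC case) remains available through getCause() for diagnostics.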

If this Bridge concept still isn't clear to you, spend time reviewing the full source code provided in the com.paulitech.query package (the abstract types used by client code) and the two different concrete implementations in com.paulitech.query.jdbc and com.paulitech.query.xml. When you experience the required "a-ha!" moment (make sure you are seated), you may proceed.

Adapters

Note that the JDBC implementation of our abstract com.paulitech.query.ResultSet type actually contains an instance of the familiar java.sql.ResultSet, to which it forwards the messages it receives. This approach of wrapping one object in a different interface and forwarding calls has been identified as the Adapter pattern. (Based on convention, you include the design pattern name in the type name if using a well-known pattern; thus the name JDBCResultSetAdapter.)
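
The same wrapping idea works for any adaptee, not just java.sql.ResultSet. As a minimal sketch, the code below adapts a plain Iterator of comma-separated lines to a small ResultSet-like interface. The names Rows, CsvRowsAdapter, joinColumn(), and demo() are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.Iterator;

final class AdapterSketch {

    /** Target interface the client expects (a cut-down, hypothetical ResultSet). */
    interface Rows {
        boolean next();
        String getString(int column);   // 1-based, as in JDBC
    }

    /** Adapter: wraps an Iterator of comma-separated lines and exposes it as Rows. */
    static final class CsvRowsAdapter implements Rows {
        private final Iterator<String> lines;   // the adaptee
        private String[] current;

        CsvRowsAdapter(Iterator<String> lines) {
            this.lines = lines;
        }

        @Override
        public boolean next() {
            if (!lines.hasNext()) return false;
            current = lines.next().split(",");
            return true;
        }

        @Override
        public String getString(int column) {
            return current[column - 1].trim();
        }
    }

    /** Client code written purely against Rows. */
    static String joinColumn(Rows rows, int column) {
        StringBuilder sb = new StringBuilder();
        while (rows.next()) {
            if (sb.length() > 0) sb.append('|');
            sb.append(rows.getString(column));
        }
        return sb.toString();
    }

    /** Reads the second column of two sample lines; returns "ACME|INIT". */
    static String demo() {
        Iterator<String> lines = Arrays.asList("1, ACME, 100", "2, INIT, 250").iterator();
        return joinColumn(new CsvRowsAdapter(lines), 2);
    }
}
```

The adapter holds a reference to the object it wraps and translates each call, which is the same relationship JDBCResultSetAdapter has with java.sql.ResultSet.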

Singletons have more fun

Since most nontrivial applications access data from more than one source, the DataSourceManager object must manage multiple DataSources. DataSourceManager contains a simple mapping of String names (such as tradeSource) to instantiated DataSource objects. Pass DataSourceManager a String and it returns a DataSource that you may query.

Many code sections throughout an application need access to DataSources. How do we best provide access to the DataSourceManager? Should we pass a reference to the DataSourceManager to each method in the call chain? In addition to being tedious, that would bloat the code of methods that merely pass the reference along to the methods actually needing it. Should we pass the DataSourceManager upon object construction, and maintain a global reference to it? Object-oriented programming is supposed to save us from globals -- and how do we know that all DataSourceManager references actually point to the same instance? If a bug created two different DataSourceManager objects, we'd pull out our hair trying to figure out the problem.

The Singleton design pattern, the standard approach to this common design problem, ensures that only one instance of a given type is created, and provides a standard means of accessing that one instance. Here is a snippet of code from the DataSourceManager class, implemented as a Singleton:

private static DataSourceManager theInstance;
private Map dataSources;
private DataSourceManager()
{
  dataSources = new java.util.HashMap();
}
private static synchronized DataSourceManager getInstance()
{
  if (theInstance == null)
    theInstance = new DataSourceManager();
  return theInstance;
}
public static void put(String label, DataSource dataSource)
{
  getInstance().dataSources.put(label, dataSource);
}
public static DataSource get(String label)
{
  return (DataSource) getInstance().dataSources.get(label);
}

Since Java's static modifier associates a variable with the class itself rather than with its instances, and only one Class object is loaded for this type (per class loader), you can conveniently implement Singleton using a static member variable. When we register a new DataSource by calling DataSourceManager.put(), the put() method first calls getInstance() to get a handle to the current Singleton instance. Notice that getInstance() examines the static variable to see whether the instance has been initialized yet. If not (i.e., this is the first time anyone has attempted to access the Singleton), getInstance() creates the instance and then returns it. Deferring initialization until a client actually requires the object is known as lazy initialization, because it does no more work than necessary. If no client ever needs the Singleton, the initialization code never executes, saving that computation cost.
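
The lazy behavior is easy to observe with a small counter. The following sketch mirrors the structure of DataSourceManager but is not the article's class: RegistrySketch, its Object-valued map, and the constructions() counter are assumptions added for illustration. As a further assumption, put() and get() are synchronized here so that the shared HashMap is never touched by two threads at once.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lazy Singleton registry modelled loosely on DataSourceManager. */
final class RegistrySketch {
    private static RegistrySketch theInstance;   // stays null until first use
    private static int constructions = 0;        // counts how often the constructor ran

    private final Map<String, Object> entries = new HashMap<>();

    private RegistrySketch() {
        constructions++;
    }

    /** Creates the single instance on first call and returns the same one afterward. */
    private static synchronized RegistrySketch getInstance() {
        if (theInstance == null) {
            theInstance = new RegistrySketch();
        }
        return theInstance;
    }

    static synchronized void put(String label, Object value) {
        getInstance().entries.put(label, value);
    }

    static synchronized Object get(String label) {
        return getInstance().entries.get(label);
    }

    /** Number of times the private constructor has run; at most 1. */
    static synchronized int constructions() {
        return constructions;
    }
}
```

However many times put() and get() are called, the constructor runs once at most, and every caller reads from the same map. Clients never hold a reference to the instance at all; they go through the static methods.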

Factory, create me an object

Now that we have an interface that works in both live and test situations, a question remains: how and when do we specify whether the situation is live or a test? This is another common problem in object-oriented design, and the Abstract Factory design pattern comes to the rescue. A factory is an object that creates other objects. In our example, the DataSource object is a factory, since it creates ResultSet objects. Abstract Factory allows the type of the created objects to vary, depending on which factory type has been instantiated. Thus, early in the program, we register under the label "tradeDB" either a JDBCDataSource object (passing the appropriate database, username, and password) or an XMLDataSource object (passing the appropriate XML file resource path).

Thus, early in the real program, we should expect to see the following:

DataSourceManager.put("tradeDB",
  new JDBCDataSource("tradedb", "username", "password"));

If we want instead to test our system with XML test scenario data, we would see the following in the JUnit test case setup code:

DataSourceManager.put("tradeDB",
  new XMLDataSource("TradeData.xml"));

Once either statement has executed, any class that calls DataSourceManager.get("tradeDB") receives a factory object that conforms to the DataSource interface (be it XML or JDBC). This DataSource (which can be thought of as a ResultSet factory) is an abstract factory precisely because the client code doesn't care which factory type it is, as long as it adheres to the interface defined in the abstract base class DataSource.
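
Stripped to its essentials, the switch between live and test mode might look like the sketch below. The types here are deliberately tiny stand-ins, not the article's classes: CannedDataSource, EmptyDataSource, the public registry map, and countTrades() are all hypothetical, and the "live" factory is only a stub that returns no rows.

```java
import java.util.HashMap;
import java.util.Map;

final class AbstractFactorySketch {

    /** Abstract product: what client code iterates over. */
    interface ResultSet {
        boolean next();
    }

    /** Abstract factory: anything that can produce a ResultSet. */
    interface DataSource {
        ResultSet getResultSet();
    }

    /** Concrete factory for tests: every ResultSet it creates has a fixed row count. */
    static final class CannedDataSource implements DataSource {
        private final int rowCount;

        CannedDataSource(int rowCount) {
            this.rowCount = rowCount;
        }

        @Override
        public ResultSet getResultSet() {
            return new ResultSet() {
                private int served = 0;

                @Override
                public boolean next() {
                    return served++ < rowCount;
                }
            };
        }
    }

    /** Stub for the live factory; a real one would query a database here. */
    static final class EmptyDataSource implements DataSource {
        @Override
        public ResultSet getResultSet() {
            return () -> false;
        }
    }

    /** Simplified registry playing the role of DataSourceManager. */
    static final Map<String, DataSource> registry = new HashMap<>();

    /** Client code: asks for "tradeDB" by label and never learns which factory it got. */
    static int countTrades() {
        ResultSet rs = registry.get("tradeDB").getResultSet();
        int n = 0;
        while (rs.next()) n++;
        return n;
    }
}
```

The only line that differs between a production run and a test run is the one that registers the factory; countTrades() is identical in both.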

Putting it all together

In the example code, I've implemented a business rule on the TradeProcessor called isMarketCrashing(), which returns a boolean value. The stock market is considered to be crashing if the number of sell orders divided by the number of buy orders exceeds a certain ratio. The JUnit test, com.paulitech.examples.test.TradeProcessorTest, tests for this market-crashing condition on two distinct data sets (TradeSourceCrashing.xml and TradeSourceNotCrashing.xml), with two different results. Try altering the XML data by hand to make the unit test fail (for example, insert more buy orders so isMarketCrashing() returns false). Then fix the data and run the tests again. Make sure you understand why the tests fail or succeed.
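
For readers without the example code at hand, the rule itself can be sketched in a few lines. This is not the article's implementation: the "SELL" and "BUY" type strings, the threshold parameter, and the treatment of a data set with no buy orders are all assumptions chosen for illustration.

```java
/** Hypothetical version of the market-crash rule, operating on transaction types only. */
final class MarketCrashSketch {

    /**
     * Returns true when sells divided by buys exceeds maxSellToBuyRatio.
     * With no buy orders at all, any sell order counts as a crash (an assumption).
     */
    static boolean isMarketCrashing(String[] transTypes, double maxSellToBuyRatio) {
        int sells = 0;
        int buys = 0;
        for (String type : transTypes) {
            if ("SELL".equals(type)) {
                sells++;
            } else if ("BUY".equals(type)) {
                buys++;
            }
        }
        if (buys == 0) {
            return sells > 0;   // avoid dividing by zero
        }
        return (double) sells / buys > maxSellToBuyRatio;
    }
}
```

With a threshold of 2.0, three sells against one buy gives a ratio of 3.0 and reports a crash, while one sell against two buys gives 0.5 and does not. Adding buy orders to a crashing data set lowers the ratio, which is why the suggested hand edit makes the test fail.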

Final thoughts

Due to space constraints, I didn't explain in detail the full XML and JDBC implementations of the abstract classes. The main point of this article is to show you how to apply well-known design patterns in order to unit test code that accesses databases. My team and I had to overcome many interesting hurdles when creating this framework; I encourage you to peruse the source code for a fuller understanding.

Although for a new, built-from-scratch application you could insert the abstraction layer at a higher level than the code that accesses database result sets, many existing applications have core processing logic intermingled with database access code. This logic usually needs unit testing as much as, or more than, any other system part. Attempting to refactor the design at a higher level can exacerbate the trauma to the code and increase the chance of introducing new bugs.

In this example, we could refactor the common iteration logic into an abstract base class, and create two TradeProcessor subclasses, with two different getPendingTrades() method implementations (one for JDBC, one for XML) that return a Collection of Trade objects to the superclass for iteration. (Incidentally, this would utilize the Template Method pattern). But this approach requires tearing apart and reimplementing the core logic, a potentially risky maneuver. By inserting the abstraction layer at the database access level, the core logic is preserved, and you still gain the coveted property of unit-testability.
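
For comparison, the Template Method alternative might look roughly like this. The sketch is hypothetical throughout: the one-field Trade class, the CannedTradeProcessor subclass, and the processedIds list exist only to make the example self-contained, and a JDBC subclass would override getPendingTrades() with real database access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class TemplateMethodSketch {

    /** Minimal stand-in for the article's Trade class. */
    static final class Trade {
        final String id;

        Trade(String id) {
            this.id = id;
        }
    }

    abstract static class TradeProcessor {
        final List<String> processedIds = new ArrayList<>();

        /** Template method: the iteration is fixed, the source of trades varies. */
        final void processPendingTrades() {
            for (Trade trade : getPendingTrades()) {
                process(trade);
            }
        }

        /** Each subclass decides where pending trades come from. */
        protected abstract Collection<Trade> getPendingTrades();

        private void process(Trade trade) {
            processedIds.add(trade.id);   // placeholder for the real business logic
        }
    }

    /** Test-side subclass that supplies canned trades instead of querying a database. */
    static final class CannedTradeProcessor extends TradeProcessor {
        @Override
        protected Collection<Trade> getPendingTrades() {
            return Arrays.asList(new Trade("T1"), new Trade("T2"));
        }
    }

    /** Runs the canned processor and returns the processed ids, "T1,T2". */
    static String demo() {
        TradeProcessor processor = new CannedTradeProcessor();
        processor.processPendingTrades();
        return String.join(",", processor.processedIds);
    }
}
```

Note how the loop that used to sit next to the ResultSet calls has moved into the superclass. In an existing application, that move is exactly the rewrite of core logic the lower-level abstraction avoids.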

Kevin Pauli is an independent software consultant and founder of PauliTech Corporation. He specializes in providing software solutions with Java and XML using industry standard design patterns and open source software. His clients include IBM, NEC, and Sprint. He lives in a suburb of Dallas, Texas with his wife and various animals.
