Writing good unit tests, Part 1: Follow your GUTs

Best practices and tools for high-quality test code

Just like production code, test code needs to be rigorously examined to ensure it's clean and bug-free. In this first half of a two-part article, Klaus Berg makes the case for why good unit tests are as important as high-quality production code, then provides a comprehensive listing of agile tools and best practices used to improve the internal quality of test code. Level: Intermediate

Code quality is an important topic, but the definition of quality varies. According to the ISO 9126 standard, you can distinguish between internal quality, external quality, and quality in use. In the past, most companies looked only at production code quality, but that shouldn't be the end of your quest for better code. Writing good unit tests (GUTs) is just as important as writing high-quality code.

In this two-part article you'll get an overview of software quality -- how we define it and improve it -- followed by an introduction to methods, tools, and best practices for writing good unit test code. Some of the advice can even be applied to integration and system tests that are written as Java test programs.

Software quality: How to define it, how to improve it

When you talk about software quality, you need to consider both functional and non-functional aspects as defined, for example, in the ISO 9126 standard. These quality attributes (maintainability, portability, and so on) can be measured in two ways:

  • Using internal metrics, as covered by ISO 9126 Part 3, Internal Quality. This is typically done by static testing and analysis.
  • Using external metrics, as covered by ISO 9126 Part 2, External Quality. This is typically done by dynamic testing.

An internal metric (the focus of this article) is a quantitative method that can be used for measuring an attribute or characteristic of a software product, derived either directly or indirectly from the product itself. (It is not derived from measures of the behavior of the system -- that is, from a test execution.) Internal metrics are applicable to nonexecutable software products during design and coding in early stages of the development process. As Martijn de Vrieze states in his QA-themed blog:

External quality is that which can be seen by customers and which is traditionally tested. Bad external quality is what can be seen: system crashes, unexpected behavior, data corruption, slow performance. Internal quality is the hidden part of the iceberg, i.e. program structure, coding practices, maintainability, and domain expertise. Bad internal quality will result in lost development time; fixes are likely to introduce new problems and therefore require lengthy retesting. From a business point of view, this will invariably result in loss of competitiveness and reputation. External quality is a symptom whereas the root problem is internal quality.

Having read this, a question should occur to you: What do you know about the internal quality of your unit tests? Probably not as much as you know about the quality of your production code. Of course, agile software development processes like extreme programming or Scrum emphasize unit testing of code (with tests preferably written before the code itself, thanks to test-driven development, or TDD) and thorough testing of software functionality, as well as code refactoring steps. (Yael Dubinsky and Orit Hazzan have an excellent paper on this.) However, internal code quality assurance activities for the most part address only production code. It's time to boost the quality of your test code to the same level as the quality of your production code!


In the next sections I'll provide guidelines and best practices for writing good unit tests, both from a general process-oriented viewpoint and from a more technical angle. I'll demonstrate how to measure test coverage for a system under test, and I'll also put those coverage results into a global test context, because that metric is often over-interpreted. In the second half of the article you'll learn how to apply those principles to the art of writing good assertions and exception-checking test code.

From 'test infected' to 'test obsessed'

In "JUnit test infected: Programmers love writing tests" (Java Report, 1998), Kent Beck and Erich Gamma introduced a testing style called test infection. The goal of a test infected programmer or team is to write a unit test for every class in your system. But if you are a test developer (TD), strongly committed to the philosophies of extreme programming and TDD, you could end up with a test obsession! What are test developers? As Steve Rowe explains in his blog:

Test developers are the heart of a modern test team. There was a day when you could get away with hiring a few people to just use the product and call that a test team. This is no longer the case. Products are becoming more complex. The lifespan of products is increasing. More products are being created for developers instead of end users. These have no UI to interact with, so simple exploratory testing is insufficient. To test complex products, especially over an extended lifespan, the only viable solution is test automation. When the product is an API [library or framework] instead of a user interface, testing it requires programming. Test developers are programmers who happen to work on a test team. It is their job to write software which executes other software and verifies the results.

Because unit test classes don't go into production, you may be disinclined to put a lot of time, effort, and thought into them. But the unfortunate reality is that unit tests are rarely write-only code. Badly written or interdependent unit tests make it more difficult to refactor or add new code to the system. Therefore, a TD should not treat unit test code as a second-class citizen -- and a real test-obsessed TD would never do so. But what are the characteristics of good unit tests?

The good unit test

"The modern programming professional has GUTs." That declaration from Alistair Cockburn refers to a February 2008 interview in which Jim Coplien and Bob Martin talked about contract-driven development (CDD) and TDD. The core of the debate is the relationship between TDD and unit testing; Cockburn claimed that, for many people, TDD is just a synonym for having good unit tests, GUTs. However, in his opinion, if GUTs were an accepted term, then people would brag about having GUTs without implying whether they were written before or after the code -- that is, their pride would be distinct from when the tests were written.

I myself am not a TDD evangelist and -- you can quote me -- my personal opinion is that it's better to have good unit tests written just in time with the production code than to have bad unit tests written in a TDD manner, before the code materializes. In this sense I agree with Vikas Hazrati and Deborah Hartmann when they conclude that "development teams might want to practice TDD or write test cases after the code, as per their comfort. What really matters is that they should have GUTs."

But I still haven't answered the central question: What is a good unit test? Even Kevlin Henney cannot give us the ultimate answer in his article "Programming with GUTs" -- but he does have some good hints that are worth mentioning here. Henney states that unit tests are commonly viewed in terms of offering quantitative feedback on the presence or absence of defects in specific situations. But, in his opinion, they are in a position to offer much more than this. They can also provide feedback on how loosely coupled code is, and they can tell you a lot about code coverage -- that is, about how much of the production code (quantitative feedback) and which parts of it (qualitative feedback) are tested at all.

Henney really focuses on the "style and substance" of good unit tests. Good unit tests should communicate functionality to their readers -- that is, they should explain how to use a class or method in a specific environment or situation. You can choose between a per-method or procedural testing style, where every public method has a corresponding test method, and a more behavior-oriented style, where each unit test illustrates and defines the behavioral contract of the unit in question. Because behavior consists of more than just individual methods, Henney argues that you need a style that cuts across the interface, a style where each test case is structured in terms of a meaningful and specific behavioral goal. In his opinion, that's the style that characterizes good unit tests. Furthermore, good unit tests have good substance -- that is, they illustrate and define the behavioral contract of the unit under test.

In my opinion, it's worth having both the procedural style and the behavior-driven style that tells us a kind of user (or usage) story. Consider a large framework where you have a large public API. Because thousands of combinations of public methods will be possible (although not meaningful), I think the goal should be to test every single public method first, in isolation (excluding trivial getter/setter methods, of course). If this is still not possible, you should think about reducing a potentially large number of tests by using risk-based methods (which I'll discuss in more detail later). Then you can create test scenarios that describe a specific program behavior.

Code quality with style

How can you make sure that you're writing high-quality test code? The following style considerations should be at the top of your mind when you're writing your tests.

Boundary conditions

You should test every method of your public API with appropriate boundary conditions (excluding trivial getter/setter methods if they really do no more than just provide or set a value). Here, an IDE can help you reduce coding effort by automatically generating JUnit or TestNG templates for every class under test. Do not just test the happy path, but write tests for invalid domain data and boundary conditions too. For instance, you should provide:

  • Null values to object parameters
  • Empty strings or very long strings
  • Special characters in certain languages, like the German characters ö, ä, ü, and ß, in strings
  • Special non-alphanumeric characters, like !, %, &, $, and ?, in strings
  • Empty collections, collections with exactly one element, or collections with the maximum number of elements
  • Invalid numbers and boundary numbers for numeric int parameters, such as Integer.MAX_VALUE, Integer.MIN_VALUE, and 0
  • Specific dates
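As a concrete illustration of the checks in this list, here is a minimal JUnit 4 sketch. The class under test, TextUtils, and its shorten(String, int) method are hypothetical stand-ins for whatever your public API actually looks like; the method is assumed to truncate a string to at most maxLen characters and to reject null input:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Boundary-condition tests for a hypothetical TextUtils.shorten(s, maxLen).
    public class ShortenBoundaryTest {

        @Test(expected = IllegalArgumentException.class)
        public void rejectsNullInput() {
            TextUtils.shorten(null, 10);
        }

        @Test
        public void emptyStringStaysEmpty() {
            assertEquals("", TextUtils.shorten("", 10));
        }

        @Test
        public void zeroIsABoundaryForMaxLength() {
            assertEquals("", TextUtils.shorten("abc", 0));
        }

        @Test
        public void germanSpecialCharactersAreNotMangled() {
            assertEquals("Größe", TextUtils.shorten("Größenwahn", 5));
        }
    }

Note how each test method probes exactly one boundary, so a failure report points directly at the broken case.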

Unfortunately, I cannot provide a best practice method for testing objects with a lot of different internal states -- for example, where state is represented as six member variables, two of which themselves contain collections of objects. If you've uncovered a good technique for such a scenario, please let me know.

Risk-based testing techniques

You should make use of risk-based testing techniques to reduce the infinite number of possible tests and to avoid testing trivial code. Your unit tests should test those parts of the code that are likely to have defects; try to imagine how the program could fail, and then try to get it to fail that way. Be sure to think about both where defects might lie in newly developed code, and about where they might arise when the code is changed (see "How to write good unit tests" by Basil Vandegriend for more about this). Risk-based testing also addresses parts of the system that are used often.

Exploratory testing

Make use of exploratory testing techniques to carry out test design and test execution in parallel. As testing progresses, you learn more and more about the behavior of the code. With experience and creativity, you can craft more and better tests.

According to Cem Kaner, who coined the term in the book Testing Computer Software (Wiley, 1999), exploratory testing is "a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the quality of his/her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project." In the context under discussion here, that might characterize a methodology where the user writes parameterized unit tests (with TestNG, for example) and lets an automated tool generate parameter values to cover all reachable statements as a kind of exploratory code analysis. (For more on this, see the article "Exploratory test-driven development.")
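As a rough sketch of that idea, here's the shape of a parameterized TestNG test. The RomanNumeral class is a hypothetical system under test; in the exploratory variant described above, the data provider's rows would be produced by a generation tool rather than written by hand:

    import static org.testng.Assert.assertEquals;

    import org.testng.annotations.DataProvider;
    import org.testng.annotations.Test;

    public class RomanNumeralTest {

        // Hand-written parameter values; a test generation tool could
        // emit rows like these to drive exploratory code analysis.
        @DataProvider(name = "numerals")
        public Object[][] numerals() {
            return new Object[][] {
                { "I", 1 },
                { "IV", 4 },
                { "MMVIII", 2008 },
            };
        }

        @Test(dataProvider = "numerals")
        public void parsesRomanNumerals(String numeral, int expected) {
            assertEquals(RomanNumeral.parse(numeral), expected);
        }
    }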

Scenario testing

You should write scenario tests that tell the reader something about the behavior of the system under test, or SUT. Such tests don't look at functions in isolation, but at the system under test (a class, in the case of unit tests) as a whole. You should collect user stories and then test them using a combination of the public API's methods.
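For example, a scenario test for a hypothetical ShoppingCart class might read like a small user story rather than a probe of one method. The cart, item, and voucher methods here are invented for illustration:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // User story: a customer fills a cart, redeems a welcome voucher,
    // and is charged the discounted total at checkout.
    public class CheckoutScenarioTest {

        @Test
        public void welcomeVoucherReducesTotalAtCheckout() {
            ShoppingCart cart = new ShoppingCart();
            cart.add(new Item("book", 40.00));
            cart.add(new Item("pen", 10.00));
            cart.applyVoucher("WELCOME10"); // 10 percent off, per the story
            assertEquals(45.00, cart.checkoutTotal(), 0.001);
        }
    }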

Tests as documentation

Consider leveraging your unit tests so that they even serve as API documentation, as proposed by Brian Button in his article "Double duty." He advocates focusing more on simplicity and naming: look at your tests in a different way, making sure that they tell the story of your class or subsystem -- and that they tell the whole story. Then the process of creating your code through TDD is also the process of creating the documentation, and evolving the code also evolves the documentation. Button calls this agile documentation.

Reasonable test coverage

Try to create your tests and test data to produce reasonable test coverage for your SUT's code. I will not postulate an absolute value here; every project has to define its own goals. But keep in mind the wise advice that Lidor Wyssocky gives us in his blog post entitled "The illusion of high test coverage": "Test coverage data helps developers identify missing test cases. It also helps development managers to get a clearer picture of the functional quality of the code. However, the test code coverage metric can also create an illusion. The problem is that this number is always related to what's in the code. It can never tell you what is missing from the code. Unfortunately, many bugs are the result of unwritten code."

Andrew Binstock makes that same point in a blog post called "The fallacy of 100% code coverage," when he tells us, "If developers attain 100% code coverage -- even at the cost of writing meaningless tests -- they can be certain they haven't forgotten to test some crucial code. This viewpoint is the real illusion. By basing proper testing on 100% code coverage, the developer has confused two issues. It's what you're testing and how (so, quality) that determines code quality, not numerical coverage targets (quantity)." Matt Harrah takes the same line in a blog post called "How do you know if your tests are good?":

How do you measure the quality of your JUnit tests and put a number on it? This is a very good question with no quick and easy answers, I'm afraid. I can tell you two answers that are not measures of test quality:
  • We have 100% success rate in our test suites every night.
  • We have 100% code coverage.

Harrah is correct.

Tools for measuring test coverage

The table below offers an overview of tools that can help you with test coverage data collection and reporting. You can find links to all of these tools in the Resources section at the end of this article.

Table 1. A selected list of test coverage tools

  • Clover (commercial; free for any open source project)
    Clover measures code coverage generated by system tests, functional tests, or unit tests. It provides integrated plugins for IntelliJ IDEA 4.x and 5.x, NetBeans, Eclipse, JBuilder, and JDeveloper, and can be integrated into Ant or Maven 2. Clover uses source code instrumentation and supports all JDK 1.5 language features. Types of coverage measured: statement, branch, method, and total coverage percentage for each class, file, and package, and for the project as a whole. Clover is a pure Java application and should run on any platform that has at least JDK 1.2 installed.

  • Cobertura (open source)
    Cobertura is a free Java tool, based on jcoverage, that calculates the percentage of code accessed by tests; it can be used to identify the parts of your Java program that are lacking test coverage. Cobertura modifies your Java bytecode slightly by adding bytecode instrumentation. It shows the percentage of lines and branches covered for each class, each package, and for the overall project, as well as the McCabe cyclomatic code complexity of each class and the average cyclomatic code complexity for each package and for the overall product. Ant and Maven integration is supported. Cobertura is platform independent.

  • EMMA (open source)
    Supported coverage types: class, method, line, and basic block. EMMA can even detect when a single source code line is covered only partially. It uses bytecode instrumentation and can instrument all classes in a JAR file with one command. Ant integration is supported. EMMA is platform independent and works in any Java 2 JVM since version 1.2.

  • CodeCover (Eclipse Public License)
    CodeCover is a free glass box testing tool developed in 2007 at the University of Stuttgart in Germany. It measures statement, branch, loop, and MC/DC coverage. CodeCover runs in command line mode on Linux, Windows, and Mac OS X, and provides Eclipse and Ant integration.

Automation

Prepare your tests for automation. Automated tests form the basis of regression testing. You should aim for fully automated tests that can be run without effort or manual intervention. Tools like Ant and Maven 2 can support this task. Gerard Meszaros's book xUnit Test Patterns contains a large section about the importance and realization of automated tests. (I'll write more about this impressive book in the second half of this article.)
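Beyond build-tool integration, JUnit 4 also lets you trigger test runs programmatically, which can serve as glue code for a nightly job. A minimal sketch, where the two test classes named in runClasses() are placeholders for your own suites:

    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;
    import org.junit.runner.notification.Failure;

    public class NightlyTestRunner {

        public static void main(String[] args) {
            // Run the suite and report any failures on stderr.
            Result result = JUnitCore.runClasses(SomeTest.class, SomeOtherTest.class);
            for (Failure failure : result.getFailures()) {
                System.err.println(failure);
            }
            // A non-zero exit code lets the surrounding automation react.
            System.exit(result.wasSuccessful() ? 0 : 1);
        }
    }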

Mutation testing

Test your tests with a mutation testing tool. Kent Beck says, "Why just think your tests are good when you can know for sure? Sometimes Jester tells me my tests are airtight, but sometimes the changes it finds come as a bolt out of the blue. Highly recommended." Jester, a tool created by Ivan Moore, finds code that is not covered by tests. Jester will make some change to your code and run your tests; if the tests pass, Jester displays a message saying what it changed. Jester includes a script for generating Web pages that show the changes made that did not cause the tests to fail. The code coverage tools listed in Table 1 indicate code that is not executed by the test suites. In contrast, Jester can give more information about the sort of test that is missing, by showing how the code can be modified and still pass your tests. With Jester, the time-consuming role of a "project saboteur" can be significantly reduced.
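To see what kind of gap a mutation tool exposes, consider this deliberately weak (and entirely hypothetical) example:

    // (in Discount.java)
    // Production code: 10 percent discount for orders of 100 or more.
    public class Discount {
        public static double discountFor(int amount) {
            return amount >= 100 ? 0.10 : 0.0;
        }
    }

    // (in DiscountTest.java)
    // A weak test: it exercises both branches, so coverage looks fine,
    // but it never probes the boundary at exactly 100.
    public class DiscountTest extends junit.framework.TestCase {
        public void testDiscount() {
            assertEquals(0.10, Discount.discountFor(150), 0.0001);
            assertEquals(0.0, Discount.discountFor(50), 0.0001);
        }
    }

If a mutation tool flips >= to >, both assertions still pass. The surviving mutant is a direct pointer to the missing test case, discountFor(100) -- something a plain coverage report would never reveal, since both branches are already executed.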

Because Jester's setup procedure and usage are not exactly straightforward, Moore implemented a follow-up, called simple-jester, which is available for download at SourceForge. Other open source candidates in the area of mutation testing are MuJava and Jumble. (See the Resources section to learn more about tools for mutation testing.)

Goals

You should set goals for code quality. Anyone on the team who writes test code -- including both developers and pure testers -- must know the quality goals that the team is aiming for. Establish and broadcast a best-practice guide that collects all the quality-related items that are important (and agreed upon) for your team and project. You can't manage what you can't measure, so you need concrete quality criteria that can be checked later on.

Static code analyzers

If you need to improve the quality of your test code, you can find hints about where to look by using static code analyzers. That's the same procedure that you would use for production code. The list of popular open source Java code analyzers at Java-Source.net will get you started; there's also Enerjy, which is now freeware but isn't included in that list. Commercial candidates include Klocwork, Coverity, and Bauhaus Suite. (There are more candidates, of course, but some are especially related to design and architecture, which is not the focus when looking at unit test code.)

In general, I believe that even pure developers should have a basic understanding of systematic test design methods and how to use them. Identifying and implementing the right test cases is the key for effective and efficient testing on every testing level, but it's particularly true for unit testing! To get started -- that is, for a developer to get some practical experience in using test design methods -- it can be useful to collaborate or pair with a good tester.

Real test coding

Of course, there is more to writing good unit tests than good style and code coverage. Therefore, let me reassemble and extend some rules Andy Schneider gave us in his December 2000 JavaWorld article entitled "JUnit best practices," complemented by some rules I personally find useful. (And you should remember that in 2000 Schneider was describing JUnit 3.2, and that TestNG wasn't released until about 2004!)

  • Do not use the test-case constructor to set up a test case. Test case setup should be part of your setUp() method (per test method or per test class) if it contains general initializations for your test. This also conforms to the famous DRY (don't repeat yourself) principle to avoid code duplication.

    I would like to go one step further and postulate that you should not initialize non-trivial objects (which may need things like remote resources) in static variables or static initializers. That could cause your unit test to crash during the class loading phase!

  • Call a superclass's setUp() and tearDown() methods when subclassing (see the sketch after this list). If you don't, superclass behavior will be skipped. This rule applies only to JUnit 3.8.x and below, because with JUnit 4 and TestNG you don't need to make your test a child class of a test framework class. You can easily check this rule with the static code analyzer Checkstyle. PMD, another representative of this category, has an even stronger built-in ruleset that deals with different problems that can occur with JUnit tests. With an Eclipse version that has Java 5 annotation support, you can use the @Override annotation to force the compiler to ensure that setUp() really overrides the superclass method (given that you have enabled the corresponding Eclipse settings).

  • Implement your tests as independent, isolated, and self-contained modules. For example, you can do this by using mocks or stubs wherever possible. Mock objects help you design and test the interactions between the objects in your programs. But good mocking is not a trivial task, even with the help of "expect-run-verify" libraries like EasyMock or JMock. (For a short introduction to the subject, see Simon Stewart's article on "Approaches to mocking.")

  • Avoid writing test cases with side effects.

  • Do not load data from hard-coded locations on a filesystem.

  • Name tests properly. This is a special case of what Kent Beck talks about in his book Implementation Patterns when it comes to writing, naming, and decomposing methods. Naming is always an essential subject -- for variables, for test methods, and especially for assertions. (The second half of this article will discuss writing proper assert statements in detail.)

  • Ensure that tests are time-independent. Where possible, avoid using data that may expire; such data (expired logon tokens, for example) should be programmatically refreshed. If this is not possible, you need a manual refresh, but that's a more fragile procedure.

  • Consider locale when writing tests. That's especially important when dealing with the Java classes Date, DateFormat, or Calendar.

  • If you're using JUnit, use its assert/fail methods and exception handling for clean test code. Writing good assertions and dealing with exception handling is an important topic for good unit tests.
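Here's a JUnit 3.8-style sketch illustrating the first two rules above: initialization lives in setUp() rather than in a constructor or static initializer, and a subclass keeps the superclass fixture alive by calling super. The fixture classes (DatabaseConnection, DatabaseConnectionFactory, CustomerDao) are invented for illustration:

    import junit.framework.TestCase;

    // (in DatabaseTestCase.java)
    public abstract class DatabaseTestCase extends TestCase {
        protected DatabaseConnection connection; // hypothetical resource

        protected void setUp() throws Exception {
            super.setUp();
            connection = DatabaseConnectionFactory.open(); // not in a constructor, not static
        }

        protected void tearDown() throws Exception {
            connection.close();
            super.tearDown();
        }
    }

    // (in CustomerDaoTest.java)
    public class CustomerDaoTest extends DatabaseTestCase {
        private CustomerDao dao;

        protected void setUp() throws Exception {
            super.setUp(); // forget this call and no connection is ever opened
            dao = new CustomerDao(connection);
        }

        public void testFindByNameReturnsEmptyListForUnknownName() throws Exception {
            assertTrue(dao.findByName("nobody").isEmpty());
        }
    }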

Rules made to be broken?

While all of the above rules will help you write good unit tests, there are a couple of commonly accepted guidelines that I don't necessarily agree with:

  • Don't assume the order in which tests within a test suite will run. In other words, the conventional wisdom says that your tests should never have an order dependency.

    This needn't be a hard rule of core test design. If you're testing a moderately complex system, it may become crucial to have some kind of dependency mechanism, like the one TestNG offers (see the sketch after this list). JUnit, in contrast, accepts no compromise on this point. Both sides were argued in the comments to Cedric Beust's blog post "Are dependent test methods really evil?" The debate on this topic will no doubt continue.

  • Keep tests in the same location as the source code. I have to say that I disagree with this rule. It's possible to follow it, but I find it better to have a parallel test class hierarchy in a directory called test (corresponding to src) that reflects your package structure. That way, you still have easy access to package-private methods (and data) if needed, without going the painful route of using reflection and setAccessible(true) or providing a kind of PrivateFieldAccessor class. (For more on this topic, see "Subverting Java access protection for unit testing.")
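As promised above, here is roughly what TestNG's dependency mechanism looks like; the server and client test methods are placeholders:

    import org.testng.annotations.Test;

    public class ServerLifecycleTest {

        @Test
        public void serverStarts() {
            // start the server and assert that it is up
        }

        // Runs only if serverStarts() passed; otherwise it is marked
        // as skipped rather than failed, which keeps reports honest.
        @Test(dependsOnMethods = "serverStarts")
        public void clientCanConnect() {
            // connect a client and assert on the handshake
        }
    }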

Conclusion, and a look ahead

You've just learned how, in general, software quality is measured, and about the difference between external and internal quality. In this article, you looked at the internal quality of unit test code. I have presented some best-practice rules, both from a more technical angle and from a process-oriented viewpoint. Special attention was given to code coverage as a measure of how well your tests exercise the different statements and branches of your system under test.

In the second half of this article, I'll introduce you to how to write proper assert statements -- the basis of test verdicts -- and we'll go through different implementations for testing exceptions. Finally, I'll discuss code smells, refactorings, and test patterns.

Klaus P. Berg has a master's equivalent (Diplom) in electrical engineering and applied informatics from the University of Karlsruhe in Germany. He was an architect and implementor for projects at Siemens focused on Java GUI development with Swing and Java Web Start, and he also acted on the server side, creating Java-based intranet applications. Now he works as a senior engineer in the area of software quality, focusing on functional and performance testing, mainly for JEE software.
