What makes a test suite good?


Many people enjoy splitting testing up into a myriad of test types: Acceptance Tests, Functional Tests, Integration Tests, Performance Tests, Technical Tests, Unit Tests. I have myself been guilty of such terminology as “embedded integration tests” and “requirement tests”. However, what unites the tests is more important than what divides them. The divisions are fuzzy, and they should be.

All tests have but two purposes: To tell you if you’ve completed a new requirement, and to ensure that you haven’t broken something that worked. There are three fundamental properties of a good test suite: Coverage, Robustness and Speed.

The properties of a good test suite

Coverage: I use the term coverage with some apprehension, as it has an existing and problematic definition. Test coverage to most people means line and/or branch coverage. That is: What percentage of your code is executed when you run the test suite? This metric can be misleading, and it is probably not the goal you want. Instead, I propose a different definition of coverage: Coverage is the percentage of bugs you introduce into your code that are detected by your test suite. Stated in a different way: The higher the number of false positives (tests that still pass even though your change introduced a bug), the lower your coverage.
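To make this definition concrete, here is a minimal sketch of how it could be measured for a toy case: seed a few deliberate bugs into an addition function and count how many of them make the suite fail. The class name, the three buggy variants and the tiny suite are all made up for illustration; they are not from any real tool.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Hypothetical sketch: "coverage" as the share of seeded bugs that a test suite catches.
final class BugDetectionCoverage {

    // A tiny "test suite" for an addition function: returns true when all checks pass.
    static boolean suitePasses(IntBinaryOperator add) {
        return add.applyAsInt(2, 3) == 5
            && add.applyAsInt(0, 0) == 0
            && add.applyAsInt(-1, 1) == 0;
    }

    // Fraction of buggy variants for which the suite fails, i.e. the bug is detected.
    static double detectedFraction(List<IntBinaryOperator> buggyVariants,
                                   Predicate<IntBinaryOperator> suite) {
        if (buggyVariants.isEmpty()) {
            throw new IllegalArgumentException("need at least one buggy variant");
        }
        long detected = buggyVariants.stream().filter(v -> !suite.test(v)).count();
        return (double) detected / buggyVariants.size();
    }

    public static void main(String[] args) {
        List<IntBinaryOperator> buggyVariants = List.of(
            (a, b) -> a - b,                       // wrong operator
            (a, b) -> a * b,                       // wrong operator
            (a, b) -> a + b + (a > 1000 ? 1 : 0)   // wrong only for large inputs
        );
        // Two of the three seeded bugs are caught; the third slips through.
        System.out.println(detectedFraction(buggyVariants, BugDetectionCoverage::suitePasses));
    }
}
```

The third variant is the interesting one: the suite runs it and still passes, which is exactly the false positive described above.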

Improving your line or branch coverage may or may not improve the chance that your test suite catches a defect. It may or may not be a good investment of time to improve how safe you are. Chances are that if your line coverage is over 70%, there are better things to spend your time on than improving it further. And those things may in fact improve your line coverage as a result.

Robustness: The problem with the easy focus on line and branch coverage that tools give us is that it tends to hurt other characteristics of a good test suite. If you add a test to make sure that all the internals of your system are tested, chances are good that this test can break because of a non-destructive change. I’ve found that teams with high test coverage always seem to run into the problem of the Fragile test.

The fragility of a test suite can be described as the number of changes that break a test even though they did not introduce a defect. Stated in a different way: The higher the number of false negatives (tests that fail even though your change did not break anything), the lower your robustness.

You make tests more robust by testing the outcome and not the mechanism. Incidentally, I have found that mock objects seem to make my tests more fragile.
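A small sketch of the difference, using made-up names: a cart total is computed from a price lookup, and the code is then refactored to cache lookups. A check on the outcome (the total) survives the refactoring. A check on the mechanism (how often the lookup was called, the kind of thing a mock typically verifies) breaks even though nothing is wrong.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical example: the same behavior implemented two ways.
final class OutcomeVersusMechanism {

    // Hand-rolled spy that counts how often a price is looked up.
    static final class CountingPrices implements ToIntFunction<String> {
        int calls = 0;
        private final Map<String, Integer> prices = Map.of("apple", 5, "pear", 15);

        @Override
        public int applyAsInt(String item) {
            calls++;
            return prices.get(item);
        }
    }

    // Original implementation: one lookup per item in the cart.
    static int totalNaive(List<String> cart, ToIntFunction<String> prices) {
        int sum = 0;
        for (String item : cart) {
            sum += prices.applyAsInt(item);
        }
        return sum;
    }

    // Refactored implementation: same result, but each distinct item is looked up only once.
    static int totalCached(List<String> cart, ToIntFunction<String> prices) {
        Map<String, Integer> cache = new HashMap<>();
        int sum = 0;
        for (String item : cart) {
            sum += cache.computeIfAbsent(item, key -> prices.applyAsInt(key));
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> cart = List.of("apple", "apple", "pear");

        CountingPrices before = new CountingPrices();
        CountingPrices after = new CountingPrices();
        int totalBefore = totalNaive(cart, before);
        int totalAfter = totalCached(cart, after);

        // Outcome check: holds for both implementations.
        System.out.println("same total: " + (totalBefore == totalAfter));
        // Mechanism check: "one lookup per cart item" no longer holds after the refactoring.
        System.out.println("lookups before: " + before.calls + ", after: " + after.calls);
    }
}
```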

Speed: So a test breaks. What happens now? Presumably, you try to isolate the behavior that breaks, maybe by running a smaller suite of tests. Then you make some changes to the code (if there was an actual bug) or the test (if it failed without a real defect), and you run the failing test again to check whether the problem is fixed. Repeat until done. Then you run the whole suite again to check that you didn’t introduce some other problem.

There are a few critical thresholds for tests when it comes to execution time. More than about 2 seconds, and I check my email. More than 10 seconds, and I try to respond to email. More than 20 seconds, and I start working on two tasks in parallel. More than 1 minute, and I go for a cup of coffee. Each of these secondary effects is ten times as time-consuming as the test.

This means: Let’s say I introduce a bug that I need 5 attempts to fix, and I introduce another bug that I detect when I run the full suite and that takes another 2 attempts to fix. So I run the full suite three times: first, after fixing the first bug, and after fixing the second bug. And I run a single test about 7 times. If running the full suite takes a minute and running a single test takes 10 seconds, the distractions will have cost me 3 * 10 minutes for coffee + 7 * 100 seconds to answer an email = 1800 + 700 = 2500 seconds, or about three quarters of an hour. If running a single test took 1 second (no interruption) and running the suite took 10 seconds (I’ll watch it for that long), the test time would be less than a minute. But I didn’t get to write those three emails.
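The arithmetic can be checked with a few lines of code. This is only a sketch of the estimate above; the class and method names are made up, and like the estimate itself, the slow scenario counts only the distraction time (ten times the run that triggered it), not the test runs themselves.

```java
// Hypothetical helper that reproduces the back-of-the-envelope numbers from the text.
final class FeedbackLoopCost {

    // Slow tests: every full-suite run (1 minute) sends me for coffee,
    // every single-test run (10 seconds) sends me to my inbox.
    static int slowScenarioSeconds(int suiteRuns, int singleTestRuns) {
        int coffeeBreakSeconds = 10 * 60; // ten times a 1-minute suite run
        int emailBreakSeconds = 10 * 10;  // ten times a 10-second test run
        return suiteRuns * coffeeBreakSeconds + singleTestRuns * emailBreakSeconds;
    }

    // Fast tests: no distractions, so the cost is just the run time
    // (10 seconds per suite run, 1 second per single test).
    static int fastScenarioSeconds(int suiteRuns, int singleTestRuns) {
        return suiteRuns * 10 + singleTestRuns * 1;
    }

    public static void main(String[] args) {
        System.out.println(slowScenarioSeconds(3, 7)); // 2500
        System.out.println(fastScenarioSeconds(3, 7)); // 37
    }
}
```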

The universal test

The suggested difference between an integration test and a unit test is the time it takes to run the test. The difference in running time is caused by the fact that an integration test has more setup and more realistic infrastructure. However, we usually want to test the same scenarios.

I would like to submit to you, gentle reader, that it is not only possible, but quite feasible to write a test that can be used both as a “unit test”, running with a fast, in-memory implementation, and as an “integration test”, using the target infrastructure. This achieves the goals of high coverage, good robustness, and the right speed, by focusing on what the system is supposed to do, and using the infrastructure setup as a point of variation.
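One way this could look, as a minimal sketch rather than a finished design: the scenario is written once against an interface, and the infrastructure is handed in from outside. The names PersonRepository, InMemoryPersonRepository and PersonRepositoryScenario are invented for this example, and plain checks stand in for a test framework so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// What the system is supposed to do, independent of the infrastructure (hypothetical interface).
interface PersonRepository {
    void save(String id, String name);
    Optional<String> findName(String id);
}

// Fast, in-memory implementation for the "unit test" run.
final class InMemoryPersonRepository implements PersonRepository {
    private final Map<String, String> names = new HashMap<>();

    @Override
    public void save(String id, String name) {
        names.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(names.get(id));
    }
}

final class PersonRepositoryScenario {

    // The scenario is written once; the infrastructure is the point of variation.
    static void savedPersonCanBeFoundAgain(Supplier<PersonRepository> infrastructure) {
        PersonRepository repository = infrastructure.get();
        repository.save("42", "Ada");
        check(repository.findName("42").equals(Optional.of("Ada")), "saved person should be found");
        check(!repository.findName("missing").isPresent(), "unknown id should yield no result");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // "Unit test" run: in-memory, fast.
        savedPersonCanBeFoundAgain(InMemoryPersonRepository::new);
        // An "integration test" run would pass a supplier for an implementation
        // backed by the target infrastructure (not shown here).
        System.out.println("scenario passed against the in-memory repository");
    }
}
```

With a test framework, the same idea is often expressed as an abstract test class whose subclasses only decide which repository to create.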

What about TestNG's "test group" feature?

Hi Johannes,

Did you consider switching from JUnit to TestNG, which supports test groups (http://testng.org/doc/documentation-main.html#test-groups) out of the box?

"TestNG allows you to perform sophisticated groupings of test methods. Not only can you declare that methods belong to groups, but you can also specify groups that contain other groups. Then TestNG can be invoked and asked to include a certain set of groups (or regular expressions) while excluding another set. This gives you maximum flexibility in how you partition your tests and doesn't require you to recompile anything if you want to run two different sets of tests back to back."

--Klaus

Thanks for your question, Klaus.

Personally, I have found test groupings in Maven to be sufficient for my purposes. My most long-running tests aren't even Java code, but full simulations of production conditions.

I might look into TestNG in the future, but frankly, I find Java code simpler to maintain than the kind of XML configuration TestNG uses.

TestNG and Groups

I don't want to convince you of TestNG ;-) but you can use annotations alone; testng.xml is not mandatory for running your tests. I'm still using JUnit 3.8.2 in the majority of our tests, but if I had to decide again, I would definitely prefer TestNG over JUnit 4. So, if you ever plan to look at TestNG, I strongly recommend the book "Next Generation Java Testing -- TestNG and Advanced Concepts" by Cedric Beust (the creator of TestNG) and Hani Suleiman.

Thierry Janaudy's "TestNG: The next generation of unit testing" is a good, quick intro to TestNG.

Cheers -

Athen

You missed property (quality) number 4: Isolation level

This one applies only to unit tests and distinguishes them from other test types: Isolation level (or independence).
A good unit test is a test of a class (or module) interface (not a requirement). A good unit test will do that without requiring a plethora of other classes (or modules) to work properly.
Of course you can use unit test tools to test compliance with requirements (probably a good idea), but this doesn't make those tests unit tests.

Good point

I have indeed missed isolation level and other maintainability characteristics of test suites.

I'm not sure that I agree about a good unit test being a test of a class as opposed to a requirement. I try to make all my tests associated with some external requirement. Otherwise, I'd rather delete the test (and the code!)

I guess my statement was somewhat exaggerated. And...

I agree that all tests should relate to a requirement. However, sometimes the link can be quite remote and it might not be easy to relate all behavior of all classes to requirements. Nevertheless, you can still write good unit tests for them if you focus on the contract that they are supposed to implement. For example: you could use a FIFO in application A to implement feature X. This same FIFO could then be used in application B to implement feature Y, which is different from X. From a requirement point of view, the test for application A would not be the same as the test for application B. However, if you look at your FIFO as an abstract data type (that implements a well-defined API), you can fully unit test your FIFO without reference to those varying requirements.
In fact, you might do a better job.
JS

The danger of gold plating

I agree. There are user-oriented requirements and technical requirements for components like FIFO queues. But they're not all created equal.

Have you ever found yourself spending too much time and energy to design a perfect technical component, when a very simple solution was enough?

I have. The more my tests are connected to the real world, the less I'm tempted to gold plate my code.

I agree that you have to watch out for "gold plating".

I guess we are getting into a gray area where drawing the line depends on a number of factors, one of them being the amount of reuse you plan to do. Where I work, we actually achieve a fair amount of reuse between projects, and classes are often placed in shared libraries.
The important thing is that we agree about the importance of good testing. Agreement on the exact taxonomy of different species of test is less important.
JS

The cost of reuse

(I think we agree on most things, so I'm focusing on places where we might not, to keep the debate interesting)

Some studies document that the cost of developing reusable code is about three times that of developing one-off code. This mirrors my experience. Not only development, but also maintenance is much more expensive for reused code.

We have a lot of code that has been reused between projects, but we find that at this point in time, we want to reduce the code base that is being reused, as this incurs a coordination cost on projects that outweighs the benefit.

There are some very few core classes that we have gotten extremely high value out of reusing. The rest has been more disappointing. This is despite excellent test coverage of reused classes.

There are no absolute rules

Thanks for the links!
Reading those and thinking about your last reply just strengthens my belief that there are no absolute rules that you can follow blindly. Before jumping into costly development you have to evaluate, and before rewriting you also have to evaluate.
For us, using more senior (or talented) developers has proven to be a good path for reusable components.
JS

The value of reuse...

The value of reuse declines with the size of the reused code base.

Like you suggested, we have had senior people write code that was supposed to be reused. The core of this code is very valuable, but its value has been watered down by the size of the whole code base.

We also found that experienced developers who wrote code primarily for reuse, and not for their own use, produced poor code. Myself included.

I agree that there are no rules. And as a matter of fact, most suggested "rules" or "best practices" are what have harmed us the most.
