Should we be doing more automated testing?

Decide whether automated testing makes sense for your application

This article asks whether we, as professional software developers, should be doing more automated testing. It is aimed at those who find themselves repeating manual tests over and over again, be they developers, testers, or anyone else.

In this article I ask:

  • Are we realistic about how much testing we're going to do?
  • When does it become feasible to automate testing?

Note that this article doesn't discuss whether we should be testing (be it automated or manual). Nor is it about any particular type of testing, be it unit testing, system testing, or user-acceptance testing.

Instead, this article is intended to act as a prompt for discussion and contains opinions based upon my own experience.

A real-world example, Part 1

Let's start with a job that I recently worked on. It involved making small changes to a moderately complex Website: nothing special, just some new dynamically generated text and images.

Because the system had no unit tests and consequently was not designed in a way that facilitated unit testing, isolating and unit-testing my code changes proved difficult. As a result, my unit tests were more like miniature system tests in that they tested my changes indirectly by exercising the new functionality via the Web interface.

I figured automating the tests would prove pointless, as I guessed I would be running them only a couple of times. So I wrote some plain English test plans that described the manual steps for executing the tests.

Coding and testing the first change was easy. Coding and testing the second change was easy too, but then I also had to re-execute the tests for the first change to make sure I hadn't broken anything. Coding and testing the third change was easy, but then I had to re-execute the tests for the first and second changes to make sure I hadn't broken them. Coding and testing the fourth change was easy, but...well, you get the picture.

What a drag

Whenever I had to rerun the tests, I thought: "Gee, running these tests is a drag."

I would then run the tests anyway and, on a couple of occasions, found that I had introduced a defect. On such occasions, I thought: "Gee, I'm glad I ran those tests."

Since these two thoughts seemed to contradict each other, I started measuring how long it was actually taking me to run the tests.

Once I had a stable development build, I deployed my changes into a system-testing environment where somebody else would test them. However, because this environment differed from my development environment, I figured I should re-execute the tests just to make sure everything still worked there.

Somebody then system-tested my changes and found a defect (something that wasn't covered by my tests). So I had to fix the defect in my development environment, rerun the tests to make sure I hadn't introduced a side effect, and then redeploy.

The end result

By the time I'd finished everything, I had executed the full test suite about eight times. My time measurements suggested that each test cycle took about 10 minutes to execute. So that meant I had spent roughly 80 minutes on manual testing. And I was thinking to myself: "Would it have been easier if I'd just automated those tests early on?"

Do you ever test anything just a couple of times?

I believe the mistake I made in underestimating the effort required to test my work is a mistake other developers make too. Developers are renowned for underestimating effort, and I don't think test-effort estimation is any different. In fact, given the disregard many developers have for testing, I suspect they are more likely to underestimate the effort required to test their code than to underestimate anything else.

The main cause of this test-effort blow-out is not that executing the test cycle in itself takes longer than expected, but that the number of test cycles that need to be executed over the life of the software is greater than expected. In my experience, it seems that most developers think they'll only test their code a couple of times at most. To such a developer I ask this question: "Have you ever had to test anything just a couple of times?" I certainly haven't.

But what about my JUnit tests?

Sure, you might write lots of low-level JUnit tests, but I'm talking about the higher-level tests that exercise your system's end-to-end functionality. Many developers consider writing such tests, but put the task off because it seems like a lot of effort given the number of times they believe they will execute the tests. They then proceed to execute the tests manually and often reach a point where the task becomes a drag, which is usually just after the point when they thought they wouldn't be executing the tests any more.
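To make the distinction concrete, here is a minimal sketch of the two kinds of test. It assumes a hypothetical shopping-cart application, so the class, page, and link names are purely illustrative; the end-to-end test is written against the jWebUnit 1.x API (jWebUnit is the tool I describe later in this article):

    // CartTest.java -- a low-level JUnit test that exercises a single
    // application class in isolation. Cart and Item are hypothetical
    // classes, shown only to illustrate the level this test works at.
    import junit.framework.TestCase;

    public class CartTest extends TestCase {
        public void testAddingItemIncreasesTotal() {
            Cart cart = new Cart();
            cart.add(new Item("Book", 29.95));
            assertEquals(29.95, cart.getTotal(), 0.001);
        }
    }

    // CheckoutWebTest.java -- a higher-level, end-to-end test that drives
    // the deployed application through its Web interface, much as a
    // manual tester would.
    import net.sourceforge.jwebunit.WebTestCase;  // jWebUnit 1.x package

    public class CheckoutWebTest extends WebTestCase {
        public void setUp() throws Exception {
            super.setUp();
            // Illustrative base URL for the deployed application.
            getTestContext().setBaseUrl("http://localhost:8080/shop");
        }

        public void testCheckoutShowsConfirmation() {
            beginAt("/catalogue");                    // illustrative page path
            clickLinkWithText("Add to cart");
            clickLinkWithText("Checkout");
            assertTextPresent("Order confirmation");  // illustrative expected text
        }
    }

The first test runs in milliseconds and needs no running server; the second is slower and needs a deployed application, but it covers end-to-end behaviour that the low-level tests cannot.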

Alternatively, a developer making a small change to an existing product (as I was) can fall into the same trap. Because it's such a small piece of work, there seems to be no point in writing an automated test. You're only going to execute it a couple of times, right? Not necessarily, as I learned in my own real-world example.

Somebody will want to change your software

While developers typically underestimate the number of test cycles, I think they're even less likely to consider the effort required to test the software later in its life. Having finished a piece of software, and having probably manually tested it more times than they ever wanted to, most developers are sick of it and don't want to think about it any more. In the process, they ignore the likelihood that at some point, somebody will have to test the code again.

Many developers believe that once they write a piece of software, it will require little change in the future and thus no further testing. Yet in my experience, almost no code that I write (especially if it's written for somebody else) goes through the code-test-deploy lifecycle once and is never touched again. In fact, even if the person I'm writing the code for tells me that it's going to be thrown away, it almost never is (I've worked on a number of "throw-away" prototypes that were subsequently taken into production and have stayed there ever since).

Even if the software doesn't change, the environment will

Even if nobody changes your software, the environment that it lives within can still change. Most software doesn't live in isolation; thus, it cannot dictate the pace of change.

Virtual machines are upgraded. Database drivers are upgraded. Databases are upgraded. Application servers are upgraded. Operating systems are upgraded. These changes are inevitable—in fact, some argue that, as a best practice, administrators should proactively ensure that their databases, operating systems, and application servers are up-to-date, especially with the latest patches and fixes.

Then there are the changes within your organization's proprietary software. For example, an enterprise datasource developed by another division in your organization, one that you are entirely dependent upon, is upgraded. Alternatively, suppose your software is deployed to an application server that is also hosting some other in-house application. Suddenly, for the other application to work, it becomes critical that the application server be upgraded to the latest version. Your application goes along for the ride whether it wants to or not.

Change is constant and inevitable, and it entails risk. To mitigate the risk, you test; but as we've seen, manual testing quickly becomes impractical. I believe that more automated testing is the way around this problem.

But what about change management?

Some argue that management should be responsible for coordinating changes: tracking dependencies, ensuring that you retest whenever one of your dependencies changes, and synchronizing cross-system changes with releases. However, in my experience, these dependencies are complex and rarely tracked successfully. I propose an alternative approach: building software systems that can test themselves and are thus better able to cope with inevitable change.

What happens when you don't have automated tests?

As I see it, organizations that fail to cope with this change tend to lean in one of two directions: they either reduce their testing to maintain the pace, or reduce the pace to maintain their testing. Each of these approaches has its problems.

Reducing testing to maintain pace

Organizations that reduce their testing to maintain the pace tend to say: "Manual testing takes too long, and automated testing is too hard, so we just won't test as much." Consequently, they suffer from all of the problems that result when you reduce testing. However, as I mentioned in my introduction, this article doesn't argue why we should test, so I won't discuss the subject further.

Reducing pace to maintain testing

Organizations that reduce pace to maintain testing tend to say: "Testing is important, but writing automated tests is too hard, so we'll test manually." This is better than not testing at all, but I do not believe that manual testing can cope with the necessary pace of change on large systems in an enterprise environment. The reduction in pace becomes a barrier to the system's advancement, and the software's architecture slowly but steadily degrades. For example, application servers are not upgraded, and new projects are forced onto old platforms because it is not practical to manually retest everything already deployed on those platforms.

I once worked on a project where the team was using an in-house persistence layer that had been state-of-the-art five years earlier, but now just slowed the team down compared with teams using more modern technology. In such cases (especially in well-designed systems), the barrier is often not the cost of making the change itself; it's the cost of ensuring the change hasn't broken anything.

This problem can lead to a vicious circle: the architecture does not advance because there is no time for manual tests and nobody wants to introduce defects, but the software becomes increasingly difficult to maintain because the architecture doesn't advance.

When should you start automated testing?

Let's say we agree that, in an ideal world, we should automate tests for our software. In reality, however, writing automated tests requires effort. So the question arises: at what point is a test worth automating?

If we were to assume that the cost of executing an automated test is more or less negligible in comparison to the cost of manually executing it, we could say that it is worth automating a test when the cost of automation is less than the projected total cost of manual execution. This is illustrated by the following graph, where the blue and red lines respectively represent the cost of automatically and manually testing a system in relation to the number of test cycles that will be executed:

[Figure: The cost of automated (blue) and manual (red) testing plotted against the number of test cycles executed]
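To state the intersection point explicitly (this is my own formalization of the graph, under the assumption above that executing an automated test costs roughly nothing): if C_auto is the one-off cost of automating the test cycle, C_manual is the cost of one manual execution of the cycle, and n is the number of cycles that will be executed over the life of the software, then automation pays off when

    C_{\text{auto}} < n \cdot C_{\text{manual}}, \quad \text{that is, when} \quad n > \frac{C_{\text{auto}}}{C_{\text{manual}}}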

The point at which the two lines intersect represents the point at which automated testing becomes worth the trouble. However, we can't know where that intersection falls for our particular system unless we can answer the following questions:

  1. How many times will we manually execute our test cycle?
  2. How long will it take to write automated tests?

The problem is that we can't know the answers to these questions in advance. However, we can at least use the past as a basis for an estimate.

Let's reconsider the work I described earlier. We know how many times I manually executed that particular test cycle—now how long would it take to automate it?

A real-world example, Part 2

Using jWebUnit, I was able to automate 7 of the 10 manual tests I had devised (the remaining tests relied on features beyond the tool's capabilities) in 1.5 hours, which comes to about 13 minutes per test. Recall that it took about 10 minutes to manually execute all 10 tests, or about 1 minute per test. So we could expect to execute the tests at least 13 times before automating them would pay off.
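Restating that arithmetic in terms of the break-even inequality above, on a per-test basis and treating my rough measurements as exact:

    C_{\text{auto}} \approx \frac{90 \text{ min}}{7 \text{ tests}} \approx 13 \text{ min/test}, \qquad C_{\text{manual}} \approx \frac{10 \text{ min}}{10 \text{ tests}} = 1 \text{ min/test}, \qquad n > \frac{13}{1} = 13 \text{ executions per test}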

So it seems that, in this case, automating the tests wasn't worth the trouble. Or was it? Perhaps not during the initial development, but new batches of changes will inevitably be made to the same codebase in the future, and re-executing these tests after such changes to check for newly introduced defects would prove beneficial. In fact, I have already experienced such a scenario: I re-executed a set of jWebUnit tests that I had written for a prior release to determine whether my latest changes had broken anything.

To be fair, my measurements don't account for the time required to become familiar with the testing framework. I spent at least a couple of hours learning how to use jWebUnit, and I expect other developers would need to do the same. However, as developers gain familiarity with their test-automation framework, that overhead should shrink.

Conclusion

This article is not intended to demand blanket automated testing of your systems. Instead, it asks you to consider:

  • How often is a system or change really going to require testing?
  • At what point would it be worth it for me to start writing automated tests?

While the former question may require some thought, answering the latter could be as simple as measuring how long you spend manually testing your code. If it turns out to be more than you thought, consider writing some automated tests and seeing how long they take. From this, you can get some idea of whether and when you would reap benefits from automated testing in future projects.

I am interested to hear your feedback. There are some areas I did not touch upon in this article, such as the maintenance cost of automated tests. And if you have done any measurements of your own, I would love to hear about your results, so please contact me.

Ben Teese is a software engineer at Shine Technologies.
