One approach to testing for platform-dependent features is to use a golden file test, judging a result to be good if it is identical to a sample result that has been identified as acceptable ("golden"). The utility and portability of such an identity test can be increased by writing a more sophisticated comparison function to compare the actual program output with the reference output. However, before going too far down this path, please do think about how the comparison relates to the program specification. Is the comparison an accurate and efficient reflection of what the program should do for the user? The purpose of testing, after all, is to serve as an advocate and representative for the users of the program. Measurements made in testing should reflect users' needs.
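For instance, a comparison function might normalize details the specification doesn't fix -- such as the line separator -- before comparing the actual output with the golden file. The sketch below is only an illustration of the idea; the class, file names, and the choice of what to normalize are assumptions, not part of any particular test framework.

import java.io.*;

// A minimal golden-file comparison: read the reference ("golden") output,
// normalize line separators in the actual output, and compare the two.
public class GoldenFileTest {

    // True if the actual output matches the golden file, ignoring platform
    // differences in line separators. (Other allowed differences, such as
    // file-name separators, would need their own normalization step.)
    static boolean matchesGolden(String actualOutput, String goldenFileName)
            throws IOException {
        return normalize(actualOutput).equals(normalize(readFile(goldenFileName)));
    }

    // Reduce all line endings to '\n' so the comparison reflects what the
    // specification requires (the records produced), not the platform.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Read a text file into a string, using '\n' as the line separator.
    static String readFile(String name) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(name));
        StringBuffer text = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            text.append(line).append('\n');
        }
        in.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // In a real test the actual output would come from the program under
        // test; here it is simply read from a file named on the command line:
        //   java GoldenFileTest actual-output.txt golden-output.txt
        String actual = readFile(args[0]);
        System.out.println(matchesGolden(actual, args[1]) ? "PASS" : "FAIL");
    }
}

A more elaborate comparison function could, in the same spirit, ignore other allowed differences while still checking everything the specification requires.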
In regression testing we test for degradation of the program's correctness or other quality attributes due to some modification of the software product. In general, during regression testing the platform is held constant while the program undergoes slight variations. This is the opposite of runnability testing, where the platform is subject to slight variation while the program is held constant. Otherwise, runnability testing and regression testing are similar. Both kinds of testing present the problems of repeating and reproducing tests accurately, and of dealing with slight variations in the test results. In both kinds of testing, a judgement must be made for each variation: Is this an allowed difference, or is it a bug?
Unlike runnability testing, regression testing doesn't necessarily focus on one special area. Regression testing is carried out because programmers have modified the code and aren't certain of the impact of their modifications. Additionally, during regression testing, special attention is paid to the areas of the code that can be predicted to be buggy -- either because they've been buggy in the past, or because they're new.
In contrast, runnability testing concentrates on the areas of the program that may reflect the runnability bugs described in the section "Differences in Java Environments," above. Our prediction of bugs is based on general knowledge, rather than on specific knowledge of the program under test.
A fundamental characteristic of a runnability test is that it will be repeated. This is true, of course, for any test -- a test case that can't be repeated is a test case that can't help you find or fix a bug. A runnability test is intended to be repeatable on each system to which you are porting, or (for run-anywhere Java programs) on each system your program runs on. Random test data lacks repeatability. It may seem tempting to supply a sequence of random inputs to a function, or to click on random buttons in a user interface, but in fact your time is better spent designing a list of inputs that reflect the users' requirements. This list provides a measure of confidence in the features that have been tested for runnability, and a measure of confidence that you have tested the same features on all platforms.
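For example, a repeatable test can drive the code with a fixed table of inputs and expected results drawn from the users' requirements. The function and the values below are hypothetical, invented purely to illustrate the shape of such a test:

// A table-driven test: the same fixed inputs, in the same order, on every
// platform -- no random data.
public class ShippingCostTest {

    // Hypothetical function under test: shipping cost in cents for a parcel
    // weight in grams (flat rate up to 500 g, then 50 cents per started 100 g).
    static int shippingCostCents(int grams) {
        if (grams <= 500) {
            return 300;
        }
        int extraHundreds = (grams - 500 + 99) / 100;   // round up
        return 300 + extraHundreds * 50;
    }

    public static void main(String[] args) {
        // Inputs chosen to reflect the users' requirements (boundary weights),
        // not generated at random, so every platform exercises the same cases.
        int[] grams    = { 100, 500, 501, 600, 601, 1500 };
        int[] expected = { 300, 300, 350, 350, 400, 800  };

        int failures = 0;
        for (int i = 0; i < grams.length; i++) {
            int actual = shippingCostCents(grams[i]);
            if (actual != expected[i]) {
                System.out.println("FAIL: " + grams[i] + " g -> " + actual
                        + " cents, expected " + expected[i]);
                failures++;
            }
        }
        System.out.println(failures == 0 ? "PASS" : failures + " failure(s)");
    }
}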
To be useful on a variety of platforms, runnability tests must themselves be portable. However, because they can be adapted to the specific set of platforms you're using, they don't have to achieve the level of runnability you're testing for: it isn't unreasonable to subject test engineers to a little porting effort in order to spare users the pain. There are two requirements for a runnable test: it must be possible to run it on a variety of platforms, and its evaluation of the results must be valid on all of them.
Running tests often involves some kind of test harness or administrative framework, which imposes requirements of its own. In particular, the connection between the test and the test framework may be difficult to set up when testing applets: with an applet test framework, you can't count on being able to save any test results to a file. Even when doing manual testing, there may be difficulties in getting the tested code installed and running on the test platforms. Packaging the code, setting the classpath, and invoking the Java virtual machine (JVM) are platform-dependent activities and may need to be done in a different way for each test platform. Testing an applet may also depend on the HTML tag used to embed the applet, which differs somewhat across versions of HTML. Do you use:
<applet class="Foo.class" codebase="../.."></applet>
or
<applet class="Foo.class" archive="foo.jar"></applet>
or
<object></object>
or the Java Plug-In (Activator) syntax?
The other side of test runnability is the correctness criterion we mentioned in the first section. To write a runnable test, your evaluation of the test's result must be sufficiently high-level to apply to different platforms. For example, a bitmap dump of the window painted by a Java program isn't likely to be of much use as a golden file in runnability testing, unless you have a human do the comparison. Any comparison or evaluation function you use in runnability testing must examine not the implementation details of the output, but its adequacy according to the program specification.
Recall that runnability testing has some similarities to regression testing. In fact, runnability testing will often be performed along with regression testing. Once you have a runnability test set up, it makes good sense to repeat at least part of the testing for new releases. And certainly, if you do find and correct bugs as a consequence of runnability testing, you'll want to repeat the tests to verify that the bugs really are fixed. This means that runnability tests must also be robust against program changes. Some amount of this robustness is gained by the same abstraction that is necessary for test runnability. A test case that measures program compliance with the specification will be valid as long as the specification doesn't change.
The other requirement for robustness concerns the test input data. Programmers should organize and structure their tests so that they are easy to adapt to changes in the program. Good test documentation is needed, specifying each test's dependencies on other test cases and on program features. A test case becomes more robust when it is cohesive and focused on a single feature of the program; a large "smorgasbord" test case is likely to break with every detailed program change.
Automation is, essentially, investing in machinery rather than labor; or, investing in machinery to get more value from your labor. Automation can be applied during all phases of the testing process, from test generation to test execution and test analysis. It can provide real advantages, especially in highly repetitive situations. Test automation can save time and money, and is especially applicable to runnability testing -- as long as your automation tools provide the necessary runnability and robustness.
Several kinds of machinery are available to help with runnability testing, ranging from test harnesses and execution frameworks to coverage-measurement tools and tools that analyze and diagnose program behavior.
The worth of any testing strategy is ultimately measured by how well the product works in real-life applications, and runnability testing benefits from real-life exposure as much as any other kind. This exposure comes during a phase of the testing process often referred to as beta testing. Beta testing is useful because it brings a program into contact with real customers in real circumstances. The most direct way to learn whether a program suits its users' requirements is to let those users use the program. No description, however accurate, can substitute for the experience of trying to use the program. No test laboratory, however extensive, can simulate all the variation of the real world.
Because beta testing involves multiple customers with varying needs and situations, it requires well-thought-out coordination. Beta sites should be selected to cover the scenarios that together give the best coverage of the functionality of the software under test; for runnability testing in particular, it's best to select beta sites that cover a wide variety of platform implementations and execution environments. Other standard beta criteria also apply: establishing strategic partnerships with the selected beta sites, agreeing on active use of the product, and setting up a formal, systematic communications channel between the development and support team and the beta sites. Typically this means assigning a product team member to one or more beta sites and scheduling regular review meetings between that member and the sites, in order to harvest the results and communicate them to the program team.
Beta testing doesn't fit well with all marketing strategies. A company that relies on technical innovation and surprise for its competitive edge may have a problem with customer testing, as it gives the competition a longer lead-time with which to counter or imitate. This, as well as pride in craftsmanship, has at times given beta testing an unsavory reputation in engineering circles.
What's the best strategy for runnability testing? How can the concepts outlined above be put into practice? The goal of runnability testing, as with most other testing approaches, is to protect your customers from bugs. With a finite test budget and a limited amount of time for testing, it is not possible to catch every bug. Given these constraints, the goal of runnability test planning is to enable the test engineer to do as much good as possible within the time allowed.
A runnability test plan must tell programmers where to focus their testing efforts, how much testing to do, and how many test platforms are required.
The goal of a runnability test is to discover ways in which the program unintentionally depends on features of the JRE that may vary from one implementation to another. Therefore, runnability testing should focus on program features that use parts of the Java Core API known to expose platform differences, such as file-name syntax and screen layouts. Two kinds of platform variance must be kept in mind during test planning.
The first kind of platform variance is allowed variance. You can gain confidence in runnability on any platform by making sure that your program works across a wide spectrum of allowed implementations. For example, you can test the file operations of your program on Unix and on MacOS. This combination covers a wide spectrum of file-name syntaxes, so this testing gives you good confidence that your program will be runnable on other systems.
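As a sketch of what such a test might look like, the following file-operation test builds its paths portably and checks only what the specification requires -- that data written can be read back -- rather than assuming any particular file-name syntax. The directory and file names are invented for illustration:

import java.io.*;

// A small runnability test for file operations: build the path portably,
// write a known string, read it back, and compare. No file-name syntax
// (drive letters, '/', ':', or '\') is hard-coded anywhere.
public class FileRoundTripTest {
    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "runnability-test");
        dir.mkdirs();
        File data = new File(dir, "sample.txt");   // parent + child, no separators

        String expected = "first line" + System.getProperty("line.separator")
                + "second line";

        Writer out = new FileWriter(data);
        out.write(expected);
        out.close();

        // Read back exactly what was written.
        Reader in = new FileReader(data);
        StringBuffer actual = new StringBuffer();
        int c;
        while ((c = in.read()) != -1) {
            actual.append((char) c);
        }
        in.close();

        System.out.println(expected.equals(actual.toString()) ? "PASS" : "FAIL");

        // Clean up so the test is repeatable.
        data.delete();
        dir.delete();
    }
}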
The other kind of platform variance is the result of a bug in the implementation of the JRE. Ideally, there would be no bugs in JRE implementations, so your program wouldn't have to cope with any. This isn't the case, however, and generalized testing for runnability in the presence of buggy platforms isn't all that effective. Instead, programmers must test on the specific platform on which they plan to deploy. Of course, one can generalize, to some degree, that two implementations derived from a common code base have some commonality of bugs; so testing on one implementation offers some confidence in the runnability of its cousins. We expect that as Java technology matures, these sources of bugs will become less prevalent.
At the minimum, a test plan should include platform variance tests that focus on specific areas of the program that might depend on implementation specifics. Test cases should be developed and run on implementations that vary as much as possible. The test plan should also include broad functionality tests, run on your most important deployment platforms.
How much runnability testing is required? For anything but the most trivial program, no amount of testing can prove the code correct; it's always possible for a bug to slip through. However, there is a common relationship between the amount of testing performed and the number of remaining bugs. In most cases the curve looks something like the figure below, with a knee corresponding to a moderate testing effort. A test effort smaller than the knee won't catch even the obvious bugs; a test effort much greater than the knee will catch more bugs, but at a much higher cost per bug. The appropriate test effort depends largely on the cost of a bug. For safety-critical programs, the cost of a bug is very high, so a large test effort is justified. For a throwaway personal program, the cost of a bug may be very low, so minimal testing may be appropriate.

[Figure: Amount of testing performed versus remaining bugs]
How do you know where you are on this curve? How do you measure the confidence gained by testing?
Coverage measurement
The most concrete measure of a testing effort is a coverage measurement. For example, you might apply a path-coverage measurement restricted to the paths that use the java.io.File class, to check that your runnability testing has covered all uses of the File class. Note that if you have high confidence in the repeatability of your tests, you can gather coverage measurements on your test development platform alone. If your tests are robust, the coverage is likely to be the same on all platforms. This is a judgement call, because until you have the coverage information from different platforms you don't have a measure of the repeatability of your tests. You have to balance the cost of the coverage measurement against the benefit of the increased confidence in the test.
Feature analysis
Another measure of the adequacy of your tests is a feature analysis or a scenario test. This is more closely tied to the program specification and the users' requirements than to the program code. The measure of test adequacy here is: How well does this test mimic the user's experience? Are the program's features exercised? Does the test include typical usage scenarios? This can be achieved by establishing a functionality traceability matrix that lists all the relevant user requirements and records, for each requirement, whether a test exists and whether it runs successfully on the various platforms.
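For illustration only, a functionality traceability matrix for a small reporting program might look something like this (the requirements, test names, platforms, and results are invented):

Requirement                      Test case         Solaris   Windows NT   MacOS
Save report to a local file      SaveReportTest    pass      pass         fail (bug 102)
Print the report                 PrintReportTest   pass      pass         not yet run
Sort the report by date          SortByDateTest    pass      pass         pass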
These two criteria for test adequacy are not exclusive. They can, and should, be used to judge the same test suite. A test suite has probably not made it to the sweet spot in the curve shown in the figure above if it only achieves 10 percent code coverage; a test suite is probably not adequate for commercial release if it does not reflect the major features of a product. As products mature, the goal for the code coverage metric generally approaches 80 percent, and the traceability matrix goal approaches 100 percent.
It is possible to get much deeper into the study of test adequacy than can be covered in this article. For example, the number of bugs found per unit testing time can be plotted and fitted to a curve in an attempt to predict the benefit of future test efforts. Another technique from the software engineering literature is bug seeding. This approach introduces known bugs into a program, without informing the test team, then tracks the proportion of those bugs that have been found in testing. We doubt that this latter technique will be useful for runnability bugs, as runnability bugs are so highly correlated that the underlying statistical assumption of the independence of each bug doesn't hold. Even though a full-scale defect analysis may not be feasible in your testing effort, it's still quite useful to know how you'll know you're done before you start. At least set a simple coverage goal, so that you can track and manage the effectiveness of your testing effort.
It would be nice if we could test on every platform that our customers are going to use. However, unless we're operating in an extremely constrained environment, this isn't possible. This means that our customers will use our Java program on platforms (or at least on platform configurations) that we haven't been able to test.
How can we limit the consequent risk? By choosing a representative set of platforms for our runnability testing. This is similar to the problem of selecting test data. A good strategy is to use a mixture of the most common and the most disparate platforms: the most common platforms cover the bulk of your users, while the most disparate ones give the strongest evidence of runnability. By ensuring that the testing platforms span the range of implementation choices in the parts of the JRE that your program uses, you gain confidence that your program will operate correctly on platforms between the extremes of implementation choices. For example, if your program uses file names, it's worthwhile to test it on both Macintosh and Windows machines, which have very different file-name syntaxes, to ensure that the program has used the File class in a runnable way.
Testing cannot eliminate all risk. Runnability testing cannot ensure that your Java program will run in all Java environments: on the one hand, testing cannot catch all bugs even in a single environment; on the other, it is not feasible to test on all Java environments. The benefit of testing is to reduce the risk. Some areas of runnability are especially hard to test. Testing a program for correct use of the multithreading capabilities of the Java programming language, for example, is all but impossible, because multithreading bugs are typically exposed only by a particular order of execution of the threads involved -- and that order depends on the thread scheduler, the load on the machine, the time taken by various I/O operations, and other factors that are neither measurable nor controllable in a test. A multithreading bug is therefore very difficult to trigger, and even harder to reproduce. In this area especially, tools that can analyze and diagnose program behavior can significantly improve your testing productivity. Finally, late testing is the third most expensive way to catch a bug (the most expensive is to lose a sale; the second most expensive is to get a support call). The earlier you catch a bug, the cheaper it is to fix, and the less widespread it is likely to be.
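To see why such bugs are so elusive, consider the minimal (and deliberately broken) sketch below: two threads increment a shared counter without synchronization. The class is contrived for illustration, and whether any particular run loses an update depends entirely on thread scheduling -- a test may pass many times before it ever fails.

// A classic multithreading bug: two threads increment a shared counter
// without synchronization. Whether the lost update shows up in a given run
// depends on the scheduler and the machine load.
public class LostUpdateDemo {
    static int counter = 0;   // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    counter++;   // read-modify-write: not atomic
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000; an unlucky interleaving produces less.
        System.out.println("counter = " + counter + " (expected 200000)");
    }
}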
Testing is only part of a complete quality assurance strategy. The other, cheaper opportunities to catch bugs occur earlier in the software development process.
The best way to catch bugs is to prevent them. The Java programming language encourages good interface and class documentation, and good encapsulation. Why not use these facilities to aid in design and code reviews, to prevent bugs before they get coded? Especially on the topic of runnability, you may find the 100% Pure Java Cookbook (see Resources), which is a compendium of runnability hints and tips, useful background reading for a runnability review. After you have a program written, it can be statically checked. The Java compiler, with its strict type checking, does a great deal of this checking. The class verifier does more. In the area of runnability, the SunTest group at Sun builds and distributes a program that reads Java class files and informs the user of potential runnability problems -- see the "Java Testing Tools from Sun" section in Resources.
Java technology brings the dream of ultimate portability, which we have dubbed runnability, within our grasp. In order to keep that goal from slipping through our fingers, we need to do some testing specifically for runnability. This article has provided some recommendations and some ideas for how to perform runnability testing, so that you may confidently deliver your Java programs with the claim that they will run anywhere.