Can defect prediction enhance test suite prioritization techniques?
Introduction
I’ve written about test suite prioritization in two previous posts, “Regression testing of software is costly — but you can do something about it!” and “Using real faults to evaluate test suite prioritization techniques”. In those posts I pointed out that, to confirm the correctness of an evolving system, software engineers often write test suites that they re-run as they modify a program. I also explained that this valuable (and expensive!) process, called regression testing, helps developers ensure that they have not introduced new defects as they add new features or bug fixes. One way to make regression testing more efficient is to prioritize the test suite so that test execution first runs those tests that are most likely to find defects.
Prediction
In one of my recent research papers, (Paterson et al. 2019)
Results
Since the goal of our paper was not to implement a new defect prediction technique, we investigated how to configure an existing tool, called Schwa, to maximize the likelihood of an accurate prediction. This investigation surfaced the link between perfect defect prediction and test case prioritization effectiveness. Our paper’s experiments used six real-world Java programs containing 395 real faults to compare the presented strategy, called G-clef, against eight existing test case prioritization strategies. The experiments reveal that using defect prediction to prioritize test cases reduces the number of test cases required to find a fault by 9.48% on average when compared with existing coverage-based strategies, and by 10.5% when compared with existing history-based strategies.
You may be asking yourself whether decreasing the number of test cases needed to find a fault by about 10% is a noticeable improvement for the regression testing of a software application. It is a good question! If you are testing a small application with only a few tests, then the benefits of the presented approach may be outweighed by the costs of running G-clef. With that said, if you are a software engineer repeatedly running a comprehensive test suite in a continuous integration environment, then running 10% fewer tests before you find the first defect could be beneficial. In my own experience developing software with thousands of tests that run on a cloud-based CI server across multiple operating systems, runtime environments, and versions of a programming language, even a small decrease in the number of needed tests would nicely streamline my workflow!
Conclusion
It is worth pointing out that our paper’s results hold for large Java programs, and thus there is a need to replicate our experiments with programs implemented in different languages. So, if you want to replicate the experiments and first need to learn more, then please read my survey paper that overviews the regression testing field (Kapfhammer 2010).