Does parameter tuning improve search-based test data generation?
Introduction
Ever wondered whether tuning the parameters of a search-based test data generator actually pays off? In a recent research paper (Kotelyanskii and Kapfhammer 2014), we studied parameter tuning for EvoSuite, a tool that uses a genetic algorithm to generate a JUnit test suite for a Java class. The paper presents an empirical study that supports previous research findings: tuning EvoSuite’s parameters with SPOT, a well-known parameter optimizer, does not yield configurations that are significantly better than the defaults. Keep reading to discover the key findings of this research!
Setup
The experiment used 10 Java projects selected at random from the SF100 repository, containing 475 classes in total. The evaluation metric was inverse branch coverage, for which lower values are better. To collect enough data points for a rigorous statistical analysis, we ran EvoSuite for 100 trials with the default configuration and 100 trials with the configuration that SPOT returned after parameter tuning.
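The metric and the default-versus-tuned comparison can be sketched in a few lines of Python. This is a minimal sketch, not the paper’s analysis code. It assumes that inverse branch coverage is 1 minus the fraction of branches covered. It also uses the Vargha-Delaney A12 effect size as a stand-in comparison, because the statistics the paper reported are not described here. The function names and the branch counts in the usage lines are hypothetical.

```python
def inverse_branch_coverage(covered: int, total: int) -> float:
    """Lower-is-better score: 0.0 means every branch was covered, 1.0 means none.

    Assumption: defined as 1 - covered/total; the paper's exact definition may differ.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of branches")
    if not 0 <= covered <= total:
        raise ValueError("covered must lie between 0 and total")
    return 1.0 - covered / total


def a12_lower_is_better(tuned: list[float], default: list[float]) -> float:
    """Vargha-Delaney A12 adapted to a lower-is-better metric (a stand-in statistic).

    Returns the probability that a randomly chosen tuned trial scores lower
    (better) than a randomly chosen default trial, counting ties as one half.
    0.5 means no difference; values above 0.5 favour the tuned configuration.
    """
    if not tuned or not default:
        raise ValueError("both samples need at least one trial")
    wins = 0.0
    for t in tuned:
        for d in default:
            if t < d:
                wins += 1.0  # tuned trial beat this default trial
            elif t == d:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(tuned) * len(default))


# Hypothetical branch counts for one class with 20 branches (not data from the paper).
default_runs = [inverse_branch_coverage(c, 20) for c in (18, 18, 19, 20)]
tuned_runs = [inverse_branch_coverage(c, 20) for c in (18, 19, 19, 20)]
print(a12_lower_is_better(tuned_runs, default_runs))  # prints 0.59375
```

An A12 close to 0.5 on most classes would be consistent with tuning rarely helping. Judging statistical significance would additionally need a test such as the Mann-Whitney U test, which this sketch leaves out.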
Findings
The paper presents several key findings:
Improvements: The configurations returned by the parameter tuning algorithm performed better on only eleven of the 475 classes.
Disparities: Many Java classes in the randomly chosen subset were either “easy” (i.e., all configurations always achieved perfect coverage) or “hard” (i.e., all configurations always achieved no coverage, in some cases because EvoSuite could not generate any data).
Limitations: On the other classes, the SPOT-derived configuration either performed worse than the defaults or had no statistically significant impact, which points to the limits of parameter tuning.
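The “easy” and “hard” classes matter because, by construction, a class on which every trial of every configuration earns the same score cannot reveal any difference between configurations. The following minimal sketch labels a class from its per-trial inverse branch coverage scores. It assumes that 0.0 means full coverage and 1.0 means no coverage, and that a trial in which EvoSuite generated no data is recorded as 1.0. The function name `classify_class`, the label "other", and the class names in the usage lines are hypothetical.

```python
from collections.abc import Mapping, Sequence


def classify_class(trials_by_config: Mapping[str, Sequence[float]]) -> str:
    """Label a class "easy", "hard", or "other" from its trial scores.

    trials_by_config maps a configuration name (e.g. "default", "tuned") to
    the inverse branch coverage of each trial run with that configuration.
    """
    scores = [s for runs in trials_by_config.values() for s in runs]
    if not scores:
        raise ValueError("no trials recorded for this class")
    if all(s == 0.0 for s in scores):
        return "easy"  # every configuration always reached full coverage
    if all(s == 1.0 for s in scores):
        return "hard"  # every configuration always reached no coverage
    return "other"  # neither extreme; configurations may or may not differ here


# Hypothetical per-class results (not data from the paper).
results = {
    "ClassA": {"default": [0.0, 0.0], "tuned": [0.0, 0.0]},
    "ClassB": {"default": [1.0, 1.0], "tuned": [1.0, 1.0]},
    "ClassC": {"default": [0.25, 0.1], "tuned": [0.1, 0.1]},
}
for name, trials in results.items():
    print(name, classify_class(trials))
```

Filtering out the “easy” and “hard” labels before a statistical comparison leaves only the classes on which a tuned configuration could, in principle, differ from the defaults.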
Conclusion
The research suggests that EvoSuite’s default parameters, which were chosen by experts, are suitable for use in future experimental studies and industrial testing efforts. This negative result highlights the challenges of parameter tuning in search-based test data generation.
If you’re interested in diving deeper into this research, I encourage you to read the full paper (Kotelyanskii and Kapfhammer 2014).