
Contents

  • Overview
  • Details
    • Flaky Tests
    • Database Testing
    • Web Testing
    • Mutation Testing
    • Regression Testing
    • Search-Based Testing
    • Performance Evaluation
    • Research Methods
  • Establish Connections
  • References

Research

Code
# Use the rich library to print a greeting that renders the emoji markup
from rich.console import Console

console = Console()
console.print(
    ":rocket: Have fun exploring my research in fields like testing and debugging!"
)
🚀 Have fun exploring my research in fields like testing and debugging!

Overview

As an experimental computer scientist, I design, implement, and evaluate techniques and tools that support the creation of useful, dependable, and efficient software. My research addresses challenges in fields such as software engineering, software testing, and computer systems. I conduct this research with undergraduates at Allegheny College, graduate students and professors at several universities (e.g., the University of Sheffield and the University of Passau), and colleagues in industry. Keep reading for more details!

Details

My research is empirical because it applies the scientific method: I formulate problem statements, carry out experiments, make observations, statistically analyze and visualize the resulting data sets, and draw conclusions based on the results. Each research project focuses on the development, evaluation, and maintenance of software artifacts that can be used in both empirical studies and real-world practice by undergraduate and graduate students, software engineers, and computer scientists in academia and industry.

In addition to supporting my professional service, my research leads to award-winning and frequently cited papers, oft-complimented presentations, and useful free and open-source software. I also leverage my research expertise as a co-host of Software Engineering Radio, the podcast for which I interview the world’s leading experts on topics like automated refactoring and property-based testing. My research focuses on software engineering and software testing, examples of which are described in these summaries that reference key papers. Please click the icon next to a reference to access the full paper!

Flaky Tests

Since flaky tests can pass or fail without any changes to the test code or the code under test, they are an unreliable indicator of software correctness and a major problem for many software developers. To address this challenge, my collaborators and I have developed an automated method that uses machine learning to predict which tests are most likely to be flaky (Parry et al. 2022b). Along with proposing a methodology for automatically determining which tests have latent flakiness (Parry et al. 2020), my colleagues and I also published a survey of the literature relevant to flaky test research (Parry et al. 2022a) and a multi-source investigation into how developers experience the impacts and causes of flaky tests (Parry et al. 2022c). My most recent work in this area suggests that researchers and developers should go beyond any individual metric of “good” or “bad” tests, like test flakiness, and instead take a more holistic view of a test suite’s health (McMinn, Roslan, and Kapfhammer 2025).
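
To make the prediction idea concrete, here is a minimal sketch of machine-learning-based flaky test detection, assuming the scikit-learn library is available; the features, training data, and model choice are hypothetical illustrations rather than the pipeline evaluated in (Parry et al. 2022b).

Code
from sklearn.ensemble import RandomForestClassifier

# Each row encodes static metrics for one test case (hypothetical):
# [lines of code, number of assertions, uses network?, uses sleep?]
features = [
    [12, 1, 0, 0],
    [85, 7, 1, 1],
    [40, 3, 0, 1],
    [10, 2, 0, 0],
]
labels = [0, 1, 1, 0]  # 1 means the test was observed to be flaky

# Train a classifier and predict whether an unseen test is likely flaky
model = RandomForestClassifier(random_state=0).fit(features, labels)
print(model.predict([[50, 4, 1, 0]]))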

Database Testing

Given the importance of verifying that a database application operates correctly, my PhD dissertation (Kapfhammer 2007) presented and evaluated an approach for testing this type of software; a notable paper derived from my dissertation is (Kapfhammer and Soffa 2003). Since the relational schema protects the integrity of the database’s state, my research has also focused on testing this complex artifact, with (McMinn, Wright, and Kapfhammer 2015) and (McMinn et al. 2019) being two examples of papers. Recent papers such as (Alsharif, Kapfhammer, and McMinn 2020b) and (Alsharif, Kapfhammer, and McMinn 2020a) present and study techniques for improving the efficiency and effectiveness of regression testing for relational database schemas by reordering or reducing the schema’s test suites.
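
As a small illustration of testing an integrity constraint, the following sketch uses Python’s standard sqlite3 module and a hypothetical schema, not one drawn from the cited studies; a schema test should exercise the constraint with data that the database must accept and data that it must reject.

Code
import sqlite3

# Create a hypothetical schema with NOT NULL and UNIQUE constraints
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)

# Accepted: this row satisfies every integrity constraint
connection.execute("INSERT INTO person VALUES (1, 'a@example.com')")

# Rejected: this row violates the UNIQUE constraint on email
try:
    connection.execute("INSERT INTO person VALUES (2, 'a@example.com')")
except sqlite3.IntegrityError:
    print("UNIQUE constraint correctly rejected the duplicate email")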

Web Testing

In response to the prevalence and complexity of mobile-ready web sites, my research has developed automated tools for checking the pages in these sites. Some examples of papers describing methods for automatically detecting defects in web pages include (Walsh, Kapfhammer, and McMinn 2017a) and (Walsh, Kapfhammer, and McMinn 2017b). The automated approach introduced in (Althomali, Kapfhammer, and McMinn 2019) can visually confirm and classify the reported responsive layout failures in a web page. The method described in (Walsh, Kapfhammer, and McMinn 2020) can automatically identify potential regressions from the correct responsive layout of a web page. Finally, recent work introduced a method that automatically repairs a layout failure in a responsive web page by effectively “hiding” the defect from perception (Althomali, Kapfhammer, and McMinn 2022).
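
The following sketch shows one building block behind these tools: a check for overlapping element bounding boxes at a given viewport width. The rectangles here are hypothetical values; a tool like ReDeCheck extracts real ones from a browser across many widths.

Code
def overlaps(a, b):
    """Report whether rectangles (left, top, right, bottom) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Hypothetical bounding boxes, in pixels, at one viewport width
navigation = (0, 0, 320, 50)
heading = (300, 40, 480, 90)  # intrudes into the navigation bar

if overlaps(navigation, heading):
    print("Potential layout failure: elements overlap at this width")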

Mutation Testing

Given the challenges associated with accurately judging the quality of a test suite, my research has designed efficient and useful ways to perform test assessment through mutation analysis. Papers such as (Just, Kapfhammer, and Schweiggert 2012), (Wright, Kapfhammer, and McMinn 2014), and (McMinn et al. 2019) present and evaluate automated techniques that efficiently insert synthetic faults into both real-world Java programs and relational database schemas. Finally, recent work explores how the eXtreme mutation testing (XMT) and statement deletion (SDL) mutation operators can detect pseudo-tested methods and statements in a Java program (Maton, Kapfhammer, and McMinn 2024).
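
As a simplified illustration of the SDL operator, this sketch uses Python’s standard ast module on a hypothetical function: it replaces one statement with pass and then runs a weak test that still passes, which suggests that the deleted statement may be pseudo-tested.

Code
import ast

source = """
def clamp(value, limit):
    if value > limit:
        value = limit
    return value
"""

# Apply the SDL operator: replace the assignment inside the if with pass
tree = ast.parse(source)
tree.body[0].body[0].body[0] = ast.Pass()
ast.fix_missing_locations(tree)

# Compile and load the mutated version of the function
namespace = {}
exec(compile(tree, "<mutant>", "exec"), namespace)

# This weak test never exercises the deleted statement, so the mutant
# survives and the statement is flagged as potentially pseudo-tested
assert namespace["clamp"](3, 10) == 3
print("Mutant survived: the deleted statement may be pseudo-tested")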

Regression Testing

Since software is often incrementally developed, my research on regression testing has created automated ways to efficiently and effectively run test suites for complex programs. Some examples of papers that present and evaluate regression testing techniques for reordering and reducing test suites include (Walcott et al. 2006), (Lin, Tang, and Kapfhammer 2014), and (Lin et al. 2017). Leading the way in the realistic assessment of regression testing methods, papers such as (Paterson et al. 2018) and (Paterson et al. 2019) show how to conduct rigorous regression testing experiments with real program faults.
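
To illustrate one classic reordering strategy, here is a minimal sketch of greedy test suite prioritization by additional statement coverage; the coverage data is hypothetical and the algorithm is a well-known baseline rather than the specific techniques in the cited papers.

Code
# Hypothetical mapping from each test to the statements that it covers
coverage = {
    "test_login": {1, 2, 3, 4},
    "test_logout": {3, 4},
    "test_report": {5, 6, 7},
    "test_export": {6, 7, 8, 9},
}

ordered, covered = [], set()
remaining = dict(coverage)
while remaining:
    # Greedily pick the test that covers the most uncovered statements
    best = max(remaining, key=lambda test: len(remaining[test] - covered))
    ordered.append(best)
    covered |= remaining.pop(best)

print(ordered)  # ['test_login', 'test_export', 'test_report', 'test_logout']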

Search-Based Testing

Using a fitness function as a guide to a solution, search-based methods have shown promise in supporting many software engineering tasks. My work has focused on creating frameworks that support the development of search-based tools, with (McMinn and Kapfhammer 2016) being an example. Other papers like (Conrad, Roos, and Kapfhammer 2010) and (Kukunas, Cupper, and Kapfhammer 2010) describe search-based solutions to software engineering tasks like regression testing or performance optimization. Recent papers like (Alsharif, Kapfhammer, and McMinn 2019) present experimental methodologies that effectively involve humans when studying the usefulness of tests that were generated by search-based techniques.
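
The following sketch conveys the intuition behind the alternating variable method that a framework like AVMf implements, using a hypothetical fitness function that reaches zero when the input satisfies the branch condition a == 10 and b == 20; it is an illustration of the idea, not the AVMf implementation itself.

Code
def fitness(values):
    # Distance from satisfying the hypothetical branch condition
    a, b = values
    return abs(a - 10) + abs(b - 20)

def avm(values):
    improved = True
    while improved:
        improved = False
        for i in range(len(values)):  # alternate over the variables
            for direction in (-1, 1):  # exploratory moves
                step = direction
                while True:
                    candidate = list(values)
                    candidate[i] += step
                    if fitness(candidate) < fitness(values):
                        values = candidate
                        improved = True
                        step *= 2  # pattern move: accelerate the search
                    else:
                        break
    return values

print(avm([0, 0]))  # converges to [10, 20]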

Performance Evaluation

Given the importance of equipping software engineers with the insights and tools they need to create efficient software, my work has developed tools that automatically assess program performance. Focusing on the empirical evaluation of real-world software components like databases, relevant papers include (Jones and Kapfhammer 2011), (Burdette et al. 2012), and (Kinneer et al. 2015). Papers such as (Kotelyanskii and Kapfhammer 2014) highlight how my research has investigated the influence that the parameters of a search-based algorithm have on the efficiency and effectiveness of methods for automated test data generation.
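
As a small example of this kind of empirical measurement, the following sketch uses Python’s standard timeit module to compare membership queries on two data structures; the workload is hypothetical, yet the practice of repeating timed runs and summarizing them mirrors what rigorous performance studies require.

Code
import timeit

# Time repeated membership queries against a list and a set
list_time = min(timeit.repeat(
    "500000 in items", setup="items = list(range(1000000))", number=20, repeat=3
))
set_time = min(timeit.repeat(
    "500000 in items", setup="items = set(range(1000000))", number=20, repeat=3
))

print(f"list membership: {list_time:.5f}s, set membership: {set_time:.7f}s")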

Research Methods

My surveys of software testing techniques provide a starting point for people exploring this field, with (Kapfhammer 2004) and (Kapfhammer 2010) being examples of such articles. Papers like (Kapfhammer, McMinn, and Wright 2016) and (McMinn et al. 2016) show how I have articulated a research agenda for the field of software engineering that stresses, for instance, the need for well-tested statistical methods. I have also written papers, like (Alsharif, Kapfhammer, and McMinn 2018a) and (Alsharif, Kapfhammer, and McMinn 2018b), that explain how to replicate my empirical studies of search-based testing techniques.
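
In the spirit of well-tested statistical methods, here is a minimal sketch of the Vargha-Delaney A12 effect size, a statistic often recommended for comparing randomized testing techniques; the implementation and the coverage scores are illustrative examples.

Code
def a12(x, y):
    # Estimate the probability that a value from x exceeds one from y
    greater = sum(1 for a in x for b in y if a > b)
    equal = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * equal) / (len(x) * len(y))

# Hypothetical branch coverage scores from repeated runs of two tools
coverage_x = [0.81, 0.84, 0.79, 0.88, 0.83]
coverage_y = [0.74, 0.77, 0.80, 0.72, 0.75]

print(f"A12 = {a12(coverage_x, coverage_y):.2f}")  # above 0.5 favors tool X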

Establish Connections

Do you work on these topics and are you interested in collaborating with me on a project in the fields of software engineering and software testing? If so, then please contact me soon!

References

Alsharif, Abdullah, Gregory M. Kapfhammer, and Phil McMinn. 2018a. “Generating Test Suites with DOMINO.” In Proceedings of the 11th International Conference on Software Testing, Verification and Validation – Demonstrations Track.
———. 2018b. “Running Experiments and Performing Data Analysis Using SchemaAnalyst and DOMINO.” In Proceedings of the 11th International Conference on Software Testing, Verification and Validation – Artefacts Track.
———. 2019. “What Factors Make SQL Test Cases Understandable for Testers? A Human Study of Automated Test Data Generation Techniques.” In Proceedings of the 35th International Conference on Software Maintenance and Evolution.
———. 2020a. “Hybrid Methods for Reducing Database Schema Test Suites: Experimental Insights from Computational and Human Studies.” In Proceedings of the 1st International Conference on Automation of Software Test.
———. 2020b. “STICCER: Fast and Effective Database Test Suite Reduction Through Merging of Similar Test Cases.” In Proceedings of the 13th International Conference on Software Testing, Verification and Validation.
Althomali, Ibrahim, Gregory M. Kapfhammer, and Phil McMinn. 2019. “Automatic Visual Verification of Layout Failures in Responsively Designed Web Pages.” In Proceedings of the 12th International Conference on Software Testing, Verification and Validation.
———. 2022. “Automated Repair of Responsive Web Page Layouts.” In Proceedings of the 15th International Conference on Software Testing, Verification and Validation.
Burdette, Philip F., William F. Jones, Brian C. Blose, and Gregory M. Kapfhammer. 2012. “An Empirical Comparison of Java Remote Communication Primitives for Intra-Node Data Transmission.” Performance Evaluation Review 39 (4).
Conrad, Alexander P., Robert S. Roos, and Gregory M. Kapfhammer. 2010. “Empirically Studying the Role of Selection Operators During Search-Based Test Suite Prioritization.” In Proceedings of the 12th International Conference on Genetic and Evolutionary Computation.
Jones, William F., and Gregory M. Kapfhammer. 2011. “Ask and You Shall Receive: Empirically Evaluating Declarative Approaches to Finding Data in Unstructured Heaps.” In Proceedings of the 20th International Conference on Software Engineering and Data Engineering.
Just, René, Gregory M. Kapfhammer, and Franz Schweiggert. 2012. “Using Non-Redundant Mutation Operators and Test Suite Prioritization to Achieve Efficient and Scalable Mutation Analysis.” In Proceedings of the 23rd International Symposium on Software Reliability Engineering.
Kapfhammer, Gregory M. 2004. “Software Testing.” In The Computer Science Handbook. CRC Press.
———. 2007. “A Comprehensive Framework for Testing Database-Centric Applications.” PhD Dissertation, Department of Computer Science, University of Pittsburgh.
———. 2010. “Regression Testing.” In The Encyclopedia of Software Engineering. Taylor & Francis – Auerbach Publications.
Kapfhammer, Gregory M., Phil McMinn, and Chris J. Wright. 2016. “Hitchhikers Need Free Vehicles! Shared Repositories for Statistical Analysis in SBST.” In Proceedings of the 9th International Workshop on Search-Based Software Testing.
Kapfhammer, Gregory M., and Mary Lou Soffa. 2003. “A Family of Test Adequacy Criteria for Database-Driven Applications.” In Proceedings of the 9th European Software Engineering Conference and the 11th Symposium on the Foundations of Software Engineering.
Kinneer, Cody, Gregory M. Kapfhammer, Chris J. Wright, and Phil McMinn. 2015. “Automatically Evaluating the Efficiency of Search-Based Test Data Generation for Relational Database Schemas.” In Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering.
Kotelyanskii, Anton, and Gregory M. Kapfhammer. 2014. “Parameter Tuning for Search-Based Test-Data Generation Revisited: Support for Previous Results.” In Proceedings of the 14th International Conference on Quality Software.
Kukunas, James, Robert D. Cupper, and Gregory M. Kapfhammer. 2010. “A Genetic Algorithm to Improve Linux Kernel Performance on Resource-Constrained Devices.” In Proceedings of the 12th International Conference Companion on Genetic and Evolutionary Computation.
Lin, Chu-Ti, Kai-Wei Tang, and Gregory M. Kapfhammer. 2014. “Test Suite Reduction Methods That Decrease Regression Testing Costs by Identifying Irreplaceable Tests.” Information and Software Technology 56 (10).
Lin, Chu-Ti, Kai-Wei Tang, Jiun-Shiang Wang, and Gregory M. Kapfhammer. 2017. “Empirically Evaluating Greedy-Based Test Suite Reduction Methods at Different Levels of Test Suite Complexity.” Science of Computer Programming.
Maton, Megan, Gregory M. Kapfhammer, and Phil McMinn. 2024. “Exploring Pseudo-Testedness: Empirically Evaluating Extreme Mutation Testing at the Statement Level.” In Proceedings of the 35th International Conference on Software Maintenance and Evolution.
McMinn, Phil, Mark Harman, Gordon Fraser, and Gregory M. Kapfhammer. 2016. “Automated Search for Good Coverage Criteria: Moving from Code Coverage to Fault Coverage Through Search-Based Software Engineering.” In Proceedings of the 9th International Workshop on Search-Based Software Testing.
McMinn, Phil, and Gregory M. Kapfhammer. 2016. “AVMf: An Open-Source Framework and Implementation of the Alternating Variable Method.” In Proceedings of the 8th International Symposium on Search-Based Software Engineering.
McMinn, Phil, Muhammad Firhard Roslan, and Gregory M. Kapfhammer. 2025. “Beyond Test Flakiness: A Manifesto for a Holistic Approach to Test Suite Health.” In Proceedings of the 2nd International Flaky Tests Workshop.
McMinn, Phil, Chris J. Wright, and Gregory M. Kapfhammer. 2015. “The Effectiveness of Test Coverage Criteria for Relational Database Schema Integrity Constraints.” Transactions on Software Engineering and Methodology 25 (1).
McMinn, Phil, Chris J. Wright, Colton McCurdy, and Gregory M. Kapfhammer. 2019. “Automatic Detection and Removal of Ineffective Mutants for the Mutation Analysis of Relational Database Schemas.” Transactions on Software Engineering 45 (5).
Parry, Owain, Gregory M. Kapfhammer, Michael Hilton, and Phil McMinn. 2020. “Flake It till You Make It: Using Automated Repair to Induce and Fix Latent Test Flakiness.” In Proceedings of the 1st International Workshop on Automated Program Repair.
———. 2022a. “A Survey of Flaky Tests.” Transactions on Software Engineering and Methodology 31 (1).
———. 2022b. “Evaluating Features for Machine Learning Detection of Order- and Non-Order-Dependent Flaky Tests.” In Proceedings of the 15th International Conference on Software Testing, Verification and Validation.
———. 2022c. “Surveying the Developer Experience of Flaky Tests.” In Proceedings of the 44th International Conference on Software Engineering – Software Engineering in Practice Track.
Paterson, David, José Campos, Rui Abreu, Gregory M. Kapfhammer, Gordon Fraser, and Phil McMinn. 2019. “An Empirical Study on the Use of Defect Prediction for Test Case Prioritization.” In Proceedings of the 12th International Conference on Software Testing, Verification and Validation.
Paterson, David, Gregory M. Kapfhammer, Gordon Fraser, and Phil McMinn. 2018. “Using Controlled Numbers of Real Faults and Mutants to Empirically Evaluate Coverage-Based Test Case Prioritization.” In Proceedings of the 13th International Workshop on Automation of Software Test.
Walcott, Kristen R., Mary Lou Soffa, Gregory M. Kapfhammer, and Robert S. Roos. 2006. “Time-Aware Test Suite Prioritization.” In Proceedings of the International Symposium on Software Testing and Analysis.
Walsh, Thomas A., Gregory M. Kapfhammer, and Phil McMinn. 2017a. “Automated Layout Failure Detection for Responsive Web Pages Without an Explicit Oracle.” In Proceedings of the International Symposium on Software Testing and Analysis.
———. 2017b. “ReDeCheck: An Automatic Layout Failure Checking Tool for Responsively Designed Web Pages.” In Proceedings of the International Symposium on Software Testing and Analysis.
———. 2020. “Automatically Identifying Potential Regressions in the Layout of Responsive Web Pages.” Software Testing, Verification and Reliability 30 (6).
Wright, Chris J., Gregory M. Kapfhammer, and Phil McMinn. 2014. “The Impact of Equivalent, Redundant, and Quasi Mutants on Database Schema Mutation Analysis.” In Proceedings of the 14th International Conference on Quality Software.
