Promise repository datasets for defect prediction

5/23/2023

Context: Defect prediction research is based on a small number of defect datasets, and most of them are at class level rather than method level. Consequently, our knowledge of defects is limited. Identifying defect datasets for prediction is not easy, and extracting quality data from identified datasets is even more difficult.

Goal: Identify open source Java systems suitable for defect prediction and extract high quality fault data from these systems.

Method: We used Boa to identify candidate open source systems. We reduced 50,000 potential candidates to 23 suitable for defect prediction using selection criteria based on each system's software repository and defect tracking system. We used an enhanced SZZ algorithm to extract fault information and calculated metrics using JHawk.

Result: We have produced 138 fault and metrics datasets for the 23 identified systems. We make these datasets (the ELFF datasets) and our data extraction tools freely available to future researchers.

Conclusions: The data we provide enables future studies to proceed with minimal effort. Our datasets significantly increase the pool of systems currently being used in defect analysis studies.

Fair and balanced?: bias in bug-fix datasets
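
The fault data above comes from an enhanced SZZ algorithm, whose specific enhancements are not described here. As a rough illustration only, the following is a minimal sketch of the basic SZZ idea, not the ELFF extraction tool: (1) link commits to the defect tracker through bug IDs cited in commit messages, then (2) blame each line a fix deleted or modified to find the commit(s) that introduced the bug. The names `Commit`, `BUG_REF`, `is_bug_fix`, `szz` and the `blame` callback are hypothetical, and the commit-message convention ("bug #42", "fixes 42") is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Assumed (hypothetical) convention: fix commits cite tracker IDs like "bug #42" or "fixes 42".
BUG_REF = re.compile(r"\b(?:bug|fix(?:es|ed)?|issue)\s*#?(\d+)", re.IGNORECASE)


@dataclass
class Commit:
    sha: str
    message: str
    # Lines this commit deleted or modified, as (path, line number in the parent revision).
    changed_lines: list[tuple[str, int]] = field(default_factory=list)


# Hypothetical callback: blame(fix_sha, path, line) returns the sha of the commit that last
# touched `line` of `path` in the parent of `fix_sha` (e.g. backed by `git blame <sha>^ -- <path>`).
Blame = Callable[[str, str, int], str]


def is_bug_fix(commit: Commit, tracker_ids: set[str]) -> bool:
    """Step 1: a commit counts as a fix if its message cites an ID known to the defect tracker."""
    return any(ref in tracker_ids for ref in BUG_REF.findall(commit.message))


def szz(commits: list[Commit], tracker_ids: set[str], blame: Blame) -> dict[str, set[str]]:
    """Step 2: map each fix commit to the commits that last touched the lines it changed."""
    introducers: dict[str, set[str]] = {}
    for commit in commits:
        if not is_bug_fix(commit, tracker_ids):
            continue
        introducers[commit.sha] = {
            blame(commit.sha, path, line) for path, line in commit.changed_lines
        }
    return introducers


if __name__ == "__main__":
    # Toy blame table standing in for `git blame` output (illustrative data only).
    table = {("c3", "Parser.java", 10): "b2", ("c3", "Parser.java", 11): "a1"}
    history = [
        Commit("a1", "Initial import"),
        Commit("b2", "Add expression parser"),
        Commit("c3", "Fix bug #42 in parser", [("Parser.java", 10), ("Parser.java", 11)]),
        Commit("d4", "Tidy up, see issue 99", [("Parser.java", 3)]),
    ]
    # c3 is linked to tracker bug 42; d4 cites 99, which the tracker does not know, so it is skipped.
    print(szz(history, {"42"}, lambda sha, path, line: table[(sha, path, line)]))
```

Passing `blame` in as a callback keeps the sketch independent of any particular version control system. Practical SZZ variants add filters this sketch omits, such as ignoring whitespace-only or comment-only changes and discarding candidate commits made after the bug was reported.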
