10th International Congress on Information and Communication Technology in concurrent with ICT Excellence Awards (ICICT 2025) will be held in London, United Kingdom, February 18-21, 2025.
Authors - Erick Verdugo, Andy Abad, Remigio Hurtado

Abstract - Software defect prediction is crucial for reducing costs and improving software quality. According to a Cutter Consortium report, software defects cause an estimated annual loss of $1.56 trillion in global productivity, and Tricentis reported that over 30% of software development projects failed because of undetected defects. Undetected defects can raise maintenance costs, delay deliveries, and compromise security, particularly in critical applications such as financial or medical systems. A major challenge is class imbalance: defect-free modules far outnumber defective ones, which makes defects harder to detect. This study proposes a four-phase approach: (1) loading and transforming the data, (2) balancing the classes, (3) training machine learning models, and (4) explaining their predictions. The balancing techniques SMOTE, ADASYN, and random undersampling (RandomUnderSampling) were combined with Random Forest, Gradient Boosting, and SVM models. The analysis used the JM1 dataset, which contains software quality metrics and in which 80% of the modules are defect-free. Data preprocessing involved imputation, encoding, and normalization. The results show that Random Forest and Gradient Boosting, combined with balancing techniques, achieved the best performance in identifying defects. Future work will explore advanced algorithms such as XGBoost and LightGBM and apply parameter optimization to further improve the results. The approach aims to improve software defect detection and could also be applied in other fields.
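The balancing phase can be illustrated with a minimal, dependency-free sketch. It shows the core idea behind SMOTE (each synthetic minority sample is interpolated between a minority sample and one of its k nearest minority neighbors) and random undersampling (majority samples are discarded at random). The function names `smote`, `random_undersample`, and `balance`, the label encoding (1 = defective, 0 = defect-free), and the assumption that defective modules are the minority class are hypothetical illustrations, not the authors' code. In practice, libraries such as imbalanced-learn provide `SMOTE`, `ADASYN`, and `RandomUnderSampler`; ADASYN differs from SMOTE mainly in generating more synthetic points around minority samples that are harder to learn.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of `base`, excluding itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def random_undersample(majority: List[Vector], n_keep: int,
                       seed: int = 0) -> List[Vector]:
    """Keep a random subset of n_keep majority samples (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


def balance(X: List[Vector], y: List[int], method: str = "smote",
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Return a 50/50 version of binary data.

    Hypothetical convention: label 1 = defective (assumed minority),
    label 0 = defect-free (assumed majority).
    """
    pos = [x for x, label in zip(X, y) if label == 1]  # defective modules
    neg = [x for x, label in zip(X, y) if label == 0]  # defect-free modules
    if method == "smote":
        # Oversample the minority class up to the majority count.
        pos = pos + smote(pos, len(neg) - len(pos), seed=seed)
    elif method == "undersample":
        # Shrink the majority class down to the minority count.
        neg = random_undersample(neg, len(pos), seed=seed)
    else:
        raise ValueError(f"unknown method: {method}")
    return pos + neg, [1] * len(pos) + [0] * len(neg)


if __name__ == "__main__":
    # Toy data with an 80/20 split, mimicking JM1's imbalance (not real JM1 data).
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
    y = [0] * 80 + [1] * 20
    _, y_smote = balance(X, y, "smote")
    _, y_under = balance(X, y, "undersample")
    print(y_smote.count(1), y_smote.count(0))  # 80 80
    print(y_under.count(1), y_under.count(0))  # 20 20
```

As a general practice (not necessarily the authors' protocol), resampling is applied only to the training split, so that the test set keeps the original class ratio (about 80% defect-free in JM1) and the reported metrics reflect the real-world imbalance.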