10th International Congress on Information and Communication Technology in concurrent with ICT Excellence Awards (ICICT 2025) will be held at London, United Kingdom | February 18 - 21 2025.
Authors - Shima Pilehvari, Wei Peng, Yasser Morgan, Mohammad Ali Sahraian, Sharareh Eskandarieh Abstract - Overfitting is a common problem during model training, particularly for binary medical datasets with class imbalance. This research specifically addresses this issue in predicting Multiple Sclerosis (MS) progression, with the primary goal of improving model accuracy and reliability. By investigating various data resampling techniques, ensemble methods, feature extraction, and model regularization, the study thoroughly evaluates the effectiveness of these strategies in enhancing stability and performance for highly imbalanced datasets. Compared to prior studies, this research advances existing approaches by integrating Kernel Principal Component Analysis (KPCA), moderate under-sampling, Synthetic Minority Oversampling Technique (SMOTE), and post-processing techniques, including Youden’s J Statistic and manual threshold adjustments. This comprehensive strategy significantly reduced overfitting while improving the generalization of models, particularly the Multilayer Perceptron (MLP), which achieved an Area Under the Curve (AUC) of 0.98—outperforming previous models in similar applications. These findings establish important best practices for developing robust prognostic models for MS progression and underscore the importance of tailored solutions in complex medical prediction tasks.