The 10th International Congress on Information and Communication Technology, held concurrently with the ICT Excellence Awards (ICICT 2025), will take place in London, United Kingdom | February 18 - 21, 2025.
Authors - Stephen Opoku Oppong, Benjamin Ghansah, Christopher Yarkwah, Einstein Kow Essibu, Winston Kwamina Essibu
Abstract - A problem in Educational Data Mining (EDM) that is receiving increasing attention is the prediction of learners' academic performance. The ability to do this depends largely on the availability of datasets to train models. Conventionally, learners who are likely to fail a particular subject are identified through formative assessments, and facilitators assist them through guidance sessions and interventions that help them optimise their learning paths. Teachers also recommend personalised learning resources intended to build learners' capacity and understanding of the subject, which likewise depends on the availability of data. In this study, Generative Adversarial Networks (GAN) and the Synthetic Minority Over-Sampling Technique (SMOTE) are used to generate synthetic data that augments the existing data for predicting student performance from self-cognitive variables. The purpose was to balance the original dataset and so avoid bias toward the majority class. Six machine learning classifiers (Naïve Bayes, Decision Tree, Extra Trees, K-Nearest Neighbor, Logistic Regression and Random Forest) were used to test which preprocessing method gives the best performance. Experimental results show that the GAN approach achieved an accuracy of 99.98%, outperforming the SMOTE method, which achieved 97.36%.
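To illustrate the SMOTE side of the comparison, the following is a minimal sketch of the interpolation idea behind the technique, not the authors' implementation. The function name `smote`, its parameters, and the toy feature values are all hypothetical, chosen only for illustration; the study's actual dataset, features and settings are not given in the abstract.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority-class neighbours (SMOTE idea)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data (assumption): two features per learner,
# 8 rows in the majority class and 3 rows in the minority class.
majority = [[0.8, 0.9], [0.7, 0.8], [0.9, 0.7], [0.85, 0.95],
            [0.75, 0.85], [0.8, 0.7], [0.9, 0.9], [0.7, 0.9]]
minority = [[0.2, 0.3], [0.3, 0.2], [0.25, 0.35]]

# Oversample the minority class until both classes have the same size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Each synthetic row lies on a line segment between two real minority rows, so the balanced set stays within the region the minority class already occupies. A GAN-based generator, by contrast, learns a model of the data distribution and samples from it; that approach is not sketched here.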