Abstract
Predicting student performance is a critical task in educational systems. Although forecasting a student’s future performance is essential in many applications, it is challenging because many factors are involved. Previous research in this area has mainly compared machine learning methods for automating student evaluation and predicting final performance. However, few studies have thoroughly explored the issue of class imbalance using a deep learning approach. Moreover, the large dataset of university students used here is well suited to in-depth analysis and increases the likelihood of obtaining accurate results. This study presents a convolution-based deep learning model and a comprehensive exploration of oversampling and undersampling methods for addressing imbalanced classes. The paper investigates a range of features of undergraduate students at the University of Jordan, using a large dataset collected from the university’s registration unit. These features include demographic information; attributes related to students’ majors, faculties, and registrations; course records (such as courses passed, repeated, and completed); high school averages; and performance in the first four semesters. The results show that the model performs very well in terms of geometric mean (G-mean) when predicting student excellence. These findings offer insights to the research community and to higher education managers, and can support the development of strategies to improve educational performance. Future researchers can reuse the data preprocessing methods employed in this paper and apply the demonstrated balancing strategies in further work in this field.
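The abstract names the evaluation metric and the balancing idea without defining them, so a small illustration may help. The sketch below is not the authors' implementation: it assumes a binary "excellent" versus "not excellent" labeling, uses the common binary definition of G-mean (the square root of sensitivity times specificity), and uses plain random duplication and random removal as the simplest instances of oversampling and undersampling. The function names g_mean, random_oversample, and random_undersample are hypothetical.

```python
import math
import random
from collections import Counter


def g_mean(y_true, y_pred, positive=1):
    """Binary G-mean: sqrt(sensitivity * specificity). Assumed definition."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Toy data (made up for illustration): 2 "excellent" students out of 10.
X = [[i] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A model that always predicts the majority class is 80% accurate here,
# yet its G-mean is 0 because it never finds the minority class.
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
print(accuracy, g_mean(y, always_majority))

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

The toy example shows why G-mean is a natural choice under class imbalance: accuracy rewards ignoring the minority class, while G-mean collapses to zero when either class is never predicted correctly. In practice, resampling would be applied to the training split only, so that the evaluation data keeps its original class distribution.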
Data Availability
The datasets generated or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was funded by The University of Jordan (Deanship of Scientific Research).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Alshamaila, Y., Alsawalqah, H., Aljarah, I. et al. An automatic prediction of students’ performance to support the university education system: a deep learning approach. Multimed Tools Appl 83, 46369–46396 (2024). https://doi.org/10.1007/s11042-024-18262-4
DOI: https://doi.org/10.1007/s11042-024-18262-4