Cross Classification Matrix to Evaluate the Performance of Machine Learning Algorithms in Predicting Students Performance of Developing Regions

Imam Dad ORCID: orcid.org/0000-0002-7674-647X¹,
Jianfeng He¹,
Waheed Noor²,
Abdul Samad³,
Ihsan Ullah² &
Samina Ara⁴

Abstract

In the rapidly evolving landscape of education, the integration of Big Data and AI presents significant opportunities for improving educational outcomes, especially in Predicting Students Performance (PSP) applications in higher education. Educational Data Mining (EDM) strategies have already been applied to educational challenges in advanced nations. However, the issues confronting developing countries, such as the unavailability of educational datasets and the difficulty of selecting Machine Learning (ML) algorithms that perform well in terms of accuracy, bias, and over-fitting, have not been investigated. To address this gap, this study introduces a novel dataset, UOBEDM, collected from the University of Balochistan (UoB) in the developing region of Balochistan, Pakistan. It comprises 49,835 student records covering various demographic and academic aspects. After data collection, cleaning, and feature selection, the dataset was refined to 23,492 instances. Various ML algorithms were fine-tuned on the UOBEDM dataset; the top five, Trees, K-Nearest Neighbors (KNN), Naive Bayes (NB), Random Forest (RF), and Support Vector Machines (SVM), yielded accuracy scores of 0.95, 0.94, 0.92, 0.96, and 0.50, respectively. A novel approach called the Cross-Classification Matrix (CCM) was introduced to assess algorithm performance and select the best model. Trees emerged as the optimal prediction algorithm, and a graphical tree-based Early Intervention Model (EIM) was developed to simplify decision-making for academics. The dataset's significance extends beyond classification algorithms: it paves the way for further EDM research and for addressing educational inequalities. This study underscores the potential of data-driven approaches to enhance educational outcomes and foster innovation in education. The findings contribute to the understanding of predictive modeling in education and provide valuable insights for educators, policymakers, and researchers.
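
As a small illustration of the reported figures, the sketch below ranks the five algorithms by the accuracy scores given in the abstract. The function name `rank_by_accuracy` is a hypothetical helper introduced here, and the sketch does not attempt to reproduce the CCM, which the abstract does not define. It only shows that ranking by accuracy alone would put RF first, whereas the abstract reports Trees as the selected model.

```python
# Accuracy scores as reported in the abstract (UOBEDM dataset).
reported_accuracy = {
    "Trees": 0.95,
    "KNN": 0.94,
    "NB": 0.92,
    "RF": 0.96,
    "SVM": 0.50,
}


def rank_by_accuracy(scores):
    """Return (algorithm, accuracy) pairs sorted from highest to lowest accuracy.

    Hypothetical helper for illustration; not part of the authors' method.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_accuracy(reported_accuracy)
best_by_accuracy = ranking[0][0]

for name, acc in ranking:
    print(f"{name:<5} {acc:.2f}")

# RF has the highest raw accuracy, while the abstract names Trees as the
# optimal model, so the selection evidently rests on more than accuracy.
print("Highest accuracy:", best_by_accuracy)
```

Because the abstract also lists bias and over-fitting as selection concerns, a single accuracy ranking like this one is best read as a starting point rather than a model-selection rule.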

Data availability

The UOBEDM dataset is available on GitHub: https://github.com/ImamDad/UOBEDM.

Acknowledgements

We thank the Ministry of Science and Technology of the Republic of China, the China Scholarship Council, and Kunming University of Science and Technology for their support.

Funding

This study is supported in part by the Ministry of Science and Technology of the Republic of China under contract numbers MOST-109-2511-H-011-002-MY3 and MOST-108-2511-H-011-005-MY3.

Author information

Authors and Affiliations

Key Laboratory of Artificial Intelligence Yunnan, Kunming University of Science and Technology, Kunming, 650500, Yunnan, China
Imam Dad & Jianfeng He
University of Balochistan, Quetta, Pakistan
Waheed Noor & Ihsan Ullah
Habib University Karachi, Karachi, Pakistan
Abdul Samad
Government Girls High School, Quetta, Pakistan
Samina Ara

Corresponding author

Correspondence to Jianfeng He.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical Approval

Participants were protected by concealing their personal information in this study. Participation was voluntary, and participants knew that they could withdraw from the experiment at any time. The data can be provided upon request by e-mailing the corresponding author.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dad, I., He, J., Noor, W. et al. Cross Classification Matrix to Evaluate the Performance of Machine Learning Algorithms in Predicting Students Performance of Developing Regions. SN COMPUT. SCI. 5, 621 (2024). https://doi.org/10.1007/s42979-024-02909-y

Received: 08 December 2023
Accepted: 17 April 2024
Published: 06 June 2024
DOI: https://doi.org/10.1007/s42979-024-02909-y
