Cross Classification Matrix to Evaluate the Performance of Machine Learning Algorithms in Predicting Students Performance of Developing Regions

  • Original Research
SN Computer Science

Abstract

In the rapidly evolving landscape of education, the integration of Big Data and AI presents significant opportunities for improving educational outcomes, especially for Predicting Students Performance (PSP) applications in higher education. Educational Data Mining (EDM) strategies have been implemented to overcome educational challenges in advanced nations. However, the issues confronting developing countries, such as the unavailability of educational datasets and the difficulty of selecting Machine Learning (ML) algorithms that perform well in terms of accuracy, bias, and over-fitting, have not been investigated. To address this gap, this study presents a novel dataset, UOBEDM, collected from the University of Baluchistan (UoB) in the developing region of Balochistan, Pakistan. It comprises 49,835 student records covering various demographic and academic aspects. Through data collection and cleaning processes, including feature selection techniques, the dataset was refined to 23,492 instances. Various ML algorithms were fine-tuned on the UOBEDM dataset; the top five algorithms, Trees, K-Nearest Neighbors (KNN), Naive Bayes (NB), Random Forest (RF), and Support Vector Machines (SVM), yielded accuracy scores of 0.95, 0.94, 0.92, 0.96, and 0.50, respectively. A novel approach called the Cross-Classification Matrix (CCM) was introduced to assess algorithm performance and select the best model. Trees emerged as the optimal predicting algorithm, and a graphical tree-based Early Intervention Model (EIM) was developed from it to simplify decision-making for academics. The significance of the dataset extends beyond classification algorithms, paving the way for research in EDM and addressing educational inequalities. This study underscores the potential of data-driven approaches to enhance educational outcomes and foster innovation in education. The findings contribute to the understanding of predictive modeling in education and provide valuable insights for educators, policymakers, and researchers.
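The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: it ranks the five algorithms by the accuracy scores reported above and computes the share of records retained after cleaning. The names `reported_accuracy`, `rank_by_accuracy`, and `retained_fraction` are hypothetical helpers introduced here, not part of the study, and the sketch does not implement the CCM, which the abstract does not describe.

```python
# Accuracy scores as reported in the abstract (no other metrics are given there).
reported_accuracy = {
    "Trees": 0.95,
    "KNN": 0.94,
    "NB": 0.92,
    "RF": 0.96,
    "SVM": 0.50,
}

# Record counts as reported in the abstract.
RAW_RECORDS = 49_835
CLEANED_INSTANCES = 23_492


def rank_by_accuracy(scores):
    """Return (name, score) pairs sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def retained_fraction(cleaned, raw):
    """Fraction of the raw records that remain after cleaning."""
    return cleaned / raw


ranking = rank_by_accuracy(reported_accuracy)
best_by_accuracy = ranking[0][0]
retained = retained_fraction(CLEANED_INSTANCES, RAW_RECORDS)

if __name__ == "__main__":
    for name, score in ranking:
        print(f"{name:5s} {score:.2f}")
    print(f"Highest accuracy: {best_by_accuracy}")
    print(f"Retained after cleaning: {retained:.1%}")
```

On accuracy alone, RF (0.96) ranks just above Trees (0.95), so the choice of Trees as the optimal model presumably reflects the other criteria named in the abstract, bias and over-fitting, as assessed through the CCM.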

Data availability

The UOBEDM data is available on GitHub: https://github.com/ImamDad/UOBEDM.

Acknowledgements

We are thankful to the Ministry of Science and Technology of the Republic of China, the China Scholarship Council, and Kunming University of Science and Technology for supporting us.

Funding

This study is supported in part by the Ministry of Science and Technology of the Republic of China under contract numbers MOST-109-2511-H-011-002-MY3 and MOST-108-2511-H-011-005-MY3.

Author information

Corresponding author

Correspondence to Jianfeng He.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical Approval

The participants were protected by concealing their personal information in this study. Participation was voluntary, and participants knew that they could withdraw from the experiment at any time. The data can be provided upon request by e-mailing the corresponding author.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dad, I., He, J., Noor, W. et al. Cross Classification Matrix to Evaluate the Performance of Machine Learning Algorithms in Predicting Students Performance of Developing Regions. SN COMPUT. SCI. 5, 621 (2024). https://doi.org/10.1007/s42979-024-02909-y

  • DOI: https://doi.org/10.1007/s42979-024-02909-y
