Data Mining Approach To Predict Academic Performance of Students
METHODS
*Correspondence:
Partha Ghosh,
parth_ghos@rediffmail.com
Powerful data mining techniques are now applied in a variety of educational fields. Educational research is
advancing rapidly due to the vast amount of student data that can be used to create insightful patterns
related to student learning. Educational data mining (EDM) is a tool that helps universities assess and identify student
performance, and well-known classification techniques have been widely used to determine student success.
A decisive and growing exploration area in EDM is predicting student academic performance, using data
mining and machine learning approaches to extract knowledge from education repositories. According to
relevant research, there are several academic performance prediction methods aimed at supporting
administrative and teaching staff in academic institutions. In the proposed approach, the collected data set is
preprocessed to ensure data quality, and the labeled student education data are used to train and evaluate
k-nearest neighbor (KNN), support vector machine (SVM), random forest, and decision tree (DT) classifiers.
The performance of the four classifiers is measured by accuracy, the receiver operating characteristic (ROC)
curve, the F1 score, and the confusion matrix of each model. Finally, we found that the top three models
had an accuracy of 86–95%, an F1 score of 85–95%, and an average one-vs-all (OVA) area under the ROC
curve of 98–99.6%.
Keywords: predictive analysis, KNN, SVM, random forest, DT classifier, students’ academic performance
– We want to analyze our model’s performance using various performance metrics.
– In-depth analysis considers the impact of all classifiers to determine the best classifier to predict student performance.

3.2. Data set

• The information utilized in this study was gathered from two community schools in Portugal’s Alentejo area in the academic year 2005–2006.
• The database was built from the following sources:

– Paper-based school reports (absences and the three period grades)
– Questionnaires, which are used to supplement the earlier data

• Characteristic information:

Both the student-mat.csv (for a math course) and student-por.csv (for a Portuguese language course) databases include the same attributes:

1. School – student’s school (binary: “GP” for Gabriel Pereira or “MS” for Mousinho da Silveira)
2. Sex – student’s gender (binary: “F” – female or “M” – male)
3. Age – student’s age (numeric: 15–22)
4. Address – type of home address (binary: urban or rural)
5. Famsize – family size (binary: “GT3” – greater than 3 or “LE3” – 3 or fewer)
6. Pstatus – parents’ cohabitation status (binary: “T” – living together or “A” – apart)
7. Medu – mother’s education (numeric: 0 – none, 1 – primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education, or 4 – higher education)
8. Fedu – father’s education (numeric: 0 – none, 1 – primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education, or 4 – higher education)
9. Mjob – mother’s job (nominal: “education,” “health care,” “civil services” (police, government, etc.), “at home,” or “other”)
10. Fjob – father’s job (nominal: “education,” “health care,” “civil services” (government, police, etc.), “at home,” or “other”)
11. Reason – reason for choosing this school (nominal: close to “home,” school reputation, desired course, or “other”)
12. Guardian – the person responsible for the student (nominal: “mother,” “father,” or “other”)
13. Traveltime – travel time from home to school (numeric: 1 – <15 min, 2 – 15–30 min, 3 – 30 min to 1 h, or 4 – more than 1 h)
14. Studytime – total weekly study time (numeric: 1 – <2 h, 2 – 2–5 h, 3 – 5–10 h, or 4 – >10 h)
15. Failures – total number of failures in the previous class (numeric: n if 1 ≤ n < 3, otherwise 4)
16. Schoolsup – extra educational support (binary: yes or no)
17. Famsup – family educational support (binary: yes or no)
18. Paid – extra paid classes within the course subject (mathematics or Portuguese) (binary: yes or no)
19. Activities – after-school activities (binary: yes or no)
20. Nursery – attended a preschool (binary: yes or no)
21. Higher – intends to pursue higher education (binary: yes or no)
22. Internet – access to the Internet at home (binary: yes or no)
23. Romantic – engaged in a romantic relationship (binary: yes or no)
24. Famrel – quality of family relationships (numeric: 1 – very bad to 5 – excellent)
25. Freetime – free time after school (numeric: 1 – very low to 5 – very high)
26. Goout – going out with friends (numeric: 1 – very low to 5 – very high)
27. Dalc – workday alcohol consumption (numeric: 1 – very low to 5 – very high)
28. Walc – weekend alcohol consumption (numeric: 1 – very low to 5 – very high)
29. Health – current state of health (numeric: 1 – very bad to 5 – very good)
30. Absences – total number of days absent from school (numeric: 0–93); these counts refer to the mathematics or Portuguese classes
31. G1 – first-period grade (numeric: 0–20)
32. G2 – second-period grade (numeric: 0–20)
33. G3 – final grade (numeric: 0–20, output target)

3.3. Methodology

The proposed model (Figure 1) is structured for the analysis and evaluation of the student data set. In our model, we first import the specified data set. Then, we use different data visualization techniques:

– Histogram (to check the count of students receiving each final grade at each age)
– Count plot (to compare the counts of students with different attributes)
– Box plot (to check whether any outliers are present in the data)

Based on the nature of the data, we perform data preprocessing by removing the outliers. Next, we divide the data set into test and training sets. Then we train 4 different classification algorithms individually on the training set: decision tree, k-nearest neighbor (KNN), support vector machine (SVM), and random forest. Each trained model is evaluated using the following metrics:
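The preprocessing and splitting steps just described can be sketched in Python. This is only a sketch on a synthetic miniature of the data: the real files would be read with `pd.read_csv("student-mat.csv", sep=";")`, and the IQR-based outlier rule is an assumption, since the paper does not name its exact outlier criterion.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature stand-in for student-mat.csv / student-por.csv
# (in practice: df = pd.read_csv("student-mat.csv", sep=";"))
df = pd.DataFrame({
    "age":      [15, 16, 17, 18, 16, 22, 15, 17, 18, 16],
    "absences": [2, 0, 4, 93, 1, 3, 0, 2, 5, 1],
    "G3":       [10, 12, 14, 3, 11, 8, 15, 13, 9, 12],
})

# Outlier removal via the interquartile-range rule (an assumed choice;
# the paper only says outliers were removed after inspecting box plots)
q1, q3 = df["absences"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["absences"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask]

# Split features and target, then divide into training and test sets
X, y = df.drop(columns="G3"), df["G3"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```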
42 Ghosh et al.
– Accuracy measure
– ROC score
– F1 score
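These three metrics, together with the confusion matrix, can be computed with scikit-learn; the grade labels and probability scores below are hypothetical stand-ins for a model's test-set output.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)

# Hypothetical true and predicted grades for a 4-grade problem
y_true = np.array([1, 2, 2, 3, 3, 4, 4, 4])
y_pred = np.array([1, 2, 3, 3, 3, 4, 4, 3])

# One-vs-rest probability scores; in practice these would come from
# clf.predict_proba(X_test) (each row sums to 1)
y_prob = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.3, 0.5, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.0, 0.1, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.1, 0.5, 0.3],
])

print(accuracy_score(y_true, y_pred))                    # accuracy measure
print(f1_score(y_true, y_pred, average="weighted"))      # F1 score
print(roc_auc_score(y_true, y_prob, multi_class="ovr"))  # ROC score (OVR)
print(confusion_matrix(y_true, y_pred))                  # rows: true class
```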
FIGURE 6 | Boxplot showing final grade (G3) vs. age.
Finally, we compare the analysis based on accuracy and
obtain the final results.
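The comparison step can be sketched as a loop over the four classifiers. Synthetic data and default hyperparameters are assumptions here; the paper does not report its exact settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student data set (4 grade classes)
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The four classifiers compared in this work
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Random forest": RandomForestClassifier(random_state=0),
}

# Fit each model and record its test accuracy
results = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
           for name, m in models.items()}
print(results)
```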
3.4. Data preprocessing

3.4.1. An overview of the data set

Figure 2 represents the final grade (G3) of the data set vs. the count of students in each age group for whom the data were collected. As can be seen in the figure, there is a large group of students whose age appears as 0: these students have no age record, so their age was set to 0 to eliminate null values.

Figure 3 shows whether or not any attribute of the data set contains null values. The uniform cyan color confirms that none of the attributes has any null values.
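The null-value check behind Figure 3 can be sketched with pandas; the column names here are a tiny assumed subset of the data set.

```python
import pandas as pd

# Toy frame standing in for the student data set (assumed columns)
df = pd.DataFrame({"age": [15, 16, 17], "absences": [2, 0, 4], "G3": [10, 12, 14]})

# Per-attribute null counts; all zeros mirrors the uniform heatmap of Figure 3
null_counts = df.isnull().sum()
print(null_counts)

# The Figure 3 visualization itself would be drawn with seaborn, e.g.:
# import seaborn as sns; sns.heatmap(df.isnull(), cbar=False)
```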
10.54646/bijscit.2023.35
3.5. Algorithms
FIGURE 7 | Boxplot showing final grade (G3) vs. higher education.

In this work, tests were performed using SVM, KNN, decision tree, and random forest classifiers, and their metrics were evaluated on the data set. SVM is used to solve pattern recognition and nonlinear function estimation problems: the training data are mapped nonlinearly into a high-dimensional feature space, where a separating hyperplane with the widest possible margins can be built, resulting in a nonlinear decision boundary in the input space. The support vector machine solution is obtained from a quadratic programming problem with a global solution.
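This kernel-based construction can be sketched with scikit-learn's SVC on a toy nonlinear problem; the paper's actual kernel and parameters are not specified, so those used here are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Nonlinearly separable toy data; the RBF kernel implicitly maps it into a
# high-dimensional feature space where a separating hyperplane exists
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Fitting solves a quadratic programming problem with a global optimum
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```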
FIGURE 11 | Count plot for comparison of probability of outcome as grade X vs. All for decision tree.
FIGURE 13 | Count plot for comparison of probability of outcome as grade X vs. all for KNN.
Random forests also work well with data of high dimensions; the ensemble avoids overfitting problems and improves model accuracy. Both regression and classification problems may be accomplished using random forests, although they are less well suited for applications that require regression.

3.6. Performance metrics used

The classifiers are compared using the accuracy measure, the ROC score, and the F1 score, together with the confusion matrix of each model.

4. Experimental results and discussion

We have applied all the above-mentioned classifiers to the data set one by one and have calculated the performance metrics.

i) Decision Tree

Confusion matrix for test and train data set
FIGURE 15 | Count plot for comparison of probability of outcome as grade X vs. all for SVC.
Three of the six grade 1 kids in the test group were accurately categorized as being in grade 1, while the other three were categorized as being in grade 2.

All 19 of the grade 2 kids were appropriately categorized as being in grade 2.

Of the 30 third-grade pupils, 29 were accurately identified as such, while one was misclassified.
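Readings like these come directly from the rows of a confusion matrix. A small sketch, with hypothetical labels chosen to reproduce the grade 1 and grade 2 counts described above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical test labels reproducing the counts above: six grade 1 pupils
# (3 correct, 3 predicted as grade 2) and nineteen grade 2 pupils (all correct)
y_true = np.array([1] * 6 + [2] * 19)
y_pred = np.array([1, 1, 1, 2, 2, 2] + [2] * 19)

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Row i is the true class, column j the predicted class:
#   cm[0] -> [3, 3]: 3 grade 1 pupils correct, 3 sent to grade 2
#   cm[1] -> [0, 19]: all 19 grade 2 pupils correct
```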
FIGURE 17 | Count plot for comparison of probability of outcome as grade X vs. all for random forest.
4.3. Conclusion from the train data's confusion matrix

Four of the seven grade 1 kids in the test group were accurately categorized as being in grade 1, while three were categorized as being in grade 2.

Of the 61 grade 2 pupils, 54 were accurately categorized as being in that grade, while 7 were misclassified.

Of the 117 third-grade pupils, 107 were accurately identified as such, while 2 were misclassified as second graders and another 8 as fourth graders.

Of the 115 grade 4 pupils, 106 were categorized properly as being in grade 4 and 9 were misclassified as being in grade 3.

TABLE 1 | Comparison between performances of different classifiers.

Classifier          F1 score  Accuracy  Area under ROC curve
Decision tree       0.90      0.90      0.91
K-nearest neighbor  0.85      0.86      0.98
SVM                 0.95      0.95      0.996
Random forest       0.80      0.83      0.98

4.4. Conclusion from the test data's confusion matrix

One of the six grade 1 pupils in the test group was accurately categorized as in grade 1, while the other five were categorized as in grade 2.
Of the 19 kids in grade 2, 18 were appropriately identified as such, while 1 was mistakenly placed in grade 3.

Of the 30 third-grade pupils, 28 were accurately identified as such, while 1 was misclassified as a second grader and the remaining 3 as fourth graders.

Of the 38 fourth-grade pupils, 35 were accurately identified as such, while 3 were misidentified. The count plot comparing the probability of outcome as grade X vs. all for KNN is shown in Figure 13, and the ROC OVR of the 4 grades using KNN is shown in Figure 14.

iii) SVM

Confusion matrix for test and train data set

4.5. Conclusion from the train data's confusion matrix

Of the 117 third-grade pupils, 111 were accurately identified as such, while 2 were misclassified as second graders and another 4 as fourth graders.

Of the 115 grade 4 pupils, 113 were accurately categorized as being in grade 4, while 2 were mistakenly categorized as being in grade 3.

4.6. Conclusion from the test data's confusion matrix

Of the six grade 1 pupils tested, four were accurately categorized as in grade 1, while two were categorized as in grade 2.

All 19 of the grade 2 children were correctly placed in that grade.

Of the 32 pupils in grade 3, 29 were accurately identified as such, while 3 were misclassified.

All 38 grade 4 students were correctly classified as grade 4. The count plot comparing the probability of outcome as grade X vs. all for SVC is shown in Figure 15, and the ROC OVR of the 4 grades using SVC is shown in Figure 16.

iv) Random Forest
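A random forest classifier for this step can be sketched as follows; the synthetic data and default hyperparameters are assumptions, since the paper does not report its settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student features and 4 grade classes
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# An ensemble of decision trees; each tree votes and the majority wins
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

cm = confusion_matrix(y_test, rf.predict(X_test))
print(cm)  # rows: true grade, columns: predicted grade
```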