PREDICTING STUDENT'S PERFORMANCE BASED ON MACHINE LEARNING
by
R.Manideep (38110296)
P.V.Adarsh Kumar (38110011)
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
MARCH – 2022
SATHYABAMA
(DEEMED TO BE UNIVERSITY)
Accredited with “A” grade by NAAC
Jeppiaar Nagar, Rajiv Gandhi Salai, Chennai - 600119
www.sathyabama.ac.in
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of R.Manideep
(38110296) and P.V.Adarsh Kumar (38110011) who carried out the project
entitled “PREDICTING STUDENT'S PERFORMANCE BASED ON MACHINE
LEARNING” under my supervision from November 2021 to March 2022.
Internal Guide
MRS D.Deepa
DECLARATION
DATE:
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
B. SCREENSHOTS 41
C. PLAGIARISM REPORT 43
D. JOURNAL PAPER 45
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABBREVIATION    EXPANSION
ML              Machine Learning
NB              Naive Bayes
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
There are many studies in the learning field that have investigated ways of applying machine learning techniques for various educational purposes. One focus of these studies is to identify high-risk students and the features that affect student performance. Students are a major strength of any university, and universities and students together play a significant part in producing graduates of superior calibre through academic achievement. However, academic achievement varies, since different kinds of students show different degrees of performance. Machine learning is the ability of a system to learn automatically from past experience and improve its performance. Nowadays machine learning for education is gaining more attention; it is used to analyse information based on past experience and to predict future performance.
Such algorithms instead allow computers to train on data inputs and use statistical analysis to output values that fall within a particular range. Because of this, machine learning helps computers build models from sample data in order to automate decision-making processes based on data inputs.
1.3 OBJECTIVE
The main objective of this paper is to introduce an ML model that classifies and predicts student performance using supervised ML algorithms such as Naïve Bayes and K-Nearest Neighbour. The proposed approach thus offers a way to predict performance efficiently and accurately by comparing several ML models.
CHAPTER 2
LITERATURE SURVEY
Durgesh Ugale et al.[2] discussed how a pre-processing step is applied to the raw dataset so that the mining algorithm can be used properly. Guiding a student on the basis of predicted "performance" can help improve their results. The solution to the problem is presented very distinctly, but the authors did not comment on the students' responses, did not discuss the limitations, and did not examine the nature of the prediction.
Ali Salah Hashim et al.[3] analyzed the performance of machine learning algorithms. According to the results of the study, the logistic regression classifier was the most accurate in predicting the final score of the student (68.7% for passing students and 88.8% for failing students). The study describes the techniques used in analytical research to predict student performance; however, the quality of the training data was not assessed and the models were not compared in depth.
Alnassar, Fatema et al.[4] discuss decision trees, data mining techniques, and a combination of strategies that enable the prediction of student performance, so that educators can take significant steps in developing student knowledge and performance. The performance of the different tree algorithms can be analyzed based on their accuracy and tree-construction time. Rules extracted from the framework helped the teacher to identify struggling students and improve performance. A drawback is the need to know the number of groups in advance; it is difficult to determine the number of groups when there is a slight change in the data.
Leena H. Alamri et al.[6] reported that the results of the SVM and RF algorithms applied to the two datasets show an accuracy of up to 93%, with the lowest RMSE of 1.13 obtained by RF. The way the analysis is set up makes prediction simple and fast, and it works well across many classes of hypotheses. However, since SVMs are known to be poor probability estimators, their outputs should not be interpreted directly as probabilities.
C. Verma et al.[7] discussed results showing that student interest and first-semester GPA drove the whole decision process, and that the Bayesian network outperforms the decision tree because of its higher accuracy. It works better in terms of income level than in terms of the number of variables; with a small sample, the error in the data may be excessively high.
Ferda Ünal[8] demonstrates the use of data mining technology to identify final outcomes based on student history. In the research, three data mining techniques (decision tree, random forest, and naive Bayes) were applied to two course datasets, mathematics and Portuguese. The results show that mining techniques are valuable in predicting student performance and help establish clear rules for prediction from training data. With the progression of technology, e-learning as a web-based learning platform, and advanced multimedia technology, training costs have been reduced and time constraints and difficulties have been eliminated.
and knowledge to support decision making. While it is important to have models at the local level, their results make it difficult to extract knowledge that can be useful at the global level. Therefore, to support decision making in this area, it is important to generalize the information contained in those models; a specific classifier method can be used to generalize these rules into a global model.
Engineering schools worldwide have a relatively high attrition rate. Typically, about
35% of the first-year students in various engineering programs do not make it to the
second year. Of the remaining students, quite often they drop out or fail in their
second or third year of studies. The purpose of this investigation is to identify the
factors that serve as good indicators of whether a student will drop out or fail the
program. In order to establish early warning indicators, principal component analysis
is used to analyze, in the first instance, first-year engineering student academic
records. These performance predictors, if identified, can then be used effectively to
formulate corrective action plans to improve the attrition rate.
Education is the backbone of all developing countries; upgrading the education system raises a country towards the world's top ranking. One of the major problems the education system faces is predicting the behaviour of students from a large database. This paper focuses on upgrading the Indian education system using one of the techniques in data mining, namely clustering. Cluster analysis organizes the given data into meaningful groups. Normally the performance of students can be classified into patterns such as normal, average and below average. In this paper we attempt to analyze students' data from a different angle, beyond the above patterns, through the newly proposed UCAM (Unique Clustering with Affinity Measures) clustering algorithm.
Nowadays student records form a large set of data with precious information hidden in it, and data mining techniques can help to find this hidden information. In this paper, the Bayes classification method is applied to these data to help an institution identify those students who consistently perform well. This study will help the institution reduce the dropout ratio to a significant level and improve its performance.
The present study was conducted on 400 students (200 boys and 200 Girls) selected
from senior secondary school of A.M.U., Aligarh-India, to establish the prognostic
value of different measures of cognition, personality and demographic variables for
success at higher secondary level in science stream. The scores obtained on
different variables were factor-analyzed to get a smaller number of meaningful
variables or factors to establish the predictive validity of these predictors. Factors
responsible for success in science stream were identified. The prognostic value of
the predictors was compared for high achievers and low achievers in order to identify
the factors which differentiate them.
This paper is an attempt to use data mining processes, particularly classification, to help enhance the quality of the higher educational system by evaluating student data to study the main attributes that may affect student performance in courses. For this purpose, the CRISP framework for data mining is used to mine student-related academic data. The classification rule generation process is based on the decision tree as a classification method, and the generated rules are studied and evaluated. A system that facilitates the use of the generated rules is built, which allows students to predict the final grade in a course under study.
A few years ago, the information flow in the education field was relatively simple and the
application of technology was limited. However, as we progress into a more
integrated world where technology has become an integral part of the business
processes, the process of transfer of information has become more complicated.
Today, one of the biggest challenges that educational institutions face is the
explosive growth of educational data and to use this data to improve the quality of
managerial decisions. Data mining techniques are analytical tools that can be used
to extract meaningful knowledge from large data sets. This paper addresses the
applications of data mining in educational institution to extract useful information
from the huge data sets and providing analytical tool to view and use this information
for decision making processes by taking real life examples.
CHAPTER 3
METHODOLOGY
classification are implemented in the existing framework. The datasets are uploaded into the WEKA tool on any Windows OS configuration. The K-means clustering algorithm provides a reduced error rate.
EXISTING ALGORITHM
What is K-means?
Details of K-means
1. Initial centroids are often chosen randomly, so the clusters produced vary from one run to another.
5. Most of the convergence happens in the first few iterations, so the stopping condition is often changed to "until relatively few points change clusters" (a minimal sketch of these two points follows below).
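The following is a minimal plain-Java K-means sketch (one-dimensional for brevity) that makes those two details concrete: centroids are seeded randomly, and iteration stops once assignments stop changing. It is illustrative only, not the WEKA implementation used above.

import java.util.Random;

public class KMeansSketch {
    public static int[] cluster(double[] points, int k, int maxIterations) {
        Random rnd = new Random();
        double[] centroids = new double[k];
        for (int j = 0; j < k; j++) {
            centroids[j] = points[rnd.nextInt(points.length)]; // random initial centroids
        }
        int[] assignment = new int[points.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // assignment step: attach each point to its nearest centroid
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(points[i] - centroids[j]) < Math.abs(points[i] - centroids[best])) best = j;
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            // update step: move each centroid to the mean of its assigned points
            for (int j = 0; j < k; j++) {
                double sum = 0; int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == j) { sum += points[i]; count++; }
                }
                if (count > 0) centroids[j] = sum / count;
            }
            if (!changed) break; // most of the convergence happens in the first few iterations
        }
        return assignment;
    }
}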
In the proposed system a machine learning algorithm is used for the classification. An automated evaluation system has been proposed to evaluate and analyze student performance. A prediction system has been proposed that uses the students' marks, staff opinion, attendance and ragging. The study is evaluated using machine learning classifiers. An analysis of student behaviour has been proposed using the intellectual parameters of the student which affect their study. Various mining techniques are used to examine the educational data covering these factors. In this paper, a novel approach based on KNN uses significant academic attributes for performance prediction. The experiments display good performance of the proposed algorithm compared to similar approaches over the same dataset. By analyzing the experimental results, it is observed that the Naïve Bayes and KNN algorithms turn out to be the best classifiers for student performance prediction because they achieve higher accuracy and a lower error rate.
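As a hedged illustration of that comparison, the sketch below evaluates Naive Bayes and KNN (WEKA's IBk) on the same dataset with 10-fold cross-validation and prints accuracy and error rate. The dataset file name, the last column being the class label, and the choice of K = 3 are assumptions for the example.

import java.util.Random;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;

// Compares Naive Bayes and KNN with 10-fold cross-validation on one dataset.
public class CompareClassifiers {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("student.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // assumed: last column is the label

        Evaluation nbEval = new Evaluation(data);
        nbEval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        Evaluation knnEval = new Evaluation(data);
        knnEval.crossValidateModel(new IBk(3), data, 10, new Random(1)); // K = 3 neighbours

        System.out.printf("Naive Bayes accuracy %.2f%%, error rate %.3f%n",
                nbEval.pctCorrect(), nbEval.errorRate());
        System.out.printf("KNN accuracy %.2f%%, error rate %.3f%n",
                knnEval.pctCorrect(), knnEval.errorRate());
    }
}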
• We followed the same steps to build the cumulative prediction model; there may be some changes in syntax due to the different technology, as the authors of the paper used Java.
Naive Bayes model is easy to build and particularly useful for very large data sets.
Along with simplicity, Naive Bayes is known to outperform even highly sophisticated
classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:

P(c|x) = P(x|c) * P(c) / P(x)

Above,
• P(c|x) is the posterior probability of class c (target) given predictor x (attributes),
• P(c) is the prior probability of the class,
• P(x|c) is the likelihood, the probability of the predictor given the class,
• P(x) is the prior probability of the predictor.
Let's understand it using an example. Below is a training data set of weather and the corresponding target variable 'Play' (suggesting the possibility of playing). We need to classify whether players will play or not based on the weather condition. Let's follow the steps below to perform it.
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities, for example Overcast probability = 0.29 and probability of playing = 0.64.
Fig. 3.3.2: Likelihood table
Step 3: Now use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
Here we have P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher probability.
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and in problems having multiple classes.
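The worked example above can be reproduced in a few lines of Java; the counts (3 of the 9 "Yes" days are Sunny, and 5 of the 14 days are Sunny) come directly from the frequency table.

// Reproduces the worked example above from the frequency counts.
public class PlayPrediction {
    public static void main(String[] args) {
        double pSunnyGivenYes = 3.0 / 9;   // P(Sunny | Yes) = 0.33
        double pYes = 9.0 / 14;            // P(Yes) = 0.64
        double pSunny = 5.0 / 14;          // P(Sunny) = 0.36
        double pYesGivenSunny = pSunnyGivenYes * pYes / pSunny; // Bayes' theorem
        System.out.println("P(Yes | Sunny) = " + pYesGivenSunny); // about 0.60
    }
}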
Pros:
• It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
• When the assumption of independence holds, a Naive Bayes classifier performs better than other models like logistic regression, and it needs less training data.
• It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
Cons:
• If a categorical variable has a category in the test data set which was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the "Zero Frequency" problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation (a small sketch follows this list).
• On the other side, naive Bayes is also known to be a bad estimator, so the probability outputs from predict_proba should not be taken too seriously.
• Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors which are completely independent.
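A minimal sketch of the Laplace (add-one) estimation mentioned above: every category receives a pseudo-count of one so that a value unseen in training never produces a zero probability. The counts in the example are illustrative.

public class LaplaceSmoothing {
    // countInClass and classTotal come from the training data;
    // numCategories is the number of distinct values the attribute can take.
    static double smoothedProbability(int countInClass, int classTotal, int numCategories) {
        return (countInClass + 1.0) / (classTotal + numCategories);
    }

    public static void main(String[] args) {
        // e.g. a category never seen in the "yes" class (count 0 of 9, 3 categories)
        System.out.println(smoothedProbability(0, 9, 3)); // 1/12 instead of 0
    }
}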
Text classification / spam filtering: Naive Bayes is widely used to identify spam e-mail, and for sentiment analysis (in social media analysis, to identify positive and negative customer sentiments).
Recommendation system: a Naive Bayes classifier together with collaborative filtering builds a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.
STEP 1: BEGIN
STEP 2: Input: D = {(x1, c1), . . . , (xN, cN)}
STEP 3: x = (x1, . . . , xn) new instance to be classified
STEP 4: FOR each labelled instance (xi, ci) calculate d(xi, x)
STEP 5: Order d(xi, x) from lowest to highest, (i = 1, . . . , N)
STEP 6: Select the K nearest instances to x: Dkx
STEP 7: Assign to x the most frequent class in Dkx
STEP 8: END
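Below is a minimal Java sketch of these steps, using Euclidean distance and a majority vote among the K nearest neighbours. The data layout (parallel arrays of feature vectors and labels) is an assumption for the example, not the project's actual code.

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class KnnSketch {
    public static String classify(double[][] train, String[] labels, double[] x, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // STEPS 4-5: compute d(xi, x) and order from lowest to highest
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], x)));
        // STEPS 6-7: take the K nearest instances and assign the most frequent class
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) votes.merge(labels[idx[i]], 1, Integer::sum);
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }
}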
3.4 SYSTEM ARCHITECTURE
HARDWARE REQUIREMENTS:
System - Pentium IV
Speed - 2.4 GHz
Hard disk - 40 GB
Monitor - 15" VGA color
RAM - 512 MB
SOFTWARE REQUIREMENTS:
3.6 MODULES
Data Collection
Preprocessing
Classification module
Prediction
DATA COLLECTION
In this module, student data will be collected from the college: marks, attendance, staff opinion, extracurricular activities, ragging and stress.
PREPROCESSING
Data pre-processing is done to remove incomplete, noisy and inconsistent data. Data must be pre-processed before being used in the feature selection task.
CLASSIFICATION MODULE
Data mining techniques are used to identify the performance of the student using the Naïve Bayes and KNN algorithms. These two algorithms identify and analyse the performance of the student.
PREDICTION
In this module, the student performance is predicted based upon the student's marks, attendance, staff opinion, extracurricular activities, ragging and stress.
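The sketch below shows one hypothetical way the modules could fit together in code: a cleaned student record is encoded as a numeric feature vector and handed to a classifier such as the NaiveBayes class listed in Appendix A. The attribute order, the 0/1 encoding and the DataCollection.load helper are illustrative assumptions, not the project's actual code.

public class PredictionModule {
    // encode one student record (already cleaned by the preprocessing module)
    static double[] encode(double markPercent, double attendancePercent, double staffOpinion,
                           boolean extracurricular, boolean ragging, double stressLevel) {
        return new double[] { markPercent, attendancePercent, staffOpinion,
                extracurricular ? 1.0 : 0.0, ragging ? 1.0 : 0.0, stressLevel };
    }

    public static void main(String[] args) {
        double[] features = encode(72.0, 85.0, 4.0, true, false, 2.0);
        System.out.println(java.util.Arrays.toString(features));
        // ArrayList<Person> training = DataCollection.load("students.csv"); // assumed helper
        // System.out.println(new NaiveBayes(training).classify(features));  // Appendix A class
    }
}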
ADVANTAGES
• To represent complete systems (instead of only the software portion) using object-oriented concepts
• To establish an explicit coupling between concepts and executable code
• To take into account the scaling factors that are inherent to complex and critical systems
• To create a modeling language usable by both humans and machines
The deployment model provides details that pertain to process
allocation
USE CASE DIAGRAM
Use case diagrams give an overview of the usage requirements for a system. They are useful for presentations to management and/or project stakeholders, but for actual development you will find that use cases provide significantly more value because they describe "the meat" of the actual requirements. A use case describes a sequence of actions that provides something of measurable value to an actor, and is drawn as a horizontal ellipse.
SEQUENCE DIAGRAM
Sequence diagrams model the flow of logic within your system in a visual manner, enabling you both to document and to validate your logic; they are commonly used for both analysis and design purposes. Sequence diagrams are the most popular UML artifact for dynamic modeling, which focuses on identifying the behaviour within your system.
ACTIVITY DIAGRAM
Activity diagrams are graphical representations of workflows of stepwise activities and actions, with support for choice, iteration and concurrency. Activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system. An activity diagram consists of an initial node, an activity final node and the activities in between.
JAVA
Java is one of the world's most important and widely used computer languages, and it has held this distinction for many years. Unlike some other computer languages whose influence has waned with the passage of time, Java's influence has grown.
APPLICATION OF JAVA
Java is widely used in every corner of the world and of human life. Java is not only used in software but is also widely used in designing hardware-controlling software components. There are more than 930 million JRE downloads each year and 3 billion mobile phones run Java.
FEATURES OF JAVA
The prime reason behind the creation of Java was to bring portability and security features into a computer language. Besides these two major features, there were many other features that played an important role in moulding the final form of this outstanding language. Those features are:
1) Simple
Java is easy to learn and its syntax is quite simple, clean and easy to understand. The confusing and ambiguous concepts of C++ have either been left out of Java or re-implemented in a cleaner way.
E.g., pointers and operator overloading are not present in Java, although they were an important part of C++.
2) Object Oriented
In Java everything is an object which has some data and behaviour. Java can be easily extended as it is based on the object model.
3) Robust
4) Platform Independent
Unlike other programming languages such as C and C++, which are compiled into platform-specific machine code, Java is guaranteed to be a write-once, run-anywhere language.
On compilation, a Java program is compiled into bytecode. This bytecode is platform independent and can be run on any machine, and the bytecode format also provides security. Any machine with a Java Runtime Environment can run Java programs.
Fig. 3.8.1: Platform independence
5) Secure
When it comes to security, Java is always the first choice. Java's security features enable us to develop virus-free, tamper-free systems. Java programs always run within the Java Runtime Environment, with almost no interaction with the system OS, hence they are more secure.
6) Multi-Threading
7) Architectural Neutral
8) Portable
Java bytecode can be carried to any platform. There are no implementation-dependent features; everything related to storage is predefined, for example the sizes of the primitive data types.
COLLECTION FRAMEWORK
The collection framework was not part of the original Java release; collections were added in J2SE 1.2. Prior to Java 2, Java provided ad hoc classes such as Dictionary, Vector, Stack and Properties to store and manipulate groups of objects. The collection framework provides many important classes and interfaces to collect and organize groups of related objects, as illustrated in the small example below.
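A small illustrative example of these collection classes, grouping student marks by register number; the values are made up for the example.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class CollectionDemo {
    public static void main(String[] args) {
        ArrayList<String> students = new ArrayList<>();   // ordered list of register numbers
        students.add("38110296");
        students.add("38110011");

        Map<String, Double> marks = new HashMap<>();      // register number -> mark
        marks.put("38110296", 72.0);
        marks.put("38110011", 68.5);

        for (String regNo : students) {
            System.out.println(regNo + " -> " + marks.get(regNo));
        }
    }
}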
MYSQL
MySQL, officially pronounced "My S-Q-L" but also called "My Sequel", is the world's most widely used open-source relational database management system (RDBMS). It runs as a server providing multi-user access to a number of databases, though SQLite probably has more total embedded deployments. The SQL phrase stands for Structured Query Language.
The MySQL development project has made its source code available under
the terms of the GNU General Public License, as well as under a variety of
proprietary agreements. MySQL was owned and sponsored by a single for-profit
firm, the Swedish company MySQL AB, now owned by Oracle Corporation. MySQL
is a popular choice of database for use in web applications, and is a central
component of the widely used LAMP open source web application software stack
(and other 'AMP' stacks).
For commercial use, several paid editions are available and offer additional functionality. Applications which use MySQL databases include TYPO3, MODx, Joomla, WordPress, phpBB, MyBB, Drupal and other software. MySQL is also used in many high-profile, large-scale websites, including Wikipedia, Google (though not for searches), Facebook, Twitter, Flickr and YouTube. A minimal example of accessing MySQL from Java is sketched below.
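The sketch below reads student records from MySQL over JDBC. The database name, table, columns and credentials are assumptions for illustration only, and the MySQL Connector/J driver is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StudentDao {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/studentdb";          // assumed database
        try (Connection con = DriverManager.getConnection(url, "root", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT reg_no, mark, attendance FROM student WHERE mark >= ?")) {
            ps.setDouble(1, 50.0);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("reg_no") + " " + rs.getDouble("mark")
                            + " " + rs.getDouble("attendance"));
                }
            }
        }
    }
}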
that determine the need for analysing, documenting, validating and managing software or system requirements. The requirements should be documentable, actionable, measurable, testable and traceable, related to identified business needs or opportunities, and defined to a level of detail sufficient for system design.
NON-FUNCTIONAL REQUIREMENTS
Usability
It specifies how easy the system must be to use. It is easy to ask queries in any format, short or long; the Porter stemming algorithm produces the desired response for the user.
Robustness
It refers to a program that performs well not only under ordinary conditions but also under unusual conditions. It is the ability of the system to cope with errors and irrelevant queries during execution.
Security
Reliability
It is the probability of how often the software fails. The measurement is often expressed as MTBF (Mean Time Between Failures). This requirement is needed in order to ensure that the processes work correctly and completely without being aborted. The system can handle any load, survive failures, and is even capable of working around them.
Compatibility
It is supported by all recent versions of web browsers. Using any web server, such as localhost, gives the system a real-time experience.
Flexibility
The flexibility of the project is provided in such a way that it has the ability to run in different environments, executed by different users.
Safety
Portability
It is the usability of the same software in different environments. The project
can be run in any operating system.
Performance
These requirements determine the resources required, time interval,
throughput and everything that deals with the performance of the system.
Accuracy
The results of the requested query are very accurate, and information is retrieved at high speed. The degree of security provided by the system is high and effective.
Maintainability
Maintainability basically defines how easy it is to maintain the system: how easy it is to analyse, change and test the application. Maintenance of this project is simple, as further updates can easily be made without affecting its stability.
FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase and business proposal
is put forth with a very general plan for the project and some cost estimates. During
system analysis the feasibility study of the proposed system is to be carried out. This
is to ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.
The feasibility study investigates the problem and the information needs of the
stakeholders. It seeks to determine the resources required to provide an information
systems solution, the cost and benefits of such a solution, and the feasibility of such
a solution.
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, so that only minimal or no changes are required for implementing it.
SOCIAL FEASIBILITY
CHAPTER 4
4.1 WORKING
INPUT DESIGN
The input design is the link between the information system and the user. It comprises developing the specification and procedures for data preparation, and the steps necessary to put transaction data into a usable form for processing. This can be achieved by having the computer read data from a written or printed document, or by having people key the data directly into the system. The design of input focuses on controlling the amount of input required, controlling errors, avoiding delay, avoiding extra steps and keeping the process simple. The input is designed in such a way that it provides security and ease of use while retaining privacy. Input design considered the following things:
OUTPUT DESIGN
A quality output is one which meets the requirements of the end user and presents the information clearly. In any system, the results of processing are communicated to the users and to other systems through outputs. In output design it is determined how the information is to be displayed for immediate need, as well as the hard copy output. It is the most important and direct source of information for the user. Efficient and intelligent output design improves the system's relationship with the user and helps in decision-making.
The output form of an information system should accomplish one or more of the following objectives.
Test plan
A test plan is a document describing the scope, approach and schedule of the intended testing activities for a software item. Testing assesses the quality of the product. Software testing is a process that should be done during the development process; in other words, software testing is a verification and validation process.
Verification
Verification is the process to make sure the product satisfies the conditions
imposed at the start of the development phase. In other words, to make sure the
product behaves the way we want it to.
Validation
Validation is the process to make sure the product satisfies the specified
requirements at the end of the development phase. In other words, to make sure the
product is built as per customer requirements.
There are two basics of software testing: black box testing and white box
testing.
Black box testing is a testing technique that ignores the internal mechanism of
the system and focuses on the output generated against any input and execution of
the system. It is also called functional testing.
White box testing is a testing technique that takes into account the internal
mechanism of a system. It is also called structural testing and glass box testing.
Black box testing is often used for validation and white box testing is often used for
verification.
Types of testing
Unit Testing
Integration Testing
Functional Testing
System Testing
Stress Testing
Performance Testing
Usability Testing
Acceptance Testing
Regression Testing
Beta Testing
Unit Testing
Unit testing is the testing of an individual unit or group of related units. It falls
under the class of white box testing. It is often done by the programmer to test that
the unit he/she has implemented is producing expected output against given input.
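As a hedged example of such a unit test (using JUnit 4), the test below checks the posterior calculation worked through in Chapter 3 in isolation; with the exact fractions 3/9, 9/14 and 5/14 the result is exactly 0.6, matching the rounded 0.33 * 0.64 / 0.36 ≈ 0.60 in the text.

import org.junit.Assert;
import org.junit.Test;

public class PosteriorTest {
    // the unit under test: Bayes' rule for a single attribute value
    static double posterior(double likelihood, double prior, double evidence) {
        return likelihood * prior / evidence;
    }

    @Test
    public void playGivenSunnyIsSixtyPercent() {
        double p = posterior(3.0 / 9, 9.0 / 14, 5.0 / 14);
        Assert.assertEquals(0.6, p, 1e-9); // exact with unrounded fractions
    }
}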
Integration Testing
Functional Testing
System Testing
System testing is the testing to ensure that by putting the software in different
environments (e.g., Operating Systems) it still works. System testing is done with full
system implementation and environment. It falls under the class of black box testing.
Stress Testing
Stress testing is the testing to evaluate how system behaves under unfavorable
conditions. Testing is conducted at beyond limits of the specifications. It falls under
the class of black box testing.
Performance Testing
Usability Testing
Acceptance Testing
Acceptance testing is often done by the customer to ensure that the delivered
product meets the requirements and works as the customer expected. It falls under
the class of black box testing.
Regression Testing
CHAPTER 5
CONCLUSION
5.1 CONCLUSION
This work focuses on analysing students' academic growth using machine learning techniques. For the analysis, the Naïve Bayes and KNN classifiers are used. This process can help the instructor to judge the performance of the students easily and to plan better methods for improving their academics. In future, additional features will be added to our dataset to achieve better accuracy.
REFERENCES
[1] Namoun, Abdallah, and Abdullah Alshanqiti. "Predicting student performance using data mining and learning analytics techniques: a systematic literature review." Applied Sciences 11, no. 1 (2021): 237.
[2] Ugale, Durgesh, Jeet Pawar, Sachin Yadav, and Chandrashekhar Raut. "Student Performance Prediction Using Data Mining Techniques." (2020).
[3] Hashim, Ali Salah, Wid Akeel Awadh, and Alaa Khalaf Hamoud. "Student
performance prediction model based on supervised machine learning algorithms."
In IOP Conference Series: Materials Science and Engineering, vol. 928, no. 3, p.
032019. IOP Publishing, 2020.
[4] Alnassar, Fatema, Tim Blackwell, Elaheh Homayounvala, and Matthew Yee-king.
"How Well a Student Performed? A Machine Learning Approach to Classify
Students‟ Performance on Virtual Learning Environment." In 2021 2nd International
Conference on Intelligent Engineering and Management (ICIEM), pp. 1-6. IEEE,
2021.
[5] Saifuzzaman, M., Parvin, M., Jahan, I., Moon, N.N., Nur, F.N. and Shetu, S.F.,
2021, June. Machine Learning Approach to Predict SGPA and CGPA. In 2021
International Conference on Artificial Intelligence and Computer Science Technology
(ICAICST) (pp. 211-216). IEEE.
[6] H. Alamri, Leena, Ranim S. Almuslim, Mona S. Alotibi, Dana K. Alkadi, Irfan Ullah
Khan, and Nida Aslam. "Predicting Student Academic Performance using Support
Vector Machine and Random Forest." In 2020 3rd International Conference on
Education Technology Management, pp. 100-107. 2020.
[7] Verma, C., Illés, Z. and Stoffová, V., 2019, February. Age group predictive
models for the real time prediction of the university students using machine learning:
Preliminary results. In 2019 IEEE International Conference on Electrical, Computer
and Communication Technologies (ICECCT) (pp. 1-7). IEEE.
[8] Ünal, Ferda. "Data mining for student performance prediction in education." Data
Mining-Methods, Applications and Systems (2020).
[9] Q. A. AI-Radaideh, E. W. AI-Shawakfa, and M. I. AI-Najjar, “Mining student data
using decision trees”, International Arab Conference on Information
Technology(ACIT'2006), Yarmouk University, Jordan, 2006.
APPENDICES
A. SOURCE CODE
NaiveBayes.java
import java.util.ArrayList;

// Reconstructed outline of the surviving fragments of NaiveBayes.java; the Person
// class and the Outcome enum are assumed to be defined elsewhere in the project.
public class NaiveBayes {
    private double yesMean[], noMean[];     // holds the mean values for each attribute in the yes / no class
    private double yesStdDev[], noStdDev[]; // holds the standard deviation values for each attribute in the yes / no class
    private double pYes, pNo;               // prior probabilities of the two classes

    public NaiveBayes(ArrayList<Person> training) {
        int attrCount = training.get(0).getAttributes().length;
        yesMean = new double[attrCount];   noMean = new double[attrCount];
        yesStdDev = new double[attrCount]; noStdDev = new double[attrCount];
        int y = 0, n = 0;
        double[] attributes;
        // first pass: per-class sums of every attribute, used to compute the means
        for (Person p : training) {
            attributes = p.getAttributes();
            if (p.getOutcome() == Outcome.yes) {
                y++;
                for (int i = 0; i < attrCount; i++) yesMean[i] += attributes[i];
            } else {
                n++;
                for (int i = 0; i < attrCount; i++) noMean[i] += attributes[i];
            }
        }
        for (int i = 0; i < attrCount; i++) { yesMean[i] = yesMean[i] / y; noMean[i] = noMean[i] / n; }
        // second pass: squared deviations, used to compute the standard deviations
        for (Person p : training) {
            attributes = p.getAttributes();
            for (int i = 0; i < attrCount; i++) {
                if (p.getOutcome() == Outcome.yes)
                    yesStdDev[i] += Math.pow(attributes[i] - yesMean[i], 2);
                else
                    noStdDev[i] += Math.pow(attributes[i] - noMean[i], 2);
            }
        }
        for (int i = 0; i < attrCount; i++) {
            yesStdDev[i] = Math.sqrt(yesStdDev[i] / (y - 1));
            noStdDev[i] = Math.sqrt(noStdDev[i] / (n - 1));
        }
        pYes = (double) y / training.size();
        pNo = (double) n / training.size();
    }

    // classify a new instance by comparing the two class posteriors
    public Outcome classify(double[] attributes) {
        double pYesGivenE = 1.0, pNoGivenE = 1.0;
        for (int i = 0; i < attributes.length; i++) {
            pYesGivenE *= gaussian(attributes[i], yesMean[i], yesStdDev[i]);
            pNoGivenE *= gaussian(attributes[i], noMean[i], noStdDev[i]);
        }
        pYesGivenE *= pYes;
        pNoGivenE *= pNo;
        if (pNoGivenE > pYesGivenE) return Outcome.no; else return Outcome.yes;
    }

    // Gaussian probability density, used as the likelihood of a numeric attribute value
    private double gaussian(double x, double mean, double stdDev) {
        double probability = Math.exp(-Math.pow(x - mean, 2) / (2 * stdDev * stdDev))
                / (Math.sqrt(2 * Math.PI) * stdDev);
        return probability;
    }
}
B. SCREENSHOTS
Fig. B.3: Accuracy Graph
C. PLAGIARISM REPORT
D. JOURNAL PAPER
Sathyabama Institute of Science and Technology, India
academic performance. It is also accessible from the e-learning archive system, which is used by many organizations.

It uses various techniques to carry out data mining effectively, for example K-means clustering and K-nearest neighbour algorithms. Using these capabilities, it is possible to obtain various kinds of information using hierarchical rules, rankings and associations. Using this, we derive information that explains the performance of the students who took the test and all of their details.

The first of these many data tasks is to organize the essential data, examine groups, and arrange them coherently. A cluster is a collection of distinct objects, divided into several groups according to the degree of similarity among them, so that closely related objects fall into the same group and dissimilar objects fall into different groups.

There are many growing research interests in using data mining in the educational sector. This modern emerging field, called educational data mining, is driven by improved methods that extract knowledge from data coming from the educational sector. Data mining is a sorting technique that is used to extract hidden patterns from huge databases. These concepts and methods can be applied in various fields like marketing, medicine, real estate, customer relationship management, engineering, web mining, etc.

Educational data mining is a new emerging and advanced technique of data exploration that can be applied to data related to the field of education. The data can be collected from historical and operational data residing in the databases of educational institutes. The data of students can be personal information or academic performance. This exploration can be applied to education systems, which hold a huge amount of data and information used by most institutes. It uses many techniques for the proper implementation of data mining concepts, such as K-means clustering and K-nearest neighbour. Using these techniques, different kinds of knowledge can be discovered using association rules, classification and grouping.

By using this we extract knowledge that describes students' performance in examinations and all their detail information. From these huge amounts of data, the first task is to sort them out; cluster analysis classifies the raw data in a reasonable way. Clustering divides a set of physical or abstract objects into several groups according to the degree of similarity between them, so that data objects within a group have high similarity while objects in different groups are not similar.

II. RELATED WORK

Abdallah Namoun et al.[1] proposed that the achievement of learning outcomes is measured mainly by course performance (i.e., grade level) and achievement scores (i.e., grades). Reviews and analyses of machine learning models were regularly used to model student performance. Finally, online student activities, assessment scores, and student emotions were the main predictors of achievement. The study identifies the attributes and techniques that determine the performance of students using the PICO strategy. The review has had many difficulties, as it is generally broad, is not focused on the use of student feedback as a benchmark for student performance, experienced quality issues, and has not addressed the most frequently asked questions.

Durgesh Ugale et al.[2] discussed how a pre-processing step is applied to the raw dataset so that the mining algorithm can be used properly. Guiding a student on the basis of predicted "performance" can help improve their results. The solution to the problem is presented very distinctly, but the authors did not comment on the students' responses, did not discuss the limitations, and did not examine the nature of the prediction.

Ali Salah Hashim et al.[3] analyzed the performance of machine learning algorithms.
According to the results of the study, the logistic regression classifier was the most accurate in predicting the final score of the student (68.7% for passing students and 88.8% for failing students). The study describes the techniques used in analytical research to predict student performance; however, the quality of the training data was not assessed and the models were not compared in depth.

Alnassar, Fatema et al.[4] discuss the relationship between the association rule algorithm, the K-means algorithm, and the decision tree. This review examines student performance based on various attributes; the design incorporates questions and answers on lessons, intermediate and final exam results, homework, and lab work. They discuss decision trees, data mining techniques, and a combination of strategies that enable the prediction of student performance, so that educators can take significant steps in developing student knowledge and performance. The performance of the different tree algorithms can be analyzed based on their accuracy and tree-construction time. Rules extracted from the framework helped the teacher to identify struggling students and improve performance. A drawback is the need to know the number of groups in advance; it is difficult to determine the number of groups when there is a slight change in the data.

Saifuzzaman et al.[5] discussed results showing that algorithms and different techniques are used to extract information from education data. A significant number of these algorithms are important for classifying and handling data. In this review, the algorithms C5.0, C4.5, and K-means were identified; in 48% of the sites, these three algorithms are widely used for data analysis, especially in education. The study reports on the overall achievement of the students; the baseline study is highlighted with practically no significant explanation.

Leena H. Alamri et al.[6] reported that the results of the SVM and RF algorithms applied to the two datasets show an accuracy of up to 93%, with the lowest RMSE of 1.13 obtained by RF. The way the analysis is set up makes prediction simple and fast, and it works well across many classes of hypotheses. However, since SVMs are known to be poor probability estimators, their outputs should not be interpreted directly as probabilities.

C. Verma et al.[7] discussed results showing that student interest and first-semester GPA drove the whole decision process, and that the Bayesian network outperforms the decision tree because of its higher accuracy. It works better in terms of income level than in terms of the number of variables; with a small sample, the error in the data may be excessively high.

Ferda Ünal[8] demonstrates the use of data mining technology to identify final outcomes based on student history. In the research, three data mining techniques (decision tree, random forest, and naive Bayes) were applied to two course datasets, mathematics and Portuguese. The results show that mining techniques are valuable in predicting student performance and help establish clear rules for prediction from training data. With the progression of technology, e-learning as a web-based learning platform, and advanced multimedia technology, training costs have been reduced and time constraints and difficulties have been eliminated.

III. PROPOSED WORK

At present, the existing framework only considers the shortcomings of the system that helps to assess student performance. There is no framework to help organize these activities, and the existing setup is not sufficient to assess the performance of the student or to anticipate future performance.

The project intends to develop a reliable model using data mining technology and the necessary
extraction, so that this education data can be considered as an essential management tool.

The proposed framework uses mining methods to assess performance and identify undesirable practices.

In the field of education, data mining is used in a variety of ways, involving student participation, staff input, extracurricular activities, and stress. Data mining procedures are used to determine student performance using the KNN and Naïve Bayes algorithms.

IV. SYSTEM ARCHITECTURE

A. KNN ALGORITHM
STEP 1: BEGIN
STEP 2: Input: D = {(x1, c1), . . . , (xN, cN)}
STEP 3: x = (x1, . . . , xn) new instance to be classified
STEP 4: FOR each labelled instance (xi, ci) calculate d(xi, x)
STEP 5: Order d(xi, x) from lowest to highest, (i = 1, . . . , N)
STEP 6: Select the K nearest instances to x: Dkx
STEP 7: Assign to x the most frequent class in Dkx
STEP 8: END

B. NAIVE BAYES ALGORITHM
It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as 'Naive'.
STEP 1: BEGIN
P(c|x) = P(x|c) * P(c) / P(x)

Above, P(c|x) is the posterior probability of class c given predictor x, P(c) is the prior probability of the class, P(x|c) is the likelihood of the predictor given the class, and P(x) is the prior probability of the predictor.

VI. MODULES

A. Data Collection
This module assists in gathering student data: student scores, attendance, staff feedback, extracurricular activities, stress, and much more. In this module, students' data will be gathered from the respective educational institution. Students' data such as marks, attendance, staff opinion, extracurricular activities, ragging etc. will be collected.

B. Preprocessing
Adapting the data removes noise and conflicts; feature selection should be carried out before the data is used. Data pre-processing is done to remove the noisy and inconsistent data. Data should be pre-processed before it is used in the feature selection job.

Fig. 2. Dataset.
This data approaches student achievement in secondary education at two schools. The data attributes include student grades, demographic, social and school-related features, and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance; the two datasets were modelled under binary/five-level classification and regression tasks.

B. Input
Fig. 3. Input.

C. Prediction
Fig. 4. Prediction.

D. Graph
Fig. 5. Graph.

VIII. CONCLUSION
In this paper, the classification function is used to determine the student's category using the basic records available in the student files.

Based on key areas such as attendance, school tests, workshops, and the results provided, the student's grades are completed. This basic analysis helps students and teachers to further improve student outcomes.

The training also focuses on identifying students who need attention and on the best way to help them complete the next test. This helps the students to improve their learning and perform well in their tests.

IX. REFERENCES

[1] Namoun, Abdallah, and Abdullah Alshanqiti. "Predicting student performance using data mining and learning analytics techniques: a systematic literature review." Applied Sciences 11, no. 1 (2021): 237.

[2] Ugale, Durgesh, Jeet Pawar, Sachin Yadav, and Chandrashekhar Raut. "Student Performance Prediction Using Data Mining Techniques." (2020).

[3] Hashim, Ali Salah, Wid Akeel Awadh, and Alaa Khalaf Hamoud. "Student performance prediction model based on supervised machine learning algorithms." In IOP Conference Series: Materials Science and Engineering, vol. 928, no. 3, p. 032019. IOP Publishing, 2020.

[4] Alnassar, Fatema, Tim Blackwell, Elaheh Homayounvala, and Matthew Yee-king. "How Well a Student Performed? A Machine Learning Approach to Classify Students' Performance on Virtual Learning Environment." In 2021 2nd International Conference on Intelligent Engineering and Management (ICIEM), pp. 1-6. IEEE, 2021.

[5] Saifuzzaman, M., Parvin, M., Jahan, I., Moon, N.N., Nur, F.N. and Shetu, S.F., 2021, June. Machine Learning Approach to Predict SGPA and CGPA. In 2021 International Conference on Artificial Intelligence and Computer Science Technology (ICAICST) (pp. 211-216). IEEE.

[6] H. Alamri, Leena, Ranim S. Almuslim, Mona S. Alotibi, Dana K. Alkadi, Irfan Ullah Khan, and Nida Aslam. "Predicting Student Academic Performance using Support Vector Machine and Random Forest." In 2020 3rd International Conference on Education Technology Management, pp. 100-107. 2020.

[7] Verma, C., Illés, Z. and Stoffová, V., 2019, February. Age group predictive models for the real time prediction of the university students using machine learning: Preliminary results. In 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT) (pp. 1-7). IEEE.

[8] Ünal, Ferda. "Data mining for student performance prediction in education." Data Mining-Methods, Applications and Systems (2020).
performers in scientific literacy using a machine
learning approach. Research in Science
Education, 51(1), pp.129-158.