
Informatics in Medicine Unlocked 16 (2019) 100203


Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques
C. Beulah Christalin Latha, S. Carolin Jeeva∗
Karunya Institute of Technology and Sciences, India

ARTICLE INFO

Keywords: Heart disease; Machine learning; Ensemble classifier; Prediction model

ABSTRACT

Machine learning involves artificial intelligence, and it is used in solving many problems in data science. One common application of machine learning is the prediction of an outcome based upon existing data. The machine learns patterns from the existing dataset, and then applies them to an unknown dataset in order to predict the outcome. Classification is a powerful machine learning technique that is commonly used for prediction. Some classification algorithms predict with satisfactory accuracy, whereas others exhibit limited accuracy. This paper investigates a method termed ensemble classification, which is used for improving the accuracy of weak algorithms by combining multiple classifiers. Experiments with this tool were performed using a heart disease dataset. A comparative analytical approach was taken to determine how the ensemble technique can be applied for improving prediction accuracy in heart disease. The focus of this paper is not only on increasing the accuracy of weak classification algorithms, but also on the implementation of the algorithm with a medical dataset, to show its utility in predicting disease at an early stage. The results of the study indicate that ensemble techniques, such as bagging and boosting, are effective in improving the prediction accuracy of weak classifiers, and exhibit satisfactory performance in identifying risk of heart disease. A maximum increase of 7% accuracy for weak classifiers was achieved with the help of ensemble classification. The performance of the process was further enhanced with a feature selection implementation, and the results showed significant improvement in prediction accuracy.

1. Introduction

One of the prominent diseases that affect many people during middle or old age is heart disease, and in many cases it eventually leads to fatal complications [3]. Heart diseases are more prevalent in men than in women. According to statistics from the WHO, it has been estimated that 24% of deaths due to non-communicable diseases in India are caused by heart ailments [12,19]. One-third of all global deaths are due to heart diseases [10]. Half of the deaths in the United States and in other developed countries are due to heart ailments [18]. Around 17 million people die due to cardiovascular disease (CVD) every year worldwide, and the disease is highly prevalent in Asia [2,12,13]. The Cleveland Heart Disease Database (CHDD) is considered the de facto database for heart disease research [17].

Age, sex, smoking, family history, cholesterol, poor diet, high blood pressure, obesity, physical inactivity, and alcohol intake are considered to be risk factors for heart disease, and hereditary risk factors such as high blood pressure and diabetes also lead to heart disease. Some risk factors are controllable. Apart from the above factors, lifestyle habits such as eating habits, physical inactivity, and obesity are also considered to be major risk factors [5,8,15]. There are different types of heart disease, such as coronary heart disease, angina pectoris, congestive heart failure, cardiomyopathy, congenital heart disease, arrhythmias, and myocarditis. It is difficult to manually determine the odds of getting heart disease based on risk factors [1]. However, machine learning techniques are useful for predicting the output from existing data. Hence, this paper applies one such machine learning technique, called classification, for predicting heart disease risk from the risk factors. It also tries to improve the accuracy of predicting heart disease risk using a strategy termed ensemble.

2. Literature review

Machine learning or data mining is useful for a diverse set of problems. One of the applications of this technique is in predicting a dependent variable from the values of independent variables. The healthcare field is an application area of data mining, since it has vast data resources that are difficult to handle manually.


∗ Corresponding author. E-mail address: caroljeeva@gmail.com (S.C. Jeeva).
https://doi.org/10.1016/j.imu.2019.100203
Received 26 November 2018; Received in revised form 30 June 2019; Accepted 1 July 2019. Available online 02 July 2019.
2352-9148/© 2019 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Heart disease has been identified as one of the largest causes of death even in developed countries [20]. One of the reasons for fatality due to heart disease is that the risks are either not identified, or they are identified only at a later stage. However, machine learning techniques can be useful for overcoming this problem and predicting risk at an early stage. Some of the techniques used for such prediction problems are Support Vector Machines (SVM), neural networks, decision trees, regression, and Naïve Bayes classifiers. SVM was identified as the best predictor with 92.1% accuracy, followed by neural networks with 91% accuracy, and decision trees showed a lesser accuracy of 89.6% [23]. Sex, age, smoking, hypertension, and diabetes were considered to be the risk factors for heart disease [22]. Analytical studies on data mining techniques for heart disease prediction reveal that neural networks, decision trees, Naïve Bayes, and associative classification are powerful in predicting heart disease. Associative classification produces high accuracy and strong flexibility as compared with traditional classifiers, even in handling unstructured data [7,8].

A comparative analysis of classification techniques has shown that decision tree classifiers are simple and accurate [9]. Naïve Bayes was found to be the best algorithm, followed by neural networks and decision trees [7]. Artificial neural networks are also employed for the prediction of diseases. Supervised networks have been used for diagnosis, and they can be trained using the back propagation algorithm. The experimental results have shown satisfactory accuracy [16].

Existing research has used ensemble methods to improve classification accuracy in the prediction of heart disease [2]. A combination of genetic algorithms and neural networks based on fuzzy logic for feature extraction exhibited an increase in accuracy of up to 99.97% [6]. A genetic algorithm based trained recurrent fuzzy neural network produced an accuracy of 97.78% for diagnosing heart disease [10]. Classification accuracy of up to 93% was achieved in the prediction of heart disease risk using a rough set based classification system with a different dataset [22]. Neural networks were also used to reduce human error in the detection and measurement of blood sugar, blood pressure, and heart disease [15,18]. A new model, the coactive neuro-fuzzy inference system (CANFIS), combined with neural networks, fuzzy logic, and genetic algorithms, was shown to produce good results for predicting heart disease. The genetic algorithm was used for tuning the parameters of CANFIS automatically, and for the selection of an optimal feature set. The model was shown to be a useful tool for assisting medical professionals in predicting heart disease [11]. In order to obtain better accuracy, an additional step of feature selection has been proposed [19].

SVM based classifiers have been shown to provide highly accurate output for classifying heartbeats. The parameters were optimized using particle swarm optimization (PSO), which improved the performance of the classifier [1,21]. The K-means clustering algorithm was utilized to extract data from the dataset, and the frequent patterns were mined using the Maximal Frequent Itemset Algorithm (MAFIA) for predicting heart disease based on different weightage assigned to different factors. The frequent patterns having a value greater than a specific threshold were found to be precise in detecting the occurrence of a myocardial infarction [18]. Though various methods have been used for predicting heart disease risk with good accuracy in state-of-the-art research, some classification algorithms identify heart disease risk with poor accuracy. Most of the state-of-the-art research that produces high accuracy employs a hybrid method which includes classification algorithms. Our study focused on improving weak classification algorithms by combining them with other classification algorithms. This assists not only in increasing the efficiency of such classification algorithms, but also the prediction accuracy for heart disease. Ensemble techniques such as bagging, boosting, majority voting, and stacking are investigated, and the results are evaluated. The results are further enhanced by applying feature selection. The results indicate how these classifiers can effectively be used in the medical field.

3. Materials and methods

3.1. Description of the dataset

The Cleveland heart dataset from the UCI machine learning repository has been used for the experiments. The dataset consists of 14 attributes and 303 instances. There are 8 categorical attributes and 6 numeric attributes. The description of the dataset is shown in Table 1.

Table 1
Feature information of the Cleveland dataset.

S.No  Attribute  Description                                                                                  Range of Values
1     Age        Age of the person in years                                                                   29 to 79
2     Sex        Gender of the person [1: Male, 0: Female]                                                    0, 1
3     Cp         Chest pain type [1: typical angina; 2: atypical angina; 3: non-anginal pain; 4: asymptomatic] 1, 2, 3, 4
4     Trestbps   Resting blood pressure in mm Hg                                                              94 to 200
5     Chol       Serum cholesterol in mg/dl                                                                   126 to 564
6     Fbs        Fasting blood sugar in mg/dl                                                                 0, 1
7     Restecg    Resting electrocardiographic results                                                         0, 1, 2
8     Thalach    Maximum heart rate achieved                                                                  71 to 202
9     Exang      Exercise induced angina                                                                      0, 1
10    OldPeak    ST depression induced by exercise relative to rest                                           1 to 3
11    Slope      Slope of the peak exercise ST segment                                                        1, 2, 3
12    Ca         Number of major vessels colored by fluoroscopy                                               0 to 3
13    Thal       3: normal; 6: fixed defect; 7: reversible defect                                             3, 6, 7
14    Num        Class attribute                                                                              0 or 1

Patients from age 29 to 79 have been selected in this dataset. Male patients are denoted by a gender value of 1 and female patients by a gender value of 0. Four types of chest pain can be considered as indicative of heart disease. Type 1 (typical) angina is caused by reduced blood flow to the heart muscles because of narrowed coronary arteries. Type 2 (atypical) angina is chest pain that occurs during mental or emotional stress. Non-anginal chest pain may be caused by various reasons and may not often be due to actual heart disease. The fourth type, asymptomatic, may not be a symptom of heart disease. The next attribute, trestbps, is the reading of the resting blood pressure. Chol is the serum cholesterol level. Fbs is the fasting blood sugar level; the value is assigned as 1 if the fasting blood sugar is above 120 mg/dl and 0 otherwise. Restecg is the resting electrocardiographic result, thalach is the maximum heart rate, exang is the exercise induced angina, which is recorded as 1 if there is pain and 0 if there is no pain, oldpeak is the ST depression induced by exercise, slope is the slope of the peak exercise ST segment, ca is the number of major vessels colored by fluoroscopy, thal is the thallium stress test result (3: normal, 6: fixed defect, 7: reversible defect, as in Table 1), and num is the class attribute. The class attribute has a value of 0 for normal and 1 for patients diagnosed with heart disease.

3.2. Classification and ensemble algorithms

Classification is a supervised learning procedure that is used for predicting the outcome from existing data. This paper proposes an approach for the diagnosis of heart disease using classification algorithms, and improves the classification accuracy using an ensemble of classifiers. The dataset has been divided into a training set and a test set, and the individual classifiers are trained using the training dataset. The efficiency of the classifiers is tested with the test dataset. The working of the individual classifiers is explained in the next section.
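To make this setup concrete, the listing below shows how the Cleveland dataset could be loaded and split into training and test sets. The study itself used the Weka tool, so this Python/scikit-learn version is only an illustrative analogue; the file name processed.cleveland.data, the column names, and the simple preprocessing choices are assumptions based on the standard UCI distribution of the dataset, not details taken from the paper.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Column names follow Table 1; the file is the standard UCI distribution
    # of the Cleveland heart dataset (an assumption, not part of the paper).
    columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
               "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]
    data = pd.read_csv("processed.cleveland.data", names=columns,
                       na_values="?")           # missing values are coded as '?'
    data = data.dropna()                         # simple cleaning step
    data["num"] = (data["num"] > 0).astype(int)  # collapse classes 1-4 into 1

    X, y = data.drop(columns="num"), data["num"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    print(X_train.shape, X_test.shape)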


3.2.1. Bayes Net

The Bayesian network is a graphical prediction model based on probability theory. Bayesian networks are built from probability distributions, and they utilize the laws of probability for prediction and diagnosis. Bayesian networks support both discrete and continuous variables. The network is represented as a set of variables whose conditional dependencies are described using acyclic directed graphs. In a Bayesian network, edges between the nodes represent dependent features, whereas nodes that are not connected are conditionally independent. Let X be an evidence that depends on n attributes, X = {A1, A2, …, An}. Let H be the hypothesis that the evidence belongs to a class C. The probability of the hypothesis H given the evidence X is represented as P(H|X), and P(X|H) is the probability of the evidence X conditioned on H. The posterior probability P(H|X) can be calculated using the Bayes theorem, as shown in equation (1):

P(H|X) = P(X|H) P(H) / P(X)    (1)

where P(H) is the probability of the hypothesis being true, P(X) is the probability of the evidence, P(X|H) is the probability of the evidence given that the hypothesis is true, and P(H|X) is the probability of the hypothesis given that the evidence is present.

3.2.2. Naive Bayes

The Naïve Bayes classifier, or simply the Bayesian classifier, is based on the Bayes theorem. It is a special case of the Bayesian network, and it is a probability based classifier. In the Naïve Bayes network, all features are conditionally independent, so changes in one feature do not affect another feature. The Naïve Bayes algorithm is suitable for classifying high dimensional datasets. The classifier relies on conditional independence, which assumes that an attribute value is independent of the values of the other attributes in a class.

Let D be a set of training data and associated class labels. Each tuple in the dataset is defined by n attributes, represented by X = {A1, A2, …, An}. Let there be m classes represented by C1, C2, …, Cm. For a given tuple X, the classifier predicts that X belongs to the class having the highest posterior probability conditioned on X. The Naïve Bayes classifier predicts that the tuple X belongs to the class Ci if and only if

P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i    (2)

Thus, P(Ci|X) is maximized. The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. According to Bayes' theorem,

P(Ci|X) = P(X|Ci) P(Ci) / P(X)    (3)

If the attribute values are conditionally independent of one another,

P(X|Ci) = ∏(k = 1 to n) P(xk|Ci)    (4)

where xk refers to the value of attribute Ak for tuple X. If Ak is categorical, then P(xk|Ci) is the number of tuples of class Ci in D having the value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D. The classifier predicts that the class label of X is the class Ci if and only if

P(X|Ci) P(Ci) > P(X|Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i    (5)

Bayesian classifiers are effective in the sense that they have the minimum error rate for classification.

3.2.3. Random forest

Random forest is a tree based classification algorithm. As the name indicates, the algorithm creates a forest with a large number of trees. It is an ensemble algorithm which combines multiple algorithms: it creates a set of decision trees from a random sample of the training set, repeats the process with multiple random samples, and makes a final decision based on majority voting. The random forest algorithm is effective in handling missing values, but it is prone to overfitting; appropriate parameter tuning can be applied to avoid overfitting. The algorithm for random forest is shown in Fig. 1.

    Let D = {(x1, y1), …, (xn, yn)} be a training set
    Let E = {h1(x), h2(x), …, hK(x)} be an ensemble of weak classifiers
    If each hk is a decision tree, the parameters of tree k are defined as Θk = (θk1, θk2, …)
    Each decision tree k leads to a classifier hk(x) = h(x | Θk)
    Final classification: f(x) = majority vote of {hk(x), k = 1, …, K}

Fig. 1. Random forest algorithm.

3.2.4. C4.5

The C4.5 algorithm is derived from the ID3 algorithm, which is a simple decision tree algorithm. The algorithm was proposed by Quinlan. It uses the information gain ratio as the metric for splitting the trees. It accepts data as input and produces a decision tree as output. This algorithm creates univariate trees, and classification rules are framed in the form of decision trees. The splitting of trees is halted when the split is below a certain threshold value. It performs error based pruning, and it is a good algorithm for handling numeric attributes. The algorithm for generating a decision tree from training tuples using the C4.5 algorithm is shown in Fig. 2.

    Let N be a node
    Let D be the set of training tuples and A the set of attributes = {a1, a2, …, an}
    if all tuples in D belong to the same class C then
        return N as a leaf node labeled with class C
    if A = ∅ then
        return N as a leaf node labeled with the majority class in D
    Split N with the best splitting criterion
    for each splitting outcome j
        Dj = set of tuples in D satisfying outcome j
        if Dj = ∅ then
            attach a leaf labeled with the majority class in D to node N;
        else
            attach the node returned by Generate_decision_tree(Dj, attribute list) to node N;
    end for
    return N

Fig. 2. C4.5 algorithm.

3.2.5. Multilayer perceptron

In the multilayer perceptron algorithm, artificial neurons are arranged in multiple layers, including one or more hidden layers. These algorithms are used for binary classification problems. Multilayer perceptrons evolved from models of biological neurons, and the artificial neurons they use are called perceptrons. Each neuron applies an activation function that maps its weighted inputs to its output. A perceptron learns by varying the weights assigned to it. The algorithm for a multilayer perceptron is shown in Fig. 3.

3.2.6. PART

PART is the acronym for Projective Adaptive Resonance Theory. PART is a rule-based classification algorithm. It is a neural network developed by Cao and Wu, and an advanced version of the C4.5 and RIPPER algorithms. The PART algorithm is suitable for high dimensional datasets. The key feature of the PART network lies in the presence of a hidden layer of neurons, which calculates the variations between the output and input neurons and works on reducing the similarity differences.
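To make equations (2)-(5) of Section 3.2.2 concrete, the toy listing below computes the unnormalized posteriors P(X|Ci)P(Ci) for a tuple with two categorical attributes (cp, exang). The counts are invented for illustration only and are not taken from the Cleveland data.

    from collections import Counter

    # Toy training tuples: (cp, exang) -> class label (0 = normal, 1 = disease).
    train = [((1, 0), 0), ((2, 0), 0), ((1, 0), 0), ((4, 1), 1),
             ((4, 1), 1), ((3, 1), 1), ((4, 0), 1), ((2, 0), 0)]

    classes = Counter(label for _, label in train)
    n = len(train)

    def p_attr(value, index, label):
        """P(xk | Ci): fraction of class-Ci tuples having this attribute value."""
        in_class = [x for x, y in train if y == label]
        return sum(1 for x in in_class if x[index] == value) / len(in_class)

    def posterior(x):
        """Unnormalized P(Ci | X) = P(X | Ci) P(Ci), per equations (3) and (4)."""
        scores = {}
        for label, count in classes.items():
            prior = count / n
            likelihood = 1.0
            for k, value in enumerate(x):
                likelihood *= p_attr(value, k, label)  # product of equation (4)
            scores[label] = likelihood * prior
        return scores

    # Class 1 should win the comparison of equation (5) for this tuple.
    print(posterior((4, 1)))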

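The individual learners of Section 3.2 could be approximated in scikit-learn as follows. This is a sketch, not the paper's Weka configuration: J48/C4.5 is approximated by an entropy-based CART tree, Bayes Net and PART have no direct scikit-learn equivalent and are omitted, and X_train/y_train are assumed to come from the earlier data-loading sketch.

    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier

    # Rough scikit-learn analogues of the paper's individual classifiers.
    models = {
        "Naive Bayes": GaussianNB(),
        "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "C4.5-like tree": DecisionTreeClassifier(criterion="entropy", random_state=42),
        "Multilayer perceptron": MLPClassifier(hidden_layer_sizes=(20,),
                                               max_iter=2000, random_state=42),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)              # train on the training split
        print(f"{name}: {model.score(X_test, y_test):.3f}")  # test-set accuracy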

    Initialize all weights and biases in N, where N is the network
    while the terminating condition is not satisfied {
        for each training tuple X in D {
            for each input layer unit j {
                Oj = Ij
            }
            for each hidden or output layer unit j {
                Ij = Σi wij Oi + θj
                Oj = 1 / (1 + e^(−Ij))
            }
            for each unit j in the output layer
                Errj = Oj (1 − Oj)(Tj − Oj)
            for each unit j in the hidden layers, from the last to the first hidden layer
                Errj = Oj (1 − Oj) Σk Errk wjk
            for each weight wij in N {
                Δwij = l · Errj · Oi
                wij = wij + Δwij
            }
            for each bias θj in N {
                Δθj = l · Errj
                θj = θj + Δθj
            }
        }
    }

Fig. 3. Multilayer perceptron algorithm.

3.3. Ensemble techniques

Ensemble is a strategy that can be used to improve the accuracy of a classifier. It is an effective meta classification technique that combines weak learners with strong learners to improve the efficacy of the weak learner. In this paper, the ensemble technique is used to improve the accuracy of various algorithms for heart disease prediction. The aim of combining multiple classifiers is to obtain better performance as compared with an individual classifier. The procedure for ensemble is shown in Fig. 4.

Fig. 4. The ensemble process: multiple training sets are drawn from the data, a classifier is trained on each, and their predictions on the test set are combined using averaging, majority voting, or weighted averaging.

3.3.1. Boosting

Boosting is an algorithm used for ensembling. In boosting, the original dataset is divided into various subsets. The classifier is trained with each subset to produce a series of models of moderate performance. New subsets are created based on the elements that are not correctly classified by the previous model. Then, the ensembling process boosts their performance by combining the weak models together using a cost function. The algorithm for boosting is shown in Fig. 5.

    Let D = {d1, d2, d3, …, dn} be the given dataset
    E = {}, the set of ensemble classifiers
    C = {c1, c2, c3, …, cn}, the set of classifiers
    X = the training set, X ⊆ D
    Y = the test set, Y ⊆ D
    L = n(D)
    Let init = 1
    S(init) = a random subset of X, S(init) ⊆ X
    M(0) = {}
    for i = 1 to L do
        if i > 1
            S(i) = set of incorrectly classified instances of M(i−1) + S(i)
            M(i) = model trained using C(i) on S(i)
            E = E ∪ C(i)
        end if
    next i
    for i = 1 to L
        R(i) = Y classified by E(i)
    next i
    Result = max(R(i) : i = 1, 2, …, n)

Fig. 5. Algorithm for boosting.

3.3.2. Bagging

Bagging is also known as bootstrap aggregation. Bagging randomly selects some patterns from the training set with replacement. The newly created training set will have the same number of patterns as the original training set, with a few omissions and repetitions. The new training set is known as a bootstrap replicate. In bagging, bootstrap samples are fetched from the data and the classifier is trained with each sample. The votes from the classifiers are combined, and the classification result is selected based on majority voting or averaging. Research shows that bagging can be used to increase the performance of a weak classifier optimally. Bagging decreases the variance of prediction, since it generates multiple sets of data from random samples of the original dataset, with replacement. The algorithm for bagging is shown in Fig. 6.

    Let D = {d1, d2, d3, …, dn} be the given dataset
    E = {}, the set of ensemble classifiers
    C = {c1, c2, c3, …, cn}, the set of classifiers
    X = the training set, X ⊆ D
    Y = the test set, Y ⊆ D
    L = n(D)
    for i = 1 to L do
        S(i) = bootstrap sample i drawn from X with replacement
        M(i) = model trained using C(i) on S(i)
        E = E ∪ C(i)
    next i
    for i = 1 to L
        R(i) = Y classified by E(i)
    next i
    Result = max(R(i) : i = 1, 2, …, n)

Fig. 6. Algorithm for bagging.
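Weka's Bagging and AdaBoost.M1 meta-learners, as used in this study, have close scikit-learn analogues. The listing below is illustrative only: it assumes the X_train/X_test split from the earlier sketch and scikit-learn ≥ 1.2 for the estimator parameter, and AdaBoostClassifier implements the SAMME generalization of AdaBoost.M1 rather than AdaBoost.M1 itself.

    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # A C4.5-like weak learner.
    weak = DecisionTreeClassifier(criterion="entropy", random_state=42)

    # Bagging: each tree sees a bootstrap replicate of the training set,
    # and predictions are combined by voting (Fig. 6).
    bagged = BaggingClassifier(estimator=weak, n_estimators=50, random_state=42)
    bagged.fit(X_train, y_train)
    print("bagging:", bagged.score(X_test, y_test))

    # Boosting: each round emphasizes the instances the previous model
    # misclassified (Fig. 5).
    boosted = AdaBoostClassifier(estimator=weak, n_estimators=50, random_state=42)
    boosted.fit(X_train, y_train)
    print("boosting:", boosted.score(X_test, y_test))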


3.3.3. Stacking

Stacking is an ensemble technique in which multiple classification models are combined via a meta classifier. Multiple layers are placed one after the other, where each of the models passes its predictions to the model in the layer above, and the model in the topmost layer makes decisions based on the models below. The bottom layer models receive input features from the original dataset; the top layer model takes the output from the bottom layer and makes the prediction. In stacking, the original data is provided as input to several individual models. Then the meta classifier is used to estimate the input together with the output of each model, and the weights are estimated. The best performing models are selected and the others are discarded. Stacking combines multiple base classifiers, trained using different learning algorithms L on a single dataset S, by means of a meta classifier. The algorithm for stacking is shown in Fig. 7.

    Let D = {d1, d2, d3, …, dn} be the given dataset
    E = {E1, E2, E3, …, En}, the set of ensemble classifiers
    C = {c1, c2, c3, …, cn}, the set of classifiers
    X = the training set, X ⊆ D
    Y = the test set, Y ⊆ D
    K = meta level classifier
    L = n(D)
    for i = 1 to L do
        M(i) = model trained using E(i) on X
    next i
    M = M ∪ K
    Result = Y classified by M

Fig. 7. Algorithm for stacking.

3.3.4. Majority vote

The majority voting classifier is a meta classifier that is used to combine any classifiers through majority voting. The final class label is the class label that has been predicted by a majority of the classifiers. The final class label dJ is defined as

dJ = mode {C1, C2, …, Cn}

where {C1, C2, …, Cn} represents the individual classifiers that participate in the voting. The majority voting algorithm is shown in Fig. 8.

    Let d(i,j) ∈ {0, 1} be the prediction of the i-th of T classifiers for the j-th class label
    Choose the class J such that Σ(i = 1 to T) d(i,J) = max over j of Σ(i = 1 to T) d(i,j)
    If each classifier is correct with probability p, the ensemble classifier's probability
    of a correct decision is
    P = Σ(k = ⌊T/2⌋ + 1 to T) (T choose k) p^k (1 − p)^(T − k)

Fig. 8. Algorithm for majority voting.

4. Experiments and results

4.1. Performance of the classifier with ensemble

A comparative analysis of various classification algorithms on the Cleveland dataset has been performed. Some algorithms show good accuracy, whereas some other algorithms perform poorly. In order to improve the performance of the weak classifiers, ensemble algorithms are used. This work has used ensemble algorithms such as bagging, boosting, voting, and stacking. The bagging algorithm performs an ensemble with the Naïve Bayes, Random Forest, Bayes Net, C4.5, multilayer perceptron, and PART algorithms. For boosting, the AdaBoost.M1 algorithm has been used, and ensembles are created using the Naïve Bayes, Random Forest, Bayes Net, C4.5, multilayer perceptron, and PART classifiers. Majority voting has also been used as one of the ensemble techniques. In stacking, the Naïve Bayes classifier is used as the meta classifier, and the results are obtained by stacking one, two, and three more classifiers, respectively. The results show that weak classifiers can perform better when they are ensembled. The Weka tool is used for classification of the dataset.

At the outset, the dataset is cleaned and preprocessed for missing data and invalid data. Then, classifiers such as SVM, Naive Bayes, Bayes Net, C4.5, Multilayer Perceptron, and PART are used for the classification of the dataset. C4.5, Multilayer Perceptron, and PART are found to be weaker than classifiers such as Naive Bayes, Random forest, and Bayes Net. Since ensemble is a proven strategy for boosting the classification accuracy, the weak learners are tested with the meta classification algorithms. Three types of techniques, namely bagging, boosting, and stacking, are used for ensembling, and the results are analyzed. Ten-fold cross validation was utilized to evaluate the performance of the classification models. In this approach, the entire dataset is divided into ten subsets and processed ten times, where nine subsets are used for training and the remaining subset is used for testing. Finally, the results are obtained by averaging over the ten iterations.

Fig. 9 compares the classification accuracy of individual classifiers and with bagging. When the dataset is classified using individual classifiers, the accuracy rates of Naïve Bayes, Random forest, Bayes Net, C4.5, Multilayer Perceptron, and PART are found in the range of 75.58%–83.17%. The Naïve Bayes classifier exhibits the best accuracy of 83.17%, whereas C4.5, Multilayer Perceptron, and PART show comparatively poor accuracy of less than 80%. It has been inferred from the results that the bagging technique can increase the classification accuracy by up to 6.92%.

Fig. 10 shows the results of the ensemble technique, namely boosting. There was an increase of 0.99% for the Naïve Bayes algorithm, 1.65% for the Bayes Net, 1% for the multilayer perceptron, and 5.94% for PART through boosting. The Naïve Bayes algorithm produced the highest accuracy value with boosting.

Majority voting is another ensemble strategy that combines multiple classifiers in order to improve their accuracy. In the proposed approach, for the Cleveland dataset, the C4.5, multilayer perceptron, and PART classifiers turned out to be weak classifiers and showed less accuracy, while Naïve Bayes and Bayes Net performed well and had better classification accuracy. It is inferred from Fig. 11 that an ensemble of weak classifiers with strong classifiers using majority voting improves the accuracy of the weak classifier to a considerable extent. Ensembling C4.5 with the strong classifiers improved the accuracy by 3.3%. Ensembling PART with the strong classifier set improved the accuracy by 7.26%. Ensembling the multilayer perceptron with the strong classifier set improved the accuracy by 3.65%.

Stacking is a methodology used for ensembling in which one or more base level classifiers are stacked with a meta level classifier. In this paper, random forest and random tree classifiers are used as meta classifiers. Naïve Bayes, Bayesian Network, C4.5, and PART are chosen as the base level classifiers. It is inferred from Fig. 12 that stacking with random forest produces better accuracy than stacking with random tree as the meta classifier.

While stacking with random tree, the PART algorithm alone showed an increase in accuracy of 1.98%, whereas all of the other algorithms showed a decline of up to 2.96%. However, when stacked with random forest as the meta classifier, the accuracy of the weak classifiers improved: the accuracy of the Bayesian network improved by 0.99%, C4.5 by 3.3%, the multilayer perceptron by 3.64%, and PART by 6.93%. It is inferred that when the weak classifiers are stacked with random forest, the accuracy is higher than when they are stacked with random tree.

A comparative analysis of bagging and boosting is shown in Fig. 13. The results show that both bagging and boosting are efficient in increasing the accuracy of weak classifiers, with bagging showing the better improvement for all weak classifiers. The computation time, calculated as the average of 100 runs and reported in seconds, is compared for the classifiers with the bagging and boosting techniques in Table 2.

A comparison of the various ensembling strategies reveals that the accuracy of the weak classifiers can be increased by a maximum of 7.26%. The maximum increase in accuracy of a weak classifier with the various ensembling techniques is shown in Fig. 14. The results show that ensemble is a good strategy for improving the accuracy of weak classifiers, and majority voting produces the highest increase in accuracy.
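Majority voting and stacking can be sketched the same way. The listing below is an illustrative scikit-learn analogue of the paper's Weka setup, not the authors' exact configuration; it reuses the base learners and data split assumed in the earlier sketches, and the last lines evaluate the majority-vote probability formula of Fig. 8 for an assumed T = 5 and p = 0.75.

    from math import comb
    from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                                  VotingClassifier)
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    base = [("nb", GaussianNB()),
            ("tree", DecisionTreeClassifier(criterion="entropy", random_state=42)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=42))]

    # Hard voting: the predicted label is the mode of the base predictions,
    # as in the majority-vote rule of Section 3.3.4.
    voter = VotingClassifier(estimators=base, voting="hard").fit(X_train, y_train)
    print("voting:", voter.score(X_test, y_test))

    # Stacking: base predictions feed a meta-level classifier (random forest
    # here, mirroring the paper's stacking-with-RF configuration).
    stack = StackingClassifier(estimators=base,
                               final_estimator=RandomForestClassifier(random_state=42))
    stack.fit(X_train, y_train)
    print("stacking:", stack.score(X_test, y_test))

    # Probability that a majority of T independent classifiers, each correct
    # with probability p, votes correctly (the formula of Fig. 8).
    T, p = 5, 0.75
    p_maj = sum(comb(T, k) * p**k * (1 - p)**(T - k) for k in range(T // 2 + 1, T + 1))
    print(f"majority of {T} classifiers: {p_maj:.3f}")  # exceeds p when p > 0.5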

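The ten-fold cross validation protocol described in Section 4.1 is a single call in scikit-learn: with an integer cv and a classifier, cross_val_score uses stratified folds, each fold serving once as the test set while the other nine are used for training. A sketch, assuming the voter model and the full X, y from the snippets above:

    from sklearn.model_selection import cross_val_score

    scores = cross_val_score(voter, X, y, cv=10)  # stratified 10-fold by default
    print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")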

Fig. 9. Improvement in accuracy of classifiers with bagging (accuracy, %, of NB, Bayes Net, C4.5, Multilayer Perceptron, and PART: base classifier versus bagging).

Fig. 10. Improvement in accuracy of classifiers with boosting (accuracy, %, of NB, Bayes Net, C4.5, Multilayer Perceptron, and PART: base classifier versus boosting).

Fig. 11. Classifiers with majority voting (accuracy, %).

4.2. Performance enhancement using feature selection

The accuracy of the classifiers was further improved using feature selection [4]. Six sets of features were selected for the evaluation of the performances. The attributes 'age' and 'sex' are considered the personal information of the patient, and the remaining 11 attributes are collected from the medical observation of the patient. A brute force method is applied, with a lower bound of a minimum of 3 attributes [14]. In this work, all of the possible combinations of 3 attributes from the 13 attributes were selected, and each combination was tested with the classifiers. Secondly, the experiment was repeated to select the possible combinations of 4 attributes from the total 13 attributes.

The maximum number of combinations of 13 attributes, without considering the empty set, is 2^n − 1. In this experiment, combinations of fewer than 3 attributes are omitted. The total number of combinations is therefore derived as follows:

Total number of combinations = 2^n − n!/(1!(n − 1)!) − n!/(2!(n − 2)!) − 1 = 2^n − ((n² + n)/2 + 1)

where n represents the 13 attributes.

The feature sets are named FS1, FS2, FS3, FS4, FS5, and FS6. The descriptions of the feature sets are shown below:

FS1 = {sex, cp, fbs, restecg, oldpeak, ca, thal}
FS2 = {age, sex, cp, chol, fbs, exang, oldpeak, slope, ca}
FS3 = {sex, cp, fbs, thalach, exang, slope, ca, thal}
FS4 = {sex, cp, thalach, exang, oldpeak, ca}
FS5 = {age, sex, cp, chol, restecg, oldpeak, slope, ca, thal}
FS6 = {sex, cp, trestbps, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal}

The improvement in the accuracy of bagging with feature selection is tabulated in Table 3. The highest increase in accuracy, 2.31%, was observed for the C4.5 classifier with bagging, with feature set FS1. The accuracy of the multilayer perceptron was increased by 0.66% by feature sets FS4 and FS6. The accuracy of the random forest classifier was increased by 1.65% with feature set FS6.
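The subset count above can be verified directly. A small sketch, assuming the 13 predictor names from Table 1; it enumerates every combination of 3 or more attributes and checks the closed form (for n = 13: 8192 − 13 − 78 − 1 = 8100):

    from itertools import combinations

    features = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
    n = len(features)

    # All candidate feature sets of size 3 up to 13.
    subsets = [c for r in range(3, n + 1) for c in combinations(features, r)]
    print(len(subsets))                    # 8100 candidate feature sets
    print(2**n - (n*n + n)//2 - 1)         # same count from the closed form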


Fig. 12. Classifier with stacking (accuracy, %, of stacking with random forest (RF) versus stacking with random tree (RT)).

It has been inferred from the results that the accuracy of boosting was improved by a maximum of 3.97%, with the C4.5 classifier and feature set FS6. The maximum increase in accuracy for boosting with random forest was recorded as 3.3%, with feature set FS6. For boosting with the multilayer perceptron there was an increase of 1.32% with feature set FS4. For the Naïve Bayes classifier with boosting there was an increase of 0.33% with feature set FS6. Thus, feature set FS6 has been effective in increasing the prediction rate of the classifiers C4.5, Random forest, and Naïve Bayes. The results are tabulated in Table 4.

Table 2
Comparison of computation time of bagging and boosting.

Classification Algorithm   Without Ensembling (s)   With Bagging (s)   With Boosting (s)
Naïve Bayes                0.04                     0.03               0.21
Bayes Net                  0.04                     0.11               0.13
C 4.5                      0.04                     0.46               0.3
Multilayer Perceptron      2.8                      8.06               14.36
PART                       0.09                     0.45               0.99

Feature selection shows improvement in majority voting also. Majority voting of Naïve Bayes, Bayes Net, Random Forest, and Multilayer Perceptron was improved by all of the feature sets; the highest increase in accuracy was 3.29%, with feature set FS2. The increase in accuracy of the majority voting of Naïve Bayes, Bayes Net, Random Forest, and Multilayer Perceptron is shown in Fig. 15; the maximum increase in accuracy was achieved by feature set FS2. The increase in accuracy of the majority voting of Naïve Bayes, Bayes Net, Random Forest, and C4.5 is shown in Fig. 16; here the maximum increase in accuracy was caused by feature set FS6. The increase in accuracy of the majority voting of Naïve Bayes, Bayes Net, Random Forest, and PART is shown in Fig. 17.

Feature selection shows improvement in stacking also. Feature set FS3 increases the accuracy of stacking Naïve Bayes, Bayesian Net, C4.5, and PART with Random Forest by 0.94%. However, significant improvements in accuracy were observed when feature selection was applied to stacking with random tree. The results are shown in Table 5. The highest increase, 4.63%, was observed when feature set FS2 was applied to the stack of Naïve Bayes, Bayes Net, C4.5, PART, and MLP with random tree. A comparison of the proposed model with existing models is shown in Table 6.

Fig. 13. Comparative analysis of bagging and boosting (accuracy, %, of Naïve Bayes, Bayes Net, C4.5, Multilayer Perceptron, and PART: original accuracy, bagging, and boosting).
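The timing methodology behind Table 2 (wall-clock training time averaged over repeated runs, reported in seconds) could be reproduced along the following lines; this is a sketch assuming the bagged and boosted models from the earlier snippet, not the authors' measurement code.

    from time import perf_counter

    def mean_fit_seconds(model, X, y, runs=100):
        """Average wall-clock training time over `runs` repetitions."""
        start = perf_counter()
        for _ in range(runs):
            model.fit(X, y)
        return (perf_counter() - start) / runs

    print(f"bagging fit:  {mean_fit_seconds(bagged, X_train, y_train):.3f} s")
    print(f"boosting fit: {mean_fit_seconds(boosted, X_train, y_train):.3f} s")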


Fig. 14. Comparison of ensembling methods (maximum increase in accuracy, %, for bagging, boosting, majority voting, and stacking).

Table 3
Improvement in bagging accuracy with feature selection.

Algorithm               Bagging Accuracy (%)   Accuracy with Feature Selection (%)   Feature Set
C4.5                    79.87                  82.18                                 FS1
Random Forest           80.53                  82.18                                 FS6
Random Forest           80.53                  81.52                                 FS2
Multilayer Perceptron   81.52                  82.18                                 FS6
Multilayer Perceptron   81.52                  82.18                                 FS4
Multilayer Perceptron   81.52                  81.85                                 FS3
Bayes Net               84.16                  84.82                                 FS1
Naïve Bayes             84.16                  84.49                                 FS6

Table 4
Improvement in boosting accuracy with feature selection.

Algorithm               Boosting Accuracy (%)   Accuracy with Feature Selection (%)   Feature Set
C 4.5                   75.9                    79.87                                 FS6
C 4.5                   75.9                    79.21                                 FS3
C 4.5                   75.9                    78.22                                 FS2
C 4.5                   75.9                    77.23                                 FS5
C 4.5                   75.9                    76.57                                 FS4
Random Forest           78.88                   82.18                                 FS6
Random Forest           78.88                   80.86                                 FS2
Random Forest           78.88                   80.86                                 FS3
Random Forest           78.88                   79.87                                 FS5
Multilayer Perceptron   79.54                   80.86                                 FS4
Multilayer Perceptron   79.54                   80.53                                 FS5
Naïve Bayes             84.16                   84.49                                 FS6

Fig. 15. Increase in accuracy of majority voting of NB, BN, RF and MP using feature selection (accuracy, %, across feature sets FS1–FS6).

Fig. 16. Increase in accuracy of majority voting of NB, BN, RF and C4.5 using feature selection (accuracy, %, for feature sets FS3, FS5 and FS6).

Fig. 17. Increase in accuracy of majority voting of NB, BN, RF and PART using feature selection (accuracy, %, for feature sets FS2, FS3, FS4 and FS6).

5. Conclusion

This paper analyses the accuracy of prediction of heart disease using an ensemble of classifiers. The Cleveland heart dataset from the UCI machine learning repository was utilized for training and testing purposes. The ensemble algorithms bagging, boosting, stacking, and majority voting were employed for the experiments. When bagging was used, the accuracy was improved by a maximum of 6.92%. When boosting was used, the accuracy was improved by a maximum of 5.94%. When the weak classifiers were ensembled with majority voting, the accuracy was improved by a maximum of 7.26%, and stacking improved the accuracy by a maximum of 6.93%. A comparison of results showed that majority voting produces the highest improvement in accuracy. The performance was further enhanced using feature selection techniques, which helped to improve the accuracy of the ensemble algorithms. The highest accuracy was obtained with majority voting with feature set FS2.
voting with the feature set FS2.


Table 5
Accuracy with stacking and feature selection.

Algorithms Stacked with Random Tree        Stacking Accuracy (%)   Accuracy with Feature Selection (%)   Feature Set
Naïve Bayes, Bayes Net, C4.5               77.89                   78.55                                 FS3
Naïve Bayes, Bayes Net, C4.5               77.89                   78.55                                 FS4
Naïve Bayes, Bayes Net, C4.5, PART         77.56                   78.22                                 FS3
Naïve Bayes, Bayes Net, C4.5, PART         77.56                   77.89                                 FS1
Naïve Bayes, Bayes Net, C4.5, PART, MLP    75.58                   80.21                                 FS2
Naïve Bayes, Bayes Net, C4.5, PART, MLP    75.58                   76.24                                 FS4

Table 6
Comparison of the proposed model with existing approaches.

Source                   Approach                               Accuracy
Proposed model           Majority vote with NB, BN, RF and MP   85.48%
Paul et al. (2016)       Neural network with fuzzy logic        80%
Verma et al. (2016)      Decision tree                          80.68%
El-Bialy et al. (2015)   Decision tree                          78.54%
Nahar et al. (2013)      Naïve Bayes                            69.11%

Conflicts of interest

None.

Ethical approval

None.

Acknowledgment

None.

References

[1] Khazaee A. Heart beat classification using particle swarm optimization. Intell Syst Appl 2013:25–33.
[2] Fida B, Nazir M, Naveed N, Akram S. Heart disease classification ensemble optimization using genetic algorithm. IEEE; 2011. p. 19–25.
[3] Centers for Disease Control and Prevention (CDC). Deaths: leading causes for 2008. Natl Vital Stat Rep June 6, 2012;60(6).
[4] El-Bialy R, Salamay MA, Karam OH, Khalifa ME. Feature analysis of coronary artery heart disease data sets. Procedia Comput Sci 2015;65:459–68.
[5] Lee HG, Noh KY, Ryu KH. Mining biosignal data: coronary artery disease diagnosis using linear and nonlinear features of HRV. LNAI 4819: emerging technologies in knowledge discovery and data mining. May 2007. p. 56–66.
[6] Singh J, Kaur R. Cardio vascular disease classification ensemble optimization using genetic algorithm and neural network. Indian J Sci Technol 2016;9(S1).
[7] Soni J, Ansari U, Sharma D. Predictive data mining for medical diagnosis: an overview of heart disease prediction. Int J Comput Appl March 2011;17(8).
[8] Sudhakar K. Study of heart disease prediction using data mining. 2014;4(1):1157–60.
[9] Thenmozhi K, Deepika P. Heart disease prediction using classification with different decision tree techniques. Int J Eng Res Gen Sci 2014;2(6).
[10] Uyar K, Ilhan A. Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. 9th international conference on theory and application of soft computing, computing with words and perception (ICSCCW), Budapest, Hungary; 24-25 Aug 2017.
[11] Parthiban L, Subramanian R. Intelligent heart disease prediction system using CANFIS and genetic algorithm. Int J Biol Biomed Med Sci 2008;3(3).
[12] Mackay J, Mensah G. Atlas of heart disease and stroke. Nonserial Publication; 2004.
[13] Vasighi M, Zahraei A, Bagheri S, Vafaeimanesh J. Diagnosis of coronary heart disease based on HNMR spectra of human blood plasma using genetic algorithm-based feature selection. Wiley Online Library; 2013. p. 318–22.
[14] Amin MS, et al. Identification of significant features and data mining techniques in predicting heart disease. Telematics Inform 2019:82–93.
[15] Nahar J, Imam T, Tickle KS, Chen YPP. Computational intelligence for heart disease diagnosis: a medical knowledge driven approach. Expert Syst Appl 2013;40(1):96–104.
[16] Guru N, Dahiya A, Rajpal N. Decision support system for heart disease diagnosis using neural network. Delhi Business Review January-June 2007;8(1).
[17] Detrano R. Cleveland heart disease database. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation; 1989.
[18] Patil SB, Kumaraswamy YS. Extraction of significant patterns from heart disease warehouses for heart attack prediction. Int J Comput Sci Netw Secur (IJCSNS) 2009;9(2):228–35.
[19] Chauhan S, Aeri BT. The rising incidence of cardiovascular diseases in India: assessing its economic impact. J Prev Cardiol 2015;4(4):735–40.
[20] Vanisree K, Singaraju J. Decision support system for congenital heart disease diagnosis based on signs and symptoms using neural networks. Int J Comput Appl April 2011;19(6).
[21] Verma L, Srivastava S, Negi PC. A hybrid data mining model to predict coronary artery disease cases using non-invasive clinical data. J Med Syst 2016;40(7):1–7.
[22] Liu X, Wang X, Su Q, Zhang M, Zhu Y, Wang Q, Wang Q. A hybrid classification system for heart disease diagnosis based on the RFRS method. Comput Math Methods Med 2017;2017:1–11.
[23] Xing Y, Wang J, Gao Y, Zhao Z. Combination data mining methods with new medical data to predicting outcome of coronary heart disease. Convergence Information Technology 2007. p. 868–72.
