Group9 ABA Ensemble Model
Group 9
Ensemble Model:
An ensemble method combines two or more models, of the same or different types, to produce improved results. It is used to produce a more accurate solution than any single model would, and it is a robust approach to improving the accuracy of predictive models. In layman's terms, it is like gathering opinions from all relevant people and then applying a voting system, optionally giving some people more weight than others. Ensemble models are composed of several supervised learning models that are trained independently; their results are then combined in different ways to obtain the final prediction.
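The idea of combining independent predictions can be sketched with two common combination rules. This is an illustrative example only; the function names and the inputs are hypothetical, not from the text:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_average(predictions, weights):
    """Combine numeric predictions, giving some models more say."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

# Three models vote on a label; the majority wins.
print(majority_vote(["spam", "ham", "spam"]))            # spam
# Regression outputs combined, with the third model weighted double.
print(weighted_average([10.0, 12.0, 14.0], [1, 1, 2]))   # 12.5
```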
Types of ensembling
Bagging
[Figure: bootstrap samples are drawn from the input sample, and each trains a separate model whose outputs are combined.]
N = {23,45,55,47,34,88,95,27,87,78,26,19,10,3,4,17} – Original sample with 16 elements
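The bootstrap sampling that bagging applies to this 16-element sample can be sketched in plain Python; the seed and the number of samples drawn are arbitrary choices for illustration:

```python
import random

# The 16-element original sample from the text.
N = [23, 45, 55, 47, 34, 88, 95, 27, 87, 78, 26, 19, 10, 3, 4, 17]

def bootstrap_sample(data, rng):
    """Draw a sample the same size as `data`, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(9)  # fixed seed so the sketch is reproducible
samples = [bootstrap_sample(N, rng) for _ in range(3)]
for s in samples:
    print(s)
# Each bootstrap sample has 16 elements drawn from N; values may
# repeat within a sample while others are left out.
```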
Boosting: Boosting is a sequential technique that converts weak models into strong ones. The first algorithm is trained on the entire dataset, and each subsequent algorithm is built by fitting the residuals of the previous one, thus giving higher weight to the observations that were poorly predicted. Boosting mainly helps reduce bias in the data set, and somewhat reduces variance as well. Like bagging, boosting can draw samples from the data, but there is a key difference: boosting weights each sample. As it runs each model, boosting tracks which data samples are predicted most successfully and which are not. The samples with the most misclassified outputs are given heavier weights; these are considered data with more complexity that require more iterations to train the model properly. That way, when the “voting” occurs, as in bagging, the models with better outcomes have a stronger pull on the final output.
Some examples of boosting are XGBoost, GBM, AdaBoost, etc.
Step 1: The base learner takes all the distributions and assigns equal weight (attention) to each observation.
Step 2: If the first base learning algorithm makes any prediction errors, we give higher attention to the observations with those errors, then apply the next base learning algorithm.
Step 3: Repeat Step 2 until the limit of the base learning algorithm is reached or the desired accuracy is achieved.
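The three steps above can be sketched as a from-scratch, AdaBoost-style loop on a toy one-dimensional dataset with decision stumps as the weak learners; the dataset, the stump learner, and the number of rounds are all illustrative assumptions:

```python
import math

# Toy 1-D dataset: a single feature and labels in {-1, +1}.
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, -1, -1, -1, 1, 1]

def stump_predict(threshold, polarity, x):
    """A weak learner: predict `polarity` left of the threshold."""
    return polarity if x <= threshold else -polarity

def best_stump(X, y, w):
    """Step 2 helper: pick the stump with the lowest weighted error."""
    best = None
    for t in X:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(t, pol, xi) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

# Step 1: equal weight (attention) on every observation.
w = [1.0 / len(X)] * len(X)
ensemble = []
for _ in range(5):                      # Step 3: iterate
    err, t, pol = best_stump(X, y, w)
    err = max(err, 1e-10)               # guard against division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, t, pol))
    # Step 2: raise the weight of misclassified observations.
    w = [wi * math.exp(-alpha * yi * stump_predict(t, pol, xi))
         for xi, yi, wi in zip(X, y, w)]
    total = sum(w)
    w = [wi / total for wi in w]

def predict(x):
    """Weighted "vote": better stumps (larger alpha) pull harder."""
    score = sum(a * stump_predict(t, pol, x) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

print([predict(x) for x in X])  # recovers the true labels y
```

No single stump can fit this dataset, but the reweighting in Step 2 steers later stumps toward the points earlier stumps got wrong, so the weighted vote fits it after a few rounds.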
Stacking: Also called super learning or stacked regression. In stacking, multiple layers of machine learning models are placed one over another: each model passes its predictions to the model in the layer above it, and the top-layer model makes the final decision based on the outputs of the models in the layers below it.
[Figure: stacking. The training data set feeds the bottom-layer models, whose outputs feed the top-layer model f() that produces the final output.]
Bottom layer models (d1, d2, d3) receive the original input features (x) from the dataset.
The top layer model f() takes the outputs of the bottom layer models (d1, d2, d3) as its input and predicts the final output.
Here we have used only two layers, but there can be any number of layers and any number of models in each layer. Two of the key principles for selecting the models:
As noted above, the top layer model takes the predictions of the bottom layer models as its input. This top layer model can also be replaced by simpler combination rules, such as:
Averaging
Majority vote
Weighted average
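The two-layer setup described above can be sketched with scikit-learn's StackingClassifier. A synthetic stand-in dataset is assumed (the text does not name one for stacking), and the particular choices for d1, d2, d3 and for f() are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (illustrative; not from the text).
X, y = make_classification(n_samples=500, n_features=10, random_state=9)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=9)

# Bottom layer: d1, d2, d3 all receive the original features x.
bottom = [
    ("d1", DecisionTreeClassifier(random_state=9)),
    ("d2", KNeighborsClassifier()),
    ("d3", LogisticRegression(max_iter=1000)),
]
# Top layer: f() is trained on the bottom models' predictions.
stack = StackingClassifier(estimators=bottom,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))
```

Replacing `final_estimator` with a simple rule such as averaging or majority vote corresponds to the simpler combiners listed above (scikit-learn's VotingClassifier implements those directly).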
Bagging Ensemble Method:
We used the Ionosphere dataset to compare the accuracy of both algorithms and determine which one is better. Random forest produces the more accurate model, with an accuracy of 93.25%.
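A sketch of such a comparison with scikit-learn. Since the Ionosphere data is not bundled here, a synthetic stand-in with the same shape (351 samples, 34 features) is used, so these numbers will not reproduce the 93.25% figure; substitute the real features and labels to do so:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with the Ionosphere shape (351 samples, 34 features).
X, y = make_classification(n_samples=351, n_features=34, random_state=9)

models = {
    # Bagging over decision trees (the default base estimator).
    "bagging": BaggingClassifier(n_estimators=100, random_state=9),
    # Random forest adds per-split feature subsampling on top of bagging.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=9),
}

results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```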