


Ensembling Methods

Group 9

Raj Ratan Soren


Gopal Singh Lora
Reshma
Deeksha Sharma
Pooja Gupta

Ensemble Model:
An ensemble method combines two or more models of the same or different types to produce improved results. It is used to produce a more accurate solution than a single model would, and it is a robust approach to improving the accuracy of predictive models. In layman's terms, it is like collecting the opinions of all relevant people and then applying a voting system, giving everyone equal weight or giving some people a higher weight. Ensemble models are composed of several supervised learning models that are trained independently; their results are then combined in different ways to obtain the final prediction.
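As a small, purely illustrative example of the voting idea (our own toy snippet in base R, not part of the report's analysis), the code below combines the 0/1 class predictions of three hypothetical models by majority vote:

# Hypothetical 0/1 class predictions from three independently trained models
pred_model1 <- c(1, 0, 1, 1, 0)
pred_model2 <- c(1, 1, 1, 0, 0)
pred_model3 <- c(0, 0, 1, 1, 1)

# Majority vote: predict 1 whenever at least two of the three models say 1
votes      <- pred_model1 + pred_model2 + pred_model3
final_pred <- ifelse(votes >= 2, 1, 0)
final_pred  # 1 0 1 1 0

A weighted ensemble would instead multiply each model's prediction by a weight before summing, giving more influence to the models that are trusted more.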

Types of ensembling

Bagging: Bagging is also called bootstrap aggregation. It is used to decrease the variance of a model's results. To understand bagging, we first have to understand bootstrapping. Bootstrapping is a sampling technique in which random samples are drawn from the training data set with replacement, so the same observation can appear more than once in a sample. If you want to create a random sample with m elements, you select an element at random from the original data set m times, replacing it after each draw. The same learning algorithm is then trained on each random sample, and the results are combined (for example by averaging) to get the best predictive outcome. Random forest is an example of bagging, or bootstrap aggregation.

[Figure: bagging — an input sample is drawn into bootstrap samples 1–4, a model is trained on each, and their outputs are combined into the bagging model]
N = {23,45,55,47,34,88,95,27,87,78,26,19,10,3,4,17} – Original sample with 16 elements

Bootstrap sample 1: {10, 78, 87, 55, 26, 88, 10}

Bootstrap sample 2: {55, 78, 45, 78, 55, 23, 23}

Bootstrap sample 3: {88, 27, 10, 27, 34, 34, 23}
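The base-R sketch below shows how such bootstrap samples can be drawn; the particular values will differ from the listing above because the draws are random:

set.seed(42)  # arbitrary seed, only for reproducibility
N <- c(23, 45, 55, 47, 34, 88, 95, 27, 87, 78, 26, 19, 10, 3, 4, 17)

# Each bootstrap sample draws 7 elements from N with replacement,
# so the same element can appear more than once in a sample
bootstrap_samples <- lapply(1:3, function(i) sample(N, size = 7, replace = TRUE))
bootstrap_samples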

Boosting: Boosting is a sequential technique used to convert weak models into a strong one. The first algorithm is trained on the entire data set, and each subsequent algorithm is built by fitting the residuals of the previous one, thus giving higher weight to the observations that were poorly predicted by the previous model. It mainly helps to reduce bias in the model, and it leads to some reduction in variance as well. Like bagging, boosting can resample the training data, but with an important difference: boosting assigns a weight to each observation. As each model is run, boosting tracks which observations are predicted well and which are not, and the observations that are misclassified most often are given heavier weights. These are considered to be the more difficult cases that require more iterations to train the model properly. That way, when the “voting” occurs, as in bagging, the models with better outcomes have a stronger pull on the final output.
Some examples of boosting algorithms are XGBoost, GBM, AdaBoost, etc.

How the Boosting Algorithm Works:

Step 1: The base learner takes all the observations and assigns equal weight (attention) to each one.

Step 2: If the first base learning algorithm makes prediction errors, we give higher attention to the observations with those errors and then apply the next base learning algorithm.

Step 3: Repeat Step 2 until the limit on the number of base learners is reached or the desired accuracy is achieved. A from-scratch sketch of this loop is given below.
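The report does not include boosting code, so the following R sketch (our own illustration, in the style of AdaBoost) shows Steps 1–3 on a small simulated data set. It assumes the rpart package is available and approximates the weighting step by resampling observations in proportion to their weights instead of fitting weighted models:

library(rpart)

set.seed(1)
n  <- 200
x1 <- runif(n); x2 <- runif(n)
y  <- factor(ifelse(x1 + x2 > 1, 1, -1))   # toy two-class problem
train <- data.frame(x1, x2, y)

M      <- 10                 # number of boosting rounds
w      <- rep(1 / n, n)      # Step 1: equal weight for every observation
stumps <- vector("list", M)
alphas <- numeric(M)

for (m in 1:M) {
  # Step 2: fit the next weak learner, paying more attention to observations
  # that earlier learners got wrong (via weight-proportional resampling)
  idx <- sample(1:n, size = n, replace = TRUE, prob = w)
  stumps[[m]] <- rpart(y ~ x1 + x2, data = train[idx, ], method = "class",
                       control = rpart.control(maxdepth = 1, cp = 0))
  miss <- as.numeric(predict(stumps[[m]], train, type = "class") != y)
  err  <- min(max(sum(w * miss), 1e-10), 1 - 1e-10)  # weighted error, kept off 0 and 1
  alphas[m] <- 0.5 * log((1 - err) / err)            # vote weight of this weak learner

  w <- w * exp(alphas[m] * miss)                     # up-weight the misclassified cases
  w <- w / sum(w)                                    # renormalise; Step 3 repeats the loop
}

# Final prediction: a weighted vote of all the weak learners
score <- rowSums(sapply(1:M, function(m)
  alphas[m] * ifelse(predict(stumps[[m]], train, type = "class") == "1", 1, -1)))
mean(ifelse(score > 0, "1", "-1") == as.character(y))  # training accuracy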

Stacking: Also called super learning or stacked regression. In stacking, multiple layers of machine learning models are placed one over another, where each model passes its predictions to the model in the layer above it, and the top-layer model makes the final decision based on the outputs of the models in the layers below it.

Let’s understand it with an example:

[Figure: stacking — the training data set feeds bottom-layer models d1–d4, whose predictions are passed to a top-layer model f() that produces the final output]

Here, we have two layers of machine learning models:

• Bottom-layer models (d1, d2, d3), which receive the original input features (x) from the dataset.
• Top-layer model f(), which takes the output of the bottom-layer models (d1, d2, d3) as its input and predicts the final output.

Here we have used only two layers, but there can be any number of layers and any number of models in each layer. Two key principles for selecting the models are:

• The individual models fulfill particular accuracy criteria.

• The predictions of the individual models are not highly correlated with the predictions of the other models.

One thing that you might have realized is that the top-layer model takes the predictions of the bottom-layer models as its input. This top-layer model can also be replaced by simpler combination rules such as:

• Averaging
• Majority vote
• Weighted average
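To make this concrete, here is a minimal stacking sketch in R (our own illustration, not the report's code). Two base models, a logistic regression d1 and an rpart tree d2, are trained on a toy data set, and their predicted probabilities become the input features of a top-layer logistic regression f(). For brevity the top layer is fitted on in-sample predictions; a proper implementation would use out-of-fold (cross-validated) predictions to avoid leakage:

library(rpart)

set.seed(7)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- factor(ifelse(x1 + 0.5 * x2 + rnorm(n, sd = 0.5) > 0, 1, 0))
dat   <- data.frame(x1, x2, y)
train <- dat[1:200, ]
test  <- dat[201:300, ]

# Bottom layer: two base models trained on the original features x
d1 <- glm(y ~ x1 + x2, data = train, family = binomial)
d2 <- rpart(y ~ x1 + x2, data = train, method = "class")

# Their predicted probabilities become the features of the top-layer model f()
meta_train <- data.frame(
  p1 = predict(d1, train, type = "response"),
  p2 = predict(d2, train, type = "prob")[, "1"],
  y  = train$y
)
f <- glm(y ~ p1 + p2, data = meta_train, family = binomial)

# Prediction on new data flows from the bottom layer up to the top layer
meta_test <- data.frame(
  p1 = predict(d1, test, type = "response"),
  p2 = predict(d2, test, type = "prob")[, "1"]
)
stacked_prob <- predict(f, meta_test, type = "response")
mean(ifelse(stacked_prob > 0.5, 1, 0) == as.numeric(as.character(test$y)))  # test accuracy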

Bagging Ensemble Method:

We have used the Ionosphere data set to compare the accuracy of the two algorithms and see which one performs better.

1. We have installed three packages: mlbench, caret and caretEnsemble.
2. We have loaded the data set in R.
3. We have eliminated column 2 from the data set.
4. We have converted all values of the column V1 into numeric values of 0 and 1.
5. We have created the resampling scheme used to evaluate bagging and random forest.
6. We have trained the bagged CART and random forest algorithms.
7. We have finally summarized both algorithms to compare their accuracy; a sketch of this workflow is given below.
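The report does not reproduce the script itself, so the following is a minimal sketch of the workflow described in steps 1–7. It assumes the mlbench, caret and caretEnsemble packages (plus the underlying ipred and randomForest packages used by caret) are installed; the exact calls, seeds and tuning in the group's original code may differ:

library(mlbench)
library(caret)
library(caretEnsemble)   # installed in step 1, though not strictly needed for this comparison

# Steps 2-4: load Ionosphere, drop column 2 and recode V1 as numeric 0/1
data(Ionosphere)
dataset <- Ionosphere
dataset <- dataset[, -2]
dataset$V1 <- as.numeric(as.character(dataset$V1))

# Step 5: a common resampling scheme (repeated 10-fold cross-validation)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
metric  <- "Accuracy"

# Step 6: bagged CART and random forest
set.seed(7)
fit_treebag <- train(Class ~ ., data = dataset, method = "treebag",
                     metric = metric, trControl = control)
set.seed(7)
fit_rf <- train(Class ~ ., data = dataset, method = "rf",
                metric = metric, trControl = control)

# Step 7: summarize both models and compare their cross-validated accuracy
bagging_results <- resamples(list(treebag = fit_treebag, rf = fit_rf))
summary(bagging_results)
dotplot(bagging_results)

The summary() output reports the cross-validated accuracy of each model, which is the comparison referred to below.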

We can see that random forest produces a more accurate model with an accuracy of
93.25%.
