Ensembles to improve the performance of Machine Learning models

Ensemble in Machine Learning, by definition

The technique of combining many Machine Learning models to make one much more powerful model. This strategy is usually not preferred in real-time applications where low latency is required, but it is very popular where accuracy is the major factor, or in competitions. That is why ensemble methods have placed first in many prestigious machine learning competitions, such as the Netflix Competition, KDD 2009, and Kaggle.

There exist many techniques for combining models, but in this article we will cover only the following three:

  1. Bagging
  2. Boosting
  3. Stacking

For ease of understanding, we will pick an algorithm from each category and use it to explain the concept.

Bagging:

The concept of bagging is simple: we first train M models, each on data randomly sampled from a master dataset, and then combine their predictions. If our problem is regression based, we apply the mean, median, or another measure of central tendency to the outputs of all the models to produce one final result; if our problem is classification based, we take a majority vote.
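
A minimal sketch of the idea in Python (the synthetic dataset, the choice of decision trees as base models, and M = 25 are illustrative assumptions, not requirements of bagging):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)

M = 25
models = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))          # random (bootstrap) row sample
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Classification: majority vote across the M models (for regression, take the mean)
all_preds = np.stack([m.predict(X) for m in models])    # shape (M, n_samples)
final_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
print("train accuracy:", (final_pred == y).mean())
```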

One of the most popular bagging techniques is Random Forest. As its name suggests, it is a forest of many decision trees, and it is random because we do random sampling on the data. The procedure works as follows (a usage sketch follows the list):

  1. For each decision tree model, row-sample and column-sample the data.
  2. Train each decision tree on its sampled data.
  3. Predict the final output using either the mean (regression) or a majority vote (classification).
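
In practice you rarely implement this by hand; a short sketch with scikit-learn's RandomForestClassifier is below (the dataset and hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bootstrap=True gives the row sampling, max_features the column sampling per split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", bootstrap=True)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```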

Note: In bagging, each individual model has high variance and low bias; aggregating them makes the variance of the combined model low.

Boosting

This technique is a little more complex: the idea is to use the previous model's error and a differentiable loss function, minimizing that error as the models get trained one after another. For the sake of simplicity, consider M linear regression models being trained using gradient boosting.

The procedure works as in the algorithm below, which follows https://en.wikipedia.org/wiki/Gradient_boosting.

Step 1: Initialize the model as a constant gamma. This constant is compared with the actual outputs y[i] through a loss function whose value we have to minimize, and since gamma is the only thing that can be tweaked at this stage, the initial model is simply the constant prediction that minimizes the loss.

Step 2: Train the M models sequentially, as follows (a minimal end-to-end sketch is given after Step 3):

  1. Calculate the pseudo-residuals of the previous model and train the next model on the data {X[i], r[i]}, where r[i] is the pseudo-residual of the previous model (a pseudo-residual is just the negative gradient of the loss function with respect to the previous model's prediction).
  2. Calculate the multiplier gamma for the new model by passing the combined prediction of all previous models plus the new model into the loss function and choosing the gamma that minimizes it.
  3. Update the ensemble by adding the new model scaled by gamma, which is essentially one step of gradient descent in function space.
  4. Repeat step 2 until all M models are trained.

Step 3: Generate the final output as the sum of the initial constant and all the trained models.
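
Putting the three steps together, here is a from-scratch gradient boosting sketch for regression with squared loss; as simplifying assumptions it uses shallow decision trees as the weak learners and a fixed learning rate in place of the per-stage gamma line search:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

M, lr = 100, 0.1
F = np.full(len(y), y.mean())       # Step 1: constant model minimizing squared loss
learners = []

for _ in range(M):                  # Step 2
    residuals = y - F               # pseudo-residuals = -dL/dF for squared loss
    h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F = F + lr * h.predict(X)       # gradient-descent-style update of the ensemble
    learners.append(h)

def predict(X_new):                 # Step 3: initial constant plus all weak learners
    return y.mean() + lr * sum(h.predict(X_new) for h in learners)

print("train MSE:", np.mean((predict(X) - y) ** 2))
```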

The most popular boosting algorithm, which is also used in many internet companies, is GBDT (Gradient Boosted Decision Trees).
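
If you want to apply GBDT rather than implement it yourself, scikit-learn ships a ready-made GradientBoostingClassifier (the dataset and hyperparameters below are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gbdt.fit(X_train, y_train)
print("test accuracy:", gbdt.score(X_test, y_test))
```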

Stacking

The concept of stacking is somewhat similar to boosting, but here there is no restriction that the models be of the same type when building one big model. The procedure of stacking is as follows (a short sketch follows the list):

  1. Select M models, same or different, for the specific problem; they can be KNN, SVMs with different kernels, a logistic classifier, decision trees, etc. Train them and collect their predictions.
  2. Take the predictions of all these models, combine them with the actual output labels y[i], and train a final meta-classifier on this data to generate the final output.
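
A minimal stacking sketch using scikit-learn's StackingClassifier; the base models, meta-classifier, and dataset below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    ("knn", KNeighborsClassifier()),
    ("svm_rbf", SVC(kernel="rbf")),
    ("tree", DecisionTreeClassifier(max_depth=5)),
]
# The meta-classifier learns from the base models' cross-validated predictions
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```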

An ensemble of machine learning models is a very powerful technique for increasing accuracy, but it comes with its own drawbacks: training these models is itself a challenge, and even if you are successful in training them, the real-time latency of the combined model will be poor. Hope you enjoyed reading this.
