**Ensemble in Machine Learning by definition**

Ensembling is the technique of combining many machine learning models into one very powerful model. It is rarely preferred in real-time applications where low latency is required, but it is very popular wherever accuracy is the major factor, such as in competitions.

That is why ensemble methods have placed first in many prestigious machine learning competitions, such as the Netflix Prize, KDD Cup 2009, and Kaggle.

Many combination techniques exist, but in this article we will cover only the following three:

- Bagging
- Boosting
- Stacking

For each category we will take a representative algorithm and use it to understand the concept.

Bagging

The concept of bagging is simple: we train M models, each on a random sample drawn from a master dataset, and then combine their predictions. If the problem is regression, we apply the mean, median, or another measure of central tendency to the outputs of all the models to produce one final result; if the problem is classification, we take a majority vote.
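The two aggregation rules can be sketched in a few lines of Python; the prediction lists below are made-up values standing in for the outputs of M=5 already-trained models:

```python
from statistics import mean, mode

# Hypothetical outputs from five independently trained models.
regression_preds = [3.1, 2.9, 3.4, 3.0, 3.2]
classification_preds = ["cat", "dog", "cat", "cat", "dog"]

# Regression: aggregate with a central-tendency measure such as the mean.
final_regression = mean(regression_preds)  # ≈ 3.12

# Classification: take a majority vote across the models.
final_class = mode(classification_preds)   # "cat" (3 votes vs 2)
```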

One of the most popular bagging techniques is the random forest. As its name suggests, it is a forest of many decision trees, and it is random because we sample the data randomly. The procedure works as follows:

- For each decision tree, sample both the rows and the columns of the data
- Train each decision tree on its sampled data
- Predict the final output using the mean (regression) or a majority vote (classification)
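The row-and-column sampling in the first step can be sketched as below; `sample_rows_and_columns` is a hypothetical helper written for illustration, not a library function:

```python
import random

def sample_rows_and_columns(X, y, n_cols):
    """Bootstrap-sample the rows (with replacement) and pick a random
    subset of feature columns: the two sources of randomness that give
    each tree in a random forest a different view of the data."""
    n_rows = len(X)
    row_idx = [random.randrange(n_rows) for _ in range(n_rows)]
    col_idx = random.sample(range(len(X[0])), n_cols)
    X_sub = [[X[r][c] for c in col_idx] for r in row_idx]
    y_sub = [y[r] for r in row_idx]
    return X_sub, y_sub, col_idx

random.seed(0)  # only for a reproducible example
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
y = [0, 1, 0, 1]
X_sub, y_sub, cols = sample_rows_and_columns(X, y, n_cols=2)
```

Each tree would then be trained on its own `(X_sub, y_sub)`, remembering `cols` so it can select the same features at prediction time.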

Note: in bagging, each individual model has high variance and low bias; aggregating them reduces the variance of the overall model.

Boosting

This technique is a little more complex. The idea is to use the previous model's error and a differentiable loss function, minimising that error as the models are trained sequentially. For the sake of simplicity, consider M linear regression models being trained with gradient boosting.

The procedure works as in image given below,

In the image above, Step 1 initialises the model as a constant gamma, which is combined with the actual outputs y[i] and passed into the loss function we want to minimise. Since only gamma can be tweaked at this stage, it is chosen to minimise that loss; the goal of the rest of the algorithm is to keep driving the error down.

Step 2 trains the M models and works as follows:

- Calculate the pseudo-residuals of the previous model and train the next model on the data {X[i], r[i]}, where r[i] is the pseudo-residual (the negative gradient of the loss function with respect to the previous model's prediction)
- Pass the combined output of all previous models through the loss function and compute the gamma for the next model
- Update the ensemble using the simple gradient-descent methodology
- Repeat Step 2 until all M models are trained
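The steps above can be sketched for squared-error loss, where the pseudo-residual is simply y[i] minus the current prediction. This toy version departs from the article's linear-regression learners and uses one-feature regression stumps as the weak models, with a fixed learning rate standing in for the per-stage gamma:

```python
def fit_stump(x, r):
    """Fit a one-feature regression stump to pseudo-residuals r:
    pick the threshold that minimises the squared error."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - (lm if xi <= t else rm)) ** 2 for xi, ri in zip(x, r))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_stages=10, lr=0.5):
    """Step 1: start from a constant model. Step 2: repeatedly fit a
    weak learner to the negative gradient of the squared-error loss
    (the residuals) and add it to the ensemble. Step 3: the returned
    function generates the final output."""
    f0 = sum(y) / len(y)              # constant initial model
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_stages):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # -dL/df for L = ½(y-f)²
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + lr * stump(xi) for pi, xi in zip(preds, x)]
    return lambda xi: f0 + lr * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
model = gradient_boost(x, y)  # model(1) ≈ 1, model(5) ≈ 5
```

Each stage only has to correct what the ensemble so far gets wrong, which is why the residuals shrink geometrically here.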

Step 3: generate the final output.

The most popular boosting algorithm, used at many internet companies, is GBDT (Gradient Boosted Decision Trees).

Stacking

The concept of stacking is similar to boosting, but here there is no restriction that all the models be of the same type. The procedure is as follows:

- Select M models, same or different, for the specific problem: k-NN, SVMs with different kernels, a logistic classifier, decision trees, etc. Train them and collect their predictions.
- Take the predictions of all these models, combine them with the actual output labels y[i], and train a final meta-classifier on them to generate the final output.
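A minimal sketch of this procedure, with hand-written threshold rules standing in for trained base models and a weighted vote standing in for a trained meta-classifier (in practice the meta-classifier would itself be trained, e.g. a logistic regression):

```python
# Stand-ins for trained base models: each maps a feature vector to a label.
def model_a(x): return 1 if x[0] > 0.5 else 0
def model_b(x): return 1 if x[1] > 0.5 else 0

def stack_features(X, base_models):
    """Turn each sample into a vector of base-model predictions --
    the training input for the meta-classifier."""
    return [[m(x) for m in base_models] for x in X]

def fit_meta(meta_X, y):
    """Toy meta-classifier: weight each base model by its accuracy on
    the training labels, then predict with a weighted vote."""
    n = len(y)
    weights = [sum(p[j] == yi for p, yi in zip(meta_X, y)) / n
               for j in range(len(meta_X[0]))]
    def predict(base_preds):
        score = sum(w * p for w, p in zip(weights, base_preds)) / sum(weights)
        return 1 if score >= 0.5 else 0
    return predict

X = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.9], [0.1, 0.2]]
y = [1, 0, 1, 0]
meta_X = stack_features(X, [model_a, model_b])
meta = fit_meta(meta_X, y)
```

At prediction time a new sample is first run through every base model, and only that vector of predictions is fed to `meta`.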

Ensembling machine learning models is a very powerful technique for increasing accuracy, but it comes with its own drawbacks: training all these models is itself a challenge, and even when training succeeds, the real-time latency of the combined model will be poor. Hope you enjoyed reading!