When a model tries too hard to fit noise in the data, i.e. the model is overfitting the data, its accuracy drops by a huge margin. To avoid this, we use regularization.
Let’s understand why overfitting is bad:
- The model learns from noise in the data.
- It loses its ability to generalize.
It is essential that you prevent your model from overfitting the data. Overfitting occurs when the model fails to capture the underlying structure of the data. This can happen for various reasons:
- The criterion used to select the model is not the same as the one used to judge its performance.
- Overtraining the model.
- Too many non-linear polynomial terms in the model.
- No regularization.
- High variance in the model (for illustration, see the figure).
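To see the high-variance case concretely, here is a minimal sketch using NumPy (the data, noise level, and polynomial degrees are all illustrative choices, not from the article): a degree-9 polynomial fitted to ten noisy points drives the training error toward zero, yet does far worse than a simple degree-1 fit on fresh points drawn from the same underlying line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underlying structure: a straight line. The noise is what an overfit model chases.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + 1 + rng.normal(scale=0.2, size=x_train.shape)

# Held-out points from the same line, without noise, to judge generalization.
x_test = np.linspace(0.05, 0.95, 50)
y_test = 2 * x_test + 1

def train_test_mse(degree):
    """Fit a polynomial of the given degree and return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

train_lo, test_lo = train_test_mse(1)  # matches the true structure
train_hi, test_hi = train_test_mse(9)  # high variance: nearly interpolates the noise

print("degree 1:", train_lo, test_lo)
print("degree 9:", train_hi, test_hi)
```

The degree-9 model’s training error is lower (a higher-degree fit can only reduce it), while its test error is larger — exactly the gap between memorizing noise and learning structure.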
Before moving ahead in this article, it is recommended that you first understand at least one regression model.
How Does Regularization Avoid Overfitting?
To continue, let’s take the example of our logistic regression model and try to understand what it is trying to do. The loss function of logistic regression is given as follows:

L(w, b) = Σᵢ log(1 + exp(−yᵢ(wᵀxᵢ + b)))
Let’s modify our loss function by putting zᵢ = yᵢ(wᵀxᵢ + b); now our loss function becomes

L(w) = Σᵢ log(1 + exp(−zᵢ))
So for our model to work most accurately we want zᵢ to be positive, i.e. the model should predict the class label correctly: when zᵢ is large and positive, exp(−zᵢ) becomes close to zero, and we know that log(x) is 0 when x = 1. This all makes sense, as this is our minimization objective.
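This behaviour is easy to verify numerically. A small sketch (the values of z below are just illustrative probes):

```python
import math

def loss_term(z):
    """Per-point logistic loss log(1 + exp(-z)), where z = y * (w.x + b)."""
    return math.log(1 + math.exp(-z))

# A large positive z (a confident, correct prediction) drives the loss to ~0.
print(loss_term(10))   # ~ 0.0000454
# A negative z (a wrong prediction) makes the loss large.
print(loss_term(-10))  # ~ 10.0000454
# At z = 0 (the decision boundary) the loss is log(2).
print(loss_term(0))
```

So minimizing the total loss pushes every zᵢ toward large positive values — which, as the cases below show, is exactly where trouble with noisy points begins.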
But before claiming that this model is correct, let’s examine some cases:
- What if some data points are noise, i.e. points that do not follow the underlying structure of the data?
- What if some of your data points are corrupted, i.e. someone entered wrong values or there was some mishandling?
To avoid the problems mentioned above, your model must generalize well rather than focus on fitting every individual data point. That’s where regularization comes into the equation. The idea of regularization is simple: it tries to keep the weights w from becoming too large or too small. Let’s understand with the help of the logistic loss function:

L(w) = Σᵢ log(1 + exp(−zᵢ)) + λ‖w‖²
This is formally called L2 regularization. Here λ is the L2 regularization hyperparameter, which decides how much to penalize the weights in order to prevent overfitting. Let’s look at two cases to understand how this controls the magnitude of the weights.
Case 1: if the weights grow very large, the loss term Σᵢ log(1 + exp(−zᵢ)) becomes very small, but the square of w becomes very large. This is against the minimization objective, so the penalty term pushes the magnitude of the weights down.

Case 2: if the weights shrink toward zero, the penalty vanishes, but the loss term becomes large, which is also against the minimization objective.

So there is a tug-of-war that prevents the weights from becoming either too large or too small.
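The tug-of-war can be sketched end to end with plain gradient descent on a tiny synthetic dataset (everything here — the data, learning rate, step count, and λ values — is an illustrative assumption, not the article’s setup): increasing λ visibly shrinks the norm of the learned weights.

```python
import numpy as np

def fit_logistic_l2(X, y, lam, lr=0.05, steps=3000):
    """Minimize sum_i log(1 + exp(-y_i * w.x_i)) + lam * ||w||^2 by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        z = y * (X @ w)
        # Gradient of the loss term: -sum_i y_i * x_i * sigmoid(-z_i)
        sig = 1.0 / (1.0 + np.exp(z))
        grad = -(X * (y * sig)[:, None]).sum(axis=0) + 2 * lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # linearly separable labels

w_weak = fit_logistic_l2(X, y, lam=0.01)  # weak penalty: weights grow freely
w_strong = fit_logistic_l2(X, y, lam=1.0)  # strong penalty: weights stay small

print("||w|| with lam=0.01:", np.linalg.norm(w_weak))
print("||w|| with lam=1.0: ", np.linalg.norm(w_strong))
```

On separable data the unpenalized loss keeps rewarding ever-larger weights (Case 1), so the strong penalty ends the tug-of-war at a much smaller weight norm.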
There is also another version of regularization known as L1 regularization. The main difference between L1 and L2 regularization is that L1 regularization introduces sparsity in the weights, which is sometimes desirable and sometimes not. The whole concept of regularization is a kind of trick by which we prevent the weights from becoming too large or too small. Always check for overfitting in your model before deploying it to production. We will be covering many more such interesting topics in machine learning, so stay tuned.
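The sparsity difference is easiest to see in one dimension (a sketch; the penalized problems below are standard textbook forms, chosen here for illustration): the L1 minimizer of 0.5·(w − a)² + λ|w| is the soft-threshold of a, exactly zero whenever |a| ≤ λ, whereas the L2 minimizer a / (1 + 2λ) only shrinks a toward zero and never reaches it.

```python
def l1_min(a, lam):
    """argmin_w 0.5*(w - a)**2 + lam*abs(w): soft-thresholding."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # small coefficients are zeroed out -> sparsity

def l2_min(a, lam):
    """argmin_w 0.5*(w - a)**2 + lam*w**2: closed form a / (1 + 2*lam)."""
    return a / (1 + 2 * lam)

print(l1_min(0.3, 0.5))  # 0.0  -- L1 sets the small weight exactly to zero
print(l2_min(0.3, 0.5))  # 0.15 -- L2 shrinks it, but never to exactly zero
```

This is why L1 is used for feature selection (irrelevant weights become exactly zero), while L2 spreads shrinkage smoothly across all weights.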
“Happy Machine Learning”