Normalization and Standardization in Data Pre-processing

The goal of this article is to cover the following points:

  • What are Normalization and Standardization?
  • Why do we need Normalization and Standardization?
  • What are the advantages of Normalization and Standardization?

 

What is Normalization?

Normalization: This process is commonly termed column normalization, as it is mostly applied column-wise, that is, to every feature of a dataset. It is one of the important steps in data pre-processing before applying any operation or algorithm.

Procedure: replace each value X_{i} with X_{i}^{'} = \frac{X_{i}-X_{min}}{X_{max}-X_{min}}, where X_{i} is a particular value in a feature, X_{min} is the minimum value of that column, and X_{max} is the maximum value of that column. After doing this for every column of the dataset, all values will lie in the range [0,1].
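The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name min_max_normalize and the sample column heights_cm are made up for this example, and mapping a constant column (where X_{max} equals X_{min}) to 0.0 is an assumption made here to avoid dividing by zero.

```python
def min_max_normalize(column):
    """Rescale one feature column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column has no spread, so map every value to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical sample column, used only for illustration
heights_cm = [150.0, 160.0, 170.0, 190.0]
normalized = min_max_normalize(heights_cm)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value maps to 0, the largest to 1, and the ordering and relative spacing of the values in between are unchanged. For a full dataset, the same function would be applied to each column separately.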

Advantages of Normalization:

  1. All values are rescaled without destroying the relationships within the data: the ordering and relative spacing of values in a column are preserved.
  2. Calculations with very large values are avoided.

Geometric interpretation of Normalization:

 

Before applying normalization
After applying normalization

What is Standardization?

Standardization: It is the practice of rescaling each column of data so that its mean is zero and its standard deviation (std-dev) is 1. This is a common practice in the data cleaning process. Putting all features on a comparable scale tends to make algorithms behave better, particularly those that rely on distances or on gradient descent.

The process of applying standardization:

Find:

  • \mu – Mean of the column on which standardization is to be applied
  • \sigma – Std-dev of the column on which standardization is to be applied

Then replace each X_{i} with X_{i}^{'}, where X_{i}^{'} is:

 X_{i}^{'} = \frac{X_{i}-\mu}{\sigma}
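As a rough sketch of these steps, the snippet below standardizes a single column using only the Python standard library. The function name standardize and the sample column weights_kg are hypothetical. Using the population standard deviation (statistics.pstdev) rather than the sample one is an assumption, since the text does not say which is meant, and a constant column is mapped to zeros to avoid dividing by zero.

```python
import statistics


def standardize(column):
    """Return (x - mean) / std-dev for every value in one feature column."""
    mu = statistics.mean(column)
    # Assumption: population std-dev; statistics.stdev would give the sample version
    sigma = statistics.pstdev(column)
    if sigma == 0:
        # Assumption: a constant column has no spread, so map every value to 0.0
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical sample column, used only for illustration
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]
z = standardize(weights_kg)

print(z)
print(statistics.mean(z), statistics.pstdev(z))  # approximately 0 and 1
```

Note that the standardized values are not limited to a fixed interval: here the first and last values are roughly -1.41 and 1.41, while the column as a whole has mean 0 and std-dev 1.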

Advantages of Standardization:

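  • The geometry of the loss function becomes better behaved: when all features share a scale, its contours are closer to circular, which typically helps gradient-based optimization converge faster.
  • The data is centered at 0 with a std-dev of 1. Unlike normalization, the values are not confined to a fixed range.
  • Squashing: features measured on very different scales are brought to a comparable spread.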

Geometric interpretation of Standardization:

Before applying standardization
After applying standardization

Normalization and Standardization are very common practices in the data cleaning process, and they offer many advantages when it comes to applying machine learning algorithms to the cleaned data.
