The goal of this article is to answer the following questions:
- What are normalization and standardization?
- Why do we need normalization and standardization?
- What are the advantages of normalization and standardization?
What is Normalization?
Normalization: This process is commonly called column normalization, because it is applied column-wise, that is, to each feature of a dataset separately. It is an important data pre-processing step before applying many operations or algorithms.
Procedure: $x' = \frac{x - x_{min}}{x_{max} - x_{min}}$, where $x$ is a particular value in a feature, $x_{min}$ is the minimum value of that column, and $x_{max}$ is the maximum value of that column. After doing this for every value in each column, all values of the dataset lie in the range [0, 1].
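The procedure above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `min_max_normalize` and the sample column `heights_cm` are hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid dividing by zero, not something the procedure itself prescribes.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    x_min, x_max = min(column), max(column)
    if x_max == x_min:
        # Assumption: a constant column has no spread, so map every value to 0.0
        return [0.0 for _ in column]
    return [(x - x_min) / (x_max - x_min) for x in column]


# Hypothetical feature column, for illustration only
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
normalized = min_max_normalize(heights_cm)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The smallest value maps to 0, the largest to 1, and the ordering and relative spacing of the values in between are unchanged.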
Advantages of Normalization:
- All values are rescaled without destroying the relative relationships between the data points in a column.
- It avoids calculations with very large values.
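Geometric interpretation of Normalization: the points of each feature are shifted and rescaled so that they fit inside the interval [0, 1]; with two features, the whole dataset is squeezed into the unit square.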
What is Standardization?
Standardization: It is the practice of rescaling each column of data so that its mean is zero and its std-dev is 1. This is a common practice in the data cleaning process. Many algorithms perform better or converge faster when they are applied to standardized data.
The process of applying standardization:
- $\mu$: mean of the column on which standardization is to be applied
- $\sigma$: std-dev of the column on which standardization is to be applied
Then replace each value $x_i$ with $z_i$, where:
$z_i = \frac{x_i - \mu}{\sigma}$
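As a rough sketch, the standardization steps can be written with Python's built-in `statistics` module. The function name `standardize` and the sample `values` are hypothetical. Two choices here are assumptions rather than part of the procedure: the population std-dev (`statistics.pstdev`) is used, and a constant column is mapped to 0.0 to avoid dividing by zero.

```python
import statistics


def standardize(column):
    """Return z-scores: subtract the column mean, divide by its std-dev."""
    mu = statistics.mean(column)
    # Assumption: population std-dev; statistics.stdev gives the sample version
    sigma = statistics.pstdev(column)
    if sigma == 0:
        # Assumption: a constant column has no spread, so map every value to 0.0
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical feature column with mean 5 and std-dev 2
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(values)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The result has mean 0 and std-dev 1, but individual values such as -1.5 and 2.0 show that standardized data is not restricted to a fixed interval.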
Advantages of Standardization:
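- Because all features share a comparable scale, the loss function has a more evenly shaped surface, which typically helps gradient-based optimization converge.
- Each column has mean 0 and std-dev 1, so all features have a comparable spread. Unlike normalization, the values are not confined to a fixed range.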
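Geometric interpretation of Standardization: the data is shifted so that its center lies at the origin and is then rescaled so that it has unit spread along each feature axis.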
Normalization and standardization are very common practices in the data cleaning process, and they offer many advantages when a machine learning algorithm is applied to the cleaned data.