# Part 2: Comparing Optimizers

This is part 2 of the Optimizers series; for part 1, click here. In this article we will continue our discussion of optimizers and cover some of the most advanced optimizers in use. So, without wasting any more time, let's jump straight into the discussion.

2. RMSProp

E[g²]_t = γ · E[g²]_(t−1) + (1 − γ) · g_t²

In the equation above, all we are trying to do is replace Adagrad's simple running sum of squared gradients with an exponential decay average.

This update to Adagrad solves its major problem of the diminishing learning rate.
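To see why this matters, here is a small illustrative sketch (my invented toy example, not code from the article): Adagrad's accumulated sum of squared gradients grows without bound, so its effective step size shrinks toward zero, while the exponential decay average (EDA) stays bounded.

```python
import numpy as np

eta, gamma, eps = 0.1, 0.9, 1e-8
g = 1.0                      # pretend every step sees the same gradient

G_sum = 0.0                  # Adagrad: running sum of g^2 (unbounded)
E_g2 = 0.0                   # EDA: E[g^2]_t = gamma*E + (1-gamma)*g^2 (bounded)
for t in range(1000):
    G_sum += g ** 2
    E_g2 = gamma * E_g2 + (1 - gamma) * g ** 2

adagrad_step = eta / (np.sqrt(G_sum) + eps) * g   # keeps shrinking over time
eda_step = eta / (np.sqrt(E_g2) + eps) * g        # stays close to eta
print(adagrad_step, eda_step)
```

After 1000 identical gradients, Adagrad's step has collapsed by a factor of about √1000, while the EDA-based step is still roughly the base learning rate.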

θ_(t+1) = θ_t − (η / √(E[g²]_t + ε)) · g_t

RMSProp update

The idea behind RMSProp and Adadelta is the same: both use the technique of the exponential decay average. But RMSProp was never formally published; it was discussed by Geoffrey Hinton in one of his lectures, where he claimed that the most tested and best value of the decay rate γ is 0.9 and that of the learning rate η is 0.001. Everything else is the same as Adadelta; even the update equation of RMSProp is the same as that of Adadelta.
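The RMSProp update described above can be sketched in a few lines of NumPy. The quadratic loss f(θ) = θ² is an invented toy example (not from the article); γ = 0.9 and η = 0.001 are the values Hinton suggested.

```python
import numpy as np

def rmsprop_step(theta, grad, E_g2, eta=0.001, gamma=0.9, eps=1e-8):
    """One RMSProp step: EDA of g^2, then a scaled gradient step."""
    E_g2 = gamma * E_g2 + (1 - gamma) * grad ** 2
    theta = theta - eta / (np.sqrt(E_g2) + eps) * grad
    return theta, E_g2

# Minimize the toy loss f(theta) = theta^2, whose gradient is 2*theta.
theta, E_g2 = 5.0, 0.0
for _ in range(10000):
    grad = 2 * theta
    theta, E_g2 = rmsprop_step(theta, grad, E_g2)
print(theta)    # moves toward the minimum at 0
```

Note how the per-step movement is roughly η regardless of the gradient's raw magnitude, because the gradient is divided by the EDA of its own recent size.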

3. Adam

This is one of the most used optimizers in deep-learning applications as of 2018. The idea behind Adam is: instead of storing only the EDA of the squared gradient g_t², why not also store the EDA (exponential decay average) of g_t itself? The "moment" in the name comes from statistics:

- Mean: called the first-order moment
- Variance: called the second-order moment

So the EDA of g_t can be roughly thought of as a mean, and the EDA of g_t² can be roughly thought of as a variance. Don't get me wrong, though: this is not an exact correspondence, since this "variance" is uncentered, and going through the full statistics would take us away from our topic. Coming back to our discussion, we have introduced a mean and a variance, which will be used in our update equations:

m_t = β1 · m_(t−1) + (1 − β1) · g_t

mean at time t

v_t = β2 · v_(t−1) + (1 − β2) · g_t²

variance at time t
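As a quick sanity check of the "mean and variance" intuition, the sketch below (an invented example, not from the article) runs the two EDA recurrences on a noisy gradient stream with true mean 3 and uncentered second moment 10, and shows that m_t tracks the former while v_t tracks the latter.

```python
import random

random.seed(0)
beta1, beta2 = 0.9, 0.999
m, v = 0.0, 0.0
for t in range(1, 20001):
    g = 3.0 + random.gauss(0, 1)   # E[g] = 3, E[g^2] = 3^2 + 1 = 10
    m = beta1 * m + (1 - beta1) * g        # EDA of g   -> "mean"
    v = beta2 * v + (1 - beta2) * g * g    # EDA of g^2 -> "variance" (uncentered)
print(m, v)    # m near 3, v near 10
```

This is exactly why the Adam paper calls m_t and v_t the first and second moment estimates of the gradient.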

Because m and v start at zero, these estimates are biased toward zero during the early steps, so Adam bias-corrects them to state our final moment equations:

m̂_t = m_t / (1 − β1^t)

first-order moment

v̂_t = v_t / (1 − β2^t)

second-order moment

So, our update equation for Adam is:

θ_(t+1) = θ_t − η · m̂_t / (√v̂_t + ε)

This algorithm estimates the first and second moments of the gradients and uses them to decide each step of the optimization.
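Putting the equations together, here is a minimal Adam sketch in NumPy. The quadratic loss is an invented toy example, and β1 = 0.9, β2 = 0.999, ε = 1e-8 are the commonly used defaults from the original Adam paper, not values given in this article.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.01,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: moment EDAs, bias correction, scaled update."""
    m = beta1 * m + (1 - beta1) * grad            # first-order moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-order moment
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected mean
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected variance
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy loss f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 3001):            # t starts at 1 for bias correction
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)    # moves toward the minimum at 0
```

Notice that the bias-correction terms matter most when t is small: at t = 1, m̂ and v̂ recover the raw gradient statistics instead of values shrunk toward zero.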

If all the above equations seem too difficult to code, don't worry: all major deep-learning frameworks such as TensorFlow, Keras, and PyTorch come with pre-defined implementations of all these optimizers, and you just have to specify which one to use. But understanding the math behind these optimizers is still essential, because only then can you harness their true power.

“Happy Machine Learning”