A complete beginner’s guide to SVM

Note: This article is purely focused on the mathematical aspects of SVM. Readers should have a basic idea of what classification is; in this article we will consider binary classification only.

SVM (Support Vector Machine) is a very powerful machine learning algorithm used for both classification and regression, and it is one of the most popular kernel-based algorithms.

The goal of the algorithm is to find the plane that separates the positive and the negative points as widely as possible. This is called margin maximization. Consider the image below for better understanding,

[Figure: dark (positive) and white (negative) points separated by the planes \pi_1 and \pi_2, with the separating hyperplane between them]

Here the dark points are positive points and the white ones belong to the negative class. There are some key points to be noted:

  1. Plane \pi_1 : W \cdot X - b = 1 is the maximum separating plane for the positive points, i.e. all the positive points lie on one side of the plane or on the plane itself, and there is no extra gap between the positive points and the plane.
  2. Plane \pi_2 : W \cdot X - b = -1 is the maximum separating plane for the negative points, i.e. all the negative points lie on one side of the plane or on the plane itself, and there is no extra gap between the negative points and the plane.
  3. Plane W \cdot X - b = 0 is equidistant from both the positive and the negative plane (see the small sketch after this list).
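To make these planes concrete, here is a minimal sketch (not from the original article; the weight vector w, bias b, and sample points are made-up values) showing how the sign of W \cdot X - b tells you which side of the separating plane a point falls on:

    import numpy as np

    # Made-up weight vector and bias defining the separating plane w.x - b = 0
    w = np.array([2.0, 1.0])
    b = 1.0

    # Two sample points: one expected on the positive side, one on the negative side
    points = np.array([[2.0, 1.5],
                       [-1.0, 0.5]])

    # Signed score: > 0 means the positive side of the plane, < 0 the negative side
    scores = points @ w - b
    print(scores)           # [ 4.5 -2.5]
    print(np.sign(scores))  # [ 1. -1.]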

You must be wondering why the planes are set to +1 and -1 and not some other value. To answer this you need to know the distance between the two planes, which is 2K \div ||W||, where K is the constant the planes are equated to. In our case K is 1, so in simple terms all we are trying to do is keep the calculations simple.
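For completeness, here is the standard derivation of that distance (it is not spelled out in the original article). Take any point X_0 on \pi_1, so W \cdot X_0 - b = K. Its distance to the parallel plane \pi_2 : W \cdot X - b = -K is

|(W \cdot X_0 - b) - (-K)| \div ||W|| = |K + K| \div ||W|| = 2K \div ||W||

For example, if K = 1 and W = (3, 4), then ||W|| = 5 and the margin is 2 \div 5 = 0.4.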

If the distance between \pi_1 and \pi_2 is maximized, then the points are separated by this maximum-margin hyperplane. As the margin increases, the generalization of the model improves.

Mathematical formulation of SVM

From the discussion above we can clearly see that all we want to do is maximize the distance between \pi_1 and \pi_2. Our objective function is

(w^*, b^*) = \arg\max_{w, b} \; (2 \div ||w||)

such that for all the points that lie on the planes, i.e. the support vectors (negative points on the negative plane and positive points on the positive plane), Y_i (W^{T} X_i - b) = 1, and for the rest of the points Y_i (W^{T} X_i - b) > 1. Now you can see why the algorithm is named after support vectors. This formulation of SVM is called hard margin SVM, which is possible only for linearly separable data; in the general case we will mostly encounter non-linearly separable data, and for those situations we use soft margin SVM, which is defined as follows,

Y_i (W^{T} X_i - b) = 1 - E_i

 

Now for each point we use this relation: for points that satisfy the margin E_i will be 0, and for points that violate it E_i will be positive.
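In other words (a standard way to read the slack variable, added here for clarity), each E_i measures by how much point i fails to meet the margin requirement:

E_i = \max(0, \; 1 - Y_i (W^{T} X_i - b))

so a point that satisfies the margin gets E_i = 0, a point inside the margin but still correctly classified gets 0 < E_i < 1, and a misclassified point gets E_i > 1.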

Now our objective function becomes,

(w^*, b^*) = \arg\min_{w, b} \; \left( ||w|| \div 2 \; + \; C \cdot (1/n) \sum_{i=1}^{n} E_i \right)
Equation 1

(maximizing the margin 2 \div ||w|| is equivalent to minimizing ||w|| \div 2, so the average slack is simply added as a penalty term to the minimization)

such that Y_i (W^{T} X_i - b) \geq 1 - E_i and E_i \geq 0 for all i.

This is the formulation of soft margin SVM. Here C is a hyperparameter that acts as a regularization parameter: if C is too high the model overfits, and if C is too low the model underfits. The above formulation of SVM is called the primal form, which we have used only for derivation purposes; for implementation we use its dual form, which is defined as follows,

\max_{\alpha} \; \left( \sum_{i=1}^{n} \alpha_i \; - \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \right)

Here x_i \cdot x_j can be replaced with any similarity function, i.e. any SVM kernel such as the RBF kernel or the linear kernel. Here too \alpha_i > 0 only for the support vectors; for the rest of the points \alpha_i is zero (the dual also carries the constraints \alpha_i \geq 0, or 0 \leq \alpha_i \leq C in the soft margin case, and \sum_i \alpha_i y_i = 0). This objective function can be optimized with standard solvers: in practice the dual is usually solved with an SMO-style quadratic-programming method (as in LIBSVM), while gradient-based methods such as SGD are typically applied to the equivalent primal hinge-loss form.
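As a quick illustrative sketch (synthetic data and made-up parameter values, using scikit-learn's SVC, which solves this dual problem internally with an SMO-style solver), you can see the effect of the kernel choice and of the C hyperparameter, and check how many training points end up as support vectors:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic binary classification data (illustrative only)
    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # RBF-kernel SVM; C trades margin width against slack:
    # large C -> few margin violations tolerated (can overfit),
    # small C -> wide margin, many violations tolerated (can underfit).
    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
        print(f"C={C:>6}: support vectors={int(clf.n_support_.sum()):3d}, "
              f"test accuracy={clf.score(X_test, y_test):.2f}")

Raising C typically shrinks the number of support vectors and tightens the fit to the training data, which matches the overfitting/underfitting behaviour described above.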

 

 
