Note: This article is purely focused on the mathematical aspects of SVM.The readers should have basic idea of what is classification ,for this article we will consider binary classification only.
SVM (Support vector machine) is a very powerful machine learning algorithm used for both classification and regression and it is one of the most popular kernel based algorithm.
The goal of algorithm is to find the plane that separates the positive and the negative points as widely as possible .This is called marginal maximization. consider the image below for better understanding,
here the dark points are positive points and the white one’s belongs to negative class.There are some key points to be noted down,
- Plane is the maximum separating plane for positive points i.e all the positive points lie either one side of the plane or on the plane but there will be no extra gap between positive points and in the plane.
- Plane is the maximum separating plane for negative points i.e all the negative points lie either one side of the plane or on the plane but there will be no extra gap between negative points and the plane.
- Plane is a equidistant plane from both negative , positive plane
You must be wondering why is there a +1 and -1 why not some other value so to answer this you need to know the distance between two planes which is and here K is the constant to which the lines is equating,In our case K is 1 so in simple terms all we are trying to do is minimize our calculations.
if the distance between and is maximized then the points can be separated by this maximum distance hyperplane.As the margin increases generalization of the model increases.
Mathematical formulation of SVM
By above discussion we can clearly see that all we wanted to do is to maximize the distance between and ,Our objective function is
such that for all the points aka support vectors which lie on the planes i.e negative points on negative plane and positive points on positive planes for them and for rest of the points it will be .Now you can relate why the algorithm is called support vectors.This formulation of SVM is called hard margin SVM which is possible only for linearly separable but in general case we will mostly encounter non-linearly separable data, for those situation we will use Soft margin SVM,which is defined as follows,
now for each point we will use this equation and for all the support vectors E will be 0 ,for rest it will vary.
Now our objective function becomes,
This is the formulation of soft margin svm here C is a hyper parameter and working as a regularization parameter If C is too high then the model overfit and if C is too low then the model underfit..The above formulation of SVM is called the primal form of SVM which we have used only for derivation puposes but for implementation purpose we will use its dual form which is defined as follows,
here can be replaced with any similarity matrix or any svm kernel such as RBF kernel or Linear kernel, and here also for all support vectors but for rest of the points it will be zero.This objective function can be minimized by using any optimization algorithm such as gradient descent or SGD.