Linear Algebra for Machine Learning – Part 5

This is part 5 of the linear algebra series. The goal of every article in the series is to make linear algebra more understandable from both a geometric and an applied perspective.

Linear algebra is a branch of mathematics that is widely used throughout science and engineering. A good understanding of it is essential for working with many machine learning algorithms, especially deep learning. Linear algebra is a very broad subject: covering all of it would take a semester-long course, and many of its topics have little relevance to machine learning. So I will omit topics, even some important ones, that are not essential for understanding machine learning algorithms. If you are a beginner or have no prior experience with linear algebra, it is recommended that you read the articles in this series in order.

Note: If you haven’t read part 4, click here.

The goal of this article is to show the reader how machine learning algorithms use basic shapes such as lines, curves, and circles to define a decision boundary. This topic is essential for understanding the concept of a decision boundary in machine learning algorithms.

After reading this article you will know:

  • shapes in the context of machine learning
  • how algorithms decide which decision boundary is best

Shapes for decision boundary

In this section we will cover shapes in the context of machine learning, primarily lines. Before diving into the details, you should understand that data commonly falls into two types:

  • Linearly separable: data that can be perfectly or almost perfectly separated by a line (2-D), a plane (3-D), or a hyperplane (n-D, the n-dimensional generalization of a plane).
  • Non-linearly separable: data that cannot be separated by any straight line, plane, or hyperplane. Data of this kind can sometimes be separated by more complex shapes such as a circle, sphere, ellipsoid, or hypersphere.
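
To make the two cases concrete, here is a minimal sketch in plain Python. The points, the line x + y - 3 = 0, and the circle of radius 2 are made-up values chosen for illustration, and the helper names side_of_line and inside_circle are hypothetical.

```python
# Toy illustration: a line separates the first data set, a circle the second.
# All points and boundary parameters are invented for this example.

def side_of_line(point, a, b, c):
    """Return +1, -1 or 0 depending on which side of a*x + b*y + c = 0 the point lies."""
    x, y = point
    value = a * x + b * y + c
    return (value > 0) - (value < 0)

def inside_circle(point, radius):
    """Return True if the point lies strictly inside a circle centred at the origin."""
    x, y = point
    return x * x + y * y < radius * radius

# Linearly separable: class A lies on one side of x + y - 3 = 0, class B on the other.
class_a = [(0.0, 1.0), (1.0, 1.0), (1.0, 0.5)]
class_b = [(3.0, 2.0), (2.5, 3.0), (4.0, 1.0)]
print([side_of_line(p, 1, 1, -3) for p in class_a])  # [-1, -1, -1]
print([side_of_line(p, 1, 1, -3) for p in class_b])  # [1, 1, 1]

# Not linearly separable: the inner points are surrounded by the outer points,
# so no straight line splits them, but a circle of radius 2 does.
inner = [(0.5, 0.0), (-0.5, 0.5), (0.0, -1.0)]
outer = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]
print([inside_circle(p, 2.0) for p in inner])  # [True, True, True]
print([inside_circle(p, 2.0) for p in outer])  # [False, False, False, False]
```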


A line is one of the most widely used ways of separating data points that are linearly separable.

Equation of a line: y = m.x + c, where x and y are the coordinates of a point on the line, m is the slope of the line, and c is the y-intercept. A graphical visualization is given below.

For a complete understanding, however, we will use the more general form of the equation of a line, a.x + b.y + c = 0. Replace ‘a’ with W_2, ‘b’ with W_1, and ‘c’ with B, and the equation becomes,

W_2.x + W_1.y + B = 0

The reader should become familiar with these kinds of manipulations. In the above equation, ‘x’ and ‘y’ represent the coordinates of a data point. Let’s manipulate the equation a little further,

W_2.x + W_1.y + B = 0

Vectorizing the above equation gives,

[W_2,W_1]^T \ast [x_1,x_2] + B = 0

where x_1 is the x coordinate of the data point and x_2 is its y coordinate, so

W^T.x + B = 0

where W is [W_2,W_1] and x is [x_1,x_2]. Now you can see why this expression is used as the hypothesis in many machine learning algorithms. There is one more interpretation of the above equation: the values of the W’s represent the importance of each feature. The larger the magnitude of a feature’s W, the more important that feature is.
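
As a quick check that the scalar and vector forms agree, here is a small sketch in plain Python. The coefficients and the data point are made-up values, and dot is a hypothetical helper written out by hand rather than taken from a library.

```python
# Compare the scalar form W_2.x + W_1.y + B with the vector form W^T.x + B.
# The numbers below are invented for illustration.

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

W_2, W_1, B = 2.0, -1.0, 3.0
W = [W_2, W_1]   # weight vector
x = [1.5, 6.0]   # data point: x_1 = 1.5 (x coordinate), x_2 = 6.0 (y coordinate)

scalar_form = W_2 * x[0] + W_1 * x[1] + B
vector_form = dot(W, x) + B

print(scalar_form, vector_form)  # 0.0 0.0
```

Both forms evaluate to 0 here, which means this particular point lies exactly on the line.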

B is the bias term, which shifts the line away from the origin without changing its inclination. Including a bias term is the convention; if you leave it out, the line is forced to pass through the origin. If you compare the above equation with y = m.x + c, then m = -\frac{W_2}{W_1} and c = -\frac{B}{W_1}. This concept can be extended to any plane, hyperplane, circle, etc.
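
Solving W_2.x + W_1.y + B = 0 for y gives y = -(W_2/W_1).x - B/W_1, so the slope and intercept can be read off directly. The sketch below does this conversion; to_slope_intercept is a hypothetical helper name and the coefficients are made-up values.

```python
# Convert the general form W_2.x + W_1.y + B = 0 to slope-intercept form y = m.x + c.

def to_slope_intercept(W_2, W_1, B):
    """Return (m, c) for the line W_2.x + W_1.y + B = 0."""
    if W_1 == 0:
        # A vertical line cannot be written as y = m.x + c.
        raise ValueError("vertical line: no slope-intercept form")
    return -W_2 / W_1, -B / W_1

m, c = to_slope_intercept(2.0, -1.0, 3.0)
print(m, c)  # 2.0 3.0

# Sanity check: a point taken from y = m.x + c satisfies the general form.
x = 1.5
y = m * x + c
print(2.0 * x + (-1.0) * y + 3.0)  # 0.0
```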

How to find best decision boundary

In the above derivation, the terms of the final equation that can be adjusted are the W’s and B. Also notice that the equation contains a dot product between W and x, and the dot product of two vectors tells us how similar their directions are. In our final equation W is a unit vector, i.e. its L_{2} norm is 1, so we only care about the direction of W. If the decision boundary passes through the origin, B becomes zero and the equation becomes W^T.x = 0. For this to hold for every point x on the boundary, W must be a unit vector perpendicular to x, because the dot product is proportional to the cosine of the angle between the two vectors and cos 90° = 0. For a complete understanding, check the image given below.
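
The perpendicularity claim can be checked numerically. This is a minimal sketch in plain Python, assuming a made-up unit vector W = [0.6, 0.8] and a few invented points on the line through the origin that W defines; dot and l2_norm are hypothetical helpers.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def l2_norm(u):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(u, u))

W = [0.6, 0.8]     # assumed example weights with 0.6^2 + 0.8^2 = 1
print(l2_norm(W))  # approximately 1

# Points on the boundary 0.6*x_1 + 0.8*x_2 = 0 are all multiples of (4, -3).
boundary_points = [[4.0, -3.0], [-8.0, 6.0], [2.0, -1.5]]
for x in boundary_points:
    # Each dot product is approximately 0, so W is perpendicular to x.
    print(x, dot(W, x))
```

The results are only approximately 1 and 0 because of floating-point rounding, which is why a comparison against a small tolerance is safer than an exact equality test.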

In this situation, every data point on one side of the line has a positive dot product with W, because the angle between the point and W is less than 90°. Every point on the other side has a negative dot product, because the angle is greater than 90°, and points that lie on the line have a dot product of 0. So, in order to find the best decision boundary, we need to find the W’s that minimize the following function,
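
The relationship between the sign of the dot product and the angle can be sketched as follows. The vector W and the three points are made-up values, and side and angle_degrees are hypothetical helpers; the small tolerance in side is an assumption to absorb floating-point rounding.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def angle_degrees(u, v):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))

def side(W, x, tol=1e-9):
    """Return +1, -1 or 0 according to the sign of W^T.x (0 means on the line)."""
    value = dot(W, x)
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0

W = [0.6, 0.8]  # assumed example weights; boundary is the line W^T.x = 0
points = {"P": [1.0, 1.0], "Q": [-2.0, -1.0], "R": [4.0, -3.0]}

for name, x in points.items():
    # P: sign +1, angle below 90; Q: sign -1, angle above 90; R: sign 0, angle about 90
    print(name, side(W, x), round(angle_degrees(W, x), 1))
```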

J(W) = \sum_{i=0}^{n} (W^T.x_i)

where x_i is the i-th data point. This may be one of the simplest cost functions you will ever see in any machine learning algorithm.
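
Evaluating J(W) is just a sum of dot products. The sketch below computes it for two candidate unit vectors on a tiny made-up data set; the function name cost and all numbers are assumptions for illustration, and the example only shows how the value is computed, not how a full training procedure would search over W.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cost(W, points):
    """J(W): the sum of W^T.x_i over all data points x_i."""
    return sum(dot(W, x) for x in points)

# Invented data points.
points = [[1.0, 2.0], [3.0, -1.0], [-2.0, 0.5]]

# Two candidate unit vectors.
candidates = {"W_a": [1.0, 0.0], "W_b": [0.0, 1.0]}

for name, W in candidates.items():
    print(name, cost(W, points))
# W_a 2.0
# W_b 1.5
```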

An understanding of the concepts in this linear algebra series is necessary in order to understand or derive any machine learning algorithm from scratch. This is the last article of the linear algebra series; I hope you enjoyed reading it.

