Linear Algebra for Machine Learning – Part 4

This is part 4 of the linear algebra series. The goal of each article is to make linear algebra more understandable from both a geometric and an applied point of view.

Linear algebra is a branch of mathematics that is widely used throughout science and engineering. A good understanding of linear algebra is essential for understanding and working with many machine learning algorithms, especially deep learning. Linear algebra is a very broad subject: covering all of it is a semester-long course in itself, and many topics are not particularly relevant to machine learning. So I will omit some topics that are important in linear algebra but not essential for understanding machine learning algorithms. If you are a beginner or have no prior experience with linear algebra, I recommend following this series of articles in order.

Note: If you haven’t read part 3, click here.

The goal of this article is to introduce some of the most widely used concepts from linear algebra that are applied directly in machine learning: norms of vectors, eigendecomposition, and singular value decomposition. These are the concepts on which many machine learning algorithms, such as PCA and recommender systems, are built.

After reading this article you will know:

  • what norms are, and the different types of norms.
  • what eigendecomposition is.
  • what singular value decomposition (SVD) is and how it differs from eigendecomposition.

The norm of a vector can be understood as the size of the vector. In machine learning we usually measure the size of a vector using a function called a norm. Formally, the L_{p} norm is given by

||x||_{p} = \left( \sum_{i} |x_{i}|^{p} \right)^{1/p}

for p \in R, p \geq 1. Norms are functions that map vectors to non-negative scalar values. Geometrically, the norm of a vector x can be understood as the distance from the origin to the point x.
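The definition above can be sketched in a few lines of NumPy. The helper name `lp_norm` and the example vector are only illustrative choices, not something from this series; `np.linalg.norm` is NumPy's built-in equivalent and is used here as a cross-check.

```python
import numpy as np

def lp_norm(x, p):
    """L_p norm of a 1-D vector for p >= 1: (sum_i |x_i|^p)^(1/p)."""
    if p < 1:
        raise ValueError("p must be >= 1 for this to be a norm")
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

# Illustrative vector (an assumption for this sketch).
x = np.array([3.0, -4.0])

l1 = lp_norm(x, 1)  # |3| + |-4|
l2 = lp_norm(x, 2)  # sqrt(3^2 + 4^2)
l3 = lp_norm(x, 3)

# NumPy's built-in norm takes the same p through its `ord` argument.
l3_numpy = np.linalg.norm(x, ord=3)

print(l1, l2, l3, l3_numpy)
```

For this vector the L_{1} norm is 7 and the L_{2} norm is 5, the familiar 3-4-5 right triangle.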

One of the best-known norms is the L_{2} norm, with p=2, known as the Euclidean norm. It is simply the Euclidean distance of the point x from the origin. The L_{2} norm is by far the most commonly used norm, so much so that we often drop the subscript 2 and just write ||x||. The Pythagorean theorem can be thought of as the L_{2} norm in two dimensions. The squared L_{2} norm is also the dot product of a vector with itself, x^{T}x. The squared L_{2} norm, however, increases very slowly near the origin, and in many machine learning applications it is important to discriminate between elements that are exactly 0 and elements that are small but non-zero. In those situations we use the L_{1} norm instead, which is defined as

||x||_{1} = \sum_{i} |x_{i}|

The L_{1} norm is commonly used in machine learning applications where the difference between zero and non-zero elements is very important. In these situations we want a norm that grows at the same rate everywhere: each time an element of x moves away from 0 by \epsilon, the L_{1} norm increases by \epsilon.
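One way to see the difference near the origin is to compare the L_{1} norm with the squared L_{2} norm (the dot product of x with itself) on a vector with small entries. The numbers below are made up for illustration.

```python
import numpy as np

# A vector whose entries are close to, but not all exactly, zero.
small = np.array([0.01, 0.0, -0.02])

l1 = np.sum(np.abs(small))        # 0.01 + 0.00 + 0.02
sq_l2 = np.dot(small, small)      # 0.0001 + 0.0000 + 0.0004

# Shrink the vector by a factor of 10 and measure again.
l1_shrunk = np.sum(np.abs(0.1 * small))
sq_l2_shrunk = np.dot(0.1 * small, 0.1 * small)

print(l1, sq_l2)
print(l1_shrunk / l1, sq_l2_shrunk / sq_l2)
```

Shrinking the vector by 10 shrinks the L_{1} norm by 10 but the squared L_{2} norm by 100, so small non-zero entries become much harder to tell apart from zero under the squared L_{2} norm.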

In machine learning we sometimes also need to measure the size of a matrix. In those situations we use the Frobenius norm, which is defined as follows,

||A||_{F} = \sqrt{ \sum_{i,j} A_{i,j}^{2} }

which is analogous to the L_{2} norm of a vector.
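As a small sketch of that analogy, the Frobenius norm of a matrix equals the L_{2} norm of the same entries flattened into one long vector. The matrix here is an arbitrary example.

```python
import numpy as np

# Arbitrary example matrix (an assumption for this sketch).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Directly from the definition: square every entry, sum, take the root.
fro_manual = np.sqrt(np.sum(A ** 2))      # sqrt(1 + 4 + 9 + 16)

# NumPy's built-in Frobenius norm.
fro_numpy = np.linalg.norm(A, "fro")

# L2 norm of the flattened matrix.
vec_l2 = np.linalg.norm(A.ravel())

print(fro_manual, fro_numpy, vec_l2)
```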

Eigendecomposition

Many mathematical objects can be understood better by breaking them into their constituent parts, or by finding properties of them that are universal and not caused by the way we choose to represent them. One of the most widely used kinds of matrix decomposition is called eigendecomposition, in which we decompose a matrix into its eigenvalues and eigenvectors. An eigenvector of a square matrix A is a non-zero vector v such that multiplying v by A only scales v up or down by some scalar value, which is called the eigenvalue.

Av = \lambda v

Here v is an eigenvector corresponding to the eigenvalue \lambda. The geometric interpretation of the above statement is given below.

[Figure: an eigenvector before multiplication by A, and after multiplication]

The eigenvectors of a real symmetric matrix can be chosen to be orthogonal to each other; for a general square matrix this need not hold. If you are interested in how to compute eigenvectors by hand, a quick search will turn up many posts that cover the calculation, though few of them cover the geometric interpretation. PCA, one of the most popular dimensionality reduction algorithms, is based entirely on the concept of eigendecomposition.
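The relation Av = \lambda v can be checked numerically. The sketch below assumes a small real symmetric matrix, chosen only for illustration, and uses `np.linalg.eigh`, NumPy's routine for symmetric matrices; for such matrices the eigenvectors it returns are orthonormal. For a general square matrix you would use `np.linalg.eig` instead, and orthogonality is not guaranteed.

```python
import numpy as np

# Small symmetric example matrix (an assumption for this sketch).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and the matching
# eigenvectors as the COLUMNS of eigvecs.
eigvals, eigvecs = np.linalg.eigh(A)

# Check A v = lambda v for every eigenpair at once:
# column j of (A @ eigvecs) should equal eigvals[j] * column j of eigvecs.
residual = A @ eigvecs - eigvecs * eigvals

# For a symmetric matrix the eigenvectors are orthonormal: Q^T Q = I.
gram = eigvecs.T @ eigvecs

# Rebuild A from its parts: A = Q diag(lambda) Q^T.
A_rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T

print(eigvals)
print(np.allclose(residual, 0.0), np.allclose(gram, np.eye(2)))
```

The eigenvalues of this matrix are 1 and 3: vectors along the direction (1, -1) are left unchanged in length, while vectors along (1, 1) are stretched by a factor of 3.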

Singular value decomposition

We have now covered the eigendecomposition of a matrix. There is another type of decomposition called singular value decomposition (SVD). It provides another way to factorize a matrix, into singular vectors and singular values. The information we get from the SVD of a matrix is quite similar to that from the eigendecomposition, but the SVD is more generally applicable. Every real matrix has an SVD, but the same is not true of the eigendecomposition. If a matrix is not square, the eigendecomposition is not defined, and in that case we have to use the SVD instead. The SVD of a matrix is defined as follows,

A = U D V^{T}

All three matrices U, D, and V have a special structure. U and V are orthogonal matrices, and D is a diagonal matrix, which need not be square. The elements along the diagonal of D are known as the singular values of the matrix A. The columns of U are known as the left-singular vectors. The columns of V are known as the right-singular vectors. We can interpret the singular value decomposition of A in terms of the eigendecomposition of functions of A. The left-singular vectors of A are the eigenvectors of AA^{T}. The right-singular vectors of A are the eigenvectors of A^{T}A. The non-zero singular values of A are the square roots of the non-zero eigenvalues of A^{T}A, which are the same as those of AA^{T}.
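These properties can be checked with `np.linalg.svd` on a non-square matrix, where no eigendecomposition exists. The 2 x 3 matrix below is an arbitrary example. Note that NumPy returns the singular values as a 1-D array and returns V^{T} rather than V, so the rectangular D has to be built by hand.

```python
import numpy as np

# Arbitrary 2 x 3 example matrix (an assumption for this sketch).
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# U is 2 x 2, Vt is 3 x 3, s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the rectangular diagonal matrix D with the same shape as A.
D = np.zeros(A.shape)
D[:len(s), :len(s)] = np.diag(s)

# Reconstruct A = U D V^T.
A_rebuilt = U @ D @ Vt

# Eigenvalues of A A^T (symmetric, so eigvalsh applies), sorted descending
# to line up with the singular values.
eig_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]

print(s ** 2)
print(eig_AAt)
```

Both printed arrays should agree up to rounding: the squared singular values of A are the eigenvalues of AA^{T}, which for this matrix are 12 and 10.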

Understanding the concepts above is crucial for understanding many machine learning algorithms. In the next article we will cover some shapes, such as lines and circles, that are used to define decision boundaries in machine learning algorithms. This is the second-to-last article of the linear algebra series; I hope you enjoyed reading it.

To read part 5, click here.
