Linear Algebra for Machine Learning – Part 3

This is part 3 of the linear algebra series. The goal of each article is to make linear algebra more understandable from both a geometric and an applied point of view.

Linear algebra is a branch of mathematics that is widely used throughout science and engineering. A good understanding of linear algebra is essential for understanding and working with many machine learning algorithms, especially deep learning. Linear algebra is a very wide field: covering all of it takes a semester course, and many topics are not very relevant to machine learning. So I will omit topics that are important within linear algebra but not essential for understanding machine learning algorithms. If you are a beginner or have no prior experience with linear algebra, it is recommended to follow this series of articles in order.

Note: If you haven’t read part 2, click here.

The goal of this article is to introduce some special types of matrices, i.e. the identity matrix and the matrix inverse. We will also cover what a system of linear equations is and how to solve it, along with some important properties of the determinant and their geometric significance.

After completing this article you will know:

  • what an identity matrix is
  • what the inverse of a matrix and the adjoint of a matrix are
  • what the determinant is and what it means geometrically

So let’s begin.

Identity matrix

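An identity matrix is a square diagonal matrix whose diagonal entries are all 1, i.e. I_{i,j} = 0 where i \neq j and I_{i,j} = 1 where i = j. For example, the 2-d array representation of the 3×3 identity matrix is

I_{3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}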

Multiplying by the identity matrix leaves things unchanged: the product of an identity matrix and a vector is the same vector, I.x = x, and likewise I.A = A.I = A for any square matrix A of the same size. The concept of the identity matrix is also very useful for finding the inverse of a matrix. One of the most important properties of the identity matrix is

A^{-1}.A = A.A^{-1} = I

where A is a known square matrix, A^{-1} is the inverse of A and I is the identity matrix.
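The "does not change anything" behaviour of the identity matrix is easy to check numerically. The snippet below is a minimal sketch in plain Python (no libraries); the helper names `identity`, `matvec` and `matmul` and the sample values are made up for illustration.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matvec(M, v):
    """Multiply matrix M by vector v: one dot product per row of M."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*N))  # columns of N
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]


I3 = identity(3)
v = [4, -2, 7]
A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 2]]

print(matvec(I3, v) == v)   # I.v leaves the vector unchanged
print(matmul(I3, A) == A)   # I.A = A
print(matmul(A, I3) == A)   # A.I = A
```

All three comparisons print True for any vector and square matrix of matching size.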

Before going to matrix inversion, let’s first understand why we even need to find the inverse of a matrix. If you have been following this series from the beginning, we now know enough linear algebra notation to write down a system of linear equations:

A.x =b

where A \in R^{m \times n} is a known matrix, b \in R^{m} is a known vector and x \in R^{n} is a vector of unknown variables that we need to find in order to satisfy this equation. The dot product of each row of A with x gives one element of b. Written out row by row, this is

A_{1,1}\ast x_{1} + A_{1,2}\ast x_{2} + \dots + A_{1,n}\ast x_{n} = b_{1}

A_{2,1}\ast x_{1} + A_{2,2}\ast x_{2} + \dots + A_{2,n}\ast x_{n} = b_{2}

\vdots

A_{m,1}\ast x_{1} + A_{m,2}\ast x_{2} + \dots + A_{m,n}\ast x_{n} = b_{m}
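To see the row-by-row view in action, here is a small sketch in plain Python. The matrix, the vector and the helper name `row_dot` are illustrative assumptions, chosen only so the arithmetic is easy to follow by hand.

```python
def row_dot(row, x):
    """Dot product of one row of A with the vector x."""
    return sum(a * x_j for a, x_j in zip(row, x))


# A has m = 2 rows and n = 3 columns, so x has 3 entries and b has 2.
A = [[2, 1, -1],
     [1, 3, 2]]
x = [1, 2, 3]

# Each row of A produces exactly one element of b.
b = [row_dot(row, x) for row in A]
print(b)  # [1, 13]
```

The first entry is 2*1 + 1*2 + (-1)*3 = 1 and the second is 1*1 + 3*2 + 2*3 = 13, matching the two equations written out term by term.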

To find the solution of these linear equations, multiply both sides on the left by A^{-1}:

A.x = b

A^{-1}.A.x = A^{-1}.b

We know that A^{-1}.A = I, so the final equation becomes

I.x = A^{-1}.b

and since I.x = x, the solution is x = A^{-1}.b.

So in order to find x we need to find A^{-1}. Of course, this depends on it being possible to find A^{-1}. The necessary and sufficient condition for a square matrix A to possess an inverse is that |A| \neq 0, where |A| is the determinant of A. The proof is not required here, but if you are interested, leave a comment below.
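As a quick illustration of x = A^{-1}.b, the sketch below solves a 2×2 system. It assumes integer entries and uses the well-known closed-form inverse of a 2×2 matrix; the function names and the numbers are made up for this example. Exact fractions are used so that no rounding error hides the result.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2x2 integer matrix [[a, b], [c, d]] via the closed form."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        # |A| = 0: the matrix is singular and has no inverse.
        raise ValueError("matrix is singular, so it has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matvec(M, v):
    """Multiply matrix M by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# The system 2*x1 + 1*x2 = 3 and 1*x1 + 3*x2 = 5.
A = [[2, 1],
     [1, 3]]
b = [3, 5]

x = matvec(inverse_2x2(A), b)
print(x)                  # [Fraction(4, 5), Fraction(7, 5)]
print(matvec(A, x) == b)  # True: the solution satisfies A.x = b
```

Calling `inverse_2x2([[1, 2], [2, 4]])` raises a `ValueError`, because that matrix has determinant 0.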

Finding inverse of A

Calculating A^{-1} by hand is quite straightforward, just follow the steps given below:

Step 1: find the adjoint of A, adj(A).

Step 2: divide each element of adj(A) by the determinant |A| of A.

The main part of this whole calculation is finding adj(A), and frankly speaking it is the computationally expensive part. The idea behind adj(A) is simple: replace every element of A by its co-factor, then transpose the resulting co-factor matrix. To see how co-factors are found, consider the example given below.

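Consider a 3×3 matrix

A = \begin{bmatrix} A_{1,1} & A_{1,2} & A_{1,3} \\ A_{2,1} & A_{2,2} & A_{2,3} \\ A_{3,1} & A_{3,2} & A_{3,3} \end{bmatrix}

The co-factor of element A_{i,j} is C_{i,j} = (-1)^{i+j}\ast M_{i,j}, where M_{i,j} is the determinant of the 2×2 matrix left after deleting row i and column j of A. So its co-factor matrix is,

C = \begin{bmatrix} C_{1,1} & C_{1,2} & C_{1,3} \\ C_{2,1} & C_{2,2} & C_{2,3} \\ C_{3,1} & C_{3,2} & C_{3,3} \end{bmatrix}

Adj(A) is the transpose of the co-factor matrix,

adj(A) = C^{T} = \begin{bmatrix} C_{1,1} & C_{2,1} & C_{3,1} \\ C_{1,2} & C_{2,2} & C_{3,2} \\ C_{1,3} & C_{2,3} & C_{3,3} \end{bmatrix}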
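For a concrete feel of co-factors and the adjoint, the following sketch computes both for one 3×3 matrix in plain Python. The matrix is an illustrative choice, not the one from the original figures, and the helper names are hypothetical.

```python
def minor_det(A, i, j):
    """Determinant of the 2x2 matrix left after deleting row i and column j of a 3x3 A."""
    kept_rows = [row for r, row in enumerate(A) if r != i]
    (p, q), (s, t) = [[val for c, val in enumerate(row) if c != j] for row in kept_rows]
    return p * t - q * s


def cofactor_matrix(A):
    """Replace every element of a 3x3 matrix by its signed minor (co-factor)."""
    return [[(-1) ** (i + j) * minor_det(A, i, j) for j in range(3)] for i in range(3)]


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]

C = cofactor_matrix(A)
adj_A = transpose(C)

print(C)      # [[-24, 20, -5], [18, -15, 4], [5, -4, 1]]
print(adj_A)  # [[-24, 18, 5], [20, -15, -4], [-5, 4, 1]]
```

Indices in the code start at 0, so `(-1) ** (i + j)` gives the same sign pattern as (-1)^{i+j} with indices starting at 1.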

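After finding adj(A), just divide each of its elements by the determinant of A:

A^{-1} = \frac{1}{|A|}\ast adj(A)

To find the determinant of a 3×3 matrix, multiply each element of the first row by its co-factor and add the results:

|A| = A_{1,1}\ast C_{1,1} + A_{1,2}\ast C_{1,2} + A_{1,3}\ast C_{1,3}

where C_{1,j} is the co-factor of A_{1,j}.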
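Putting the two steps together, here is a minimal sketch of the adjoint method for a small square matrix with integer entries. It is meant to mirror the hand calculation, not to be an efficient implementation (co-factor expansion becomes very slow as the matrix grows); all function names are assumptions made for this example.

```python
from fractions import Fraction


def minor(A, i, j):
    """Copy of A with row i and column j removed."""
    return [[val for c, val in enumerate(row) if c != j]
            for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by co-factor expansion along the first row."""
    if len(A) == 0:
        return 1  # determinant of the empty matrix, ends the recursion
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def inverse(A):
    """Inverse of an integer matrix as adj(A) divided by |A|."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0, so A has no inverse")
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
    # adj(A) is the transpose of cof, so entry (i, j) of the inverse is cof[j][i] / |A|.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]


A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 2]]

print(det(A))  # 8
A_inv = inverse(A)
print(matmul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

The last line checks the defining property A.A^{-1} = I instead of trusting the individual entries.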

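The determinant has one of the deepest geometric meanings, which most textbooks fail to cover: its value specifies how much the matrix scales space when it is applied as a linear transformation. For a 2×2 matrix, the absolute value of the determinant is the factor by which areas are multiplied (for a 3×3 matrix, volumes). A determinant of zero means the transformation squashes space onto a lower dimension, which is why such a matrix has no inverse, and a negative determinant means the orientation is flipped.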
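The scaling interpretation can be checked directly in 2-D: transform the corners of the unit square (area 1) with a matrix and measure the area of the result. The sketch below does this with the shoelace formula for polygon area; the matrices and helper names are illustrative assumptions.

```python
def det2(M):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c


def transform(M, p):
    """Apply the 2x2 matrix M to the point p = (x, y)."""
    (a, b), (c, d) = M
    x, y = p
    return (a * x + b * y, c * x + d * y)


def signed_area(points):
    """Shoelace formula: positive for counter-clockwise corners, negative for clockwise."""
    n = len(points)
    twice_area = sum(points[k][0] * points[(k + 1) % n][1]
                     - points[(k + 1) % n][0] * points[k][1]
                     for k in range(n))
    return twice_area / 2


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # area 1, counter-clockwise

for M in ([[3, 1], [1, 2]],    # stretches space: area grows 5 times
          [[0, 1], [1, 0]],    # swaps the axes: orientation flips
          [[1, 2], [2, 4]]):   # singular: the square collapses onto a line
    image = [transform(M, p) for p in unit_square]
    print(det2(M), signed_area(image))
# prints:
# 5 5.0
# -1 -1.0
# 0 0.0
```

In each case the signed area of the transformed square equals the determinant, and the singular matrix produces zero area.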

In the next article we will cover some of the most used matrix concepts, namely norms, eigendecomposition and singular value decomposition (SVD), on which many algorithms such as recommender systems and PCA are built. If you have any doubts, feel free to drop your queries in the comment section below. Hope you enjoyed reading it.

To read part 4, click here.
