A plot is a graphical technique for representing a data set. Graphs are used to represent relations between variables and to display information in an intuitive manner. Different types of graphs represent different types of relations between variables.
In this article we will cover the following popular plots, with their interpretation and Python code snippets:
- Line plots
- Scatter plots
- Pair plots
- Histograms
- Box plots
- Violin plots
Note: The implementation of some plots is library dependent. Covering every mathematical aspect of these plots is beyond the scope of this article.
A line plot is one of the simplest plots to understand, yet it is very powerful when it comes to interpretation. It is a plot of two variables, X and Y, in which a line is drawn connecting the given coordinates.
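As a minimal sketch of a line plot, the snippet below uses Matplotlib and NumPy. The data (a sine curve sampled at 50 points) and the output file name `line_plot.png` are assumptions chosen purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (assumed): 50 evenly spaced x values and y = sin(x)
x = np.linspace(0, 10, 50)
y = np.sin(x)

fig, ax = plt.subplots()
# plot() connects the (x, y) coordinates with a line, in the order given
(line,) = ax.plot(x, y, marker="o", markersize=3, label="sin(x)")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Line plot")
ax.legend()

# Save to a file (hypothetical name); use plt.show() instead when working interactively
fig.savefig("line_plot.png")
```

Because the points are joined in order, a line plot is most meaningful when X has a natural ordering, such as time.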
A scatter plot, also called a scatterplot, scatter graph, or scatter chart, is a type of plot that uses X and Y coordinates on a 2D plane to display points. It is a well-known technique for studying how one variable depends on another.
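A scatter plot can be sketched with Matplotlib as follows. The synthetic data (Y roughly equal to 2X plus noise) and the file name `scatter_plot.png` are assumptions for illustration. The Pearson correlation coefficient is computed alongside the plot as one simple way to quantify the dependence the plot shows visually.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (assumed): y depends linearly on x, plus random noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + rng.normal(0, 2.0, 100)

fig, ax = plt.subplots()
# Each (x, y) pair becomes one point on the 2D plane
points = ax.scatter(x, y, s=15, alpha=0.7)
ax.set_xlabel("X")
ax.set_ylabel("Y")

# Pearson correlation summarizes the strength of the linear relationship
r = np.corrcoef(x, y)[0, 1]
ax.set_title(f"Scatter plot (Pearson r = {r:.2f})")

# Save to a file (hypothetical name); use plt.show() instead when working interactively
fig.savefig("scatter_plot.png")
```

An upward-sloping cloud of points like this one suggests a positive relationship; a shapeless cloud would suggest little or no linear dependence.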
A pair plot is an extension of scatter plots and histograms (with their PDF representation). It is primarily used to study how each variable behaves against every other variable when the data has more than two dimensions.
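Libraries such as seaborn provide a pair plot in a single call (`seaborn.pairplot`). To keep dependencies small, the sketch below builds the same kind of grid by hand with Matplotlib: histograms on the diagonal and scatter plots everywhere else. The three variables `a`, `b`, and `c` and the file name `pair_plot.png` are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative 3-dimensional data (assumed): b is correlated with a, c is independent
rng = np.random.default_rng(1)
n = 200
a = rng.normal(0, 1, n)
data = {
    "a": a,
    "b": 0.8 * a + rng.normal(0, 0.5, n),
    "c": rng.uniform(-2, 2, n),
}
names = list(data)
k = len(names)

fig, axes = plt.subplots(k, k, figsize=(7, 7))
for i, row_name in enumerate(names):
    for j, col_name in enumerate(names):
        ax = axes[i, j]
        if i == j:
            # Diagonal: distribution of a single variable
            ax.hist(data[row_name], bins=20, density=True)
        else:
            # Off-diagonal: one variable against another
            ax.scatter(data[col_name], data[row_name], s=5, alpha=0.6)
        if i == k - 1:
            ax.set_xlabel(col_name)
        if j == 0:
            ax.set_ylabel(row_name)

fig.tight_layout()
# Save to a file (hypothetical name); use plt.show() instead when working interactively
fig.savefig("pair_plot.png")
```

Reading the grid row by row makes it easy to spot which pairs of variables move together (here `a` and `b`) and which do not.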
A histogram is one of the best ways to show how densely values occur across a range. It is an accurate way of representing the distribution of the data, or more precisely an estimate of its probability distribution. Plotting a histogram depends on ‘bins’: the entire range is divided into a series of intervals, and the number of values that fall inside each bin determines the height of that bin's bar.
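The binning described above can be sketched with Matplotlib and NumPy. The normally distributed sample, the choice of 20 bins, and the file name `histogram.png` are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (assumed): 1000 draws from a normal distribution
rng = np.random.default_rng(2)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()
# The range is split into 20 equal-width bins; each bar's height is the
# number of values that fall inside that bin
counts, bin_edges, _ = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("Histogram")

# With density=True the heights are rescaled so the total bar area is 1,
# which makes the histogram an estimate of the probability density
density, edges = np.histogram(values, bins=20, density=True)
area = np.sum(density * np.diff(edges))

# Save to a file (hypothetical name); use plt.show() instead when working interactively
fig.savefig("histogram.png")
```

The number of bins is a judgment call: too few hides the shape of the distribution, while too many makes the plot noisy.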
Box plots are somewhat more complex: they represent data according to its quartiles. Lines (whiskers) extend vertically from the box to indicate variability outside the upper and lower quartiles. The spacing between the different parts of the box indicates the spread of the data.
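A box plot can be sketched with Matplotlib as below. The three sample groups, their labels, and the file name `box_plot.png` are made up for illustration. The quartiles of the first group are also computed directly with NumPy so the drawn box can be related to the numbers behind it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (assumed): three groups with different spread and shape
rng = np.random.default_rng(3)
groups = [
    rng.normal(0, 1, 200),
    rng.normal(1, 2, 200),
    rng.exponential(1.0, 200),
]

fig, ax = plt.subplots()
# Each box spans the first to third quartile, with a line at the median.
# By default the whiskers reach the most extreme points within 1.5 * IQR
# of the box, and points beyond that are drawn individually as outliers.
parts = ax.boxplot(groups)
ax.set_xticklabels(["A", "B", "C"])
ax.set_ylabel("Value")
ax.set_title("Box plot")

# The same summary statistics, computed directly for group A
q1, median, q3 = np.percentile(groups[0], [25, 50, 75])
iqr = q3 - q1

# Save to a file (hypothetical name); use plt.show() instead when working interactively
fig.savefig("box_plot.png")
```

A taller box means a larger interquartile range, and a median line that sits off-center within the box hints at a skewed distribution.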
A violin plot is an extension of the box plot in which a kernel density plot is drawn together with the box plot. A violin plot is more informative than a plain box plot: a box plot shows only summary statistics such as the median and interquartile range, whereas a violin plot shows the full distribution of the data.
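The difference between the two plots is easiest to see on data whose shape a box plot hides. The sketch below overlays a narrow Matplotlib box plot on a violin plot for one unimodal and one bimodal sample. The data, labels, and file name `violin_plot.png` are assumptions for illustration; seaborn's `violinplot` is a common alternative that draws a small box inside the violin by default.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (assumed): one single-peaked and one two-peaked sample
rng = np.random.default_rng(4)
unimodal = rng.normal(0, 1.5, 400)
bimodal = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
samples = [unimodal, bimodal]

fig, ax = plt.subplots()
# The violin body is a kernel density estimate mirrored around the center line
violin = ax.violinplot(samples, showextrema=True)
# A narrow box plot at the same positions adds the median and quartiles
box = ax.boxplot(samples, widths=0.1, showfliers=False)
ax.set_xticklabels(["unimodal", "bimodal"])
ax.set_ylabel("Value")
ax.set_title("Violin plot with box plot overlay")

# Save to a file (hypothetical name); use plt.show() instead when working interactively
fig.savefig("violin_plot.png")
```

The two samples have similar medians, so their boxes alone look broadly comparable, but the violin bodies reveal that the second sample has two separate peaks.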
EDA (exploratory data analysis) is a must before building any machine learning model. All of these plots are very useful for exploring the data during EDA, before any model is designed.