In the previous posts of this series, we have covered the fundamentals of data science and related data analytics approaches. In this post, you are going to learn about the data science life cycle, which we will later apply to a case study.
Data Science Life Cycle
A typical data science project moves through six phases, which together form the shape of its life cycle.
Phase 1- Discovery
At the beginning of a project, you need to assess factors such as specifications, requirements, priorities, and budget. You have to ask the right questions and determine whether you have the skilled people and the technology needed to complete the project. In this phase, you also identify the business problems and formulate the initial hypotheses you will test against the data.
Phase 2- Data Preparation
This is a crucial step, as here you set up the analytical sandbox that will serve the analytics needs of the whole project. Before the modeling phase starts, you have to preprocess, explore, and condition the data. You will also perform ETLT (extract, transform, load, transform) to move the data into the sandbox.
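The ETLT flow can be sketched end to end. The post recommends R for this phase, so treat the following Python snippet as a language-neutral illustration only: it extracts messy rows from a hypothetical source, transforms and validates them, loads them into an in-memory SQLite sandbox, and then transforms again inside the database. The table name and sample values are invented.

```python
import sqlite3

# Extract: rows as they might arrive from a source system (all strings, some messy)
raw_rows = [
    ("2024-01-05", "north", "100.0"),
    ("2024-01-06", "north", "250.5"),
    ("2024-01-06", "south", "bad"),   # unparseable amount, should be dropped
    ("2024-01-07", "south", "75.25"),
]

# Transform (pre-load): coerce types and drop rows that fail validation
def parse(row):
    date, region, amount = row
    try:
        return (date, region, float(amount))
    except ValueError:
        return None

clean_rows = [r for r in map(parse, raw_rows) if r is not None]

# Load: stage the cleaned rows into the sandbox database
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

# Transform (in-database): aggregate inside the sandbox for analysis
totals = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
))
print(totals)
```

The second transform runs where the data lives, which is the point of the extra "T" in ETLT: heavy aggregation happens in the sandbox database rather than in application code.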
You can use R to transform, clean, and visualize data in the preparation phase. By doing this you can easily spot outliers and establish relationships between the different variables. Once you have cleaned and transformed your data, the next step is to apply exploratory analytics to it. Now, let's see how you can do that.
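As a concrete illustration of spotting outliers during preparation, here is a minimal sketch of the interquartile-range (IQR) rule. The post suggests R for this step; the same idea in Python looks like the following, with the sample values invented purely for illustration.

```python
# Flag any value outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as an outlier.
values = [12.1, 12.4, 11.9, 12.3, 12.0, 12.2, 30.5, 12.1]

def iqr_outliers(data):
    s = sorted(data)
    n = len(s)
    q1 = s[n // 4]           # rough lower quartile
    q3 = s[(3 * n) // 4]     # rough upper quartile
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

print(iqr_outliers(values))  # → [30.5]
```

Values flagged this way deserve a closer look before modeling: they may be data-entry errors to drop, or genuine extremes worth keeping.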
Phase 3- Modeling
This is yet another important phase of the life cycle, where you select the methods and techniques used to establish relationships between the variables. These relationships will serve as the basis for the algorithms you implement in the next phase. In our explanation, we use Exploratory Data Analysis with standard statistical formulas and visualization tools.
Have a brief look at some common tools below:
- R offers a complete set of modeling capabilities and provides a good environment for building interpretive models.
- SQL Analysis Services can perform in-database analytics using common data mining functions and basic predictive models.
- SAS/ACCESS can be used to access data from Hadoop, and it can also create repeatable and reusable model flow diagrams.
There are plenty of analytics tools available on the market, but R is the one this series leans on most. Now that you have explored the data and chosen the algorithms to use, the next phase shows how to implement them.
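One of the standard statistical formulas used in exploratory data analysis is the Pearson correlation coefficient, which quantifies the linear relationship between two variables. A minimal sketch in Python, with the variable names and numbers invented for illustration:

```python
import math

# Two hypothetical variables, e.g. ad spend (x) vs. sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb)

r = pearson(x, y)
print(round(r, 3))  # → 0.999, an almost perfectly linear relationship
```

A coefficient near +1 or -1 suggests a strong linear relationship worth building a model around; values near 0 suggest the variables are not linearly related.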
Phase 4- Model Building
In this phase, you will create the datasets for training and testing purposes. You have to determine whether your current tools are capable of building the model, or whether you need a more robust environment. You will also work with different modeling techniques such as classification, association, and clustering. There are plenty of model-building tools available, such as SAS Enterprise Miner, WEKA, SPSS Modeler, MATLAB, Alpine Miner, and Statistica.
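The train/test split described above can be sketched with a deliberately simple stand-in classifier: a nearest-centroid rule rather than any of the commercial tools listed. The dataset below is synthetic, and the 80/20 split ratio is an assumption for illustration.

```python
import random

# Toy labeled dataset of (feature, label) pairs, invented for illustration:
# small feature values are "low", large ones are "high".
data = [(v / 10, "low") for v in range(0, 50)] + \
       [(v / 10, "high") for v in range(50, 100)]

# Split into training and testing sets (80/20).
random.seed(42)
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Fit: compute the mean feature value (centroid) per class on training data.
def fit(rows):
    sums, counts = {}, {}
    for value, label in rows:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# Predict: assign the class whose centroid is closest to the input.
def predict(centroids, value):
    return min(centroids, key=lambda label: abs(value - centroids[label]))

centroids = fit(train)
accuracy = sum(predict(centroids, v) == y for v, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The key discipline shown here is that accuracy is measured only on the held-out test set, never on the data the model was fitted to.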
Phase 5- Operationalisation
In this phase, you have to deliver the final reports, code, briefings, and technical documents. In addition, a pilot project is sometimes run in a real-time production environment. This gives a clear picture of how the project performs at small scale before it is deployed in full production.
Phase 6- Results
Now, in the last phase, you have to assess whether the project achieved the desired result. You note down all the key findings and share them with the stakeholders to determine whether the outcome matches the goals you set in the first phase.
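Checking the key findings against the goals from the discovery phase can be as simple as a table of targets versus measurements. In this sketch, the metric names and numbers are purely illustrative.

```python
# Success criteria agreed with stakeholders in Phase 1 (illustrative).
criteria = {"accuracy": 0.90, "coverage": 0.95}
# Key findings measured in the final phase (illustrative).
findings = {"accuracy": 0.93, "coverage": 0.91}

# Build a per-metric report: target, measured value, and whether it was met.
report = {
    metric: {
        "target": criteria[metric],
        "measured": findings[metric],
        "met": findings[metric] >= criteria[metric],
    }
    for metric in criteria
}

for metric, row in report.items():
    status = "met" if row["met"] else "missed"
    print(f"{metric}: measured {row['measured']} vs target {row['target']} -> {status}")
```

A report like this makes the stakeholder conversation concrete: each goal from Phase 1 is either met or missed, with the numbers behind the verdict.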
Okay, so now you are familiar with the whole data science life cycle. In the next and final installment, we will walk through this life cycle by applying it to a case study.