Welcome to the first in a series of posts dedicated to the Analytics Journey. More specifically, we will demonstrate how we at Ruths.ai incorporate the industry-proven methodology, CRISP-DM, into our data science life cycle. Over the ensuing posts, we will take the reader along each step of the journey’s path from beginning to end . . . and beginning to end . . . the Analytics Journey never truly ends, only optimizes . . .
In the 1990s, as computing and data evolved from a fringe asset into a necessity for every company, organizations searched for an efficient, structured process they could trust. In response, several industry leaders formed a consortium to define a standardized process for data mining. That consortium produced CRISP-DM, the Cross Industry Standard Process for Data Mining.
CRISP-DM remains the standard methodology for tackling data-centric projects because it is robust while remaining flexible and customizable. The CRISP-DM model outlines the steps involved in performing data science activities, from business need to deployment, but more importantly it defines a framework that allows iteration through all the phases. In real-world applications, this iterative nature enables continuous improvement: teams backtrack to earlier tasks and repeat them as needed.
CRISP-DM consists of six major phases defined as follows: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment.
Business Understanding
· Set objectives – Describe your primary objective from a business perspective.
· Produce project plan – Describe the plan for achieving all project goals.
· Establish business success criteria – Specify the criteria that will be used to judge the success of the project from the business point of view.
Data Understanding
· Collect data – Perform initial data collection.
· Describe data – Familiarize yourself with the data.
· Explore data – Identify data quality problems and discover first insights into the data.
Data Preparation
· Select data – Depending on your goals, determine the final set of data that you will use for the project.
· Clean data – Improve data quality by correcting errors, handling missing values, etc.
· Construct required data – Merge and format the data accordingly.
Modeling
· Select modeling technique – Choose the modeling technique best suited to the specific scenario.
· Build model – Build and calibrate your model.
Evaluation
· Evaluate results – Review and evaluate your results against the business success criteria established at the beginning of the project.
· Determine next steps – Decide whether to move forward to deployment or go back and further refine your models.
Deployment
· Plan deployment – Develop a deployment plan.
· Monitor and maintain – Perform periodic monitoring and maintenance.
· Review project – If any new insights or issues are found, iterate back through any of the necessary phases.
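To make the flow of these phases concrete, here is a minimal sketch of one pass through Data Preparation, Modeling, Evaluation, and the deployment decision. The toy dataset, helper names, and success threshold are all illustrative assumptions on our part, not part of the CRISP-DM standard:

```python
# Hypothetical walk through the middle CRISP-DM phases on a toy dataset.

def prepare(raw):
    # Data Preparation: drop records with missing values (None)
    return [(x, y) for x, y in raw if x is not None and y is not None]

def build_model(data):
    # Modeling: fit a simple line y = a*x + b by least squares
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def evaluate(model, data):
    # Evaluation: mean absolute error of the model's predictions
    return sum(abs(model(x) - y) for x, y in data) / len(data)

raw = [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.2), (4, 8.1)]
data = prepare(raw)           # Data Preparation
model = build_model(data)     # Modeling
mae = evaluate(model, data)   # Evaluation

# Deployment decision against an assumed business success criterion;
# if it fails, CRISP-DM sends us back to refine earlier phases.
deploy = mae < 0.5
```

In a real project each step would of course be far richer (and the evaluation would use held-out data), but the shape of the loop, including the decision point that can send you back to an earlier phase, is the same.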
So, we hope you will join us on our Analytics Journey. Stay tuned for the next post on the Business Understanding phase and creating buy-in for a data-driven project.
Jason is a Data Scientist at Petro.ai with a master's degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques, such as Random Forest, Neural Nets, and Support Vector Machines. With a previous master's in Creative Writing, Jason is a fervent believer in the Oxford comma.