Welcome to the next installment of our Analytics Journey, which explores how we at Ruths.ai apply the CRISP-DM method to our Data Science process. Previously, we looked at an overview of the methodology as a whole as well as the Business Understanding, Data Understanding, Data Preparation, and Modeling stages. Next, we examine the Evaluation stage.
Evaluation Stage: What Defines Business Success?
The Evaluation stage consists of evaluating model results based on business metrics created at the beginning of the project and then refining the model to prepare it for deployment. As in previous posts, on this first pass through the cycle I will refrain from getting too technical by explaining how we evaluate a model. Instead, I want to look at why we are evaluating the model. Let’s look at a couple of key ideas mentioned above: business metrics and defining them at the beginning of the process.
We must define our metrics at the beginning of a project so that we don’t cook the books, so to speak. As a general rule, we don’t want to adjust our metrics based on our results, or move the goal posts, because we then risk finding the results we want to see rather than the results that truly bring insight and potential unexpected findings.
At the same time, these metrics must support the business first and foremost to have the greatest chance of being useful.
Consider an example where our two main metrics are accuracy and interpretability. We strive to get the right answer, but we also must find a model that makes sense to the people with learned industry knowledge. So, in this instance, we might choose a highly interpretable Linear Regression model rather than a Neural Network, which acts as a black box and might not explain how it arrives at its results.
This choice speaks to evaluating based on metrics created before an analysis. This choice also speaks to prioritizing the business needs first.
Every Point Has a Counter Point
However, we certainly don’t want our company to ignore potentially game-changing findings simply because of pre-ordained metrics, which might echo preconceived notions (aka bias). What if our Citizen Data Scientist finds that a Neural Network has a 95% accuracy level compared to a 65% accuracy level in a Linear Regression model? Is interpretability worth a loss of 30 percentage points in accuracy?
While our stated goal in evaluation was to stick to metrics created before the project started, should we ignore the potential gain in accuracy?
Not moving our goal posts, so to speak, is a best practice, yet we need to remain agile.
Am I sending a mixed message? Perhaps. The larger point is that we should not work in a vacuum when evaluating our models. In the scenario described, I would recommend bringing the options to the decision makers for further evaluation. Perhaps interpretability is in fact more valuable than a gain of 30 percentage points in accuracy; perhaps not. Perhaps I am losing some Data Science street cred by introducing subjectivity into an evaluation process; however, I feel the sentiment is consistent with the persistent message of this Analytics Journey: to meld Data Science with a human element.
Remember, the second part of the Evaluation stage is refining the model. If the decision makers decide they want the increased accuracy, we can iterate through from the beginning and reconsider our model with new business objectives and metrics. Then, we are not moving the goal post to fit our model but rebuilding our model based on new goal posts.
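For readers who do want something concrete, here is a minimal Python sketch of the idea, not a description of our actual tooling. It assumes the success criteria are written down once and frozen, and that a model which fails them but beats the best passing model by a wide margin gets flagged for the decision makers rather than silently accepted or discarded. The names SuccessCriteria, Candidate, evaluate, and escalation_gap, along with the 0.60 and 0.20 thresholds, are hypothetical; only the 95% and 65% accuracy figures come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: criteria cannot be edited once agreed upon
class SuccessCriteria:
    min_accuracy: float          # hypothetical minimum acceptable accuracy
    require_interpretable: bool  # business requirement set before modeling
    escalation_gap: float        # accuracy gap that warrants a stakeholder review


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float
    interpretable: bool


def evaluate(criteria, candidates):
    """Return (passing, escalate): models that meet the criteria, and models
    that fail them but outperform the best passing model by a wide margin."""
    passing = [
        c for c in candidates
        if c.accuracy >= criteria.min_accuracy
        and (c.interpretable or not criteria.require_interpretable)
    ]
    best_passing = max((c.accuracy for c in passing), default=0.0)
    escalate = [
        c for c in candidates
        if c not in passing
        and c.accuracy - best_passing >= criteria.escalation_gap
    ]
    return passing, escalate


# Criteria are declared before any model is trained (values are illustrative).
criteria = SuccessCriteria(min_accuracy=0.60, require_interpretable=True,
                           escalation_gap=0.20)

candidates = [
    Candidate("Linear Regression", accuracy=0.65, interpretable=True),
    Candidate("Neural Network", accuracy=0.95, interpretable=False),
]

passing, escalate = evaluate(criteria, candidates)
print("Meets criteria:", [c.name for c in passing])
print("Bring to decision makers:", [c.name for c in escalate])
```

The point of the design is that the code never rewrites the criteria itself. If the decision makers choose the more accurate model, someone creates a new SuccessCriteria and the cycle starts again, which is rebuilding the model around new goal posts rather than moving the old ones.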
Let’s give our abstract discussion a concrete example. Many people might be hesitant at this point to fully cede control to driverless cars, which happen to rely heavily on Neural Networks. We all know most other drivers on the road are bad, but most people think they themselves are a special case. How much evidence would you need to reconsider your metrics and willingness to cede control? If driverless cars become 99.99% crash free, would you still consider yourself a superior driver or would you cede control? How about 99.9999999999%? Where is your threshold?
What’s the threshold of evidence needed to change our decision-making process? These are the subjective, human aspects that combine with objective metrics in the Evaluation stage of the CRISP-DM method.
Jason is a Data Scientist at Ruths.ai with a master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Support Vector Machines. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.