Tag: CRISP-DM

CRISP-DM Evaluation: Train and Test Set

Last week in our Analytics Journey, we worked on variable selection in the Modeling stage of the CRISP-DM method.  Having built a model, it’s time to see how well it performs in the Evaluation stage.  One of the most important parts of evaluating a model is properly constructing the training and testing sets.
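As a rough illustration of what constructing a training and testing set can look like, here is a minimal Python sketch.  It is not the method from the full post; the function name `train_test_split`, the 80/20 proportion, and the fixed seed are all assumptions chosen for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists.

    test_fraction and seed are illustrative defaults, not recommendations.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: ten row identifiers standing in for real observations.
rows = list(range(10))
train, test = train_test_split(rows, test_fraction=0.2)
```

The model would be fit on `train` only and scored on `test`, so the evaluation reflects rows the model has never seen.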

Jason is a Junior Data Scientist at Ruths.ai with a Master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Hidden Markov Models. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.

CRISP-DM Modeling: Forward and Backward Selection

Welcome back, everyone, to our Analytics Journey series.  Those of us in Houston have been through a trying time, and our thoughts are with the community.  We will try to return to a semblance of normalcy by continuing where we left off in our journey.

With all of our hard work understanding and preparing the data in the previous steps of the CRISP-DM method (exploring data, choosing a model space, removing NULLs, and removing multicollinearity), it’s time to have some fun with the Modeling stage.  Today, we’ll look at an aspect of Multiple Linear Regression:  Forward and Backward Selection.

CRISP DM Data Preparation: Finding and Counting NULL Values in Spotfire

Hello, good friends.  The next step in our Analytics Journey takes us to the second iteration of Data Preparation.  This is the third step of the CRISP-DM method.

Today, we are going to look at one of the most common data quality issues in Spotfire:  the NULL, aka the missing value.  While there are many ways to address NULL values, such as imputation (a lesson for another day), the first step is simply identifying them.  We will walk through Spotfire’s built-in NULL identifier and also a more advanced TERR-based method.
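To make the idea of counting NULLs per column concrete, here is a small sketch in plain Python.  It is neither Spotfire’s built-in identifier nor the TERR method from the full post; the function name `count_nulls`, the column names, and the sample records are all made up for illustration, and a value is treated as NULL when it is `None` or the key is absent.

```python
def count_nulls(records):
    """Return {column: number of records where that column is missing}.

    A value counts as missing when it is None or the key is absent.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    return {
        col: sum(1 for record in records if record.get(col) is None)
        for col in columns
    }


# Hypothetical sample data, invented for this example.
wells = [
    {"well": "A-1", "depth": 9500, "oil_rate": 120.0},
    {"well": "A-2", "depth": None, "oil_rate": 95.5},
    {"well": "A-3", "depth": 10200},  # oil_rate key is absent
]
null_counts = count_nulls(wells)
```

A per-column tally like `null_counts` shows where the missing values are concentrated, which is the information needed before deciding how to handle them.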

CRISP-DM Data Understanding: Marking and Filtering

For the first Data Understanding stage installment in our Analytics Journey, we explored Simpson’s Paradox in the survival statistics from the Titanic to highlight why the Data Understanding stage proves so important in the CRISP-DM process.  This week, we will use the same dataset and demonstrate how Spotfire’s unique Marking and Filtering capabilities make the Data Understanding stage much more efficient and powerful.

CRISP-DM Business Understanding: KPI Charts

As Rustin Cohle said in True Detective, “Time is a flat circle,” so welcome back to the beginning of our Analytics Journey!  Previously, we cycled through the CRISP-DM process from beginning to end, explaining the stages as well as the way we approach our Data Science life cycle at Ruths.ai.  We have strived to demonstrate the importance of melding the human element with quantitative rigor.  Now, we will iterate through the steps again, as all good analytics processes do, looking for ways to strengthen our model.  This time through, we will move from the theoretical to the practical with an eye towards enacting the stages in the real world.

CRISP DM: Deployment

Welcome to the next installment of our Analytics Journey, which explores how we at Ruths.ai apply the CRISP-DM method to our Data Science process.  Previously, we looked at an overview of the methodology as a whole as well as the Business Understanding, Data Understanding, Data Preparation, Modeling, and Evaluation stages.  Next, we examine the final stage:  Deployment.

The.  Final.  Stage.  Now, we just have to turn this thing on and reap the rewards, right?

Unfortunately, Deployment does not just happen with the push of a George Jetson button.

CRISP-DM: Evaluation

Welcome to the next installment of our Analytics Journey, which explores how we at Ruths.ai apply the CRISP-DM method to our Data Science process.  Previously, we looked at an overview of the methodology as a whole as well as the Business Understanding, Data Understanding, Data Preparation, and Modeling stages.  Next, we examine the Evaluation stage.

CRISP-DM: Modeling

Welcome to the next installment of our Analytics Journey, which explores how we at Ruths.ai apply the CRISP-DM method to our Data Science process.  Previously, we looked at an overview of the methodology as a whole as well as the Business Understanding, Data Understanding, and Data Preparation stages.  Next, we examine the Modeling stage.

We have finally reached the fun part!  This is the step where we can move from a descriptive look back to a predictive look forward.  In the Data Understanding phase, we discussed the 80-20 rule, which states that data professionals spend 80% of their time cleaning data, akin to the tedious hours of practice preparing for the big game.  Hopefully, we have shown through phenomena like Simpson’s Paradox that even the Data Understanding and Data Preparation stages can bring intriguing insight; however, the Modeling stage is where the real opportunity lies, and it is the most intellectually stimulating phase.

CRISP-DM Data Preparation: Data Selection

Welcome to the next installment of our Analytics Journey, which explores how we at Ruths.ai apply the CRISP-DM method to our Data Science process.   Previously, we looked at an overview of the methodology as a whole as well as the Business Understanding and Data Understanding stages.  Next, we examine the stage of Data Preparation.

CRISP-DM Data Understanding: Simpson’s Paradox

Welcome to the next installment of our Analytics Journey, which explores how we at Ruths.ai apply the CRISP-DM method to our Data Science process.   Previously, we looked at an overview of the methodology as a whole as well as the first step, Business Understanding.  Next, we examine the stage of Data Understanding.
