Jason is a Junior Data Scientist at Ruths.ai with a Master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Hidden Markov Models. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.
In Spotfire, the filter panel lets you easily remove ranges of values from your data. We can gain even finer granularity and control over what we hide from a dataset by using the “Limit data using expression” window. However, the “Limit data using expression” window doesn’t play nice when you want to replace a data table by matching columns with different names.
When we use the Replace Data functionality and the limiting expression references a matched column, the limiting expression doesn’t update the column name (as other expressions do), which leads to unexpected results. Call this one of those “endearing” Spotfire intricacies.
Fortunately, we can get around this issue by creating a Show/Hide calculated column and rerouting our limiting expression through that calculated column, which will update when we replace data.
With the Super Bowl just behind us, it’s time to predict wins for the 2018 NFL Season. At the start of the playoffs, we looked at a model which predicted how many games NFL teams should have won in 2017 and compared our results to Football Outsiders’ Pythagorean Win Expectancy. We were able to improve on Pythagorean Win Expectancy for last year’s results (that is, how many games a team should have won), but our backward-looking models were unable to beat Pythagorean Win Expectancy in predicting next year’s wins. Today, we will build some models aimed specifically at predicting how many games teams will win next year.
If you simply want to know how many games your team will win in 2018, strictly for recreational purposes of course, you can skim to the end or check out our Spotfire Template. But, for Football Outsiders fans, those interested in what makes up wins and losses, or those interested in the Data Science process, read on.
How many games should your NFL team have won this season? Everyone knows a lucky bounce here and a bad call there can have a significant impact on the win-loss bottom line. Hardcore fans of Sports Analytics would recognize this factor as the driver behind Pythagorean Win Totals, a statistic derived to measure true performance. Today, we are going to see if we can beat Pythagorean Win Totals as a predictor of how many games a team won in a given season. That is, how many games should your team have won?
Spoiler: we can make a better predictor, but in a way that makes us re-evaluate our understanding of Pythagorean Win Totals.
If you simply want to know how many games your team should have won, you can go straight to our Spotfire Template. But, for Football Outsiders fans or those more interested in what makes up wins and losses, read on.
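For readers new to the statistic, here is a minimal Python sketch of the Pythagorean Win Totals calculation mentioned above. It is an illustration, not the model from the post: the exponent of 2.37 is a commonly cited value for the NFL and is an assumption here, and the function name and the sample point totals are hypothetical.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins from points scored and points allowed.

    exponent=2.37 is an assumed, commonly cited NFL value;
    games=16 matches the length of the 2017 regular season.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals must be non-negative")
    pf = points_for ** exponent
    pa = points_against ** exponent
    if pf + pa == 0:
        # No points scored or allowed: treat every game as a coin flip.
        return games / 2
    return games * pf / (pf + pa)


# Hypothetical team: 400 points scored and 300 allowed over 16 games.
expected = pythagorean_wins(400, 300)
print(round(expected, 1))
```

A team that scores exactly as many points as it allows comes out at half its games, which is the sanity check to run before trusting any other number.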
We interrupt this analytical, data-focused blog to attempt a little tug at the heartstrings. After all, Ruths.ai is a Houston-proud company, and we all went through Hurricane Harvey and the subsequent Astros World Series run that brought the city together. While this article might not delve into analytics, its subject, the 2017 World Series Champion Houston Astros, certainly serves as a model for how an analytically focused enterprise should run.
This article first appeared Friday, November 17 at Astros County, written by me, our resident Astros fanatic.
Last week in our Analytics Journey, we worked on variable selection in the Modeling stage of the CRISP-DM method. Having built a model, it’s once again time to see how it did with the Evaluation stage. One of the most important parts of evaluating a model comes in properly constructing a training and testing set for evaluation.
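As a rough illustration of what constructing a training and testing set can look like, here is a minimal Python sketch using only the standard library. The function name, the 25% test fraction, and the seed are assumptions chosen for the example, not the approach used in the post.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and hold out a fraction for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest train the model.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 20 row identifiers.
rows = list(range(20))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed keeps the split repeatable, so the same held-out rows are used each time a model is re-evaluated.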
Welcome back everyone to our Analytics Journey series. Those of us in Houston have been through a trying time, and our thoughts are with the community. We will try to return to a semblance of normalcy by continuing where we left off in our journey.
When building a Multiple Linear Regression model, we want to limit the correlation between predictor (X) variables. Luckily, Spotfire has a tool that makes identifying this correlation among predictors (called multicollinearity) effortless. I will walk you through the tool, and you can see the resulting template here.
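Outside of Spotfire, the same idea can be sketched in a few lines of Python: compute the pairwise Pearson correlation between predictors and flag the pairs above a cutoff. This is not the Spotfire tool itself; the column names, the sample values, and the 0.8 threshold are hypothetical choices made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictors: x2 is roughly twice x1, x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],
    "x3": [2, 1, 3, 1, 2],
}
flagged = correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(name_a, name_b, round(r, 3))
```

Pairwise correlation only catches two-variable relationships; a predictor that is a combination of several others needs a different check, such as a variance inflation factor.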
Hello, good friends. The next step in our Analytics Journey takes us to the second iteration of Data Preparation. This is the third step of the CRISP-DM method.
Today, we are going to look at one of the most common data quality issues in Spotfire: NULLs, aka missing values. While there are many ways to address NULL values, like imputation (a lesson for another day), the first step is simply identifying them. We will walk through Spotfire’s built-in NULL identifier and also a more advanced TERR-based method.
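The identification step itself is simple to sketch outside of Spotfire. The Python below counts missing values per column in a list of records; it is neither the built-in identifier nor the TERR method from the post, and the function name and sample records are hypothetical, with `None` standing in for NULL.

```python
def count_missing(rows):
    """Count None values per column across a list of dict records."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)  # make sure every column is reported
            if value is None:
                counts[column] += 1
    return counts


# Hypothetical records with a few missing values.
passengers = [
    {"name": "A", "age": 29, "cabin": None},
    {"name": "B", "age": None, "cabin": None},
    {"name": "C", "age": 41, "cabin": "C85"},
]
missing = count_missing(passengers)
print(missing)
```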
For the first Data Understanding stage installment in our Analytics Journey, we explored Simpson’s Paradox in the survival statistics from the Titanic to highlight why the Data Understanding stage proves so important in the CRISP-DM process. This week, we will use the same dataset and demonstrate how Spotfire’s unique Marking and Filtering capabilities make the Data Understanding stage much more efficient and powerful.
As Rustin Cohle said in True Detective, “Time is a flat circle,” so welcome back to the beginning of our Analytics Journey! Previously, we cycled through the CRISP-DM process from beginning to end, explaining the stages as well as the way we approach our Data Science life cycle at Ruths.ai. We have strived to demonstrate the importance of melding the human element with quantitative rigor. Now, we will iterate through the steps again, as all good analytics processes do, looking for ways to strengthen our model. This time through, we will move from the theoretical to the practical with an eye towards enacting the stages in the real world.