Month: August 2017

CRISP-DM Data Preparation: Finding and Counting NULL Values in Spotfire

Hello, good friends.  The next step in our Analytics Journey takes us to the second iteration of Data Preparation.  This is the third step of the CRISP-DM method.

Today, we are going to look at one of the most common data quality issues in Spotfire: NULL, or missing, values. While there are many ways to address NULL values, such as imputation (a lesson for another day), the first step is simply identifying them. We will walk through Spotfire’s built-in NULL identifier and also a more advanced TERR-based method.
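The post itself works inside Spotfire and TERR. As a tool-independent illustration of what "finding and counting NULLs" means, here is a minimal Python sketch that tallies missing values per column for rows held as dictionaries. It is not the Spotfire or TERR method from the full post: the function names `is_null` and `count_nulls` and the sample rows are hypothetical, and treating both `None` and float NaN as missing is an assumption you may need to adjust for your data.

```python
import math
from typing import Any, Dict, List


def is_null(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption; adjust to your data)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_nulls(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Return the number of missing values in each column of a list of row dicts.

    A column key that is absent from a row is also counted as missing for that row.
    """
    columns = {key for row in rows for key in row}
    counts = {col: 0 for col in sorted(columns)}
    for row in rows:
        for col in counts:
            if is_null(row.get(col)):
                counts[col] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample rows.
    data = [
        {"Name": "A", "Age": 22.0, "Cabin": None},
        {"Name": "B", "Age": float("nan"), "Cabin": "C85"},
        {"Name": "C", "Age": 26.0},
    ]
    print(count_nulls(data))  # {'Age': 1, 'Cabin': 2, 'Name': 0}
```

Counting an absent key as missing mirrors how a column with no value in a given row would show up as NULL in a table.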


CRISP-DM Data Understanding: Marking and Filtering

For the first Data Understanding installment in our Analytics Journey, we explored Simpson’s Paradox in the Titanic survival statistics to highlight why the Data Understanding stage is so important in the CRISP-DM process. This week, we will use the same dataset to demonstrate how Spotfire’s unique Marking and Filtering capabilities make the Data Understanding stage much more efficient and powerful.


Linear Regression: The Simplest Machine Learning Model

Linear regression models are the simplest linear models in the statistical literature. Although the assumptions of linearity and normality might seem to restrict the model’s practical use, it is surprisingly successful at capturing basic relationships and making predictions in many scenarios. The idea behind the model is to fit a line that mimics the relationship between a target variable (the dependent variable) and a linear combination of predictors (the independent variables). Multiple regression refers to the case of a single target variable with multiple predictors. These models are popular not only for prediction but also as model selection tools, helping to identify the most important predictors and eliminate redundant variables from the analysis.
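To make "fitting a line" concrete, here is a minimal Python sketch that fits a single-predictor model by ordinary least squares, using the closed-form slope (the co-variation of x and y divided by the variation of x) and intercept. Multiple regression generalizes the same idea to several predictors, usually by solving the normal equations with a matrix library. The function names and the noise-free sample data are hypothetical, chosen so the answer is easy to check.

```python
from typing import List, Tuple


def fit_simple_linear_regression(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit y ~ intercept + slope * x by ordinary least squares; return (intercept, slope)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations (the 1/n factors cancel).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x: float) -> float:
    """Predicted target value for a single predictor value x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data generated from y = 1 + 2x with no noise.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    b0, b1 = fit_simple_linear_regression(xs, ys)
    print(b0, b1)  # 1.0 2.0
    print(predict(b0, b1, 10.0))  # 21.0
```

Because the sample points lie exactly on y = 1 + 2x, the fit recovers an intercept of 1.0 and a slope of 2.0; with real, noisy data the line is the one that minimizes the sum of squared residuals.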


Real Estate Secrets: Hidden Trend Visualization

Everyone who has ever owned or lived in a house knows at least a little bit about the whims of the real estate market. Big houses cost more, neighborhood matters, proximity to basic services is great, age and style are important in some markets, you name it. But what is it that matters the most? This is a question that visualization can help us answer.


Normal Distribution Curve on a Visualization

I received an interesting request from a user that deserves sharing.  The user requested a visualization showing the curve of a normal distribution of data points.  Now, just to be clear, a visualization that shows the distribution of data points is a histogram, which looks like this:

[Figure: Histogram]
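To show what the bars of a histogram actually count, here is a minimal Python sketch of equal-width binning: the range of the continuous value is split into a fixed number of bins, and the rows falling into each bin are counted. This is an illustration only; Spotfire’s Auto-Bin logic may choose bin edges differently, and the function name `histogram_counts` and the sample values are hypothetical.

```python
from typing import List, Tuple


def histogram_counts(values: List[float], num_bins: int) -> List[Tuple[float, float, int]]:
    """Split the range of `values` into `num_bins` equal-width bins and count rows per bin.

    Returns a list of (bin_start, bin_end, row_count). The maximum value is
    placed in the last bin so that every value is counted exactly once.
    """
    if num_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


if __name__ == "__main__":
    # Hypothetical sample values.
    values = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
    for bins in (2, 4):
        print(bins, [count for _, _, count in histogram_counts(values, bins)])
    # 2 [3, 4]
    # 4 [1, 2, 2, 2]
```

The same seven values produce a different shape with 2 bins than with 4, which is why a histogram changes as you adjust the number of bins.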

The histogram may vary slightly depending on the number of bins, but it always has the continuous value along the X-axis and (Row Count) on the Y-axis. However, the user didn’t want to see the histogram’s bars, just a curve that represented the histogram, which would look like this:

[Figure: Normal distribution curve (curve only)]

This type of visualization is simple and easy to create in Spotfire using the following steps.

Creating the Visualization

  1. Add a bar chart.
  2. Configure the X-axis with the continuous value and the Y-axis with (Row Count).
  3. On the X-axis, click the down arrow on the axis selector and make sure the “Auto-Bin” box is checked.
  4. If needed, right-click the axis selector and choose “Number of Bins” to set the desired number of bins.
  5. In the legend, click the color circle and set the bars to the same color as the background (typically white) so that they are hidden.
  6. Go to Properties > Lines & Curves > Add > Gaussian Curve fit.

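BAM! Done! The Gaussian Curve fit draws a normal distribution curve fitted to the binned data, representing the histogram as a smooth curve. Keep in mind that the curve follows the histogram closely only when the data are approximately normally distributed. If you combine the curve and the histogram, it looks like this: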

[Figure: Curve and histogram]
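Spotfire computes the Gaussian Curve fit for you, and its documentation is the authority on how the curve’s parameters are estimated. As a rough stand-in, here is a minimal Python sketch that uses a simpler moment-based approach, which is not necessarily what Spotfire does: it takes the sample mean and standard deviation of the data, then multiplies the normal density by row count × bin width so the curve sits on the same (Row Count) scale as the bars. The function names, bin centers, and sample data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_overlay(values: List[float], bin_width: float, xs: List[float]) -> List[float]:
    """Height of a moment-fitted normal curve on the (Row Count) scale at each x.

    mu and sigma are the sample mean and sample standard deviation of `values`;
    multiplying the density by n * bin_width puts it on the same scale as a
    histogram whose bins are `bin_width` wide.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a spread")
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; a normal curve is undefined")
    scale = len(values) * bin_width
    return [scale * normal_pdf(x, mu, sigma) for x in xs]


if __name__ == "__main__":
    # Hypothetical measurements standing in for the continuous X-axis column.
    data = [4.8, 5.1, 5.0, 4.9, 5.3, 4.7, 5.2, 5.0, 4.6, 5.4]
    centers = [4.6, 4.8, 5.0, 5.2, 5.4]  # hypothetical bin centers, width 0.2
    for x, height in zip(centers, gaussian_overlay(data, 0.2, centers)):
        print(f"{x:.1f}: {height:.2f} rows")
```

The scaling is the key design choice: a histogram of n rows with bins of width w has a total bar area of n × w, while a normal density has an area of 1, so multiplying the density by n × w makes the curve and the bars cover the same area.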


In the end, Spotfire had the functionality to quickly and easily meet the user’s needs!

CRISP-DM Business Understanding: KPI Charts

As Rustin Cohle said in True Detective, “Time is a flat circle,” so welcome back to the beginning of our Analytics Journey! Previously, we cycled through the CRISP-DM process from beginning to end, explaining the stages as well as the way we approach our Data Science life cycle at Ruths.ai. We have strived to demonstrate the importance of melding the human element with quantitative rigor. Now, as any good analytics process does, we will iterate through the steps again, looking for ways to strengthen our model. This time through, we will move from the theoretical to the practical, with an eye toward enacting the stages in the real world.
