Author: Jason May

Jason is a Junior Data Scientist at Ruths.ai with a Master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Hidden Markov Models. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.

Exporting Spotfire to PDF with Action Control Button

Recent Spotfire versions include the ability to export visualizations, pages, or the entire analysis to PDF from the File menu or even with an Action Control button.  Finally!  Version 7.12 includes extremely user-friendly customization options, while previous versions require some IronPython code.  We will address both here.

The 7.12 version gives the user an interface for customization that we could previously access only via IronPython.  Spotfire has delivered what many users have spent years clamoring for, so check it out after the break…

Spotfire: Utilizing Column Properties (and How to Make Your Own)

Did you know that you can create your own Column Property and then designate a property value for each column?  Did you know that you can do this not only in the Property Control window but also in the Column Property window for even more flexibility?

Why might you want to create your own Column Property?  Maybe you want to group a handful of columns together by some shared quality.  Why might you want to do that?  Both Custom Expressions and Search Expressions can utilize a Column Property, so you can create these properties for columns and then reference the columns with a shared quality.

Frankly, there is a lot of uncharted territory and potential innovation within this relatively unknown Spotfire feature, so our main focus today will be to discuss briefly what Column Properties are, demonstrate two ways to make your own, and show one possible use case.

Using Spotfire Search Expressions to Limit Columns in Data Function Parameters Window

Recently, I added a hierarchical column to a data table only to see a data function break and scream error messages at me as a result.  The data function used the table in question and included all of the columns of the data table using “*” in the Parameter Input Search Expression window.  However, the data function did not know how to handle a hierarchical column as an input.  So, how could I exclude this column while keeping all of the others without hard-coding them, so that they would remain dynamic for data replacement purposes?  The Search Expression window is the answer.

In the Edit Parameters window, I chose the data table input, then typed “not ColumnType::Hierarchy” in the Search Expression window.  Each column has a column type, which we can see in Column Properties.

This Search Expression calls the Hierarchy column type, then excludes it with “not”.  That’s all it took to fix my data function!
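
The logic of that expression can be mimicked in plain Python.  To be clear, this is only an analogy for what “not ColumnType::Hierarchy” selects, not Spotfire code or a Spotfire API; the column names, the type labels, and the `exclude_type` helper are all made up for illustration.

```python
# Toy stand-in for a data table's column metadata (hypothetical names and types).
columns = [
    {"name": "WellName", "type": "String"},
    {"name": "Depth", "type": "Real"},
    {"name": "Basin > Field", "type": "Hierarchy"},
]


def exclude_type(cols, excluded_type):
    """Return the names of all columns whose type is not excluded_type."""
    return [c["name"] for c in cols if c["type"] != excluded_type]


# Equivalent in spirit to typing: not ColumnType::Hierarchy
selected = exclude_type(columns, "Hierarchy")
print(selected)
```

Because the rule is "everything except one type" rather than a fixed list of names, any column added later is picked up automatically, which is what keeps the input dynamic when data is replaced.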

Search Expressions can be very powerful in limiting data table metadata so that only certain columns, categories, or data types get used by different Spotfire features.  Check out a recent blog post I wrote about using Search Expressions in Property Controls for a more in-depth look at different expression terminology.

Using Spotfire Search Expressions to Limit Property Controls

Spotfire Property Controls like drop-down menus and list boxes can become cumbersome when a dataset has numerous variables and/or multiple data types.  What if we don’t want all columns to be available to the user for industry reasons (some columns wouldn’t make sense) or simply efficiency reasons (too many columns to sift through)?  What if only certain data types would be appropriate to use?  Fortunately, Spotfire lets us limit these list Property Controls via Search Expression.

Maybe you’ve seen those little search boxes in Spotfire which allow you to search for data (columns, categories, etc.).  Maybe not; as I’m finding out, even many Spotfire experts haven’t.  Search Expressions behave differently from Custom Expressions.  These expressions do not use mathematical operations.  They do not use brackets to denote a column.  You cannot use another Document Property within the limiting expression.  These expressions serve only to search for strings using a simplified language (usually with the goal of selecting or limiting something).  Property Controls, Data Function Parameter windows, and the Information Designer are a few of the places you might find a Search Expression window, and today’s discussion on Search Expressions will most likely apply to all of these scenarios.

So, let’s take a look at how we can use Search Expressions in different ways to control and limit the options in our list-based Property Controls.

Well Spacing: More Than One Number

With the rise of unconventionals and the increase in wells permeating already tapped fields, Well Spacing has become the hot topic du jour.  But, what is well spacing?  Does it refer simply to how many wells are in one area?  If so, is that area defined by a circle or rectangle or other definition?  What is the make-up of the nearby wells?  Today, we will examine some common terms and approaches to analyzing well spacing.

Two templates that utilize the features that we will discuss in today’s post are Well Spacing Feature Calculations and Horizontal Well Spacing Model.

Key terms we will discuss today:  Voronoi Diagram, circular and rectangular radius, Area of Interest, intersect area, intersecting wells, closest wells, closest distance, aggregated well statistics.
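
Two of these terms, closest distance and wells within a circular radius, are easy to sketch in plain Python.  This is a minimal illustration under simplifying assumptions, not the calculation used in the templates: wells are treated as single surface points in a flat x/y coordinate system, and the well names, coordinates, and function names are hypothetical.

```python
import math


def closest_well(wells):
    """Map each well to (name of nearest other well, straight-line distance)."""
    result = {}
    for name, (x, y) in wells.items():
        best_name, best_dist = None, math.inf
        for other, (ox, oy) in wells.items():
            if other == name:
                continue  # a well is not its own neighbor
            dist = math.hypot(x - ox, y - oy)
            if dist < best_dist:
                best_name, best_dist = other, dist
        result[name] = (best_name, best_dist)
    return result


def wells_within_radius(wells, name, radius):
    """Return the sorted names of other wells inside a circle around `name`."""
    x, y = wells[name]
    return sorted(
        other
        for other, (ox, oy) in wells.items()
        if other != name and math.hypot(x - ox, y - oy) <= radius
    )


# Hypothetical well locations in arbitrary distance units.
wells = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 0.0)}
nearest = closest_well(wells)
neighbors_of_a = wells_within_radius(wells, "A", 6.0)
print(nearest["A"], neighbors_of_a)
```

Even in this toy case the two measures answer different questions: the closest distance says how tight the nearest neighbor is, while the radius count says how crowded the surrounding area is.  That is why a single number rarely describes well spacing.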

Rank Sorting a Spotfire Gantt Chart by the X-axis

Recently, a client reached out to see if I could help re-order the categorical Y-axis of their Gantt chart by the numerical value on the X-axis. The client wanted the Y-axis ordered by the date of the first occurrence of an event, so that as the chart descended on the Y-axis, the values would get larger on the X-axis. Doing so took some trickery and an outsmarting of Spotfire, which I will share here. Solving the problem left me with two lessons learned:

  1. How to re-order a Gantt Chart’s (or any Scatter Plot’s) categorical Y-axis by the value on the X-axis.
  2. Bonus trick: how to “white-out” an axis label so that it doesn’t show, while other labels remain.
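
The ordering idea behind the first lesson can be sketched outside Spotfire.  The snippet below is a plain-Python illustration of ranking categories by the date of their first occurrence; it is not the Spotfire method described in the post, and the task names, dates, and `rank_by_first_occurrence` helper are hypothetical.

```python
from datetime import date

# Hypothetical (category, event date) rows, deliberately out of order.
events = [
    ("Task C", date(2018, 3, 1)),
    ("Task A", date(2018, 1, 15)),
    ("Task C", date(2018, 2, 1)),
    ("Task B", date(2018, 2, 10)),
]


def rank_by_first_occurrence(rows):
    """Rank each category (1 = earliest) by the earliest date it appears."""
    first = {}
    for category, when in rows:
        if category not in first or when < first[category]:
            first[category] = when
    # Break ties on the category name so the ranking is deterministic.
    ordered = sorted(first, key=lambda c: (first[c], c))
    return {category: rank for rank, category in enumerate(ordered, start=1)}


ranks = rank_by_first_occurrence(events)
print(ranks)
```

A numeric rank like this is the kind of value a categorical axis can be sorted on, since the categories themselves carry no date order.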

Spotfire Solution: Replacing Data with Different Named Columns while Using the Data Limiting Expression

In Spotfire, the filter panel allows you to easily remove ranges of values from your data.  We can gain even further granularity and control over what we hide from a dataset by using the “Limit data using expression” window.  However, the “Limit data using expression” window doesn’t play nice when you want to replace a data table by matching columns with different names.

When we use the replace data functionality and the limiting expression uses a matched column, the expression doesn’t update the column name (as other expressions do), which leads to unexpected results.  Call this one of those “endearing” Spotfire intricacies.

Fortunately, we can get around this issue by creating a Show/Hide calculated column and rerouting our limiting expression through that calculated column, which will update when you replace data.

NFL: Predicting 2018 Win Totals with Data Science

With the Super Bowl just behind us, it’s time to predict wins for the 2018 NFL Season.  At the start of the playoffs, we looked at a model which predicted how many games NFL teams should have won in 2017 and compared our results to Football Outsiders’ Pythagorean Win Expectancy.  We were able to improve on Pythagorean Win Expectancy for last year’s results (that is, how many games a team should have won), but our backward-looking models were unable to beat Pythagorean Win Expectancy in predicting next year’s wins.  Today, we will build some models aimed specifically at predicting how many games teams will win next year.

If you simply want to know how many games your team will win in 2018, strictly for recreational purposes of course, you can skim to the end or check out our Spotfire Template.  But, for Football Outsiders fans, those interested in what makes up wins and losses, or those interested in the Data Science process, read on.

How many games should your NFL team have won this season?

How many games should your NFL team have won this season?  Everyone knows a lucky bounce here and a bad call there can have a significant impact on the win-loss bottom line.  Hardcore fans of Sports Analytics would recognize this factor as the driver behind Pythagorean Win Totals, a statistic derived to measure true performance.  Today, we are going to see if we can beat Pythagorean Win Totals as a predictor of how many games a team won in a certain season, i.e., how many games your team should have won.

Spoiler:  we can make a better predictor, but in a way that makes us re-evaluate our understanding of Pythagorean Win Totals.

If you simply want to know how many games your team should have won, you can go straight to our Spotfire Template.  But, for Football Outsiders fans or those more interested in what makes up wins and losses, read on.
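
For readers new to the statistic, here is a minimal sketch of the standard Pythagorean formula, which estimates a win fraction from points scored and points allowed.  It is the benchmark we are trying to beat, not our model.  The exponent of 2.37 is the value commonly cited for the NFL and should be treated as an assumption here, as should the function name and the sample point totals.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins from the Pythagorean formula.

    win fraction = PF^k / (PF^k + PA^k), scaled by the number of games.
    The default exponent k = 2.37 is an assumed, commonly cited NFL value.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * pf / (pf + pa)


# A team that scores exactly as much as it allows projects to a .500 record.
even = pythagorean_wins(350, 350)

# Hypothetical team that outscores its opponents 400 to 300.
strong = pythagorean_wins(400, 300)
print(even, round(strong, 1))
```

The gap between a team’s actual wins and this expected value is the usual shorthand for how many bounces went its way.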

Memories from the Houston Astros World Series Championship

We interrupt this analytical, data-focused blog to attempt a little tug at the heartstrings.  After all, Ruths.ai is a Houston-proud company, and we all went through Hurricane Harvey and the subsequent Astros World Series run that brought the city together.  While this article might not delve into analytics, its subject, the 2017 World Series Champion Houston Astros, certainly serves as a model for how an analytically focused enterprise should run.

This article first appeared Friday, November 17 at Astros County, written by me, our resident Astros fanatic.
