Have you ever needed to create a hybrid Spotfire column that combines user inputs with other values? Have you ever had a Rolling Average column in Spotfire get derailed by a lack of data? This post will address both scenarios, in this case one and the same.
The other day, a client came to me with a problem: they had a perfectly functional rolling average column; however, when new wells were released, the data had not yet stabilized, which made the rolling averages either null or, perhaps worse, untrustworthy. They decided the optimal solution would be for their users to input an approximate value to be utilized for the first 10 days until the data stabilized.
To solve their problem, we built a table visualization for them to input values into, then applied those values to every observation of each well. With the inputted value on each row, we were able to create a hybrid column that started with user inputs and rolled into a rolling average.
In this post, we’ll look at the steps needed to address this focused issues, but several tips will be applicable to a variety of situations:
- How to create a rolling average
- How to create a table visualization with only one observation per well (or unique value)
- How to apply an inputted value across all observations of a well (or any categorical group, really)
- How to ensure calculated columns can be utilized with various Spotfire functionality like data replacement and transformations
- How to create a hybrid column with inputted and calculated values
10 Day Rolling Average
The client wanted a 10 Day Rolling Average, which was straightforward enough.
We used a calculated column with the following expression:
Avg([prod vol]) OVER (Intersect([well name],LastPeriods(10,[date])))
This expression results in the average production volume for the last ten days for each well.
Our calculation worked perfectly on existing wells. However, whenever a new well came online, they had either no data or non-representative data affecting the equation, which created a Rube Goldberg effect on other calculations and visualizations. Here is a look at the data (Note that all data, anonymized and simplified for clarity:
So, the production column had no data for the first 10 days. Next, we will look at how we implemented the client’s desired solution of creating a user generated input.
User Generated Rolling Average Substitution
First, we wanted to create a virtual table with just one row per well, so the user could input values in an orderly manner. I say “virtual” because I will only be altering the table in a visualization, not the underlying master table itself. To get started, I created a calculated column using the following:
If(Rank(RowId(),”asc”,[well name])=Min(Rank(RowId(),”asc”,[well name])),True)
This expression leverages the Row ID, ranks rows per well, and gives a True value to the first well. While in our case, the first date gets the True, really all that matters is there is one per well. We can now create a table with only one observation per well by limiting the data. Right-click the visualization > Properties > Data > Limit data using expression and use the following expression:
[Unique Wells] = True
Now that we have a table visualization with a view only of one column per well, we need a column for the user to input their desired values. Because we will be using this column in a subsequent transformation (when we do the inputting), we need to create this calculated column not through traditional column properties but through a transform. In pre-Spotfire X versions, go to Insert > Transformations > Calculate new column > Add. In Spotfire X, go to Data > Transform data > Calculate new column > Add. By creating a column this way, you can also use the column in joins, pivots, etc.
Now, type null in the expression window and name the column as desired. We end up with a column with all nulls to use as our user input. Try double clicking one of the blank cells. You will probably get something like this:
Notice that you can replace the null values with whatever you want. However, unless you have a unique ID, you can only replace all occurrences in the column. Since our whole column is null, this would give the same value to all wells. This behavior may be what you want, but if you want to apply different values to different wells, there are a couple of additional steps.
- Again through a transform and not column properties, create a calculated column and use only RowID() in the expression, giving you a column with all unique RowID values.
- In the window above, choose Select key columns and select your RowID column.
- Put your value in, select the This occurrence only circle, and click Apply.
Now, remember, we have only applied this input to one observation of the well in our virtual table visualization. We need to create a new column derived from the input column, substituting the nulls with whatever was inputted for that well. Use the following:
sn([Avg Rolling Substitution Input],Max([Avg Rolling Substitution Input]) over ([WELL+CMPL_NUM]))
Above, the sn stands for substitute null and replaces the first value when null with the second value. Our second value takes the Max value from our input column for each well. Since each well only has the one inputted value and the rest are nulls, whatever was inputted into our unique well table will be applied to every observation of that well.
Now, we have two columns we will use to create our final hybrid output: a Rolling Average column and a User Input Substitution column, which we will replace based on whatever stipulation the user desires. In our situation, the client wanted to replace the first 10 days. So, we used the following:
If(DenseRank([date],[well name])<=10,[User Input Substitution],[Rolling Avg 10])
First, we rank the observations by date per well. If the row is in the first 10 of the ranking, use the user input. If not, use the Rolling Average.
And there you have it! We have leveraged several Spotfire tricks and expressions to create a Rolling Average, a table visualization with one observation, a calculated column that can be utilized in transformations, and user inputs. All in order to make a Rolling Average allowing the user to manually input the first several values. It seems like a lot of steps, but if you break it down into pieces, you can come out with a valuable column and new skills along the way.
Jason is a Data Scientist at Ruths.ai with a master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Support Vector Machines. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.