- Have you wanted to to limit calculated columns to a smaller data set but been unable to because calculated columns always take into account the entire data set?
- Have you ever wanted IF statements to be more dynamic?
- Would you like to build filtering or data reduction into a workflow but still maintain the original data connections?
In today’s post, I am going to show you how to use TERR to pass rows and columns through from one table to another. There are probably many good use cases for this functionality, but I will focus on two.
- Building a predictive model
- Creating calculations on a subset of data
Building a predictive model — When running predictive models in Spotfire (Regression, Classification and most templates I’ve seen), the models will run on the entire data table. Filters are not applied. However, it is quite often the case that users need to filter or remove data, be it for nulls or values outside an allowable range or perhaps the user simply wants to run a model for a certain type curve region or area of the field. That’s when being able to pass through specific records to another table is extremely helpful. Passing records through to a second table and then connecting the second table to the model allows the user to maintain the original data source, set and connections. Additionally, as part of process, the user can decide whether they want to pass all columns or only a few, which can reduce the number of columns in a drop down list or selector.
Creating calculations on subset of data — When calculated columns are created in Spotfire, they always take into account the entire data set and are NOT inclusive of filtering. This can be worked around in DXPs by writing expressions on the y axis of a visualization or by incorporating property controls into calculations, but these are time consuming workarounds. Another option is to pass a subset of data through to another table where the calculations are housed.
This is all done with an extremely simply TERR script.
- Go to the Edit menu>Data Function Properties> Register New
- Name the script, something like “Pass through”
- Set the type as R script – TIBCO Enterprise Runtime for R
- Enter this script in the script tab (as shown in the screenshot below) — output <- input
- Go to the Input parameters tab, click the add button
- Name the input parameter ‘input’, name the display name ‘input’.
- Set the Type: to Table, click the all button to all data types
- Go to the output parameters tab, click the add button
- Name the output parameter ‘output’, name the display name ‘output,’
- Set the Type: to Table
- Click the run button
- If Spotfire asks you about saving, this is optional. You may save the script in the Library or you may keep it embedded in the file.
- In the Input page, take note of the Refresh function automatically checkbox, users may want to check it depending on the workflow…then…
- Select Columns as the input handler. Then users may select which columns are passed through in one of two ways….
- Set the radio button to Columns and select which columns the new data table should have OR
- Set can set the radio button to Search expression and enter ‘*’ in the expression box, and all columns will be passed through.
- Then, after this selection has been made, if the user would like to use marking or filtering to limit to a subset of records, choose a Marking or Filtering scheme (as shown below) in order to further limit the data passed through to the new table.
- Go to the Output page
- Set the output handler to data table
- Set the radio button to create new data table
- As soon as the dialog is closed, there should be a new data table called output. If marking or filtering was used to limit data, make sure this is clearly described and setup in the DXP.
In the example shown below, I have created two scripts, one attached to marking and another attached to filtering. Watch the short video to see the script in action.
I found an even better use for this function today that I wanted to share. I was working with a user who had facie data in box blots. He had one box plot showing data by each individual facie, but then he wanted to be able to select a subset of facies and show a box plot for the combined facie data. We used the code above in conjunction with a treemap. The treemap had no size by variable and is connect to a single level hierarchy. The treemap also has its own marking, and when the user marks each facie, that data is sent to the new table, which the box plot is connected to. This is a huge improvement over the IF statements in calculated columns that we put together at first. This same result can also be achieved with the subsets function, but the user liked the look and feel of the treemap “buttons”.
Guest Spotfire blogger residing in Whitefish, MT. Working for SM Energy’s Advanced Analytics and Emerging Technology team!