This is the fifth part of a series on Spotfire Properties. In previous posts, I discussed Document Properties
, Data Table Properties
, Column Properties
, and Data Connection
Properties. This week we’ll take a look at data function properties. Next week, the series will conclude with Visualization Properties. Data function properties is a bit of a beefier subject because all data functions can be created out of the data function properties menu.
What is a Data Function?
Since I am writing this for the new Spotfire user, let’s start with the basics. What is a data function, and why would you need one? A data function is TERR code written to perform a specific task within Spotfire. What kind of tasks you ask? I’ll come back to that in just a sec. First, I want to define TERR. TERR stands for TIBCO Enterprise Runtime for R. TERR is the Spotfire version of open source R. R is a programming language for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis (Wiki).
As it relates to Spotfire, TERR expands upon the functionality of the application. For example, Exchange.ai is about to post a template that uses TERR to identify duplicate records (link coming soon). You can use TERR to manipulate data, pass data thru to new tables, and expand upon the statistical and predictive modeling tools that come with the application. For example, Spotfire doesn’t have a Random Forest model, but you can build one with TERR. The only limits are really your own programming skills.
Now, I’ve already written a blog post on TERR basics, which includes an explanation of all the TERR screens. Most of what is in that post I would have also covered here. Therefore, I am going to use this post to expand on the different options in the screens, rather than walking through a single use case. Lastly, Exchange.ai also has a few templates such as this template and this template to get you started.
Data Function Basics
Let’s review the basics of Data Functions. Creating data functions is a 3 step process.
- Create the script
- Create the parameters
- Run the script to map the parameters to the data in the DXP
Key concepts to be aware of:
- The script is the “meat” of the data function. Within the script, you’ll find at least one input and one output parameter. The simplest R script I’ve ever written is output <- input. Input is the input parameter, and…yeah, you can finish that sentence I bet.
- TERR (and R) are object-oriented languages, which means programmers can create objects within the code, assign values to the objects and then reference the object down the road rather than all the values. This makes programming easier. In the example above, input and output are both objects.
- Input and output parameters tell Spotfire what type of object to work with. The object could be a table, column, document property or another object.
- Running the script triggers the dialogs where you will map the parameters to the actual data in the DXP.
- Data functions can be connected to marking and filtering. For example, you can pass the results of marking or filtering to a new table.
- Users may create data functions from scratch in Spotfire, or users may import data functions from the Spotfire library or another file.
- By default, data functions embed within the analysis. However, users have the ability to save them to the library for reuse or sharing.
With that said, here is a very simple script that I’ll reference throughout the post. This single line of R code that will check for duplicates in a data set using two identifiers that define the granularity of the table. The output is a column called duplicate.
duplicate <- duplicated(data.frame(key1, key2))
duplicated and data.frame are TERR (or R) functions
duplicate, key1, and key2 are objects in the script
key1 and key2 are input parameters
duplicate is the output parameter
Data Function Properties Main Screen
This dialog lists all data functions in the DXP. From this screen you can …
- Create new data functions
- Edit existing data functions and their parameters
- Refresh data functions
- Delete data functions
- Save data functions to the library
- Export a data function
Related to the buttons above:
- The term “Register New” can be a bit confusing to new users. This really means create a new data function. In the process, you’ll have the option to save it in the library or register it.
- Clicking the Edit Script button will let you modify the script or the input and output parameters.
- Clicking the Edit Parameters button allows you to change the mapping of data from the parameters to the DXP content.
Script & Parameters
These three dialogs define the script, input parameters, and output parameters. Because the TERR Basics post covers them in detail, I want to focus on the different options available to the input and output parameters.
Input and Output Parameters
When creating input and output parameters, there are three options — table, value, and column. ‘
- If your input or output is an entire table, choose Table. I use this option when I am simply passing a limited data set from my original table to a new table.
- If your input or output is a single column, choose Column. The script shown above for identifying duplicates uses a Column output. The data function creates a column called “duplicate”.
- If your input is a hard-coded value or a document property, choose Value.
Run the Data Function
After you have entered the script, input parameters, and output parameters, the next step is clicking the Run button. If Spotfire asks if you want to save the data function to the library, you can say no. It will not impact your DXP. This is simply to give the option to save the data function to the library so others may access it. As an administrator, I ask users NOT to do this because it clutters up the library. It is also hard to know what a given data function is for or if it even works.
Anyway, this is the step in the process where you map the parameters to the content of the DXP. Let’s tackle the inputs first. I have intentionally added two unnecessary parameters to demonstrate that the options for input handlers depend on the type of input parameter. Each input parameter type has different options.
- For Column type, there are three options — Column, Expression, and None. The most common input handler is Column, which I have used in data functions that manipulate or calculate based on a specific column of data.
- For Value type, there are six options — Value, Document property, Data table property, Column property, Expression, and None. I most frequently use Document property.
- For Table type, there are three options — Columns, Expression, and None. You can tell Spotfire to work with a subset of the columns in the table by using the Select Columns button. Alternatively, typing “*” in “Search expression” will use all columns in a table. It’s not visible in the screenshot shown, but just below the “Search expression” section, you will also find options to connect the contents of the table to marking or filtering. This is explained in the TERR Basics post.
I do want to note that I have never used the None option in either input or output handlers. If someone has, please tell me about it in Comments.
Now, for outputs, it is also true that the options presented differ depending on the parameter type. As you can see, Column, Value, and Table all have different options.
- The Column and Table Type have the same four options — Data table, Columns, Rows, and None. Use Data table if you are creating an entirely new table. Set the type to Columns if the output is a column that should be added to another table. Use Rows if you are adding rows to a table.
- In Value Type, there are six options — Data table, Columns, Rows, Document property, Data table property, Column property, and None. The same advice is true of outputs here as for inputs.
As I was writing this, I realized that if I were creating a data function that output rows, I’m not sure which type I would use. The options for adding rows are part of both the Column and Table Type. Setting up a Column type to insert rows seems counter-intuitive. I just haven’t had to write this type of data function yet. If you know, please Comment!
Hopefully, explaining some of the common uses of the different types of input and output parameters will help you better understand TERR function and how to convert R code to TERR. Thanks!
Guest Spotfire blogger residing in Whitefish, MT. Working for SM Energy’s Advanced Analytics and Emerging Technology team!