What is a Data Function?
Data Function Basics
- Create the script
- Create the parameters
- Run the script to map the parameters to the data in the DXP
- The script is the “meat” of the data function. Within the script, you’ll find at least one input and one output parameter. The simplest R script I’ve ever written is output <- input. Input is the input parameter, and…yeah, you can finish that sentence I bet.
- TERR (and R) are built around objects: programmers can create objects within the code, assign values to them, and then reference an object by name later rather than repeating all of its values. This makes programming easier. In the example above, input and output are both objects.
- Input and output parameters tell Spotfire what type of object to work with. The object could be a table, a column, or a single value such as a document property.
- Running the script triggers the dialogs where you will map the parameters to the actual data in the DXP.
- Data functions can be connected to marking and filtering. For example, you can pass the results of marking or filtering to a new table.
- Users can create data functions from scratch in Spotfire or import them from the Spotfire library or from another file.
- By default, data functions are embedded in the analysis. However, users can save them to the library for reuse or sharing.
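To see what the one-line script output <- input actually does, here is a minimal sketch that plays Spotfire's role by hand in plain R. It assumes a Column input parameter named “input” and an output parameter named “output”; the sample values are made up, since in a real data function Spotfire creates the input object from whatever you map in the DXP.

```r
# Stand-in for the Column input parameter "input".
# In a real data function, Spotfire creates this object from the mapped column;
# these sample values are made up for illustration.
input <- c(12.5, 7.0, 3.2, 7.0)

# The entire data function script: copy the input object into the output object.
output <- input

# Spotfire would read the object named "output" back into the DXP after the
# script finishes. Here we just print it to show it is unchanged.
print(output)
```

Because input and output are ordinary R objects, the script refers to them by name; the parameter definitions are what tell Spotfire which DXP content those names stand for.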
# Flag each row whose (key1, key2) pair already appeared in an earlier row
duplicate <- duplicated(data.frame(key1, key2))
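If you want to check what the duplicate-flagging script returns before wiring it into Spotfire, you can run it in plain R with sample key columns. Here key1 and key2 play the part of the two Column input parameters and “duplicate” is the Column output; the sample values are hypothetical.

```r
# Hypothetical sample data standing in for the two Column input parameters.
key1 <- c("W1", "W1", "W2", "W1")
key2 <- c(2020, 2021, 2020, 2020)

# Same script as above: flag each row whose (key1, key2) pair
# already appeared in an earlier row.
duplicate <- duplicated(data.frame(key1, key2))

print(duplicate)
# Values: FALSE FALSE FALSE TRUE (row 4 repeats the pair from row 1)
```

Note that the first occurrence of a pair is not flagged; only later repeats come back TRUE.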
Data Function Properties Main Screen
- Create new data functions
- Edit existing data functions and their parameters
- Refresh data functions
- Delete data functions
- Save data functions to the library
- Export a data function
- The term “Register New” can be confusing to new users. It really means create a new data function. In the process, you’ll have the option to save it to the library (that is, register it).
- Clicking the Edit Script button will let you modify the script or the input and output parameters.
- Clicking the Edit Parameters button allows you to change the mapping of data from the parameters to the DXP content.
Script & Parameters
Input and Output Parameters
- If your input or output is an entire table, choose Table. I use this option when I am simply passing a limited data set from my original table to a new table.
- If your input or output is a single column, choose Column. The script shown above for identifying duplicates uses a Column output. The data function creates a column called “duplicate”.
- If your input is a hard-coded value or a document property, choose Value.
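As a rough illustration of how the three parameter types look from inside the script, the sketch below uses one of each. The names “tbl”, “threshold”, “high_rate”, and “high_rate_rows” are hypothetical, and the sample table stands in for what Spotfire would pass in. It assumes a Table parameter arrives as a data frame, a Column as a vector, and a Value as a single element.

```r
# Table input (hypothetical name "tbl"): a data frame.
tbl <- data.frame(well = c("A", "B", "C"), rate = c(120, 80, 200))

# Value input (hypothetical name "threshold"): a single value,
# e.g. from a document property.
threshold <- 100

# Column output (hypothetical name "high_rate"): one logical per row of tbl.
high_rate <- tbl$rate > threshold

# Table output (hypothetical name "high_rate_rows"): a filtered copy of tbl,
# the "limited data set passed to a new table" use case.
high_rate_rows <- tbl[high_rate, , drop = FALSE]

print(high_rate_rows)
```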
Run the Data Function
After you have entered the script, input parameters, and output parameters, the next step is to click the Run button. If Spotfire asks whether you want to save the data function to the library, you can say no; declining does not affect your DXP. The prompt simply gives you the option to save the data function to the library so others can access it. As an administrator, I ask users NOT to do this because it clutters the library, and it is hard to know what a given data function is for or whether it even works.
Anyway, this is the step where you map the parameters to the content of the DXP. Let’s tackle the inputs first. I have intentionally added two unnecessary parameters to demonstrate that the available input handlers depend on the type of input parameter.
- For Column type, there are three options — Column, Expression, and None. The most common input handler is Column, which I have used in data functions that manipulate or calculate based on a specific column of data.
- For Value type, there are six options — Value, Document property, Data table property, Column property, Expression, and None. I most frequently use Document property.
- For Table type, there are three options: Columns, Expression, and None. You can tell Spotfire to work with a subset of the columns in the table by using the Select Columns button. Alternatively, typing “*” in “Search expression” will use all columns in the table. Just below the “Search expression” section, you will also find options to limit the table to marked or filtered rows. This is explained in the TERR Basics post.
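Whichever input handler you choose, the script only sees the resulting object, so a little defensive checking at the top can catch a mis-mapped parameter early. This is a sketch under assumptions: “tbl” is a Table parameter mapped with the Columns handler (for example, all columns via “*”, possibly limited to marked rows), and “col_name” is a Value parameter mapped to a document property that holds a column name. Both names, and “total”, are hypothetical.

```r
# Hypothetical stand-ins for what Spotfire would pass in.
tbl <- data.frame(well = c("A", "B", "C"), oil = c(50, NA, 75))
col_name <- "oil"  # e.g. the value of a document property

# Fail early with a clear message if a parameter was mapped unexpectedly.
if (!is.data.frame(tbl)) stop("'tbl' should be a Table parameter")
if (length(col_name) != 1) stop("'col_name' should be a single value")
if (!col_name %in% names(tbl)) stop("column '", col_name, "' not found in 'tbl'")

# Value output (hypothetical name "total"): sum of the chosen column, ignoring NAs.
total <- sum(tbl[[col_name]], na.rm = TRUE)
print(total)  # 125
```

If the table arrives with zero rows (for example, if it is limited to marked rows and nothing is marked), this script returns 0 rather than failing, because sum() of an empty vector is 0.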
I do want to note that I have never used the None option in either input or output handlers. If someone has, please tell me about it in Comments.
For outputs, the available handlers also depend on the parameter type, although Column and Table share the same set.
- The Column and Table types have the same four options: Data table, Columns, Rows, and None. Use Data table if you are creating an entirely new table. Use Columns if the output is a column that should be added to another table. Use Rows if you are adding rows to a table.
- For Value type, there are seven options: Data table, Columns, Rows, Document property, Data table property, Column property, and None. The same advice applies here as for Value inputs: Document property is the handler I use most often.
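The output handlers expect differently shaped objects, so it can help to check the shape in the script before Spotfire tries to place it. The sketch below assumes a hypothetical target table “tbl” and shows one output per handler family: a vector for Columns, a data frame for Rows, and a single value for a document property. All names are hypothetical. It also assumes the new column is matched to the table by row order and that appended rows are matched by column name; both depend on how the handler is configured in Spotfire.

```r
# Hypothetical target table that the outputs will be added to.
tbl <- data.frame(id = 1:3, value = c(10, 20, 30))

# Columns handler: a vector with one element per row of the target table
# (assumes rows are matched by position).
value_doubled <- tbl$value * 2
stopifnot(length(value_doubled) == nrow(tbl))

# Rows handler: new rows that use the same column names as the target table
# (assumes columns are matched by name).
new_rows <- data.frame(id = 4:5, value = c(40, 50))
stopifnot(identical(names(new_rows), names(tbl)))

# Document property handler: a single value.
value_total <- sum(tbl$value)
stopifnot(length(value_total) == 1)
```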
As I was writing this, I realized that I’m not sure which type I would use for a data function that outputs rows. The options for adding rows are part of both the Column and Table types, but setting up a Column type to insert rows seems counterintuitive. I just haven’t had to write this type of data function yet. If you know, please comment!
Hopefully, explaining some of the common uses of the different input and output parameter types will help you better understand TERR data functions and how to convert R code to TERR. Thanks!
Guest Spotfire blogger residing in Whitefish, MT. Working for SM Energy’s Advanced Analytics and Emerging Technology team!