- Have you ever wondering why someone might use a data function as part of a transformation in Spotfire?
- Have you ever wanted to see an example?
This is week 4 in the Transformation series. So far, I have covered pivots, unpivots, and normalization. This week I am going to focus on data functions as transformations. Data functions are some of the work horses of Spotfire. They perform a wide variety of tasks and analysis including (but not limited to) data manipulation, data conditioning and statistical operations. They can be executed through a number of different languages including (but not limited to) TERR (TIBCO Enterprise Runtime for R) or Open Source R. In Spotfire, data functions are frequently used to get around some of the limitations of the software and manipulate or analyze in ways that would otherwise be difficult or inefficient. Now that you know what data functions are, I will continue by touching on a subject that pertains to all transformations but that I haven’t yet touched on in this series.
As many of you know, transformations duplicate functionality found elsewhere in the application, but for good reasons. For example, users can calculate new columns through the Insert menu, or through transformations. When calculating columns through a transformation, they can be used in joins. When users calculate columns through the insert menu, they cannot be used in joins. The functionality of two similar operations differs because of when those two operations happen and don’t happen.
The user may not be aware of it, but Spotfire performs a series of operations when opening and updating the DXP file, and those operations occur in a very specific order. The program is based on code after all, and code doesn’t happen randomly. When a DXP is loaded, the first thing that happens is the data is queried from it’s source. Then, transformations are applied, and they are applied in the order that the user created them. For example, if a user adds a data table and then adds a change column name transformation, any future or subsequent transformations will involve the new column name, not the old. What’s also important is that transformations happen before other processes, which is why you will always see transformations before calculated columns when viewing the Source Information in Edit –> Data Table Properties.
Furthermore, transformations take place when the DXP file is opened and when data is reloaded, and those are the only times they take place. In contrast, if the user inserts a calculated column, that column can be updated any time a value used in the calculation changes, which could be any point in time. That is why you cannot use inserted calculated columns to join. If you could, then the data load process would have to be performed any time a calculated value changed (to update based on the join) and that would simply bog down the program.
Thus, any data functions added to a DXP file via a transformation will also be performed when the DXP is opened or when a data source is reloaded. Therefore, it makes sense that you would insert a data function as a transformation when you want it to be part of the data load and conditioning process that takes place when the DXP file is opened.
Adding data functions as transformations is done with the steps shown below. Warning: There are limitations.
- Go to the Insert menu, Transformation, select Data function from the drop down
- When you click Add, you will be brought to this screen. This screen will show all of the data functions saved to the Spotfire library on the Spotfire server, arranged by the language they are written in. This screenshot is looking at the TERR data functions. Because the dialog box takes you to this particular menu, it means the user cannot write a data function from scratch or “one the fly” via Transformations. It must first be written and then saved to the library. I’m not quite sure why this limitation exists, but I’m sure there’s a good reason.
- From here, you simply select the data function, and you will be prompted to map the data to the function. The screens might not be the same as what you have seen in other dialogs when working with data functions, but that is the function being performed.
I would definitely recommend experimenting with these types of transformations before committing to them. I experienced a number of issues as described below that I haven’t figured out yet:
- Errors on functions that should run easily with any data set.
- My first attempt at adding a correlation table resulting in my source data table being turned into the correlation table, which is not how the exact same function works when I use it through Insert, Data function.
- When attempting to run a data function used to pass data from one table to another based on marking or filtering, there was no option to change or update the marking or the filtering.
- I was unable to find data functions saved in the library that I could see and find by using Insert, Data function.
I will update this post if/when I have explanations for the conditions experienced above. As always, you may reach me at firstname.lastname@example.org with questions or comments.
Guest Spotfire blogger residing in Whitefish, MT. Working for SM Energy’s Advanced Analytics and Emerging Technology team!