Spotfire Data Functions — TERR Basics

  • Have you experienced difficulty trying to implement data functions or TERR code from blog posts?
  • Do you find that blog posts frequently assume you know how data functions work, leaving out simple but important steps?
  • Can you follow the steps in a blog post but not understand why a step is required, and thus can't adapt it to a different scenario?
  • Do you feel you lack a foundational understanding of how TERR works in Spotfire?

This post explains how TERR works in Spotfire, details the steps required to implement TERR code, and covers the relevant menu items.  The intention is to build a foundation of knowledge to help you implement data functions you find on the internet.  Click on this link to download the accompanying DXP file:  terr-basics

Let's start with a little context first.  TERR stands for TIBCO Enterprise Runtime for R and is the Spotfire version of Open Source R.  Open Source R is a statistical programming language that has its roots in academia.  It is useful not only for statistics but also for manipulating data.  Spotfire can run Open Source R, but only if you have Stats Server, and we aren't going to get into running Open Source R in this post.  This post focuses solely on TERR.

The TIBCO website describes TERR as follows:

“TERR, a key component of Spotfire Predictive Analytics, is an enterprise-grade analytic engine that TIBCO has built from the ground up to be fully compatible with the R language, leveraging our long-time expertise in the closely related S+ analytic engine. This allows customers to continue to develop in open source R, but to then integrate and deploy their R code on a commercially-supported and robust platform—without the need to rewrite their code.

Prototypes are often developed in R, but then typically re-implemented in another language for production purposes because R was not built for enterprise usage. TERR brings enterprise-class scalability and stability to the agile R-language, and enables statisticians to broadly share their analyses through TIBCO Spotfire Statistics Services or by directly embedding the TERR engine.”

While TIBCO says TERR is fully compatible with the R language, this isn't entirely accurate.  TERR is not an exact replica of R, and it isn't updated as frequently as R, but the two are very similar.  You can even connect TERR to your installation of RStudio in order to test TERR data functions there.  Data Shop Talk has a blog post about that here.
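If you are ever unsure which engine a script is actually running in, a couple of standard R calls answer that from inside the script itself. A minimal sketch; the exact wording of the TERR version string is an assumption, so treat the comment as illustrative:

R.version.string    # the engine identifies itself here (TERR reports a 'TIBCO Enterprise Runtime for R ...' string)
exists("qqnorm")    # TRUE if this engine implements a given function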

Now that we understand what TERR is, let’s look at the high level steps you’ll follow to create your own data function.

  1. Write the script
  2. Create the input parameters
  3. Create the output parameters
  4. Run the script
  5. Map the input and output parameters to the data in the DXP file
  6. Find outputs and make visible in DXP as needed

The script itself is universal; it can be used in any DXP file.  The input/output parameters created with the script are also universal and can be thought of as shells or containers.  When the script runs, data is placed into those containers and passed through the TERR functions to generate output specific to that data.
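To make the container idea concrete, here is a minimal sketch of a complete data function script. The names inputTable and summaryOut are hypothetical; they only mean something once you register them as input and output parameters:

# Spotfire fills 'inputTable' (a Table-type input parameter) at run time
# and maps whatever the script assigns to 'summaryOut' back into the DXP.
summaryOut <- data.frame(Rows = nrow(inputTable), Columns = ncol(inputTable))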

Things to Know

  1. Case and spacing are important, especially in the input and output parameter names.
  2. Filtering and marking can be utilized in order to limit data.
  3. The data types defined in the input/output parameters can also be critically important (see the sketch below this list).
  4. Data functions can be set up to run automatically any time data, filtering, or marking changes, or you can choose to press a button to rerun the function.
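To illustrate points 1 and 3, here is a small defensive sketch, assuming a registered input parameter named input that should be numeric:

# 'input' must match the registered parameter name exactly; referring to
# 'Input' would fail because parameter names are case-sensitive.
input <- as.numeric(input)
if (all(is.na(input))) stop("input must be mapped to a numeric column")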

Now that we are ready to take action, let's look at where to find data functions in the menus, which can be confusing to new users because they appear in several places.  Where you go in the menus depends on whether the data function is being created from scratch or pulled from the library or another DXP file.

  • Insert from the library or another file
    • Insert menu, Data Function
    • Tools menu, Register Data Functions — using the Open or Import button
    • Edit menu, Data Function Properties — using the Insert button
  • Create them in the DXP
    • Tools menu, Register Data Functions
    • Edit menu, Data Function Properties — using the Register New button

If you found a data function on a blog that you want to try, use one of the options under the second bullet above (create them in the DXP).  From here, I'll explain how to create new data functions using its second sub-bullet: the Edit menu, Data Function Properties, using the Register New button.

Next, I have two examples using the DXP file attached to this post.  The first example is one that Troy wrote up at this link on Data Shop Talk.  He describes how to use a data function to generate a QQ plot that can be used to test a column of data for normality.  This is a very common data profiling task prior to building a predictive model.  He provides the TERR code but not all of the exact steps.  It's a good example because it requires a few input parameters.  Note that I did not use exactly the same code; I changed a few of the parameter names.

Here is the script:

input <- na.omit(input)
qq <- qqnorm(input)
output <- as.data.frame(cbind(qq$x, qq$y))
colnames(output) <- c('Normal Quantiles', 'Sample Quantiles')
QQResult <- output

Here are the steps to follow:

  1. Go to the Edit menu, Data Function Properties, and click the Register New button
  2. Name the script QQ
  3. Copy and paste the script in the Script tab.  If you do not see a Script tab, make sure the Type drop-down is set to R script — TIBCO Enterprise Runtime for R, not R Script — Open Source R.
  4. Go to the Input Parameters tab and click the Add button.  Add an input parameter with the following configurations
    • Input parameter name: input
    • Display name: input
      • For the sake of simplicity, I always make the input parameter name and the display name the same.
    • Type: Column
    • Allowed data types: Integer, Real, SingleReal, Currency
      • NOTE: When working with scripts you find online, it’s possible to run into problems with the data type.  In this case, we are creating a QQ plot, which tests for normality prior to an analysis of some type.  All of the columns should be integers or real numbers.  Thus, we haven’t selected any other data types.
  5. Go to the Output Parameters tab and click the Add button.  Add an output parameter with the following configurations:
    • Result parameter name: QQResult
    • Display name: QQResult
    • Type: Table
      • NOTE: In this example, we want to create an entirely new table.  Hence, the type is Table.  There are other scenarios where you may want to create a new column and add it to an existing table, in which case, you would choose Column as the Type.
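For comparison, if you chose Column as the Type, the script would need to return a vector with one value per row of the source table so Spotfire can attach it. A hedged sketch of a hypothetical z-score output, where zScore would be registered as a Column-type output parameter:

# Return a vector the same length as 'input', preserving NAs, so
# Spotfire can join it back to the rows of the source table.
zScore <- (input - mean(input, na.rm = TRUE)) / sd(input, na.rm = TRUE)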

 

Before we map the data, let’s look at what the script is doing and how the parameters are being used.

input <- na.omit(input)    # remove NA (empty) values from the mapped input column
qq <- qqnorm(input)    # compute the quantiles for a normal Q-Q plot
output <- as.data.frame(cbind(qq$x, qq$y))    # x = theoretical quantiles, y = sample quantiles
colnames(output) <- c('Normal Quantiles', 'Sample Quantiles')    # name the columns for the new table
QQResult <- output    # assign to the registered output parameter so Spotfire can map it
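If you want to sanity-check the same logic outside Spotfire (for example, in RStudio) before wiring it up, you can stand in for the input parameter with a plain vector. A minimal sketch; the rnorm sample is just illustrative test data:

input <- c(rnorm(50), NA)    # stand-in for the Spotfire column mapped to 'input'
input <- na.omit(input)
qq <- qqnorm(input, plot.it = FALSE)    # plot.it = FALSE returns the quantiles without drawing
QQResult <- data.frame('Normal Quantiles' = qq$x, 'Sample Quantiles' = qq$y, check.names = FALSE)
head(QQResult)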

Now that the script and its parameters are in place, click the Run button.  You may see a pop-up asking if you would like to save to the library.  If you click No, nothing bad will happen; it simply means the data function stays saved only in the DXP file, as opposed to in both the file and the Spotfire library (if you have one).

  1. Click the run button
  2. Save to the library (or don’t)
  3. If you would like the function to always refresh automatically, check the Refresh function automatically check box.  If you do not check this box, and something changes, you will see a set of circular arrows in the table header implying the data function needs to be refreshed.
  4. Click on the Input tab and select Column as the Input handler.  This is where we map the data in the DXP to the input parameters in the script.
  5. Select the data table and select the column of data that you would like to test for normality.
    • This step is where errors can crop up.  A QQ plot is meant to represent only continuous data.  If you select a categorical column of data, the data function will most likely throw an error.
  6. If you would like to restrict the records that go into the data function, either choose a filtering scheme from the drop-down or check the box for a specific marking.
  7. Click on the Output tab and set the radio button to create a new data table called QQResult.
  8. After you click OK, the data function will run and there should be a new table in the DXP file.  If anything went wrong, there will be either a red (error) or yellow (warning) message in the bottom left-hand corner of the screen.  There are further instructions in the DXP that show you how to actually set up the QQ plot, but this is where I stop for the blog post.

That concludes the first example.  I want to add one more simple example to this post because the first example only analyzed a single column of data.  In this example, we will pass through more columns, as you might do if you were going through a data profiling or data conditioning exercise.

  1. Go to the Edit menu, Data Function Properties, and click the Register New button
  2. Name the script Pass thru
  3. Copy and paste the script

Here is the script (yep, just one line):

output <- input
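Note that output is lowercase to match the output parameter we will register below; as mentioned earlier, case matters. The pass-through also makes a handy skeleton for light data conditioning. For example, a hypothetical variant that trims stray whitespace from every string column before handing the table back (trimws is base R, and I'm assuming your TERR version provides it, as current builds do):

output <- input
charCols <- vapply(output, is.character, logical(1))    # identify the string columns
output[charCols] <- lapply(output[charCols], trimws)    # strip leading/trailing spaces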

Here are the steps to follow:

  1. Go to the Input Parameters tab and click the Add button.  Add an input parameter with the following configurations:
    • Input parameter name: input
    • Display name: input
    • Type: Table
    • Allowed data types: Select all
  2. Go to the Output Parameters tab and click the Add button.  Add an output parameter with the following configurations:
    • Result parameter name: output
    • Display name: output
    • Type: Table
  3. Click the run button
  4. Save to the library (or don’t)
  5. If you would like the function to always refresh automatically, check the Refresh function automatically check box.
  6. Click on the Input tab and select Columns as the Input handler.  This is where we map the data in the DXP to the input parameters in the script.
  7. Select the data table
  8. Here you have choices in terms of which columns are brought across.  You don’t have to bring all columns.
    • If you would like to bring all columns, set the radio button to Search expression and enter * in the text box.
    • If you would like to bring only select columns, set the radio button to Columns, click the Select Columns button and choose your columns.
  9. If you would like to restrict the records that go into the data function, either choose a filtering scheme from the drop down or check a box for a specific marking.
  10. Click on the Output tab and set the radio button to create a new data table called output.
  11. After you click OK, the data function will run and there should be a new table in the DXP file.  If anything went wrong, there will be either a red (error) or yellow (warning) message in the bottom left hand corner of the screen.

After the data function is created, it's very likely that at some point you will want to modify what is being fed into it.  Do this via the Edit menu, Data Function Properties.  Clicking the Edit Script button allows you to edit the script or the input/output parameters (the containers).  Clicking the Edit Parameters button allows you to change what data from the DXP is fed into the data function (including any limiting done by filtering or marking).


I hope you found this to be a helpful and complete explanation of how to use TERR data functions.

7 thoughts on “Spotfire Data Functions — TERR Basics”

  1. Top Intro! Was just at the TIBCO NOW event, and I must say things like this are really helpful to bridge the gap between a very good beginner and a budding master of Spotfire, as sometimes I did not know what they were talking about.

    Thanks again Julie!

  2. Hi Julie,

    Thanks for the excellent article. It helped me in getting started with data functions.

    I am writing a data function which takes a table as an input and outputs the table in xlsx format.

    Here is my code:

    library(xlsx)
    exceloutput <- excelinput

    write.table(exceloutput,file = filepath, row.names=F)

    The table is getting exported but when I open it, it says the "file format or file extension is not valid". Looks like the file is not getting exported as an Excel file.

    Do I need to install the xlsx package?

    Could you please help me on this one? Thank you!

  3. Hi, Julie. Nice intro and example. Thanks so much.

    I hope you check out the new debugging data functions feature available in Spotfire 7.10 (Just released). You can set it in the Tools > Options dialog box, and then you get comprehensive debugging info from the yellow Notifications triangle on the status bar (lower left corner) in Spotfire.

    1. Thanks for letting me know about that. I saw the release of 7.10 and reviewed the release notes. There is no mention of that new functionality in the release notes. Can you tell me if there is more comprehensive documentation available?
