A key part of analytics in the oil and gas industry is evaluating opportunities at different locations. Space is always present when looking for profitable development projects. We usually look at the already in production wells and try to find some spatial trends. To stay competitive, we need to find better ways to access the data of different areas and its wells. For instance, we can transform the spatial information to compact objects that store the location and shape of each well and lease. These objects can be feed to different calculations and analyses as geometries. For Spotfire, it also has some advantages, you can use the feature layers of the map chart. In this case, we can visualize the leases as polygons and wells as lines.
- Are you working with a team that shares DXPs amongst team members?
- Do your TERR data functions require CRAN packages to be installed?
- Are you worried your teammates won’t be able to run your data function(s)?
This post explains my struggle to convert strings to Date or Time with TERR. I recently spent so much time on this that I thought it deserved a blog post. Here’s the story…
I was recently working on a TERR data function that calls a publicly available API and brings all the data into a table. I used the as.data.frame function to parse out my row data. In that function, I used the stringsAsFactors = FALSE argument, and as a result (the desired result), all of my data came back as strings. This was fine because the API included column metadata with the data type. As you can see in the script below, I planned on “sapplying” through the metadata with as.POSIXct and as.numeric. This worked just fine in RStudio, and it also worked for the numeric columns and for the DateTime columns. However, it did not work for Date and Time columns. I tried different syntax, functions (as.Date didn’t work either), packages, etc to get it to work and NOTHING! The struggle was very real.
Finally, I Googled the right terms and came across a TIBCO knowledge base article with this information….
Spotfire data functions recognize TERR objects of class “POSIXct” as date/time information. As designed, the Spotfire/TERR data function interface for date/time information does the following:
– Converts a Spotfire value or column whose DataType is “Date”, “Time” or “DateTime” into a TERR object of class “POSIXct”.
– Converts a TERR object of class “POSIXct” into a Spotfire value or column with a DataType of “DateTime”, which can then be formatted in Spotfire to display only the date (or to display only the time) if needed.
This interface does not use any other TERR object classes (such as the “Date” class in TERR) to transfer date/time information between Spotfire and TERR.
That told me that all my effort was for naught, and it just wasn’t possible. I contacted TIBCO just to make sure there wasn’t some other solution out there that the article was not addressing. In the end, I just used a transformation on the Date and Time columns to change the data type. I hope that you, dear Reader, find this post before you spend hours on the same small problem. I did put in an enhancement request. Fingers crossed. Please let me know if you have a better method!
What is a Data Function?
Data Function Basics
- Create the script
- Create the parameters
- Run the script to map the parameters to the data in the DXP
- The script is the “meat” of the data function. Within the script, you’ll find at least one input and one output parameter. The simplest R script I’ve ever written is output <- input. Input is the input parameter, and…yeah, you can finish that sentence I bet.
- TERR (and R) are object-oriented languages, which means programmers can create objects within the code, assign values to the objects and then reference the object down the road rather than all the values. This makes programming easier. In the example above, input and output are both objects.
- Input and output parameters tell Spotfire what type of object to work with. The object could be a table, column, document property or another object.
- Running the script triggers the dialogs where you will map the parameters to the actual data in the DXP.
- Data functions can be connected to marking and filtering. For example, you can pass the results of marking or filtering to a new table.
- Users may create data functions from scratch in Spotfire, or users may import data functions from the Spotfire library or another file.
- By default, data functions embed within the analysis. However, users have the ability to save them to the library for reuse or sharing.
duplicate <- duplicated(data.frame(key1, key2))
Data Function Properties Main Screen
- Create new data functions
- Edit existing data functions and their parameters
- Refresh data functions
- Delete data functions
- Save data functions to the library
- Export a data function
- The term “Register New” can be a bit confusing to new users. This really means create a new data function. In the process, you’ll have the option to save it in the library or register it.
- Clicking the Edit Script button will let you modify the script or the input and output parameters.
- Clicking the Edit Parameters button allows you to change the mapping of data from the parameters to the DXP content.
Script & Parameters
Input and Output Parameters
- If your input or output is an entire table, choose Table. I use this option when I am simply passing a limited data set from my original table to a new table.
- If your input or output is a single column, choose Column. The script shown above for identifying duplicates uses a Column output. The data function creates a column called “duplicate”.
- If your input is a hard-coded value or a document property, choose Value.
Run the Data Function
After you have entered the script, input parameters, and output parameters, the next step is clicking the Run button. If Spotfire asks if you want to save the data function to the library, you can say no. It will not impact your DXP. This is simply to give the option to save the data function to the library so others may access it. As an administrator, I ask users NOT to do this because it clutters up the library. It is also hard to know what a given data function is for or if it even works.
Anyway, this is the step in the process where you map the parameters to the content of the DXP. Let’s tackle the inputs first. I have intentionally added two unnecessary parameters to demonstrate that the options for input handlers depend on the type of input parameter. Each input parameter type has different options.
- For Column type, there are three options — Column, Expression, and None. The most common input handler is Column, which I have used in data functions that manipulate or calculate based on a specific column of data.
- For Value type, there are six options — Value, Document property, Data table property, Column property, Expression, and None. I most frequently use Document property.
- For Table type, there are three options — Columns, Expression, and None. You can tell Spotfire to work with a subset of the columns in the table by using the Select Columns button. Alternatively, typing “*” in “Search expression” will use all columns in a table. It’s not visible in the screenshot shown, but just below the “Search expression” section, you will also find options to connect the contents of the table to marking or filtering. This is explained in the TERR Basics post.
I do want to note that I have never used the None option in either input or output handlers. If someone has, please tell me about it in Comments.
Now, for outputs, it is also true that the options presented differ depending on the parameter type. As you can see, Column, Value, and Table all have different options.
- The Column and Table Type have the same four options — Data table, Columns, Rows, and None. Use Data table if you are creating an entirely new table. Set the type to Columns if the output is a column that should be added to another table. Use Rows if you are adding rows to a table.
- In Value Type, there are six options — Data table, Columns, Rows, Document property, Data table property, Column property, and None. The same advice is true of outputs here as for inputs.
As I was writing this, I realized that if I were creating a data function that output rows, I’m not sure which type I would use. The options for adding rows are part of both the Column and Table Type. Setting up a Column type to insert rows seems counter-intuitive. I just haven’t had to write this type of data function yet. If you know, please Comment!
Hopefully, explaining some of the common uses of the different types of input and output parameters will help you better understand TERR function and how to convert R code to TERR. Thanks!
- Uninstall R
- Uninstall RStudio
- Install new R
- Install new RStudio
Hopefully, your installation goes easier than mine did.
Ruths.ai recently published a free template on the Ruths.ai Exchange that reads and writes data from/to MS Access. Under the covers, you’ll find two property controls and two data functions working with the RODBC package. Now, we know that templates are good, but being able to replicate the work is better. Users want to be able to recreate that functionality in their own files, which is why I am writing this post to explain the code and how everything fits together so you can recreate this functionality in your own DXP files. Before reading any farther, use this link to download a copy of the template and familiarize yourself with how it works.
- Do you know how to check which version of TERR is installed on your Spotfire installation?
- Are you unsure of which versions of R or RStudio to download so that you know it lines up with TERR?
- Did you have it all figured out but now you’ve upgraded and lost all the answers???
Incomplete data is a problem that Data Scientists face every day. Most common practices vary from complete deletion of the observations with missing values, substitution by a fixed value, or performing imputation using statistics like the mean or median. Since these approaches have limitations on capturing the structure of the data, scientists have developed more sophisticated methods.
The “spTimer” package uses three Bayesian models to fit Spatio-Temporal Data. The data may be given at sparse spatial stations, where observations at each station are considered time series. The package can model the residual spatio-temporal variation to measure uncertainty. It also gives flexibility to customize covariance function selection, the hyper-parameters of the prior distributions and the tuning parameters for the implemented MCMC algorithms.