- Do you want to learn more about the Spotfire expression language?
- Do you have difficulty getting calculated columns to respond to filtering?
- Do you wonder why there are so many custom expression options in the right click menus?
In my last two blog posts, I discussed how to use Over and Node Navigation. Understanding these functions are big steps towards becoming an expressions expert, but one big piece is still missing – learning how to write expressions directly on a visualization. What do I mean by writing an expression on the visualization? I mean, right clicking on any of the variable selectors (eg. Y axis, x axis, color by) and selecting Custom Expression (or using the Edit Expression section of the y axis variable selector).
The ability to write expressions directly on a visualization can add a layer of flexibility to the analysis that isn’t possible with calculated columns. How so? Well, when columns are inserted into data tables, they will always take into account the entire data table. Calculated columns do not respond to filtering, but expressions written on a visualization do. They can also respond to other changes on the visualization, such as a change in hierarchy. I’ll show you an example shortly, but before I do, I need to point out that when Over and Node Navigation functions are written as custom expressions on visualizations, the syntax changes. I know what you’re thinking – MORE/DIFFERENT SYNTAX!?!!? – but it’s for good reason. The difference in syntax is what makes it flexible.
When calculated columns are inserted into a data table, Over and Node Navigation reference column names. When writing custom expressions on a visualization, Over and Node Navigation will reference what is called an Axis Name. There are 17 different Axis Names, and you choose the Axis.Name depending on the visualization type (eg. Bar chart, line chart) and the part of the visualization you want the calculation to reference (eg. Color, X Axis, Trellis, Line).
My first example will use the Axis Name, Axis.X. Note, Axis Name values are case sensitive. Axis.X will be recognized, but Axis.x will not be. So let’s say I have a production data set for 5 wells that covers six months, and I have one record per well per day. If I want to calculate the delta from yesterday’s production to today’s production for each well, I could use a calculated column like this:
Sum([OilProd]) over ([Well Name], [ProductionDate]) – Sum([OilProd]) over (Intersect([Well Name], Previous([ProductionDate])))
The first part of the expression sums up the oil production for each well, for each day and then subtracts it from the second part of the expression which adds up the oil production for each well for the previous day.
You also could write a similar expression on the y axis of a bar chart as:
Sum([OilProd]) – Sum([OilProd]) over (Previous([Axis.X]))
The first part of the expression, is adding up OilProd for each node on the x axis and subtracting it from the sum of OilProd from the previous node on the x axis. What is a node you ask? A node is whatever is on the x axis of the visualization. If you put a date hierarchy on the x axis that is set to month, then this calculation will calculate the difference month to month. If the hierarchy is set to day, it will calculate the difference day to day, which is why this method of calculation is so flexible. This is demonstrated in the animated GIF shown below.
In a similar example, I will calculate the % of total production that each Reservoir, Field and Well Contributes using the expression Sum([OilProd]) / Sum([OilProd]) over (All([Axis.X])). I have connected a drop down selector that allows me to easy toggle between Reservoir, Field and Well, you can see how the visualization updates in the animated GIF below.
And did you see at the end there how the calculation also updated when I changed the date range? This isn’t possible with calculated columns because calculated columns will always take into account the entire data set. I’ll show you. In the animated GIF below, I have added a calculated column to the table with the expression Sum([OilProd]) over ([Reservoir]) / Sum([OilProd]). Notice that when I move the date selector, the values don’t change. This is the primary difference between writing expressions on a visualization and inserting calculated columns.
This is where I will end this blog post, but we only covered 1 of the 17 Axis Names. I’ll touch on the others in future posts, but for now, here is a summary graphic that is part of the TIBCO Advanced Calcs course online showing all of the Axis Names and which visualizations they can be used with.
Guest Spotfire blogger residing in Whitefish, MT. Working for SM Energy’s Advanced Analytics and Emerging Technology team!