# Moving Averages in Spotfire

This week a user contacted me for assistance setting up a 3-month moving average calculation.  He’d already attempted it, but the result was wrong.  This is a common problem with the moving average function because of the way it’s built.  That’s not to say that it’s built wrong.  It’s just wasn’t built the way he wanted it to be built.

To explain, I will begin with an example of the Moving Average aggregation used on the y-axis of a visualization because it’s the easiest to understand.  Then, I’ll move on to a moving average calculation in a calculated column, which is a bit different.

## Example of Moving Average Written on Y-Axis

The first screenshot below shows the configuration of the bar chart below it.  We are using the Moving Average aggregation and have chosen an Interval size of 3.  The actual expression makes use of the Last Periods node navigation method.  Note, Spotfire uses the term “Last Periods”.  A period is whatever you put on the x-axis, whether that be days, weeks, or months.  In our example, a period is a month. In it’s simplest form, the expression sums up oil prod then averages it over the last three periods on the x-axis.   However, it’s a bit more complex because there is also an If statement after the average.  The If statement is counting periods on the x-axis.  A result is returned only when the count is 3.  If the result is not 3, null is returned.  That’s why the result is null until the visualization makes it past 3 periods of data. When you are using a similar expression in a calculated column, it works a bit differently.

## Example of Moving Average as Calculated Column

In this example, I am going to use a 3-day moving average rather than a 3-month moving average.  The premise is the same.    I calculated the 3-day moving average with this expression:

Avg([Gas Prod]) over (Intersect([Well Name], LastPeriods(3,[Prod Dt])))

That expression says — Average Gas Prod for each Well Name over the last three periods as defined by the Prod Dt.  In this case, Prod Dt is a day of the month.  Thus, the expression will average Gas prod for each Well Name for the last three days.  Here is the data: As you can see, Spotfire is taking the first day of gas prod and dividing by one.  Then it adds day 1 and day 2 and divides by 2.  Thus, the first two days aren’t really a 3-day moving average.  This may work for you or it might not.  If you don’t want to see the average until 3-days have passed, simply add 2 more calculations.  One is a counter for the days.  The other is an if statement. This will return null until 3 days have passed, just as the previous example did.

Counter — Rank([Prod Dt],[Well Name])

3-day moving average — If([Count Days]<3,null,Avg([Gas Prod]) over (Intersect([Well Name],LastPeriods(3,[Prod Dt])))) Hopefully, this clarifies how the function works and also how to use the Last Periods node navigation method.

## Spotfire Version

Content created with Spotfire 7.12.

### 15 thoughts on “Moving Averages in Spotfire”

1. Hi,

I need to calculate column where first need to add revenue for last 7 days. I have 21 other columns which need to work as filter so I applied a calculation with all these 21 column in intersect along with Lastperiods(7,[date]): To clarify more:

Sum([revenue]) over Intersect(col1, col2, col3,…..col21, Lastperiods(7,[Date])).

Now, it is resulted into missing several records. I analyze and then found first it filter for col1 and then it works for other column which resulted into wrong value.

Let me know if you are getting some idea.

Rgards,
Ravi

• I think I understand what you are saying. Let’s say that col 1 = customer, col 2 = region, and col 3 = city. Then you have an expression like this — Sum([revenue]) over Intersect([customer], [region], [city]), Lastperiods(7, [Date]). That is going to give you the revenue for the last 7 days for each customer in each region in each city. Within the scope of that more detailed example what would you want it to do? Using over is like saying “for each”. It’s not a filter per se. It’s more of a grouping. Can you be a bit more specific about what you are trying to accomplish?

2. Hi! How would I do a % difference in production from today’s production vs. yesterday’s production. I have a similar data set with a lot of wells and lots of days of production as provided above. How would I, instead of averaging, just find the difference in today’s production and yesterday’s divided by yesterday’s. Or % Diff = (Today BOPD – Yesterday BOPD)/Yesterday BOPD

• Do you want to create a calculated column or do you want to write the expression on the y-axis of a visualization? The answer is different depending on which method you use. If the answer is y axis, what visualization type.

3. Julie- Thanks for the write up. I am attempting something very simple, yet similar using the aggregation method seen in your first screenshot, except with a line graph. As soon as I click moving average, my display goes away and I cant see any data. Any thoughts?

• I pulled up a line chart and a bar chart, and I wasn’t able to replicate that problem exactly. I did have problems when I didn’t use a hierarchy on the x axis. When I just put the raw date (not a hierarchy) on the x, I got an error that said could not find axis: X on the line chart. I am in version 10.2.

4. Hi Julie,
I am trying to do something very similar except my data has multiple data points for each day. When I try to use the LastPeriods function what will Spotfire use as the “Periods” since my data contains timestamps down to the second? In that case would it only average the data by seconds? I would like to create a moving average over a 7 day period but I do not want to use a hierarchied X Axis. Is that possible? Below is the command I am currently using on the Y-Axis under Custom Expression.

Avg([Column]) OVER (LastPeriods(7,[Time]))

• To use Last Periods and calcualte on the axis, you have to use a hierarchy. Can you just create a hierarchy than only goes down to the day and doesn’t go down all the way to the second, even though the data goes that far? That might work.

5. Hi Julie,
I could do that but it makes the visualization look funny cause all the data points for each day stack on top of each other in a vertical line. Then my plot looks somewhat like a bar chart but with scattered points. If I use a calculated column instead would that work?
Maybe creating an additional column that grabs the day from each data set and intersect that with my Time column?

• Okay, this is a bit of a dump of info. You’ll need to use LastPeriod or NavigatePeriod.

LastPeriod
o Allows you to specify the number of periods or records to be used in the calculation. If the first argument is 4, the current node will be grouped with the previous 3.
o Must be used with time, as do all “Period” Node Navigation methods.
o Commonly used to calculate moving averages.
 ex. sum([OIL]) over (5, LastPeriod[D_DATE])
 ex. (sum([OIL]) over (5, LastPeriod[D_DATE])) / 5
 ex. sum([OIL]) over (intersect([Lease], (5, LastPeriod[D_DATE])))

Axis.NavigatePeriod
• This node navigation method allows you to specify exactly what to pick up. You would use this when you maybe aren’t sure how to get one of the other node navigation methods to point to a specific place in time.
• This particular node navigation method works a little bit differently than the others. With other node navigation methods, if you change the hierarchy, the calculations update. However, with this method, changing the hierarchy can actually break it, as will be shown with the examples.
• Syntax
o The syntax has 3 mandatory arguments and one optional argument.
o Arg1 is the hierarchy to navigate, which is generally always going to be Axis.X
o Arg2 is a string OR integer value value specifying the level in the hierarchy that you should navigate up to.
o Arg3 is an integer specifying the number of steps to move sideways in the hierarchy at the level specified by Arg1.
o Arg4 is optional and and is the name of the level of the hierarchy to move down to, or an integer specifying the number of steps to move down. This argument can be omitted in which case a navigation to the leaf level is made. Almost all uses of Navigate Period that I have seen omit this argument, with the exception of the Year to Date Total and the Year to Date Growth Expression shortcuts.
• Key Notes:
o Arguments 2 and 4 can be strings. Argument 3 must always be a number.
o If you use numbers, you can move the hierarchy slider, and the visualization will update.
o If you use string (specifying month, quarter, etc), the slider setting must match the syntax.

Example where the hierarchy is set differently than what you want to calculate.
Sum([OIL]) OVER (NavigatePeriod([Axis.X],”Year”,0,0)) or
Sum([OIL]) OVER (NavigatePeriod([Axis.X],”Year”,0,”Year”))
• Both of these expressions will show the total for each year, regardless of how the slider is set.
• The value shown is the total for the year.
• If you change the first “Year” to “Quarter”, the value shown is the total for each quarter, regardless of how the slider is set.

6. So, you have to use a hierarchy on the x axis to make the calculation work. With that said, I think you can still use last periods to navigate the hierarchy the way that you want, and still get a decent looking visualization. A few years ago, I spent a whole lot of time figuring out the details of that function to build a calculations course. I’ll dig up those old materials and see what I can find.

7. Hey Julie,

I noticed on Day 3 of your updated 3-day moving average column, the same issue is happening as the case without the counter and if statement. It is taking Day 3 divided by one, Days 3 and 4 divided by two and then the moving 3-day average starts to tie to the original column beginning Day 5. I believe that both the original moving average and updated moving average columns should have the same values with the exception of the (# moving average days – 1) values being null in the updated case. Thoughts?

• Hi Brian,
Thanks for catching that. I think I looked at the wrong column when building the example. I’ll put out a new version of this post next week on https://bigmountainanalytics.com/blog. I am no longer writing for Data Shop Talk, although I do still see comments. Thank you!

8. Hello Julie,
I am facing an issue since 2 weeks.
I want to calculate a 3-rolling-month with this formula:
Count() Over (Intersect(LastPeriods(3,[Axis.X])))
It works well but now, I wanted to take into account the day of the month.
For example, if we are the 21/11/19.
The “Lastperiods” should last from 21/08/19 to 21/11/19 whereas with my actual formula, my “Lastperiods” lasts from 01/09/19 to 21/11/19.

I tried to do the same with days and display month part of my hierarchy but it seems it works only with my lowest granularity level.

Please could give me if is it possible ?

Best regards,
Matthieu D.

9. Hello Julie,

Here is my sample data
Name Time Status
Nikhil 12/1/01 3:35 AM 0
Nikhil 12/1/01 5:35 AM 0
Nikhil 12/2/01 3:35 AM 1
Nikhil 12/2/01 5:35 AM 1
Nikhil 12/3/01 3:34 AM 0
Nikhil 12/3/01 5:35 AM 1
Shekhar 12/1/01 6:35 AM 0
Shekhar 12/1/01 7:34 AM 0
Shekhar 12/2/01 6:35 AM 1
Shekhar 12/2/01 7:35 AM 0
Shekhar 12/3/01 6:35 AM 0
Shekhar 12/3/01 7:35 AM 0

I need to a find the status change between the first and last dates for the respective names.
I first tried :
Sum([Status]) OVER (Intersect([Name],LastNode([Time]))) – Sum([Status]) OVER (Intersect([Name],FirstNode([Time])))

But the first and last node of time vary with ‘Name’ and so it doesn’t give me the right results. Could you please help me out

Thanks
Nikhil