Arps Type Curve in Spotfire in 20 minutes

Type curves are an important part of resource assessment of an oil and gas asset. In this workflow, well declines are aggregated to determine typical behavior of a well ensemble. These well ensembles usually reflect a reservoir or set of analog reservoirs that will help determine characteristic behavior. In this post, we will build a decline model for the group of wells that is called a type curve. The type curve will capture the production rate forecast for a single “average” well and so can be used to determine Estimated Ultimate Recovery (EUR). Best yet, we’ll do it in 20 minutes. A little longer than GEICO, but you’ll save so much more money.

There are countless tools that support the type curve workflow. There are also many levels of sophistication around type curve prediction, handling different flow regimes, decline characteristics, and uncertainty. Since type curve generation is so central to resource assessment, though, it’s useful to be able to merge it with the ad hoc analytics capability of Spotfire. So useful, in fact, that it opens up a lot of different pathways for improving the resource assessment process, especially for unconventionals. But that’s for a different post.

At the end of the day, anyone who has done type curve fitting knows it’s as much an art as a science. The statistical best fit may not actually model the dynamics of the reservoir, and it takes an engineer’s intuition to produce a more accurate forecast. The machine can get close, though, and that’s where the dynamic and interactive nature of Spotfire is really going to improve the type curve fitting workflow.

In this post, we will walk through the mechanics of calculating vanilla type curves in Spotfire. You can get the end result off of Exchange.ai. I call our objective ‘vanilla’ because this is the most basic, straightforward method. There’s a lot more you can add to the decline fit to improve its usefulness. Let’s consider those extra toppings… like we could make hot fudge type curves. Who says data science can’t be delicious?

Because we want this to run out of the box in Spotfire, we are going to use base R, meaning we won’t use any packages. There are several things we are doing that would be easier with an R package, but we don’t want to complicate the user’s experience by making them install packages. Plus, there’s nothing better than implementing things from scratch to make sure you know what you’re doing.

Data

We will assume that you have access to well header and production data. On the well header table, it’s nice to have location, reservoir, well type, and other metadata that will help you grab reasonable well groupings. The production table needs a valid lookup column to relate it back to the well header table and should have your fluid rates (oil, gas, water). The table itself can be daily or monthly. If you’re doing a play assessment, it’s likely you only have monthly data. The granularity of monthly data works well for the type curve workflow as it cleans up a lot of the noise inherent in daily rates. If you are using your own proprietary data, then you probably have daily data. The method in this post is general to either; it really just changes the units of the results (e.g. BBLs decline per day or per month).

For this post, we will assume you have daily data.

Type Curve Equation

For this workflow, we will use the Arps equation. You’ll see in the code how you can make this more general (or put in your own “secret sauce” type curve equation). Ultimately, we will use R’s nonlinear equation fitting functionality to find the best fit for parameters. Since it’s nonlinear, you can put in pretty much any function to be fit. There are caveats, of course, as it may not be able to find the best fit parameters given the starting conditions and function characteristics.
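Written out as code, the hyperbolic Arps form we’ll hand to the fitter is the following (this is the same expression that appears inside the nls call below):

# Hyperbolic Arps decline: qi = initial rate, a = initial decline, b = decline exponent
arps <- function(DaysOn, qi, a, b) {
  qi / ((1 + b * a * DaysOn)^(1/b))
}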

Calculating a Best Fit Type Curve

Let’s start by getting the data into Spotfire’s TERR environment. This is done by creating a data function and sending in the production data. I’d suggest limiting the production data you send in by marking or filtering, but of course that depends on the workflow you want to develop. In this post, I’ll assume you’re only sending in production you’ve limited by marking. If you have no idea what I’m talking about, just keep reading because this part isn’t necessary for calculating type curves.

For simplicity, we will just walk through putting type curves on the oil stream. Doing the same for gas and water is pretty much an identical procedure. We will assume that the production data table has a well name column, a date column, and an oil rate column. We have twenty minutes, remember!
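Concretely, the script expects rows shaped something like this (hypothetical values):

WellName    ProdDate      Oil
SMITH 1H    2014-01-01    850
SMITH 1H    2014-01-02    831
JONES 2H    2014-03-15    1204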

My inputs for the calculation will be the production table itself (call it data), plus the DaysBin and ForecastYears parameters you’ll see in the full code at the end of the post. The first step is to calculate DaysOn in order to normalize the type curves. We will do this by calculating each well’s minimum production date. We can improve this by calculating the minimum production date which has nonzero oil rate, or even the date of peak oil. To do this we will use the aggregate function.

minDate = aggregate(x=data$ProdDate, by=list(data$WellName), FUN=min)
colnames(minDate) <- c("WellName","MinDate")

We will merge the result back with our production table and subtract the minimum date from the production date to get Days On. Technically, you may want to remove down days from the DaysOn calculation. There are some easy ways to do this if you want to clean up “DaysOn”.

data = merge(data, minDate, by="WellName", all.x=TRUE)
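With the minimum date merged in, Days On is just the production date minus the minimum date (the same line appears in the full code at the end of the post):

data$DaysOn = as.numeric(data$ProdDate - data$MinDate, units="days")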

Having the well count will be useful for improving the curve fit and the user will need to see this curve back in Spotfire. A simple way to do this is to make a column of 1’s which we will sum.

data$WellCount = 1

Okay, let’s aggregate our oil across wells to get our dataset to run the type curves on.

type.data = aggregate( Oil ~ DaysOn, data=data, FUN=mean )

And let’s add in the well count.

type.data = merge( type.data, aggregate(WellCount ~ DaysOn, data=data, FUN=sum) )

Now we are going to predict oil rate using time (basically decline). We will use the Arps function, which takes three parameters: initial oil rate (qi), initial decline (a, often written Di), and the hyperbolic exponent (b). These three parameters describe the initial production rate, its initial decline, and how the decline changes as a function of time. In R, we can use the ‘nls’ function to find a best fit of the parameters.

fit = nls(Oil ~ qi/((1+b*a*DaysOn)^(1/b)), data=type.data, start=list(qi=max(type.data$Oil),a=0.01,b=1),control=nls.control(warnOnly=TRUE))

I’ve provided some reasonable start values for the parameters. If you don’t provide these, nls may not consistently find a good fit. We could improve this by first fitting an exponential decline (which is linear in the log of rate) and then using those values as start parameters. I’ve told the function to only warn me if it doesn’t converge. Basically, I still want the values back even if we didn’t get to convergence. The user can just run it again if they don’t like the fit.
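Here’s a minimal sketch of that seeding idea: exponential decline is linear in log-rate, so a quick lm fit gives rough values for qi and a before handing off to nls.

# Sketch: seed nls with a crude exponential (log-linear) fit
exp.fit <- lm(log(Oil) ~ DaysOn, data=type.data, subset=Oil > 0)
start.qi <- exp(coef(exp.fit)[[1]])   # intercept back-transforms to initial rate
start.a  <- -coef(exp.fit)[[2]]       # negative slope is the initial decline
fit <- nls(Oil ~ qi/((1+b*a*DaysOn)^(1/b)), data=type.data,
           start=list(qi=start.qi, a=start.a, b=1),
           control=nls.control(warnOnly=TRUE))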

You can grab the Spotfire template off of Exchange.ai. The template helps the user define type curves for different regions, then compare the results of the fitted curves. Usually you want to compare type curves from different areas (geographically, geologically, or competitively), which means you need to store the results of each type curve run if you like the fit. The template provides that functionality for you.

Going Further

Type Curves are a core part of several oil and gas resource assessment workflows. For anyone doing data science in oil and gas, they are a common and useful technique to have in your back pocket. Because time is one of the strongest predictors of reservoir performance – due to the drop in reservoir pressure caused by production – any regression technique needs to take time into account. So, whether you like it or not, you have to take type curves into account!

We will be releasing a more sophisticated version of the type curve analysis that handles gas and water and also gives uncertainty bands around the decline. Stay tuned for that.

The Code

Here’s the full code. I’ve added some extra stuff that I didn’t walk through, including how to forecast the model into the future. The end result is less than 50 lines of code.

# We need to convert the date column to a POSIXct date... just trust me on this one
data$ProdDate = as.POSIXct(data$ProdDate, origin = "1970-01-01", tz="UTC")
minDate = aggregate(x=data$ProdDate, by=list(data$WellName), FUN=min)
colnames(minDate) <- c("WellName","MinDate")
data = merge(data, minDate, by="WellName", all.x=TRUE)

# Calculate Days On
data$MinDate = as.POSIXct(data$MinDate, origin="1970-01-01", tz="UTC")
data$DaysOn = as.numeric(data$ProdDate - data$MinDate, units="days")
data$WellCount = 1

# Bin DaysOn into DaysBin-day groups (extend the cutpoints one bin past the max so the last partial bin isn't dropped as NA)
cutpoints <- seq(0, max(data$DaysOn, na.rm=TRUE) + DaysBin, by=DaysBin)
binned <- cut( data$DaysOn, cutpoints, include.lowest=TRUE, labels=FALSE )
data$DaysOn <- cutpoints[binned]

# Calculate Mean(Oil), Sum(WellCount) over DaysOn
gdata = aggregate(Oil ~ DaysOn, data=data, FUN=mean)
wcount = aggregate(WellCount ~ DaysOn, data=data, FUN=sum)
gdata = merge(gdata, wcount)

# We are going to extend the DaysOn column to include the number of days to the ForecastYears parameter (e.g. 30)
max.days = max(gdata$DaysOn, na.rm=TRUE)
if(max.days < ForecastYears*365) {
	gdata = merge(gdata, data.frame(DaysOn=seq(max.days,ForecastYears*365,DaysBin)), all=TRUE)
}

# Fit only on the actual (non-forecast) rows
idx = gdata$DaysOn <= max.days
fit = nls(Oil ~ qi/((1+b*a*DaysOn)^(1/b)),
			data=gdata[idx,],
			start=list(qi=max(data$Oil,na.rm=TRUE), a=.1, b=1),
			control=nls.control(warnOnly=TRUE))
gdata$OilPredict = predict(fit, gdata)
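Since type curves feed EUR, one more line is worth knowing. With OilPredict holding a daily rate for each DaysBin-wide bin, a rough EUR sketch is a rectangle sum of rate times days (in practice you would truncate at an economic limit rather than summing the whole forecast window):

# Rough EUR sketch: daily rate per bin times days per bin, summed over the forecast
EUR = sum(gdata$OilPredict * DaysBin, na.rm=TRUE)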

25 thoughts on “Arps Type Curve in Spotfire in 20 minutes”

    1. Outlier removal is an interesting topic in Spotfire. Because it’s so visual, you could use filtering or marking and let the user select which data points should be considered “bad” or outliers. If you wanted to do it from an automated standpoint, you could identify (a) well-by-well outliers (as in shut-ins, bad test data, etc.) or (b) ensemble outliers (this well shouldn’t be in the group). For either, you’d use some sort of outlier detection method. A good starter method would be removing outliers based on standard deviations from the mean (more than 3 STD). There are more robust methods out there, but this would give you an idea of how much outliers are biasing your results. For (a), you’d run the outlier detection on the volume numbers per day, and for (b) you’d run it on the decline fit per well.
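      For instance, here’s a quick sketch of the per-well 3-STD screen for case (a), in base R (a starting point, not a robust implementation):

      # z-score each day's volume against its own well's mean and STD
      z <- ave(data$Oil, data$WellName,
               FUN=function(x) (x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE))
      # keep rows within 3 STD (and rows from wells too small to score)
      data <- data[is.na(z) | abs(z) <= 3, ]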

  1. This may be basic, but when I try to use this code, I get the error that “DaysBin” cannot be found. Should this be a column in my data table…? I have limited programming experience but understand Spotfire alright, and I know this is something my company wants!

    1. DaysBin should be an input to the TERR script that the user can change. It allows the user to dynamically change how much aggregation occurs along the time axis. For instance, setting DaysBin to 60 would aggregate the performance for each 60 day period.

      If you want to see how this is all put together in Spotfire, you can download the template off of Exchange.ai here: https://exchange.ai/downloads/basic-type-curve/

  2. When I try to implement the above posted code (34 line block), I get an error stating “TIBCO Enterprise Runtime for R returned an error: ‘Error in nlsfit.default(start, RHS, response, mEnv, weights = weights : missing value where TRUE/FALSE needed’.”

    Given the weights argument is optional, I can’t figure out why it’s requiring true/false for input.

    1. My guess is it has something to do with the data you are passing in, not the weights input. Maybe Spotfire/TERR did a funky type conversion you weren’t expecting with the data or there are blanks/NAs causing some havoc. The posted code doesn’t do any robust checking of inputs.

      1. First, thank you for making this code available. I have successfully run most of the elements of this script in the R environment; however, when implementing in Spotfire I am able to run everything except the nls function. Upon further research, my understanding is that this function is not available in TERR. See page 14 of the link below, which states the “nls” function is not available in the current TERR version in Spotfire. I am using Spotfire version 6.5. Does one need a later version of Spotfire or a plugin / R package addition to gain access to the nls function? Any help much appreciated.

        https://docs.tibco.com/pub/enterprise-runtime-for-R/2.5.0/doc/pdf/TIB_TERR_2.5.0_Unavailable_Functions.pdf

        1. The code provided runs on Spotfire 7.0. I’m pretty sure TIBCO made a lot of upgrades to TERR between 6.5 and 7.0. For instance, they added the TERR Tools and improved the coverage of the TERR engine. I’m not sure a separate plugin exists since nls is a core feature in R. Can you upgrade to 7?

  3. Thanks for the reply – my data appears to be tidy numeric/real data. Everything executes except the nls curve fit. I’ll keep playing around!

    1. Hey Lee- I tried running the script on Spotfire Cloud Workgroup and see the same issue. It looks like there’s been a change with the nls function between Spotfire 7.0 (which I wrote the blog on) and Spotfire 7.5/Workgroup 2.4. Did you end up figuring out a solution?

  4. R first-timer here… Trying to register the data function but am unable to get it to run. How are your Input Parameters and Output Parameters set up? Also, my column names are slightly different: Prod Date vs ProdDate, Oil Rate (bopd) vs Oil. Do I need to change the script or the way the parameters are mapped? Also, I would prefer the x-axis to be in months instead of days. I already have a “Norm Month” column created. Any suggestions?

    Great article by the way. I am a reservoir engineer and do appreciate the way you described the nuances of DCA vs best-fit averaging.

    1. You can take a look at how we did the input/output matching in Spotfire on the Exchange.ai template. It has it all set up for you. In the wiring up, you just need to make input and output variables that match what’s in the R code instead of what you have in your table. When you execute the function you would then map the R variable names to the column names in your DXP.

      Changing x-axis is relatively straightforward. You would just change DaysOn to MonthsOn in the code and run it. This would also change the a and b since you are adjusting the time domain.
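      If you’d rather derive a months column from DaysOn instead of mapping your existing Norm Month, a rough sketch (assuming an average month of ~30.4 days) would be:

      data$MonthsOn = floor(data$DaysOn / 30.4)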

  5. Troy, I am just learning Spotfire, but I really enjoyed your tutorial. So is your code listed different from the exchange file that must be purchased? If so, how? Is the code listed what the exchange file is running in the background?

    Can either your code or the exchange file be added onto an existing SF database? For example, we already have a SF database with many more variables than what are used above, but the decline curve function seems like a useful functionality. How would it integrate?

    1. Hey Austin – thanks for checking out the post. We provide the full code in the post. We’ve done some optimization for Spotfire/TERR and interactivity in the Exchange template, like being able to manually override values, which doesn’t make sense in the R code here.

      Once you have the Exchange template, or this code, it’s pretty easy to merge it into an existing Spotfire file. If your data tables are the same, you can drag and drop visualizations or pages. You can also save the type curve function as an *.sfd file and import it directly into Spotfire, wire it up to the data you have loaded, and you’re off to the races. Alternatively, you can save the Spotfire Data Function (basically wrapped R) in the library. Then, when someone makes an update to that library data function, you can sync your local version and get the updated functionality.

  6. Hi Troy,

    Thank you for making this available. I adapted the code a bit for my own purposes and am trying to run it in R Studio for now since my company has not yet upgraded to Spotfire 7.0. Everything seems fine until the end of the script when I get to the nls portion. After that, I get the error:

    Error in numericDeriv(form[[3L]], names(ind), env) :
    Missing value or an infinity produced when evaluating the model

    Through a little research, it seems that error is typical when selecting start values that are not reasonable, but I feel I’ve selected reasonable values.

    Any suggestions or guidance would be greatly appreciated.

    1. We actually ran into some of this as well. We decided to change the nls call to optim. This turned out to be more robust than nls when it came to starting conditions. Using optim also makes it easier to bound the search so that, for instance, Qi stays positive and a and b stay between 0 and 1.


      # Min squared error function
      min.RSS <- function(data, par) {
        with(data, sum(((par[1]/((1+par[2]*par[3]*DaysOn)^(1/par[3]))) - Oil)^2))
      }
      idx = gdata$DaysOn <= max.days
      startQi <- max(gdata$Oil[idx])
      # use optim instead of nls
      fit <- optim(par = c(startQi, .2, 1), min.RSS, data = gdata[idx,])
      fpar <- fit$par
      gdata$OilPredict = fpar[1]/((1+fpar[2]*fpar[3]*gdata$DaysOn)^(1/fpar[3]))
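      If you want those bounds actually enforced, optim’s L-BFGS-B method takes box constraints. A sketch (the lower bound on b sits just above zero so the 1/b term stays finite):

      # Bounded variant: method="L-BFGS-B" accepts lower/upper vectors per parameter
      fit <- optim(par = c(startQi, .2, 1), min.RSS, data = gdata[idx,],
                   method = "L-BFGS-B",
                   lower = c(0, 0, 0.001), upper = c(Inf, 1, 1))
      # then rebuild fpar and OilPredict as above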

  7. I hadn’t checked this thread for a couple months but was glad to see it was alive and well! I had success with nls in v7.0 and had built in more and more functionality. Sure enough, when we upgraded to 7.5, nothing worked!

    I utilized your optim suggestion and it seems to be working well, and I was able to bound variables (something I couldn’t make work with nls).

  8. After using this and plotting stats on fit parameters, I realized that results were HIGHLY dependent on initial guesses. I later outputted the fit$convergence and fit$message attributes and realized many decline fits were ending prematurely/not converging. I am digging into more options and will report back when I have more info.

    1. Lee – you are right on. Optimization often gets stuck in “local minima or maxima” in the high dimensional search space. Basically, by providing too broad of a search space, the optimization routine won’t find the right answer. I imagine it’s like blind-folding someone in a room and asking them to find the doorknob. They do a lot better when you put them right next to the door.

      One basic technique you can apply is to fit a simpler model first (à la exponential), then add more degrees of freedom once you’ve fixed the major ones – like the limiting exponential decline. The benefit here is you can also use this simpler model to do outlier detection. The tradeoff is you could miss the best answer by providing a misguided guess.

      Another thing you could do is decrease your step size on the optimizer or change the parameter window. For instance, if your decline factor ranges from .001 to .009, change it so that it scales from 1 to 1000.

      Hope that helps. We are working on a more robust algorithm for this, and that’s the approach we take.

  9. Question on storing the curves – how do you do this? Is this via R or Python? Can you store multiple type curves to compare on a graph?

  10. I ended up using the spg optimizer in the BB package. Advantages are:

    – you can put box constraints in, aka an upper/lower bound
    – function is not as sensitive to initial guesses
    – you can output iteration information, which can be useful for diagnostics (number of iterations, convergence success, etc)
    – you can rather easily modify the convergence criteria in order to tune your run-time performance. Depending on your function and how reasonable you can make your initial guesses, you can likely relax the criteria significantly from default, making it run much faster.
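    For reference, my call looked roughly like this (a sketch from memory, reusing the min.RSS function posted above – double-check the arguments against the BB docs):

    library(BB)
    fit <- spg(par = c(startQi, .2, 1), fn = min.RSS,
               lower = c(0, 0, 0.001), upper = c(Inf, 1, 1),
               data = gdata[idx,])
    fit$convergence   # 0 means success; fit$message has the details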

    Good luck all!

  11. I used both the optim function and the spg function in the BB package. I get the error: ‘Error in nlminb(c(q[1], -((q[2] – q[1])/q[1])/(t[2] – t[1]), 1.5, 0.1 : failed with unrecognized error code (63)’.
    Did anybody experience the same and rectify it? Google search turns up nothing.
