Author: Troy Ruths

You'll conquer the present suspiciously fast if you smell of the future....and stink of the past.

Wrangling Data Science in Oil & Gas: Merging MongoDB and Spotfire

Data science in oil and gas has taken center stage as operators work in the new “lower for longer” price environment. Want to see what happens when you solve data science questions with the hottest new database and the powerful analytics of Spotfire? Read on to learn about our latest analytics module, the DCA Wrangler. If you want to see it in action, scroll down to watch the video.

Layering Data Science on General Purpose Data & Analytics

We are a startup focused on energy analytics and technical data science. As both TIBCO and MongoDB partners, we heavily leverage these two platforms to solve real-world problems revolving around the application of data science at scale and within the enterprise environment. I started our plucky outfit a little under four years ago. We’ve done a lot of neat things with Spotfire, including analyzing seismic and well log data. Here, we’ll look at competitor/production data.

The Document model allows for flexible and powerful encoding of decline curve models.

MongoDB provides a powerful and scalable general purpose database system. TIBCO provides tested and forward-thinking general purpose analytics platforms for both streaming data and data at rest. (TIBCO also provides great infrastructure products, but those aren’t the focus of this blog.) We provide the domain knowledge, infusing our proprietary algorithms and data structures for solving common analytics problems into products that leverage the TIBCO and MongoDB platforms.

We believe that these two platforms can be combined to solve innumerable problems in the technical industries represented by our readers. TIBCO provides the analytics and visualization while MongoDB provides the database. This is a powerful marriage for problems involving analytics, single view, or IoT.

In this blog, I want to dig into a specific and fundamental problem within oil and gas, autocasting, and how we leveraged TIBCO Spotfire and MongoDB to solve it.

What is Autocasting?

Oil reserves denote the amount of crude oil that can be technically recovered at a cost that is financially feasible at the present price of oil. Crude oil resides deep underground and must be extracted using wells and completion techniques. Horizontal wells can stretch two miles within a vertical window the height of most office floors.

For those with E&P experience, I’m going to elide some important details, like using “oil” for “hydrocarbons” and other technical nomenclature.

Because the geology of the subsurface cannot be examined directly, indirect techniques must be used to estimate the size and recoverability of the resource. One important indirect technique is called decline curve analysis (DCA), which is a mathematical model that we fit to historical production data to forecast reserves. DCA is so prevalent in oil and gas that we use it for auditing, booking, competitor analysis, workover screening, company growth and many other important tasks. With the rise of analytics, it has therefore become a central piece in any multi-variate workflow looking to find the key drivers for well and resource performance.
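To make the math behind DCA concrete, here is a minimal sketch of the idea in Python: fitting an Arps hyperbolic decline model, q(t) = qi / (1 + b·Di·t)^(1/b), to a production history with SciPy, then forecasting forward. The data, parameter values, and function names are illustrative, not the DCA Wrangler’s actual implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: rate at time t (months).
    qi: initial rate, di: initial decline, b: hyperbolic exponent."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

# Illustrative 24-month production history with light multiplicative noise
t = np.arange(24.0)
rng = np.random.default_rng(0)
q = arps_hyperbolic(t, qi=1000.0, di=0.10, b=0.9) * rng.normal(1.0, 0.03, 24)

# Best-fit the model to the history (the "decline"), then forecast months 24-119
(qi_fit, di_fit, b_fit), _ = curve_fit(
    arps_hyperbolic, t, q,
    p0=[q[0], 0.05, 1.0],
    bounds=([0.0, 0.0, 0.01], [np.inf, 5.0, 2.0]),
)
forecast = arps_hyperbolic(np.arange(24.0, 120.0), qi_fit, di_fit, b_fit)
```

Autocasting is simply this fit run in a loop over thousands of wells, which is where backend performance starts to matter.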

The DCA Wrangler provides fast autocasting and storage of decline curves. Actual data (solid) is modeled using best-fit optimization on mathematical models (dashed line forecast).

At the heart of any resource assessment model is a robust “autocasting” method. Autocasting is the automatic application of DCA to large ensembles of wells, rather than one at a time.
But there’s a problem: incumbent technologies make the retrieval of decline curves and their parameters very difficult. Decline curve models are complex mathematical forecasts with many components and variations. Retrieving models from a SQL database often requires parsing text expressions and interacting with many tables within the database.
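A document database sidesteps that: the whole model, parameters, model type, and segment structure, can live in one self-contained record that a query returns ready to evaluate. Here is a minimal sketch of the idea; the field names and values are illustrative, not the DCA Wrangler’s actual schema. (With pymongo, the document below would be written with `insert_one` and fetched with `find_one`, no text parsing and no multi-table joins.)

```python
import math

# One self-contained decline-curve document per well (illustrative shape)
doc = {
    "well_id": "42-123-45678",
    "model": "arps_hyperbolic",
    "params": {"qi": 950.0, "di": 0.08, "b": 1.1},
}

def evaluate(doc, t):
    """Turn a stored decline document straight into a rate at time t (months)."""
    p = doc["params"]
    if doc["model"] == "arps_hyperbolic":
        return p["qi"] / (1.0 + p["b"] * p["di"] * t) ** (1.0 / p["b"])
    if doc["model"] == "exponential":
        return p["qi"] * math.exp(-p["di"] * t)
    raise ValueError(f"unknown model: {doc['model']}")

rate_now = evaluate(doc, 0.0)
rate_in_a_year = evaluate(doc, 12.0)
```

Because the model travels as one document, adding a new decline form is a new dispatch branch, not a schema migration.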

Further, with the rise of unconventionals, the fundamental workflow of resource assessment through decline curves is being challenged. Spotfire has become a popular tool for revamping and making next generation decline curve analysis solutions.

Autocasting in Action

What I am going to demonstrate is a new autocast workflow that would not be possible without the combined performance and capability of MongoDB and Spotfire. I’ll be demonstrating using our DCA Wrangler product – which is one of over 250 analytics workflows that we provide through a comprehensive subscription.

It’s important to note that software already exists to decline wells and store their results in a database. People have even declined wells in Spotfire before. What I hope you see in our new product is the step change in performance, ease of use, and enablement when you use MongoDB as the backend.

What’s Next?

First, we have a home run solution for decline curves that requires a MongoDB backend. In the near future, more vendor companies will be leveraging Mongo as their backend database.

Second, I hope you see the value of MongoDB for storing and retrieving technical data and analytic results, especially within powerful tools like Spotfire, and how easy it is to set up and use.

And lastly, I hope you get excited about the other problems that can be solved by marrying TIBCO with MongoDB. Imagine using StreamBase as your IoT processor and MongoDB as your deposition environment, or storing models and sensor data within Mongo and using Spotfire to tweak model parameters and co-visualize data.

If you’re interested in learning more about our subscription, get registered today.

Let’s make data great again.

5 Simple Prep Steps for Multivariate Analyses

Trust me, I get excited about a new data set just like anybody else. I want to tear straight to the good stuff, find those hidden correlations, and use all my fancy tests and methods. But before you get to that point, it’s important to run some essential preparation on your data – each a critical “Prep Step” that can save you time, rework, and wrong conclusions down the line.

Read More

Making a QQ Plot in Spotfire with TERR

A QQ Plot is a standard visualization that compares the distribution of the data under study to the normal distribution. Since many statistical tests assume normality, the QQ Plot is an important diagnostic visualization during any uni-variate or multi-variate analysis. We had a previous post that made a QQ Plot using custom expressions; in this post we will show how to do it in TERR.

Read More
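The post itself builds the plot in TERR; as a language-neutral sketch of what a QQ Plot actually computes, here is the same idea in Python with illustrative data: sorted sample values plotted against theoretical normal quantiles, where normal data falls on a line whose slope and intercept approximate the standard deviation and mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=5.0, scale=2.0, size=500)   # illustrative data

# QQ coordinates: sorted sample vs. theoretical normal quantiles
n = sample.size
sample_q = np.sort(sample)
probs = (np.arange(1, n + 1) - 0.5) / n             # plotting positions
theory_q = stats.norm.ppf(probs)

# For normal data the points are linear: slope ~ sd, intercept ~ mean
slope, intercept = np.polyfit(theory_q, sample_q, 1)
```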

PCA in Spotfire TERR

PCA (Principal Component Analysis) is a core data science technique, not only for understanding collinearity of independent variables in a dataset, but also for providing a reduced-dimensional model by rotating your high-D data into lower dimensions. Here’s some quick info on getting PCA in Spotfire. If you want more info on PCA, check out Wikipedia or a great interactive example on Setosa.

Read More
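The post shows PCA inside Spotfire with TERR; as a quick language-neutral sketch of the underlying computation (on made-up data with one nearly collinear column), here is PCA via SVD of the mean-centered matrix in NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative data: 200 samples, 3 variables, third nearly collinear with first
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.1 * rng.normal(size=200)])

# PCA via SVD of the mean-centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)          # variance ratio per component
scores = Xc @ Vt[:2].T                   # rotate into the top-2 subspace
```

Because the third column is almost a copy of the first, nearly all of the variance lands in the first two components, which is exactly the collinearity signal PCA exposes.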

Adding Collaboration to Spotfire using SharePoint

Spotfire is a powerful BI tool but lacks some fundamental collaborative functions. As a core principle of BI, tools are meant to report on and not change systems of record. Of course, anyone who has built BI solutions knows that in the real world you want to add the ability for users to capture their findings or edit/adjust key values without adjusting or talking to source vendor systems. Enter SharePoint! Here’s a quick guide for creating a collaborative analytics dashboard in Spotfire using SharePoint.

Read More

Data Science Design Pattern: Train & Predict

Spotfire is a great tool that lets you run asynchronous R code right next to your data and visualizations. This makes for what I like to call the Data Science Trifecta. There are lots of applications out there that provide the Data Science Trifecta – data, visualizations, and computation – and I prefer Spotfire’s relational data model, snappy visualizations, and embedded R engine. So let’s talk about reusing predictive models in this Trifecta. If you’re eager to try it out, you can grab the template off of

Read More
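The train & predict pattern the post describes, fit once, persist the model, score new data later, can be sketched in a few lines of Python (the template itself uses R/TERR; this is an illustrative stand-in with made-up data):

```python
import json
import numpy as np

# Train step: fit a simple model and persist its parameters
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, 100)
coef = np.polyfit(x, y, 1)
saved = json.dumps({"model": "poly1", "coef": coef.tolist()})

# Predict step (later, possibly another session): reload and score new data
loaded = json.loads(saved)
y_new = np.polyval(loaded["coef"], np.array([2.0, 4.0]))
```

The key design point is that the fitted parameters, not the training data, are what gets stored and shipped between the train and predict steps.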

Arps Type Curve in Spotfire in 20 minutes

Type curves are an important part of resource assessment for an oil and gas asset. In this workflow, well declines are aggregated to determine the typical behavior of a well ensemble. These well ensembles usually reflect a reservoir or set of analog reservoirs that help determine characteristic behavior. In this post, we will build a decline model for the group of wells, called a type curve. The type curve captures the production rate forecast for a single “average” well and so can be used to determine Estimated Ultimate Recovery (EUR). Best of all, we’ll do it in 20 minutes. A little longer than GEICO, but you’ll save so much more money.

Read More

Creating well paths from control points

Well paths encode the trajectory and curvature of a wellbore – oftentimes tailored to avoid drilling hazards, improve productivity and reservoir contact, and reduce costs of drilling. There are consequently many factors that go into creating an ideal or optimized well path that reflect play characteristics, technology improvements, regional drilling environments and historical drilling performance. While there are many tools that provide functionality for designing wellbores, none have the ability to adapt to the drilling landscape and capture the historical performance statistics for a vendor or company. Enter Spotfire and its ad-hoc data and compute model.

Read More
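As a rough sketch of the geometry involved (not the minimum-curvature method used in directional surveying, and with made-up control points), a smooth well path can be interpolated through 3-D control points parameterized by measured depth:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative control points: measured depth (ft) -> (x, y, z) position
md = np.array([0.0, 500.0, 1000.0, 1500.0, 2500.0])
xyz = np.array([
    [0.0,    0.0,     0.0],
    [10.0,   5.0,  -495.0],
    [80.0,  40.0,  -980.0],
    [300.0, 150.0, -1350.0],
    [1200.0, 600.0, -1500.0],   # lateral section
])

# Fit a smooth path through the control points, then sample it densely
path = CubicSpline(md, xyz, axis=0)
md_fine = np.linspace(0.0, 2500.0, 251)
stations = path(md_fine)        # (251, 3) interpolated survey stations
```

The spline passes exactly through every control point, so hazard-avoidance or productivity constraints expressed as control points are honored while the in-between trajectory stays smooth.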