# Excel to Spotfire: Targeting Missing Values

In data science, we almost always face missing values before any analytics can begin. Deciding when to delete cases or fill in missing values depends entirely on the data set and the target problem. A common practice consists of deleting variables and samples with more than 30% missing values and then using multiple imputation techniques to fill the remaining gaps. In Excel, we can use the replace tool or filters to approach this problem, or even code a more customized solution in Visual Basic. In Spotfire, we have the advantage of more sophisticated methods by accessing R libraries that provide MCMC, Bayesian, and multivariate algorithms. Spotfire's integrative tools really make a difference in how we approach the missing-values problem: they bring together advanced algorithms, rich visualizations, and user interactivity.
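As a plain-Python sketch of that general practice (toy data with invented column names; in Spotfire this step would hand off to TERR/R imputation libraries rather than a simple mean fill):

```python
import statistics

# Toy dataset: each row is a sample, None marks a missing value.
data = [
    {"poro": 0.12, "perm": 5.0,  "depth": None},
    {"poro": 0.18, "perm": None, "depth": None},
    {"poro": None, "perm": 7.5,  "depth": None},
    {"poro": 0.15, "perm": 6.1,  "depth": 9000.0},
]

def missing_fraction(rows, col):
    """Fraction of rows where this variable is missing."""
    return sum(r[col] is None for r in rows) / len(rows)

# Step 1: drop variables with more than 30% missing values.
kept = [c for c in data[0] if missing_fraction(data, c) <= 0.30]

# Step 2: fill what remains -- mean imputation here as a stand-in
# for the multiple-imputation techniques described above.
means = {c: statistics.mean(r[c] for r in data if r[c] is not None)
         for c in kept}
clean = [{c: (r[c] if r[c] is not None else means[c]) for c in kept}
         for r in data]
```

The same >30% rule applies to samples (rows); it is omitted here only to keep the sketch short.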

# Build Type Wells using Selected Wells Method in DCA Wrangler

Reserves evaluators often want to build a Percentile Type Well that represents a certain percentile of the population, such as a "P90 Type Well", "P50 Type Well", or "P10 Type Well". Expressed this way, the evaluator is inherently seeking a type well that results in a percentile EUR. The P90 Type Well is a representative well with a 90% chance that the EUR will be that number or greater. There are two published methods for creating Percentile Type Wells: the Time Slice approach and the Selected Wells approach.

So Percentile Type Wells are expected to provide a forecast with an EUR consistent with the target probability. This is not possible with the Time Slice method because it is based on Initial Productivity (IP) and rates. In other words, the Time Slice method makes an implicit assumption of a strong correlation between IP and EUR, whereas the real-world correlation between IP and EUR shows wide scatter, resulting in a Type Well whose EUR does not represent the desired percentile. Refer to SPE-162630 for a more technical discussion of the two methods.
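A quick note on the percentile convention: because a P90 Type Well should have a 90% chance of its EUR or greater, it corresponds to the 10th percentile of the EUR distribution (and a P10 Type Well to the 90th). A sketch with invented EUR values:

```python
def percentile(values, p):
    """Linear-interpolated percentile of a list (0 <= p <= 100)."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

# EURs (MBO) for wells in an AOI -- illustrative numbers only.
eurs = [120, 150, 180, 200, 240, 260, 300, 350, 420, 500]

# "P90" = 90% chance of that EUR or greater = 10th percentile.
p90 = percentile(eurs, 10)
p50 = percentile(eurs, 50)
p10 = percentile(eurs, 90)
```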

In this blog post we will walk through a workflow for creating Type Wells using the Selected Wells method in DCA Wrangler. We created a template that builds Type Wells three ways: with the Selected Wells Method, with the Time Slice Method, and with individual well forecasts fed into the Selected Wells Method.

Following is the workflow for Selected Wells Method:

1. Select wells in an Area of Interest (AOI)

2. Create an Auto-Forecast for all the selected wells over the desired number of years using DCA Wrangler. For the Auto-Forecast we will use a three-segment approach: a first segment with the b-factor constrained between 1 and 2 (this captures the characteristically steep initial decline of most MFHWs in unconventionals), a second segment with the b-factor constrained between 0 and 1, and a third segment for terminal exponential decline.

3. Generate Well DCA and Well DCA Time results in DCA Wrangler. The Well DCA Time table will have the forecast data for all the wells created using the fitted Arps Model. Remember to refresh these tables every time you change the wells in your AOI.

4. Next, we will find wells for Target EUR probabilities on an EUR Probit plot generated using all the wells in our AOI. We can enter a threshold value (α) to find wells which have their EUR within the (1 ± α) × EUR at the target probabilities. We can also quickly check the number of wells present within the threshold at each of the target probabilities. Adjust the threshold to get a minimum desired number of wells at each of the target probabilities.

5. Now we can create Percentile Type Wells for our AOI by running DCA Wrangler in the Type Well mode using the wells we selected in our previous step.
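The selection logic in step 4 can be sketched as follows (Python, with invented well names and EUR values; the template does this interactively on the Probit plot in Spotfire):

```python
def wells_near_target(well_eurs, target_eur, alpha):
    """Wells whose EUR lies within (1 - alpha) * target and (1 + alpha) * target."""
    lo, hi = (1 - alpha) * target_eur, (1 + alpha) * target_eur
    return [w for w, eur in well_eurs.items() if lo <= eur <= hi]

# EURs read off the Probit plot at a target probability (illustrative).
well_eurs = {"W1": 95, "W2": 105, "W3": 140, "W4": 210, "W5": 100}

# Threshold alpha = 0.10 keeps wells within +/- 10% of the target EUR.
selected = wells_near_target(well_eurs, target_eur=100, alpha=0.10)
```

If too few wells land inside the band at a given probability, widen `alpha`, exactly as step 4 describes.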

Check out the template and try it with your production data.

Nitin is a Data Scientist at Ruths.ai working passionately towards helping companies realize the maximum potential of their data. He has experience with machine learning problems in clustering, classification, and regression, applying ensemble and Bayesian approaches with toolsets from R, Python, and Spotfire. He is currently pursuing his PhD in Petroleum Engineering at Texas A&M University, where his research is focused on applications of machine learning algorithms in petroleum engineering workflows. He enjoys cycling, running, and overindulging in statistical blogs in his free time.

# The Art of Data Simulation

Most problems in the scientific world are about understanding different phenomena. We want to learn the characteristics and patterns of the systems we study to be able to preview and predict behavior. As humans, we learn by observing these processes when they happen naturally or with controlled experiments. This might not be an option if we are studying a rare or dangerous event.
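This is where simulation helps: we can generate the process instead of observing it. A minimal Monte Carlo sketch, estimating the probability of a rare outcome we would rather not (or cannot) wait to observe:

```python
import random

random.seed(42)  # reproducible runs

# Rare event stand-in: the sum of 10 dice exceeding 45.
trials = 100_000
hits = sum(
    sum(random.randint(1, 6) for _ in range(10)) > 45
    for _ in range(trials)
)
estimate = hits / trials  # roughly 2-3% of trials
```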

# Spatial Objects Using TERR

A key part of analytics in the oil and gas industry is evaluating opportunities at different locations; space is always present when looking for profitable development projects. We usually look at wells already in production and try to find spatial trends. To stay competitive, we need better ways to access the data for different areas and their wells. For instance, we can transform the spatial information into compact objects that store the location and shape of each well and lease. These objects can be fed into different calculations and analyses as geometries. Spotfire adds another advantage: the objects can drive feature layers on the map chart, letting us visualize leases as polygons and wells as lines.
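To make the "compact objects" idea concrete, here is a plain-Python sketch that encodes a well path and a lease outline as WKT strings, a common text format for geometries (the coordinates are invented; the actual work described here is done in TERR/R):

```python
# A horizontal well path as an ordered list of (lon, lat) points,
# and a lease outline as a closed ring of corner points.
well_path = [(-98.50, 31.20), (-98.48, 31.20), (-98.46, 31.21)]
lease_corners = [(-98.52, 31.18), (-98.44, 31.18), (-98.44, 31.23),
                 (-98.52, 31.23), (-98.52, 31.18)]  # first == last

def linestring_wkt(pts):
    """Encode a point sequence as a WKT LINESTRING (a well lateral)."""
    return "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in pts) + ")"

def polygon_wkt(ring):
    """Encode a closed ring as a WKT POLYGON (a lease outline)."""
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"

well_geom = linestring_wkt(well_path)
lease_geom = polygon_wkt(lease_corners)
```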

# NFL: Predicting 2018 Win Totals with Data Science

With the Super Bowl just behind us, it's time to predict wins for the 2018 NFL season. At the start of the playoffs, we looked at a model that predicted how many games NFL teams should have won in 2017 and compared our results to Football Outsiders' Pythagorean Win Expectancy. We were able to improve on Pythagorean Win Expectancy for last year's results (how many games a team should have won), but our backwards-looking models were unable to beat it in predicting next year's wins. Today, we will build models aimed specifically at predicting how many games teams will win next year.

If you simply want to know how many games your team will win in 2018, strictly for recreational purposes of course, you can skim to the end or check out our Spotfire Template.  But, for Football Outsiders fans, those interested in what makes up wins and losses, or those interested in the Data Science process, read on.

Jason is a Junior Data Scientist at Ruths.ai with a Master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Hidden Markov Models. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.

# How many games should your NFL team have won this season?

How many games should your NFL team have won this season? Everyone knows a lucky bounce here and a bad call there can have a significant impact on the win-loss bottom line. Hard-core fans of sports analytics will recognize this factor as the driver behind Pythagorean Win Totals, a statistic derived to measure true performance. Today, we are going to see whether we can beat Pythagorean Win Totals as a predictor of how many games a team won in a given season, i.e., how many games your team should have won.
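Pythagorean Win Totals are computed from points scored and points allowed. A sketch (the 2.37 exponent is the value popularized for the NFL, an assumption here rather than something taken from this post):

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins from points scored/allowed (NFL-style exponent)."""
    win_pct = points_for**exponent / (
        points_for**exponent + points_against**exponent
    )
    return games * win_pct

# A team that outscores its opponents 450 to 350 over a 16-game
# season "should" win a bit more than 10 games by this measure.
expected = pythagorean_wins(450, 350)
```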

Spoiler:  we can make a better predictor, but in a way that makes us re-evaluate our understanding of Pythagorean Win Totals.

If you simply want to know how many games your team should have won, you can go straight to our Spotfire Template.  But, for Football Outsiders fans or those more interested in what makes up wins and losses, read on.

Jason is a Junior Data Scientist at Ruths.ai with a Master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Hidden Markov Models. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.

# Wrangling Data Science in Oil & Gas: Merging MongoDB and Spotfire

Data science in oil and gas is center stage as operators work in the new "lower for longer" price environment. Want to see what happens when you solve data science questions with the hottest new database and the powerful analytics of Spotfire? Read on to learn about our latest analytics module, the DCA Wrangler. If you want to see it in action, scroll down to watch the video.

## Layering Data Science on General Purpose Data & Analytics

Ruths.ai is a startup focused on energy analytics and technical data science. We are both TIBCO and MongoDB partners, heavily leveraging these two platforms to solve real-world problems revolving around the application of data science at scale and within the enterprise environment. I started our plucky outfit a little under four years ago. We've done a lot of neat things with Spotfire, including analyzing seismic and well log data. Here, we'll look at competitor/production data.

MongoDB provides a powerful and scalable general-purpose database system. TIBCO provides tested and forward-thinking general-purpose analytics platforms for both streaming and data at rest. They also provide great infrastructure products, which aren't the focus of this blog.

Ruths.ai provides the domain knowledge and we infuse our proprietary algorithms and data structures for solving common analytics problems into products that leverage the TIBCO and MongoDB platforms.

We believe that these two platforms can be combined to solve innumerable problems in the technical industries represented by our readers. TIBCO provides the analytics and visualization while MongoDB provides the database. This is a powerful marriage for problems involving analytics, single view, or IoT.

In this blog, I want to dig into a specific and fundamental problem within oil and gas and how we leveraged TIBCO Spotfire and MongoDB to solve it — namely Autocasting.

## What is Autocasting?

Oil reserves denote the amount of crude oil that can be technically recovered at a cost that is financially feasible at the present price of oil. Crude oil resides deep underground and must be extracted using wells and completion techniques. Horizontal wells can stretch two miles within a vertical window the height of most office floors.

For those with E&P experience, I’m going to elide some important details, like using “oil” for “hydrocarbons” and other technical nomenclature.

Because the geology of the subsurface cannot be examined directly, indirect techniques must be used to estimate the size and recoverability of the resource. One important indirect technique is called decline curve analysis (DCA), which is a mathematical model that we fit to historical production data to forecast reserves. DCA is so prevalent in oil and gas that we use it for auditing, booking, competitor analysis, workover screening, company growth and many other important tasks. With the rise of analytics, it has therefore become a central piece in any multi-variate workflow looking to find the key drivers for well and resource performance.
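For reference, the Arps model at the heart of DCA can be sketched as follows (a minimal illustration with invented parameters, not the DCA Wrangler implementation):

```python
import math

def arps_rate(qi, di, b, t):
    """Arps production rate at time t (years).

    qi: initial rate, di: initial decline, b: hyperbolic exponent.
    b == 0 gives exponential decline; b > 0 gives hyperbolic.
    """
    if b == 0:
        return qi * math.exp(-di * t)
    return qi / (1 + b * di * t) ** (1 / b)

def cumulative(qi, di, b, years, steps_per_year=12):
    """Approximate cumulative volume by trapezoidal integration of the rate."""
    dt = 1 / steps_per_year
    n = years * steps_per_year
    rates = [arps_rate(qi, di, b, i * dt) for i in range(n + 1)]
    return sum((rates[i] + rates[i + 1]) / 2 * dt for i in range(n))
```

Fitting `qi`, `di`, and `b` to historical production for one well is decline curve analysis; doing it automatically for thousands of wells is the autocasting discussed next.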

At the heart of any resource assessment model is a robust “autocasting” method. Autocasting is the automatic application of DCA to large ensembles of wells, rather than one at a time.
But there’s a problem. Incumbent technologies make the retrieval of decline curves and their parameters very difficult. Decline curve models are complex mathematical forecasts with many components and variation. Retrieving models from a SQL database often requires parsing text expressions. And interacting with many tables within a database.

Further, with the rise of unconventionals, the fundamental workflow of resource assessment through decline curves is being challenged. Spotfire has become a popular tool for revamping and making next generation decline curve analysis solutions.

## Autocasting in Action

What I am going to demonstrate is a new autocast workflow that would not be possible without the combined performance and capability of MongoDB and Spotfire. I’ll be demonstrating using our DCA Wrangler product – which is one of over 250 analytics workflows that we provide through a comprehensive subscription.

It's important to note that software already exists to decline wells and store their results in a database. People have even declined wells in Spotfire before. What I hope you see in our new product is the step change in performance, ease of use, and enablement when you use MongoDB as the backend.

## What’s Next?

First, we have a home run solution for decline curves that requires a MongoDB backend. In the near future, more vendor companies will be leveraging Mongo as their backend database.

Second, I hope you see the value in MongoDB for storing and retrieving technical data and analytic results, especially within powerful tools like Spotfire. Plus, how easy it is to set up and use.

And lastly, I hope you get excited about the other problems that can be solved by marrying TIBCO with MongoDB – imagine using Streambase as your IoT processor and MongoDB as your deposition environment, or storing models and sensor data in Mongo and using Spotfire to tweak model parameters and co-visualize data.

If you’re interested in learning more about our subscription, get registered today.

Let’s make data great again.

You’ll conquer the present suspiciously fast if you smell of the future….and stink of the past.

# TERR — Converting strings to date and time

This post explains my struggle to convert strings to Date or Time with TERR.  I recently spent so much time on this that I thought it deserved a blog post.  Here’s the story…

I was recently working on a TERR data function that calls a publicly available API and brings all the data into a table. I used the as.data.frame function to parse out my row data. In that function, I used the stringsAsFactors = FALSE argument, and as a result (the desired result), all of my data came back as strings. This was fine because the API included column metadata with the data type. As you can see in the script below, I planned on "sapplying" through the metadata with as.POSIXct and as.numeric. This worked just fine in RStudio, and in Spotfire it worked for the numeric and DateTime columns. However, it did not work for Date and Time columns. I tried different syntax, functions (as.Date didn't work either), packages, etc., to get it to work, and NOTHING! The struggle was very real.

Finally, I Googled the right terms and came across a TIBCO knowledge base article with this information:

> Spotfire data functions recognize TERR objects of class "POSIXct" as date/time information. As designed, the Spotfire/TERR data function interface for date/time information does the following:
>
> – Converts a Spotfire value or column whose DataType is "Date", "Time" or "DateTime" into a TERR object of class "POSIXct".
>
> – Converts a TERR object of class "POSIXct" into a Spotfire value or column with a DataType of "DateTime", which can then be formatted in Spotfire to display only the date (or to display only the time) if needed.
>
> This interface does not use any other TERR object classes (such as the "Date" class in TERR) to transfer date/time information between Spotfire and TERR.

That told me that all my effort was for naught, and it just wasn’t possible.  I contacted TIBCO just to make sure there wasn’t some other solution out there that the article was not addressing.  In the end, I just used a transformation on the Date and Time columns to change the data type.  I hope that you, dear Reader, find this post before you spend hours on the same small problem.  I did put in an enhancement request.  Fingers crossed.  Please let me know if you have a better method!

Guest Spotfire blogger residing in Whitefish, MT.  Working for SM Energy’s Advanced Analytics and Emerging Technology team!

# Data Science Toolkit Improvements

This week, I was able to test out the latest and greatest changes to the Ruths.ai Data Science Toolkit.  New options and features allow users to easily split test and training data sets prior to model building, as all good data scientists should!  This new functionality speeds up your analysis by making model build and evaluation faster and more efficient.  I worked up this video to demonstrate.
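The train/test split itself is a simple idea; a minimal sketch (illustrative only, not the Toolkit's actual implementation):

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=7):
    """Shuffle rows and split them into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible split
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Hold out 30% of 100 samples for evaluation; fit only on the rest.
train, test = train_test_split(range(100), test_fraction=0.3)
```

Evaluating on the held-out `test` set, never the data the model was fit on, is what keeps the performance estimate honest.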

## Data Science Toolkit for Spotfire

The Data Science Toolkit brings the power of advanced data science to Spotfire.  Ruths.ai designed it with simplicity and efficiency in mind to support a wide range of analytics applications. This extension is coupled with comprehensive training that provides both beginner and experienced users a strong foothold in data science analysis.  The Data Science Toolkit is available to Premium subscribers.  Once deployed on your Spotfire server, quickly and easily access the toolkit via the Tools menu as shown below.  Find out more, including videos, at this link.