
Demystifying Completions Data: Collecting and Organizing Data for Analytics (Part 2)

As promised, let’s now walk through a specific example to illustrate an approach to analytics that we’ve seen be very effective.

I’m going to focus more on the methodology and the tools used than on the actual analysis. The development of stacked pay is critical to the Permian as well as other plays. Understanding containment and vertical frac propagation is key to developing these resources economically. We might want to ask whether a given pumping design (pump rate, intensity, landing) will stay in the target interval or break into other, less desirable rock. There are some fundamental tradeoffs that we might want to explore. For example, we may break out of zone if we pump above a given rate. If we lower the pump rate and increase the duration of the job, we need some confidence that the additional spend on day rates will be repaid by better returns.
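To make the rate-versus-duration tradeoff concrete, here is a minimal sketch of the cost side of that question. It assumes a constant pump rate, the same fluid volume per stage at both rates, and a flat daily spread cost, and it ignores non-pumping time between stages. The function names and every number in it are hypothetical, chosen only for illustration.

```python
def pump_hours(stage_volume_bbl: float, rate_bpm: float) -> float:
    """Hours needed to pump one stage at a constant rate (bbl/min)."""
    if rate_bpm <= 0:
        raise ValueError("rate_bpm must be positive")
    return stage_volume_bbl / rate_bpm / 60.0


def added_day_rate_cost(stage_volume_bbl: float, base_rate_bpm: float,
                        reduced_rate_bpm: float, n_stages: int,
                        day_rate_usd: float) -> float:
    """Extra day-rate cost of pumping the same volume at a lower rate."""
    extra_hours_per_stage = (pump_hours(stage_volume_bbl, reduced_rate_bpm)
                             - pump_hours(stage_volume_bbl, base_rate_bpm))
    return n_stages * extra_hours_per_stage / 24.0 * day_rate_usd


# Hypothetical job: 40 stages of 9,000 bbl each, slowing from 90 to 75 bbl/min,
# with an assumed spread cost of $120,000 per day.
extra_cost = added_day_rate_cost(
    stage_volume_bbl=9000, base_rate_bpm=90, reduced_rate_bpm=75,
    n_stages=40, day_rate_usd=120_000,
)
print(f"Added day-rate cost: ${extra_cost:,.0f}")
```

The containment analysis then has to show that staying in zone is worth at least this much in improved returns.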

We can first build simulations for the frac and look at the effects of different completions designs. We can look at offset wells and historical data, though that could be challenging to piece together. We may ultimately want to validate the simulation and test different frac designs. We could do this by changing the pumping schedule at different stages along the lateral of multiple wells.

Data collection

With this specific question in mind, we need to determine what data to collect. The directional survey, the formation tops (from reference well logs), and the frac van data will all be needed. However, we will also want microseismic data to see where the frac goes. Since we want to understand why the frac is or is not contained, we will also need the stress profile across the intervals of interest. This could be derived from logs but is ideally measured from DFITs. We may also want to collect other data types that we think could serve as proxies for the stress profile, like bulk seismic or interpreted geologic maps.

These data types will be collected by different vendors, at different times, and delivered to the operator in a variety of formats. We have bulk data, time series data, data processed by vendors, and data interpreted by engineers and geologists. Meaningful conclusions cannot be derived from any one data type; only by integrating them can we start to see a mosaic.


Integrating the data means overcoming a series of challenges. We first need to decide where this data will live. Outlook does not make a good or sustainable data repository. Putting it all on a shared drive is not ideal because the files are difficult to relate to one another. We could stand up a SQL database or bring all the data into an application and let it live there, but both have drawbacks. Our approach uses a NoSQL back end. This provides a highly scalable and performant environment for the variety of data we will need. Also, because the data is not trapped in an application (in some proprietary format), it can easily be reused to answer other questions or by other people in the future.

Getting the data co-located is a start, but there’s more work to be done before we can run analytics. Throwing everything into a data lake doesn’t get us to an answer, and it’s why we now have the term “data swamp”. A critical step is relating the data sets to each other. Our approach takes this raw data and transforms it using a standard, open data model and a robust well alias system, all built from the ground up for O&G. For example, different pressure pumping vendors will have different names for common variables (maybe even different well names) that we need to reconcile. We use a well-centric data model that currently supports over 60 data types and exposes the data through an open API. It also accounts for things like coordinate reference systems, time zones, and units. These are critical corrections to make since we want to be able to reuse as much of our work as possible in future analysis. Contrast this with the “one dataset, one use case” approach, where you essentially rebuild the data source for every question you want to ask. We’ve seen the pitfalls of that approach: you quickly run into sustainability challenges around supporting these separate instances. At this point we have an analytics staging ground that we can actually use.
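As a rough illustration of what reconciling vendor data involves, the sketch below maps two vendors’ well names and column headers onto one canonical record, converting units and time zones along the way. This is not the data model or alias system described above. The well names, column headers, canonical field names, and fixed UTC offsets are all hypothetical, and a production system would use proper time zone rules rather than fixed offsets.

```python
from datetime import datetime, timedelta, timezone

KPA_PER_PSI = 6.894757

# Hypothetical alias table: lowercased vendor spelling -> canonical well name.
WELL_ALIASES = {"smith 1h": "SMITH-1H", "smith #1h": "SMITH-1H"}

# Hypothetical vendor column -> (canonical name, factor to the canonical unit).
COLUMN_MAP = {
    "Treating Pressure (psi)": ("treating_pressure_psi", 1.0),
    "TR_PRESS_KPA": ("treating_pressure_psi", 1.0 / KPA_PER_PSI),
    "Slurry Rate (bpm)": ("slurry_rate_bpm", 1.0),
    "SLUR_RATE_BPM": ("slurry_rate_bpm", 1.0),
}


def normalize_record(raw: dict, utc_offset_hours: float) -> dict:
    """Return one vendor row with canonical well name, UTC time, and units."""
    well = WELL_ALIASES[raw["well"].strip().lower()]
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.fromisoformat(raw["time"]).replace(tzinfo=local_tz)
    record = {"well": well, "time_utc": local_time.astimezone(timezone.utc)}
    for name, value in raw.items():
        if name in COLUMN_MAP:
            canonical, factor = COLUMN_MAP[name]
            record[canonical] = float(value) * factor
    return record


# Two made-up rows describing the same moment on the same well.
vendor_a = {"well": "Smith 1H", "time": "2019-03-01T08:00:00",
            "Treating Pressure (psi)": 8500, "Slurry Rate (bpm)": 90}
vendor_b = {"well": "SMITH #1H", "time": "2019-03-01T09:00:00",
            "TR_PRESS_KPA": 58605.4, "SLUR_RATE_BPM": 90}

rec_a = normalize_record(vendor_a, utc_offset_hours=-6)
rec_b = normalize_record(vendor_b, utc_offset_hours=-5)
print(rec_a)
print(rec_b)
```

Once both rows share a well name, a time base, and units, they can be joined to the survey, tops, and microseismic data without per-vendor special cases.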

Interacting with and analyzing data

With the data integrated, we need to decide how users are going to interact with it. That could be through MATLAB, Spotfire, Python, Excel, or Power BI. Obviously, there are trade-offs here as well. Python and MATLAB are very flexible but require a lot of user expertise. We need to consider not only the skill set of the people doing the analysis, but also the skill set of those who may ultimately leverage the insights and workflows. Do only a small group of power users need to run this analysis, or do we want every completions engineer to be able to take these results and apply them to their wells? We see a big push for the latter, so our approach has been to use a combination of custom web apps we’ve created along with O&G-specific Spotfire integrations. Spotfire is widespread in O&G and it’s great for workflows. We’ve added custom visualizations and calculations to Spotfire to aid in the analysis. For example, we can bring in the directional surveys, grids, and microseismic points to see them in 3D.

Figure 4: A user-friendly interface that meets engineers where they are already working, with integrations to Spotfire and web apps.

We now have the data merged in an open, NoSQL back end, and have presented that processed data to end users through Spotfire, where the data can be visualized and interrogated to answer our questions. We can get the well-to-well and well-to-top spacing. We can see the extent of vertical frac propagation from the microseismic data. From here we can characterize the frac response at each stage to determine where we went out of zone. We’re building a 360 view of the reservoir to form a computational model that can be used to pull out insights.
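One simple way to characterize the frac response per stage is to compare the depths of that stage’s microseismic events against the top and base of the target interval. The sketch below does only that. It assumes event depths in TVD (feet, increasing downward) and a single top and base for all stages, whereas in practice the tops would come from grids that vary along the lateral and event locations carry uncertainty. The function name and all depths are hypothetical.

```python
def stage_containment(event_tvds_ft, zone_top_ft, zone_base_ft):
    """Summarize the vertical extent of one stage's events against the target zone."""
    if not event_tvds_ft:
        raise ValueError("stage has no microseismic events")
    shallowest = min(event_tvds_ft)
    deepest = max(event_tvds_ft)
    outside = [d for d in event_tvds_ft if d < zone_top_ft or d > zone_base_ft]
    return {
        "height_ft": deepest - shallowest,
        "growth_above_ft": max(0.0, zone_top_ft - shallowest),
        "growth_below_ft": max(0.0, deepest - zone_base_ft),
        "fraction_out_of_zone": len(outside) / len(event_tvds_ft),
    }


# Made-up event depths (TVD, ft) for two stages and a made-up target interval.
events_by_stage = {
    1: [9820.0, 9850.0, 9875.0, 9890.0],
    2: [9700.0, 9760.0, 9840.0, 9880.0],
}
zone_top_ft, zone_base_ft = 9800.0, 9900.0

summary = {
    stage: stage_containment(depths, zone_top_ft, zone_base_ft)
    for stage, depths in events_by_stage.items()
}
for stage, result in summary.items():
    print(stage, result)
```

With these inputs stage 1 stays inside the interval, while stage 2 shows upward growth out of zone. Joined to the pump rate for each stage, a table like this is the starting point for relating design to containment.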

In the third and final post of this series, we will continue this containment example and review how we can extend our analysis across an asset. We’ll also revisit the data integration challenges as we expand our approach to other questions we may want to ask while designing completions.

