Loss of contact with the borehole. Cave-ins. Sub-optimal mud types. Recording errors. Unit conflicts. Equipment failure. A number of things can cause a Well Log to have bad hole readings. Perhaps the Caliper log indicates a series of unreliable borehole sections, and an expert has flagged them. Perhaps the expert has run an outlier detection algorithm to identify aberrant well log readings.
What next? What to do? Re-logging the well is often cost-prohibitive.
We’ve built a Synthetic Well Log tool in Spotfire that uses machine learning to help replace those bad hole values with more accurate ones.
Our tool draws on the theory behind academic studies (e.g. "An Artificially Intelligent Technique to Generate Synthetic Geomechanical Well Logs for the Bakken Formation", "Synthetic well logs generation via Recurrent Neural Networks", and "Generating Synthetic Well Logs by Artificial Neural Networks (ANN) Using MISO-ARMAX Model in Cupiagua Field") and supports several machine learning algorithms (Random Forest, Gradient Boosting, and Support Vector Machines). The algorithms ingest the curves that do not have data integrity issues and use them to predict more accurate values for the missing or faulty curve.
Jumping into the machine learning arena might feel daunting, so we developed this tool to help Geo experts with the process. Better yet, our tool doesn’t just reach into a black box and hand back a reconstructed well log. The Synthetic Well Log tool:
- Works with the Geo expert to build an imputation model
- Lets that person examine modeling validation metrics
- Displays the predictions and reconstructed logs next to the original for a sanity check
- Exports the chosen reconstructed model
Our tool puts reconstructed curves right next to the originals so they pass the expert’s eye test as well as the modeling diagnostics.
Let me walk you through a use case we experienced with a client.
Their Geo expert had already combed through the dataset and tagged depths with bad hole readings (machine learning can also help identify bad hole readings via outlier detection, but we want to focus on the log reconstruction for this post). We wanted to take those flags and reconstruct several well logs, replacing the observations flagged as bad with the predictions from our model.
The first step as always was loading the data. We used our Spotfire Petro Panel, which allows users to seamlessly pull in data. First, we chose the well we wanted to QC from a header list (and/or map).
While we only see two wells above, we can work with literally thousands of well logs at a time since we are filtering from the header list. Once we selected our well, the Petro Panel pulled its well log data into Spotfire.
Next, we mapped the well log columns to their corresponding columns in the template. None of these columns other than Depth and Bad Hole Flag are mandatory.
With the data mapped, we could see the original data in a well log visualization, with the bad hole areas marked in red in the rightmost track:
The red lines are the areas with bad hole readings. When we zoom in later, we will get a better look at the bad hole observations.
With our data loaded and mapped, we built the imputation models. Studies have identified several strategies for reconstructing curves. We used valid curves as inputs to predict the sections of a curve with bad hole readings.
In the dropdown below, we can choose what variables to use as inputs. We selected Neutron, Gamma Ray, Shallow Resistivity, and Deep Resistivity as predictors and Sonic as the curve to predict.
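The core idea of this step can be sketched outside of Spotfire. The snippet below is a minimal illustration in Python with scikit-learn, which is our assumption for the example and not the tool's actual code. The data is synthetic, and every variable name and the relationship between the curves are made up. It trains a Random Forest on the depths with trusted readings, using the four predictor curves, and then predicts Sonic for the flagged depths.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500  # number of depth samples in this toy well

# Synthetic stand-ins for the predictor curves (hypothetical values, not client data).
neutron = rng.uniform(0.05, 0.45, n)
gamma_ray = rng.uniform(20, 150, n)
res_shallow = rng.lognormal(1.0, 0.5, n)
res_deep = rng.lognormal(1.2, 0.5, n)
X = np.column_stack([neutron, gamma_ray, res_shallow, res_deep])

# Made-up relationship so the example has something to learn.
sonic_true = 50 + 120 * neutron + 0.1 * gamma_ray + rng.normal(0, 1.5, n)

# Flag one depth interval as bad hole and corrupt the logged Sonic there.
bad_hole = np.zeros(n, dtype=bool)
bad_hole[200:260] = True
sonic_logged = sonic_true.copy()
sonic_logged[bad_hole] += rng.normal(25, 10, int(bad_hole.sum()))

# Train only on depths with trusted readings.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[~bad_hole], sonic_logged[~bad_hole])

# Predict Sonic for the flagged depths from the valid predictor curves.
sonic_pred_bad = model.predict(X[bad_hole])

# In this toy setup we know the truth, so we can compare both against it.
err_logged = np.sqrt(np.mean((sonic_logged[bad_hole] - sonic_true[bad_hole]) ** 2))
err_pred = np.sqrt(np.mean((sonic_pred_bad - sonic_true[bad_hole]) ** 2))
print(f"RMSE vs. truth in bad hole interval: logged={err_logged:.2f}, predicted={err_pred:.2f}")
```

In a real well there is no known truth inside the bad hole interval, which is why the validation metrics and the expert's visual check matter.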
We also set various hyper-parameters for each model, as seen below.
Currently, the tool supports Random Forest, Support Vector Machine, and Gradient Boosting Machine algorithms.
- Random Forest: a versatile algorithm, robust to many different scenarios. Maintains a competitive accuracy across a variety of scenarios.
- Gradient Boosting Machine: can be even more accurate than Random Forest but is highly dependent on parameter tuning and the use case.
- Support Vector Machine: robust to overfitting but highly dependent on parameter tuning. Also good in high dimensional cases if you have many, many curves.
Other algorithms like Neural Nets may be implemented in the future.
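To make the three algorithm families concrete, here is a rough sketch using scikit-learn equivalents. This is an assumption for illustration only: the post does not show which library the tool uses, and the hyper-parameter values below are arbitrary starting points, not the settings from our project. The data is synthetic. Each model is fit on the same training split and scored by RMSE on held-out rows.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Synthetic predictors and target (hypothetical; stands in for valid log curves).
X = rng.uniform(0, 1, size=(400, 4))
y = 60 + 40 * X[:, 0] + 10 * np.sin(6 * X[:, 1]) + rng.normal(0, 1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Illustrative hyper-parameters only; in practice these are tuned per curve.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=300, random_state=1),
    "Gradient Boosting Machine": GradientBoostingRegressor(
        n_estimators=300, learning_rate=0.05, max_depth=3, random_state=1
    ),
    # SVMs are sensitive to feature scale, so standardize the inputs first.
    "Support Vector Machine": make_pipeline(
        StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1)
    ),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    print(f"{name}: RMSE = {rmse[name]:.3f}")
```

Which model comes out ahead depends on the data and the tuning, so the ranking on this toy data says nothing about real wells.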
After running the models, we used the tool to look at diagnostics for each model…
Above we see several useful measures:
- Root Mean Squared Error to see how accurate the predictions are
- Percent Variance Explained, an R-squared equivalent
- Line chart showing how adding more trees affects the error rate (to find the point of diminishing returns)
- Variable importance Plot to explain what inputs the model leaned on
- Scatterplot to evaluate the model’s predictions vs actual values (points hugging the line indicate a closer fit)
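For readers who want to see how measures like these are computed, here is a small sketch in Python with NumPy and scikit-learn. It is an illustration under our own assumptions, not the tool's implementation: the helper names `rmse` and `pct_variance_explained` are hypothetical, the data is synthetic, and out-of-bag predictions are used as one common way to estimate Random Forest error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def pct_variance_explained(actual, predicted):
    """100 * (1 - SSE / SST), an R-squared style measure in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((actual - predicted) ** 2)
    sst = np.sum((actual - actual.mean()) ** 2)
    return float(100.0 * (1.0 - sse / sst))


rng = np.random.default_rng(2)
curve_names = ["Neutron", "Gamma Ray", "Shallow Resistivity", "Deep Resistivity"]

# Synthetic data in which the first predictor drives most of the target.
X = rng.uniform(0, 1, size=(400, 4))
y = 60 + 40 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, 400)

# Error vs. number of trees, using out-of-bag predictions as the estimate.
tree_counts = [50, 100, 200, 400]
oob_rmse = []
for n_trees in tree_counts:
    rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True, random_state=2)
    rf.fit(X, y)
    oob_rmse.append(rmse(y, rf.oob_prediction_))
    print(f"{n_trees:>4} trees: OOB RMSE = {oob_rmse[-1]:.3f}")

# Diagnostics for the largest forest (the last one fit in the loop).
print(f"% Variance Explained (OOB): {pct_variance_explained(y, rf.oob_prediction_):.1f}")
importances = dict(zip(curve_names, rf.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: importance = {value:.3f}")
```

Plotting `oob_rmse` against `tree_counts` gives the kind of line chart described above, and plotting `rf.oob_prediction_` against `y` gives the predicted vs. actual scatterplot.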
We checked the results, then iterated, trying models with different inputs to see how the model diagnostics were affected. Once we had a model we liked for that algorithm, we moved on to the next algorithm.
And once we ran models for the different algorithms, we were able to compare the models…
Above, we see that the Random Forest model has an RMSE of 2.4421 compared to 4.1862 for the Gradient Boosting Machine, along with a higher % Variance Explained. The Random Forest prediction points also fall much closer to the regression line than those of the GBM.
Those diagnostics gave us an idea that the Random Forest predictions fit the data better. Let’s take a look at how they look in a Well Log visualization.
Choosing a Model
We needed a sanity check to make sure our predicted curves made sense and to give our Geo expert an opportunity to use his highly trained eyes to validate them. So, we put our predictions on a track with the original curve.
Below, we see a Random Forest prediction in turquoise and GBM in pink, overlaid on the original in red. We also see a track with the bad hole flags (BHF), indicating where the bad hole readings are so we can examine how a reconstructed curve would look in those spots.
We can look at multiple curves with their predictions in adjacent tracks.
Notice the Photoelectric curve (PEF) in the marked section above. An entire section of the curve was missing, but both models were still able to provide a viable curve.
We can also zoom in for a closer look.
In the instance above, the bad hole section is between the two red lines. Within that section, we see an actual break in the curve, which both models can remedy. Further, they can just as easily improve faulty data as absent data.
Remember, we used Neutron, Gamma Ray, Shallow Resistivity, and Deep Resistivity logs to predict the Sonic log. We subsequently used the same to predict Density and PEF. We can look at all of the tracks side by side to see how they interact.
After evaluating all the predicted curves vs original curves and considering the model diagnostics, we were able to make a decision and select the models to be used in our reconstruction. For the Sonic curve, the model diagnostics preferred the Random Forest model and nothing in the reconstructed curve suggested otherwise to our expert, so we chose that one.
Synthetic Well Logs
After choosing the models for each desired curve reconstruction, we looked at the reconstructed curves overlaid on the originals.
Of course, when there is not a Bad Hole Flag, the curves will overlap. However, when a Bad Hole Flag exists, the dotted red line shows how our reconstructed, Synthetic Well Log diverges from the original.
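The reconstruction rule described here is simple enough to sketch in a few lines. The example below uses Python with NumPy as an illustration; the arrays are tiny made-up values, and the variable names are hypothetical. Wherever the Bad Hole Flag is set, the synthetic curve takes the model's prediction; everywhere else it keeps the original reading.

```python
import numpy as np

# Made-up values for five depth samples (not real log data).
original = np.array([88.0, 90.5, 140.2, np.nan, 91.3])  # faulty and missing readings
predicted = np.array([88.4, 90.1, 92.6, 93.0, 91.0])    # model output at every depth
bad_hole_flag = np.array([0, 0, 1, 1, 0], dtype=bool)   # expert's flags

# Keep the original where the hole is good; use the prediction where it is flagged.
synthetic = np.where(bad_hole_flag, predicted, original)
print(synthetic)
```

Note that the same rule covers both faulty readings (140.2) and absent ones (NaN), as long as the depth carries a flag.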
So, there you have it! With our tool, we were able to build machine learning models to replace bad hole readings with predicted values, evaluate the model diagnostics, compare the models in a Well Log Visualization, and reconstruct the curves to create a Synthetic Well Log. The reconstructed curves can be saved or exported.
So, next time you identify dubious well log readings or can’t afford to re-log a well to recover a specific curve, don’t worry about a re-log. Just fix your curves with Petro.ai.
Jason is a Data Scientist at Ruths.ai with a master’s degree in Predictive Analytics and Data Science from Northwestern University. He has experience with a multitude of machine learning techniques such as Random Forest, Neural Nets, and Support Vector Machines. With a previous Master’s in Creative Writing, Jason is a fervent believer in the Oxford comma.