In Data Science, before applying any analytics we almost always face the problem of missing values. Whether to delete cases or fill in missing values depends entirely on the data set and the target problem. A common practice is to delete variables and samples with more than 30% missing values and then use Multiple Imputation techniques to fill in the remaining gaps. In Excel, we can approach this problem with the Replace tool or with filters, and even use Visual Basic to code a more customized solution. In Spotfire, we have the advantage of more advanced methods through R libraries that implement MCMC, Bayesian and multivariate algorithms. Spotfire's integrative tools make a real difference in how we approach the missing values problem, bringing together advanced algorithms, amazing visualizations and user interactivity.
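The 30% rule can be sketched in plain Python. This is a minimal illustration, assuming each row is a dict and None marks a missing value; drop_sparse and mean_impute are hypothetical helper names, not part of Excel or Spotfire. Single mean imputation stands in here for the Multiple Imputation step, which would instead draw several plausible values per gap and pool the results (for example through an R package such as mice).

```python
from statistics import mean

def drop_sparse(rows, threshold=0.30):
    """Drop columns, then rows, whose share of missing values exceeds threshold."""
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    # Keep a column only if at most `threshold` of its values are missing.
    keep_cols = [c for c in columns
                 if sum(r[c] is None for r in rows) / len(rows) <= threshold]
    trimmed = [{c: r[c] for c in keep_cols} for r in rows]
    # Keep a row only if at most `threshold` of its remaining values are missing.
    kept = [r for r in trimmed
            if keep_cols and sum(v is None for v in r.values()) / len(keep_cols) <= threshold]
    return kept, keep_cols

def mean_impute(rows, columns):
    """Replace each remaining None with its column mean (single imputation)."""
    means = {}
    for c in columns:
        observed = [r[c] for r in rows if r[c] is not None]
        means[c] = mean(observed) if observed else None
    return [{c: means[c] if r[c] is None else r[c] for c in columns} for r in rows]

# Hypothetical data set; None marks a missing value.
rows = [
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": None},
    {"a": None, "b": 4, "c": 6, "d": 8, "e": None},
    {"a": 3, "b": None, "c": None, "d": 12, "e": 1},
    {"a": 5, "b": 8, "c": 9, "d": 16, "e": None},
]
clean, cols = drop_sparse(rows)   # column "e" and the third row are dropped
filled = mean_impute(clean, cols)
```

Columns are filtered before rows on purpose: if rows were checked first, a single mostly empty variable such as "e" could push every row over the threshold and wipe out the whole data set.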
Most problems in the scientific world are about understanding phenomena. We want to learn the characteristics and patterns of the systems we study so that we can anticipate and predict their behavior. As humans, we learn by observing these processes, either as they occur naturally or through controlled experiments. This might not be an option if we are studying a rare or dangerous event.
A key part of analytics in the oil and gas industry is evaluating opportunities at different locations. Space is always a factor when looking for profitable development projects: we usually look at wells already in production and try to find spatial trends. To stay competitive, we need better ways to access the data for different areas and their wells. For instance, we can transform the spatial information into compact objects that store the location and shape of each well and lease. These objects can then be fed to different calculations and analyses as geometries. In Spotfire, this has an added advantage: the geometries can be displayed with the feature layers of the map chart. In this case, we can visualize the leases as polygons and the wells as lines.
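A minimal sketch of turning raw coordinates into geometry objects, assuming each lease boundary and well path is available as a list of (x, y) pairs. It uses the Well-Known Text (WKT) format; Spotfire's map chart feature layers generally work with binary (WKB) geometry columns, so treat this text form as an intermediate step. The function names are hypothetical, and the shoelace area shows how a stored geometry can feed a calculation.

```python
def to_wkt_polygon(ring):
    """Serialize a lease boundary as a WKT POLYGON, closing the ring if needed."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # WKT rings must end where they start
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"

def to_wkt_linestring(path):
    """Serialize a well path (surface location to bottom hole) as a WKT LINESTRING."""
    coords = ", ".join(f"{x} {y}" for x, y in path)
    return f"LINESTRING ({coords})"

def shoelace_area(ring):
    """Planar area of a simple polygon, in squared coordinate units."""
    n = len(ring)
    s = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

# Hypothetical lease rectangle and horizontal well path in projected coordinates.
lease = [(0, 0), (2, 0), (2, 1), (0, 1)]
well = [(0.5, 0.5), (1.5, 0.6)]
lease_wkt = to_wkt_polygon(lease)
well_wkt = to_wkt_linestring(well)
lease_area = shoelace_area(lease)
```

The shoelace formula assumes planar coordinates; with raw latitude and longitude the area would first need a suitable map projection.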
Linear Regression models are among the simplest models in the statistical literature. Although the assumptions of linearity and normality seem to restrict the practical use of this model, it is surprisingly successful at capturing basic relationships and at prediction in many scenarios. The idea behind the model is to fit a line (or, with several predictors, a hyperplane) that mimics the relationship between a target variable and a linear combination of predictors (called independent variables). Multiple regression refers to the case of a single target variable and several predictors. These models are popular not only for prediction but also as model selection tools, helping to identify the most important predictors and eliminate redundant variables from the analysis.
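To make the fitting step concrete, here is a minimal Python sketch of multiple linear regression by ordinary least squares, solving the normal equations (X^T X) b = X^T y with Gaussian elimination. ols_fit is a hypothetical name, and the toy data are generated exactly from y = 1 + 2*x1 - 3*x2 so the recovered coefficients are easy to check.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of predictor rows; an intercept column of ones is prepended,
    so the returned coefficients are [intercept, b1, b2, ...].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Build X^T X and X^T y.
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

# Hypothetical toy data: two predictors per row, target from y = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coef = ols_fit(X, y)  # approximately [1.0, 2.0, -3.0]
```

Solving the normal equations directly is fine for a small illustration, but it squares the condition number of X; for real work a QR-based routine such as R's lm, which Spotfire can reach through its R integration, is the safer choice.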
Incomplete data is a problem that Data Scientists face every day. The most common practices range from deleting every observation that has missing values, to substituting a fixed value, to imputing a statistic such as the mean or median. Because these approaches are limited in how well they capture the structure of the data (for example, the relationships between variables), scientists have developed more sophisticated methods.
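A small Python sketch comparing the three simple strategies on a single hypothetical column, using only the standard statistics module. It illustrates one of their limitations: every imputed value sits at the same point, so the spread of the filled column is distorted.

```python
from statistics import mean, median, pvariance

# Hypothetical column; None marks a missing value.
values = [2.0, None, 4.0, 6.0, None, 12.0]
observed = [v for v in values if v is not None]

def fill(vals, fill_value):
    """Replace every missing entry with the same fill value."""
    return [fill_value if v is None else v for v in vals]

strategies = {
    "deletion": observed,                       # drop incomplete observations
    "fixed (0)": fill(values, 0.0),             # substitute a constant
    "mean": fill(values, mean(observed)),       # impute the observed mean
    "median": fill(values, median(observed)),   # impute the observed median
}

for name, vals in strategies.items():
    print(f"{name:10s} n={len(vals)} mean={mean(vals):.2f} var={pvariance(vals):.2f}")
```

Mean imputation keeps the column mean at 6.0, but the population variance drops from 14.0 (observed values only) to about 9.33, because the filled-in points add no spread. Methods that model the data, such as Multiple Imputation, aim to avoid this.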
The “spTimer” R package uses three Bayesian models to fit spatio-temporal data. The data may be observed at sparse spatial stations, where the observations at each station are treated as a time series. The package models the residual spatio-temporal variation, which allows it to quantify uncertainty. It also offers flexibility in choosing the covariance function and in customizing the hyper-parameters of the prior distributions and the tuning parameters of the implemented MCMC algorithms.
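spTimer itself is an R package, and its own function arguments are not reproduced here. As a rough illustration of what choosing a covariance function means, this Python sketch builds the covariance matrix between station locations for two common families. COVARIANCES and covariance_matrix are hypothetical names, and the Gaussian form shown is one common parametrization that may differ from the one spTimer uses.

```python
import math

# Two common covariance families; sigma2 is the variance, phi the spatial decay.
COVARIANCES = {
    "exponential": lambda d, sigma2, phi: sigma2 * math.exp(-phi * d),
    "gaussian": lambda d, sigma2, phi: sigma2 * math.exp(-(phi * d) ** 2),
}

def covariance_matrix(coords, family="exponential", sigma2=1.0, phi=0.5):
    """Build the n x n covariance matrix between station locations."""
    k = COVARIANCES[family]
    return [[k(math.dist(a, b), sigma2, phi) for b in coords] for a in coords]

# Hypothetical station coordinates in a projected (planar) system.
stations = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
C = covariance_matrix(stations, family="exponential", sigma2=2.0, phi=0.5)
G = covariance_matrix(stations, family="gaussian", sigma2=2.0, phi=0.5)
```

In both families the covariance equals sigma2 on the diagonal and decays as stations move apart; the Gaussian form decays more slowly at short distances and faster at long ones, which is why the choice of family changes how smooth the fitted spatial surface is.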