Month: July 2017

CRISP-DM: Deployment

Welcome to the next installment of our Analytics Journey, which explores how we apply the CRISP-DM method to our Data Science process. Previously, we looked at an overview of the methodology as a whole, as well as the Business Understanding, Data Understanding, Data Preparation, Modeling, and Evaluation stages. Next, we examine the final stage: Deployment.

The.  Final.  Stage.  Now, we just have to turn this thing on and reap the rewards, right?


Unfortunately, Deployment does not just happen with the push of a George Jetson button.


Using Support Vector Machines in Spotfire


Support Vector Machines (SVMs) are among the most popular and widely used machine learning algorithms today. They are robust and allow us to tackle both classification and regression problems. In general, SVMs are relatively easy to use, generalize well, and often do not require much tuning. To help illustrate the power of SVMs, we thought it would be useful to go through an example using a custom template we have created for SVMs.


Using the “spTimer” Package to Model Spatio-Temporal Data in R

The “spTimer” package uses three Bayesian models to fit spatio-temporal data. The data may be given at a sparse set of spatial stations, where the observations at each station are treated as a time series. The package can model the residual spatio-temporal variation to measure uncertainty. It also lets users customize the choice of covariance function, the hyper-parameters of the prior distributions, and the tuning parameters of the implemented MCMC algorithms.


Marking, Filtering, and Limiting, Oh My!

To veteran Spotfire users, the distinction between Marking, Filtering, and Limiting might seem obvious; to a new user, however, the similarities can cause confusion. In fact, one can often obtain exactly the same result using different combinations of Marking, Filtering, and Limiting. All three methods let the user click in one area to affect other visualizations, and each, in its own way, highlights a subset of the data.


Missing Value Imputation with Data Augmentation in R

Incomplete data is a problem that Data Scientists face every day. Common practices range from deleting observations with missing values entirely, to substituting a fixed value, to imputing with statistics such as the mean or median. Because these approaches are limited in how well they capture the structure of the data, scientists have developed more sophisticated methods.
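To make the simple approaches concrete, here is a minimal sketch of complete deletion, fixed-value substitution, and mean or median imputation. It is written in plain Python purely for illustration (the post itself works in R), and the toy column and helper names (`values`, `drop_missing`, `fill_constant`, `fill_statistic`) are hypothetical, not taken from the original post.

```python
from statistics import mean, median

# Hypothetical toy column; None marks a missing observation.
values = [2.0, None, 4.0, 3.0, None, 15.0]

def drop_missing(xs):
    """Complete deletion: keep only the observed values."""
    return [x for x in xs if x is not None]

def fill_constant(xs, fill):
    """Substitution: replace every missing entry with one fixed value."""
    return [fill if x is None else x for x in xs]

def fill_statistic(xs, stat):
    """Imputation: replace missing entries with a statistic of the observed values."""
    return fill_constant(xs, stat(drop_missing(xs)))

deleted = drop_missing(values)
zero_filled = fill_constant(values, 0.0)
mean_filled = fill_statistic(values, mean)
median_filled = fill_statistic(values, median)

print(deleted)
print(zero_filled)
print(mean_filled)
print(median_filled)
```

In this toy column the single large value pulls the mean well above the median, and every gap receives the same number regardless of its neighbors. That is one way to see why these approaches struggle to capture the structure of the data.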
