- Are you looking for a more efficient way to perform pairwise comparisons than changing variables on the x and y axis or toggling a drop down.
Before performing any type of predictive analytics, it’s always helpful to know how variables relate or correlate to each other. This is part of data profiling. In Spotfire, you can use a data function to create a correlation table or pairwise comparison table. Then, using a scatter plot connected to both the pairwise comparison table and the raw data table, you will be able to click on records in the pairwise comparison table and see the correlations and raw data.
I have created a DXP called lending_club_loans (2) to accompany this post using data from Data.World on Lending Club loans. I have dabbled a bit with Lending Club myself, so I found this data set particularly interesting.
So, let’s go!
Here is the data function to create the pairwise comparison table. One input and one output parameter are needed, and those are also shown below. If you are unfamiliar with how to input data functions, go to this post.
This data function uses the cor() R function to calculate correlations between numeric columns of data. You can read about that here.
checkNumeric <- sapply(PairsInputTable, is.numeric)
pairs <- cor(na.omit(PairsInputTable[, checkNumeric, drop=FALSE]))
which <- row(pairs) > col(pairs)
PairsOutputTable <- data.frame(
X = rownames(pairs)[row(pairs)[which]],
Y = colnames(pairs)[col(pairs)[which]],
Pairwise = pairs[which])
Note that when the data function runs, you won’t see any big change, but there should be a new table in Edit, Data Table properties.
After the pairwise comparison table has been created, follow these steps to setup the scatter plot.
- Add a table visualization connected to the PairsOutputTable
- NOTE: You may want to create a custom marking for the PairsOutputTable, which can be done by right clicking on the Marking selector in the legend and selecting ‘New’. Then name a custom marking. If you don’t see Marking in the legend, right click in the white space of the legend and select Marking to turn it on.
- Add a scatter plot, and set the data table to the table that contains the raw data.
- Right click on the x axis select and choose ‘Column from Marked’
- Set the data table to the PairsOutputTable, choose the ‘X’ column. If you setup a custom marking, choose in from the ‘Row from marking:’ drop down. Click OK.
- Repeat these steps for the Y axis selector using the ‘Y’ column.
- Right click on the scatter plot, select ‘Properties’ and go to the Lines & Curves menu.
- Add a straight line fit.
- Click the Labels and Tooltips button and check the R2 box under Labels.
- Click OK.
Now, when the PairsOutputTable is marked, the scatter plot will update with the raw data, and you can see the R2. Note, this will note be the same number as the correlation show in the table.
Guest Spotfire blogger residing in Whitefish, MT. Working for SM Energy’s Advanced Analytics and Emerging Technology team!