Spotfire Solutions — Add probit plot to existing DXP

  • Would  you like to add a probit plot to an existing DXP file?

 

There are a few Spotfire probit plot templates out there, and they vary considerably in terms of functionality and complexity.  The difficulty with these templates is that because they are templates, you are supposed to add data to the template rather than add the probit to your existing DXP.  Therefore, I have put together a set of instructions for adding a probit plot to your DXP file. I want to emphasize that this is a long post, but that’s mostly because I provided every single step in great detail.  This should take 20 min for a user familiar with Spotfire and up to 40 min if you have to jump back and forth between the instructions and Spotfire.

Key Components in the Build

  1. 1-2 scatter plots
  2. 1-2 cross tables
  3. 1 map chart
  4. 1-2 text areas
  5. 4 drop down document properties

This build will create two probit plots (so that you can see multiple variables next to each other).  It is possible to create a single probit plot with multiple variables, but that requires some data wrangling and is not included in this set of instructions.  There will be functionality to:

  1. Filter and have calculations update for the reduced data set
  2. Mark any of the visualizations and see where the data falls on all of the other plots
  3. Change the color by variable with a drop down

Notes

  • The assumption for the data table is one row per well. If your data table has more than one row per well, some adjustment may be required.
  • Some of the components in this description are optional, such as the drop downs for the color by selector or having more than one scatter plot/cross table. If you only need one, only create one.
  • There are no calculated columns, only custom expressions written on the y axis of the scatter plots and cross tables. Because the expressions are written on the axis of the visualization, the calclations will update with filtering.
  • In order for the scatter plots and map chart to be connected, they MUST use the same marking.
  • The Marker By setting is also very important. Because my data table has one record per well, I can use (Row Number) as the Marker By variable.  If you have more than one well per row, the probit plots might require additional configuration not covered in this tutorial.
  • Do not reuse document properties. Create separate document properties for each scatter plot/cross table.
  • The 90th percentile in Spotfire is equivalent to P10. The 10th percentile in Spotfire is equivalent to P90.
  • Lastly, the one thing I don’t know how to do just yet, is get the axis ranges to hold fixed. Currently, every time you change the variable on the x axis, the range changes to the min and max for that column of data.  I am working on figuring this out and will hopefully have an update soon.

Calculations Used

  • Y axis of the scatter plot –> NormInv((Rank([${Probit}]) – 0.5) / Count([${Probit}]))as[Probit]
  • X axis of the cross table –>

UniqueCount([API]) as [Well Count],

P10([${Probit}]) as [P90],

P90([${Probit}]) as [P10],

P90([${Probit}]) / P10([${Probit}]) as [P10 / P90],

Avg([${Probit}]) as [Mean],

Median([${Probit}]) as [Median], ( 0.3 * P10([${Probit}])) + (0.4 * Median([${Probit}])) + (0.3 * P90([${Probit}])) as [Swanson’s Mean]

Step by Step Instructions

  1. Add all the pieces
    1. Add a new page
    2. Add a cross table, a scatter plot and a map (there will eventually be two cross table and two scatter plots, but these will be duplicated from the first one rather than configured separately)
      1. Set the data table on all visualizations
      2. Set the Marker by: to (Row number) on all visualizations
    3. Add two text areas
    4. Position the visualizations as desired. I recommend the following configuration: 9-13-2016-7-45-51-am
  2. Set the marking
    1. Right click on the marking selector of any of the visualizations and select New Marking. Create a new marking specific to this analysis.
    2. Click the Marking: drop down on each of the other visualizations and set the marking to the new marking just created.

9-12-2016-3-28-03-pm

  1. Create the property controls
    1. Right click on one of the text areas and select Edit text area
    2. Click the insert Property Control button in the toolbar (second from the right), select drop down list
    3. Click the new button
    4. Name the property control (no spaces, suggest naming with 1, 2 since there will be two controls) and do not add a value, click OK
    5. Choose Column selection in the Set property value through: drop down 9-12-2016-4-13-03-pm
    6. Set the data table
    7. Click the Select columns button in order to define which columns will appear in the drop down
    8. Click the New button – it’s easy to forget this step. In this dialog, you create a new column property and it will be attached to the document property you just created.  Document properties and column properties are different.
    9. Name column property (no spaces and can’t be the same name as the document property)
    10. Set the Default value: to false. A value of false means any columns added to the data table will NOT be added to the list.  A value of true means any columns added to the data table WILL be added to the list.
    11. Move the columns you want in the drop down from left to right.
    12. Click ok and click ok
    13. Repeat these steps to create the Color By drop down.
    14. Repeat these steps in the second text area to create the controls for the second set of visualizations.
  2. Configure scatter plot before duplicating
    1. Right click on the scatter plot, select Properties.
    2. Go to the Lines & Curves menu
    3. Click Add, select Horizontal Line, Straight Line
    4. For the P50 line, set the radio button to Aggregated value and select Median
    5. Change the Line name radio button from Automatic to custom and label it P50
    6. Click OK <insert screenshot>
    7. Configure the appearance of the line in the Settings dialog as desired 9-12-2016-4-23-21-pm
    8. Click the Label and Tooltip button
    9. Select the Label checkbox under Curve name 9-12-2016-4-23-05-pm
    10. Repeat the steps for P10 and P90. The P90 line should use the 10th percentile aggregated value.  The P10 line should use the 90th percentile aggregated value.
    11. Right click on the scatter plot, select Properties
    12. Go to the X axis menu, select Log scale
  3. Duplicate the scatter plot and the cross table
  4. Configure Scatter plot
    1. Right click on the Color by selector, select Set from property and choose the appropriate drop down property control.
    2. Right click on the x axis of the visualization, select Set from property and choose the appropriate drop down property control.
    3. Right click on the y axis of the visualization, select Custom expression and paste in the expression –> -NormInv((Rank([${Probit2}]) – 0.5) / Count([${Probit2}]))as[Probit]
      1. NOTE: Some users have told me that when they copy and paste the formula, Spotfire doesn’t like the minus before 0.5.  Just delete it and retype the minus sign.
    4.  Replace the Probit2 with the name of your drop down control
    5. Repeat these steps for the second plot.
  5. Configure Cross tables
    1. Right click on the x axis and choose Custom expression and paste in the expression shown below and replace Probit with the name of your drop down property control and replace API with your own well identifier column. Repeat these steps for the second plot.

UniqueCount([API]) as [Well Count],

P10([${Probit}]) as [P90],

P90([${Probit}]) as [P10],

P90([${Probit}]) / P10([${Probit}]) as [P10 / P90],

Avg([${Probit}]) as [Mean],

Median([${Probit}]) as [Median], (

0.3 * P10([${Probit}])) + (0.4 * Median([${Probit}])) + (0.3 * P90([${Probit}])) as [Swanson’s Mean]

8. Configure Map charts

  1. Right click on the map chart, select Properties
  2. Select the marker layer, click the Settings button
  3. Go to the Positioning menu, scroll down and make sure that the correct lat and long are setup to position the wells property
  4. Click close until all dialogs are closed
  5. Optional — Configure the color by variable

9. Clean up

  1. Right click on the cross tables, choose Visualization Features and select Axis Selectors to hide the axis selectors
  2. Hide the legends by clicking the legend button in the title bar of each visualization

Congrats! You now have a probit plot in your DXP file.  If you have any difficulty with these instructions or additional functionality you would like to see, please leave me the info in comments.  Thanks!

Guest Spotfire blogger residing in Whitefish, MT.  Working for SM Energy’s Advanced Analytics and Emerging Technology team!

12 thoughts on “Spotfire Solutions — Add probit plot to existing DXP

  1. Kyle Wilson Reply

    Great article to begin.

    I work in the O&G industry (reserves) and probit plots are crucial to our job.
    The workflows you presented were simple to follow and extremely helpful.

    I have a few questions I’m hoping you’ll have answers to.

    1. Do you know a way to fix the log scales on x-axis to 10, 100, 1000 etc.?
    2. Do you know how to change the Y axis to the typical P1 to P99 scale we are used to seeing?
    3. On the cross table, I’m having trouble getting all the custom expressions to show up. Whenever I build the P90 custom expression it removes all other custom epxressions and only shows P90.

    I would be thankful for any advice/help.

    Thanks!

    • Julie Schellberg of Big Mountain Analytics, LLC Post authorReply

      1. In the properties dialog you can set both the x and y axis mins and maxes. Set those, turn off the axis labels, and then use lines and curves (horizontal/vertical lines) to set the lines you want.
      2. You could always do a calculated column (or something from lines and curves) using the P10 function, which is our P90 and then rename it. There is no way to flip how it interprets the built in functions.
      3. Are you writing the expression on the y axis or just adding a calculated column? This one I would need to see. It might be a bug in the software. If you have a TIBCO support account, I would submit it to them.

    • Julie Schellberg of Big Mountain Analytics, LLC Post authorReply

      PS Thanks for the feedback and questions!! Much appreciated.

    • Ben Zapp Reply

      Thanks for the good write-up, Julie! Really appreciate it.

      Kyle, some info for your questions:

      1. SpotFire doesn’t handle log scale very well at all for some reason. Please go ‘up-vote’ the ask to improve SpotFire’s log scale capabilities here: https://ideas.tibco.com/ideas/TS-I-5335

      2. We are using NormInv (quantile function) which produces a ‘q’ value from ‘p’ values. We need a proper cumulative distribution function (CDF) or Norm.Dist if using Excel. Please go ‘up-vote’ the ask to improve SpotFire with a CDF here: https://ideas.tibco.com/ideas/SPF-I-5153

      • Julie Schellberg of Big Mountain Analytics, LLC Post authorReply

        Thanks for that information Ben! Appreciate it!

        • Ben Zapp Reply

          Just solved the CDF issue for the y axis. You will have to create an expression function. Step by step to create:

          1. Edit -> Data Function Properties -> Expression Functions -> New
          2. Fill in expression function dialogue box as below:

          Name = I called mine CDF
          Function Type = Column Function
          Return Type = Real, Category = Statistical functions.
          Description = I wrote “Cumulative Normal Distribution Function from negative infinity to x. Calculates mean and standard deviation from arg1.

          CDF(arg1)”

          3. Script:

          output <-pnorm(input1,mean(input1),sd(input1),lower.tail=TRUE)

          4. Click OK
          5. Replace Y-axis of scatter plot with custom expression:

          CDF(${Probit1}) as [Probit]

          6. Format as %
          7. Enjoy!

          • Ben Zapp

            Update: Need to ignore null values replace script with below:

            output <- round(pnorm(input1,mean(input1,na.rm=TRUE),sd(input1,na.rm=TRUE),lower.tail=TRUE),digits=5)

          • Julie Schellberg of Big Mountain Analytics, LLC Post author

            That is a great solution. I don’t use expression functions very often, and I kind of forgot that they exist. In reality, they are a great option when you don’t like how Spotfire calculates something. Instead, you calculate it your own way using TERR>

  2. Logan Heffner Reply

    Julie,

    Thank you so much for this how-to. This is great. Just finished building and had a question:

    If I want to run this on a new data table, what is the best way to go about re-pointing everything to that new workbook as I know there are a lot of places where the data table is referenced. Would I have to rebuild every time I pull in a new data table?

    Thanks again.

    • Julie Schellberg of Big Mountain Analytics, LLC Post authorReply

      If you use the DXP as a template and replace the table, some rebuild will be required, such as connecting the property controls to the new columns, but that’s the only way to minimize rework. In that scenario, all the visualizations would still be connected to the table.

  3. Pingback: How to Build a Multiple Variable Probit Plot - Data Shop Talk

  4. Pingback: How to Add Lines to a Probit Plot with IronPython - Data Shop Talk

Leave a Reply

Your email address will not be published. Required fields are marked *