
How to Build a Multiple Variable Probit Plot

A few years ago, I wrote a post on how to add a probit plot to a Spotfire project.  I wrote the post early on in my blogging days, and going back to it was a little painful.  It violates several of the "blogging rules" I have developed over the years.  For example, I should have simplified the example.  I wrote it, and even I found it hard to reread.  Blog and learn.  Blog and learn.

Anyway, early in the post, I note, “It is possible to create a single probit plot with multiple variables, but that requires some data wrangling and is not included in this set of instructions.”  Well, these days all my probit plots contain multiple variables.  A manager recently asked me to write up a set of instructions for this common task.  So, here you go.

How to Build a Multiple Variable Probit Plot in Spotfire

Desired Output

First, what exactly are we trying to create?  Here is an example.  You’ll recognize the log scale, and, as you can see, the plot has a separate line for each variable.

High-Level Steps to Desired Output

Creating the desired output is a two-step process.

  1. Unpivot data table
  2. Create probit plot (scatter plot)

Unpivot Data Table

Presumably, the user has a data table that looks like the one shown below.  Each variable is its own column.

The first step to creating a multiple variable probit plot is to unpivot this data with a transformation. The end result will look like the example below.  Columns are transformed into rows.

 

The column names Measure and Value can be changed to names the user finds appropriate. The table will be narrower and taller.

Follow the steps below…

  1. Go to Edit – Transformations.
  2. Select Unpivot from the drop-down menu of transformations.
  3. Move all columns that are staying the same to “Columns to pass through”.
  4. Move all other columns, the columns that are being transformed from columns to rows, to “Columns to transform”.
  5. Name the two new columns and make sure the data types are correct. Measure should be string and Value should be real or integer.
  6. Click OK.
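For readers who prepare data outside Spotfire, the same wide-to-long reshape that the Unpivot transformation performs can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Spotfire's implementation: the helper name `unpivot`, the well identifier column `Well`, and the variable columns `EUR` and `IP30` are hypothetical examples, while `Measure` and `Value` mirror the new column names described above.

```python
from typing import Any


def unpivot(rows: list[dict[str, Any]], pass_through: list[str],
            transform: list[str], name_col: str = "Measure",
            value_col: str = "Value") -> list[dict[str, Any]]:
    """Wide-to-long reshape: one output row per (input row, transformed column).

    Hypothetical helper for illustration; it is not a Spotfire API.
    """
    long_rows = []
    for row in rows:
        # Pass-through columns are copied unchanged onto every output row.
        base = {col: row[col] for col in pass_through}
        for col in transform:
            # The former column name becomes a string in the Measure column,
            # and the cell itself becomes the Value.
            long_rows.append({**base, name_col: col, value_col: row[col]})
    return long_rows


# Hypothetical wide table: one row per well, one column per variable.
wide = [
    {"Well": "A-1", "EUR": 350.0, "IP30": 120.0},
    {"Well": "B-2", "EUR": 410.0, "IP30": 95.0},
]
long_table = unpivot(wide, pass_through=["Well"], transform=["EUR", "IP30"])
for r in long_table:
    print(r)
```

Two wide rows with two transformed columns become four long rows with three columns (`Well`, `Measure`, `Value`), which is the "narrower and taller" shape the Unpivot transformation produces.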

With the data taken care of, now create the probit plot.

Create Probit Plot (scatter plot)

High-Level Steps

Creating the plot is actually more time-consuming than wrangling the data.  Adding the secondary lines takes the most time and is optional.

  1. Create a basic plot
  2. Configure the visualization
  3. Format the x axis
  4. Add straight line fit
  5. Add secondary lines (optional)
  6. Filter to appropriate content

Notes:

  1. There are no calculated columns, only a custom expression written on the axis of the scatter plot. Because the expression is written on the axis of the visualization, the calculations will update with filtering.
  2. Filtering on the Measure column will control which variables appear in the plot.
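  3. The 90th percentile in Spotfire is equivalent to P10, and the 10th percentile in Spotfire is equivalent to P90. The labels are reversed because Pxx is the value that is equaled or exceeded with xx% probability (for example, P90 is exceeded 90% of the time).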
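The P10/P90 reversal in the notes above can be checked with a short sketch: Pxx is simply the (100 - xx)th percentile. The helper names `percentile` and `p_value` are hypothetical, and the percentile uses linear interpolation between sorted values, which is one common definition; Spotfire's Percentile function may interpolate differently, so treat the numbers as illustrative rather than as Spotfire's exact output.

```python
def percentile(values: list[float], pct: float) -> float:
    """Linear-interpolation percentile (one common definition; an assumption)."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    # Fractional position in the sorted data, from 0 to len(ordered) - 1.
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def p_value(values: list[float], exceedance: float) -> float:
    """Return Pxx, the value equaled or exceeded with `exceedance`% probability."""
    # P10 -> 90th percentile, P50 -> median, P90 -> 10th percentile.
    return percentile(values, 100.0 - exceedance)


eur = [100.0, 200.0, 300.0, 400.0, 500.0]
print(p_value(eur, 10), p_value(eur, 50), p_value(eur, 90))
```

With this sample, P10 is the high value (about 460), P50 is the median (300), and P90 is the low value (about 140). The same rule explains why a P70 line uses a percentile of 30.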

Follow the steps below…

  1. Create basic plot
    1. Add a scatter plot to the page
    2. Set the data table to the “unpivoted” table
  2. Configure the visualization
    1. Place the Value column on the x-axis of the visualization
    2. Right-click on the y-axis of the visualization and select Custom Expression
    3. Enter the following expression: NormInv((Rank([Value],"desc",[Measure]) - 0.5) / Count([Value]) OVER ([Measure])) as [Cum Probability]
    4. Set the Color by selector to the Measure column
  3. Format the x-axis
    1. Right-click on the visualization and select Properties. Go to the X-axis menu.
    2. Set the Min and Max range as shown below in Figure 1.
  4. Add straight line fit
    1. Right-click on the visualization, select Properties. Go to the Lines & Curves menu.
    2. Click the Add button and select Straight Line Fit.
    3. Click OK at the next dialog box.
    4. Click the One per Color checkbox as shown below in Figure 2.
  5. Add secondary lines (see Figure 3 below for example)
    1. Horizontal Lines (P10, P50, P90, etc.)
      1. If you are still in the Lines & Curves menu, click the Add button to add a Horizontal Line, Straight Line.
      2. To add P10, P50, and P90, select the Aggregated Value radio button as shown below in Figure 3.
        1. For P10, select 90th.
        2. For P50, select Median.
        3. For P90, select 10th.
      3. For all other values, select the Custom expression radio button as shown below in Figure 4. Enter the expression Percentile([Y], 30) and adjust the second argument: for P70, the value is 30; for P60, it should be 40; and so on (in general, use 100 minus xx for Pxx).
      4. Format the line color, weight, format, and label as desired using the options circled in Figure 5 shown below.
    2. Vertical Lines
      1. The example plot shown has vertical lines at 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, and 10000. Each one must be added individually.
      2. If you are still in the Lines & Curves menu, click the Add button to add Vertical Line, Straight Line.
      3. For each line, select the Fixed Value radio button and enter the value as shown below in Figure 6.
      4. Format the line color, weight, format, and label as desired using the options circled in Figure 5 shown below.
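The y-axis custom expression from step 2 ranks each value within its Measure in descending order, turns the rank into a plotting position of (rank - 0.5) / n, where n is the count of values for that Measure, and converts that probability to a standard normal quantile with NormInv. The Python sketch below reproduces that arithmetic so the y-values can be checked by hand. It is a minimal sketch under assumptions: `probit_y` is a hypothetical helper, the standard library's `statistics.NormalDist().inv_cdf` (Python 3.8+) stands in for NormInv, and tied values are ranked by input order, which may not match Spotfire's Rank tie handling.

```python
from collections import defaultdict
from statistics import NormalDist


def probit_y(rows: list[dict], value_col: str = "Value",
             group_col: str = "Measure") -> list[float]:
    """Mirror NormInv((Rank(desc) - 0.5) / Count) computed OVER each Measure.

    Hypothetical helper for checking values; it is not a Spotfire API.
    """
    # Group row positions by Measure so each variable is ranked on its own.
    groups: dict[str, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[group_col]].append(i)

    std_normal = NormalDist()  # mean 0, standard deviation 1
    y = [0.0] * len(rows)
    for indices in groups.values():
        n = len(indices)
        # Descending rank: the largest value in the group gets rank 1.
        # sorted() is stable, so ties keep their input order (an assumption).
        ordered = sorted(indices, key=lambda i: rows[i][value_col], reverse=True)
        for rank, i in enumerate(ordered, start=1):
            y[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return y


# Hypothetical unpivoted rows for two variables.
long_rows = [
    {"Measure": "EUR", "Value": 350.0},
    {"Measure": "EUR", "Value": 410.0},
    {"Measure": "IP30", "Value": 120.0},
    {"Measure": "IP30", "Value": 95.0},
]
y = probit_y(long_rows)
print([round(v, 3) for v in y])  # [0.674, -0.674, -0.674, 0.674]
```

Because of the descending rank, the largest value in each Measure receives the most negative y-value, and each Measure is ranked and counted independently, just as the OVER ([Measure]) clause partitions the Spotfire calculation.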

Reference Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

And now you have a multiple variable probit plot!

Spotfire Version

Content created with Spotfire 7.12.

Thoughts on “How to Build a Multiple Variable Probit Plot”

  1. Nicholas Krzewinski

    Hi Julie!

    Thank you so much for this tutorial! I have done PROBIT too many times to count, and never thought about doing it in Spotfire until I saw your post.

    I am having one issue, though. Do you know what might prevent me from adding a horizontal line to the plot? Spotfire tells me that a horizontal line expression cannot be evaluated with the current visualization setup. I think it’s because my y-axis is set to categorical instead of continuous, but if I set it to continuous, the visualization breaks and tells me the expression is not valid. I am also unable to manually set my y-axis min/max.

    Have you experienced this before?

    • Julie Sebby (post author)

      It has to do with the settings of the x- or y-axis. The axis is too short. Try making the max value on the axis larger.

      • Nicholas Krzewinski

        You were right, except it was actually that my data range was way too big. I did some looking, and it looks like the EUR data I pulled from PHDWin included a “Grand Total” line, which was making the range massive. Thank you for the help!

  2. Mike

    Hi,

    How do you plot a probability line? I am noticing that the straight line best fit (step 4) is not plotting the probabilities against the x-axis. In other words, how would I plot a line between ‘percentile([x], 99)’ and ‘percentile([x], 1)’?

    Thank you, this post overall is extremely helpful.

    • Julie Sebby (post author)

      You would use Lines & Curves and write an expression using the Percentile function.
