How to Build a Multiple Variable Probit Plot

A few years ago, I wrote a post on how to add a probit plot to a Spotfire project.  I wrote the post early on in my blogging days, and going back to it was a little painful.  It violates several of my “blogging rules” developed over the years.  For example, I should have simplified the example.  I wrote it, and it was even hard for me to reread.  Blog and learn.  Blog and learn.

Anyway, early in the post, I note, ” It is possible to create a single probit plot with multiple variables, but that requires some data wrangling and is not included in this set of instructions.”  Well, these days all my probit plots contain multiple variables.  A manager recently asked me to write up a set of instructions for this common task.  So, here you go.

How to Build a Multiple Variable Probit Plot in Spotfire

Desired Output

First, what exactly are we trying to create?  Here is an example.  You’ll be familiar with the log scale, and as you can see there are multiple lines on the plot for different variables.

High-Level Steps to Desired Output

Creating the desired output is a two-step process.

  1. Unpivot data table
  2. Create probit plot (scatter plot)

Unpivot Data Table

Presumably, the user has a data table that looks like the one shown below.  Each variable is it’s own column.

The first step to creating a multiple variable probit is to unpivot this data with a transformation. The end result will look like the example below.  Columns are transformed into rows.

 

The column names Measure and Value can be changed to names the user finds appropriate. The table will be narrower and taller.

Follow the steps below…

  1. Go to Edit – Transformations.
  2. Select Unpivot from the drop-down menu of transformations.
  3. Move all columns that are staying the same to “Columns to pass through”.
  4. Move all other columns, the columns that are being transformed from columns to rows, to “Columns to transform”.
  5. Name the two new columns and make sure the data types are correct. Measure should be string and Value should be real or integer.
  6. Click OK.

With the data taken care of, now create the probit plot.

Create Probit Plot (scatter plot)

High-Level Steps

Creating the plot is actually more time consuming than wrangling the data.  Adding the secondary lines takes the most time and is optional.

  1. Create a basic plot
  2. Configure the visualization
  3. Format the x axis
  4. Add straight line fit
  5. Add secondary lines (optional)
  6. Filter to appropriate content

Notes:

  1. There are no calculated columns, only a custom expression written on the axis of the scatter plot. Because the expression is written on the axis of the visualization, the calculations will update with filtering.
  2. Filtering on the Measure column will control which variables appear in the plot.
  3. The 90th percentile in Spotfire is equivalent to P10. The 10th percentile in Spotfire is equivalent to P90.

Follow the steps below…

  1. Create basic plot
    1. Add a scatter plot to the page
    2. Set the data table to the “unpivoted” table
  2. Configure the visualization
    1. Place the Value column on the x-axis of the visualization
    2. Right-click on the y-axis of the visualization and select Custom Expression
    3. Enter the following expression — NormInv((Rank([Value],”desc”,[Measure]) – 0.5) / Count([Value]) OVER ([Measure])) as [Cum Probability]
    4. Set the Color by selector to the Measure column
  3. Format the x-axis
    1. Right-click on the visualization, select Properties. Go to x-axis menu.
    2. Set the Min and Max range as shown below in Figure 1.
  4. Add straight line fit
    1. Right-click on the visualization, select Properties. Go to the Lines & Curves menu.
    2. Click the Add button to add a Horizontal Line, Straight Line.
    3. Click OK at the next dialog box.
    4. Click the One per Color checkbox as shown below in Figure 2.
  5. Add secondary lines (see Figure 3 below for example)
    1. Horizontal Lines (P10, P50, P90, etc)
      1. If you are still in the Lines & Curves menu, click the Add button to add a Horizontal Line, Straight Line.
      2. To add P10, P50, and P90, select the Aggregated Value radio button as shown below in Figure 3.
        1. For P10, select 90th
        2. For P50, select Median.
        3. For P90, select 10th
      3. For all other values, select the Custom expression radio button as shown in below in Figure 4. Enter this expression — Percentile([Y], 30) and modify.  For P70, the value is 30.  For P60, the value in the expression should be 40, and so on.
      4. Format the line color, weight, format, and label as desired using the options circled in Figure 5 shown below.
    2. Vertical Lines
      1. The example plot shown has vertical lines at 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, and 10000. Each one must be added individually.
      2. If you are still in the Lines & Curves menu, click the Add button to add Vertical Line, Straight Line.
      3. To the line, select the Fixed Value radio button and enter the value as shown below in Figure 6.
      4. Format the line color, weight, format, and label as desired using the options circled in Figure 5 shown below.

Reference Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

And now you have a multiple variable probit plot!

Spotfire Version

Content created with Spotfire 7.12.

Part 6 – Decomposing Text Areas & Scripts

It’s been 3 weeks since my last post.  I haven’t given up Spotfire blogging.  I just went on a nice long vacation, and then needed a week to recover from my vacation.  But, I’m back on track, and this week’s post is part 6 of the 7 part series on Decomposing Spotfire projects.  This piece focuses on decomposing the text area and scripts within the text area.  Because there is a lot you can put in a text area, this is going to be one of the longer posts in the series. If you are new to the series, here are links to the other posts.
 As usual, each post in the series breaks down into four sections.
 
1. Quick and Dirty (Q&D)
2. Extended Version
3. Documentation
4. Room for Improvement
 
First, the Q&D explains what to look for to get a general idea of what is going on. Then, the Extended Version presents a more complex picture. Documentation provides a few examples of how to document if necessary. Lastly, Room for Improvement looks at a project from the standpoint of making it better.  This post will kick off with a brief summary of the components users can add to text areas.

Text Area Basics

 
If you haven’t spent much time in the text area, check out this post, which explains the breadth of text area usage. To summarize, the text area is a mini web page containing one or more of the elements listed below.
  1. Images
  2. Action controls in the form of buttons, links, or images.
  3. Property controls in the form of drop downs, inputs, list boxes, or sliders
  4. Filters
  5. Dynamic Items in the form of icons, calculated values, bullet graphs, or sparklines

If you are unfamiliar with property controls, check out this $5 template I have posted on the Ruths.ai Exchange. It explains in detail how to use all of the Spotfire property controls.

Users may also further customize text areas with HTML, CSS, IronPython, or JavaScript. In fact, the more I learn to add functionality with these programming languages, the more I love Spotfire. They really allow you to expand upon the base functionality of the application. Now that you understand the scope of text areas, let’s break it down.

Quick & Dirty (Q&D)

Here are the quick and dirty questions to ask and answer to get started.
  1. What types of elements does the text area contain? 
  2. Are scripts or data functions triggered by items in the text area?
  3. Do all elements work (including scripts)?

Before we dive in, I want to note all examples are shown using the Edit HTML option for editing text areas.  I always customize the HTML in text areas and thus don’t use Edit Text Area anymore.

What types of elements does the Text Area contain?

Perform a quick scan to get an idea of the volume of elements you’ll have to investigate. Do you see property controls, action controls, dynamic items, and filters? Are there a lot of buttons? These elements are easy enough to observe (most of the time). It is possible to hide elements. Here’s an example of HTML where an element is being hidden.  In this example, JavaScript is used to pass a value to a hidden property control.  The user doens’t need to see the property control, but the value it contains is referenced elsewhere in the DXP.
Here, the style attribute hides a property control.
Now, IronPython and JavaScripts are only employed in the text area. Buttons and property controls frequently trigger IronPython scripts.  Data functions are different.  They may run automatically when a file opens, when a filter changes, or when data is marking.  However, it’s also possible to configure them to run with the click of a button.   
Before exploring your text areas, go to Edit — Document Properties — Scripts to see a list of IronPython and JavaScripts.  Go to Edit — Data Function Properties to see how many TERR data functions the documents contain.  Now that you know the names, you’ll be aware of them as you review text areas.  If these properties dialogs are empty, you can scratch scripts off the list.  If they do appear, the next step is finding which text area they work in. We’ll discuss this below.
All scripts in a DXP are shown in Edit – Document Properties – Scripts.
 
All data functions live in Edit — Data Function properties.  This example has one data function.

Are scripts or data functions triggered by items in the Text Area?

Once you know what you are looking for, open each text area. Remember to right-click on the text area and select Edit HTML rather than Edit Text Area. This is important because Edit Text Area won’t show you a list of JavaScripts or the HTML.   
Edit Text Area looks very different from Edit HTML.

“JS” signifies JavaScripts.  All JavaScripts used in a text area will appear in the list on the right-hand side.

All elements (including scripts) appear in a list to the right.
 
Now, while you’ll see JavaScripts clearly, IronPython and data functions are different.  They don’t appear on the list.  They are attached to the element, and you must edit the element to see the script.  
The Edit Action Controls dialog includes a list of all scripts in the DXP. If one is highlighted, it is connected to the action control.
Scripts have their own column in the edit Property Control dialog as shown here. This means a script is connected to the property control.

 

If a button connects to a data function, the dialog will look like this when editing.
Now that you know what to be on the lookout for, let’s talk about functionality.
 

Do all the elements work?

This is a very important questions, and you should not assume everything works.  To give you some direction, here’s a list of items and functions to test.
  • Links — Do they connect to the right website? Do they navigate to the correct place?
  • Buttons — Do they perform the described function? Do they navigate to the correct place? It’s not hard to break buttons.  
  • Filters — Are they connected to the right column for the right filtering scheme?
  • Property controls — Do the property controls shown actually control something on the page?
  • Dynamic items — Are the values correct?
Those are the basics. Next, let’s move on to the Extended Version.
 

Extended Version

  1. Are all the elements used?
  2. Are property controls being re-used?
There is always the possibility of unnecessary elements in the Text Area. These would be elements the developer added but never implemented or things he/she intended to go back and delete but didn’t have the time. It happens a lot.  It’s also good to know if the developer reused property controls, which developers do avoid creating more property controls. It’s not necessarily a bad thing. But, it is good to know that modifying a property control in one page will affect a different page.
 
Next, let’s talk a little bit about documentation.
 

Documentation

To be honest, I don’t normally document the text area. However, I can think of a few good reasons to do so.  It would be helpful to know….
  • Which pages contain utilize scripts or data functions?
  • What are those scripts supposed to do?
  • Where are property controls reused?
  • If there is a workflow, how is the user supposed to move thru it or work with it?
Since I haven’t documented my text areas previously, I don’t have any good examples to show.  So, let’s move on to improving the project.
 

Room for Improvement

When thinking about how to improve a project, here are some questions to ask and answer.
  1. Is it pretty enough? Does each page have a consistent look and feel? 
  2. Are you optimizing “real estate” on the page?
  3. Do images need resizing?
  4. Is the HTML a mess?

Is it pretty enough? Does each page have a consistent look and feel? 

Now, this question may seem silly or unnecessary. But, I will say two things. One, before I knew how to write HTML and CSS, I built some really ugly text areas.  Just take a look at my early template submissions on Ruths.ai (anything from 2016).  My submissions improved when I built a template with custom HTML and CSS to use as a starting point.
Two, make it pretty counts for a lot.  Upper management wants to see pretty reports, charts, and graphs.  Also, if the project looks good, people will use it.  As a result, much of my job when building projects is to make it pretty. Never underestimate the importance of aesthetics. 
 

Are you optimizing “real estate” on the page?

Items like legends, panels, axis selectors, and descriptions take up space on the page. It’s possible to hide all of them.  The captions below explain how.
Right-click in the white space of the legend to see what items can be turned on/off.

 

Right-click on a visualization and select Visualization Features to see what items can be turned on/off.
You can also use IronPython to turn panels on and off with the click of a button. Check out this post for an example.  

Do images need resizing?

Sometimes users add images to a text area but don’t take the time to get the image sizing/resolution right. You can modify this in the HTML.
 

Is the HTML a mess?

The single most common source of messy HTML is copy and paste, usually from Word. Here is a painful example. Not only is this difficult to edit, but the copy and paste usually don’t come out how you want it in the text area anyway.  
 
Avoid copying and pasting from Word into Spotfire Text Areas. Learn a little bit of HTML, and you will be much better served. HTML is one of the easiest languages to learn because you can learn a lot in a short period of time. If you need a starter tutorial, check out my Intro to HTML series.
 

Conclusion

Whew, that was a long one!  It’s been quite the journey.  Only one post remains in the series — Decomposing Visualizations. After that, I’ll do a follow up with a post on Best Practices. Then, I’ll round it out with a post on things I would like to see improved in the application to would help users break apart projects.
 

Spotfire Version

 Content created with Spotfire version 7.12.

Part 5 – Decomposing Calculations

This week’s post (part 5 of 7) focuses on decomposing columns and calculations.  If you are new to the series, here are links to the other posts.

  1. Intro to the Series
  2. Decomposing Data Tables & Connections
  3. Breaking Down Data Functions
  4. Untangling Data Wrangling
  5. Decomposing Document Properties

As usual, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. Extended Version
  3. Documentation
  4. Room for Improvement

First, the Q&D explains what to look for to get a general idea of what is going on.  This week, I have combined the Extended Version and Room for Improvement.  Lastly, Documentation provides a few examples of how to document if necessary. 

Quick & Dirty

The quick and dirty on this post is going to be very quick.  The most important thing to understand about calculations is — Do they work?  After that, it’s all additional information and improvement.  There are two easy ways to find out if all calculations are intact.

  1. Warnings upon file opening
  2. Edit > Column Properties > Is Valid

Spotfire will provide warnings in the bottom left-hand side of the screen when the DXP opens indicating issues or problems.  If you see a red exclamation mark when opening the DXP, click on the link, copy the contents to the clipboard and save to review.

This example shows a warning. A red exclamation mark will appear if calculations are broken.

My preferred way to check for broken calculations is to use the Edit > Column Properties menu.  Open this dialog, find the IsValid column, and click on it to sort.  A value of True means the calculation is working.  A value of False means the calculation is broken.  When I say broken, I mean one of the columns or document properties referenced in the calculation is no longer recognized.  Investigate each False column by clicking the Edit button.  Any element not recognized will be red.

Note the value of false indicating a broken column.
This is broken because I deleted the GAS column.

Now, keep reading if you want to go deeper.

Extended Version/Room for Improvement

Once you have confirmed all calculations are working, everything else is really information and/or improvement.  Here are some questions you can ask and answer.

  1. Did the author use a column naming convention? Should you create one?
  2. Did the developer add descriptions to the columns? Should you add some?
  3. Are all of the columns and calculations used? Can you delete/exclude unused calculations or columns?
  4. Did the developer insert calculated columns into other tables? Is it possible to rearchitect?
  5. Is it possible to combine calculations to create a smaller table?

Did the author use a column naming convention? Should you create one?

Naming conventions are great…if you can come up with one and stick to it.  A few weeks ago, I had serious heartburn about a large project I built without a column naming convention.  Most developers don’t because Spotfire doesn’t make it easy.  Naming conventions take time but are well worth the effort.  To find them, simply scroll thru the columns to see if anything jumps out at you.  Developers may employ naming conventions in the following places…

  1. Notations for calculations, like “c.” (a personal favorite as indicated by my screenshots)
  2. Notations for calculation created by transformations “t.”
  3. Units of measure
  4. Data sources
  5. Conventions for tables that might have similar or duplicate columns, such as “c.Master”

Did the developer add descriptions to the columns? Should you add some?

Now, Spotfire does not allow you to embed comments in calculations, as with scripts or data functions.  Why would you need that?  Well, I recently wrote a calculation that was 67 lines of case logic.  It would have been nice to add comments explaining what each line of logic was doing.  However, I do frequently use the column description to explain the details of a calculation or column.  Simply walk down the list of columns looking at each one to see if there is a description.

Are all of the columns and calculations used? Can you delete/exclude unused calculations or columns?

It is also helpful to know if all of the columns are used.  You could delete or exclude (via transformation) unused columns to make the project more user-friendly.  Unfortunately, there is no easy or quick way to figure this out.  Ultimately, you want to know if columns connect to visualizations, custom expressions, calculations, scripts, data functions, or text areas.

Additionally, attempting to delete a calculation will indicate if it is part of another column.  As shown below, Spotfire will warn against deleting it.  It would be a nice improvement on the software if you could see where columns are used in calculations.

Did the developer insert calculated columns into other tables? Is it possible to rearchitect?

I see users do this all the time, and it’s one of my pet peeves.  Avoid this if you can. Why?  First, if the granularity of the tables differs, the data may be wrong.  Second, if a user wants to understand the calculation, they must trace back to the table where the calculation originated.  Third, most likely, the calculation isn’t needed in both tables, which means it’s inefficient. If the developer employed a naming convention, it is easy to find inserted calculated columns.

Here, I have used the nomenclature “c.” to denote the calculated column in the table where they originate.

 

Note: Calculated columns inserted into a table don’t appear as Column Type “Calculated” anymore. Now they are “Imported”, which is why the naming convention is helpful.

 Is it possible to combine calculations to create a smaller table?

When developing a project, it is extremely common to break complex calculations into pieces to see if they work or to troubleshoot.  A good developer will go back and combine them for maximum efficiency.  However, it’s easy to fall into the trap of — It works!  Let’s move on.  There’s no easy way to figure this out other than to just spend time in the project and get to know the calculations.  I recommend undertaking this task after developing a solid understanding of the project.

Documentation

Documentation like this is valuable for two reasons.  First, it allows you to take a comprehensive look at the project.  You get a high-level look, rather than the detailed look of examining one by one. Second, it prevents having to relearn the project.  Check the reference rather than digging thru the project repeatedly.

This matrix explains usage by asking and answering the question — Where is this column used?

 

This matrix shows how calculations connections and relationships.

 

Conclusion

Well, that wraps up this post.  There are only two posts left in the series — Text Areas & Scripting and Visualizations & Data Limiting. Originally, this was a 7 part series, but I will add on at least one more part if not two.  As I’ve written the series, I have also slowly generated a list of things good developers do.  Thus, it makes sense to summarize best practices.  I’ve also inadvertently generated a list of potential improvements to the application.  I’ll write those up, add them to the Ideas portal, and create a summary linking to the Ideas portal that will allow users to upvote those ideas.

Spotfire Version

Content created with Spotfire 7.12.

Part 4 — Decomposing Document Properties

This series has covered decomposing tables and connections, data functions, and data wrangling.  This week, it moves to decomposing document properties.  I have intentionally broken document properties out separately because there are so many different places they can be used.   If you are just tuning in, here are links to the previous posts in the series.

 Intro to the Series

Decomposing Data Tables & Connections

Decomposing Data Functions

Decomposing Data Wrangling

As usual, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. The Extended Version
  3. Documentation
  4. Room for Improvement
In previous posts, both the Q&D and Extended Version sections presented 3 – 5 different questions. Q&D questions were simple in nature, while the Extended Version dug deeper.  In this post, the questions are the same for both sections.  The difference between the two is where you will go to answer them.  The Extended Version will involve more digging.  The Documentation section will provide examples of how to document just like the last post.  Lastly, Room for Improvement looks at a project from the standpoint of making it better. So, let’s get started.

Understanding Document Properties

Now, the most important questions to ask and answer about doc properties are these:

  1. Which types of doc properties are being used?
  2. Are all of the doc properties used?
  3. Are any parts of the project broken because document properties were deleted?

Before we get started on answering these questions, I want to lay out a few critical pieces of information on doc properties.

  1. Types of doc properties in Spotfire
  2. Locations where doc properties can appear
  3. Syntax used to reference doc properties

Types of Document Properties

There are 7 types of doc properties available in Spotfire.  I have included an example of each.

  1. Label
  2. Input field
  3. Input field multiple lines
  4. Drop down
  5. List box
  6. List box multi-select
  7. Slider

Locations Where Document Properties Appear

Doc properties can be engaged in the following areas within a DXP.

  1. Text Areas
  2. Data Functions
  3. Calculations
  4. IronPython and JavaScripts
  5. Visualization Properties

Now, a good place to develop an understanding of doc properties is the Edit menu > Document Properties > Properties tab.

This dialog will show all doc properties in the DXP, including default properties created by the application (MaxMissingTimeParts, FiscalYearOffset).  This list will tell you exactly how many doc properties exist in the DXP.  Use it as a checklist while walking thru the project.  Unfortunately, this dialog fails to detail the type of doc property or where it is used.  This is also where you will delete unused document properties if necessary.

Doc Property Syntax

Expressions reference doc properties in 2 ways.  You will see different syntax depending on how the document property is being used or the type of value in the doc.  In the screenshot below, I inserted the same doc property as a value and again as text to demonstrate the difference.

Quick and Dirty

The Q&D section will focus on text areas, data functions, and scripts because it’s very easy to click thru a project and see document properties engaged in these areas.

Right-click on a text area and select Edit HTML. All document properties in a text area (even hidden ones) appear on the right-hand side of the dialog with the type in parenthesis.

JavaScripts also appear in the menu to the right of the text area.  Select each one and click Edit to look for doc property references.  In order to find doc properties used in IronPython scripts, go to Edit > Document Properties > Scripts tab and look at each script individually to find where doc properties are used.  I don’t have a project handy with an example, so no screenshots for this one (SORRY!). I do have an example of a doc property used in a data function.

Go to Edit > Data Function Properties > Edit Parameters and look for document properties in each input or output parameter. This doc property works with an expression in a data function.

Extended Version

Now, for the Extended Version.  It is much more time consuming (although not difficult) to find doc properties used in calculations and visualization properties.  For calculations, go to the Edit > Column Properties > General tab.  Scroll toward the bottom of the list (calculations are appended below imported columns) and keep an eye out for the document property syntax.  There really isn’t a better way to do this.  In theory, you could delete one document property at a time and look for ‘False’ in the IsValid Column Property, but that seems like overkill.  Perform this task one table, one column at a time.

Next, doc properties can run deep in visualization properties.  Anywhere you can right-click and either Set from Property or write a Custom Expression, doc properties can be used.

In this example, a drop-down property control sets the categorical axis. You must right click and look at the expression to know.

This includes most submenus in the properties dialog.  Pay particular attention to data limiting (in the Data menu), Colors, Lines & Curves, and Show/Hide Items.

I will admit that finding all possible references to document properties can be extremely time-consuming.  If in doubt, leave the doc property in and/or save versions of the project as you go.

Documentation

The screenshots below came from a PPT documentation I assembled for a large project with a LOT of document properties.  The text areas, data functions, IronPython scripts, visualization properties, and calculations all contained doc properties.  Documenting usage became necessary to gain a comprehensive understanding.

First, list out all the document properties.
Create a matrix showing where utilized.
Map interaction with data functions. This maps the object name in the code to the doc property name.
List where used in calculated columns.
Note which page includes doc property references and their type.

Room for Improvement

Lastly, here are a few questions to ask and answer that will improve the project.

  1. Would renaming document properties make their purpose easier to understand?
  2. Can you delete unused doc properties?
  3. Would adding document properties improve the flexibility of the project?

Unfortunately, Spotfire does not allow you to rename document properties.  To rename them, you must essentially create a new document property.  I only “rename” document properties when completely rebuilding a project. I recommend removing unused document properties.  If you delete, do a quick walk-thru of the project after each removal to ensure nothing has broken. Lastly, you might be able to improve the flexibility of a project by adding document properties.  For example, if visualizations have show/hide rules or color rules that are currently hardcoded, you could replace the hard coding with a document property reference to make them more dynamic.

Conclusion

Wow, that was a long post.  Now, you can see why I chose to break doc properties out on their own rather than including with another post. I hope you now have a much more comprehensive understanding of where document properties can be used in a DXP.  They are one of the standout features of the application.  If anyone from TIBCO is reading this, it would be great to have functionality within the application to track down usage of document properties.

Spotfire Version

Content created with Spotfire 7.12.

How to Remove Title from DCA Wrangler

Right now there isn’t a way to take the title off the DCA Wrangler visualization. It takes up some space so it would be nice to tighten it up a bit. Here’s a quick IronPython script that will easily unflag the ShowTitle attribute:

Remember, once you write the script, you’ll still need to press OK to make it permanent and go through the whole process of creating a button. Only then can you delete the button and keep it the way you want. This procedure works for any custom visualization.

Finished product!

Part 3 – Decomposing Spotfire Data Wrangling

I am knee deep in a 7 part series on decomposing Spotfire projects.  In case you missed it, here are the links to the previous posts.  This week, I am covering decomposing data wrangling.  Next, with an understanding of data sources, we’ll look at how they come together and how much data manipulation happens.

 Intro to the Series

Week 1 — Decomposing Data Tables & Connections

Week 2 — Decomposing Data Functions

Now, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. The Extended Version
  3. Documentation
  4. Room for Improvement
First, the Q&D explains what to look for to get a general idea of what is going on.  Then, the Extended Version presents a more complex picture. Documentation provides a few examples of how to document if necessary. Lastly, Room for Improvement looks at a project from the standpoint of making it better.

Quick and Dirty

  1. How do the tables fit together?
  2. Did the project developer choose to merge tables or keep data in separate tables?
  3. Are all transformations intact?
  4. For insert columns, do the joins make sense? For insert rows, are columns mapped correctly?

In the first two posts, it was easy to explain the answers with captions on screenshots. Here, the answers are a bit more complicated and require more explanation.

How: Open the Data Panel > Select a data table from the drop-down > Click on the Cog > Click on Source View.

The Data Panel is the best place in the application to understand how a project comes together. It draws a picture of the flow of data, including data wrangling tasks. Look for numbers indicating a count of transformation steps as shown below. Click on the block with the number to get more information, also shown. What you are looking to understand is how tables are (or are not connected). Are tables built with insert columns or insert rows? Did the developer choose to merge data sources rather than keep them in separate tables? Is one table the input for another? I commonly start with a table and then pivot or unpivot it to get a different arrangement of data.
Pay particular attention to any red exclamation marks or yellow warning signs. The red exclamation mark indicates a transformation needs attention and isn’t working. A yellow warning sign signifies that a transformation may no longer apply. In the screenshot below, you also see a blue “i”.  This popped up when a change data type transformation no longer applied.  Originally, the columns were Currency type, so I changed them in the DXP to Real.  Then, they were updated in the information link, so this transformation was no longer relevant.  I recommend removing any that are not necessary. It saves on load time and just makes the project easier to understand.
 
 
It’s not uncommon (especially in the version before 7.13) to see more than one join to the same table. For years, Spotfire did not allow the user to edit joins. This led to painful inefficiency and duplicate columns in projects. For example, let’s say you want to join Table A to Table B. You think you want 3 columns, so you insert them. Later, you realize you need to more. At this point, there are two options — start over or add another join. No one ever chooses to start over…EVER. Consider consolidating all those joins to improve the project.
This can also add duplicate columns to your table. Spotfire builds tables based on a list of columns to ignore, not a list of columns to add. At first glance, you might think this doesn’t matter. It does matter.  Click on that link for more information.  That’s how projects get messy.  Fortunately, in Spotfire 7.13 and higher, users have the ability to edit joins. HOORAY!!!! We waited for that feature for years.
 
Okay, back to the original questions? Do the joins make sense? What does that mean? Did the developer know what they were doing when they chose the join type or the key column? That’s really what you need to review. If they joined on a string column, such as well name, as opposed to API, there is the risk the join won’t always work. Did they use an inner join when they should have used a left join? As a Spotfire developer, you MUST understand the different join methods.
 
If that’s all you have time for, thanks for playing. If you want to really get your hands dirty, keep reading.
 

Extended Version

  1. Are there any tables that aren’t used?
  2. Are all transformations necessary?
  3. Does the flow of data make sense?
  4. Are any data tables created by data functions?
 It is very common for developers to add tables and not use them. Why would they do that you ask? Well, here are few potential scenarios….
  1. The developer found a better/different data source.
  2. The project began with QA and the swapped to PROD and didn’t delete QA.
  3. They started with a spreadsheet and switched to a SQL table.
  4. The user was testing out something that didn’t work. When it didn’t work, they moved on.
The reasons are endless. As the new owner of the project, you should know what is and isn’t used.  The yellow warning signs are a good indicator of transformations that aren’t necessary. Also, there may be calculations that aren’t used in the project. The more excess you can remove from the project, the better it will be.
 
Next, does the flow of data make sense? This is really a broad architecture question. It’s one that will likely involve a good bit of study on the project. Here are a few examples to get you mind moving.
 
Example 1— I was recently working on a project that needed to calculate a net number. Net is calculated by multiplying a value (production or a cost) by a working interest. The developer of the project merged three tables into a master table. Even though the net calculation was only needed in the master table, the working interest was inserted into all three tables. This is inefficient and makes the project slower.
 
Example 2 — I build a LARGE project that sent data through several different tables. Each table performed a QAQC step. When the data was clean, it merged into a master table. The desired output was a set of calculations. I could have done those calculations in the beginning. Instead, I chose to place them only in the master table rather than passing from table to table.
Lastly, just be aware if data functions are creating any of your tables.  If they are, do they run automatically or do the user have to click something or interact with the analysis?  This is commonly a bottleneck in projects.  Depending on how much data you are working with, those data functions might run slowly.
 

Documentation

The screenshots below come from two projects different projects.  The PPT screenshots are from a slide deck whose purpose was to document the project.  The Excel shots are a breakdown I worked up to understand a project in order to modify it.  
This screenshot has all of my tables on the horizontal and vertical axis. The green 1 indicates joined tables. The blue lines indicate stand-alone tables not connected to anything else.
This matrix shows all my tables, and counts are used and summed to indicate how “deeply” a table is used in a project. DF = data function. Tables with a zero indicate they aren’t used anywhere and could be removed from the project.
In PPT, I created a flow to explain how tables work together to create an output.
 When working with large projects, it’s a good idea to document.  Six months later, you aren’t going to remember how you built the project. 

Improvement

  1. Is there any way to make the table build/project load more efficient?
  2. Is Spotfire the best place to perform the data wrangling?
  3. Would it be possible to speed up the project with a different architecture?
  4. Where are bottlenecks?
 I could write forever on these broad questions. For the sake of time, I am going to link to a few previous blog posts on optimization to help answer the first question.
 Next, I’m seeing more companies deploy Alteryx alongside BI tools like Spotfire and Tableau.  It’s definitely something to think about. I firmly believe there are use cases where data wrangling in Alteryx makes more sense than Spotfire. If you are using Spotfire data wrangling to provide an input to visualizations and that data wrangling is time-consuming, do it in Alteryx.  Because Alteryx provides a single input, it will load faster.  Alternatively, leave the data wrangling in Spotfire when using dynamic or interactive capabilities like marking, filtering, or property controls to move data around. 
Lastly, is it possible to speed up the project with a different architecture? Could you use a data connection instead of an information link for faster loading? Could you connect to a data set in the library instead of an Excel file on a shared drive? Would an ODBC connection be faster loading MS Access than a File connection? Would it be possible to configure a table as an On Demand table rather than loading all the data?  I realize these are more questions than answers, but it should get your head in the right place to make solid improvements.
If you made it this far, gold star for you. Thanks for sticking with it. Next week, I’ll cover decomposing document properties.
 

Spotfire Version

 All content created with Spotfire 7.12.

 

Excel to Spotfire: Targeting Missing Values

In Data Science, prior to using any analytics we always face the problem of having missing values. Deciding when to delete cases or fill missing values is totally dependent on the data set and the target problem. A general practice consists on deleting variables and samples with more that 30% of missing values and then use Multiple Imputation techniques to fill the remaining missing values. In Excel, we can use the replace tool or the filter to approach this problem, and even use Visual Basic to code a more customized solution. In Spotfire, we have the advantage of using more advanced methods by accessing R libraries that contain MCMC, Bayesian and Multivariate Algorithms. Spotfire’s integrative tools really make a difference on how to approach the missing values problem: putting together advanced algorithms, amazing visualizations and user interactivity.

Read More

Build Type Wells using Selected Wells Method in DCA Wrangler

Reserves evaluators often want to build a Percentile Type Well that represents a certain percentile of the population. It is desired to determine a “P90 Type Well”, “P50 Type Well”, or “P10 Type Well”. When expressed this way evaluator is inherently seeking a type well that results in a percentile EUR. The P90 Type Well will be a representative well where there is a 90% chance that the EUR will be that number or greater. There are two published methods for creating Percentile Type Wells, Time Slice approach and Selected Wells approach.

So, the Percentile Type Wells are expected to provide a forecast that will have an EUR consistent with the target probability. This is not possible with the Time Slice method because that method is based on Initial Productivity (IP) and rates. In other words, Time Slice method makes an implicit assumption of a strong correlation between IP and EUR, whereas in a real-world scenario correlation between IP and EUR has a wide scatter, resulting in a Type Well with an EUR that does not represent the desired percentile. Refer to SPE – 162630 for a more technical discussion on the two methods.

In this blog post we will go through a workflow on how to create Type Wells using the Selected Wells method in DCA Wrangler. We created a template that creates Type Wells using Selected Wells Method, Time Slice Method and using individual well forecasts in the Selected Wells Method.

Following is the workflow for Selected Wells Method:

  1. Select wells in an Area of Interest (AOI)

  2. Create an Auto-Forecast for all the selected wells with desired number of years using DCA wrangler. While doing the Auto Forecast we will use a three-segment approach. The first segment with a constrained b – factor between 1 and 2 (this will take care of the characteristic steep initial decline present in most MFHWs in unconventionals). The second segment with a constrained b – factor between 0 and 1. The third segment for terminal exponential decline.

  3. Generate Well DCA and Well DCA Time results in DCA Wrangler. The Well DCA Time table will have the forecast data for all the wells created using the fitted Arps Model. Remember to refresh these tables every time you change the wells in your AOI.

  4. Next, we will find wells for Target EUR probabilities on an EUR Probit plot generated using all the wells in our AOI. We can enter a threshold value (α) to find wells which have their EUR within the (1 ± α) × EUR at the target probabilities. We can also quickly check the number of wells present within the threshold at each of the target probabilities. Adjust the threshold to get a minimum desired number of wells at each of the target probabilities.

  5. Now we can create Percentile Type Wells for our AOI by running DCA Wrangler in the Type Well mode using the wells we selected in our previous step.

Check out the template and try it with your production data.

The Most Important Thing You Don’t Know About How Spotfire Builds Tables

  • After you insert columns, the Spotfire Source Information tab describes the columns ignored rather than the columns added. Why is that?
  • Did Spotfire add columns to your project? Are you seeing columns in your project that you don’t recognize?
  • Do you have duplicate columns in a table, and you have no idea how they got there? Have you tried to delete them, and they keep coming back?

Read More