Author: Julie Sebby

Guest Spotfire blogger residing in Whitefish, MT.  Working for SM Energy's Advanced Analytics and Emerging Technology team!

Learning IronPython for Spotfire


Ever since I learned what IronPython was and what it could do in Spotfire, I’ve had a “goal” to learn IronPython. I put the word goal in air quotes because learning IronPython isn’t a goal. Or at least, it’s not a very good one because it’s not measurable.  How do I know when I’ve accomplished it? And that’s not to say I haven’t learned any IronPython over the years.  Clearly, I have.  I write about it a lot, but I aspire to deep knowledge.

The struggle was knowing how to compartmentalize learning IronPython. How could I break it down into something measurable? Finally, in October I committed to 100 hours of IronPython.  Now, that’s measurable.  The amount of time is somewhat arbitrary.  However, I expected that if I spent 100 hours on IronPython, I would know significantly more than when I started and be able to explain it to other people.
So, did that happen? OF COURSE! Heck, I’m only 16 hours in, and I understand so much more now than I did 16 hours ago. Now, you might be saying — Wait, you set the goal in October, and you’re only 16 hours in? Yes. That is correct. I do have a day job and this awesome blog and HOLIDAYS. Stuff happens, but life (and my career) is a long game, so no giving up.
That is my somewhat long-winded introduction to this post. I know many users struggle with applying IronPython in Spotfire, and I want to make that easier for you. Since it’s a bit of an unknown path for me, I can’t break this down into a series like I have other topics. You’ll just see Learning IronPython posts now and again.
This post in particular is going to cover the following…
  1. An explanation of why learning IronPython is so hard
  2. My initial learning goals
  3. My learning methods
  4. Some syntax and structure via a code example
Please feel free to comment if you see places where I am going awry, but be nice.

Why is learning IronPython so hard?

When learning a new coding language, many users look for books or online tutorials. If you google “learn IronPython”, the results are surprisingly sparse. The first result is to a set of documentation, and the second is Quora. Anytime Quora is in your top results, you are in trouble.  The fourth reference is in the UK….you see where this is going.
If you search Amazon for books on IronPython, you get the results shown below.  After 3 or 4 books, the results shift to books about Python. None of the results look suitable for beginners.  Strike two.
At some point, you have to ask: what is IronPython? A quick search will tell you…
IronPython is an open-source implementation of the Python programming language which is tightly integrated with the .NET
That might lead you to think that you need to learn Python, and that would be helpful, but Wikipedia will tell you…
IronPython is written entirely in C#, although some of its code is automatically generated by a code generator written in Python. (Wikipedia)
The hamster wheel is now starting to turn, and perhaps you remembered that Spotfire extensions are also written in C#, and you have seen a reference to C# in the TIBCO API documentation, as shown here.
Wait….you mean I should actually be learning C#??? Yup.  Now, things get a lot easier. 

Learning Goals

Now, it may seem like a bit of backtracking, but I want to explain what I set out to learn. If you’ve read the blog for any length of time, you know I’ve posted IronPython code snippets before. How can I do that but not really know IronPython? Easy…I learn by looking at code online, breaking it down bit by bit, and learning by doing (i.e., modern learning). However, without a solid understanding of the underlying architecture, that method is limited. Thus, my primary goals for the first 10 hours were…
  1. Find better resources
  2. Get an understanding of IronPython structure
  3. Get an understanding of the Spotfire API
  4. Apply that understanding in Spotfire code examples
It turns out, that took 16 hours, not 10. I had to put in 16 hours of learning before I felt confident enough to write this post.

Learning Methods

I started my learning process by evaluating what I knew and didn’t know. Right before I kicked off this journey, I learned about the Spotfire IronPython Quick Reference.  This is an amazing website for learning IronPython for Spotfire. I started with this code snippet from the Quick Reference to make a basic assessment.
As a result, here are a few questions that came up.  
  1. Why is there no reference to the namespace? Most IronPython scripts I’ve seen start with an “import something” command.
  2. IronPython is an object-oriented programming language. How do I differentiate between developer-named objects and syntax that is part of the code structure?
  3. I know references to the API should be capitalized. If that’s true why are “page” and “visual” in page.Visuals and visual.Title lowercase?
  4. I can see that “Pages” is a property in the Document class. “Visuals” and “Title” are properties in the Page class. “Title” is also a property in the Visual class. Thus, why does the code only call the Document class with “Document.Pages”? There is no reference to the Page class or the Visual class. (This question might be difficult if you aren’t familiar with traversing the Spotfire API).
So, let’s answer those questions.

Syntax & Structure

Why is there no reference to the namespace? Most of the classes in the Spotfire.Dxp.Application namespace load by default, so you don’t have to import. There are some exceptions like DocumentSaveSettings and DocumentOpenSettings. Thank you TIBCO support for that answer.
How do I differentiate between developer-named objects and code structure? Simple. You test it. In the code snippet provided, you might wonder if “page” in “for page in Document.Pages:” is part of the code structure or an object. Replace “page” with any other word. If the code runs, it was a named object. If it fails, it’s part of the code structure.  It’s also lowercase, so that is a hint.  API names are capitalized (Document, Pages, Visuals, Title), so a lowercase name is usually one the developer chose.
Why are “page” and “visual” in “page.Visuals” and “visual.Title” not capitalized? They are objects, not references to the API. They could be any word.
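To make the rename test concrete, here is a minimal sketch in standard Python 3, not IronPython running inside Spotfire. The Document object below is a hypothetical stand-in built with types.SimpleNamespace, and the page and visual titles are made up; it only mimics the shape of Document.Pages, page.Visuals, and visual.Title. Renaming the loop variables changes nothing, while misspelling an API property name such as Title fails.

```python
from types import SimpleNamespace

# Hypothetical stand-in for Spotfire's Document object (NOT the real API).
# Capitalized attribute names mimic the API; the titles are made-up examples.
Document = SimpleNamespace(Pages=[
    SimpleNamespace(Title="Summary", Visuals=[SimpleNamespace(Title="Oil by Month"),
                                               SimpleNamespace(Title="Gas by Month")]),
    SimpleNamespace(Title="Details", Visuals=[SimpleNamespace(Title="Well Table")]),
])


def titles_original_names():
    titles = []
    for page in Document.Pages:          # 'page' is a name the developer chose
        for visual in page.Visuals:      # 'visual' is a name the developer chose
            titles.append(visual.Title)  # 'Pages', 'Visuals', 'Title' come from the (mock) API
    return titles


def titles_renamed():
    titles = []
    for tab in Document.Pages:           # any other loop variable name works just as well
        for chart in tab.Visuals:
            titles.append(chart.Title)
    return titles


def api_name_matters():
    try:
        return Document.Pages[0].title   # lowercase 'title' is not an attribute
    except AttributeError:
        return "AttributeError"


print(titles_original_names() == titles_renamed())  # True
print(api_name_matters())                           # AttributeError
```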
I can see that “Pages” is a property in the Document class. “Visuals” and “Title” are properties in the Page class. “Title” is also a property in the Visual class. Thus, why does the code only call the Document class with “Document.Pages”?  To follow along with the answer to this, go to the Spotfire API reference. Open the Spotfire.Dxp.Application namespace (first namespace in the API). Click on the Document Class. The intro to the Document Class says this:
A document opened in a running instance of TIBCO Spotfire is referred to as an Analysis Document. The document not only contains a series of metadata information (see DocumentMetadata), but it also contains references to the data itself (see DataManager), and to various other components being part of the document, such as pages, filterings, bookmarks, etc. As soon as data has been opened in TIBCO Spotfire, an instance of this class can be accessed through the Document property of the AnalysisApplication. This is regardless of whether the data was opened through the user interface or programmatically. (TIBCO Support)
There’s a lot going on in that statement.  To explain it, go to the Spotfire.Dxp.Application namespace and click on the line below it that says AnalysisApplication Class.
You can see that Document is, in fact, a property of the AnalysisApplication class, as indicated in the description.  If you click on Document, it jumps to the Document class.  How does that answer the original question?  We’ll get there.  I just wanted to start with an explanation from the API and show you how navigating it works.  Our original question asked about the Document class, which is where you should be in the API reference if you are following along. So…
Pages is a property of the Document class. The code “for page in Document.Pages:” will get the pages of the document. More specifically, it is getting a collection of pages.  If you click on the Pages property in the Document class, it will take you to the screen shown below.  There, you’ll note that the Type = PageCollection. The pages of the document are part of a PageCollection.  Now, click on PageCollection.
Now you jump down to the PageCollection class, which sits just below the Page class in the reference.  A PageCollection is a collection of Page objects, so each item the loop hands you (page, in “for page in Document.Pages:”) is a Page and has all of the Page class properties.  Title and Visuals are both properties of the Page class (as noted a while ago in the post).  In the same way, page.Visuals is a collection of Visual objects, so each visual has a Title.  The code only calls the Document class because you can reach Title and Visuals by navigating the hierarchy as we just did.
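The collection idea can be sketched the same way. In this hypothetical Python 3 mock (the classes imitate, but are not, the Spotfire API), Pages returns a PageCollection object rather than a page. Iterating over that collection hands back Page objects one at a time, which is why page.Title and page.Visuals work inside the loop without the code ever naming the Page class.

```python
class Visual(object):
    """Mock of a visual: only the Title property is modeled."""
    def __init__(self, title):
        self.Title = title


class Page(object):
    """Mock of a page: a Title plus the visuals on it."""
    def __init__(self, title, visuals):
        self.Title = title
        self.Visuals = list(visuals)


class PageCollection(object):
    """Mock collection: holds Page objects and yields them when iterated."""
    def __init__(self, pages):
        self._pages = list(pages)

    def __iter__(self):
        return iter(self._pages)


class Document(object):
    """Mock document: its Pages property is a PageCollection, not a Page."""
    def __init__(self, pages):
        self.Pages = PageCollection(pages)


doc = Document([Page("Summary", [Visual("Oil by Month")]),
                Page("Details", [Visual("Well Table")])])

print(type(doc.Pages).__name__)  # PageCollection
for page in doc.Pages:           # each item yielded by the collection is a Page
    print(type(page).__name__, page.Title, [v.Title for v in page.Visuals])
# Page Summary ['Oil by Month']
# Page Details ['Well Table']
```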
As someone new to understanding the structure, I find this a bit confusing.  Because the API reference is organized like a tree, I expect to navigate it in a certain way, and that is clearly not how it should be navigated.  Hopefully, this will make more sense in another 10 hours or so.


Once I started searching for C# references, I found several very good tutorials worth sharing. To keep everything straight, I started saving tutorials and links by C# structure/syntax so I could have them handy and not have to search each time.  I’ve made the list available for you in this post. I will keep the list updated as I continue learning. Please feel free to comment with other links, and I’ll add them.

Spotfire Version

All content created with Spotfire 7.12.

How to Add Lines to a Probit Plot with IronPython

A few weeks ago, I wrote a post detailing how to create a multiple variable probit plot.  This post improved upon an older post on creating a single variable probit plot.  Part of those instructions included adding several “supplemental” lines via Lines and Curves like P10, P90, and the Median.  This is actually the most time-consuming part of the process.  Each line must be added one by one.

Today, while reviewing my instructions, I realized I had to do better. I know this can be done with IronPython! A quick Google search pulled up this TIBCO community post that I was able to use as a guide. I modified that script to work for my probit plot use case. Now, I have a piece of code that will add all of those lines and is easily modifiable and scalable.

The Code

Here is what the code looks like in my DXP.  I made the following modifications from TIBCO’s original:

  1. Changed BarChart to ScatterPlot to suit my visualization
  2. Modified the expressions from an average to the Percentile, P10, P90, and Median.
  3. Added code for vertical lines.

Code for Copy & Paste

from Spotfire.Dxp.Application.Visuals import *

# "sp" is a script parameter (type Visualization) that points at the scatter plot
scatterPlot = sp.As[ScatterPlot]()

# Add horizontal straight lines (each call adds one line to Lines & Curves;
# running the script again adds duplicates)
horizontalLine1 = scatterPlot.FittingModels.AddHorizontalLine('P90([Y])')
horizontalLine2 = scatterPlot.FittingModels.AddHorizontalLine('P10([Y])')
horizontalLine3 = scatterPlot.FittingModels.AddHorizontalLine('Median([Y])')
horizontalLine4 = scatterPlot.FittingModels.AddHorizontalLine('Percentile([Y],20)')
horizontalLine5 = scatterPlot.FittingModels.AddHorizontalLine('Percentile([Y],30)')
horizontalLine6 = scatterPlot.FittingModels.AddHorizontalLine('Percentile([Y],40)')
horizontalLine7 = scatterPlot.FittingModels.AddHorizontalLine('Percentile([Y],60)')
horizontalLine8 = scatterPlot.FittingModels.AddHorizontalLine('Percentile([Y],70)')
horizontalLine9 = scatterPlot.FittingModels.AddHorizontalLine('Percentile([Y],80)')

# Add vertical straight lines at fixed x-values
verticalLine1 = scatterPlot.FittingModels.AddVerticalLine('10')
verticalLine2 = scatterPlot.FittingModels.AddVerticalLine('100')
verticalLine3 = scatterPlot.FittingModels.AddVerticalLine('1000')
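Because every line differs only in its expression, the expressions can also be generated from a list, which is one way to make the script easier to modify and scale. The sketch below uses a hypothetical helper, percentile_line_expressions (my own name, not part of the Spotfire API). It sticks to syntax that works in both Python 2.7 and 3, so it should also run in Spotfire's IronPython, though the Spotfire calls are only shown as comments; they reuse scatterPlot and the FittingModels methods from the script above.

```python
def percentile_line_expressions(column, percentiles):
    """Build expression strings like the ones in the script above.

    90 and 10 use the P90()/P10() functions, 50 uses Median(),
    everything else uses Percentile([column], p).
    """
    expressions = []
    for p in percentiles:
        if p == 50:
            expressions.append("Median([%s])" % column)
        elif p in (10, 90):
            expressions.append("P%d([%s])" % (p, column))
        else:
            expressions.append("Percentile([%s],%d)" % (column, p))
    return expressions


# Same lines, in the same order, as the hand-written script.
horizontal = percentile_line_expressions("Y", [90, 10, 50, 20, 30, 40, 60, 70, 80])
vertical = ["10", "100", "1000"]
print(len(horizontal) + len(vertical))  # 12 lines in total

# Inside the Spotfire script (after scatterPlot = sp.As[ScatterPlot]()):
# for expr in horizontal:
#     scatterPlot.FittingModels.AddHorizontalLine(expr)
# for x in vertical:
#     scatterPlot.FittingModels.AddVerticalLine(x)
```

Adding a P25 line, for example, then means adding 25 to the list rather than writing another line of code.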

Detailed Steps

  1. Add a Text Area to the page, right-click, select Edit HTML.
  2. Click the Add Action Control button.
  3. Name the button.
  4. Click the Script button.
  5. Click the New button.
  6. Name the script.
  7. Copy and paste code.  Modify to suit.
  8. Add a parameter called “sp” and connect it to your visualization.
  9. Run the script to test it. Click OK to close the script window.
  10. Modify the HTML as shown to hide the button.  You don’t want to click it again.

A few notes:

  1. Once you run the script, it does not need to be run again.  When you clicked Run Script the first time, 12 lines were created (9 horizontal and 3 vertical).  Clicking again will create another 12 lines.  I made this mistake when testing.  Then, I had to delete a ton of lines one by one! (Please upvote my Idea to allow users to delete more than one line at a time.)
  2. The script creates the lines, but you still have to edit them one by one.  This might also be possible with IronPython, but I haven’t dug that far yet.
  3. If you copy and paste code from a web page (including this one), check the quotes.  Spotfire won’t recognize curly “smart” quotes, so replace them with straight quotes.

This should make setting up probit plots just a little bit faster.  You can also modify this code any time you want to add multiple lines to a different visualization or another probit plot.

Spotfire Version

Content created with Spotfire 7.12.

Spotfire Best Practices

This post is a follow-up to the 7-part series I wrote on Decomposing Spotfire Projects.  The idea first came to me when I documented a large enterprise project. I didn’t build it. It wasn’t my baby, but it was important to the company. Management wanted it documented in case the author moved on. I also think about this subject every time I build a project for a user. What’s the best way to go about helping the user get a solid grasp on the project?  Is there a formulaic way to decompose a Spotfire project, or at least a good order of operations?  Decomposing a project you didn’t build is often quite difficult and can seem like trying to assemble a jigsaw puzzle with 1,000 pieces.  Where do you even start?

So, I broke it up into 7 pieces.  As I wrote the series, I realized I was also writing a bit of a best practices guide or a do’s and don’ts guide for project development.  In an effort to help my readers, I thought a summary post would be helpful.  Thus, this post summarizes 3 – 5 pieces of development advice from each part in the series.  Keep in mind, I am not going to elaborate in great detail.  All the info you need is in each post.  Use this to jog your memory.

Data Tables & Data Sources

  1. Use a naming convention.
  2. Delete unused tables, connections, and data sources.
  3. Limit data as much as you can.  Less is more.

Data Functions

  1. Leave comments in your code to explain what the code is doing in case someone has to modify it later.
  2. Provide a general description of what the data function is doing in the description section.
  3. Include code that will install and attach any required packages.

Data Wrangling

  1. Clean up your joins.  Don’t join to the same table over and over again.  Delete duplicates and have one join and only one join from table to table.
  2. Architecture is the single most important component of any project.  Put time into planning it out.  Don’t just start building.
  3. Document your architecture choices.  Include information on not just what you did but why you did it.

Document Properties

  1. Delete unused document properties.
  2. Document what each document property should influence or control.
  3. Use a naming convention.

Columns & Calculations

  1. Delete unused calculations or columns.
  2. Add a description to the calculation if it gets complex.
  3. Use naming conventions.
  4. Use Exclude columns transformations to exclude any columns that aren’t needed.

Text Areas & Scripts

  1. Don’t copy and paste from Word into a text area.  Just don’t do it.
  2. Include descriptions in scripts and data functions that explain where they are used or what they impact in the project.
  3. Use the text area to explain how the user should move thru a workflow or text area.
  4. Use HTML and CSS.  That’s not really a best practice per se, but learning even a little bit of HTML will make text areas so much better.

Visualizations & Data Limiting

  1. Remove the unnecessary.  Hide column selectors.  Only show what a user needs to see in a legend if a legend is even needed.
  2. Make data limiting as visible as possible with legend items or naming conventions.
  3. Minimize usage of custom expressions in visualization properties.
  4. Don’t put too many visualizations on a page.  Four is usually the limit.
  5. Articulate the business question you are trying to ask and answer with visualizations.


Now, I know some of these may seem super obvious, but I never cease to be amazed at what people create or leave behind.  Please take these to heart.  The developer that comes after you will be thankful.  

Moving Averages in Spotfire

This week a user contacted me for assistance setting up a 3-month moving average calculation.  He’d already attempted it, but the result was wrong.  This is a common problem with the moving average function because of the way it’s built.  That’s not to say that it’s built wrong.  It just wasn’t built the way he wanted it to be built.

To explain, I will begin with an example of the Moving Average aggregation used on the y-axis of a visualization because it’s the easiest to understand.  Then, I’ll move on to a moving average calculation in a calculated column, which is a bit different.

Example of Moving Average Written on Y-Axis

The first screenshot below shows the configuration of the bar chart below it.  We are using the Moving Average aggregation and have chosen an Interval size of 3.  The actual expression makes use of the Last Periods node navigation method.  Note, Spotfire uses the term “Last Periods”.  A period is whatever you put on the x-axis, whether that be days, weeks, or months.  In our example, a period is a month.

In its simplest form, the expression sums up oil prod and then averages it over the last three periods on the x-axis.   However, it’s a bit more complex because there is also an If statement after the average.  The If statement counts the periods on the x-axis.  A result is returned only when the count is 3.  If the count is not 3, null is returned.  That’s why the result is null until the visualization has 3 periods of data.
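To see why the first two periods are null, here is a minimal Python 3 sketch of the same logic, outside Spotfire. It assumes the monthly values have already been summed (one value per x-axis period, with no missing months) and uses None for null; the function name and the numbers are hypothetical.

```python
from typing import List, Optional


def moving_average_last_periods(values: List[float], periods: int = 3) -> List[Optional[float]]:
    """Trailing average over the current period and the (periods - 1) before it.

    Like the axis expression, a result is returned only when the window
    holds exactly `periods` values; otherwise the result is None (null).
    """
    result: List[Optional[float]] = []
    for i in range(len(values)):
        window = values[max(0, i - periods + 1): i + 1]
        if len(window) == periods:
            result.append(sum(window) / periods)
        else:
            result.append(None)
    return result


monthly_oil = [300.0, 330.0, 360.0, 390.0]  # made-up monthly sums
print(moving_average_last_periods(monthly_oil))  # [None, None, 330.0, 360.0]
```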


When you are using a similar expression in a calculated column, it works a bit differently.


Example of Moving Average as Calculated Column

In this example, I am going to use a 3-day moving average rather than a 3-month moving average.  The premise is the same.    I calculated the 3-day moving average with this expression:

Avg([Gas Prod]) over (Intersect([Well Name], LastPeriods(3,[Prod Dt])))

That expression says: average Gas Prod for each Well Name over the last three periods, as defined by Prod Dt.  In this case, Prod Dt is a day of the month.  Thus, the expression will average Gas Prod for each Well Name over the last three days.  Here is the data:

As you can see, Spotfire takes the first day of gas prod and divides by one.  Then it adds day 1 and day 2 and divides by 2.  Thus, the first two days aren’t really a 3-day moving average.  This may or may not work for you.  If you don’t want to see the average until 3 days have passed, simply add 2 more calculations.  One is a counter for the days.  The other is an If statement. This will return null until 3 days have passed, just as the previous example did.

Counter ([Count Days]): Rank([Prod Dt],[Well Name])

3-day moving average: If([Count Days]<3,null,Avg([Gas Prod]) over (Intersect([Well Name],LastPeriods(3,[Prod Dt]))))
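The difference between the two calculated columns can be sketched in plain Python 3 as well. This is a hypothetical model, not Spotfire's implementation: it assumes one row per well per day and unique, consecutive Prod Dt values within each well, so that LastPeriods(3,[Prod Dt]) is simply the current day and the two before it. Grouping by well plays the role of Intersect([Well Name], ...), and the position within each well plays the role of the [Count Days] rank.

```python
from collections import defaultdict


def three_day_averages(rows, strict):
    """rows: list of (well name, day number, gas prod).

    Returns {(well, day): average}.  With strict=False the first two days of a
    well average whatever days exist (like the first calculated column); with
    strict=True they are None, like If([Count Days]<3, null, ...).
    """
    by_well = defaultdict(list)
    for well, day, gas in rows:
        by_well[well].append((day, gas))

    averages = {}
    for well, series in by_well.items():
        series.sort()  # order each well's rows by day (Prod Dt)
        # count_days mirrors Rank([Prod Dt],[Well Name]): 1, 2, 3, ... per well
        for count_days, (day, _) in enumerate(series, start=1):
            window = [gas for _, gas in series[max(0, count_days - 3):count_days]]
            if strict and count_days < 3:
                averages[(well, day)] = None
            else:
                averages[(well, day)] = sum(window) / len(window)
    return averages


rows = [("A", 1, 10.0), ("A", 2, 20.0), ("A", 3, 30.0), ("A", 4, 40.0),
        ("B", 1, 5.0), ("B", 2, 5.0), ("B", 3, 8.0)]
partial_avg = three_day_averages(rows, strict=False)
strict_avg = three_day_averages(rows, strict=True)
print(partial_avg[("A", 2)], strict_avg[("A", 2)])  # 15.0 None
print(partial_avg[("A", 4)], strict_avg[("A", 4)])  # 30.0 30.0
```

Wells never share a window here, which is the point of the Intersect with [Well Name]: well B’s first day starts a fresh average instead of borrowing well A’s last two days.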

Hopefully, this clarifies how the function works and also how to use the Last Periods node navigation method.

Spotfire Version

Content created with Spotfire 7.12.

Part 7 — Decomposing Visualizations and Data Limiting

This is the final blog post in my 7 part series on decomposing Spotfire projects.  This series is one of the longest I’ve ever written.  Two parts in, I considered writing an ebook instead of a blog series!  This week’s post focuses on decomposing visualizations and data limiting.  If you are new to the series, here are links to the other posts.

As usual, each post in the series will break down into four sections.
  1. Quick and Dirty (Q&D)
  2. Extended Version
  3. Documentation
  4. Room for Improvement
First, the Q&D explains what to look for to get a general idea of what is going on. Then, the Extended Version presents a more complex picture. Documentation provides examples of how to document if necessary. Lastly, Room for Improvement looks at making the project better. Before diving in, I would like to provide context on two subjects — the potential complexity of visualizations and the breadth of data limiting.  Let’s start with the potential complexity of visualizations.

Understanding the Potential Complexity of Visualizations

On one hand, this complexity is awesome.  It allows for a ton of customization in visualizations.  But it’s a double-edged sword when inheriting a project.  What do I mean exactly?  Custom expressions can be added to any column selector, and column selectors are everywhere.  They exist in every visualization properties menu.  Thus, without going thru every menu, right-clicking, and selecting Custom Expression, it’s almost impossible to know where they are used.  The same is true for property controls.  You can also ‘Set from Property’ against any column selector.

This is why I take screenshots or save copies of DXPs that I am modifying.   From a decomposition standpoint, this is problematic.  There’s not a lot you can do.  Just know what’s possible.  Next, I want to make sure the reader is aware of all the ways in which a visualization can be limited.  So, let’s talk about data limiting.

Data Limiting

Data limiting is actually quite a large topic.  I’ve wanted to write comprehensively about it for a while and have the first post in a series drafted.  Thus, this post will stay high level.  That future series will go into greater detail.  The questions we are asking and answering right now are — In what ways can a visualization be limited?  Where are all the possible places you might find data limiting?  Here’s the summary.
  1. Filtering with the filter or data panel
  2. Filtering schemes (Visualization Properties — Data menu — Limit data using filtering section)
  3. Details visualizations or limiting with marking (Visualization Properties — Data menu — Limiting with Marking section)
  4. Limiting with expression  (Visualization Properties — Data menu — Limiting with Expression)
  5. Show/hide rules  (Visualization Properties — Show/Hide Items menu)
  6. Subsets (Visualization Properties — Data menu — Subsets menu)
  7. Relations (Edit menu — Data Table Properties — Relations tab)

You might be wondering why I threw relations in there.  Relations integrate filtering across tables.  I’ve had enough users have problems with it that I thought it worth mentioning separately.

Limiting with 2 – 7 shown here.

Okay, now let’s get into the quick and dirty.

Quick & Dirty (Q&D)

Here is the first set of questions you want to ask and answer about visualizations.
  1. Do all of the visualizations work? Do you see any obvious errors?
  2. Is data limited and if so, where or how?
  3. Did the developer include add-ons like lines and curves?

Do all of the visualizations work? Do you see any obvious errors?

Here is an example of a visualization with an error.  A column used in the visualization can’t be found.  When you see these errors, the first and most obvious places to look are the x- and y-axes. But don’t forget that columns can be used in custom expressions anywhere in the Visualization Properties menus.  They can be a bit difficult to find.  If the problem isn’t on the axis, start with the Data menu and work your way down the menu list.

Is data limited and if so, where or how?

The data limiting section above explains where to look for data limiting.  Without checking every single menu location, you can also get good information from the legend as shown in this example.

You may need to turn on the data limiting and show/hide items in the legend by right-clicking in the white space of the legend.

Did the developer include add-ons like lines and curves?

Lines and curves may not be super high on the priority list, but it is a good idea to know if they are used.  If the developer included a label, they will be easy to identify, as shown in the example below.  They are also identifiable by different formats.  You can only customize the format of lines added via the Lines and Curves menu.  In a generic line chart, all lines will be solid.

Extended Version

If you want to dig deeper, ask and answer these questions.

  1. Is the visualization controlled by property controls?
  2. Did the developer write expressions on an axis of a visualization?
  3. Did the developer build custom expressions into visualizations?

Is the visualization controlled by property controls?

As mentioned above, property controls can be attached to any column selector.  This makes it difficult to find everything they control.  However, I want to show you a little indicator you might not have noticed.  In the screenshot below, the exact same chart is duplicated.  The top chart’s Line by variable is controlled with a property control.  The bottom chart’s Line by variable is not.  The subtle difference is the presence of the down arrow and plus sign.  When property controls are attached, these are no longer options.  Keep an eye out for this.

Did the developer write expressions on an axis of a visualization? Did the developer build custom expressions into visualizations?

Both of these questions might be hard to answer without right-clicking and looking for a custom expression.  However, there are clues.  In the example below, the developer has written an expression on the y-axis.  I know this because the [Axis.] syntax is used. Unfortunately, if the developer used the “As” keyword to rename the expression, you will only see the given name.

To learn more about writing expressions on the axis, check out this link.  The post is a bit old, so I apologize if it’s hard to read.  I was new to blogging when I wrote it.

Documentation

I am guilty of not documenting my visualizations.  As I write, I realize that I should.  The application doesn’t natively allow tracing of data limiting, custom expressions, or property control usage.  Thus, it’s up to the developer to leave some breadcrumbs.  It might be a good idea to keep lists of …
  1. Property control connections
  2. Custom expression locations
  3. Data limiting in visualizations

You can also create rules for the project, such as…

  1. Always show data limiting in the legend when applicable.
  2. Use an asterisk to indicate that a property control defines a column selector, as shown below.
  3. Use a similar convention for custom expressions in the naming.


Next, let’s talk about making the project better.

Room for Improvement

  1. Does the vis tell the story? What are the questions you are trying to ask and answer? Does the project flow well?
  2. Would zoom sliders make the data easier to consume?
  3. Are the fonts and sizing easy to read?
  4. Does the user need to see everything on the page?

Does the visualization tell the story? What are the questions you are trying to ask and answer? Does it flow well?

Spotfire projects always seem to start out nice and clean and then slowly morph into a bit of a mess.  It’s easy to clutter up visualizations by adding….

  • Big text areas that aren’t using HTML and CSS.
  • Too many visualizations on the page.
  • Too many visualizations attempting to answer the same question.
  • Visualizations that don’t answer the question clearly.

One of the best ways to improve a project is to clearly define the business questions the visualizations are supposed to answer.  Then, build around the order of those questions and other potential questions that might arise while working thru it.

Would zoom sliders make the data easier to consume?

When I first started using Spotfire, zoom sliders were one of my favorite features. In Excel, I had to duplicate bar charts to make up for a jam-packed x-axis.  Add them to any x or y-axis by right-clicking on the visualization and choosing from among the options in Visualization Features.  To take the analysis to the next level, incorporate this IronPython script into a button to easily reset zoom sliders.

Are the fonts and sizing easy to read?

The default font sizing may not work with your monitors and the content shown.  There are three ways to modify it.

  1. Modify every visualization (not the best option) in Visualization Properties — Fonts menu.
  2. Modify the theme, which controls fonts and sizing across the analysis.
  3. Use HTML and CSS to customize the fonts.
Click the black and white button in the toolbar to access themes. Then, review all of the themes menus for font and sizing customization in different places.
Example of fonts in visualization properties.

Does the user need to see everything on the page?

Visualizations can get overcrowded, but you can hide many of the “standard” items on a plot.  Simply right-click on the visualization, select Visualization Features, and then deselect as desired.  It would be nice if there were a way to apply this across all visualizations, but this feature isn’t available yet.  In particular, if you are worried about users changing visualizations, you can hide the axis selectors.  New users won’t know how to turn them back on.



Whew!  That was a seriously long online discussion.  I hope you found the content useful.  Next up, I have a series coming out on data limiting and another intermittent series that I’m calling “Simple R”.  Until then, Happy Holidays!

Spotfire Version

Content created with Spotfire 7.12.

How to Build a Multiple Variable Probit Plot

A few years ago, I wrote a post on how to add a probit plot to a Spotfire project.  I wrote the post early on in my blogging days, and going back to it was a little painful.  It violates several of my “blogging rules” developed over the years.  For example, I should have simplified the example.  I wrote it, and it was even hard for me to reread.  Blog and learn.  Blog and learn.

Anyway, early in the post, I note, “It is possible to create a single probit plot with multiple variables, but that requires some data wrangling and is not included in this set of instructions.”  Well, these days all my probit plots contain multiple variables.  A manager recently asked me to write up a set of instructions for this common task.  So, here you go.

How to Build a Multiple Variable Probit Plot in Spotfire

Desired Output

First, what exactly are we trying to create?  Here is an example.  You’ll be familiar with the log scale, and as you can see there are multiple lines on the plot for different variables.

High-Level Steps to Desired Output

Creating the desired output is a two-step process.

  1. Unpivot data table
  2. Create probit plot (scatter plot)

Unpivot Data Table

Presumably, the user has a data table that looks like the one shown below.  Each variable is its own column.

The first step to creating a multiple variable probit is to unpivot this data with a transformation. The end result will look like the example below.  Columns are transformed into rows.


The column names Measure and Value can be changed to names the user finds appropriate. The table will be narrower and taller.

Follow the steps below…

  1. Go to Edit – Transformations.
  2. Select Unpivot from the drop-down menu of transformations.
  3. Move all columns that are staying the same to “Columns to pass through”.
  4. Move all other columns, the columns that are being transformed from columns to rows, to “Columns to transform”.
  5. Name the two new columns and make sure the data types are correct. Measure should be string and Value should be real or integer.
  6. Click OK.
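If it helps to see what Unpivot actually does to the rows, here is a minimal plain-Python sketch (not Spotfire code).  The table, the Well/Oil/Gas column names, and the `unpivot` function are hypothetical; Measure and Value match the column names used above.  Every pass-through column is copied onto each new row, and each transformed column becomes one Measure/Value pair.

```python
def unpivot(rows, pass_through, transform, name_col="Measure", value_col="Value"):
    """Wide -> long: each transformed column becomes its own row."""
    long_rows = []
    for row in rows:
        for col in transform:
            new_row = {c: row[c] for c in pass_through}  # pass-through columns are copied as-is
            new_row[name_col] = col        # the old column name becomes a string value
            new_row[value_col] = row[col]  # its cell lands in a single numeric column
            long_rows.append(new_row)
    return long_rows


# Hypothetical wide table: one column per variable.
wide = [
    {"Well": "A-1", "Oil": 120.0, "Gas": 450.0},
    {"Well": "B-2", "Oil": 80.0, "Gas": 300.0},
]
tall = unpivot(wide, pass_through=["Well"], transform=["Oil", "Gas"])
for r in tall:
    print(r)
```

Two rows with two transformed columns become four rows, which is the narrower, taller shape described above.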

With the data taken care of, now create the probit plot.

Create Probit Plot (scatter plot)

High-Level Steps

Creating the plot is actually more time-consuming than wrangling the data.  Adding the secondary lines takes the most time and is optional.

  1. Create a basic plot
  2. Configure the visualization
  3. Format the x axis
  4. Add straight line fit
  5. Add secondary lines (optional)
  6. Filter to appropriate content

Notes

  1. There are no calculated columns, only a custom expression written on the axis of the scatter plot. Because the expression is written on the axis of the visualization, the calculations will update with filtering.
  2. Filtering on the Measure column will control which variables appear in the plot.
  3. The 90th percentile in Spotfire is equivalent to P10. The 10th percentile in Spotfire is equivalent to P90.
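Note 3 (and the Percentile([Y], 30) rule for P70 in the steps below) boils down to one subtraction: Px corresponds to the (100 - x)th percentile.  Here is a small Python sketch of that mapping.  The helper names are hypothetical, and `statistics.quantiles` uses its own interpolation, so on real data its numbers may not match Spotfire’s Percentile() exactly.

```python
import statistics


def p_to_percentile(p):
    """Map P notation (P10 = high case) to an ordinary percentile."""
    if not 1 <= p <= 99:
        raise ValueError("expected P1 through P99")
    return 100 - p


def p_value(values, p):
    """Estimate P<p> for a list of numbers."""
    q = p_to_percentile(p)
    # quantiles(n=100) returns the 1st..99th percentile cut points;
    # "inclusive" interpolates linearly, which may differ slightly from Spotfire.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[q - 1]


data = [float(x) for x in range(101)]  # hypothetical values 0..100
print(p_value(data, 10), p_value(data, 50), p_value(data, 90))  # 90.0 50.0 10.0
```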

Follow the steps below…

  1. Create basic plot
    1. Add a scatter plot to the page
    2. Set the data table to the “unpivoted” table
  2. Configure the visualization
    1. Place the Value column on the x-axis of the visualization
    2. Right-click on the y-axis of the visualization and select Custom Expression
    3. Enter the following expression: NormInv((Rank([Value],"desc",[Measure]) - 0.5) / Count([Value]) OVER ([Measure])) as [Cum Probability]
    4. Set the Color by selector to the Measure column
  3. Format the x-axis
    1. Right-click on the visualization and select Properties. Go to the X-axis menu.
    2. Set the Min and Max range as shown below in Figure 1.
  4. Add straight line fit
    1. Right-click on the visualization, select Properties. Go to the Lines & Curves menu.
    2. Click the Add button to add a Horizontal Line, Straight Line.
    3. Click OK at the next dialog box.
    4. Click the One per Color checkbox as shown below in Figure 2.
  5. Add secondary lines (see Figure 3 below for example)
    1. Horizontal Lines (P10, P50, P90, etc.)
      1. If you are still in the Lines & Curves menu, click the Add button to add a Horizontal Line, Straight Line.
      2. To add P10, P50, and P90, select the Aggregated Value radio button as shown below in Figure 3.
        1. For P10, select 90th.
        2. For P50, select Median.
        3. For P90, select 10th.
      3. For all other values, select the Custom expression radio button as shown below in Figure 4. Enter the expression Percentile([Y], 30) and modify the second argument as needed.  For P70, the value is 30.  For P60, the value should be 40, and so on.
      4. Format the line color, weight, format, and label as desired using the options circled in Figure 5 shown below.
    2. Vertical Lines
      1. The example plot shown has vertical lines at 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, and 10000. Each one must be added individually.
      2. If you are still in the Lines & Curves menu, click the Add button to add Vertical Line, Straight Line.
      3. For the line, select the Fixed Value radio button and enter the value as shown below in Figure 6.
      4. Format the line color, weight, format, and label as desired using the options circled in Figure 5 shown below.
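To check what the y-axis custom expression is doing, here is a Python sketch that mirrors it outside Spotfire: rank each value in descending order within its Measure, subtract 0.5, divide by the count for that Measure, and take the inverse standard normal.  It assumes NormInv’s defaults of mean 0 and standard deviation 1, breaks ties by position (Spotfire’s Rank may handle ties differently), and uses hypothetical function names.  The second helper simply lists the 28 vertical-line positions from step 5 so you can tick them off as you add them.

```python
from collections import defaultdict
from statistics import NormalDist


def probit_points(rows):
    """rows: (measure, value) pairs from the unpivoted table.

    Mirrors NormInv((Rank([Value],"desc",[Measure]) - 0.5) / Count([Value]) OVER ([Measure]))
    and returns (measure, value, cum_probability) tuples.
    """
    groups = defaultdict(list)
    for measure, value in rows:
        groups[measure].append(value)

    std_normal = NormalDist(mu=0.0, sigma=1.0)  # assumed NormInv defaults
    points = []
    for measure, values in groups.items():
        n = len(values)  # Count([Value]) OVER ([Measure])
        # "desc": the largest value in each Measure gets rank 1.
        # Ties are ranked by position here; Spotfire may treat ties differently.
        for rank, value in enumerate(sorted(values, reverse=True), start=1):
            prob = (rank - 0.5) / n  # the 0.5 offset keeps prob strictly between 0 and 1
            points.append((measure, value, std_normal.inv_cdf(prob)))
    return points


def gridline_values():
    """10-90, 100-900, 1000-9000, then 10000: the 28 vertical lines listed above."""
    return [m * 10 ** e for e in range(1, 4) for m in range(1, 10)] + [10_000]


rows = [("Oil", 100.0), ("Oil", 10.0), ("Gas", 50.0), ("Gas", 500.0), ("Gas", 5.0)]
pts = probit_points(rows)
for measure, value, z in pts:
    print(f"{measure:>4} {value:>7.1f} {z:+.3f}")
print(gridline_values())
```

Because the rank is descending, the largest value in each Measure gets the smallest cumulative probability, and the median value of an odd-sized group lands at exactly 0.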

Reference Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

And now you have a multiple variable probit plot!

Spotfire Version

Content created with Spotfire 7.12.


Part 6 – Decomposing Text Areas & Scripts

It’s been 3 weeks since my last post.  I haven’t given up Spotfire blogging.  I just went on a nice long vacation, and then needed a week to recover from my vacation.  But, I’m back on track, and this week’s post is part 6 of the 7-part series on Decomposing Spotfire projects.  This piece focuses on decomposing the text area and scripts within the text area.  Because there is a lot you can put in a text area, this is going to be one of the longer posts in the series. If you are new to the series, here are links to the other posts.

As usual, each post in the series breaks down into four sections.

  1. Quick and Dirty (Q&D)
  2. Extended Version
  3. Documentation
  4. Room for Improvement

First, the Q&D explains what to look for to get a general idea of what is going on. Then, the Extended Version presents a more complex picture. Documentation provides a few examples of how to document if necessary. Lastly, Room for Improvement looks at a project from the standpoint of making it better.  This post will kick off with a brief summary of the components users can add to text areas.

Text Area Basics

If you haven’t spent much time in the text area, check out this post, which explains the breadth of text area usage. To summarize, the text area is a mini web page containing one or more of the elements listed below.
  1. Images
  2. Action controls in the form of buttons, links, or images.
  3. Property controls in the form of drop downs, inputs, list boxes, or sliders
  4. Filters
  5. Dynamic Items in the form of icons, calculated values, bullet graphs, or sparklines

If you are unfamiliar with property controls, check out this $5 template I have posted on the Exchange. It explains in detail how to use all of the Spotfire property controls.

Users may also further customize text areas with HTML, CSS, IronPython, or JavaScript. In fact, the more I learn to add functionality with these programming languages, the more I love Spotfire. They really allow you to expand upon the base functionality of the application. Now that you understand the scope of text areas, let’s break it down.

Quick & Dirty (Q&D)

Here are the quick and dirty questions to ask and answer to get started.
  1. What types of elements does the text area contain? 
  2. Are scripts or data functions triggered by items in the text area?
  3. Do all elements work (including scripts)?

Before we dive in, I want to note all examples are shown using the Edit HTML option for editing text areas.  I always customize the HTML in text areas and thus don’t use Edit Text Area anymore.

What types of elements does the Text Area contain?

Perform a quick scan to get an idea of the volume of elements you’ll have to investigate. Do you see property controls, action controls, dynamic items, and filters? Are there a lot of buttons? These elements are easy enough to observe (most of the time). However, it is possible to hide elements. Here’s an example of HTML where an element is being hidden.  In this example, JavaScript is used to pass a value to a hidden property control.  The user doesn’t need to see the property control, but the value it contains is referenced elsewhere in the DXP.
Here, the style attribute hides a property control.
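Because hidden elements are easy to miss during a visual scan, it can help to search the HTML itself.  Below is a minimal Python sketch using the standard library’s `html.parser` that flags any element hidden with `display:none` or `visibility:hidden` in its style attribute, plus anything nested inside such an element.  It assumes property controls appear as self-closing `<SpotfireControl id="..." />` tags.  The ids in the example are made up, and the sketch won’t catch elements hidden by CSS classes or JavaScript.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def _is_hidden(attrs):
    style = (dict(attrs).get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class HiddenFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # one flag per open element: is it (or an ancestor) hidden?
        self.hidden = []  # (tag, id) for every hidden element found; tags come back lowercased

    def _visit(self, tag, attrs):
        hidden = _is_hidden(attrs) or bool(self.stack and self.stack[-1])
        if hidden:
            self.hidden.append((tag, dict(attrs).get("id")))
        return hidden

    def handle_starttag(self, tag, attrs):
        hidden = self._visit(tag, attrs)
        if tag not in VOID:
            self.stack.append(hidden)

    def handle_startendtag(self, tag, attrs):
        self._visit(tag, attrs)  # self-closing, e.g. <SpotfireControl ... />

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


# Hypothetical text area HTML; real control ids are long generated strings.
sample_html = """
<div>
  <p>Select a well:</p>
  <SpotfireControl id="visible-dropdown" />
  <div style="display: none;">
    <SpotfireControl id="hidden-input" />
  </div>
</div>
"""
finder = HiddenFinder()
finder.feed(sample_html)
finder.close()
print(finder.hidden)  # [('div', None), ('spotfirecontrol', 'hidden-input')]
```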
Now, JavaScripts live only in the text area, and IronPython scripts are most commonly triggered from it: buttons and property controls frequently trigger IronPython scripts.  Data functions are different.  They may run automatically when a file opens, when a filter changes, or when data is marked.  However, it’s also possible to configure them to run with the click of a button.
Before exploring your text areas, go to Edit > Document Properties > Scripts to see a list of IronPython scripts and JavaScripts.  Go to Edit > Data Function Properties to see how many TERR data functions the document contains.  Once you know the names, you’ll recognize them as you review text areas.  If these dialogs are empty, you can scratch scripts off the list.  If they aren’t, the next step is finding which text areas they work in. We’ll discuss this below.
All scripts in a DXP are shown in Edit > Document Properties > Scripts.
All data functions live in Edit > Data Function Properties.  This example has one data function.

Are scripts or data functions triggered by items in the Text Area?

Once you know what you are looking for, open each text area. Remember to right-click on the text area and select Edit HTML rather than Edit Text Area. This is important because Edit Text Area won’t show you a list of JavaScripts or the HTML.   
Edit Text Area looks very different from Edit HTML.

“JS” signifies JavaScripts.  All JavaScripts used in a text area will appear in the list on the right-hand side.

All elements (including scripts) appear in a list to the right.
Now, while you’ll see JavaScripts clearly, IronPython and data functions are different.  They don’t appear on the list.  They are attached to the element, and you must edit the element to see the script.  
The Edit Action Controls dialog includes a list of all scripts in the DXP. If one is highlighted, it is connected to the action control.
Scripts have their own column in the Edit Property Control dialog, as shown here. If a script appears in this column, it is connected to the property control.


If a button connects to a data function, the dialog will look like this when editing.
Now that you know what to be on the lookout for, let’s talk about functionality.

Do all the elements work?

This is a very important question, and you should not assume everything works.  To give you some direction, here’s a list of items and functions to test.
  • Links — Do they connect to the right website? Do they navigate to the correct place?
  • Buttons — Do they perform the described function? Do they navigate to the correct place? It’s not hard to break buttons.  
  • Filters — Are they connected to the right column for the right filtering scheme?
  • Property controls — Do the property controls shown actually control something on the page?
  • Dynamic items — Are the values correct?
Those are the basics. Next, let’s move on to the Extended Version.

Extended Version

  1. Are all the elements used?
  2. Are property controls being re-used?
There is always the possibility of unnecessary elements in the Text Area. These would be elements the developer added but never implemented, or things he/she intended to go back and delete but didn’t have time to. It happens a lot.  It’s also good to know if the developer reused property controls, which developers do to avoid creating more property controls. It’s not necessarily a bad thing. But, it is good to know that modifying a property control on one page will affect a different page.
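If you jot down which document properties the controls on each page are bound to, finding reuse is a simple grouping exercise.  This Python sketch assumes a hand-built inventory (the page and property names are hypothetical) and returns every property that shows up on more than one page.

```python
from collections import defaultdict


def reused_properties(controls_by_page):
    """Return {property: [pages]} for document properties bound to controls on 2+ pages."""
    pages_by_prop = defaultdict(list)
    for page, props in controls_by_page.items():
        for prop in set(props):  # count each page once per property
            pages_by_prop[prop].append(page)
    return {prop: sorted(pages) for prop, pages in pages_by_prop.items() if len(pages) > 1}


# Hypothetical inventory recorded while reviewing each page's text area.
inventory = {
    "Overview": ["SelectedWell", "DateRange"],
    "Decline Curves": ["SelectedWell", "CurveType"],
    "Economics": ["PriceDeck"],
}
print(reused_properties(inventory))  # {'SelectedWell': ['Decline Curves', 'Overview']}
```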
Next, let’s talk a little bit about documentation.

Documentation

To be honest, I don’t normally document the text area. However, I can think of a few good reasons to do so.  It would be helpful to know….
  • Which pages utilize scripts or data functions?
  • What are those scripts supposed to do?
  • Where are property controls reused?
  • If there is a workflow, how is the user supposed to move thru it or work with it?
Since I haven’t documented my text areas previously, I don’t have any good examples to show.  So, let’s move on to improving the project.

Room for Improvement

When thinking about how to improve a project, here are some questions to ask and answer.
  1. Is it pretty enough? Does each page have a consistent look and feel? 
  2. Are you optimizing “real estate” on the page?
  3. Do images need resizing?
  4. Is the HTML a mess?

Is it pretty enough? Does each page have a consistent look and feel? 

Now, this question may seem silly or unnecessary. But, I will say two things. One, before I knew how to write HTML and CSS, I built some really ugly text areas.  Just take a look at my early template submissions on the Exchange (anything from 2016).  My submissions improved when I built a template with custom HTML and CSS to use as a starting point.
Two, making it pretty counts for a lot.  Upper management wants to see pretty reports, charts, and graphs.  Also, if the project looks good, people will use it.  As a result, much of my job when building projects is making them pretty. Never underestimate the importance of aesthetics.

Are you optimizing “real estate” on the page?

Items like legends, panels, axis selectors, and descriptions take up space on the page. It’s possible to hide all of them.  The captions below explain how.
Right-click in the white space of the legend to see what items can be turned on/off.


Right-click on a visualization and select Visualization Features to see what items can be turned on/off.
You can also use IronPython to turn panels on and off with the click of a button. Check out this post for an example.  

Do images need resizing?

Sometimes users add images to a text area but don’t take the time to get the image sizing/resolution right. You can modify this in the HTML.

Is the HTML a mess?

The single most common source of messy HTML is copy and paste, usually from Word. Here is a painful example. Not only is this difficult to edit, but pasted content usually doesn’t come out the way you want it in the text area anyway.
Avoid copying and pasting from Word into Spotfire Text Areas. Learn a little bit of HTML, and you will be much better served. HTML is one of the easiest languages to learn because you can learn a lot in a short period of time. If you need a starter tutorial, check out my Intro to HTML series.
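Learning HTML is the real fix, but if you inherit a text area full of pasted Word markup, a whitelist cleanup can give you a starting point.  This Python sketch keeps a small, arbitrary set of tags, drops every attribute except `href` on links, and discards Word-specific tags such as `<o:p>` along with the contents of `<style>` and `<script>` blocks.  The tag list and function names are assumptions.  Review the output before pasting it back into Edit HTML, since it also strips Spotfire control tags and anything else not on the list.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; adjust to the tags your text areas actually need.
ALLOWED = {"p", "b", "strong", "i", "em", "u", "ul", "ol", "li", "br", "a",
           "h1", "h2", "h3", "table", "tr", "td", "th"}
SKIP_CONTENT = {"style", "script", "xml"}  # drop these tags and everything inside them


class WordCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            href = dict(attrs).get("href")
            attr = f' href="{escape(href)}"' if tag == "a" and href else ""
            self.out.append(f"<{tag}{attr}>")  # class/style attributes are dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-escape &, <, >


def clean_word_html(markup):
    cleaner = WordCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


messy = ('<p class="MsoNormal" style="margin:0in;font-family:Calibri">'
         '<span style="font-size:11.0pt">Select a <b>well</b><o:p></o:p></span></p>')
print(clean_word_html(messy))  # <p>Select a <b>well</b></p>
```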


Whew, that was a long one!  It’s been quite the journey.  Only one post remains in the series: Decomposing Visualizations. After that, I’ll do a follow up with a post on Best Practices. Then, I’ll round it out with a post on things I would like to see improved in the application that would help users break apart projects.

Spotfire Version

Content created with Spotfire version 7.12.


Part 5 – Decomposing Calculations

This week’s post (part 5 of 7) focuses on decomposing columns and calculations.  If you are new to the series, here are links to the other posts.

  1. Intro to the Series
  2. Decomposing Data Tables & Connections
  3. Breaking Down Data Functions
  4. Untangling Data Wrangling
  5. Decomposing Document Properties

As usual, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. Extended Version
  3. Documentation
  4. Room for Improvement

First, the Q&D explains what to look for to get a general idea of what is going on.  This week, I have combined the Extended Version and Room for Improvement.  Lastly, Documentation provides a few examples of how to document if necessary. 

Quick & Dirty

The quick and dirty on this post is going to be very quick.  The most important thing to understand about calculations is — Do they work?  After that, it’s all additional information and improvement.  There are two easy ways to find out if all calculations are intact.

  1. Warnings upon file opening
  2. Edit > Column Properties > Is Valid

When the DXP opens, Spotfire displays warnings in the bottom left-hand corner of the screen if there are issues or problems.  If you see a red exclamation mark when opening the DXP, click on the link, copy the contents to the clipboard, and save them to review.

This example shows a warning. A red exclamation mark will appear if calculations are broken.

My preferred way to check for broken calculations is to use the Edit > Column Properties menu.  Open this dialog, find the IsValid column, and click on it to sort.  A value of True means the calculation is working.  A value of False means the calculation is broken.  When I say broken, I mean one of the columns or document properties referenced in the calculation is no longer recognized.  Investigate each False column by clicking the Edit button.  Any element not recognized will be red.

Note the value of false indicating a broken column.
This is broken because I deleted the GAS column.
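The Is Valid check boils down to one question: does every column a calculation references still exist?  Here is a Python sketch of that check, assuming you have copied the expressions into a dictionary by hand.  The column names are hypothetical, and the regular expression only looks at bracketed [Column] references (not document properties, or brackets inside string literals), so Spotfire’s own validation remains the authority.

```python
import re

COLUMN_REF = re.compile(r"\[([^\]]+)\]")  # bracketed [Column Name] references


def broken_calculations(calculated, existing_columns):
    """Return {calc name: [missing columns]} for calculations referencing columns that no longer exist."""
    known = set(existing_columns) | set(calculated)  # calcs may reference other calcs
    broken = {}
    for name, expression in calculated.items():
        missing = sorted({ref for ref in COLUMN_REF.findall(expression) if ref not in known})
        if missing:
            broken[name] = missing
    return broken


# Hypothetical calculated columns after the GAS column was deleted.
calcs = {
    "c.BOE": "[OIL] + [GAS] / 6",
    "c.Oil per Day": "[OIL] / [DAYS]",
}
print(broken_calculations(calcs, ["OIL", "DAYS", "WELL"]))  # {'c.BOE': ['GAS']}
```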

Now, keep reading if you want to go deeper.

Extended Version/Room for Improvement

Once you have confirmed all calculations are working, everything else is really information and/or improvement.  Here are some questions you can ask and answer.

  1. Did the author use a column naming convention? Should you create one?
  2. Did the developer add descriptions to the columns? Should you add some?
  3. Are all of the columns and calculations used? Can you delete/exclude unused calculations or columns?
  4. Did the developer insert calculated columns into other tables? Is it possible to rearchitect?
  5. Is it possible to combine calculations to create a smaller table?

Did the author use a column naming convention? Should you create one?

Naming conventions are great…if you can come up with one and stick to it.  A few weeks ago, I had serious heartburn about a large project I built without a column naming convention.  Most developers don’t use one because Spotfire doesn’t make it easy.  Naming conventions take time but are well worth the effort.  To find them, simply scroll thru the columns to see if anything jumps out at you.  Developers may employ naming conventions in the following places…

  1. Notations for calculations, like “c.” (a personal favorite as indicated by my screenshots)
  2. Notations for calculations created by transformations, such as “t.”
  3. Units of measure
  4. Data sources
  5. Conventions for tables that might have similar or duplicate columns, such as “c.Master”
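Scrolling is usually enough, but with hundreds of columns, a quick grouping by prefix can make a convention (or the lack of one) jump out.  This Python sketch assumes the “c.” and “t.” prefixes described above; the prefix table and column names are placeholders to swap for whatever convention the project uses.

```python
from collections import defaultdict

# Hypothetical prefix table modeled on the "c." and "t." examples.
PREFIXES = {"c.": "calculated", "t.": "transformation"}


def audit_names(columns):
    """Group column names by naming-convention prefix; unprefixed names go under 'no prefix'."""
    groups = defaultdict(list)
    for name in columns:
        label = next((kind for prefix, kind in PREFIXES.items() if name.startswith(prefix)),
                     "no prefix")
        groups[label].append(name)
    return dict(groups)


cols = ["WELL", "OIL", "c.BOE", "c.Oil per Day", "t.Month"]
print(audit_names(cols))
# {'no prefix': ['WELL', 'OIL'], 'calculated': ['c.BOE', 'c.Oil per Day'], 'transformation': ['t.Month']}
```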

Did the developer add descriptions to the columns? Should you add some?

Now, Spotfire does not allow you to embed comments in calculations the way you can in scripts or data functions.  Why would you need that?  Well, I recently wrote a calculation that was 67 lines of case logic.  It would have been nice to add comments explaining what each line of logic was doing.  However, I do frequently use the column description to explain the details of a calculation or column.  Simply walk down the list of columns looking at each one to see if there is a description.

Are all of the columns and calculations used? Can you delete/exclude unused calculations or columns?

It is also helpful to know if all of the columns are used.  You could delete or exclude (via transformation) unused columns to make the project more user-friendly.  Unfortunately, there is no easy or quick way to figure this out.  Ultimately, you want to know if columns connect to visualizations, custom expressions, calculations, scripts, data functions, or text areas.

Additionally, attempting to delete a calculated column will tell you if another calculation uses it.  As shown below, Spotfire will warn against deleting it.  It would be a nice improvement to the software if you could see where columns are used in calculations.
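Until Spotfire offers that view, you can approximate it for calculated columns.  This Python sketch, assuming you have copied each calculated column’s expression into a dictionary, finds every calculation that depends on a given column, directly or through another calculation, similar to what the delete warning tells you.  It does not look at visualizations, custom expressions, scripts, data functions, or text areas, so an empty result does not prove a column is unused.  All names are hypothetical.

```python
import re
from collections import deque

COLUMN_REF = re.compile(r"\[([^\]]+)\]")  # bracketed [Column Name] references


def dependents(column, calculated):
    """Return every calculated column that uses `column`, directly or through another calculation."""
    refs = {name: set(COLUMN_REF.findall(expr)) for name, expr in calculated.items()}
    found, queue = set(), deque([column])
    while queue:
        current = queue.popleft()
        for name, used in refs.items():
            if current in used and name not in found:
                found.add(name)
                queue.append(name)  # anything built on this calc is affected too
    return sorted(found)


calcs = {
    "c.BOE": "[OIL] + [GAS] / 6",
    "c.BOE per Day": "[c.BOE] / [DAYS]",
    "c.Water Cut": "[WATER] / ([OIL] + [WATER])",
}
print(dependents("GAS", calcs))  # ['c.BOE', 'c.BOE per Day']
```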

Did the developer insert calculated columns into other tables? Is it possible to rearchitect?

I see users do this all the time, and it’s one of my pet peeves.  Avoid this if you can. Why?  First, if the granularity of the tables differs, the data may be wrong.  Second, if a user wants to understand the calculation, they must trace back to the table where the calculation originated.  Third, most likely, the calculation isn’t needed in both tables, which means it’s inefficient. If the developer employed a naming convention, it is easy to find inserted calculated columns.

Here, I have used the nomenclature “c.” to denote calculated columns in the table where they originate.


Note: Calculated columns inserted into a table don’t appear as Column Type “Calculated” anymore. Now they are “Imported”, which is why the naming convention is helpful.

Is it possible to combine calculations to create a smaller table?

When developing a project, it is extremely common to break complex calculations into pieces to see if they work or to troubleshoot.  A good developer will go back and combine them for maximum efficiency.  However, it’s easy to fall into the trap of — It works!  Let’s move on.  There’s no easy way to figure this out other than to just spend time in the project and get to know the calculations.  I recommend undertaking this task after developing a solid understanding of the project.
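Combining pieces is mostly a matter of substituting each intermediate calculation’s expression wherever it is referenced, wrapped in parentheses so the order of operations stays the same.  This Python sketch does that substitution on expression text.  The column names are hypothetical, and you should confirm the combined column returns the same values as the original before deleting the intermediate columns.

```python
import re

COLUMN_REF = re.compile(r"\[([^\]]+)\]")  # bracketed [Column Name] references


def inline(expression, intermediates, depth=0):
    """Replace each [intermediate] reference with its own (parenthesized) expression."""
    if depth > 20:
        raise ValueError("intermediate calculations appear to reference each other in a loop")

    def substitute(match):
        name = match.group(1)
        if name in intermediates:
            # Parentheses preserve the original order of operations.
            return "(" + inline(intermediates[name], intermediates, depth + 1) + ")"
        return match.group(0)  # an ordinary column: leave it alone

    return COLUMN_REF.sub(substitute, expression)


steps = {
    "c.Gas BOE": "[GAS] / 6",
    "c.BOE": "[OIL] + [c.Gas BOE]",
}
print(inline("[c.BOE] / [DAYS]", steps))  # ([OIL] + ([GAS] / 6)) / [DAYS]
```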

Documentation

Documentation like this is valuable for two reasons.  First, it allows you to take a comprehensive look at the project.  You get a high-level look, rather than the detailed look of examining one by one. Second, it prevents having to relearn the project.  Check the reference rather than digging thru the project repeatedly.

This matrix explains usage by asking and answering the question — Where is this column used?


This matrix shows how calculations connect and relate to one another.
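If you’d rather not build the calculation portion of such a matrix entirely by hand, here is a Python sketch that fills it in from expression text, assuming the expressions have been copied into a dictionary.  The names are hypothetical.  A row with no marks only means no calculated column references that column; it could still be used in a visualization, script, or text area.

```python
import re

COLUMN_REF = re.compile(r"\[([^\]]+)\]")  # bracketed [Column Name] references


def usage_matrix(columns, calculated):
    """Rows are columns; an "X" marks each calculated column that references it."""
    calc_names = list(calculated)
    refs = {name: set(COLUMN_REF.findall(expr)) for name, expr in calculated.items()}
    rows = [["Column"] + calc_names]
    for col in columns:
        rows.append([col] + ["X" if col in refs[c] else "" for c in calc_names])
    return rows


calcs = {
    "c.BOE": "[OIL] + [GAS] / 6",
    "c.BOE per Day": "[c.BOE] / [DAYS]",
    "c.Water Cut": "[WATER] / ([OIL] + [WATER])",
}
cols = ["OIL", "GAS", "WATER", "DAYS", "c.BOE", "WELL"]
matrix = usage_matrix(cols, calcs)
for row in matrix:
    print(" | ".join(cell.ljust(14) for cell in row))
```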



Well, that wraps up this post.  There are only two posts left in the series: Text Areas & Scripting and Visualizations & Data Limiting. Originally, this was a 7-part series, but I will add at least one more part, if not two.  As I’ve written the series, I have also slowly generated a list of things good developers do.  Thus, it makes sense to summarize best practices.  I’ve also inadvertently generated a list of potential improvements to the application.  I’ll write those up, add them to the Ideas portal, and create a summary linking to the Ideas portal that will allow users to upvote those ideas.

Spotfire Version

Content created with Spotfire 7.12.
