Part 5 – Decomposing Calculations

This week’s post (part 5 of 7) focuses on decomposing columns and calculations.  If you are new to the series, here are links to the other posts.

  1. Intro to the Series
  2. Decomposing Data Tables & Connections
  3. Breaking Down Data Functions
  4. Untangling Data Wrangling
  5. Decomposing Document Properties

As usual, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. Extended Version
  3. Documentation
  4. Room for Improvement

First, the Q&D explains what to look for to get a general idea of what is going on.  This week, I have combined the Extended Version and Room for Improvement.  Lastly, Documentation provides a few examples of how to document if necessary. 

Quick & Dirty

The quick and dirty on this post is going to be very quick.  The most important thing to understand about calculations is — Do they work?  After that, it’s all additional information and improvement.  There are two easy ways to find out if all calculations are intact.

  1. Warnings upon file opening
  2. Edit > Column Properties > Is Valid

Spotfire will provide warnings in the bottom left-hand side of the screen when the DXP opens indicating issues or problems.  If you see a red exclamation mark when opening the DXP, click on the link, copy the contents to the clipboard and save to review.

This example shows a warning. A red exclamation mark will appear if calculations are broken.

My preferred way to check for broken calculations is to use the Edit > Column Properties menu.  Open this dialog, find the IsValid column, and click on it to sort.  A value of True means the calculation is working.  A value of False means the calculation is broken.  When I say broken, I mean one of the columns or document properties referenced in the calculation is no longer recognized.  Investigate each False column by clicking the Edit button.  Any element not recognized will be red.

Note the value of false indicating a broken column.
This is broken because I deleted the GAS column.

Now, keep reading if you want to go deeper.

Extended Version/Room for Improvement

Once you have confirmed all calculations are working, everything else is really information and/or improvement.  Here are some questions you can ask and answer.

  1. Did the author use a column naming convention? Should you create one?
  2. Did the developer add descriptions to the columns? Should you add some?
  3. Are all of the columns and calculations used? Can you delete/exclude unused calculations or columns?
  4. Did the developer insert calculated columns into other tables? Is it possible to rearchitect?
  5. Is it possible to combine calculations to create a smaller table?

Did the author use a column naming convention? Should you create one?

Naming conventions are great…if you can come up with one and stick to it.  A few weeks ago, I had serious heartburn about a large project I built without a column naming convention.  Most developers don’t because Spotfire doesn’t make it easy.  Naming conventions take time but are well worth the effort.  To find them, simply scroll thru the columns to see if anything jumps out at you.  Developers may employ naming conventions in the following places…

  1. Notations for calculations, like “c.” (a personal favorite as indicated by my screenshots)
  2. Notations for calculation created by transformations “t.”
  3. Units of measure
  4. Data sources
  5. Conventions for tables that might have similar or duplicate columns, such as “c.Master”

Did the developer add descriptions to the columns? Should you add some?

Now, Spotfire does not allow you to embed comments in calculations, as with scripts or data functions.  Why would you need that?  Well, I recently wrote a calculation that was 67 lines of case logic.  It would have been nice to add comments explaining what each line of logic was doing.  However, I do frequently use the column description to explain the details of a calculation or column.  Simply walk down the list of columns looking at each one to see if there is a description.

Are all of the columns and calculations used? Can you delete/exclude unused calculations or columns?

It is also helpful to know if all of the columns are used.  You could delete or exclude (via transformation) unused columns to make the project more user-friendly.  Unfortunately, there is no easy or quick way to figure this out.  Ultimately, you want to know if columns connect to visualizations, custom expressions, calculations, scripts, data functions, or text areas.

Additionally, attempting to delete a calculation will indicate if it is part of another column.  As shown below, Spotfire will warn against deleting it.  It would be a nice improvement on the software if you could see where columns are used in calculations.

Did the developer insert calculated columns into other tables? Is it possible to rearchitect?

I see users do this all the time, and it’s one of my pet peeves.  Avoid this if you can. Why?  First, if the granularity of the tables differs, the data may be wrong.  Second, if a user wants to understand the calculation, they must trace back to the table where the calculation originated.  Third, most likely, the calculation isn’t needed in both tables, which means it’s inefficient. If the developer employed a naming convention, it is easy to find inserted calculated columns.

Here, I have used the nomenclature “c.” to denote the calculated column in the table where they originate.

 

Note: Calculated columns inserted into a table don’t appear as Column Type “Calculated” anymore. Now they are “Imported”, which is why the naming convention is helpful.

 Is it possible to combine calculations to create a smaller table?

When developing a project, it is extremely common to break complex calculations into pieces to see if they work or to troubleshoot.  A good developer will go back and combine them for maximum efficiency.  However, it’s easy to fall into the trap of — It works!  Let’s move on.  There’s no easy way to figure this out other than to just spend time in the project and get to know the calculations.  I recommend undertaking this task after developing a solid understanding of the project.

Documentation

Documentation like this is valuable for two reasons.  First, it allows you to take a comprehensive look at the project.  You get a high-level look, rather than the detailed look of examining one by one. Second, it prevents having to relearn the project.  Check the reference rather than digging thru the project repeatedly.

This matrix explains usage by asking and answering the question — Where is this column used?

 

This matrix shows how calculations connections and relationships.

 

Conclusion

Well, that wraps up this post.  There are only two posts left in the series — Text Areas & Scripting and Visualizations & Data Limiting. Originally, this was a 7 part series, but I will add on at least one more part if not two.  As I’ve written the series, I have also slowly generated a list of things good developers do.  Thus, it makes sense to summarize best practices.  I’ve also inadvertently generated a list of potential improvements to the application.  I’ll write those up, add them to the Ideas portal, and create a summary linking to the Ideas portal that will allow users to upvote those ideas.

Spotfire Version

Content created with Spotfire 7.12.

Guest Spotfire blogger residing in Whitefish, MT.  Working for SM Energy’s Advanced Analytics and Emerging Technology team!

Leave a Comment

Your email address will not be published. Required fields are marked *