Part 2 – Decomposing Data Functions

This is part 2 of a seven-part series on how to decompose Spotfire projects.  Last week, I started off with a post on data tables and connections.  This week, I am covering how to decompose data functions.  Data functions are frequently used to create data tables and can add a lot of complexity to a project.  Thus, the series covers them early on.

Now, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. The Extended Version
  3. Documentation
  4. Room for Improvement

The Q&D explains what to look for to get a general idea of what is going on.  The Extended Version works to develop an in-depth knowledge of the project.  Each post will also include suggestions for documentation and improving the project.

Quick and Dirty

  1. What are the data functions doing?  Are they creating new tables, new columns, or just transforming data?
  2. What’s going into and out of them? What are the input and output parameters?
  3. Do they work?  Sometimes someone attempts to write a data function, it doesn’t work out, and rather than delete it, they just leave it.

How: Go to Edit > Data Function Properties and review the script, input parameters, and output parameters.

 

First, click Edit Script.  Then, click Edit Parameters.  Finally, hit Refresh to confirm the data function works.

 

 

Work through each tab to understand the data function.

 

The Edit Parameters dialog provides information on how the parameters are connected to the data in the DXP.

Extended Version

  1. Are the data functions using property controls, filtering schemes, or marking?
  2. Is any code commented out?
  3. Do the data functions use packages?

Data limiting with filtering or marking is shown in the Edit Parameters dialog towards the bottom of the menu.

 

The # sign comments out a line of code.  When a line of code is commented out, it will not be executed.  Sometimes developers comment out code while testing.  Other times, the # sign is used to add comments explaining what the code does.
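As a quick illustration (shown here in Python syntax; TERR data functions use the same # character for comments):

```python
# This line is a comment describing what the code does.
threshold = 100  # Comments can also follow code on the same line.

# The line below is commented out, so it will never run:
# threshold = 999

print(threshold)
```

When decomposing a project, it's worth checking whether commented-out lines are leftover experiments or intentional documentation.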

 

Developers commonly attach a package to the library or install the package in the first few lines of code.

Documentation

The screenshots below were taken from the documentation of two different projects.  One of them I fully documented using a PowerPoint deck.  The slides shown below summarize the usage of data functions.  The last screenshot shows Excel documentation indicating whether each table is an input or an output of a data function.
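If you prefer to generate that input/output sheet programmatically rather than typing it into Excel, a minimal sketch could write the mapping to CSV, which Excel opens directly.  The table and data function names below are purely hypothetical:

```python
import csv
import io

# Hypothetical mapping of tables to their role in each data function
rows = [
    ("Well Header", "CalcDeclineCurves", "Input"),
    ("Production", "CalcDeclineCurves", "Input"),
    ("Decline Curves", "CalcDeclineCurves", "Output"),
]

# Write the mapping to an in-memory CSV; save to disk to open in Excel
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Table", "Data Function", "Role"])
writer.writerows(rows)

print(buffer.getvalue())
```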

Improvement

Now, here are a few suggestions for improving data functions in a project.

  1. Add commenting in the data function to improve ease of understanding, especially if it’s a long data function.
  2. Remove unused sections of code to clean up the data function.
  3. Combine data functions where possible.  It’s very common for developers to build as they go, but that separation might make the project harder to understand.
  4. Add descriptions in the data functions if they don’t have them.  This will make them easier to understand 6 months from now.

This screenshot demonstrates adding comments to translate the object names to table names in the DXP.
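A header comment block like the following makes that translation explicit.  This is a sketch in Python syntax (TERR uses the same # comment character), and all of the names are hypothetical:

```python
# Data function: BuildWellSummary (hypothetical name)
# Inputs:
#   prod_data -> the "Production" table in the DXP
#   well_list -> the "Well Header" table in the DXP
# Output:
#   summary   -> creates the "Well Summary" table in the DXP

# The actual calculation logic would follow the header comments.
output_table_name = "Well Summary"
print(output_table_name)
```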

Spotfire Version

All content created with Spotfire 7.12.

Well, that wraps up how to decompose data functions.  Next week, we’ll look at how to understand table builds and data wrangling.

Accessing Data Function Parameters

I had one of those moments recently, where a dialog box I’ve stared at for years presented something new to me.  In this case, it was the Settings button in the Data Table Properties dialog.  Now, this button will be grayed out for all tables except on-demand tables.  At least, that is what I thought.  As I scrolled through my tables, the box appeared active for a table I knew was not configured on-demand.  Rather, a data function built the table.  Clicking on it then pulled up the Edit Parameters dialog.  When investigating tables, it’s helpful to be able to access data function parameters from Data Table Properties.

 

Clicking on Settings pulls up the Edit parameters dialog for any table created by a data function.
This allows the user to access data function parameters without going thru the Data Functions dialogs.

Spotfire Version

Content created with Spotfire 7.12.

Part 1 – Decomposing Tables & Data Sources

Last week, I teed up a series on decomposing Spotfire projects.  This week, we are diving in with Part 1 — Decomposing tables and data sources.  I cover tables and data sources first because they are the foundation for the project. You must understand where data is coming from to understand the rest of the project.

Now, each post in the series will be broken down into four sections.

  1. Quick and Dirty (Q&D)
  2. The Extended Version
  3. Documentation
  4. Room for Improvement

The Q&D explains what to look for to get a general idea of what is going on.  The Extended Version works to develop an in-depth knowledge of the project.  Each post will also include suggestions for documentation and improving the project.

Quick Word on Documentation

Before we dive in, I want to say a few words about documentation.  Decomposing Spotfire projects will most likely be messy and complex.  It may start off simple, leading you to believe you can keep track of how everything is built and structured in your head.  I’ll tell you right now, you can’t.  Any project that has more than three tables will get complicated quickly.  It’s also likely you’ll be interrupted or have to work intermittently.  It’s likely, even probable, that you’ll forget what you learned.  I highly recommend documenting as you go.  There is no perfect tool.  I use a combination of PowerPoint and Excel.  Ultimately, I would love to see some documentation functionality available in the application in the future.  Just pick a tool and stick with it through a single project.

Now, let’s dig into tables.

Quick and Dirty

  1. How many tables does the project contain?
  2. Are any tables embedded?
  3. Where do the tables originate from?

How: Go to  Edit > Data Table Properties > Source Information or open the Data Panel.  See specific details in the captions.  Click on one table at a time.

Click on one table at a time and view the first piece of information in the Source Information tab.

 

Select the table from the drop-down in the top right-hand corner of the panel. Then, click on the Cog. Finally, click Source View in the top right-hand corner of the expanded menu. Source Information is in the information tab.

 

Information on whether a table is linked or embedded is in the General tab.

 

Extended Version

  1. Are any of the data sources on demand?
  2. Are any of the data sources limited by prompts before entering the DXP?
  3. How large are the tables and how long do they take to load?
  4. Are there any data connections in the DXP?

How: Go to  Edit > Data Table Properties > Source Information or open the Data Panel.  See specific details in the captions.  Click on one table at a time.

If a table is either configured with on-demand settings or created with a data function, the Settings button will be available (i.e. not grayed out).  Click it to see on-demand settings or data function parameters.

 

On-demand settings are clearly visible from the Data Panel when you click on the first data source.

 

Unfortunately, the only way to know if an information link has prompts and what has been selected is to click Refresh Data With Prompt on all information links. It’s painful, but you’ll only do it once.

How: Go to Help > Support Diagnostic and Logging > Diagnostic Information

This menu shows the size and load time for every table in the DXP. This includes counts of columns and rows. Click the Copy Info button and paste into Excel for easier viewing.

How: Go to Edit > Data Connection Properties 

Go to Edit > Data Connection Properties to see if any data connections exist.

Lastly, another obvious question to ask and answer is — Are all of the data tables being used in the project?  A future post will address this question.

Documentation

As previously noted, documentation can be as simple or as complex as you want it to be.  This is an example of a simple PPT table containing information on the table source and some general properties.  I would also add a column to denote On Demand tables.

This is a PPT slide I put together to summarize the usage of different tables in a DXP.  This slide was part of a larger PPT deck to document a project.

Improvement

Finally, before wrapping up, I want to suggest a few ways to improve the project tables.  Here are a few questions to think about.

  1. Would renaming tables make them easier to understand?  Take a look at this post on naming data tables. 
  2. Would renaming data connections make them easier to understand?  Most people don’t name their data connections, and if there is more than one, it gets confusing quickly.
  3. If a table is embedded, would it load faster if transitioned to a data set? It is frequently the case that connecting to a data set is faster than connecting to a file.

That’s it for this post.  I hope you found it useful. Next week’s article will look at Data Functions.

Spotfire Version

Content created with Spotfire 7.12.

IronPython to reset all tags

For a recent project, we wanted the user of a Spotfire template to be able to reset all the tags, including both built-in tags and tags the user may have added, with a single click.

One solution is to use an IronPython script to reset each tag collection in the template, keeping in mind that, if we hard-code the names of the tags to reset, we won’t be able to handle cases where users have added custom tags. In order to reset tags that may be added later on, we can use Spotfire’s IronPython API to identify and reset tag columns on the fly.

Here is what the code looks like in Spotfire:

And in more copy/paste-friendly format:

from Spotfire.Dxp.Data import *
from System import Array

for table in Document.Data.Tables:
    # Snapshot the columns so we can safely remove columns while looping
    for column in list(table.Columns):
        if str(column.Properties.ColumnType) == "Tags":
            tagCollectionName = column.Properties.Name
            tagColumn = column.As[TagsColumn]()
            tagNames = Array[str](tagColumn.TagValues)
            # Remove and recreate the column; a new tag collection starts untagged
            table.Columns.Remove(column)
            table.Columns.AddTagsColumn(tagCollectionName, tagNames)

Notes on the code:

This script loops over all the tables in the Spotfire analysis and checks the type of each column. When we locate a tags column, we extract the name of the tag collection, as well as any associated tags, drop the column from the data table, and create the tags collection anew. We can use this approach to reset all tags with one command because, by default, all datapoints are untagged when a “new” tag collection is created. The script will reset all tags in the analysis, including custom tags added by the user, and it will simply exit quietly if there are no tags. There are no input/output parameters for this script, but we could use input parameters to apply the same tag-resetting logic to a specific subset of the data.
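Outside of Spotfire, the reset-by-recreate idea can be sketched with plain Python data structures.  Everything below is a stand-in for the real API, with hypothetical tag and column names:

```python
# Stand-in for a tags column: a collection name, its tag values,
# and a per-row assignment of tags to data points (None = untagged).
class MockTagsColumn:
    def __init__(self, name, tag_values, assignments):
        self.name = name
        self.tag_values = list(tag_values)
        self.assignments = list(assignments)

columns = [
    MockTagsColumn("Review Status", ["Good", "Bad"], ["Good", None, "Bad"]),
]

# Recreate each tags column: keep the collection name and tag values,
# but drop all per-row assignments, mirroring how a newly created
# tag collection starts with every data point untagged.
columns = [
    MockTagsColumn(col.name, col.tag_values, [None] * len(col.assignments))
    for col in columns
]

print(columns[0].assignments)
```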

Spotfire Version

Content created with Spotfire 7.12.