Air Traffic Delays during the Holiday Season

Happy Holi-delays!

Howdy! I'm going to be looking at US air traffic delays during the holiday season, more specifically at air travel trends in November and December. This is more of a for-fun analysis as well as my first real dive into the world of Spotfire, so if there's anything wonky or weird about my analysis, just bear with me! I've posted this template, so feel free to download it and follow along.

The data I’m using is from a few different sources which I’ve cited at the bottom of this post.

The data set that kicked off this idea was found on Kaggle, a site for data sets and data science competitions. The original set contained air traffic delays for the entire year of 2008, which then led me to an even larger data set with data going all the way back to 1987. My analysis only looks at November and December of 2006 and 2007, so the view is a little narrow. After concatenating the entire data set, the file was almost 11 GB, which did result in some cool visualizations, but it was too large to make a template out of.

The Map

This data set has some fun properties to it. It lets you get an idea of the different types of delays that occur, such as weather or security delays. In Spotfire, I joined another data set I found that contains the latitude and longitude of each airport, which allowed me to map every airport in the United States.


In the above visual, the size represents the number of people traveling to each city while the color represents the average departure delay. Right off the bat, you can see the highest-traffic airports such as Atlanta, Chicago, and DFW. These guys have an absurd amount of traffic going through them. DFW, for example, has about 46k flights going into the airport, and Atlanta has a whopping 65k unique flights. For fun, I did some napkin math to get an idea of how many people are flying into just Atlanta for these two months. I used a rough average for the number of seats on a plane, which was about 200 [3], and about 80-85% of the seats are usually filled [4]. That puts us at something like 166 people per flight if we assume 83% of the seats are filled, which means Atlanta handled somewhere around 10.8M people in these two months across both years combined. That's a lot of people for a single airport, and they seem to do a pretty good job of handling it! The average overall delay is about 23 minutes. Chicago (ORD), on the other hand, is a little worse off with an average overall delay of about 40 minutes. The overall delay was calculated by simply adding the arrival delay and departure delay for each flight.
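For anyone who wants to check my napkin math, here it is as a quick Python sketch (the flight count and seat assumptions come from the sources cited at the bottom of the post):

```python
# Napkin math: estimated passengers through Atlanta for Nov/Dec 2006-2007 combined.
flights = 65_000        # unique flights into Atlanta in the data set
seats_per_plane = 200   # rough average seat count [3]
load_factor = 0.83      # roughly 80-85% of seats are usually filled [4]

passengers_per_flight = round(seats_per_plane * load_factor)
total_passengers = flights * passengers_per_flight

print(passengers_per_flight)            # 166
print(f"{total_passengers / 1e6:.1f}M") # 10.8M
```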

The Graph

The above line graph shows the delay per day. It’s pretty obvious when the holidays occur and end, which was kind of a neat result of visualizing this. To me, the most interesting thing about this graph is the Half Dome peak right before Christmas Eve.

The similarities are striking! I also enjoy how the middle of December sees a giant spike in delays, then a big dip back to normality before the climb to the top.

The Data

Interested in digging more into this data? Download the template and play around for some fun visualizations and neat stats. One thing to note is that this template only contains a small subset of the actual data set I used. If you want the full set, you'll have to go to the source site and download the archives. They're bz2-compressed, so you'll need a program such as 7-Zip or WinRAR to open them. Unless you're on Linux, in which case the ole bzip2 -dk from the shell in your data's directory should be enough. For the latitudes and longitudes, I used a site that provides a CSV of all airports, not just those in the USA. One thing I found limiting is that the airport data doesn't contain an airport's state for those in the United States. Fortunately, I found a site that has this information in table form, so you would just need to scrape the site for the relevant information and mess with the data table properties in Spotfire.
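If you'd rather not install anything, Python's standard-library bz2 module can also handle the decompression. Here's a minimal sketch (the file name is just an example, and the sketch writes its own tiny archive so it runs standalone):

```python
import bz2
import shutil

# Write a tiny example archive so this sketch is self-contained;
# in practice you'd start from a downloaded file like 2007.csv.bz2.
with bz2.open("2007.csv.bz2", "wb") as f:
    f.write(b"Year,Month,DayofMonth\n2007,11,22\n")

# Decompress while keeping the original, the equivalent of `bzip2 -dk`.
src = "2007.csv.bz2"
dst = src[: -len(".bz2")]          # -> "2007.csv"
with bz2.open(src, "rb") as f_in, open(dst, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

print(open(dst).read().splitlines()[0])   # Year,Month,DayofMonth
```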

Hopefully this post was interesting to you or at the very least an insight into how busy Atlanta is. Thanks for reading and have some happy holidays!









Wrangling Data Science in Oil & Gas: Merging MongoDB and Spotfire

Data science in Oil and Gas is taking center stage as operators work in the new "lower for longer" price environment. Want to see what happens when you solve data science questions with the hottest new database and the powerful analytics of Spotfire? Read on to learn about our latest analytics module, the DCA Wrangler. If you want to see it in action, scroll down to watch the video.

Layering Data Science on General Purpose Data & Analytics

We are a startup focused on energy analytics and technical data science. We are both TIBCO and MongoDB partners, heavily leveraging these two platforms to solve real-world problems revolving around the application of data science at scale and within the enterprise environment. I started our plucky outfit a little under four years ago. We've done a lot of neat things with Spotfire, including analyzing seismic and well log data. Here, we'll look at competitor/production data.

The Document model allows for flexible and powerful encoding of decline curve models.

MongoDB provides a powerful and scalable general-purpose database system. TIBCO provides tested and forward-thinking general-purpose analytics platforms for both streaming data and data at rest. (They also provide great infrastructure products, which aren't the focus of this blog.) We provide the domain knowledge, and we infuse our proprietary algorithms and data structures for solving common analytics problems into products that leverage the TIBCO and MongoDB platforms.

We believe that these two platforms can be combined to solve innumerable problems in the technical industries represented by our readers. TIBCO provides the analytics and visualization while MongoDB provides the database. This is a powerful marriage for problems involving analytics, single view, or IoT.

In this blog, I want to dig into a specific and fundamental problem within oil and gas and how we leveraged TIBCO Spotfire and MongoDB to solve it — namely Autocasting.

What is Autocasting?

Oil reserves denote the amount of crude oil that can be technically recovered at a cost that is financially feasible at the present price of oil. Crude oil resides deep underground and must be extracted using wells and completion techniques. Horizontal wells can stretch two miles within a vertical window the height of most office floors.

For those with E&P experience, I’m going to elide some important details, like using “oil” for “hydrocarbons” and other technical nomenclature.

Because the geology of the subsurface cannot be examined directly, indirect techniques must be used to estimate the size and recoverability of the resource. One important indirect technique is called decline curve analysis (DCA), which is a mathematical model that we fit to historical production data to forecast reserves. DCA is so prevalent in oil and gas that we use it for auditing, booking, competitor analysis, workover screening, company growth and many other important tasks. With the rise of analytics, it has therefore become a central piece in any multi-variate workflow looking to find the key drivers for well and resource performance.
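For the mathematically curious, the workhorse behind most DCA is the Arps family of decline curves. Here is a minimal Python sketch of the hyperbolic form (the qi, Di, and b values are made up purely for illustration, not taken from any real well):

```python
# Arps hyperbolic decline: q(t) = qi / (1 + b * Di * t)^(1/b)
def arps_hyperbolic(t, qi, di, b):
    """Production rate at time t given initial rate qi,
    initial decline rate di, and hyperbolic exponent b."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

# Illustrative parameters only.
qi, di, b = 1000.0, 0.10, 0.9   # rate, 1/month, dimensionless

# A 24-month forecast: starts at qi and declines monotonically.
forecast = [arps_hyperbolic(m, qi, di, b) for m in range(24)]
print(round(forecast[0]))                          # 1000
print(forecast[0] > forecast[12] > forecast[23])   # True
```

In an autocast workflow, a best-fit optimizer would estimate qi, Di, and b for each well from its historical production, then evaluate this curve forward to forecast reserves.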

The DCA Wrangler provides fast autocasting and storage of decline curves. Actual data (solid) is modeled using best-fit optimization on mathematical models (dashed line forecast).

At the heart of any resource assessment model is a robust “autocasting” method. Autocasting is the automatic application of DCA to large ensembles of wells, rather than one at a time.
But there's a problem. Incumbent technologies make the retrieval of decline curves and their parameters very difficult. Decline curve models are complex mathematical forecasts with many components and variations. Retrieving models from a SQL database often requires parsing text expressions and interacting with many tables within the database.
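To illustrate the contrast, here is a sketch of how a decline model might be encoded as a single document (the field names are hypothetical, not our actual schema). With a document database, the whole model travels as one record, so there is nothing to parse and no joins to manage:

```python
import json

# A hypothetical decline-curve document: the entire fitted model in one record.
decline_doc = {
    "well_id": "EXAMPLE-001",                     # made-up identifier
    "model": "arps_hyperbolic",
    "params": {"qi": 1000.0, "di": 0.10, "b": 0.9},
    "fit": {"rmse": 12.4, "points_used": 36},     # illustrative fit stats
    "segments": [                                 # multi-segment declines nest naturally
        {"start_month": 0, "end_month": 36},
    ],
}

# With MongoDB this would be collection.insert_one(decline_doc) and later
# collection.find_one({"well_id": "EXAMPLE-001"}); a JSON round-trip
# stands in for that storage/retrieval cycle here.
restored = json.loads(json.dumps(decline_doc))
print(restored["params"]["b"])   # 0.9
```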

Further, with the rise of unconventionals, the fundamental workflow of resource assessment through decline curves is being challenged. Spotfire has become a popular tool for revamping and making next generation decline curve analysis solutions.

Autocasting in Action

What I am going to demonstrate is a new autocast workflow that would not be possible without the combined performance and capability of MongoDB and Spotfire. I’ll be demonstrating using our DCA Wrangler product – which is one of over 250 analytics workflows that we provide through a comprehensive subscription.

It's important to note that software already exists to decline wells and store their results in a database. People have even declined wells in Spotfire before. What I hope you see in our new product is the step change in performance, ease of use, and enablement when you use MongoDB as the backend.

What’s Next?

First, we have a home run solution for decline curves that requires a MongoDB backend. In the near future, more vendor companies will be leveraging Mongo as their backend database.

Second, I hope you see the value in MongoDB for storing and retrieving technical data and analytic results, especially within powerful tools like Spotfire. Plus, how easy it is to set up and use.

And lastly, I hope you get excited about the other problems that can be solved by marrying TIBCO with MongoDB. Imagine using StreamBase as your IoT processor and MongoDB as your storage environment, or storing models and sensor data within Mongo and using Spotfire to tweak model parameters and co-visualize data.

If you’re interested in learning more about our subscription, get registered today.

Let’s make data great again.

Part 6 – Visualization Properties

This is the sixth and final part of a series on Spotfire Properties.  In previous posts, I discussed Document Properties, Data Table Properties, Column Properties, Data Connection Properties, and Data Function Properties.  This week, we'll take a look at Visualization Properties.

Visualization Properties

To begin, each and every Spotfire visualization has its own visualization properties dialog controlling what is possible.  Basically, if it's not in visualization properties, it can't be done.  As I am sure you have noticed, the dialog changes with each visualization based on the content and functionality of the vis.  In the course of this post, I will explain which menus are common across all visualizations and provide a few "pro" tips.

Common Visualization Properties

When writing this blog post, I decided to create a matrix showing which submenus appear in each visualization properties dialog.  This seemed like a good idea when I started.  Halfway through the assembly, I started to question my motives and the utility of such a matrix.  In the end, the result surprised me. You can download the DXP with this matrix, and I have posted a screenshot below.

Visualization Properties Summary

As it turns out, only three menus are common across all visualization properties — General, Appearance, and Fonts.  After these menus, Data, Legend, and Show/Hide Items are the most common.

Most Common Visualization Properties Menus


Pro Tips

Next, I promised a few pro tips.

  • First, if you ever wonder what’s possible in a given visualization, consult this matrix.  For example, if you want to put Labels on a visualization but don’t see a way to do that, check to see if there is a Labels menu.  If you don’t see a Labels menu, you can’t put Labels on the visualization.
  • Second, always check the Appearance menu for your visualizations, especially if they are new to you or you have gone thru an upgrade.  The Appearance menu usually contains little gems for beautifying visualizations.  I have seen several new options appear there in the last few upgrades.
  • Third, don't perform formatting in the Formatting menu.  Instead, format in Column Properties or Tools –> Options.  Formatting via this menu is generally the most inefficient way to apply formatting, unless you have one-off needs.
  • Fourth, if you aren't familiar with these menus, I highly recommend checking them out.  They are very useful.  I have a blog post on using the Line Connection, and I'll update with posts on Error Bars and Show/Hide soon.
    • Line Connection —
    • Error Bars — Error bars are used to indicate the estimated error in a measurement or the uncertainty in a value. Bar charts and line charts can display vertical errors, as indicated in the matrix.
    • Show/Hide — Allows you to restrict content.  For example, if you have a bar chart with wells on the X-Axis and production on the Y-Axis, you can ask Spotfire to show only the top 10 producers.  Similarly, you could ask Spotfire to hide the bottom 10 producers.
  • Lastly, the same is true for fonts.  Don’t use the Fonts menu.  Go through Themes.

Additional Settings

In conclusion, I want to point out that a few visualizations also contain Settings menus.  Settings menus are used when the vis has individual, configurable components.  For example, the Maps visualization contains a Settings menu for each layer.  Graphical Tables contain Settings menus for each element in the graphical table.  A summary of such visualizations appears below.

  • Maps — Layer Settings
  • KPI Tiles — KPI Settings
  • Graphical Tables — Icon/Bullet Graph/Sparkline/Calculated Value Settings


In order to wrap up the series, I want to revisit the original questions I posed in the beginning.

  1. What do all of these properties menus do?
  2. Where can I go to change <insert preference here>? I keep setting <insert preference here> over and over again.  There must be a better or faster way.

The six-part series has addressed the first question.  The second question can be answered with this post on user preferences and administration manager preferences.  I hope you found the series useful.

HTML to Place Button in Center of a Text Area

Hey everyone! Here is a quick and dirty HTML code snippet for those of you who are working on your HTML skills.  Earlier this week, I was building a template with a Text Area containing several buttons.  I wanted to center the buttons in the Text Area, which is really easy to do with HTML.  You can also do this with CSS and <style>, but I’m just going to show you the two HTML versions that I worked with.

HTML Snippet No. 1

In this case, I have wrapped the buttons with <p> and </p> and simply added the align attribute with the “center” value.  Here’s what it looks like. Note, the buttons sit on their own line because they are individually wrapped with <p> and </p>.  <p> is a block level element that always starts on a new line and takes up the full width available (stretches out to the left and right as far as it can in the Text Area).  Compare this to the second code snippet below.
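In case the screenshot doesn't render for you, here is the markup described above, reconstructed as code (the button elements are placeholders for whatever controls your Text Area actually contains):

```html
<p align="center"><input type="button" value="Button 1"></p>
<p align="center"><input type="button" value="Button 2"></p>
<p align="center"><input type="button" value="Button 3"></p>
```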


HTML to center buttons 1



HTML Snippet No. 2

In this case, I have placed all buttons inside a <div> container and added the align attribute with the “center” value.  Now, <div> is also a block level element, but in this case, the buttons are not on their own line because they are inside of one container.  They would be on their own lines if I had placed them in individual <div> containers, but I was just seeing what I could do with less code.  Personally, I prefer the first look.
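Again reconstructed from the description (the button markup is a placeholder), the second version keeps the buttons on one line inside a single centered container:

```html
<div align="center">
  <input type="button" value="Button 1">
  <input type="button" value="Button 2">
  <input type="button" value="Button 3">
</div>
```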

HTML to center buttons 2


I highly suggest reading up on the different block and inline elements.  Understanding what space will be taken up with a given element will help you better design Text Areas.  Please feel free to comment and provide suggestions for better code.  I am still very much working on HTML and CSS.  Thank you!




TERR — Converting strings to date and time

This post explains my struggle to convert strings to Date or Time with TERR.  I recently spent so much time on this that I thought it deserved a blog post.  Here’s the story…

I was recently working on a TERR data function that calls a publicly available API and brings all the data into a table.  I used the function to parse out my row data.  In that function, I used the stringsAsFactors = FALSE argument, and as a result (the desired result), all of my data came back as strings.  This was fine because the API included column metadata with the data type.  As you can see in the script below, I planned on "sapplying" through the metadata with as.POSIXct and as.numeric.  This worked just fine in RStudio, and it also worked for the numeric columns and the DateTime columns.  However, it did not work for Date and Time columns.  I tried different syntax, functions (as.Date didn't work either), packages, etc., to get it to work, and NOTHING!  The struggle was very real.

Script convert strings to Date or Time with TERR


Finally, I Googled the right terms and came across a TIBCO knowledge base article with the following information:

Spotfire data functions recognize TERR objects of class “POSIXct” as date/time information. As designed, the Spotfire/TERR data function interface for date/time information does the following:

– Converts a Spotfire value or column whose DataType is “Date”, “Time” or “DateTime” into a TERR object of class “POSIXct”.

– Converts a TERR object of class “POSIXct” into a Spotfire value or column with a DataType of “DateTime”, which can then be formatted in Spotfire to display only the date (or to display only the time) if needed.

This interface does not use any other TERR object classes (such as the “Date” class in TERR) to transfer date/time information between Spotfire and TERR.

That told me that all my effort was for naught, and it just wasn’t possible.  I contacted TIBCO just to make sure there wasn’t some other solution out there that the article was not addressing.  In the end, I just used a transformation on the Date and Time columns to change the data type.  I hope that you, dear Reader, find this post before you spend hours on the same small problem.  I did put in an enhancement request.  Fingers crossed.  Please let me know if you have a better method!



Part 5 – Data Function Properties

This is the fifth part of a series on Spotfire Properties.  In previous posts, I discussed Document Properties, Data Table Properties, Column Properties, and Data Connection Properties.  This week we'll take a look at Data Function Properties.  Next week, the series will conclude with Visualization Properties.  Data function properties is a bit of a beefier subject because all data functions can be created out of the Data Function Properties menu.

What is a Data Function?

Since I am writing this for the new Spotfire user, let's start with the basics. What is a data function, and why would you need one? A data function is TERR code written to perform a specific task within Spotfire. What kinds of tasks, you ask? I'll come back to that in just a sec. First, I want to define TERR. TERR stands for TIBCO Enterprise Runtime for R. TERR is the Spotfire version of open-source R. R is a programming language for statistical computing and graphics, widely used among statisticians and data miners for developing statistical software and performing data analysis (Wiki).

As it relates to Spotfire, TERR expands upon the functionality of the application. For example, we are about to post a template that uses TERR to identify duplicate records (link coming soon).  You can use TERR to manipulate data, pass data thru to new tables, and expand upon the statistical and predictive modeling tools that come with the application. For example, Spotfire doesn't have a Random Forest model, but you can build one with TERR.  The only limits are really your own programming skills.

Now, I've already written a blog post on TERR basics, which includes an explanation of all the TERR screens.  Most of what is in that post I would have also covered here.  Therefore, I am going to use this post to expand on the different options in the screens, rather than walking through a single use case.  Lastly, we also have a few templates to get you started.

Data Function Basics

Let's review the basics of data functions.  Creating a data function is a three-step process.
  1. Create the script
  2. Create the parameters
  3. Run the script to map the parameters to the data in the DXP
Key concepts to be aware of:
  • The script is the “meat” of the data function. Within the script, you’ll find at least one input and one output parameter.  The simplest R script I’ve ever written is output <- input.  Input is the input parameter, and…yeah, you can finish that sentence I bet.  
  • TERR (like R) is an object-oriented language, which means programmers can create objects within the code, assign values to the objects, and then reference the object down the road rather than all the values.  This makes programming easier. In the example above, input and output are both objects.
  • Input and output parameters tell Spotfire what type of object to work with.  The object could be a table, column, document property or another object. 
  • Running the script triggers the dialogs where you will map the parameters to the actual data in the DXP.
  • Data functions can be connected to marking and filtering.  For example, you can pass the results of marking or filtering to a new table.
  • Users may create data functions from scratch in Spotfire, or users may import data functions from the Spotfire library or another file.
  • By default, data functions embed within the analysis. However, users have the ability to save them to the library for reuse or sharing.
With that said, here is a very simple script that I'll reference throughout the post.  This single line of R code will check for duplicates in a data set using two identifiers that define the granularity of the table. The output is a column called duplicate.

duplicate <- duplicated(data.frame(key1, key2))

  • duplicated and data.frame are TERR (or R) functions
  • duplicate, key1, and key2 are objects in the script
  • key1 and key2 are input parameters
  • duplicate is the output parameter
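For readers who don't know R, the duplicated() call behaves like this Python sketch: the first occurrence of each (key1, key2) pair is flagged False, and any repeat is flagged True (the well/month values below are made up for illustration):

```python
# Mimics R's duplicated(data.frame(key1, key2)): flag repeated key pairs.
def duplicated(key1, key2):
    seen = set()
    flags = []
    for pair in zip(key1, key2):
        flags.append(pair in seen)   # True only for repeats of an earlier pair
        seen.add(pair)
    return flags

key1 = ["well_A", "well_A", "well_B", "well_A"]
key2 = ["2007-11", "2007-12", "2007-11", "2007-11"]
print(duplicated(key1, key2))   # [False, False, False, True]
```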

Data Function Properties Main Screen

This dialog lists all data functions in the DXP.  From this screen you can …
  • Create new data functions
  • Edit existing data functions and their parameters
  • Refresh data functions
  • Delete data functions
  • Save data functions to the library
  • Export a data function
Data Function Properties
Related to the buttons above:
  • The term "Register New" can be a bit confusing to new users.  It really means create a new data function.  In the process, you'll have the option to save it to the library, i.e., register it.
  • Clicking the Edit Script button will let you modify the script or the input and output parameters.
  • Clicking the Edit Parameters button allows you to change the mapping of data from the parameters to the DXP content.

Script & Parameters

These three dialogs define the script, input parameters, and output parameters.  Because the TERR Basics post covers them in detail, I want to focus on the different options available to the input and output parameters.  
Data Function Properties

Input and Output Parameters

When creating input and output parameters, there are three options — Table, Value, and Column.
  • If your input or output is an entire table, choose Table.  I use this option when I am simply passing a limited data set from my original table to a new table.
  • If your input or output is a single column, choose Column.  The script shown above for identifying duplicates uses a Column output.  The data function creates a column called “duplicate”.
  • If your input is a hard-coded value or a document property, choose Value.

Data Function Properties

Run the Data Function

After you have entered the script, input parameters, and output parameters, the next step is clicking the Run button.  If Spotfire asks whether you want to save the data function to the library, you can say no.  It will not impact your DXP.  This prompt simply gives you the option to save the data function to the library so others may access it.  As an administrator, I ask users NOT to do this because it clutters up the library, and it is hard to know what a given data function is for or whether it even works.

Input Parameters

Anyway, this is the step in the process where you map the parameters to the content of the DXP.  Let’s tackle the inputs first.  I have intentionally added two unnecessary parameters to demonstrate that the options for input handlers depend on the type of input parameter.  Each input parameter type has different options.

  • For Column type, there are three options — Column, Expression, and None.  The most common input handler is Column, which I have used in data functions that manipulate or calculate based on a specific column of data.
  • For Value type, there are six options — Value, Document property, Data table property, Column property, Expression, and None.  I most frequently use Document property.
  • For Table type, there are three options — Columns, Expression, and None.  You can tell Spotfire to work with a subset of the columns in the table by using the Select Columns button. Alternatively, typing “*” in “Search expression” will use all columns in a table.  It’s not visible in the screenshot shown, but just below the “Search expression” section, you will also find options to connect the contents of the table to marking or filtering.  This is explained in the TERR Basics post.

I do want to note that I have never used the None option in either input or output handlers.  If someone has, please tell me about it in Comments.

Data Function Properties


Output Parameters

Now, for outputs, it is also true that the options presented differ depending on the parameter type.  As you can see, Column, Value, and Table all have different options.

  • The Column and Table Type have the same four options — Data table, Columns, Rows, and None.  Use Data table if you are creating an entirely new table.  Set the type to Columns if the output is a column that should be added to another table.  Use Rows if you are adding rows to a table.
  • In Value Type, there are seven options — Data table, Columns, Rows, Document property, Data table property, Column property, and None.  The same advice applies to outputs here as for inputs.

Data Function Properties


As I was writing this, I realized that if I were creating a data function that output rows, I’m not sure which type I would use.  The options for adding rows are part of both the Column and Table Type.  Setting up a Column type to insert rows seems counter-intuitive.  I just haven’t had to write this type of data function yet.  If you know, please Comment!

Hopefully, explaining some of the common uses of the different types of input and output parameters will help you better understand TERR data functions and how to convert R code to TERR.  Thanks!






Part 4 – Data Connection Properties

This is the fourth part of a series on Spotfire Properties.  In previous posts, I discussed Document Properties, Data Table Properties, and Column Properties.  This week we’ll take a look at Data Connection Properties.

Data Connections

When I first started working with Spotfire, I didn’t know what a data connection was.  Thus, I didn’t understand the dialog at all or what it did.  So, let’s first talk about what a data connection is.

When the application opens, there are four options for adding data tables, as shown below.  If you added a data table using the last option, the Data Connection Properties dialog will be relevant.  If you used one of the first two options, you do not have any data connections in the DXP, and you don’t need to worry at all about the Data Connections dialog.

Data Connection

When you add a data table with a data connection, you are making a direct connection to the data source (such as SQL databases).  This type of connection will usually require Windows authentication or some type of username and password.  Many companies do not allow Spotfire users to make this type of connection because it hits the database directly, which can impact resources and other processes.  This is a meaty subject, but for now, just understand that connecting with a data connection to a database is different from connecting to a database with an information link.

Data Connection Properties

The Data Connection Properties dialog opens up to the screen shown below where you have the option to Rename the connection, go to Settings, or Refresh the data.  I highly recommend coming into this screen after adding a data connection to rename the connection.  Why?  Well, when you import data from a data connection, most users name the table, but they don’t know they can also name the connection.  When the DXP has many connections, they can get confusing quickly, and so it’s a best practice to rename them.  In this case, the delete button is grayed out because the DXP only has one table from one connection, so deleting it isn’t an option.

Data Connection Properties


If you click on the Settings button in Data Connection Properties, a new dialog box appears with three tabs.  By default, the data connection is embedded in the analysis, but you can also choose to save the connection in the library for other users.  This dialog also has an option to replace the data connection.

Settings – General

From within the General tab, you can see the views and tables associated with the data connection.  There are two actions you can take — Edit the data connection and Refresh the Schema.  Prior to version 7.11, clicking on the Edit button was the only way to edit the data coming in.  However, this type of editing is now available in the Data Panel as well.  Refreshing the Schema will simply ping the data connection to show you any changes to the schema.  This could be new tables, new views, or other changes to the database.

Data Connection Properties General

Settings – Data Source

The data source tab shows all of the information about the data connection itself.

Data Connection Properties Data Source

Settings – Login

Clicking on the Settings button will bring up another dialog with the details of the connection and two more tabs.  You can edit the data connection in the Login tab by clicking the Edit button.  For example, if you were working with a data table in a “dev” or “uat” instance, you can change the connection to the “prod” instance.  As long as everything is the same, the tables are easily swapped out without any loss of inserted columns, transformations, or calculations.

Data Connection Properties Credentials

Settings – Credentials

The Credentials tab is very important, perhaps the most important dialog for Data Connections.  There are three ways credentials may be retained in Spotfire.

  1. No, do not save any credentials — This means that each time a new user attempts to open the DXP, they will need to enter their own database credentials in order to update the table.  This can be problematic depending on how free your company is about handing out database credentials.
  2. No, but save credentials profile — This will save a credentials profile, which consists of a profile name, a username, and a password (only the profile name is saved with the connection data source).
  3. Yes, save credentials… — This can be risky, but I have used this option before when we had to use a data connection and had many users without direct database access.  In this case, we set up a generic account (rather than an account for each user).  As the administrator, I set up the DXP and entered the username and password.  Spotfire saved them, the users didn't need to know either the username or the password, and the file worked.

The third option does not appear in the screenshot below because I opened the data connection with Windows authentication, rather than SQL Authentication.  You would see this option when it is relevant.

Data Connection Properties Settings

Settings – Cache Settings

Finally, the Cache Settings tab specifies when data should refresh.  How you configure these settings will vary depending on…

  • The frequency of data updates
  • The number of users using the direct connection
  • The volume of data being loaded

Spotfire provides three options.

  1. No, always get fresh data from the external source.  Use this option if you do not want to cache data from the data connection.  This may put a very high load on the database.
  2. Yes, but let the cached data expire.  Use this option to cache data but refresh if the cached data is older than a specified limit.
  3. Yes, but let the cached data expire every (specify interval).  Use this option to cache data but refresh at a specified time or interval.

Data Connection Properties Settings
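To make the expiry idea concrete, here is a minimal Python sketch of how option 2 behaves: a time-based cache that only goes back to the external source when the cached copy is older than a specified limit.  This is just an illustration of the caching concept, not Spotfire's actual implementation; the `TTLCache` class and `fetch` function are made up for the example.

```python
import time

class TTLCache:
    """Minimal time-based cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, fetch):
        self.ttl = ttl_seconds
        self.fetch = fetch          # function that hits the external source
        self._value = None
        self._loaded_at = None      # None means nothing cached yet

    def get(self, now=None):
        now = time.time() if now is None else now
        expired = self._loaded_at is None or (now - self._loaded_at) > self.ttl
        if expired:
            self._value = self.fetch()   # refresh from the external source
            self._loaded_at = now
        return self._value

calls = []
cache = TTLCache(ttl_seconds=60, fetch=lambda: calls.append(1) or len(calls))

cache.get(now=0)      # cold cache -> fetches from the source
cache.get(now=30)     # within the limit -> served from cache
cache.get(now=100)    # expired -> fetches again
print(len(calls))     # -> 2
```

Option 1 corresponds to a TTL of zero (every `get` hits the source, hence the high database load), while option 3 refreshes on a schedule rather than on age.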

Hopefully, you have a better understanding of what Data Connections are and what options you have to configure them.  The next post will look at Data Function Properties.

Part 3 – Spotfire Properties – Column Properties

This is the third part of a series on Spotfire Properties.  In previous posts, I discussed Document Properties and Data Table Properties.  This week we’ll take a look at Column Properties.

Column Properties

As the name indicates, the Column Properties dialog deals with properties for columns in a data table.  Now, that explanation seems clear and simple, but when making the switch from Excel to Spotfire, there are a few things that get lost in translation.  That’s okay.  We’ll work through it in this post.

Column Properties contains 5 tabs.  No matter which tab you click on, the data table drop down and the list of column names will always be visible.  You will also always see the Insert and Delete buttons.  The Insert button allows you to insert a calculated column, a binned column, or a hierarchy column.  This might be preferable to using the Insert menu because you can insert more than one calculated column at a time without exiting the dialog.

Additional features of this main dialog include:

  • Type to search — This is almost always faster than scrolling, and you can use wildcards (“*”).  For example, if I want to search for any column of data with the word date, I would search “*date*”.
  • Click to sort — Users may also click on the column headers to sort.  I frequently use this when I create new information links, and I need to check for Currency data types.  I click the Column Type header and can quickly see if I have any Currency columns.
  • Metadata — There is a ton of information about each column.  See below for more detail.

Column Properties Header
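That wildcard search behaves like standard glob matching.  As a quick illustration outside Spotfire, Python's `fnmatch` applies the same `"*date*"` pattern to a list of hypothetical column names (the names here are made up for the example):

```python
from fnmatch import fnmatch

# Hypothetical column names; "*date*" matches "date" anywhere in the name.
columns = ["Well Name", "Prod Date", "Spud Date", "Total BOE", "Update Flag"]

matches = [c for c in columns if fnmatch(c.lower(), "*date*")]
print(matches)  # -> ['Prod Date', 'Spud Date', 'Update Flag']
```

Note that "Update Flag" matches too, because "update" contains "date", which is exactly the kind of surprise a contains-style wildcard can produce.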


Metadata

As you can see from the screenshot, there is a ton of information about each column.  This is too much to cover in one post, so I’ll just go over the most important information.

  • Column Type — There are 9 column types — None, Imported, Calculated, Binned, Frozen, Result, Tags, Mask, and Hierarchy.  I’ve never seen “None” listed as a column type in any of my projects, but the Help menu says it’s an option.  It is important to note that if you create a calculated column with a transformation, the type reflects “Imported”, not “Calculated”.
  • Data Type – Self Explanatory
  • External Name — Users can and do change column names in the General tab.  The External Name shows the column name from the source.  I’ve found this to be very handy when I lose track of changes or am working on someone else’s project.
  • IsValid — Each column has a True or False value, but the value is only relevant for calculated columns.  If the value is True, the column expression is valid.  If the value is False, the column expression is not valid.  False is bad.  That means the calculation is not working, most often because a user deleted a column used in the calculation.  Excel simply returns a #NA, but Spotfire returns a NULL, and the calculation becomes invalid.  These types of errors are hard to spot unless you know to review IsValid.
  • Origin — This metadata specifies the data table the column originates from, which is very important because this information is not provided elsewhere.  If you review Data Table Properties, Source Information, the dialog will list the columns ignored, not the columns added, which isn’t terribly helpful.  However, you can easily identify columns to insert by reviewing the Origin column.
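The rule behind IsValid is easy to sketch: a calculated column stays valid only while every column it references still exists.  Here is an illustrative Python analogue (the `evaluate` helper and the column names are made up for the example; this is not Spotfire's engine):

```python
def evaluate(expression_columns, table_columns):
    """A calculated column is valid only while every referenced column exists."""
    missing = [c for c in expression_columns if c not in table_columns]
    return len(missing) == 0, missing

table = {"Oil", "Gas", "Water"}

is_valid, missing = evaluate(["Oil", "Gas"], table)   # both inputs exist
print(is_valid)            # -> True

table.discard("Gas")       # a column the calculation depends on is deleted
is_valid, missing = evaluate(["Oil", "Gas"], table)
print(is_valid, missing)   # -> False ['Gas']
```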

Next, let’s look at the General tab.

Column Properties Metadata


The General tab is used for two things.

  1. Changing column names
  2. Updating calculated, binned, or hierarchy columns

You may also Freeze columns here, but I have yet to encounter a scenario where I want to freeze a column.  Freezing columns locks the column and disallows editing. The Help menu says this function was created for scenarios where “you want to save the result from a calculation to prevent it from being overwritten or for performance reasons”.  However, freezing columns also embeds the table, which I rarely want.

Column Properties General



The Formatting tab is self-explanatory, but I want to point out a few things.

  • Formatting columns of data is faster than formatting each and every visualization.
  • Spotfire has a short number format that takes up very little space on visualizations.  It’s handy, and I use it a lot.
  • Users may select many columns of data (of the same data type) and format them at the same time.
  • Users can apply formatting from one column to other columns so the same steps aren’t repeated unnecessarily.

Column Properties Formatting
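For a feel of what the short number format does, here is a rough Python approximation.  The exact thresholds and suffixes Spotfire uses may differ; the ones below are my assumptions for illustration:

```python
def short_format(value):
    """Rough approximation of a 'short number' format: 46000 -> '46.0k'."""
    for threshold, suffix in [(1e9, "B"), (1e6, "M"), (1e3, "k")]:
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    return f"{value:g}"

print(short_format(46000))     # -> '46.0k'
print(short_format(10800000))  # -> '10.8M'
print(short_format(512))       # -> '512'
```

The appeal is obvious on a crowded axis: "10.8M" takes far less space than "10,800,000".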


The Properties tab lists all Column Properties in the DXP and notes whether a particular column is part of each property.  See the next screenshot for an example.

Column Properties Properties1

In this example, I have created a drop down property control called MyDropDown.  It has 4 columns in it.  Note, the drop down is a Document Property.  By default, Spotfire will place all columns from a data table in the drop down unless you create a Column Property to limit the columns.  In this case, I created a Column Property called MyDropDownOptions restricting the drop down to only four columns.  Review the previous screenshot to see MyDropDownOptions listed in the Properties tab.  The value is set to True for c.Total BOE.  It would be False for Well Name, Prod Date, etc.
Column Properties Properties2
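Conceptually, the Column Property acts as a per-column flag that filters the drop down.  Here is an illustrative Python sketch using the names from this example (the `c.Oil`, `c.Gas`, and `c.Water` columns are hypothetical stand-ins for the other three True columns):

```python
# Per-column values of the MyDropDownOptions Column Property.
my_drop_down_options = {
    "c.Total BOE": True,
    "c.Oil": True,       # hypothetical
    "c.Gas": True,       # hypothetical
    "c.Water": True,     # hypothetical
    "Well Name": False,
    "Prod Date": False,
}

# Without the property, every column would land in the drop down;
# with it, only the True columns appear.
drop_down = [name for name, include in my_drop_down_options.items() if include]
print(drop_down)  # -> ['c.Total BOE', 'c.Oil', 'c.Gas', 'c.Water']
```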

Column properties can be added through this menu or also through the Information Designer.  When using the Information Designer, the column property is added directly to the Column Element (in the case of information links).


The Geocoding tab allows the user to specify that a column contains geographic information.  That information may be used for positioning data on a map.  This post is already quite long, so I won’t get into geocoding details.

Column Properties Geocoding

Sort Order

When placing a categorical column of data on a visualization, such as a bar chart, the data sorts itself in the “Standard sort order”.  However, users often want to change this order, and that is possible here in Column Properties using the “Custom sort order”.

Column Properties Sort Order
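The difference between the two sort orders is easy to demonstrate outside Spotfire.  In this Python sketch, the standard order is plain alphabetical sorting, while the custom order ranks categories by a user-defined list (the month abbreviations are hypothetical examples):

```python
# Hypothetical categories whose "Standard sort order" would be alphabetical.
months = ["Dec", "Jan", "Nov", "Feb"]

custom_order = ["Jan", "Feb", "Nov", "Dec"]   # the order we actually want
rank = {name: i for i, name in enumerate(custom_order)}

print(sorted(months))                        # -> ['Dec', 'Feb', 'Jan', 'Nov']
print(sorted(months, key=rank.__getitem__))  # -> ['Jan', 'Feb', 'Nov', 'Dec']
```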


Final Notes

Before I wrap up Column Properties, I also want to note that you can create Column Properties for individual column elements in the Information Designer.  That is a broad subject, and I’ve only done a few of these, so I’m not going to dive into it here.  However, this blog post on creating information links for spatial data provides two examples of adding column properties to the column element.  I would really like to learn what else is possible in this space.

And that wraps up Column Properties!  Next, I’ll discuss Data Function Properties.