Tag: architecture

There are only two hard things in Computer Science….

One of my favorite quotes is this…

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton (via Martin Fowler)

Even though I am not a computer scientist by any stretch of the imagination, I can identify with the difficulty of naming things.  You see, I just reached a major milestone in a big project.  I’ve spent the last couple of weeks really getting to know the data, focusing on designing the right architecture, and finally building out the functionality.  Now that it’s built, I have one and only one regret.

Before I tell you what that regret is, I want to say that having only one regret is a damn fine professional accomplishment.  Usually, I end up with a bullet list at the end entitled “When I rebuild this, change these things.”  In this case, I think what I built is glorious and fit for purpose.  But enough basking in the sunshine.

Back Story

Project Summary: Combine the same data set from multiple sources, throw in some extra data for funsies, allow users to remove/filter out bad data, and then provide functionality for three different types of analysis.  The analysis contains 12 raw tables (as opposed to tables generated by data functions or copies of tables).  There is a bit more to it, but that’s the quick and dirty version.

The Regret

My biggest regret in my latest project (drum roll) is not picking and sticking with a consistent column naming convention throughout the project.  Instead, I let the column names come in as they were from each data table.  I have regretted this almost every single day of working on the project.

How Did This Happen

So, how did this happen?  It happened because changing more than a handful of column names in Spotfire is incredibly time-consuming and tedious.  It’s so time-consuming and tedious, I let these excuses win out…

  1. It will be fine (biggest lie ever).
  2. It will be easy to change later (that’s also a lie, it won’t be).
  3. I don’t know what the naming convention should be yet, so it’s better to wait than to put out something I’ll have to change later (also not a good reason, sigh).

To be fair, I did have a naming convention for some things, such as calculated columns and transformations, but not for all things, and it is difficult to know at the start of a project all of the things you might need to consider when defining a naming convention.  Anyway, excuses 1 – 3 are all just that: excuses.


Why The Regret

So, what were the consequences of not setting a column naming convention from the start?  Well, there are many.

  1. It’s not always easy to tell what a column of data is and/or where it came from.  If there’s one thing I want for this project, it’s for the content to be easy for users to understand. I could have used the naming convention to better explain where data comes from and what the data is.
  2. It just looks messy.  Some column names are all upper case.  Others are lower case.  Some have underscores.  My naming convention for calculations and transformations includes periods.  The OCD part of my brain just can’t handle it.
  3. The project uses data functions, and now I have to pull out spaces and underscores in code (see the snippet after this list).  As noted above, it’s hard to know all the things you’ll need to consider in a naming convention at the start.
  4. I did have to change some column names, so now I have transformations attached to my tables, which isn’t a huge deal in and of itself, but it adds unnecessary complexity.
  5. After changing those column names, bits and pieces of the project broke.  For example, I had to modify all my pivot transformations and columns feeding to and from data functions (i.e. REWORK, which is bad).
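On point 3, the fix inside a data function is small, but it’s annoying to repeat in every script.  As a rough sketch of what I mean, in base R (the table and column names here are made up):

```r
# Hypothetical input table with the kind of mixed naming described above
df <- data.frame(`WELL NAME` = "Well 1", oil_volume = 100,
                 check.names = FALSE)

# Pull the spaces and underscores out of the column names
names(df) <- gsub("[ _]+", ".", names(df))

names(df)  # "WELL.NAME" "oil.volume"
```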

How to Rename Columns in Spotfire

There are many different ways to rename columns in Spotfire including….

  1. In the information designer/information link if you have info links
  2. In SQL code if you have data connections
  3. In column properties
  4. With a transformation
  5. With a TERR data function
  6. I assume also with IronPython, although I haven’t tried this route

Each of these methods has pros and cons, some more “con-y” than others.  None of them is terribly easy, and some aren’t even always an option, like changing the column names in the information designer (if you don’t have admin rights) or writing a TERR data function (if you don’t know TERR/R code).

The Do-Over

So, what would I do if I had to do it all over again?  Well, I would assume I wouldn’t get it right on the first try.  Naming things is an incredibly difficult task, and I’m fairly certain more than one go would be required.  With that in mind, I would use an easy-to-edit TERR data function.  The other methods, such as modifying the information link or adding transformations, are simply too time-consuming.  Let’s be real.  If I did it the hard way (modify the info link or use transformations), this is what would happen: I’d take the time-consuming route and do it once, utter a huge sigh of relief, and then cry into my keyboard when I realize I have to do it again.
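For illustration, here is a minimal sketch of what such a TERR data function might look like, assuming the table arrives as a data.frame input parameter (called inputTable here) and goes back out as outputTable; the name mapping itself is entirely hypothetical.

```r
# Minimal sketch of a TERR data function that applies a naming convention.
# In Spotfire, the table would arrive as the input parameter 'inputTable'
# and 'outputTable' would be mapped back to a table in the analysis.
# Stand-in for the Spotfire input so the sketch runs on its own:
inputTable <- data.frame(WELLNAME = "Well 1", OIL_VOL = 100)

# Hypothetical convention: dot-separated, human-readable names
rename.map <- c(WELLNAME = "Well.Name",
                OIL_VOL  = "Oil.Volume")

outputTable <- inputTable
hits <- names(outputTable) %in% names(rename.map)
names(outputTable)[hits] <- rename.map[names(outputTable)[hits]]

names(outputTable)  # "Well.Name" "Oil.Volume"
```

Because the whole convention lives in one editable vector, a second or third pass at the names is a quick edit to the script rather than a pile of reworked transformations.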

I know some TIBCO folks read this.  Please, please, please add functionality to the tool that makes it easy to change many column names at one time without breaking other parts of the project.  It would dramatically improve the software.  We Spotfire developers have to contend with things like company column naming standards that don’t always make sense in every project.  The struggle is real.  I swear.

In Conclusion

Take the time to create and stick to a standard column naming convention from the start.  Renaming columns later is incredibly time-consuming, so the up-front effort is worth it.

And now, just because this post was kind of wordy, and you made it to the end….here’s a photo of my dog when he was a puppy and my husband when he was growing a “yeard”.  I like ’em both a whole lot.  You are welcome.


Guest Spotfire blogger residing in Whitefish, MT.  Working for SM Energy’s Advanced Analytics and Emerging Technology team!

MS Access vs SQL for Spotfire

Are you starting to question whether you need to move beyond MS Access into SQL or Oracle?

  • Are MS Access limitations ruining your projects?
  • Are you thinking about connecting to an SQL database via MS Access?
  • Are you running into limitations in Spotfire using MS Access?

Read More


Optimal Architecture to Avoid Rebuilding Tables

  • Are you frustrated with having to rebuild Spotfire tables when you need to modify insert-columns operations or make a major change?
  • Would you like to build Spotfire projects so that making modifications to tables is easier?
  • Do your tables have duplicate columns and you don’t know why?

Read More


Optimal Architecture for Combining Spotfire Tables

One of the most critical aspects of building Spotfire projects is choosing the optimal architecture.  I recently had the “opportunity” to rebuild the same Spotfire project 3x.  Let’s not worry about why that happened, but instead focus on the fact that I did arrive at the optimal architecture for the project.

In rebuilding the same thing three times, I came up with a few tips and tricks, especially as it relates to scalability.  Now, before I dive into the details, I need to brief you on the project.

The Project

The purpose of the project was to compare the (allegedly) same data from three different sources, and I needed to be prepared for the following:

  • Different wells in each data set
  • Different date ranges in each data set (ex. Well 1 has production volumes reported for Jan 1, 2017 – Jan 1, 2018 in data set 1 but only for Feb 1, 2017 through Dec 1, 2017 in data set 2)
  • Comparing oil, gas, and water volumes across the three data sets
  • Comparing several dates across the three data sets (ex. spud date, completion date, first date of production)
  • The possibility of broadening the analysis to include more points of comparison at a later date (remember this, it will be important).

Furthermore, the analysis needed to be efficient and interactive.  One page per data point comparison was too many pages.  Instead, the design called for one page comparing volumes and one page for comparing dates, with property controls for the user to specify the comparison they would like to see.

Now, if you have been through the discussion of insert rows versus insert columns, just skip to the end of the post where I have saved the best for last.  If you aren’t certain whether you would use insert rows or insert columns, read on.

Insert Rows versus Insert Columns

What is the optimal method to combine data tables in Spotfire?  Well, the answer is not always straightforward, and there are pros and cons to inserting rows versus inserting columns.  The basics are this…

Insert Rows

Pros

  • SUPER easy and fast.  The easiest of all options. Just match up the columns.
  • Changes? No problem!  Insert row operations are editable in the Data Panel.
  • Scalable.  Just insert more rows.

Cons

  • Calculations are much more difficult.  Comparing Gas/Oil from one data set to another is not as simple as Column A – Column B; you must use OVER and node navigation (the sketch after this list shows the same problem in R terms).
  • Calculations get extra complicated with the scenarios described above (different wells, different periods, missing wells, etc).
  • Cannot remove an insert rows operation (yet).
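To see why the calculations get harder, here is a rough R analogue of the problem (wells and volumes are made up): with stacked rows, the sources have to be lined up with each other before you can subtract, which is the job OVER and node navigation do in a Spotfire expression.

```r
# Stacked (insert rows) layout: one row per well/date/source
stacked <- data.frame(
  Well   = c("Well 1", "Well 1"),
  Date   = as.Date(c("2017-01-01", "2017-01-01")),
  Source = c("A", "B"),
  Gas    = c(500, 480),
  stringsAsFactors = FALSE
)

# The two sources must be matched up by well and date first...
paired <- merge(subset(stacked, Source == "A"),
                subset(stacked, Source == "B"),
                by = c("Well", "Date"), suffixes = c(".A", ".B"))

# ...and only then does the comparison become simple arithmetic
paired$Gas.Delta <- paired$Gas.A - paired$Gas.B
```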

Insert Columns

Pros

  • Calculations are much easier.

Cons

  • Cannot edit the insert columns operation (yet).
  • Scalable, but with a risk: if the underlying tables gain more columns, you’ll suddenly have columns in your tables that you don’t need or want.
  • Joins can be messy or difficult.
  • The biggest con of this method (and the ultimate reason I didn’t use insert columns) is that you wind up with multiple columns for each value (ex. multiple well name columns).  If your data set contains a lot of header data, this can be particularly frustrating and time-consuming.  The most common way to deal with it is to create calculated columns with IF logic (or case statements) that look at each column and return a value, as sketched below.  This was a deal breaker.
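For illustration, here is a rough R version of that coalescing logic, with hypothetical column names; every extra header column multiplies this kind of busywork.

```r
# After an insert columns join, the same value can live in several columns
df <- data.frame(Well.Name.A = c("Well 1", NA),
                 Well.Name.B = c(NA, "Well 2"),
                 stringsAsFactors = FALSE)

# Spotfire's analogue is a calculated column with IF/case logic:
# take the first non-missing candidate for each row
df$Well.Name <- ifelse(!is.na(df$Well.Name.A),
                       df$Well.Name.A, df$Well.Name.B)

df$Well.Name  # "Well 1" "Well 2"
```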

The Final Answer

So, which method did I choose? In the past, I have gone with insert columns.  However, in the end, I went with an insert rows operation plus a pivot transformation.  The insert rows operation allowed me to easily combine the tables in a way that was editable and scalable.  Then, by adding a pivot operation, I was able to quickly create more columns so that the calculations were easier, and I didn’t have multiple columns for a single value.  WIN! If you aren’t familiar with pivot transformations, you can read about them here.  Here is what it all looks like in a very simple, one-well example.

(Screenshots: the source data sets, the pivot transformation, and the final table.  Remember, it’s one well, so there is one record.)
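If it helps to see the shape of the data, here is a rough base R sketch of the same insert-rows-then-pivot flow, with made-up volumes:

```r
# Stacked table after the insert rows operation: one row per well/source
stacked <- data.frame(
  Well   = c("Well 1", "Well 1", "Well 1"),
  Source = c("A", "B", "C"),
  Oil    = c(100, 98, 101),
  Gas    = c(500, 480, 505),
  stringsAsFactors = FALSE
)

# Pivot on Source: one row per well, one Oil/Gas column per source
final <- reshape(stacked, idvar = "Well", timevar = "Source",
                 direction = "wide")

names(final)  # "Well" "Oil.A" "Gas.A" "Oil.B" "Gas.B" "Oil.C" "Gas.C"
```

One row per well, one column per value per source, and the comparisons are simple column arithmetic again.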

Saved the Best for Last

Of course, what I really wanted to share with you, I saved for the end.  One of the most critical elements of this build was that it needed to be scalable; I needed to be able to add additional points of comparison at a later date.  That can be problematic with insert columns operations.  However, I found a great solution for scalability with insert columns.  (Just ignore the fact that I ultimately went with insert rows.  This is still good information.)

Now, the first step in any project is adding the data tables.  In this case, I used information links, and they were HUGE!  They contained far more columns than I was going to use and a lot of data I wasn’t familiar with.  Therefore, I added in an Exclude columns transformation to exclude all unused columns.   This proved to be incredibly helpful.  For the sake of expediency, I’m going to list out the steps so far….

  1. Add all data tables to the project (completion data table, header table, production summary table)
  2. Insert an Exclude columns transformation on each table to exclude the columns not being used
  3. Create a “master” table by duplicating one of the tables (File > Add Data Table > From Current analysis) — learn more about the master table and why you should exclude columns in this post.
  4. Insert columns from the other tables into the master table

At this point, I have a master table with all of my data.  Now, if you have ever looked closely at the Data Table Source Information tab, you have noticed that the way Spotfire inserts columns is by building a list of columns to ignore rather than columns to insert.  I have never understood why the application is built this way (surely just to frustrate users). In this case, it is EXTREMELY helpful!

Keep in mind, the objective is to be scalable.  If I want to add in data points, all I have to do is modify the Exclude columns transformation and the columns flow in.  Now, because Spotfire builds up a list of columns to ignore rather than a list of columns to add, the newly included columns are not in that list, and so they flow through step 4.  There is no need to perform additional insert columns operations! (I just got a little sheepish when I realized how excited I am about this.) This made the analysis much more scalable.  I was relieved to not have to perform another insert columns operation for each new data point.  Tragedy averted. DONE.
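As a rough R analogue of that mechanic (the column names are made up): selecting columns by exclusion means anything new that isn’t on the list flows through automatically.

```r
# Hypothetical source table with more columns than the analysis needs
raw <- data.frame(WELLNAME = "Well 1", OIL_VOL = 100,
                  LEASE_ID = 42, COMMENTS = "n/a",
                  stringsAsFactors = FALSE)

# Name what to drop, not what to keep (the Exclude columns idea)
exclude <- c("LEASE_ID", "COMMENTS")
kept <- raw[, !(names(raw) %in% exclude), drop = FALSE]

names(kept)  # "WELLNAME" "OIL_VOL" -- a brand-new source column would flow through
```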

If you wondered what I was doing up in Step 3, just hang on until next week.  I’ll explain that bit of architecture as well.


Better Architecture with Joined Data Tables

Recently, we had a client who was having trouble with their tables freezing.  They had a SQL database joined directly to an Excel sheet but found that the table would not update because it had become embedded.  These frozen tables can be avoided with proper architecture.  It’s best to bring in the sources separately and then join them afterward.  You can use the ‘From Current Analysis’ data source option to make a reference to these tables.

Technical Director at Ruths.ai