How to Trim Tagged Data

Sometimes you’ll get data back that has been tagged. This can be frustrating in Spotfire because at present there is no method for either splitting these apart or delimiting them into a single category.

Consider the following chart, which is actually a breakdown of all the posts on DataShopTalk by author. To get this data I exported from WordPress and did a bit of cleaning to get the table into this format.

I wanted to see what kinds of articles have been posted so I set the y-axis to count(), the x-axis to [Author], and the Color By to Category to get this result:

Too many categories! I just want to get a general idea of the category breakdowns and looking through, I think I just want to truncate to the first category. I need to create a calculated column to parse out that first tag. Here is an example of the script needed to do this:

This is a convoluted query so I’ll break this down. Here are my objectives:

  1. Does it have a comma in it? If it doesn’t then we can skip it.
  2. If it does have a comma, where is it? We can delete everything afterwards.
  3. I want to get just the first characters from the left up to the comma and make that the new Category.

The first part of the script is the IF conditional statement. When FIND returns an integer, if it doesn’t find anything it gives back a 0. This is what we are using to determine whether the category has a comma or not. If the FIND result is greater than zero, we move along to the LEFT command.

LEFT has two arguments, the original string and the number of characters to use. First we place [Categories] as the first argument, then number of spaces it takes to get to the comma. FIND will return this for us again, however, we will need to subtract one in order to trim it.

Here is the code by itself:

IF(FIND(“,”,[Categories])>0,

LEFT(

[Categories], FIND(“,”,[Categories]) – 1),

[Categories]

)

And the final result:

Enjoy!

Leave a Reply

Your email address will not be published. Required fields are marked *