Howdy! I’m going to be looking at US air traffic delays during the holiday season, specifically air travel trends in November and December. This is more of a for-fun analysis, as well as my first real dive into the world of Spotfire. If there’s anything wonky or weird about my analysis, just bear with me! I’ve posted this template on Exchange.ai, so feel free to download it here and follow along.
The data I’m using is from a few different sources which I’ve cited at the bottom of this post.
The data set that kicked off this idea was found on Kaggle, a site for data sets and data science competitions. The original set contained air traffic delays for the entire year of 2008, which then led me to an even larger data set with data going all the way back to 1987. My analysis only looks at November and December of 2006 and 2007, so the view is a little narrow. After concatenating the entire data set, the file was almost 11GB. That did result in some cool visualizations, but it was too large to make a template out of.
This data set has some fun properties. It gives you an idea of the different types of delays that occur, such as weather or security. In Spotfire I joined it with another data set I found that contains the latitude and longitude of each airport. This allowed me to put each airport in the United States on a map.
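I did the join inside Spotfire, but for anyone who'd rather try it in code, here's a rough sketch of the same idea in Python with pandas. The tiny tables are made-up stand-ins, the coordinates are approximate, and the column names (`Dest`, `DepDelay`, `IATA`, `Latitude`, `Longitude`) are my assumptions about how the two files are laid out, so adjust them to whatever your copies actually use.

```python
import pandas as pd

# Toy stand-in for the flight data (column names are assumptions).
flights = pd.DataFrame({
    "Dest": ["ATL", "ATL", "ORD", "DFW"],
    "DepDelay": [10.0, 30.0, 45.0, 5.0],
})

# Toy stand-in for the airport coordinates (approximate values,
# hypothetical column names).
airports = pd.DataFrame({
    "IATA": ["ATL", "ORD", "DFW"],
    "Latitude": [33.64, 41.98, 32.90],
    "Longitude": [-84.43, -87.90, -97.04],
})

# One row per destination: flight count and average departure delay.
per_airport = (
    flights.groupby("Dest")
    .agg(flights=("DepDelay", "size"), avg_dep_delay=("DepDelay", "mean"))
    .reset_index()
)

# Left join keeps every destination, even one with no coordinates on file.
mapped = per_airport.merge(airports, left_on="Dest", right_on="IATA", how="left")
print(mapped)
```

With a table like `mapped`, the flight count can drive the marker size and the average departure delay can drive the color.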
In the above visual, the size represents the number of people traveling to each city while the color represents the average departure delay. Right off the bat, you can see the highest-traffic airports such as Atlanta, Chicago, and DFW. These guys have an absurd amount of traffic going through them. DFW, for example, has about 46k flights coming in, and Atlanta has a whopping 65k unique flights. For fun I did some napkin math to get an idea of how many people are flying into just Atlanta for these two months. I used a rough average of about 200 seats per plane, and about 80-85% of the seats are usually filled. That puts us at something like 166 people per flight if we assume 83% of the seats are filled, which means that Atlanta handled somewhere around 10.8M people in these two months for these years combined. That’s a lot of people for a single airport, and they seem to do a pretty good job of handling it! The average overall delay is about 23 minutes. Chicago (ORD), on the other hand, is a little worse off with an average overall delay of about 40 minutes. The overall delay was calculated by simply adding the arrival delay and departure delay for each flight.
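If you want to redo the Atlanta napkin math with your own guesses, here's a quick sketch in Python. The function name and its defaults are just mine for illustration; the 200 seats and 83% fill rate are the rough figures from above, not exact numbers.

```python
def estimate_passengers(flights, seats_per_plane=200, load_factor=0.83):
    """Rough passenger estimate: flights x seats per plane x share of seats filled."""
    return flights * seats_per_plane * load_factor


# Atlanta: about 65k flights over the two months in both years combined.
atl = estimate_passengers(65_000)
print(f"Atlanta: about {atl / 1e6:.1f}M passengers")
```

Swapping in a different seat count or fill rate shows pretty quickly how sensitive the estimate is to those two guesses.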
The above line graph shows the delay per day. It’s pretty obvious when the holidays occur and end, which was kind of a neat result of visualizing this. To me, the most interesting thing about this graph is the Half Dome peak right before Christmas Eve.
The similarities are striking! I also enjoy how the middle of December sees a giant spike in delays and then a big dip back to normality before the climb to the top.
Interested in digging more into this data? Download the template and play around for some fun visualizations and neat stats. One thing to note is that this template only contains a small subset of the actual data set that I used. If you want the full set, you’ll have to go to this place and download the compressed files. They’re bz2 archives, so on Windows you’ll need a program such as 7zip or WinRAR to open them. If you’re on Linux, the ole bzip2 -dk from the shell in your data’s directory should be enough. For the latitudes and longitudes, I used this site, which provides a csv for all airports, not just the USA’s. One thing I found limiting was that the airport data doesn’t include the state for airports in the United States. Fortunately, I found a site that has this information in table form, so you would just need to scrape the site for the relevant information and mess with the data table properties in Spotfire.
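If you don't have 7zip or bzip2 handy, Python's standard library can unpack the files too. This is only a sketch: `decompress_bz2` is a helper name I made up, and the file name in the comment is an assumed example. Like bzip2 -dk, it keeps the original archive by default.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, keep=True):
    """Decompress a .bz2 file next to the original and return the new path."""
    src = Path(src)
    dst = src.with_suffix("")  # drops only the trailing ".bz2"
    # Stream the data in chunks so a multi-GB file never sits fully in memory.
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if not keep:
        src.unlink()  # mimic plain `bzip2 -d`, which removes the archive
    return dst


# Example (assumed file name):
# decompress_bz2("2007.csv.bz2")  # writes 2007.csv alongside the archive
```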
Hopefully this post was interesting to you or at the very least an insight into how busy Atlanta is. Thanks for reading and have some happy holidays!
Sources: http://stat-computing.org/dataexpo/2009/the-data.html  https://openflights.org/data.html  https://www.quora.com/What-is-the-average-amount-of-passengers-on-a-plane  https://www.quora.com/How-many-empty-seats-are-there-on-the-average-US-domestic-flight  https://www.kaggle.com/giovamata/airlinedelaycauses