The New Box-and-Whisker Plot

Robb Tufts
on November 22, 2013

{This is the third blog post in the #TableauStatsMonth series, and is a guest post by Tableau Public author Robb Tufts. Robb explains why he chose the box-and-whisker plot, then walks you through creating one and adding it to an interactive dashboard, which you can see at the end of the post.}

Writing a weekly hockey stats column that relies mainly on data visualization to tell the story is a challenge. Designing a new visualization each week without repeating myself or falling back on scatterplots can be difficult, even in Tableau. Most recently I wanted to tell the story of the St. Louis Blues' Alexander Steen and his rise to the top of the NHL goal-scoring leaders this season. Hello, Tableau Public 8.1 and the new box-and-whisker plots!

Why a Box-and-Whisker Plot?
Box-and-whisker plots are typically used to show the distribution of a set of observations. They're also useful for showing outliers, and that is exactly the story I wanted to tell: which players were the outliers (i.e., the best) at scoring goals this season, and where did they fall within the distributions of previous seasons?

You rarely see this chart type in data visualizations for a mass audience. But with some highlighting options, the ability to interact by choosing your own player, and a second tab explaining what the box plot means, I was able to create a visualization that told the story of Steen's early-season success while placing it in the context of his past performance and of the NHL's overall performance each season.

First I created the initial visualization, showing each NHL player's goals per 60 minutes of ice time (G/60) for every season played. I used this stat, provided by the site BehindTheNet, because expressing goals “per 60 minutes of ice time” standardizes the number across players whose 5-on-5 ice time varies from game to game for a variety of reasons. Season goes on the Columns shelf, G/60 on the Rows shelf, and player name on Detail.
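The per-60 normalization is simple arithmetic: G/60 = goals × 60 / minutes of ice time. As a minimal Python sketch, assuming you already have each player's goal total and 5-on-5 time on ice in minutes (the function name and the sample numbers are hypothetical, not BehindTheNet's data):

```python
def goals_per_60(goals: int, toi_minutes: float) -> float:
    """Return goals per 60 minutes of ice time (G/60).

    Assumes toi_minutes is the player's 5-on-5 time on ice for the season.
    """
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    # Scale the goal total to a common 60-minute baseline.
    return goals * 60.0 / toi_minutes


if __name__ == "__main__":
    # Hypothetical players: same goal total, very different ice time.
    print(goals_per_60(8, 600.0))  # 0.8
    print(goals_per_60(8, 960.0))  # 0.5
```

Two players with the same 8 goals look identical on a raw-goals chart, but the one who needed 960 minutes to score them drops to 0.5 G/60.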

Box plots - Step 1

It's not very pretty yet, but this is just the first step. Next, I created the box plot: click “Show Me,” select the box plot in the lower-right corner, and voilà!
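Show Me does the math for you, but it helps to know what the box and whiskers encode. The Python sketch below computes the same kind of summary for one season's G/60 values. It assumes the conventional rule that whiskers extend to the most extreme points within 1.5 × IQR of the box, and it uses inclusive quartile interpolation; Tableau's own quartile method may differ slightly. The sample values are made up.

```python
import statistics
from dataclasses import dataclass


@dataclass
class BoxStats:
    q1: float
    median: float
    q3: float
    lower_whisker: float
    upper_whisker: float
    outliers: list


def box_stats(values: list[float]) -> BoxStats:
    """Summarize one season's G/60 values the way a box plot draws them."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    data = sorted(values)
    # Q1, median, Q3 (inclusive interpolation is an assumption).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Anything beyond the fences is drawn as an individual outlier point.
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return BoxStats(q1, median, q3, min(inside), max(inside), outliers)


if __name__ == "__main__":
    season = [0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 2.5]  # hypothetical G/60 values
    s = box_stats(season)
    print(s.outliers)       # [2.5]
    print(s.upper_whisker)  # 0.9
```

Here the 2.5 G/60 season sits far above the upper fence (about 1.3), so it is plotted as a lone point: exactly the kind of top-scorer outlier this viz is meant to surface.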

Box plots - Step 2

There are outliers, and then there are outliers. The purpose of this visualization is to show the outliers on the upper end, because those are the top goal scorers. However, you occasionally get players who play only a handful of games in a season, and their small samples skew the results. So let's filter by the number of games played; I typically require at least 10 games for these types of visualizations.
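In Tableau this is just a quick filter on games played. Expressed in plain Python as a minimal sketch (the field names and rows are hypothetical), the rule keeps a player-season only if it meets the minimum. The threshold is inclusive, so a player with exactly 10 games stays in.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    player: str
    season: str
    games_played: int
    g60: float


def min_games_filter(rows: list[PlayerSeason], min_games: int = 10) -> list[PlayerSeason]:
    """Drop small-sample seasons, like a 'games played >= N' filter."""
    return [r for r in rows if r.games_played >= min_games]


if __name__ == "__main__":
    rows = [
        PlayerSeason("Player A", "2013-14", 22, 1.9),
        PlayerSeason("Player B", "2013-14", 3, 4.0),  # tiny sample, inflated rate
        PlayerSeason("Player C", "2013-14", 10, 0.7),
    ]
    kept = min_games_filter(rows)
    print([r.player for r in kept])  # ['Player A', 'Player C']
```

Player B's 4.0 G/60 over three games would otherwise show up as the season's biggest outlier.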

Box plots - Step 3

This is where it starts to get really fun. Typically box plots just show a distribution, but by highlighting individual players we can show where they fall within that distribution over time. First I created a parameter based on the player name field. Then I wrote a quick IF/THEN calculated field that checks whether the player name equals the name selected in the parameter; if it does, the field labels that player as “highlighted”. Dragging that field onto Color shows where Steen has fallen within the distribution each season. (Hint: make sure you sort the highlight field so “highlighted” is on top; otherwise your highlighted player might be hidden behind other marks.)

Some of you may question my choice of blue and yellow. However, I write for a St. Louis Blues blog, and the team's colors are blue and gold. If I don't incorporate those colors into a viz somehow, I get some friendly snark from my readers about it. In this case, I think the color contrast works.
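The parameter-driven highlight is easy to mirror outside Tableau. In this Python sketch, the names selected_player, highlight_label and draw_order are hypothetical stand-ins for the parameter and the calculated field. Every row is kept and merely labeled, and the highlighted rows are moved to the end of the drawing order, since in most plotting libraries marks drawn later sit on top of earlier ones. That is the same concern as the sorting hint above.

```python
def highlight_label(player: str, selected_player: str) -> str:
    """Equivalent of an IF/THEN calculated field comparing a row to the parameter."""
    return "Highlighted" if player == selected_player else "Not highlighted"


def draw_order(rows: list[dict], selected_player: str) -> list[dict]:
    """Return rows with the highlighted player last, so their marks draw on top.

    sorted() is stable and False sorts before True, so every other row
    keeps its original relative order.
    """
    return sorted(rows, key=lambda r: r["player"] == selected_player)


if __name__ == "__main__":
    rows = [  # hypothetical data
        {"player": "Player A", "season": "2013-14", "g60": 1.8},
        {"player": "Player B", "season": "2013-14", "g60": 0.9},
        {"player": "Player C", "season": "2013-14", "g60": 0.6},
    ]
    for r in draw_order(rows, "Player A"):
        print(r["player"], highlight_label(r["player"], "Player A"))
```

Because the highlight only labels rows instead of removing them, the box plot is still computed from every player in each season.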

Box plots - Step 4

I moved the viz into a dashboard because I needed to size it for the SBNation site. SBNation doesn't give me much width to work with, so the viz needed some fine-tuning.

For starters, the circles were a bit too small and the box lines and whiskers a bit too thick, so they ended up covering some of the data points. Not only are box plots easy to make, they are easy to modify as well: you access their formatting just as you would a reference line. Right-click the G/60 axis and choose the “edit reference line, band, box” option. I chose a thinner line from the formatting options for both the lines and the borders, and a lighter shade of gray for the boxes so readers can see the data points more easily. I also made the circles slightly larger.

Box plots - Step 5

Finally, add the parameter control at the top and you have your quick analysis. I also added the quick filter for number of games played, plus a list of teams on the right with a highlight action. Readers can not only choose a player to highlight, but also click a team to highlight all of its players and see where they fell within each season's distribution.
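A highlight action differs from a filter in one important way: it changes which marks stand out but leaves the underlying data, and therefore the box-plot statistics, untouched. Here is a minimal Python sketch of that behavior; the team codes, players and values are hypothetical.

```python
import statistics


def highlight_team(rows: list[dict], clicked_team: str) -> list[tuple[dict, bool]]:
    """Pair every row with a highlight flag; no rows are removed."""
    return [(r, r["team"] == clicked_team) for r in rows]


def season_median(rows: list[dict]) -> float:
    """Median G/60 of the rows the box plot is built from."""
    return statistics.median(r["g60"] for r in rows)


if __name__ == "__main__":
    rows = [  # hypothetical data
        {"player": "Player A", "team": "STL", "g60": 1.8},
        {"player": "Player B", "team": "CHI", "g60": 0.9},
        {"player": "Player C", "team": "STL", "g60": 0.6},
    ]
    flagged = highlight_team(rows, "STL")
    print([r["player"] for r, on in flagged if on])  # ['Player A', 'Player C']
    # The box plot is still computed from all rows, highlighted or not.
    print(season_median([r for r, _ in flagged]))  # 0.9
```

Filtering to one team instead would recompute the boxes from that team's players alone, which would change the very distribution you are trying to compare against.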

Box plots - Step 6

One final touch is a tab that describes what each segment of the box plot means in terms of player performance. My goal for my weekly column has been to make hockey stats and advanced hockey metrics more accessible to the average fan. Box plots can be tricky to explain, and I probably oversimplified my explanation. But it showed that the outliers at the top are the top players, and that each segment below them represents a different level of goal-scoring skill for that season.
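The explainer tab maps each region of the box plot to a level of scoring. That mapping can be written down as a small Python function. The label wording is my own paraphrase, the sample values are hypothetical, and the sketch assumes the conventional 1.5 × IQR whisker rule with inclusive quartile interpolation (Tableau's own quartile method may differ slightly).

```python
import statistics


def box_segment(value: float, season_values: list[float]) -> str:
    """Describe where one G/60 value falls in that season's box plot."""
    q1, median, q3 = statistics.quantiles(sorted(season_values), n=4, method="inclusive")
    iqr = q3 - q1
    if value > q3 + 1.5 * iqr:
        return "high outlier: among the top goal scorers"
    if value > q3:
        return "upper whisker"
    if value > median:
        return "upper half of the box"
    if value > q1:
        return "lower half of the box"
    if value >= q1 - 1.5 * iqr:
        return "lower whisker"
    return "low outlier"


if __name__ == "__main__":
    season = [0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 2.5]  # hypothetical G/60 values
    for v in (2.5, 0.9, 0.8, 0.6, 0.2):
        print(v, box_segment(v, season))
```

Reading from the top down, each branch corresponds to one band of the plot, which is the same top-to-bottom order a reader would scan the chart in.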

Here's an interactive version:


Thanks a lot for the comprehensive explanation.
I am working on a very similar table, and I was wondering how you added the Min-Max games played box, and how you linked the highlight team box to the box plot.

Thanks a lot,

This is very helpful. Any idea whether we can make the player to highlight a filter rather than a parameter? I want the flexibility of a user filter, but when I use a filter it skews the box plot, since the chart then shows only one data point.

How would I get side-by-side boxplots? If I drag Weeks from Dimensions to Columns and Costs to Rows and disaggregate the data, I get individual dots for each week with a single boxplot superimposed across all the weeks. Is it possible to do side-by-side boxplots for each week? If so, how?