Data Prepping and Data Cleaning in Tableau Explained

Sophie Sparkes and Florian Ramseger
on May 27, 2016

As part of #SpringDataCleaning month, we hosted a live webinar on some of the data-preparation and data-cleaning features of Tableau. Here a short recap of what we've covered. We've attached all the resources we used during the webinar to this post (check out the links at the bottom of the page) so you can try all the techniques shown in the webinar. We'll start with a recording for those who missed the live training or would like to rewatch parts of it.

The Webinar

Data Format (click to watch from 6:11)

First, we we went through the whole process from connecting to data to publishing a viz on Tableau Public. Having data that is formatted in the way that Tableau likes is crucial to building any viz using Tableau. Tableau likes data to be structured with each different field/variable in your data set to be in its own individual column and each unique data point in each row, with all the column headers being in the first row.

For example, in our first data set from Wikipedia, all the countries are in one column, all the cities are in one column, and all population size is in one column. You can also see that each row is a unique data point (for example, our first data point which starts in the second row is the city Tokyo in Japan). Having data formatted like this makes your data set look tall and thin, a bit like a skyscraper. If your data is formatted differently, see how you can pivot the data set around using Tableau's pivot function (for more information see the pivot section below).

Data Interpreter, Part I (click to watch from 13:14)

We then learned how the data interpreter can be used to clean a messy data set. By simply switching on the data interpreter we can strip out titles, footnotes, and empty cells, as well as fix column headers that run across several rows.

Data Interpreter, Part II (click to watch from 17:25)

The data interpreter can also be used to extract sub-tables from spreadsheets that have several tables on them. You often find data sets from government websites have multiple tables within the same worksheet all containing information about the same topic.

Joins (click to watch from 23:45)

We then looked at how we can merge different data tables to create a larger data set. We started with joins which add new columns to a data set based on a common variable between both tables.

Understanding joins

Unions (click to watch from 35:29)

Next up were unions, which are a way of appending a data set with another table that has the same column headers than the first one, thereby increasing the number of rows of the data set.

Getting to grips with Unions

Splits (click to watch from 40:38)

In the example with the unions, we also used the split function to split off a new column from an existing column, breaking off the year from an existing column name.

Pivots (click to watch from 43:19)

Finally, we looked at pivots, which is a way to stack columns on top of each other so as to create a single variable to work with.

How to pivot

Reshaping from Tall to Wide (not covered in the webinar)

A popular question we didn't cover during the webinar is: How do you go in the other direction than a pivot? For completeness, here the executive summary:

The trick is to create a calculated field for each category of your dimension that you want to use to break up the data into two or more columns. In the example below, you would need the following two calculated fields:

Tall to Wide
Tall to wide in action

We hope you found these helpful. Stay tuned for more in-depth webinars on different topics. And do let us know which topics you'd like us to cover.

Download PowerPoint slides
Download data sets