Clean Your Excel Data with the Data Interpreter

Roger Hau
el Mayo 25, 2016

Excel data is often formatted for human consumption in Excel. Extraneous headers, crosstabs, footnotes—all of these things have to be tediously removed prior to analysis. Well, the Excel Data Interpreter was built to reduce that tedium, and we put it through its paces in this post.

We’ll start with marital data from the Department of Defense, and I’ll show you how quickly we can clean this crosstab for Tableau analysis, rebuild it, then revisualize it. The first sheet contains the data I need, but look at its formatting:

There’s a header, subtext, and a footer. And the author split the data into multiple sheets. Reformatting the data for Tableau analysis is straightforward, but would require a lot of copying and pasting.

When I load the raw Excel sheet into Tableau, I can see that Tableau couldn’t find the headers. And there are lots of nulls where the header subtext was.

After enabling the Data Interpreter, though, we can see the header junk disappear, and the correct headers are detected.

So what’s happening behind the scenes? We have a large corpus of test files that we’ve collected internally, and built up a giant decision tree for determining what data to keep and exclude. If you click “review the results” in the left pane, you can see how the Data Interpreter is classifying the rows of your data.

The formatting and merged cells have been stripped out, giving you a bare-boned inside look at how Tableau now sees your cleaned data. You’ll also notice that Tableau found a number of sub-tables for your cleaned sheets.

You can use these sub-tables just like any other table. However, you can also modify their boundaries in case Tableau guessed incorrectly. “Review the results” also shows the default sub-table boundaries:

Since our decision tree is driven with heuristics, it’s impossible for Tableau to be perfect in all scenarios. However, if you find bugs and contribute data, our algorithms will improve!

The Data Interpreter has saved a lot of copy-paste time in Excel, but do we have to make a separate data source for each of these sheets? Nope! With the Union feature we introduced in Tableau 9.3, we can just combine all the other sheets together, and they will be cleaned prior to the union:

With the cleaned and unioned data set, you can make vizzes like this one, which shows the number of service men and women and their varied marital statuses.

Hopefully this was a fun, inside look at one of Tableau’s data-prep features. If you have a great example of how the Data Interpreter cleaned your data, feel free to share it with us. And of course, if you have an example where you think the Data Interpreter could do better, let us know so we can take a look!

Bonus: Meet the devs behind this feature

Agregar nuevo comentario