State of the Union, in Words

on February 12, 2013

For those living in the US, a new year includes a new State of the Union address from the president. Tonight, President Obama will continue that tradition.

As data enthusiasts, we naturally have a viz to go along with it. This viz shows the top 100 words used in every State of the Union speech, going back to George Washington. Select a president to see what their popular words were, and click on a specific word to see how its usage changed over time.

This viz comes from Tableau Public authors Dan Huff and Andrew Hill. I've made a few modifications to fit it into our blog. It's also in 8.0 beta, and includes a new feature: word clouds. (You may need to view the viz in Firefox, Chrome or Safari.)

Here are some findings from me:

  • Based on the top 100 words in each speech, overall speech length has actually decreased in recent decades (barring a few spikes). Looking at the full data confirms this.
  • "American" shot to popularity in the early 1900s, but "America" has really started climbing in the past few decades.
  • "United States" was a much more common term in the early and middle decades of the US's history.
  • Throughout history, it's common to emphasize the country as a collective ("we", "our", etc.).

What do you see? Will you be watching the State of the Union this year, and what do you think President Obama will speak about?


The only problem with this viz is that "missing values" should have been represented as zero. These line charts are misleading when a word occurred zero times in some years, because the connection lines imply that the instance of a word smoothly increased from one year to another, whereas what actually happened is that instances of the word dropped to zero in the intervening years.
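To illustrate the fix described above, here is a minimal Python sketch of zero-filling, assuming the counts for one word are held as a year-to-count mapping. The function name `fill_missing_years` and the sample numbers are hypothetical; they are not taken from the viz or its data.

```python
def fill_missing_years(counts_by_year, all_years):
    """Return (year, count) pairs for every year, using 0 where the word is absent."""
    return [(year, counts_by_year.get(year, 0)) for year in sorted(all_years)]

# Hypothetical counts for one word: 2010 and 2011 are missing from the data.
sparse = {2009: 4, 2012: 6}
speech_years = [2009, 2010, 2011, 2012]

filled = fill_missing_years(sparse, speech_years)
print(filled)  # [(2009, 4), (2010, 0), (2011, 0), (2012, 6)]
```

Plotting the filled series makes the line drop to zero in 2010 and 2011 instead of running straight from 4 to 6.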

Fantastic visualization, Dan & Andrew.
What tool did you use to process/generate the word counts?
I'd like to do something similar for trending survey verbatims over time.


Here is the process we used:

Copied and pasted every speech from the Presidency Project into Excel
Removed all punctuation with the exception of dashes and apostrophes
Made all the words lowercase
This then left us with a ton of rows with each row containing a number of words
We then used Excel's Text to Columns feature (on the Data tab), with a space as the delimiter, to break up the rows into columns of words
Andrew and I then recorded a macro to pivot the data so that it had the following format:

President Date Word
Obama 1/1/2008 school
Obama 1/1/2008 is
Obama 1/1/2008 good

From this point, we just had to bring this into Tableau and use Number of Records to get counts of the words.

So simple :D

Actually, the only real challenge here was pivoting such a massive amount of data, hence the macro. I have a feeling that if you are Visual Basic literate (which I am not), this would be a lot simpler.
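
For anyone who would rather script it than record a macro, the steps above can be roughly sketched in Python. This is not the process we actually used, just an approximation of it: the helper name `speech_to_rows` and the sample sentence are made up for illustration, and the cleanup only handles plain ASCII text (curly apostrophes and accented letters would be dropped).

```python
import re
from collections import Counter

def speech_to_rows(president, date, text):
    """Turn one speech into (president, date, word) rows, one row per word."""
    # Lowercase, then remove everything except letters, digits,
    # whitespace, dashes and apostrophes.
    cleaned = re.sub(r"[^a-z0-9\s'-]", "", text.lower())
    # Splitting on whitespace plays the role of Text to Columns plus the pivot.
    return [(president, date, word) for word in cleaned.split()]

# Made-up sample text, not a real speech.
rows = speech_to_rows("Obama", "1/1/2008", "School is good. School is GOOD, isn't it?")

# Counting identical rows is the equivalent of Number of Records in Tableau.
counts = Counter(rows)
for (president, date, word), n in counts.most_common():
    print(president, date, word, n)
```

The `rows` list has the same President / Date / Word layout shown above, so it could be written out to a CSV file and brought into Tableau as is.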