A lot of people don't know this but before I entered the world of Pokemon and Hello Kitty visualizations, I actually did some serious data work. I interned at a thinktank in Arizona helping them do data analysis for their annual Congressional Report Card. We had a monster of an Excel workbook filled with data about how every state congress member voted and how that scored in the metrics that this thinktank defined. Since it's Politics Month, I figured I'd revisit my past a bit and take a look at some national congressional data:
There's a ton you can learn from congressional voting data and at this time of year, when you are deciding whether to keep your congressman or give him the boot, it's pretty important that you know what's going on. The good news is that there are tons of ways to get congressional voting data at various levels of detail. Here's how I got mine to make the viz above.
National Congressional Data
There are a bunch of resources out there to get congressional voting data, but I found two that were super easy to use and helped me make this viz. First, the New York Times has a Congress API that makes it easy to pull congressional data. If you are like me and think APIs are great but still aren't super interested in programming anything, you can use an IFTTT recipe like this one to collect all the data as it comes in and put it into Google Spreadsheets.
I've been running this IFTTT recipe since last year, so go ahead and profit off of my industriousness: here's everything that I've collected so far.
Another awesome site to add to your political data toolkit is the Sunlight Foundation. They've got a bunch of great databases on voting, campaign finance, and more. I used their site OpenCongress.org to get additional data about the bills I had collected data on via IFTTT. I used Import.io to scrape the pages of the bills. To do that, I had to first determine what pages I needed my web crawler to go to. Each bill has it's own page on OpenCongress.org. And they all had a URL similar to this:
As you can see, the number of the bill is right there in the URL. That means I just needed to generate a URL is this format for every bill I wanted to look up. I did a little work in Excel to make that happen. In my dataset from IFTTT, the bill numbers look like "H.R. 3865" but I needed them without spaces or punctuation so that I could use them in my URLs. In order to get them there, I did a series of Find-and-Replace, replacing periods and spaces with nothing, ensuring that there weren't any in those cells.
Next, I needed to get those bill numbers into the URL format that OpenCongress.org uses. To do this, I created a CONCATENATE formula in Excel and used that to insert the bill number into the middle of the URL. By doing this I could quickly get the URLs for several hundred bills in seconds.
Once I had all the individual pages that I wanted to pull data from, I was able to use a crawler on Import.io to pull the Sponsor and Committee information for each bill. For more information on how to use import.io in this way, check out this webinar I did with them a couple weeks ago.
Since I had the URL I created with concatenate function in my original dataset and in the new one created by import.io, I was able to use that as the field to join my two tables together and use the data together in Tableau.
Of course, this is barely scratching the surface of the data you can get from the Sunlight Foundation. I'd love to see a viz that uses the actual individual votes by each member of congress. They have so many great tools for everything from tracking the most popular words and phrases used in Congress to where political fundraising parties are happening.