As a lifelong Kansas City Royals fan, I have always been intrigued by the numbers that shadow every baseball game. As a young child, I followed the progress of Royals great George Brett as he approached the 3,000-hit milestone. In middle school, the statistics on the back of baseball cards were my primary way to evaluate the ‘important’ prospective trades I had on the table with my friends. During my freshman year of college, I wrote my first paper on the economics of baseball. As Major League Baseball (MLB) evolved, so did my connection to the statistics measuring the game.
The sport of baseball is particularly special when it comes to stats because it is one of the easiest sports to quantify, and its core measures have remained largely unchanged across many generations of fans. The length of the MLB season has stayed within 8 games (about 5%) since 1904, moving from 154 games to the current 162-game schedule by 1962. The statistics found on the back of a Topps baseball card in 1952 can still be seen there today. This means you can compare Mickey Mantle to your favorite center fielder playing today and you’ll have a relatively even baseline.
If you are new to the sport of baseball, you may enjoy this whiteboard explanation from an outside perspective:
If you need a primer on basic baseball statistics and terminology such as hits, strikes, and outs, visit this Baseball for Beginners glossary from PBS.
Integrating numbers into the fan experience has been gaining major ground in recent years. Earlier this month, there was some debate around including advanced baseball statistics in the booth for telecasts on Fox. There has always been a niche group of baseball ‘data geeks’, but the more nuanced measures are becoming increasingly mainstream. This movement has been led by author and baseball researcher Bill James, who coined the term used for many advanced baseball statistics: sabermetrics. The name is a play on the acronym SABR, which stands for Society for American Baseball Research. This newer group of measures, introduced in the 1980s, aims to predict a player’s future performance. For that reason, sabermetrics are extremely popular in MLB front offices, and the unexpected success of the teams paying attention to them has helped drive adoption.
Here are a few statistics you may find yourself hearing more about in the near future, and that you can help people understand through Tableau visualizations.
WAR | Wins Above Replacement
(also seen as WARP; Wins Above Replacement Player)
WAR is becoming the standard metric for determining how much a player is worth to their team. It estimates the number of wins a player contributes to their team beyond what the team could expect from a “replacement level player”, that is, a player the team could acquire for minimal cost, such as a Minor League Baseball player. To put this statistic into perspective, Mike Trout of the Angels was the league leader in WAR last season, contributing around 10 Wins Above Replacement.
BABIP | Batting Average on Balls In Play
BABIP calculates a hitter’s batting average only on the balls they put into play, which leaves out strikeouts and home runs. A high BABIP indicates that the player tends to avoid popping or grounding out when they make contact with a pitch and it lands in fair territory. BABIP is also a useful measure for pitchers because it can indicate how good the defense behind them is, and how lucky (or unlucky) the pitcher has been. If their opponents’ BABIP is high, the pitcher may be unlucky. As a predictor of future performance, pitchers with a high opponent BABIP can reasonably expect it to drift back down toward the mean, improving their other pitching statistics, such as opponents’ batting average and ERA (Earned Run Average).
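For readers who want to compute BABIP themselves, here is a minimal sketch using the commonly cited formula, (H - HR) / (AB - K - HR + SF). The function name, parameter names, and the sample stat line are hypothetical and chosen only for illustration; they do not come from any particular data source.

```python
def babip(hits, home_runs, at_bats, strikeouts, sacrifice_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Balls in play exclude strikeouts and home runs but include sacrifice flies.
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("stat line contains no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, invented for illustration only.
example = babip(hits=180, home_runs=25, at_bats=600, strikeouts=120, sacrifice_flies=5)
print(f"BABIP: {example:.3f}")
```

The same function works for a pitcher if you pass in the totals their opponents have accumulated against them.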
ISO | Isolated Power
I like ISO because it’s the coolest-sounding sabermetric and, possibly more importantly, it is calculated using a simple formula: extra bases per at-bat. Players with high ISO are going to be some of the most fun to watch. ISO does not tell you how often a hitter makes contact, but when a high-ISO hitter does get a hit, there’s a higher chance it will go for at least a double.
These are the statistics I most hope to hear mentioned in a broadcast this season because they are fairly easy to explain, are useful in comparing players, and add some excitement to existing measures of performance. With 87 published statistics from MLB even before getting into advanced measures such as sabermetrics, the full list is much too long to include here, but here are several more stats with their simple definitions:
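The "extra bases per at-bat" formula can be sketched as follows. A double counts for one extra base, a triple for two, and a home run for three, which makes ISO equal to slugging percentage minus batting average. The function names and the sample numbers are hypothetical, used only to show the arithmetic.

```python
def isolated_power(doubles, triples, home_runs, at_bats):
    """ISO as extra bases per at-bat: (2B + 2*3B + 3*HR) / AB."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    extra_bases = doubles + 2 * triples + 3 * home_runs
    return extra_bases / at_bats


def slugging_minus_average(hits, doubles, triples, home_runs, at_bats):
    """The equivalent definition: slugging percentage minus batting average."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats - hits / at_bats


# Hypothetical season line, invented for illustration only.
iso = isolated_power(doubles=30, triples=5, home_runs=25, at_bats=600)
check = slugging_minus_average(hits=180, doubles=30, triples=5, home_runs=25, at_bats=600)
print(f"ISO: {iso:.3f} (SLG - AVG gives {check:.3f})")
```

Both functions should agree for any stat line, which is a handy sanity check when you build the calculation into a workbook.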
Defense-Independent ERA (dERA):
The pitcher’s ERA after eliminating the impact of their defense and luck.
Equivalent Average (EqA):
A batter’s equivalent average, which attempts to calculate the average after eliminating the impact of the ballpark they are playing in and the difficulty of their league.
Adjusted Earned Run Average (ERA+):
The equivalent of EqA for pitchers, eliminating the impact of the ballpark they are pitching in and the difficulty of their league.
Late-Inning Pressure Situations (LIPS):
Any at-bat in the seventh inning or after for games within three runs.
Quality Start (QS):
For a starting pitcher, this means that they pitched at least six innings while allowing no more than three earned runs.
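Of the definitions above, the Quality Start is the easiest to turn into a rule. The sketch below assumes the common definition, which counts earned runs, and it takes outs recorded rather than innings pitched, because box-score notation such as 6.1 (six and one-third innings) is easy to mishandle as a decimal. The function and parameter names are hypothetical.

```python
def is_quality_start(outs_recorded, earned_runs):
    """Return True if a start meets the usual Quality Start criteria."""
    # Six full innings is 18 outs; assumes the earned-run version of the rule.
    return outs_recorded >= 18 and earned_runs <= 3


# Hypothetical starts, invented for illustration only.
print(is_quality_start(outs_recorded=21, earned_runs=2))  # 7 innings, 2 earned runs
print(is_quality_start(outs_recorded=17, earned_runs=0))  # 5 2/3 innings, falls short
```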
If you are interested in learning more advanced baseball statistics or finding data sources that include a wide variety of stats, Baseball-Reference.com is the gold-standard source for baseball data. You can also download Sean Lahman’s Baseball Database in a Tableau-ready format.
Benchmarking with Tableau Public
Like most Tableau enthusiasts, I’m a huge believer that the key to understanding data is visualization. With the increasing adoption of advanced baseball statistics, we share a great opportunity to use Tableau to both teach and learn how these numbers can provide value for baseball fans. Tableau Public is a natural fit with sports because the statistics are public information and the interactivity the software provides lets users look up their own favorite teams and players. As I do every time I mention Tableau Public, I encourage you to visit Tableau’s Viz of the Day page and subscribe in the top right corner. This RSS feed will drop a top-quality data visualization in your inbox every afternoon. All Tableau Public workbooks can be downloaded so that you can ‘look under the hood’ and see how they were constructed. Here are three examples from my Tableau Public portfolio relating to baseball statistics.
MLB Cost Per Win
Growing up in the third-smallest media market and second-smallest metro area among all cities with an MLB team, I have always been fascinated by the economics of baseball and what is required for smaller-market teams to compete with teams that can afford higher-priced players. MLB has no team salary cap, so the gap between the highest and lowest team payrolls is typically in excess of $100 million. To level the playing field, I enjoy following the cost per win statistic each season. Cost per win is a simple formula that divides each team’s total salary by its number of wins. I like to share this as an example because I did not invent this statistic, but Tableau allowed me to retell an old story in a better way. This concept helped earn me several inbound links from reputable sports sites, including Grantland.com.
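The cost per win calculation is simple enough to sketch in a few lines. The team names, payrolls, and win totals below are hypothetical placeholders rather than real figures, and the function names are my own; the point is only to show how a smaller payroll can come out ahead once you divide by wins.

```python
def cost_per_win(total_salary, wins):
    """Total team salary divided by number of wins."""
    if wins <= 0:
        raise ValueError("wins must be positive")
    return total_salary / wins


def rank_by_cost_per_win(teams):
    """Return (team, cost per win) pairs, most efficient team first.

    `teams` maps a team name to a (total_salary, wins) tuple.
    """
    costs = [(name, cost_per_win(salary, wins)) for name, (salary, wins) in teams.items()]
    return sorted(costs, key=lambda pair: pair[1])


# Hypothetical payrolls and win totals, invented for illustration only.
hypothetical_teams = {
    "Big Market": (200_000_000, 95),
    "Small Market": (80_000_000, 86),
}
for name, cost in rank_by_cost_per_win(hypothetical_teams):
    print(f"{name}: ${cost:,.0f} per win")
```

In Tableau, the same idea would typically be a calculated field dividing the salary total by the win total, which is what makes it easy to rank every team in a single view.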
XVAL Player Comparison Tool
In my next example, Tableau Public was used to help introduce a brand new advanced baseball statistic, XVAL. XVAL was invented by a sports marketing agency called Premier Sports Management. The XVAL statistic combines salary data and sabermetrics to evaluate how much excess value each MLB player provides their team. It’s essentially a cost per win measure at the player level, so of course I was instantly hooked.
Odds of Going Pro in Sports
The tip from my final example is to keep it simple. This has been my most popular Tableau Public work to date, and I largely credit the success of this viz to its simplicity. The viz asks one question, uses one chart to answer it, and has navigation that is easy to understand and interact with.
As I hope you will agree, Tableau is a great vehicle for dissecting and sharing baseball data. I am always on the lookout for unique sports data visualizations, so please nominate a design for Viz of the Day honors or take part in Tableau Public's Sports Viz Contest this month.