How Well Do Advanced Defensive Statistics Correlate?
by John Dewan
We’ve put a lot of effort into improving defensive metrics in recent years, but how much progress have we really made? In the introduction to The Fielding Bible—Volume III, I said:
“For hitters, we might be at the 85-90 percent mark of being able to measure offense. We have a lot of good tools like OPS (on-base plus slugging), Runs Created, Wins Above Replacement. For pitchers, we are not quite as far along. Maybe we’re at the 75 percent level of understanding pitcher effectiveness with our numerical tools like ERA, Batting Average on Balls in Play, and Opponent OPS. For defense, ten years ago we were probably around the 10th percentile. Now with three volumes of The Fielding Bible under our belts, plus the work of many other excellent sabermetricians, we are probably in the 60-70 percent range.”
In our book, The Fielding Bible—Volume III, we put our newest defensive analytics to the test. If our statistics are measuring something meaningful, we would expect them to correlate well from year to year. In other words, since Evan Longoria topped all third basemen with 20 Defensive Runs Saved in 2010, we would expect him to remain one of the league’s top defenders at the position in subsequent seasons. (Longoria saved an estimated 22 runs in the field in 2011, also a league-leading total.)
To measure the consistency of our Defensive Runs Saved numbers, we calculated what we’ll call Even/Odd Year Correlations. We added each fielder’s Runs Saved totals from 2006, 2008, and 2010 and compared that sum to the subtotal from 2007, 2009, and 2011, requiring that the fielder have amassed at least 667 innings in each subset. We would expect players with high totals in even years to also have high totals in odd years, and players with low totals in even years to also have low totals in odd years.
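As a rough illustration of the even/odd split described above, here is a minimal Python sketch. The record layout (player, year, innings, runs saved), the function name, and the sample rows are all assumptions made for this example, not our actual data or code. It also reads the 667-inning minimum as applying to the combined innings within each subset.

```python
from collections import defaultdict

MIN_INNINGS = 667  # threshold from the article, read here as per-subset total innings


def even_odd_totals(seasons, min_innings=MIN_INNINGS):
    """Return {player: (even_year_runs, odd_year_runs)} for qualifying players.

    `seasons` is an iterable of (player, year, innings, runs_saved) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    runs = defaultdict(lambda: [0, 0])         # [even-year total, odd-year total]
    innings = defaultdict(lambda: [0.0, 0.0])  # same layout, for innings played
    for player, year, inn, runs_saved in seasons:
        idx = year % 2                         # 0 for even years, 1 for odd years
        runs[player][idx] += runs_saved
        innings[player][idx] += inn
    # Keep only players who meet the innings minimum in BOTH subsets.
    return {
        p: (runs[p][0], runs[p][1])
        for p in runs
        if innings[p][0] >= min_innings and innings[p][1] >= min_innings
    }


# Made-up sample rows, NOT real Runs Saved data.
sample = [
    ("Fielder A", 2006, 400.0, 5),
    ("Fielder A", 2008, 300.0, 4),
    ("Fielder A", 2007, 700.0, 8),
    ("Fielder B", 2010, 900.0, -3),
    ("Fielder B", 2011, 500.0, -2),  # too few odd-year innings, so excluded
]
totals = even_odd_totals(sample)
```

In this made-up sample, Fielder A qualifies with 700 innings in each subset, while Fielder B is dropped for falling short in the odd years.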
By calculating the correlation coefficient of the even and odd year totals, we can measure just how consistent our statistics are. Correlation coefficients range from -1.0 to 1.0 and measure the strength of the linear relationship between two sets of numbers. A correlation coefficient of 1.0 represents a perfectly predictable positive relationship. For instance, if every fielder had the same number of Runs Saved in both even and odd seasons, that would produce a correlation of 1.0. On the other hand, a correlation coefficient of zero means that there is no measurable linear relationship, while a correlation coefficient of -1.0 signifies a perfectly inverse relationship between the sets of numbers.
Defensive Runs Saved produced an Even/Odd Year Correlation of .59. This high, positive correlation value indicates a strong relationship between even and odd season totals and good consistency in measuring fielders’ value. But how does this compare to traditional hitting and pitching statistics?
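For readers who want to see the arithmetic, the sketch below computes a standard Pearson correlation coefficient in plain Python. The article does not say which correlation formula or software was used, so treating it as Pearson's r is an assumption, and the numbers are invented purely to show the 1.0 and -1.0 extremes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each sequence.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented even-year and odd-year totals for four hypothetical fielders.
even_totals = [10, 4, -2, -8]
odd_identical = [10, 4, -2, -8]   # same totals in odd years
odd_reversed = [-10, -4, 2, 8]    # exactly opposite totals in odd years

r_identical = pearson_r(even_totals, odd_identical)  # perfect positive relationship
r_reversed = pearson_r(even_totals, odd_reversed)    # perfect inverse relationship
```

On Python 3.10 or later, `statistics.correlation` from the standard library computes the same quantity.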
Even/Odd Year Correlation Coefficients for Commonly Cited Statistics
|Defensive Runs Saved|.59|
As you can see, both batting average and ERA also produce high positive Even/Odd Year correlations, though Defensive Runs Saved correlates better than both. (We used a minimum of 150 innings or 500 at bats in both subtotals for pitching and hitting statistics, respectively, although the correlations didn’t change much when we adjusted the minimum cutoffs in either direction.)
Batting average and ERA were the staples of baseball analysis for the first 100 years of the game’s existence. By this measure of year-to-year consistency, our Defensive Runs Saved system is a better way to measure defense than batting average is to measure offense or ERA is to measure pitching.
Of course, we now have more advanced measures of hitting and pitching performance. Let’s see how well a few other statistics correlate between even and odd seasons.
Even/Odd Year Correlation Coefficients for Additional Statistics
|Pitcher Strikeouts per 9 Innings||
|Pitcher Walks per 9 Innings||
Home runs correlate at .83, indicating a very strong correlation between even and odd seasons. OPS correlates at .69, and Opponent OPS, which for me is the most important pitching statistic, correlates at .61.
We are at the point where our defensive analytics are nearly as reliable as offensive and pitching analytics. Looking at just the single best statistic in each area: OPS correlates at .69, Opponent OPS at .61, and Defensive Runs Saved at .59. We’ve come a long way.
Used with permission from John Dewan’s Stat of the Week®, www.statoftheweek.com