2012 was a good year for numbers in American presidential politics, but it also highlighted a collective squeamishness about statistics. By mid-October, New York Times number-cruncher Nate Silver found himself in a modern-day version of seventeenth-century Salem, with a long line of pitchfork-wielding poll-doubters accusing him of magic and wizardry.
Silver claimed he wasn’t doing anything special. It’s just math, he told his detractors. If anything, Silver might not have gone far enough. In general, it doesn’t take any math more complicated than addition to reach the same conclusions as Silver’s FiveThirtyEight model. Moreover, the math generally does not lead to major surprises: scroll through the old polling averages on RealClearPolitics, and you won’t find too many instances when the polls and the FiveThirtyEight results diverge.
While there was widespread appreciation for the accuracy of polling-based models after they nailed election results, many people still don’t understand why or how these models work. Some individuals seem perpetually determined not to understand, like MSNBC’s Joe Scarborough or Politico’s Dylan Byers. But others are probably just bewildered. They assume that Silver was being modest when he said his model wasn’t rocket science.
Charles Wheelan might be Nate Silver’s best new advocate. His Naked Statistics is a well-written, surprisingly funny, and enthusiastic primer on statistics. Rather than leaning on equations and practice problems, Wheelan proceeds from the assumption that statistics is intuitive and, consequently, can be taught through examples with minimal intimidation.
Losing the formulas and the practice problems has downsides—it’s unlikely that very many readers will remember how to calculate a standard error after reading about a bus full of overweight sausage-convention attendees driving through Boston on the same day as the marathon. (And, yes, that’s a real example.) But the narrative- and example-based approach succeeds at easily communicating statistical concepts, even if the reader is unlikely to ever use most of them.
The absence of hard math is Wheelan’s central point: statistics should be accessible. Consequently, the book starts from the most basic descriptive statistics, like averages, before tackling more daunting theories. The concept of the standard error, for instance, might seem frightening: it’s the standard deviation of the sample means. (I realize that even this simple explanation probably sends many people to Google.) Yet most would recognize that a bus full of passengers with an average weight of 194 pounds probably isn’t heading for a marathon where the average participant weighs 162 pounds. While most might recognize that instinctively, there is a numbers-based method to reach the same conclusion. If you have already sampled a few buses’ worth of confirmed marathon runners and determined their average weight, you can calculate how likely it is that a busload of marathon runners would average 194 pounds, and therefore how plausible it is that this particular bus is also heading for the marathon.
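For readers who want to see the arithmetic behind the bus example, here is a minimal sketch in Python. Only the two averages (162 and 194 pounds) come from the example above. The 36-pound standard deviation among runners and the 60 passengers on the bus are assumptions made purely for illustration, and the function names are hypothetical. The sketch also assumes that the average weight of a busload of runners is approximately normally distributed.

```python
import math


def standard_error(population_sd: float, sample_size: int) -> float:
    """Standard deviation of the sample means for samples of this size."""
    return population_sd / math.sqrt(sample_size)


def upper_tail_probability(sample_mean: float, population_mean: float,
                           population_sd: float, sample_size: int) -> float:
    """Chance that a random sample averages at least `sample_mean`,
    assuming sample means are approximately normally distributed."""
    se = standard_error(population_sd, sample_size)
    z = (sample_mean - population_mean) / se  # distance in standard errors
    return 0.5 * math.erfc(z / math.sqrt(2))  # normal upper-tail area


RUNNER_MEAN = 162.0  # pounds, from the example in the text
BUS_MEAN = 194.0     # pounds, from the example in the text
RUNNER_SD = 36.0     # pounds, ASSUMED for illustration only
BUS_SIZE = 60        # passengers, ASSUMED for illustration only

se = standard_error(RUNNER_SD, BUS_SIZE)
p = upper_tail_probability(BUS_MEAN, RUNNER_MEAN, RUNNER_SD, BUS_SIZE)
print(f"standard error: {se:.2f} pounds")
print(f"chance a busload of runners averages {BUS_MEAN:.0f}+ pounds: {p:.2e}")
```

Under these assumed inputs, the bus sits nearly seven standard errors above the runners’ average, which is why intuition and arithmetic agree that it is almost certainly not headed for the marathon.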
For every example explaining a new concept, Wheelan provides just as many examples of statistics being misused. Wheelan might be at his best in arming the reader with a reasonably comprehensive catalog of deceptive or just poor uses of statistics, whether it’s using a mean when a median is more appropriate or overlooking selection bias in a study. At the very least, the volume of bad examples should instill skepticism in readers the next time they hear an especially provocative statistic. That’s important, since the statistics that get the most attention are often the most surprising, and the most surprising statistics are often the most deceptive.
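The mean-versus-median problem is easy to see with a handful of numbers. The sketch below uses Python’s standard statistics module. The income figures are invented purely for illustration and do not come from the book.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars (invented data):
# five ordinary earners and one extreme outlier.
incomes = [30, 32, 35, 38, 40, 1000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # middle of the sorted values

print(f"mean:   {mean_income:.1f}")
print(f"median: {median_income:.1f}")
```

With one outlier in the group, the mean lands far above what any of the other five people earn, while the median stays close to the typical income. Quoting the mean here would be technically accurate and still misleading.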
With data-driven analysts gaining newfound fame, there’s a growing mythology about the power of statistics. Some appear to be convinced that statistics possess nearly unlimited explanatory power, at least if the data is good enough. Some have even concluded that the Obama campaign’s data-driven approach was responsible for the president’s reelection, even though there isn’t any public data to support that conclusion.
Sometimes, good data can’t provide the whole answer, or even the best answer. Sabermetricians have failed to improve upon traditional scouting methods in minor league baseball, despite great data. Public opinion polls nailed the presidential election, yet thousands of votes and district-level data can’t make similar predictions about the behavior of Congress in the upcoming debt-ceiling fight. To the extent that growing segments of the media and public have become convinced that statistics can yield dispositive answers to difficult problems, it may be increasingly necessary to pay as much attention to the types of questions that statistics can’t answer as to the ones that statistics can answer well.
The growing mythology around data-driven analysis isn’t just a testament to recent high-profile successes; it’s also the result of a lack of knowledge. While it makes a strong case for the overall relevance of statistics, Naked Statistics isn’t promoting data mythology; it is providing a nuanced and useful assessment. The book is actually at its best when discussing the limits of statistics, and perhaps especially the flawed Wall Street models that contributed to the financial crisis. Stripped down, approachable, and even amusing, the book is about as good an introduction as one could imagine to a field with an undeserved reputation for inaccessibility.
Nate Cohn is a staff writer at The New Republic. Follow @electionate