The financial crisis, as Silver conceives it, was a chain of prediction failures, from homeowners making mistaken predictions about rising housing prices to the economists in the Obama administration initially making mistaken predictions about the depth and the duration of the recession. Though there were some exceptions, economic forecasters generally failed to see the crisis coming. Following the lead of one forecaster who got it right, Jan Hatzius of Goldman Sachs, Silver focuses on the limits in understanding causal relations in the economy, the difficulties in grasping economic change, and the poor quality of much of the data that go into forecasts.

Another source of overreach is that a discipline may simply not have developed to the point where it can provide the basis for accurate predictions. The history of earthquake prediction is littered with failures. Although scientists can make forecasts about the probability that a quake will hit an area over a period of years or centuries, they do not have the ability to predict exactly when and where an earthquake will strike. Nor is there any sign that the field is close to that goal.

Weather forecasting offers Silver an instructive contrasting case. Although the weather system is extraordinarily complex, the basic science is well understood, and the accuracy of forecasts has improved significantly in recent decades as a result of increased computer power and more sophisticated models. Weather forecasting also illustrates how human judgment can either improve or worsen the accuracy of forecasts, depending on an organization’s incentives. At the National Weather Service, one of the least appreciated services that the federal government provides, meteorologists make use of both computational power and their own judgment to arrive at forecasts. The meteorologists’ judgments, according to data from the agency, result in precipitation forecasts that are about 25 percent more accurate, and temperature forecasts that are about 10 percent more accurate, than the computer results alone. Improved forecasts have real human benefit; the National Hurricane Center can now do what it was unable to do twenty-five years ago: predict, three days in advance, where a hurricane will make landfall, with enough accuracy to enable people to evacuate in time.

But some weather forecasts are less accurate because of judgments by forecasters who have other motivations. Most Americans do not get weather forecasts directly from the National Weather Service; they receive them instead from commercial services such as AccuWeather and the Weather Channel, and also from local TV stations, which tailor the government data to their own needs. According to Silver, the forecasts of the National Weather Service are well-calibrated: when it forecasts a 40 percent chance of rain, it actually does rain 40 percent of the time. The Weather Channel, in contrast, has a slight “wet bias.” When there is a 5 percent chance of rain, it says that the odds are 20 percent because people are angry at a forecaster when they leave home without an umbrella and get soaked, whereas they are delighted when they take an umbrella and the sun shines. But it’s at the local level that forecasts get seriously distorted. According to a study of Kansas City stations, the local weathermen provide much worse forecasts than the National Weather Service and are unashamed about it. (“Accuracy is not a big deal,” one of them said.) Again, television does not reward accuracy because ratings come first.
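The idea of calibration can be made concrete with a small Python sketch. It groups forecasts by the probability they stated and compares each group with how often it actually rained. The function name `calibration_table` and all of the data are hypothetical, chosen only to illustrate the idea; they are not figures from Silver or from any forecaster.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of rain."""
    # stated probability -> [rainy days, total days]
    counts = defaultdict(lambda: [0, 0])
    for stated, rained in zip(forecasts, outcomes):
        counts[stated][0] += int(rained)
        counts[stated][1] += 1
    return {
        stated: rainy / total
        for stated, (rainy, total) in sorted(counts.items())
    }


# Hypothetical record: ten days forecast at 40 percent, ten at 20 percent.
forecasts = [0.4] * 10 + [0.2] * 10
outcomes = [True] * 4 + [False] * 6 + [True] * 1 + [False] * 9

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(f"stated {stated:.0%}, observed {observed:.0%}")
```

In this invented record the 40 percent forecasts are well calibrated, because it rained on four of the ten days. The 20 percent forecasts overstate the chance of rain, which came on only one day in ten; that is the pattern a "wet bias" would produce.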

The role of judgment in forecasts also comes up in Silver’s discussion of the field where he initially made his reputation. Professional baseball scouts, he now concludes, are more like the National Weather Service’s meteorologists than the local weatherman: the scouts’ judgments contribute to more accurate evaluations of baseball players. A decade ago, when Michael Lewis wrote Moneyball, statheads and scouts eyed each other with suspicion verging on contempt. That was the world of baseball that Silver entered a few years out of college when he turned a childhood interest (“as an annoying little math prodigy, I was attracted to all the numbers in the game”) into a professional career. The statistical model that he developed, called PECOTA, was not the first to project players’ performances, but he says it held an edge for a while, until others caught up with its innovations. Revisiting his old predictions of how well minor-league prospects would do in the major leagues, he finds that his system did not perform as well as the more traditional list produced by Baseball America. The old Moneyball battle is over. Supplement a statistical model with trained judgment, and the result is improved predictions.

Silver’s acceptance of judgment in baseball and weather forecasts reflects his more general outlook. The hero of his book is Thomas Bayes, the eighteenth-century English author of a classic theorem showing how to revise a prior estimate of the probability of an event on the basis of additional information. The difficulty that some statisticians have with a Bayesian approach is that it demands an exercise of judgment in establishing “priors,” whereas the dominant “frequentist” approach in statistics seems more objective because judgment plays no role in it. Silver’s discussion of these issues is rather one-sided, and he veers into a personal attack on R.A. Fisher, one of the central figures in the frequentist tradition. But he is right to argue that the demand for judgment as a point of departure in Bayesian thinking has a real merit in forcing the analyst to take background and context into account.
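Bayes's theorem itself is short enough to write out. The Python sketch below revises a prior probability after one piece of evidence. The function name `bayes_update` and the numbers are assumptions made for illustration, not an example taken from Silver's book.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior probability of a hypothesis via Bayes's theorem.

    prior: probability of the hypothesis before seeing the evidence
    p_evidence_if_true: probability of the evidence if the hypothesis holds
    p_evidence_if_false: probability of the evidence if it does not hold
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Hypothetical numbers: a 20 percent prior, and evidence that is three
# times as likely if the hypothesis is true (0.9) as if it is false (0.3).
posterior = bayes_update(0.2, 0.9, 0.3)

# A second, independent piece of the same kind of evidence starts from
# the first posterior, which now serves as the prior.
posterior_after_two = bayes_update(posterior, 0.9, 0.3)

print(round(posterior, 3), round(posterior_after_two, 3))
```

The prior has to come from somewhere, and that is where judgment enters: a different starting estimate yields a different posterior from the same evidence.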

The Bayesian approach, Silver writes, “encourages us to hold a large number of hypotheses in our head at once, to think about them probabilistically, and to update them frequently when we come across new information that might be more or less consistent with them.” The foxes in Tetlock’s study do this on their own; aggregating the judgments of different forecasters is another way of getting at the same objective. Despite all the problems with economic forecasts, for example, aggregated judgments are more accurate than individual ones. According to Silver, the Survey of Professional Forecasters—a quarterly poll produced by the Federal Reserve Bank of Philadelphia—is “about 20 percent more accurate than the typical individual’s forecast at predicting GDP, 10 percent better at predicting unemployment, and 30 percent better at predicting inflation.”
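A toy calculation suggests why averaging helps. The sketch below is in Python, with invented forecasts and an invented outcome; it does not reproduce the Survey of Professional Forecasters or Silver's figures. It compares the error of the averaged forecast with the average error of the individual forecasts.

```python
from statistics import mean

# Hypothetical GDP growth forecasts (percent) from four forecasters,
# and a hypothetical actual outcome.
forecasts = [1.0, 2.5, 3.5, 4.0]
actual = 2.5

# Error of the consensus (the simple average of all forecasts).
consensus = mean(forecasts)
consensus_error = abs(consensus - actual)

# Average error of the forecasters taken one at a time.
typical_error = mean(abs(f - actual) for f in forecasts)

print(f"consensus error: {consensus_error:.2f}")
print(f"typical individual error: {typical_error:.2f}")
```

Measured by absolute error, the consensus can never do worse than the average individual, because errors in opposite directions partly cancel. How much better it does depends on how far the forecasters' mistakes offset one another.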

When Silver turned to election forecasting, he made use of that insight. Aggregating the results of election polls is a way not just of increasing the sample size but also of aggregating the judgments that went into the polling. Yet not every field can profit from that strategy; aggregating earthquake predictions would not do any good. Fortunately, predicting elections is more like predicting the weather than predicting earthquakes. The basic science seems reasonably well understood. Although elections do not require models nearly as complex as those used for the weather, these are two of the fields where forecasting has made substantial progress.

But as Silver’s book highlights, that is not true generally of the forecasting fields, where overconfidence and overreach are the more common pattern. And there lies much of our problem with the uses of forecasting. Many predictions carry more weight than they deserve. In the struggle over the federal budget, we are officially bound to specific—and often arbitrary—numbers produced by economists and statisticians at the Congressional Budget Office. The markets respond nervously to the day’s economic forecasts. One advantage in reading Silver is that if the latest numbers are keeping you up at night, his work may calm your nerves. Your prior judgment may have more value than you realize.

