If Nate Silver was the winner of last year's presidential election, then Gallup was the loser. The long-time leader in polling consistently showed Mitt Romney ahead, sometimes by a decisive margin. And then President Barack Obama won by 3.9 points—5 points off Gallup’s final margin. In response, Gallup launched an investigation headed by the University of Michigan’s Michael Traugott, a highly regarded survey methodologist who investigated polling errors in the 2008 Democratic primaries. The initial report, released yesterday, shows Gallup is on the right track to correcting the problems that bedeviled it in 2012.
Gallup’s issues were evident long before the 2012 election: Obama’s approval rating ran consistently lower in Gallup’s poll than in other surveys throughout his presidency. In June 2012, The Huffington Post’s Mark Blumenthal investigated Gallup’s methodology and found that the poll underrepresented minorities by neglecting to weight its sample to Census targets and by employing a peculiar battery of yes-no questions about race and ethnicity—the former, but not the latter, was among the fixes Gallup made in early October. Even though those fixes improved Obama’s approval rating among all adults and brought it into line with other pollsters, Gallup continued to tilt decidedly—and, as we now know, inaccurately—toward Romney in the election.
The fault still lay with how Gallup handled race and ethnicity. In pre-election polls, non-Hispanic whites represented as much as 80 percent of Gallup’s electorate, even though other polls showed whites representing 74-76 percent of likely voters. Since Gallup was now weighting its sample of adults to Census targets, the blame fell on Gallup’s likely voter screen. The Obama campaign released a memo documenting the Gallup poll’s failings, including its emphasis on self-reported interest in the campaign, past voting behavior, and familiarity with voting procedures in respondents’ precincts—criteria that privilege long-time residents, who tend to be older and whiter. Gallup’s likely voter model was also blamed for contributing to the poll’s volatility, including in previous elections.
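To see how a screen like this can skew a sample, here is a minimal sketch of a cutoff-based likely voter filter. The question names, scoring, and cutoff are all hypothetical illustrations of the general technique, not Gallup's actual battery:

```python
# Hypothetical sketch of a cutoff-style likely voter screen.
# Question names, scoring, and the cutoff are illustrative only.
def likely_voter_score(respondent):
    """Count 'yes' answers across screening questions; higher = more likely to vote."""
    questions = [
        "thought_about_election",   # self-reported interest in the campaign
        "voted_in_past_election",   # past voting behavior
        "knows_polling_place",      # familiarity with local voting procedures
        "plans_to_vote",
    ]
    return sum(1 for q in questions if respondent.get(q, False))

def filter_likely_voters(sample, cutoff=3):
    """Keep only respondents at or above the cutoff score."""
    return [r for r in sample if likely_voter_score(r) >= cutoff]

sample = [
    {"thought_about_election": True, "voted_in_past_election": True,
     "knows_polling_place": True, "plans_to_vote": True},   # long-time resident
    {"thought_about_election": False, "voted_in_past_election": False,
     "knows_polling_place": False, "plans_to_vote": True},  # recent mover
]
print(len(filter_likely_voters(sample)))  # only the long-time resident survives
```

Note that the recent mover fully intends to vote but is screened out anyway, because questions about past voting and polling-place familiarity dominate the score—the mechanism the Obama campaign's memo objected to.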
At one point in mid-October, Gallup showed Romney leading by a massive 7-point margin—distorting the polling averages and lending credibility to an alternative reality where Romney was heading for a landslide. Although the gap between Gallup and the other pollsters closed over the campaign's final days, Gallup was the only pollster, among those conducting live interviews with voters on cell phones, that showed Romney winning heading into Election Day.
In March, Blumenthal identified an additional issue: voters with unlisted landlines. In April 2011, Gallup shifted from contacting voters by random digit dialing to randomly selecting voters from lists of phone numbers. Research from Democracy Corps found that the small group of voters with unlisted landlines—just 2 percent of likely voters, reachable by randomly dialing a series of digits but not by randomly drawing from lists of phone numbers—supported Obama by a wide 22-point margin, 58-36. This has only a negligible effect on the overall result, especially since weighting for race and ethnicity helps to compensate, but it’s yet another peculiarity that might explain Gallup’s overall problem.
Gallup’s report largely conceded these criticisms—and a little more. After conducting experiments scrutinizing every element of their survey design, Gallup identified four main problem areas: their likely voter model; under-representation, within Census regions, of the Eastern and Pacific time zones versus the Central and Mountain time zones; the use of a series of yes/no questions to measure race; and the use of listed landline samples instead of random digit dial samples. Of these, all but the time-zone problem were identified by Mark Blumenthal.
The time-zone issue is a novel one, and more important than it might seem. Like most polls, Gallup weights by Census region (Northeast, South, Midwest, West), but does not weight within each region. Their investigation found that the Mountain Time Zone was overrepresented compared to the Pacific Time Zone within the West, while the Central Time Zone was overrepresented compared to the Eastern Time Zone in the Midwest and South. Since the Pacific and Eastern Time Zones are more Democratic than the Mountain and Central Time Zones, Republicans wind up doing better than they ought to. Indeed, the report found that “[s]tricter controls on interviewing by time zone within regions would most likely have increased Obama’s margin over Romney in Gallup’s election tracking by […] at least one percentage point in the final Nov. 1-4 sample.” One important question is whether this is an issue confronting all pollsters, or just Gallup. Pew’s methodology, for instance, makes no reference to controlling for time zones. It's possible that this issue helps explain the persistent gap between state and national polls.
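The fix Gallup describes amounts to post-stratification within each region. A minimal sketch of the idea, with made-up sample counts and population shares (the real targets would come from Census data):

```python
# Illustrative post-stratification: reweight respondents within one region so
# time-zone shares match assumed population targets. All numbers are made up.
def timezone_weights(sample_counts, target_shares):
    """Return a weight per time zone: target population share / observed sample share."""
    total = sum(sample_counts.values())
    return {tz: target_shares[tz] / (sample_counts[tz] / total)
            for tz in sample_counts}

# Suppose a Midwest sample over-represents the Central zone:
sample_counts = {"Eastern": 30, "Central": 70}      # raw interviews completed
target_shares = {"Eastern": 0.45, "Central": 0.55}  # assumed population shares

weights = timezone_weights(sample_counts, target_shares)
# Eastern respondents get weighted up, Central respondents down, so the more
# Democratic Eastern zone counts for its true share of the region.
print(round(weights["Eastern"], 2), round(weights["Central"], 2))
```

Applying each respondent's weight when tabulating vote preference is what would shift the margin toward Obama, as the report's one-point estimate suggests.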
Gallup says that it has already implemented remedies for three of the four problems raised by the report, including switching to random digit dialing, controlling for time zones within regions, and adopting a more traditional battery of race questions. But Gallup hasn’t yet decided how to change its likely voter model, which indeed needs improvement. In pre-election polls, Obama did 3 points better among registered voters than likely voters, but the difference was 4 points in the Gallup poll. Again, one percentage point might not seem like much, but it adds up. In particular, it’s clear that Gallup’s likely voter model excluded too many black voters. The combination of Gallup’s likely and registered voter screens resulted in blacks representing a smaller share of the electorate than their share of the adult population, even though the combination of ineligible, unregistered Hispanic adults and high enthusiasm among blacks ought to have guaranteed that the black share of likely voters would be higher than their share of adults. Pew's likely voter model appeared to work—Gallup's did not.
Some adjustments seem obvious. For instance, the report states that Gallup’s questions are “more weighted” toward past voting than other surveys, and acknowledges that the poll would have been more accurate if they had removed the question about whether voters “thought” a lot about the election. Those changes might narrow, eliminate, or even reverse the difference between Gallup’s 4-point registered-to-likely-voter gap and the 3-point gap in other surveys.
But was the “average” of three points accurate? That’s an important, unsolved mystery—and it’s why Gallup hasn’t yet adopted changes to its likely voter model. Here’s the problem: Nearly all of the likely-voter surveys underestimated Obama’s support, and we don’t know if that’s because the likely-voter screen was excluding too many Obama supporters, or if the underlying sample of registered voters was too favorable to Romney.
There's a case that the likely-voter screens were to blame. Certainly, elements of the screen, like asking about interest in the election, were problematic. But the average of registered-voter surveys actually nailed the result, raising the question of whether there was any difference between Obama’s standing among likely and registered voters. Additionally, the Current Population Survey found almost no difference in the racial composition of registered and likely voters. If Obama’s share of the vote among registered and likely voters was actually about the same, we’d have to rethink which polls were most accurate, and, suddenly, Gallup’s 3-point Obama lead among registered voters wouldn’t look so bad.
That said, there probably was a gap between likely and registered voters in 2012, even if it’s not clear whether it was 1, 2, or 3 points. The CPS didn’t show a big difference in the racial composition of registered and actual voters in 2008, but likely voter surveys were still more accurate than registered voter polls. And more generally, it seems difficult to imagine that Obama’s young and diverse coalition matched the turnout of Romney’s older, whiter base. But the bottom line is that there wasn’t an election of registered voters, so there’s no authoritative answer on the size of the gap between registered and likely voters. As a result, Gallup plans to conduct experiments in the upcoming Virginia gubernatorial contest to help calibrate its likely voter model going forward. We’ll see what they find.
The most important Gallup news might not have been any innovative finding from the report, but a continued commitment to transparency. Gallup deserves considerable credit for releasing their raw data through the Roper Center for Public Opinion Research archives, which allowed Blumenthal to diagnose Gallup’s failings early on. Blumenthal and Ariel Edwards-Levy reported Tuesday that Gallup would make the raw data from their experiments and future surveys available through the Roper Center, which will allow analysts to confirm that their changes have yielded improvements.
Indeed, Gallup’s adjustments have reduced the "house effect"—the tendency of a pollster to favor Republicans or Democrats—of their tracking poll of the president’s approval rating. Similarly, the appendix to this week's report confirmed that the racial composition of Gallup's samples of all adults by Election Day was far closer to Census targets than in their polls from earlier in 2012, although minorities were still under-represented. Gallup’s most recent round of changes has already brought the firm's sample of adults even closer to CPS targets, according to the report, although it’s unclear by how much.
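One simple way analysts estimate a house effect is the average gap between a firm's reported margins and the contemporaneous polling average. A sketch with invented numbers (positive margins favor the Democrat):

```python
# Estimate a pollster's "house effect" as the mean difference between its
# margins and the polling average on the same dates. Numbers are invented.
def house_effect(pollster_margins, average_margins):
    """Mean of (pollster margin - consensus margin); sign shows partisan lean."""
    diffs = [p - a for p, a in zip(pollster_margins, average_margins)]
    return sum(diffs) / len(diffs)

pollster = [-1.0, -2.0, 0.0]   # hypothetical readings from one firm
consensus = [1.0, 0.5, 1.5]    # polling average on the same dates

print(house_effect(pollster, consensus))  # negative here = leans Republican
```

With Gallup's raw data in the Roper archive, this is the kind of check outside analysts can run to verify that the firm's lean has actually shrunk.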
The details in the report suggest that those additional improvements might be substantial: a point from weighting within regions, a point from eliminating the likely voter question about “interest,” and unknown, additional gains from coming closer to CPS targets for race and ethnicity and returning to random digit dialing samples—all could contribute to a Gallup comeback. The survey is extremely well-funded and has huge samples, especially over a multi-week period. They’ve brought in a highly regarded team of survey methodologists, and their commitment to transparency means that analysts should ultimately be able to confirm that their efforts have paid dividends. It just might be enough to restore confidence in America’s oldest polling firm.