A Quick Post Mortem on Oscars Forecasting

I was intrigued to see that statistical forecasts of the Academy Awards from PredictWise and FiveThirtyEight performed pretty well this year. Neither nailed it, but they both used sound processes to generate probabilistic estimates that turned out to be fairly accurate.

In the six categories both sites covered, PredictWise assigned very high probabilities to the eventual winner in four: Picture, Actor, Actress, and Supporting Actress. PredictWise didn’t miss by much in one more—Supporting Actor, where winner Christoph Waltz ran a close second to Tommy Lee Jones (40% to 44%). Its biggest miss came in the Best Director category, where PredictWise’s final forecast favored Steven Spielberg (76%) over winner Ang Lee (22%).

At FiveThirtyEight, Nate Silver and co. also gave the best odds to the same four of six eventual winners, but they were a little less confident than PredictWise about a couple of them. FiveThirtyEight also had a bigger miss in the Best Supporting Actor category, putting winner Christoph Waltz neck and neck with Philip Seymour Hoffman and both of them a ways behind Jones. FiveThirtyEight landed closer to the mark than PredictWise in the Best Director category, however, putting Lee just a hair’s breadth behind Spielberg (0.56 to 0.58 on its index).

If this were a showdown, I’d give the edge to PredictWise for three reasons. First, my eyeballing of the results tells me that PredictWise’s forecasts were slightly better calibrated. Both put four of the six winners in front and didn’t miss by much on one more, but PredictWise was more confident in the four they both got “right.” Second, PredictWise expressed its forecasts as probabilities, while FiveThirtyEight used some kind of unitless index that I found harder to understand. Third, PredictWise gets bonus points for forecasting all 24 of the categories presented on Sunday night, and against that larger list it went an impressive 19 for 24.
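For anyone who wants to go beyond eyeballing, one common option is a proper scoring rule such as the Brier score, which rewards forecasts that put high probability on what actually happened. It blends calibration with sharpness, so it is not a pure calibration measure. What follows is only a rough sketch: the function name is my own, the Best Director numbers are the PredictWise values quoted above, and the "Other" bucket that lumps the leftover 2% together is an assumption made purely for illustration.

```python
def brier_score(forecast, winner):
    """Multi-category Brier score: sum of squared errors across outcomes.

    0 is a perfect forecast; 2 is the worst possible score.
    """
    if winner not in forecast:
        raise ValueError(f"{winner!r} is not among the forecast outcomes")
    return sum(
        (prob - (1.0 if name == winner else 0.0)) ** 2
        for name, prob in forecast.items()
    )


# Best Director probabilities quoted in this post. "Other" is a hypothetical
# bucket for the remaining 2%, which the post does not break down.
predictwise_director = {"Spielberg": 0.76, "Lee": 0.22, "Other": 0.02}

score = brier_score(predictwise_director, winner="Lee")
print(f"PredictWise, Best Director: {score:.4f}")
```

Averaging this score over every category would give one number per forecaster, but FiveThirtyEight's index would have to be put on a probability scale first, and six categories is a very small sample either way.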

It’s also worth noting the two forecasters used different methods. Silver and co. based their index on lists of awards that were given out before the Oscars, treating those results like the pre-election polls they used to accurately forecast the last couple of U.S. general elections. Meanwhile, PredictWise used an algorithm to combine forecasts from a few different prediction markets, which themselves combine the judgments of thousands of traders. PredictWise’s use of prediction markets gave it the added advantage of making its forecasts dynamic; as the prediction markets moved in the weeks before the awards ceremony, its forecasts updated in real time. We don’t have enough data to say yet, but it may also be that prediction markets are better predictors than the other award results, and that’s why PredictWise did a smidgen better.

If I’m looking to handicap the Oscars next year and both of these guys are still in the game, I would probably convert Silver’s index to a probability scale and then average the forecasts from the two of them. That approach wouldn’t have improved on the four-of-six record they each managed this year, but the results would have been better calibrated than either one alone, and that bodes well for future iterations. Again and again, we’re seeing that model averaging just works, so whenever the opportunity presents itself, do it.
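Here is a minimal sketch of what that could look like for a single category. Several things in it are assumptions on my part: I rescale the index by simple normalization (dividing each value by the total), which may not be the most sensible mapping; the function names are made up; and I restrict the example to the two Best Director nominees whose numbers appear in this post, renormalizing both forecasts over just those two names.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {name: value / total for name, value in scores.items()}


def average_forecasts(*forecasts):
    """Unweighted average of probability forecasts over the same nominees."""
    names = set(forecasts[0])
    if any(set(f) != names for f in forecasts):
        raise ValueError("all forecasts must cover the same nominees")
    return {
        name: sum(f[name] for f in forecasts) / len(forecasts)
        for name in forecasts[0]
    }


# Best Director values quoted in this post, limited to the two nominees
# the post reports for both sites (an illustrative simplification).
fivethirtyeight_index = {"Spielberg": 0.58, "Lee": 0.56}
predictwise_probs = {"Spielberg": 0.76, "Lee": 0.22}

combined = average_forecasts(
    normalize(fivethirtyeight_index),  # index -> probability scale (assumed mapping)
    normalize(predictwise_probs),      # renormalized over the same two names
)
for name, prob in combined.items():
    print(f"{name}: {prob:.3f}")
```

In this toy version Spielberg still comes out as the favorite, so the average does not rescue the pick, but it does shift some probability toward Lee relative to PredictWise alone.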

UPDATE: Later on Monday, Harry Enten did a broader version of this scan for the Guardian’s Film Blog and reached a similar conclusion:

A more important point to take away is that there was at least one statistical predictor got it right in all six major categories. That suggests that a key fact about political forecasting holds for the Oscars: averaging of the averages works. You get a better idea looking at multiple models, even if they themselves include multiple factors, than just looking at one.

It’s Not Just The Math

This week, statistics-driven political forecasting won a big slab of public vindication after the U.S. election predictions of an array of number-crunching analysts turned out to be remarkably accurate. As John Sides said over at the Monkey Cage, “2012 was the Moneyball election.” The accuracy of these forecasts, some of them made many months before Election Day,

…shows us that we can use systematic data—economic data, polling data—to separate momentum from no-mentum, to dispense with the gaseous emanations of pundits’ “guts,” and ultimately to forecast the winner. The means and methods of political science, social science, and statistics, including polls, are not perfect, and Nate Silver is not our “algorithmic overlord” (a point I don’t think he would disagree with). But 2012 has showed how useful and necessary these tools are for understanding how politics and elections work.

Now I’ve got a short piece up at Foreign Policy explaining why I think statistical forecasts of world politics aren’t at the same level and probably won’t be very soon. I hope you’ll read the whole thing over there, but the short version is: it’s the data. If U.S. electoral politics is a data hothouse, most of international politics is a data desert. Statistical models make very powerful forecasting tools, but they can’t run on thin air, and the density and quality of the data available for political forecasting drops off precipitously as you move away from U.S. elections.

Seriously: you don’t have to travel far in the data landscape to start running into trouble. In a piece posted yesterday, Stephen Tall asks rhetorically why there isn’t a British Nate Silver and then explains that it’s because “we [in the U.K.] don’t have the necessary quality of polls.” And that’s the U.K., for crying out loud. Now imagine how things look in, say, Ghana or Sierra Leone, both of which are holding their own national elections this month.

Of course, difficult does not mean impossible. I’m a bit worried, actually, that some readers of that Foreign Policy piece will hear me saying that most political forecasting is still stuck in the Dark Ages, when that’s really not what I meant. I think we actually do pretty well with statistical forecasting on many interesting problems in spite of the dearth of data, as evidenced by the predictive efforts of colleagues like Mike Ward and Phil Schrodt and some of the work I’ve posted here on things like coups and popular uprisings.

I’m also optimistic that the global spread of digital connectivity and associated developments in information-processing hardware and software are going to help fill some of those data gaps in ways that will substantially improve our ability to forecast many political events. I haven’t seen any big successes along those lines yet, but the changes in the enabling technologies are pretty radical, so it’s plausible that the gains in data quality and forecasting power will happen in big leaps, too.

While we wait for those leaps to happen, there are some alternatives to statistical models that can help fill some of the gaps. Based partly on my own experiences and partly on my read of relevant evidence (see here, here, and here for a few tidbits), I’m now convinced that prediction markets and other carefully designed systems for aggregating judgments can produce solid forecasts. These tools are most useful in situations where the outcome isn’t highly predictable but relevant information is available to those who dig for it. They’re somewhat less useful for forecasting the outcomes of decision processes that are idiosyncratic and opaque, like those of the North Korean government or even the U.S. Supreme Court. There’s no reason to let the perfect be the enemy of the good, but we should use these tools with full awareness of their limitations as well as their strengths.

More generally, though, I remain convinced that, when trying to forecast political events around the world, there’s a complexity problem we will never overcome no matter how many terabytes of data we produce and consume, how fast our processors run, and how sophisticated our methods become. Many of the events that observers of international politics care about are what Nassim Nicholas Taleb calls “gray swans”: “rare and consequential, but somewhat predictable, particularly to those who are prepared for them and have the tools to understand them.”

These events are hard to foresee because they bubble up from a complex adaptive system that’s constantly evolving underfoot. The patterns we think we discern in one time and place can’t always be generalized to others, and the farther into the future we try to peer, the thinner those strands get stretched. Events like these “are somewhat tractable scientifically,” as Taleb puts it, but we should never expect to predict their arrival the way we can foresee the outcomes of more orderly processes like U.S. elections.
