Last week, Hans Noel wrote a post for Mischiefs of Faction provocatively titled “Stop trying to predict the future”. I say provocatively because, if I read the post correctly, Noel’s argument deliberately refutes his own headline. Noel wasn’t making a case against forecasting. Rather, he was arguing in favor of forecasting, as long as it’s done in service of social-scientific objectives.
If that’s right, then I largely agree with Noel’s argument and would restate it as follows. Political scientists shouldn’t get sucked into bickering with their colleagues over small differences in forecast accuracy around single events, because those differences will rarely contain enough information for us to learn much from them. Instead, we should take prediction seriously as a means of testing competing theories by doing two things.
First, we should build forecasting models that clearly represent contrasting sets of beliefs about the causes and precursors of the things we’re trying to predict. In Noel’s example, U.S. election forecasts are only scientifically interesting insofar as they come from models that instantiate different beliefs about why Americans vote as they do. If, for example, a model that incorporates information about trends in unemployment consistently produces more accurate forecasts than a very similar model that doesn’t, then we can strengthen our confidence that trends in unemployment shape voter behavior. If all the predictive models use only the same inputs (polls, for example), we don’t leave ourselves much room to learn about theories from them.
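To make that comparison concrete, here is a minimal sketch in Python of how two otherwise similar models might be scored against each other. It uses the Brier score, one common accuracy measure for probabilistic forecasts, which is my choice for illustration and not something Noel’s post prescribes. The forecasts and outcomes are invented, and the names `with_unemployment` and `without_unemployment` are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and always saying 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


# Invented data for illustration only: 1 = the event happened, 0 = it did not.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical forecasts from a model that includes unemployment trends...
with_unemployment = [0.8, 0.3, 0.7, 0.6, 0.2, 0.4, 0.9, 0.1]
# ...and from a very similar model that leaves them out.
without_unemployment = [0.6, 0.4, 0.6, 0.5, 0.4, 0.5, 0.7, 0.3]

score_with = brier_score(with_unemployment, outcomes)
score_without = brier_score(without_unemployment, outcomes)

print(f"with unemployment:    {score_with:.3f}")
print(f"without unemployment: {score_without:.3f}")
```

In this made-up example the model with unemployment trends scores better, but eight events is far too few to conclude much. The point is only that the comparison is informative because the two models differ in one theoretically meaningful input.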
In my work for the Early Warning Project, I have tried to follow this principle by organizing our multi-model ensemble around a pair of models that represent overlapping but distinct ideas about the origins of state-led mass killing. One model focuses on the characteristics of the political regimes that might perpetrate this kind of violence, while another focuses on the circumstances in which those regimes might find themselves. These models embody competing claims about why states kill, so a comparison of their predictive accuracy will give us a chance to learn something about the relative explanatory power of those competing claims. Most of the current work on forecasting U.S. elections follows this principle too, by the way, even if that’s not what gets emphasized in media coverage of that work.
Second, we should only really compare the predictive power of those models across multiple events or a longer time span, where we can be more confident that observed differences in accuracy are meaningful. This is basic statistics. The smaller the sample, the less confident we can be that it is representative of the underlying distribution(s) from which it was drawn. If we declare victory or failure in response to just one or a few bits of feedback, we risk “correcting” for an unlikely draw that dimly reflects the processes that really interest us. Instead, we should let the models run for a while before chucking or tweaking them, or at least leave the initial version running while trying out alternatives.
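A toy simulation can illustrate the sample-size point. The sketch below rests on assumptions I am making up for the purpose: an “informed” model that happens to know each event’s true probability, a “naive” model that always says 0.5, and true probabilities drawn uniformly between 0.1 and 0.9. It then counts how often the informed model actually posts the lower Brier score when the two are compared over different numbers of events.

```python
import random


def share_of_wins(n_events, n_trials=2000, seed=42):
    """Fraction of simulated comparisons in which the informed model
    ends up with a lower total squared error than the naive model."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _trial in range(n_trials):
        informed_error = 0.0
        naive_error = 0.0
        for _event in range(n_events):
            p = rng.uniform(0.1, 0.9)         # assumed true probability
            y = 1 if rng.random() < p else 0  # realized outcome
            informed_error += (p - y) ** 2    # informed model forecasts p
            naive_error += (0.5 - y) ** 2     # naive model forecasts 0.5
        if informed_error < naive_error:
            wins += 1
    return wins / n_trials


for n in (1, 5, 25, 200):
    print(f"{n:>3} events: informed model wins {share_of_wins(n):.0%} of comparisons")
```

Under these assumptions, the better model loses a single-event comparison a substantial share of the time, and only across many events does it come out ahead almost always. Real forecasting problems are messier than this, but the logic of waiting for more feedback before declaring a winner is the same.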
Admittedly, this can be hard to do in practice, especially when the events of interest are rare. All of the applied forecasters I know (myself included) are tinkerers by nature, so it’s difficult for us to find the patience that second step requires. With U.S. elections, forecasters also know that they only get one shot every two or four years, and that most people won’t hear anything about their work beyond a topline summary that reads like a racing form from the horse track. If you’re at all competitive, and anyone doing this work probably is, it’s hard not to respond to that incentive. With the Early Warning Project, I worry about having a salient “miss” early in the system’s lifespan that encourages doubters to dismiss the work before we’ve really had a chance to assess its reliability and value. We can be patient, but if our intended audiences aren’t patient too, then the system could fail to get the traction it deserves.
Difficult doesn’t mean impossible, however, and I’m optimistic that political scientists will increasingly use forecasting in service of their search for more useful and more powerful theories. Journal articles that take this idea seriously are still rare birds, especially on things other than U.S. elections, but you occasionally spot them (Exhibit A and B). As Drew Linzer tweeted in response to Noel’s post, “Arguing over [predictive] models is arguing over assumptions, which is arguing over theories. This is exactly what [political science] should be doing.”