One of the lessons I think I’ve learned from the nearly 15 years I’ve spent developing statistical models to forecast rare political events is: keep it simple unless and until you’re compelled to do otherwise.
The fact that the events we want to forecast emerge from extremely complex systems doesn’t mean that the models we build to forecast them need to be extremely complex as well. In a sense, the unintelligible complexity of the causal processes relieves us from the imperative to follow that path. We know our models can’t even begin to capture the true data-generating process. So, we can and usually should think instead about looking for measures that capture relevant concepts in a coarse way and then use simple model forms to combine those measures.
A few experiences and readings have especially shaped my thinking on this issue.
- When I worked on the Political Instability Task Force (PITF), my colleagues and I found that a logistic regression model with just four variables did a pretty good job assessing relative risks of a few forms of major political crisis in countries worldwide (see here, or ungated here). In fact, one of the four variables in that model—an indicator that four or more bordering countries have ongoing major armed conflicts—has almost no variance, so it’s effectively a three-variable model. We tried adding a lot of other things that were suggested by a lot of smart people, but none of them really improved the model’s predictive power. (There were also a lot of things we couldn’t even try because the requisite data don’t exist, but that’s a different story.)
- Toward the end of my time with PITF, we ran a “tournament of methods” to compare the predictive power of several statistical techniques that varied in their complexity, from logistic regression to Bayesian hierarchical models with spatial measures (see here for the write-up). We found that the more complex approaches usually didn’t outperform the simpler ones, and when they did, it wasn’t by much. What mattered most for predictive accuracy was finding the inputs with the most forecasting power. Once we had those, the functional form and complexity of the model didn’t make much difference.
- As Andreas Graefe describes (here), models that assign equal weights to all predictors often forecast at least as accurately as multiple regression models that estimate weights from historical data. “Such findings have led researchers to conclude that the weighting of variables is secondary for the accuracy of forecasts,” Graefe writes. “Once the relevant variables are included and their directional impact on the criterion is specified, the magnitudes of effects are not very important.”
Of course, there will be some situations in which complexity adds value, so it’s worth exploring those ideas when we have a theoretical rationale and the coding skills, data, and time needed to pursue them. In general, though, I am convinced that we should always try simpler forms first and only abandon them if and when we discover that more complex forms significantly increase forecasting power.
Importantly, the evidence for that judgment should come from out-of-sample validation—ideally, from forecasts made about events that hadn’t yet happened. Models with more variables and more complex forms will often score better than simpler ones when applied to the data from which they were derived, but this will usually turn out to be a result of overfitting. If the more complex approach isn’t significantly better at real-time forecasting, it should probably be set aside until it does.
Oh, and a corollary: if you have to choose between a) building more complex models, or even just applying lots of techniques to the same data, and b) testing other theoretically relevant variables for predictive power, do (b).