From my own (admittedly narrow) experience, I get the impression that consumers of forecasts of political instability are usually looking for two things, in descending order of importance: 1) to avoid being blindsided by crises to which they might have to respond, and 2) to prioritize cases for interventions now that might help to prevent (or to produce) those surprises in the future.
To help those consumers achieve those goals, forecasts have to do a very good job distinguishing the situations where crises are going to occur from the ones that will remain stable. A single “surprise” in a high-impact case can be very costly, and lots of false alarms make it difficult to distribute preventive (or proactive) resources in an effective way. Practically speaking, these concerns mean that consumers are usually looking for forecasts that never fail to warn of impending crises while only generating a very short list of cases to watch. In technical terms, what we want is a forecasting system that produces no false negatives and only a very small number of false positives.
Unfortunately, that kind of precision is pretty much impossible to achieve, and that near-impossibility is not just a function of the complexity of human social behavior (as if that weren’t enough). As Kaiser Fung nicely shows in Chapter 4 of his book Numbers Rule Your World, the same difficulties bedevil detection systems in many fields, from testing for illegal drug use or rare diseases to the polygraph tests used to screen people for security clearances. Imprecision is an unavoidable consequence of the fact that major political crises are rare, and it is extremely difficult to predict rare events of any kind as sharply as the consumers of instability forecasts would like.
An example helps to show why. Imagine that there are 150 countries worldwide, and that five of those countries suffer onsets of civil war each year. That translates to an annual incidence (or onset rate) of 3% (5/150 = 0.03) — slightly higher than the average rate of civil-war onset in the real world over the past several decades, but close, and in round numbers that make the ensuing calculations easier to follow.
When an event is rare, it’s easy to make predictions that are very accurate by simply saying that the event will never happen. In our civil-war-onset example, that blind forecast would give us an impressive accuracy rate of 97%. Every year, we would make 145 correct predictions and only miss five. Of course, those predictions would not be very useful, because they wouldn’t help us at all with our #1 goal of avoiding surprises.
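The arithmetic behind that blind forecast is trivial, but it is worth seeing laid out. This small sketch uses only the numbers from the example above (150 countries, five onsets per year); the variable names are my own:

```python
# Accuracy of the blind "no civil war ever" forecast in the running example.
# The 150-country world and 5 onsets per year come from the text.

countries = 150
onsets = 5  # actual civil-war onsets per year

# Predicting "no onset" everywhere is correct for each of the 145 peaceful
# countries and wrong for each of the 5 onsets.
correct = countries - onsets
accuracy = correct / countries

print(correct)   # 145 correct predictions
print(accuracy)  # roughly 0.967, i.e. the impressive-looking 97%
```

High accuracy here is an artifact of the base rate, not of any forecasting skill, which is exactly why accuracy alone is a poor summary for rare events.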
To do better, we have to build a system that uses information about those countries to try to distinguish the ones with impending civil wars from the ones that will remain peaceful. Now let’s imagine that we have done just that with a statistical model which estimates the probability of a civil-war onset in every country every year. To convert those estimated probabilities into sharp yes/no predictions, we need to set a threshold, where values above the threshold are interpreted as predictions that the event will occur and values below it are interpreted as predictions that it will not. To think about how useful those predictions are, statisticians sometimes identify the threshold that produces equivalent error rates in both groups (events and non-events, or positives and negatives) and then use the accuracy rate that results from that “balancing” threshold as a summary of the model’s predictive power.
Now, back to our example. Let’s imagine that we’ve developed a model of civil wars that sorts countries into high-risk (predicted onset) and low-risk (predicted non-onset) groups with 80% accuracy when that balancing threshold is employed, on par with the state of the art in annual forecasts of political instability today. Using that model, the high-risk group would include 33 countries each year: 80% of the five onsets (four countries), and 20% of the 145 non-onsets (29 countries). Of those 33 countries identified as high risk, only four (12%) would actually experience a civil-war onset; the other 29 would be false alarms, or what statisticians call “false positives.” Meanwhile, one of the five civil-war onsets would occur in the set of 117 countries identified as low risk.
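The confusion-matrix arithmetic in that paragraph can be sketched in a few lines of Python. The `confusion_counts` helper is purely my own illustration of the balancing-threshold logic described above; only the numbers come from the text:

```python
# Confusion-matrix arithmetic for the 80%-accuracy example.

def confusion_counts(countries, onsets, accuracy):
    """Counts when the model is `accuracy`-correct in BOTH groups,
    i.e. at the 'balancing' threshold described in the text."""
    non_onsets = countries - onsets
    true_pos = accuracy * onsets             # onsets correctly flagged
    false_neg = onsets - true_pos            # onsets missed
    false_pos = (1 - accuracy) * non_onsets  # peaceful countries flagged
    true_neg = non_onsets - false_pos
    return true_pos, false_pos, false_neg, true_neg

tp, fp, fn, tn = confusion_counts(150, 5, 0.80)
high_risk = tp + fp

print(round(high_risk))          # 33 countries flagged as high risk
print(round(tp / high_risk, 2))  # 0.12: only ~12% of flags are real onsets
print(round(fn))                 # 1 onset occurs in the low-risk group
```

Note that the 12% figure is the model’s precision (or positive predictive value), which is what a resource-allocating consumer actually cares about, and it is far lower than the headline 80% accuracy.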
If you’re a decision-maker looking at that list of 33 high-risk countries and trying to decide how to allocate resources in an effort to prevent those wars or mitigate their effects, you are probably going to find the length of that high-risk list frustrating. The large number of high-risk cases means you have to spread your preventive actions thinly across a large group, and the one conflict your warnings miss could prove very costly as well. Your odds of hitting impending crises with your preventive efforts are much better than they would be if you picked targets at random, but they’re still not nearly as focused as you’d like them to be.
Now let’s imagine that some breakthrough (an improvement in our statistical methods, our data, or our understanding of the origins of civil wars) carries us to a new model that is 95% accurate at that balancing threshold. In light of the complexity of the processes generating those events and the (often poor) quality of the data we use to try to forecast them, that’s an achievement I don’t expect to see in my lifetime, but let’s consider it anyway for the sake of illustration. At 95% accuracy, we would largely have solved the problem of false negatives; only once in a great while would a war break out in a country we had identified as low risk. Meanwhile, though, our high-risk group would still include 12 countries each year, and only five of those 12 countries (42%) would actually suffer war onsets. In other words, we would still have more false positives than true positives in our high-risk group, and we would still have no way of knowing ahead of time which five of those dozen countries were going to be the unlucky ones. The imprecision is greatly reduced, but it’s hardly eliminated.
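Running the same arithmetic at 95% accuracy, with counts rounded to whole countries as in the text, shows how stubborn the false-positive problem is. This is an illustrative script, not any actual forecasting model:

```python
# The balancing-threshold arithmetic again, now at 95% accuracy.
# Counts are rounded to whole countries, as in the text.

countries, onsets, accuracy = 150, 5, 0.95
non_onsets = countries - onsets

true_pos = round(accuracy * onsets)             # 5 onsets caught (4.75 rounded up)
false_pos = round((1 - accuracy) * non_onsets)  # 7 false alarms (7.25 rounded down)
high_risk = true_pos + false_pos

print(high_risk)                        # 12 countries flagged as high risk
print(round(true_pos / high_risk, 2))   # 0.42: still a minority of real onsets
```

Even with a sensitivity most forecasters would consider miraculous, fewer than half of the flagged countries actually experience an onset, because the 145 peaceful countries supply false alarms faster than the five onsets supply true ones.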
When the resources that might be used to respond to those warnings are scarce, those false positives are a serious concern. If you can only afford to muster resources for serious preventive actions in a handful of countries each year, then trying to choose which of those 33 countries — or, in an unlikely world, that dozen — ought to be the targets of those actions is going to be a daunting task.
Unfortunately, that uncertainty turns out to be an unavoidable product of the rarity of the events involved. We can push our prediction threshold higher to shrink the list of high-risk cases, but doing that will just cause us to miss more of the actual onsets. The problem is inherent in the rarity of the event, and there is no magic fix. As Kaiser Fung puts it (p. 97), “Any detection system can be calibrated, but different settings merely redistribute errors between false positives and false negatives; it is impossible to simultaneously reduce both.”
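Fung’s point about calibration can be illustrated with a toy simulation: hold a fixed set of risk scores and sweep the threshold, and the two error types move in opposite directions. The score distributions below are invented purely for illustration and have no basis in real civil-war data:

```python
# Toy illustration of the threshold trade-off: raising the threshold shrinks
# the high-risk list (fewer false positives) but misses more actual onsets
# (more false negatives). Scores here are simulated, not real data.

import random

random.seed(1)

# Simulated annual risk scores: onset countries tend to score higher,
# but the two groups overlap, as they do in real forecasting.
onset_scores = [random.gauss(0.6, 0.15) for _ in range(5)]
peace_scores = [random.gauss(0.3, 0.15) for _ in range(145)]

results = {}
for threshold in (0.3, 0.5, 0.7):
    false_neg = sum(s <= threshold for s in onset_scores)  # onsets missed
    false_pos = sum(s > threshold for s in peace_scores)   # false alarms
    results[threshold] = (false_neg, false_pos)
    print(threshold, false_neg, false_pos)
```

For any fixed set of scores, moving the threshold up can only add misses and remove false alarms; no threshold setting reduces both at once, which is Fung’s point exactly.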
If this is the best we can do, then what’s the point? Well, consider the alternatives. For starters, we might decide to skip statistical forecasting altogether and just target our interventions at cases identified by expert judgment as likely onsets. Unfortunately, those expert judgments are probably going to be an even less reliable guide than our statistical forecasts, so this “solution” only exacerbates our problem.
Alternatively, we could take no preventive action and just respond to events as they occur. If the net costs of responding to crises as they happen are roughly equivalent to the net costs of prevention, then this is a reasonable choice. Maybe responding to crises isn’t really all that costly; maybe preventive action isn’t effective; or maybe preventive action is potentially effective but also extremely expensive. Under these circumstances, early warning is not going to be as useful as we forecasters would like.
If, however, any of those last statements is false (if responding to crises already underway is very costly, or if preventive action is relatively cheap and sometimes effective), then we have an incentive to use forecasts to help guide that action, in spite of the lingering uncertainty about exactly where and when those crises will occur.
Even in situations where preventive action isn’t feasible or desirable, reasonably accurate forecasts can still be useful if they spur interested observers to plan for contingencies they otherwise might not have considered. For example, policy-makers in one country might be rooting for a dictatorship in another country to fall but still fail to plan for that event because they don’t expect it to happen any time soon. A forecasting model which identifies that dictatorship as being at high or increasing risk of collapse might encourage those policy-makers to reconsider their expectations and, in so doing, lead them to prepare better for that event.
Where does that leave us? For me, the bottom line is this: even though forecasts of political instability are never going to be as precise as we’d like, they can still be accurate enough to be helpful, as long as the events they predict are ones for which prevention or preparation stand a decent chance of making a (positive) difference.