Conflict Events, Coup Forecasts, and Data Prospecting

Last week, for an upcoming post to the interim blog of the atrocities early-warning project I direct, I got to digging around in ACLED’s conflict event data for the first time. Once I had the data processed, I started wondering if they might help improve forecasts of coup attempts, too. That train of thought led to the preliminary results I’ll describe here, and to a general reminder of the often-frustrating nature of applied statistical forecasting.

ACLED is the Armed Conflict Location & Event Data Project, a U.S. Department of Defense–funded, multi-year endeavor to capture information about instances of political violence in sub-Saharan Africa from 1997 to the present. ACLED’s coders scan an array of print and broadcast sources, identify relevant events from them, and then record those events’ date, location, and form (battle, violence against civilians, or riots/protests); the types of actors involved; whether or not territory changed hands; and the number of fatalities that occurred. Researchers can download all of the project’s data in various formats and structures from the Data page, one of the better ones I’ve seen in political science.

I came to ACLED last week because I wanted to see if violence against civilians in Somalia had waxed, waned, or held steady in recent months. Trying to answer that question with their data meant:

  • Downloading two Excel spreadsheets, Version 4 of the data for 1997-2013 and the Realtime Data file covering (so far) the first five months of this year;
  • Processing and merging those two files, which took a little work because my software had trouble reading the original spreadsheets and the labels and formats differed a bit across them; and
  • Subsetting and summarizing the data on violence against civilians in Somalia, which also took some care because there was an extra space at the end of the relevant label in some of the records.
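The label problem in that last step is easy to illustrate. What follows is a minimal Python sketch, not the R code used for this post; the column names and sample rows are invented for illustration and are not ACLED's actual headers. The idea is simply to normalize whitespace and case before filtering, so that a trailing space does not silently drop records.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions, not ACLED's headers.
# Note the trailing space after the event type in the first record.
RAW = """country,event_type,fatalities
Somalia,Violence against civilians ,3
Somalia,Violence against civilians,1
Somalia,Battle,10
Kenya,Violence against civilians,2
"""


def clean_label(label):
    """Trim the ends, collapse internal runs of whitespace, and lowercase."""
    return " ".join(label.split()).lower()


def load_events(text):
    """Parse CSV text into a list of dicts with normalized labels."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["country"] = clean_label(row["country"])
        row["event_type"] = clean_label(row["event_type"])
        row["fatalities"] = int(row["fatalities"])
        rows.append(row)
    return rows


events = load_events(RAW)

# Subset to violence against civilians in Somalia.
somalia_vac = [
    r for r in events
    if r["country"] == "somalia"
    and r["event_type"] == "violence against civilians"
]

print(len(somalia_vac), sum(r["fatalities"] for r in somalia_vac))
```

Without the normalization step, an exact string match would catch only one of the two Somalia records in this toy sample.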

Once I had done these things, it was easy to generalize the process to the entire data set, producing tables with monthly counts of fatalities and events by type for all African countries over the past 13 years. And, once I had those country-month counts of conflict events, it was easy to imagine using them to try to help forecast coup attempts in the world’s most coup-prone region. Other things being equal, variations across countries and over time in the frequency of conflict events might tell us a little more about the state of politics in those countries, and therefore where and when coup attempts are more likely to happen.
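For readers who want to see what the country-month tables amount to, here is a rough Python sketch of the aggregation. It assumes the events have already been cleaned into (country, date, type, fatalities) records; the sample data and the function name are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical cleaned events: (country, event date, event type, fatalities).
EVENTS = [
    ("Somalia", date(2014, 1, 3), "battle", 10),
    ("Somalia", date(2014, 1, 20), "battle", 2),
    ("Somalia", date(2014, 2, 5), "violence against civilians", 4),
    ("Kenya", date(2014, 1, 9), "riots/protests", 0),
]


def country_month_counts(events):
    """Tally events and fatalities by (country, year, month, event type)."""
    event_counts = Counter()
    fatality_counts = Counter()
    for country, day, event_type, fatalities in events:
        key = (country, day.year, day.month, event_type)
        event_counts[key] += 1
        fatality_counts[key] += fatalities
    return event_counts, fatality_counts


n_events, n_deaths = country_month_counts(EVENTS)

print(n_events[("Somalia", 2014, 1, "battle")])   # events in that country-month
print(n_deaths[("Somalia", 2014, 1, "battle")])   # fatalities in that country-month
```

A Counter returns zero for keys it has never seen, which is convenient here because most country-months have no events of a given type.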

Well, in this case, it turns out they don’t tell us much more. The plot below shows ROC curves and the areas under those curves for the out-of-sample predictions from a five-fold cross-validation exercise involving a few country-month models of coup attempts. The Base Model includes: national political regime type (the categorization scheme from PITF’s global instability model applied to Polity 3d, the spell-file version); time since last change in Polity score (in days, logged); infant mortality rate (relative to the annual global median, logged); and an indicator for any coup attempts in the previous 24 months (yes/no). The three other models add logged sums of counts of ACLED events by type—battles, violence against civilians, or riots/protests—in the same country over the previous three, six, or 12 months, respectively. These are all logistic regression models, and the dependent variable is a binary one indicating whether or not any coup attempts (successful or failed) occurred in that country during that month, according to Powell and Thyne.
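The event-count predictors can be sketched as follows. This is an illustration in Python rather than the R code behind the post, and it rests on two assumptions the text above does not settle: that the window covers the k months before the current one (excluding the current month), and that the log is taken as log(1 + sum) so that months with no events are defined.

```python
import math


def logged_trailing_sum(monthly_counts, window):
    """For each month t, return log(1 + sum of counts in months t-window .. t-1).

    Months with fewer than `window` prior months of data get None.
    Both the exclusion of the current month and the +1 inside the log
    are assumptions made for this sketch.
    """
    features = []
    for t in range(len(monthly_counts)):
        if t < window:
            features.append(None)
        else:
            prior = monthly_counts[t - window:t]
            features.append(math.log1p(sum(prior)))
    return features


# Hypothetical monthly battle counts for one country.
battles = [0, 2, 1, 0, 5, 3, 0]
feat3 = logged_trailing_sum(battles, 3)
print(feat3)
```

The same function with a window of 6 or 12 would give the longer look-back versions, at the cost of losing more months at the start of each country's series.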

ROC Curves and AUC Scores from Five-Fold Cross-Validation of Coup Models Without and With ACLED Event Counts

As the chart shows, adding the conflict event counts to the base model seems to buy us a smidgen more discriminatory power, but not enough to have confidence that they would routinely lead to more accurate forecasts. Intriguingly, the crossing of the ROC curves suggests that the base model, which emphasizes structural conditions, is actually a little better at identifying the most coup-prone countries. The addition of conflict event counts to the model leads to some under-prediction of coups in that high-risk set, but the balance tips the other way in countries with less structural vulnerability. In the aggregate, though, there is virtually no difference in discriminatory power between the base model and the ones that add the conflict event counts.
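To make the point about crossing curves concrete, here is a small Python sketch with made-up labels and scores (not the models' actual output). It computes AUC as the share of positive-negative pairs in which the positive case gets the higher score, with ties counting half, and shows two hypothetical models that rank the cases differently yet end up with the same area.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 marks a country-month with a coup attempt.
labels = [1, 0, 0, 1, 0, 0]

# Model A puts one coup case at the very top but buries the other mid-list.
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

# Model B misses the top spot but ranks both coup cases near the top.
model_b = [0.8, 0.9, 0.3, 0.7, 0.2, 0.1]

print(auc(labels, model_a), auc(labels, model_b))
```

Both toy models score 0.75, even though they would produce ROC curves that cross. That is why a near-identical AUC can hide real differences in where each model does its best work.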

There are, of course, many other ways to group and slice ACLED’s data, but the rarity of coups leads me to believe that narrower cuts or alternative operationalizations aren’t likely to produce stronger predictive signals. In Africa since 1997, there are only 36 country-months with coup attempts, according to Powell and Thyne. When the events are this rare and complex and the examples this few, there’s really not much point in going beyond the most direct measures. Under these circumstances, we’re unlikely to discover finer patterns, and if we do, we probably shouldn’t have much confidence in them. There are also other models and techniques to try, but I’m dubious for the same reasons. (FWIW, I did try Random Forests and got virtually identical accuracy.)

So those are the preliminary results from this specific exercise. (The R scripts I used are on GitHub, here.) I think those results are interesting in their own right, but the process involved in getting to them is also a great example of the often-frustrating nature of applied statistical forecasting. I spent a few hours each day for three days straight getting from the thought of exploring ACLED to the results described here. Nearly all of that time was spent processing data; only the last half-hour or so involved any modeling. As is often the case, a lot of that data-processing time was really just me staring at my monitor trying to think of another way to solve some problem I’d already tried and failed to solve.

In my experience, that kind of null result is where nearly all statistical forecasting ideas end. Even when you’re lucky enough to have the data to pursue them, few of your ideas pan out. But panning is the right metaphor, I think. Most of the work is repetitive and frustrating, but every so often you catch a nice nugget. Those nuggets tempt you to keep looking for more, and once in a great while, they can make you rich.
