Be Vewy, Vewy Quiet

This blog has gone relatively quiet of late, and it will probably stay that way for a while. That’s partly a function of my personal life, but it also reflects a conscious decision to spend more time improving my abilities as a programmer.

I want to get better at scraping, making, munging, summarizing, visualizing, and analyzing data. So, instead of contemplating world affairs, I’ve been starting to learn Python; using questions on Stack Overflow as practice problems for R; writing scripts that force me to expand my programming skills; and building Shiny apps that put those skills to work. Here’s a screenshot of one app I’ve made (yes, it actually works) that interactively visualizes ACLED’s latest data on violence against civilians in Africa, based partly on this script for scraping ACLED’s website:

[Screenshot: ACLED visualizer app]

When I started on this kick, I didn’t plan to stop writing blog posts about international affairs. As I’ve gotten into it, though, I’ve found that my curiosity about current events has ebbed, and the pilot light for my writing brain has gone out. Normally, writing ideas flare up throughout the day, but especially in the early morning. Lately, I wake up thinking about the coding problems I’m stuck on.

I think it’s a matter of attention, not interest. Programming depends on the tiniest details. All those details quickly clog the brain’s RAM, leaving no room for the unconscious associations that form the kernels of new prose. That clogging happens even faster when other parts of your life are busy, stressful, or off kilter, as they are for many of us and as they are for me right now.

That’s what I think, anyway. Whatever the cause, though, I know that I’m rarely feeling the impulse to write, and I know that shift has sharply slowed the pace of publishing here. I’m leaving the channel open and hope I can find the mental and temporal space to keep using it, but who knows what tomorrow may bring?

ACLED in R

The Armed Conflict Location & Event Data Project, a.k.a. ACLED, produces up-to-date event data on certain kinds of political conflict in Africa and, as of 2015, parts of Asia. In this post, I’m not going to dwell on the project’s sources and methods, which you can read about on ACLED’s About page, in the 2010 journal article that introduced the project, or in the project’s user’s guides. Nor am I going to dwell on the necessity of using all political event data sets, including ACLED, with care: understanding the sources of bias in how they observe events and of error in how they code them, and interpreting (or, in extreme cases, ignoring) the resulting statistics accordingly.

Instead, my only aim here is to share an R script I’ve written that largely automates the process of downloading and merging ACLED’s historical and current Africa data and then creates a new data frame with counts of events by type at the country-month level. If you use ACLED in R, this script might save you some time and some space on your hard drive.

You can find the R script on GitHub, here.

The chief problem with this script is that the URLs and file names of ACLED’s historical and current data sets change with every update, so the code will need to be modified each time that happens. If the names were modular and the changes to them predictable, it would be easy to rewrite the code to keep up with those changes automatically. Unfortunately, they aren’t, so the best I can do for now is to give step-by-step instructions, in comments embedded in the script, on how to update the four relevant fields by hand. As long as the basic structure of the .csv files posted by ACLED doesn’t change, though, the rest should keep working.

[UPDATE: I revised the script so it will scrape the link addresses from the ACLED website and parse the file names from them. The new version worked after ACLED updated its real-time file earlier today, when the old version would have broken. Unless ACLED changes its file-naming conventions or the structure of its website, the new version should work for the rest of 2015. In case it does fail, instructions on how to hard-code a workaround are included as comments at the bottom of the script.]
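The update describes the scraping approach only in outline, so here is a minimal sketch of the same idea in Python, using only the standard library: collect every link on a page, keep the ones that look like data files, and parse the file name out of each URL. The actual script is written in R; the page fragment, the example.org URLs, the file names, and the rule that data files are the links ending in `.csv` are all hypothetical assumptions, not a description of ACLED’s real site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_data_files(html, suffix=".csv"):
    """Return (url, file_name) pairs for links whose path ends in `suffix`.

    Assumption (hypothetical): the data files are the only links on the
    page with this suffix.
    """
    parser = LinkCollector()
    parser.feed(html)
    results = []
    for url in parser.links:
        path = urlparse(url).path  # drops any query string or fragment
        if path.lower().endswith(suffix):
            results.append((url, posixpath.basename(path)))
    return results


# Hypothetical page fragment; real file names change with every update.
sample_page = """
<p><a href="http://example.org/data/acled_historical_v5.csv">Historical</a></p>
<p><a href="http://example.org/data/acled_realtime_20150725.csv">Realtime</a></p>
<p><a href="http://example.org/about">About</a></p>
"""

for url, name in find_data_files(sample_page):
    print(name)
```

The suffix filter is the fragile part. If the site’s naming conventions or layout change, that filter is what would need hand-editing, which is the same failure mode the update warns about.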

It should also be easy to adapt the part of the script that generates country-month event counts to slice the data even more finely, or to count by something other than event type. To do that, you would just need to add variables to the group_by() part of the block of code that produces the object ACLED.cm. For example, if you wanted to get counts of events by type at the level of the state or province, you would revise that line to read group_by(gwno, admin1, year, month, event_type). Or, if you wanted country-month counts of events by the type(s) of actor involved, you could use group_by(gwno, year, month, interaction) and then see this user’s guide to decipher those codes. You get the drift.
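The grouping idea is language-agnostic: count rows for each unique combination of the grouping variables. As a rough Python analogue of that `group_by()` step (not a port of the R script), the sketch below uses `collections.Counter`. The field names follow the text, but the records and event-type labels are invented toy values.

```python
from collections import Counter


def count_events(records, keys):
    """Count records for each unique combination of the fields in `keys`.

    Roughly analogous to group_by(...) followed by a row count in dplyr.
    """
    return Counter(tuple(rec[k] for k in keys) for rec in records)


# Invented toy records (520 = Somalia, 475 = Nigeria in Gleditsch-Ward codes).
events = [
    {"gwno": 520, "admin1": "Banadir", "year": 2015, "month": 5, "event_type": "Battle"},
    {"gwno": 520, "admin1": "Banadir", "year": 2015, "month": 5, "event_type": "Battle"},
    {"gwno": 520, "admin1": "Bay", "year": 2015, "month": 5, "event_type": "Riots/Protests"},
    {"gwno": 475, "admin1": "Borno", "year": 2015, "month": 6, "event_type": "Battle"},
]

# Country-month counts by event type, the analogue of ACLED.cm.
country_month = count_events(events, ["gwno", "year", "month", "event_type"])

# A finer slice: counts by state or province as well.
province_month = count_events(events, ["gwno", "admin1", "year", "month", "event_type"])

print(country_month[(520, 2015, 5, "Battle")])
```

One caveat applies in both languages. A plain grouped count only produces rows for combinations that actually occur, so country-months with no events have to be filled in with zeros separately before modeling or plotting.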

The script also shows a couple of examples of how to use ‘ggplot2’ to generate time-series plots of those monthly counts. Here’s one I made of monthly counts of battle events by country for the entire period covered by ACLED as of this writing: January 1997–June 2015. A production-ready version of this plot would require some more tinkering with the size of the country names and the labeling of the x-axis, but this kind of small-multiples chart offers a nice way to explore the data before analysis.

Monthly counts of battle events, January 1997-June 2015

If you use the script and find flaws in it or have ideas on how to make it work better or do more, please email me at ulfelder <at> gmail <dot> com.

Conflict Events, Coup Forecasts, and Data Prospecting

Last week, for an upcoming post to the interim blog of the atrocities early-warning project I direct, I got to digging around in ACLED’s conflict event data for the first time. Once I had the data processed, I started wondering if they might help improve forecasts of coup attempts, too. That train of thought led to the preliminary results I’ll describe here, and to a general reminder of the often-frustrating nature of applied statistical forecasting.

ACLED is the Armed Conflict Location & Event Data Project, a U.S. Department of Defense–funded, multi-year endeavor to capture information about instances of political violence in sub-Saharan Africa from 1997 to the present. ACLED’s coders scan an array of print and broadcast sources, identify relevant events from them, and then record those events’ date, location, and form (battle, violence against civilians, or riots/protests); the types of actors involved; whether or not territory changed hands; and the number of fatalities that occurred. Researchers can download all of the project’s data in various formats and structures from the Data page, one of the better ones I’ve seen in political science.

I came to ACLED last week because I wanted to see if violence against civilians in Somalia had waxed, waned, or held steady in recent months. Trying to answer that question with their data meant:

  • Downloading two Excel spreadsheets, Version 4 of the data for 1997-2013 and the Realtime Data file covering (so far) the first five months of this year;
  • Processing and merging those two files, which took a little work because my software had trouble reading the original spreadsheets and the labels and formats differed a bit across them; and
  • Subsetting and summarizing the data on violence against civilians in Somalia, which also took some care because there was an extra space at the end of the relevant label in some of the records.
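The three steps in that list (reading two files, reconciling their labels, and subsetting with an eye out for stray whitespace) might look something like the Python sketch below. The original work was done in R, and the column labels, label spellings, and in-memory CSV text here are hypothetical stand-ins, not the actual layout of ACLED’s Version 4 or Realtime files.

```python
import csv
import io

# Hypothetical mapping from each file's column labels to one common set of names.
RENAME = {
    "COUNTRY": "country",
    "Country": "country",
    "EVENT_DATE": "event_date",
    "Event Date": "event_date",
    "EVENT_TYPE": "event_type",
    "Event Type": "event_type",
}


def read_and_normalize(text):
    """Parse CSV text, map column labels to common names, and strip stray whitespace."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        clean = {}
        for col, value in row.items():
            if col is None:  # extra fields beyond the header row; skip them
                continue
            key = RENAME.get(col.strip(), col.strip())
            # strip() removes trailing spaces like the one in "Violence against civilians "
            clean[key] = (value or "").strip()
        rows.append(clean)
    return rows


# Hypothetical file contents; note the trailing space in the first event label.
historical = (
    "COUNTRY,EVENT_DATE,EVENT_TYPE\n"
    "Somalia,2013-12-30,Violence against civilians \n"
    "Kenya,2013-12-31,Battle\n"
)
realtime = (
    "Country,Event Date,Event Type\n"
    "Somalia,2014-05-02,Violence against civilians\n"
)

# Stack the two normalized files into one list of records.
merged = read_and_normalize(historical) + read_and_normalize(realtime)

somalia_vac = [
    r for r in merged
    if r["country"] == "Somalia" and r["event_type"] == "Violence against civilians"
]
print(len(somalia_vac))
```

Without the `strip()` call, the first Somalia record would silently drop out of the subset, which is the kind of small detail that eats up data-processing time.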

Once I had done these things, it was easy to generalize the process to the entire data set, producing tables with monthly counts of fatalities and events by type for all African countries over the past 13 years. And, once I had those country-month counts of conflict events, it was easy to imagine using them to help forecast coup attempts in the world’s most coup-prone region. Other things being equal, variations across countries and over time in the frequency of conflict events might tell us a little more about the state of politics in those countries, and therefore where and when coup attempts are more likely to happen.

Well, in this case, it turns out they don’t tell us much more. The plot below shows ROC curves and the areas under those curves for the out-of-sample predictions from a five-fold cross-validation exercise involving a few country-month models of coup attempts. The Base Model includes: national political regime type (the categorization scheme from PITF’s global instability model applied to Polity 3d, the spell-file version); time since last change in Polity score (in days, logged); infant mortality rate (relative to the annual global median, logged); and an indicator for any coup attempts in the previous 24 months (yes/no). The three other models add logged sums of counts of ACLED events by type—battles, violence against civilians, or riots/protests—in the same country over the previous three, six, or 12 months, respectively. These are all logistic regression models, and the dependent variable is a binary one indicating whether or not any coup attempts (successful or failed) occurred in that country during that month, according to Powell and Thyne.
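The event-count predictors in the three augmented models can be sketched as trailing-window sums over one country’s monthly series. The Python below is a minimal illustration, not the R code used for the models, and it rests on two assumptions the post does not spell out: the window covers the k months before the current one (so the month being predicted is excluded), and the log is taken of one plus the sum (`math.log1p`) so that windows with zero events stay defined.

```python
import math


def lagged_log_sums(counts, window):
    """For each month t, return log(1 + sum of counts in months t-window .. t-1).

    Months without a full window of history get None.
    Assumptions: the current month is excluded, and log1p handles zero sums.
    """
    features = []
    for t in range(len(counts)):
        if t < window:
            features.append(None)
        else:
            features.append(math.log1p(sum(counts[t - window:t])))
    return features


# Invented monthly battle counts for one country, in chronological order.
battles = [0, 2, 5, 1, 0, 3, 4]

for k in (3, 6):
    print(k, lagged_log_sums(battles, k))
```

Excluding the current month keeps each predictor limited to information that would have been available before the month being forecast.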

ROC Curves and AUC Scores from Five-Fold Cross-Validation of Coup Models Without and With ACLED Event Counts

As the chart shows, adding the conflict event counts to the base model seems to buy us a smidgen more discriminatory power, but not enough to have confidence that they would routinely lead to more accurate forecasts. Intriguingly, the crossing of the ROC curves suggests that the base model, which emphasizes structural conditions, is actually a little better at identifying the most coup-prone countries. The addition of conflict event counts to the model leads to some under-prediction of coups in that high-risk set, but the balance tips the other way in countries with less structural vulnerability. In the aggregate, though, there is virtually no difference in discriminatory power between the base model and the ones that add the conflict event counts.
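For readers less familiar with AUC scores, the statistic has a simple interpretation: it is the probability that a randomly chosen country-month with a coup attempt gets a higher predicted probability than a randomly chosen one without, with ties counted as half. The Python sketch below computes it directly from that definition. The predictions and outcomes are invented, and this is not the code behind the chart.

```python
def auc(scores, labels):
    """Area under the ROC curve via its probabilistic definition.

    Compares every (positive, negative) pair: a higher score for the
    positive case counts 1, a tie counts 0.5. Labels are 1 (coup attempt) or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented out-of-sample predicted probabilities, pooled across folds.
preds = [0.02, 0.10, 0.01, 0.30, 0.05, 0.05]
coups = [0, 1, 0, 1, 0, 1]

print(auc(preds, coups))
```

Because a single AUC summarizes the whole curve, two models can post nearly identical scores while their ROC curves cross, with one doing better among the highest-risk cases and the other doing better further down the list.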

There are, of course, many other ways to group and slice ACLED’s data, but the rarity of coups leads me to believe that narrower cuts or alternative operationalizations aren’t likely to produce stronger predictive signals. In Africa since 1997, there are only 36 country-months with coup attempts, according to Powell and Thyne. When the events are this rare and complex and the examples this few, there’s really not much point in going beyond the most direct measures. Under these circumstances, we’re unlikely to discover finer patterns, and if we do, we probably shouldn’t have much confidence in them. There are also other models and techniques to try, but I’m dubious for the same reasons. (FWIW, I did try Random Forests and got virtually identical accuracy.)

So those are the preliminary results from this specific exercise. (The R scripts I used are on GitHub, here.) I think those results are interesting in their own right, but the process involved in getting to them is also a great example of the often-frustrating nature of applied statistical forecasting. I spent a few hours each day for three days straight getting from the thought of exploring ACLED to the results described here. Nearly all of that time was spent processing data; only the last half-hour or so involved any modeling. As is often the case, a lot of that data-processing time was really just me staring at my monitor trying to think of another way to solve some problem I’d already tried and failed to solve.

In my experience, that kind of null result is where nearly all statistical forecasting ideas end. Even when you’re lucky enough to have the data to pursue them, few of your ideas pan out. But panning is the right metaphor, I think. Most of the work is repetitive and frustrating, but every so often you catch a nice nugget. Those nuggets tempt you to keep looking for more, and once in a great while, they can make you rich.

States Aren’t the Only Mass Killers

We tend to think of mass killing as something that states do, but states do not have a monopoly on this use of force. Many groups employ violence in an attempt to further their political and economic agendas; civilians often suffer the consequences of that violence, and sometimes that suffering reaches breathtaking scale.

This point occurred to me again as I thought about the stunning acts of mass violence that Boko Haram has carried out in northern Nigeria in the past few weeks. The chart below comes from the Council on Foreign Relations’ Nigeria Security Tracker, an online interface for a data set that counts deaths from “violent incidents directed at government property, places of worship, and suicide bombings.” The sharp upward bend at the far right of that red line represents the sudden and brutal end of several hundred lives in the past two months in various towns and villages in a part of the world that surely isn’t as alien to Americans as many of us assume. In Nigeria, too, parents wake up and set about the business of providing for themselves and their families, and many kids toddle off to school to learn and fidget and chatter with friends. Over the past few years, Boko Haram has repeatedly interrupted those daily routines with scores of attacks resulting in thousands of murders.

[Chart: Boko Haram killings, from CFR’s Nigeria Security Tracker]

I suspect the tendency to see mass killing as the purview of states is driven by the extraordinary salience of two archetypal cases—the Holocaust, of course, but also the Rwandan genocide. From those examples, we infer that violence on this scale requires resources, organization, and opportunity on a scale that in “modern” times only states are supposed to possess. The Holocaust took this bureaucratic logic to unique extremes, but many accounts of the Rwandan genocide also emphasize state planning and propaganda as necessary conditions for that episode of mass murder in extremis.

It’s true that resources, organization, and opportunity facilitate mass violence, and that states are much more likely to have them. In some contexts, though, rebel groups and other non-state actors can accumulate enough resources and become well enough organized to kill on a comparable scale. This is especially likely in the same contexts in which states usually perpetrate mass killing, namely, in civil wars. In some wars, rebels manage to establish governance systems of their own, and the apparent logic of the atrocities committed by these quasi-states looks very similar to the logic behind the atrocities perpetrated by their foes: destroy your rival’s base of support, and scare civilians into compliance or complicity.

Rebels don’t need to govern to carry out mass killings, though, a point driven home by groups like the RUF in Sierra Leone, the Seleka and anti-balaka militias in the Central African Republic, and, of course, Boko Haram. Sometimes the states we now expect to protect civilians against such violence are so weak or absent or uncaring that those non-state groups don’t need deep pockets and sprawling organizations to accomplish mass murder. On Boko Haram, CFR’s John Campbell observes that, “Several of the most recent incidents involve government security forces unaccountably not at their posts, allowing Boko Haram freedom of movement. The governor of Borno state publicly said that Boko Haram fighters outgun government forces.” Campbell also notes that those security forces might be shirking their duty because they are poorly paid and equipped, and because they simply fear a group that “has a long tradition of killing any person in the security services that it can.” With a state like that, the resources and organization required to accomplish mass murder are, unfortunately, not so vast. What is required is a degree of ruthlessness that most of us find hard to understand, but that incomprehensibility should not be confused with impossibility.

Acts we conventionally describe as “terrorism” nowadays are also atrocities by another name, and so-called terrorist groups occasionally succeed in their lethal business on an extraordinary scale. Al Qaeda’s attacks on September 11, 2001, certainly qualify as a mass killing as we conventionally define it. Nearly 3,000 noncombatant civilians from a discrete group (Americans) were deliberately killed as part of a wider political conflict, and all in a single day. The torrent of car bombings and other indiscriminate attacks in Iraq in recent months has surely crossed the arbitrary 1,000-death threshold of that conventional definition by now, too.

For analytical purposes, it would be useful to have a catalog of episodes in which non-state organizations committed atrocities on such a large scale. That catalog would allow us to try to glean patterns and develop predictive models from their comparison to each other and, more important, to situations in which those episodes did not occur. Even more useful would be a reliable assemblage of data on the incidents comprising those episodes, so we could carefully study how and where they arise and accumulate over time, perhaps with some hope of halting or at least mitigating future episodes as they develop.

Unfortunately, the data we want usually aren’t the data we have, and that’s true here, too. The Uppsala Conflict Data Program (UCDP) has compiled a data set on “one-sided violence,” defined as “intentional attacks on civilians by governments and formally organized armed groups,” that includes low, high, and best estimates of deaths attributed to each perpetrator group in cases where that annual estimate is 25 deaths or more (here). These data are an excellent start, but they only cover years since 1989, so the number of episodes involving non-state groups as perpetrators is still very small. The Armed Conflict Location & Event Data Project (ACLED) compiles detailed data (here) on attacks by non-state groups, among others, but it only covers Africa since 1997. New developments in the automated production of political event data hint at the possibility of analyzing deliberate violence against civilians around the world at a much higher resolution in the not-too-distant future. As I’ve discovered in an ongoing effort to adapt one of these data sets to this purpose, however, we’re not quite there yet (see here).

In the meantime, we’ll keep seeing accounts of murderous sprees by groups like Boko Haram (here and here, to pick just two) and CAR’s Seleka (here) and anti-balaka (here) alongside the thrum of reporting on atrocities from places like Syria and Sudan. And as we read, we would do well to remember that people, not states, are the common denominator.

PS. In the discussion of relevant data sets, I somehow forgot to mention that the Political Instability Task Force also funds the continuing collection of data on “atrocities” around the world involving five or more civilian fatalities (here). These data, which run all the way back to January 1995, are carefully compiled under the direction of a master of the craft, but they also suffer from the inevitable problems of reporting bias that plague all such efforts and so must be handled with care (see Will Moore here and here on this subject).

  • Follow Dart-Throwing Chimp on WordPress.com