Whither Organized Violence?

The Human Security Research Group has just published the latest in its series of now-annual reports on “trends in organized violence around the world,” and it’s essential reading for anyone deeply interested in armed conflict and other forms of political violence. You can find the PDF here.

The 2013 edition takes Steven Pinker’s Better Angels as its muse and largely concurs with Pinker’s conclusions. I’ll sheepishly admit that I haven’t read Pinker’s book (yet), so I’m not going to engage directly in that debate. Instead, I’ll call attention to what the report’s authors infer from their research about future trends in political violence. Here’s how that bit starts, on p. 18:

The most encouraging data from the modern era come from the post–World War II years. This period includes the dramatic decline in the number and deadliness of international wars since the end of World War II and the reversal of the decades-long increase in civil war numbers that followed the end of the Cold War in the early 1990s.

What are the chances that these positive changes will be sustained? No one really knows. There are too many future unknowns to make predictions with any degree of confidence.

On that point, political scientist Bear Braumoeller would agree. In an interview last year for Popular Science (here), Kelsey Atherton asked Braumoeller about his assertion in a recent paper (here) that it will take 150 years to know whether the downward trend in warfare that Pinker and others have identified is holding. Braumoeller replied:

Some of this literature points to “the long peace” of post-World War II. Obviously we haven’t stopped fighting wars entirely, so what they’re referring to is the absence of really really big wars like World War I and World War II. Those wars would have to be absent for like 70 to 75 more years for us to have confidence that there’s been a change in the baseline rate of really really big wars.

That’s sort of a separate question from how we know whether there are trends in warfare in general. We need to understand that war and peace are both stochastic processes. We need a big enough sample to rule out the historical average, which is about one or two big wars per century. We just haven’t had enough time since World War I and World War II to rule out the possibility that nothing’s changed.
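Braumoeller's point about sample size can be made concrete with a back-of-envelope calculation (mine, not his): if really big wars arrive at a roughly constant historical rate of one or two per century, a simple Poisson model tells us how surprising a long stretch without one actually is.

```python
from math import exp

def prob_no_wars(rate_per_century, years):
    """P(zero events in a span of `years`) under a Poisson process with the given rate."""
    lam = rate_per_century * years / 100.0  # expected number of big wars in the span
    return exp(-lam)

# At 1.5 big wars per century, seven decades of "long peace" is unremarkable...
print(round(prob_no_wars(1.5, 70), 2))   # 0.35
# ...but a century and a half without one would be hard to square with the old rate.
print(round(prob_no_wars(1.5, 150), 2))  # 0.11
```

On those numbers, the post-1945 absence of really big wars is roughly a one-in-three event even if nothing has changed, which is why Braumoeller wants several more decades of data before declaring a new baseline.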

I suspect that the authors of the Human Security Report would not dispute that claim, but after carefully reviewing Pinker's evidence and their own, they do see cause for cautious optimism. Here I'll quote at length, because I think it's important to see the full array of forces the authors take into consideration; seeing that breadth is what lends credibility to their cautious speculations.

The case for pessimism about the global security future is well rehearsed and has considerable support within the research community. Major sources of concern include the possibility of outbreaks of nuclear terrorism, a massive transnational upsurge of lethal Islamist radicalism, or wars triggered by mass droughts and population movements driven by climate change.

Pinker notes reasons for concern about each of these potential future threats but also skepticism about the more extreme claims of the conflict pessimists. Other possible drivers of global violence include the political crises that could follow the collapse of the international financial system and destabilizing shifts in the global balance of economic and military power—the latter being a major concern of realist scholars worried about the economic and military rise of China.

But focusing exclusively on factors and processes that may increase the risks of large-scale violence around the world, while ignoring those that decrease it, also almost certainly leads to unduly pessimistic conclusions.

In the current era, factors and processes that reduce the risks of violence not only include the enduring impact of the long-term trends identified in Better Angels but also the disappearance of two major drivers of warfare in the post–World War II period—colonialism and the Cold War. Other post–World War II changes that have reduced the risks of war include the entrenchment of the global norm against interstate warfare except in self-defence or with the authority of the UN Security Council; the intensification of economic and financial interdependence that increases the costs and decreases the benefits of cross-border warfare; the spread of stable democracies; and the caution-inducing impact of nuclear weapons on relations between the major powers.

With respect to civil wars, the emergent and still-growing system of global security governance discussed in Chapter 1 has clearly helped reduce the number of intrastate conflicts since the end of the Cold War. And, at what might be called the “structural” level, we have witnessed steady increases in national incomes across the developing world. This is important because one of the strongest findings from econometric research on the causes of war is that the risk of civil wars declines as national incomes—and hence governance and other capacities—increase. Chapter 1 reports on a remarkable recent statistical study by the Peace Research Institute, Oslo (PRIO) that found that if current trends in key structural variables are sustained, the proportion of the world’s countries afflicted by civil wars will halve by 2050.

Such an outcome is far from certain, of course, and for reasons that have yet to be imagined, as well as those canvassed by the conflict pessimists. But, thanks in substantial part to Steven Pinker’s extraordinary research, there are now compelling reasons for believing that the historical decline in violence is both real and remarkably large—and also that the future may well be less violent than the past.

After reading the new Human Security Report, I remain a short-term pessimist and long-term optimist. As I’ve said in a few recent posts (see especially this one), I think we’re currently in the thick of a period of systemic instability that will continue to produce mass protests, state collapse, mass killing, and other forms of political instability at higher rates than we’ve seen since the early 1990s for at least the next year or two.

At the same time, I don’t think this local upswing marks a deeper reversal of the long-term trend that Pinker identifies, and that the Human Security Report confirms. Instead, I believe that the global political economy is continuing to evolve in a direction that makes political violence less common and less lethal. This system creep is evident not only in the aforementioned trends in armed violence, but also in concurrent and presumably interconnected trends in democratization, socio-economic development, and global governance. Until we see significant and sustained reversals in most or all of these trends, I will remain optimistic about the directionality of the underlying processes of which these data can give us only glimpses.

A New Statistical Approach to Assessing Risks of State-Led Mass Killing

Which countries around the world are currently at greatest risk of an onset of state-led mass killing? At the start of the year, I posted results from a wiki survey that asked this question. Now, here in heat-map form are the latest results from a rejiggered statistical process with the same target. You can find a dot plot of these data at the bottom of the post, and the data and code used to generate them are on GitHub.

Estimated Risk of New Episode of State-Led Mass Killing

These assessments represent the unweighted average of probabilistic forecasts from three separate models trained on country-year data covering the period 1960-2011. In all three models, the outcome of interest is the onset of an episode of state-led mass killing, defined as any episode in which the deliberate actions of state agents or other organizations kill at least 1,000 noncombatant civilians from a discrete group. The three models are:

  • PITF/Harff. A logistic regression model approximating the structural model of genocide/politicide risk developed by Barbara Harff for the Political Instability Task Force (PITF). In its published form, the Harff model applies only to countries already experiencing civil war or adverse regime change and produces a single estimate of the risk of a genocide or politicide occurring at some time during that crisis. To build a more dynamic version of the model, I constructed an approximation of the PITF’s global model for forecasting political instability and used the natural log of the predicted probabilities it produces as an additional input to the Harff model. This approach mimics the one used by Harff and Ted Gurr in their ongoing application of the genocide/politicide model for risk assessment (see here).
  • Elite Threat. A logistic regression model that uses the natural log of predicted probabilities from two other logistic regression models—one of civil-war onset, the other of coup attempts—as its only inputs. This model is meant to represent the argument put forth by Matt Krain, Ben Valentino, and others that states usually engage in mass killing in response to threats to ruling elites’ hold on power.
  • Random Forest. A machine-learning technique (see here) applied to all of the variables used in the two previous models, plus a few others of possible relevance, using the 'randomForest' package in R. A couple of parameters were tuned on the basis of a gridded comparison of forecast accuracy in 10-fold cross-validation.
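The way the three sets of forecasts get combined is as simple as it sounds. Here is a minimal sketch (in Python rather than the R used in the actual pipeline; the countries and probabilities are made up):

```python
# Hypothetical per-country forecasts from the three component models
forecasts = {
    "Country A": {"pitf_harff": 0.12, "elite_threat": 0.08, "random_forest": 0.10},
    "Country B": {"pitf_harff": 0.02, "elite_threat": 0.01, "random_forest": 0.03},
}

def ensemble(prob_by_model):
    """Unweighted average of the component models' predicted probabilities."""
    return sum(prob_by_model.values()) / len(prob_by_model)

risk = {country: ensemble(p) for country, p in forecasts.items()}
print(round(risk["Country A"], 2))  # 0.1, the mean of 0.12, 0.08, and 0.10
```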

The Random Forest proved to be the most accurate of the three models in stratified 10-fold cross-validation. The chart below is a kernel density plot of the areas under the ROC curve for the out-of-sample estimates from that cross-validation drill. As the chart shows, the average AUC for the Random Forest was in the low 0.80s, compared with the high 0.70s for the PITF/Harff and Elite Threat models. As expected, the average of the forecasts from all three performed even better than the best single model, albeit not by much. These out-of-sample accuracy rates aren’t mind-blowing, but they aren’t bad either, and they are as good as or better than many of the ones I’ve seen from similar efforts to anticipate the onset of rare political crises in countries worldwide.


Distribution of Out-of-Sample AUC Scores by Model in 10-Fold Cross-Validation
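For readers unfamiliar with the metric, the AUC is the probability that a randomly chosen onset case gets a higher predicted probability than a randomly chosen non-onset case, so 0.5 is coin-flipping and 1.0 is perfect ranking. A small, self-contained illustration (Python, with a made-up toy sample; ties are counted as half-wins):

```python
def auc(labels, scores):
    """Probability that a random positive case outscores a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy sample: the model ranks one onset case below one of the non-onsets.
y = [0, 0, 1, 0, 1]
p = [0.05, 0.10, 0.40, 0.20, 0.15]
print(round(auc(y, p), 3))  # 0.833, since 5 of the 6 positive/negative pairs are ordered correctly
```

In the cross-validation drill described above, the labels and scores come from the held-out fold in each of the ten iterations, and those ten AUCs are what the density plot summarizes.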

The decision to use an unweighted average for the combined forecast might seem simplistic, but it’s actually a principled choice in this instance. When examples of the event of interest are hard to come by and we have reason to believe that the process generating those events may be changing over time, sticking with an unweighted average is a reasonable hedge against risks of over-fitting the ensemble to the idiosyncrasies of the test set used to tune it. For a longer discussion of this point, see pp. 7-8 in the last paper I wrote on this work and the paper by Andreas Graefe referenced therein.

Any close readers of my previous work on this topic over the past couple of years (see here and here) will notice that one model has been dropped from the last version of this ensemble, namely, the one proposed by Michael Colaresi and Sabine Carey in their 2008 article, “To Kill or To Protect” (here). As I was reworking my scripts to make regular updating easier (more on that below), I paid closer attention than I had before to the fact that the Colaresi and Carey model requires a measure of the size of state security forces that is missing for many country-years. In previous iterations, I had worked around that problem by using a categorical version of this variable that treated missingness as a separate category, but this time I noticed that there were fewer than 20 mass-killing onsets in country-years for which I had a valid observation of security-force size. With so few examples, we’re not going to get reliable estimates of any pattern connecting the two. As it happened, this model—which, to be fair to its authors, was not designed to be used as a forecasting device—was also by far the least accurate of the lot in 10-fold cross-validation. Putting two and two together, I decided to consign this one to the scrap heap for now. I still believe that measures of military forces could help us assess risks of mass killing, but we’re going to need more and better data to incorporate that idea into our multimodel ensemble.

The bigger and in some ways more novel change from previous iterations of this work concerns the unorthodox approach I’m now using to make the risk assessments as current as possible. All of the models used to generate these assessments were trained on country-year data, because that’s the only form in which most of the requisite data is produced. To mimic the eventual forecasting process, the inputs to those models are all lagged one year at the model-estimation stage—so, for example, data on risk factors from 1985 are compared with outcomes in 1986, 1986 inputs to 1987 outcomes, and so on.
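In tabular terms, the estimation step pairs each country's year-t inputs with its year t+1 outcome. A minimal sketch of that lagging step, assuming a simple list-of-dicts layout and a single hypothetical predictor:

```python
def lag_one_year(rows):
    """Pair each country-year's predictors with the next year's outcome."""
    by_key = {(r["country"], r["year"]): r for r in rows}
    training = []
    for (country, year), r in by_key.items():
        nxt = by_key.get((country, year + 1))
        if nxt is not None:  # the final year has no observed outcome yet, so it drops out
            training.append({"country": country, "year": year,
                             "gdp_growth": r["gdp_growth"],  # hypothetical predictor
                             "onset_next_year": nxt["onset"]})
    return training

rows = [
    {"country": "A", "year": 1985, "gdp_growth": 2.1, "onset": 0},
    {"country": "A", "year": 1986, "gdp_growth": -1.3, "onset": 1},
]
print(lag_one_year(rows))
# [{'country': 'A', 'year': 1985, 'gdp_growth': 2.1, 'onset_next_year': 1}]
```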

If we stick rigidly to that structure at the forecasting stage, then I need data from 2013 to produce 2014 forecasts. Unfortunately, many of the sources for the measures used in these models won’t publish their 2013 data for at least a few more months. Faced with this problem, I could do something like what I aim to do with the coup forecasts I’ll be producing in the next few days—that is, only use data from sources that quickly and reliably update soon after the start of each year. Unfortunately again, though, the only way to do that would be to omit many of the variables most specific to the risk of mass atrocities—things like the occurrence of violent civil conflict or the political salience of elite ethnicity.

So now I’m trying something different. Instead of waiting until every last input has been updated for the previous year and they all neatly align in my rectangular data set, I am simply applying my algorithms to the most recent available observation of each input. It took some trial and error to write, but I now have an R script that automates this process at the country level by pulling the time series for each variable, omitting the missing values, reversing the series order, snipping off the observation at the start of that string, collecting those snippets in a new vector, and running that vector through the previously estimated model objects to get a forecast (see the section of this script starting at line 284).
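That routine is easy to express in any language. Here is a Python rendering of the idea (the real script is in R and operates on the full variable set; the series and variable names below are invented):

```python
def latest_observations(series_by_var):
    """For each variable, take the most recent non-missing value."""
    latest = {}
    for var, series in series_by_var.items():  # series: year -> value, may contain None
        valid = [(year, v) for year, v in sorted(series.items()) if v is not None]
        if valid:
            latest[var] = valid[-1][1]  # last non-missing observation
    return latest

series = {
    "polity_score": {2011: 7, 2012: 7, 2013: None},    # 2013 update not yet released
    "gdp_growth":   {2011: 3.2, 2012: 1.1, 2013: 2.4}  # already updated through 2013
}
print(latest_observations(series))  # {'polity_score': 7, 'gdp_growth': 2.4}
```

The resulting vector mixes vintages by design: each input is as fresh as its source allows, which is exactly the trade-off discussed below.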

One implicit goal of this approach is to make it easier to jump to batch processing, where the forecasting engine routinely and automatically pings the data sources online and updates whenever any of the requisite inputs has changed. So, for example, when in a few months the vaunted Polity IV Project releases its 2013 update, my forecasting contraption would catch and ingest the new version and the forecasts would change accordingly. I now have scripts that can do the statistical part but am going to be leaning on other folks to automate the wider routine as part of the early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide.

The big upside of this opportunistic approach to updating is that the risk assessments are always as current as possible, conditional on the limitations of the available data. The way I figure, when you don’t have information that’s as fresh as you’d like, use the freshest information you’ve got.

The downside of this approach is that it’s not clear exactly what the outputs from that process represent. Technically, a forecast is a probabilistic statement about the likelihood of a specific event during a specific time period. The outputs from this process are still probabilistic statements about the likelihood of a specific event, but they are no longer anchored to a specific time period. The probabilities mapped at the top of this post mostly use data from 2012, but the inputs for some variables for some cases are a little older, while the inputs for some of the dynamic variables (e.g., GDP growth rates and coup attempts) are essentially current. So are those outputs forecasts for 2013, or for 2014, or something else?

For now, I’m going with “something else” and am thinking of the outputs from this machinery as the most up-to-date statistical risk assessments I can produce, but not forecasts as such. That description will probably sound like fudging to most statisticians, but it’s meant to be an honest reflection of both the strengths and limitations of the underlying approach.

To any gearheads who’ve read this far: I’d really appreciate hearing your thoughts on this strategy and any ideas you might have on other ways to resolve this conundrum, or on any other aspect of this forecasting process. As noted at the top, the data and code used to produce these estimates are posted online. This work is part of a soon-to-launch, public early-warning system, so we hope and expect that these assessments will have some effect on policy and advocacy planning processes. Given that aim, it behooves us to do whatever we can to make them as accurate as possible, so I would very much welcome any suggestions on how to do or describe this work better.

Finally and as promised, here is a dot plot of the estimates mapped above. Countries are shown in descending order by estimated risk. The gray dots mark the forecasts from the three component models, and the red dot marks the unweighted average.


PS. In preparation for a presentation on this work at an upcoming workshop, I made a new map of the current assessments that works better, I think, than the one at the top of this post. Instead of coloring by quintiles, this new version (below) groups cases into several bins that roughly represent doublings of risk: less than 1%, 1-2%, 2-4%, 4-8%, and 8-16%. This version more accurately shows that the vast majority of countries are at extremely low risk and more clearly shows variations in risk among the ones that are not.

Estimated Risk of New State-Led Mass Killing


A Coda to “Using GDELT to Monitor Atrocities, Take 2”

I love doing research in the Internet Age. As I’d hoped it would, my post yesterday on the latest iteration of our atrocities-monitoring system in the works has already sparked a lot of really helpful responses. Some of those responses are captured in comments on the post, but not all of them are. So, partly as a public good and partly for my own record-keeping, I thought I’d write a coda to that post enumerating the leads it generated and some of my reactions to them.

Give the Machines Another Shot at It

As a way to reduce or even eliminate the burden placed on our human(s) in the loop, several people suggested something we’ve been considering for a while: use machine-learning techniques to develop classifiers that can be used to further reduce the data left after our first round of filtering. These classifiers could consider all of the features in GDELT, not just the event and actor types we’re using in our R script now. If we’re feeling really ambitious, we could go all the way back to the source stories and use natural-language processing to look for additional discriminatory power there. This second round might not eliminate the need for human review, but it certainly could lighten the load.

The comment threads on this topic (here and here) nicely capture what I see as the promise and likely limitations of this strategy, so I won’t belabor it here. For now, I’ll just note that how well this would work is an empirical question, and it’s one we hope to get a chance to answer once we’ve accumulated enough screened data to give those classifiers a fighting chance.

Leverage GDELT’s Global Knowledge Graph

Related to the first idea, GDELT co-creator Kalev Leetaru has suggested on a couple of occasions that we think about ways to bring the recently created GDELT Global Knowledge Graph (GKG) to bear on our filtering task. As Kalev describes in a post on the GDELT blog, the GKG consists of two data streams, one that records mentions of various counts and another that captures connections in each day’s news between “persons, organizations, locations, emotions, themes, counts, events, and sources.” That second stream in particular includes a bunch of data points that we can connect to specific event records and thus use as additional features in the kind of classifiers described under the previous header. In response to my post, Kalev sent this email to me and a few colleagues:

I ran some very very quick numbers on the human coding results Jay sent me where a human coded 922 articles covering 9 days of GDELT events and coded 26 of them as atrocities. Of course, 26 records isn’t enough to get any kind of statistical latch onto to build a training model, but the spectral response of the various GKG themes is quite informative. For events tagged as being an atrocity, themes such as ETHNICITY, RELIGION, HUMAN_RIGHTS, and a variety of functional actors like Villagers, Doctors, Prophets, Activists, show up in the top themes, whereas in the non-atrocities the roles are primarily political leaders, military personnel, authorities, etc. As just a simple example, the HUMAN_RIGHTS theme appeared in just 6% of non-atrocities, but 30% of atrocities, while Activists show up in 33% of atrocities compared with just 4% of non-atrocities, and the list goes on.

Again, 26 articles isn’t enough to build a model on, but just glancing over the breakdown of the GKG themes for the two there is a really strong and clear breakage between the two across the entire set of themes, and the breakdown fits precisely what Bayesian classifiers like (they are the most accurate for this kind of separation task and outperform SVM and random forest).

So, Jay, the bottom line is that if you can start recording each day the list of articles that you guys review and the ones you flag as an atrocity and give me a nice dataset over time, should be pretty easy to dramatically filter these down for you at the very least.

As I’ve said throughout this process, its not that event data can’t do what is needed, its that often you have to bring additional signals into the mix to accomplish your goals when the thing you’re after requires signals beyond what the event records are capturing.

What Kalev suggests at the end there—keep a record of all the events we review and the decisions we make on them—is what we’re doing now, and I hope we can expand on his experiment in the next several months.
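For what it's worth, the arithmetic behind the Bayesian classifier Kalev mentions is simple to sketch. The per-theme rates below are the ones quoted in his email, and the roughly 3% prior comes from his sample (26 atrocities among 922 articles); everything else here is an illustration of the calculation, not a trained model:

```python
def naive_bayes_score(themes_present, p_theme_given_pos, p_theme_given_neg, prior_pos):
    """Posterior P(atrocity | themes) under naive (independent-theme) assumptions."""
    like_pos, like_neg = prior_pos, 1.0 - prior_pos
    for theme, present in themes_present.items():
        p1, p0 = p_theme_given_pos[theme], p_theme_given_neg[theme]
        like_pos *= p1 if present else (1 - p1)
        like_neg *= p0 if present else (1 - p0)
    return like_pos / (like_pos + like_neg)

# Theme rates quoted in the email above; 26/922 is the base rate in Kalev's sample.
p_pos = {"HUMAN_RIGHTS": 0.30, "ACTIVISTS": 0.33}
p_neg = {"HUMAN_RIGHTS": 0.06, "ACTIVISTS": 0.04}
score = naive_bayes_score({"HUMAN_RIGHTS": True, "ACTIVISTS": True},
                          p_pos, p_neg, prior_pos=26 / 922)
print(round(score, 2))  # 0.54: two telltale themes lift a ~3% prior past even odds
```

That large a swing from just two themes is the "strong and clear breakage" Kalev describes, and it is why a classifier trained on a bigger labeled sample could plausibly do much of our filtering for us.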

Crowdsource It

Jim Walsh left a thoughtful comment suggesting that we crowdsource the human coding:

Seems to me like a lot of people might be willing to volunteer their time for this important issue–human rights activists and NGO types, area experts, professors and their students (who might even get some credit and learn about coding). If you had a large enough cadre of volunteers, could assign many (10 or more?) to each day’s data and generate some sort of average or modal response. Would need someone to organize the volunteers, and I’m not sure how this would be implemented online, but might be do-able.

As I said in my reply to him, this is an approach we’ve considered but rejected for now. We’re eager to take advantage of the wisdom of interested crowds and are already doing so in big ways on other parts of our early-warning system, but I have two major concerns about how well it would work for this particular task.

The first is the recruiting problem, and here I see a Catch-22: people are less inclined to do this if they don’t believe the system works, but it’s hard to convince them that the system works if we don’t already have a crowd involved to make it go. This recruiting problem becomes especially acute in a system with time-sensitive deliverables. If we promise daily updates, we need to produce daily updates, and it’s hard to do that reliably if we depend on self-organized labor.

My second concern is the principal-agent problem. Our goal is to make reliable and valid data in a timely way, but there are surely people out there who would bring goals to the process that might not align with ours. Imagine, for example, that Absurdistan appears in the filtered-but-not-yet-coded data to be committing atrocities, but citizens (or even paid agents) of Absurdistan don’t like that idea and so organize to vote those events out of the data set. It’s possible that our project would be too far under the radar for anyone to bother, but our ambitions are larger than that, so we don’t want to assume that will be true. If we succeed at attracting the kind of attention we hope to attract, the deeply political and often controversial nature of our subject matter would make crowdsourcing this task more vulnerable to this kind of failure.

Use Mechanical Turk

Both of the concerns I have about the downsides of crowdsourcing the human-coding stage could be addressed by Ryan Briggs’ suggestion via Twitter to have Amazon Mechanical Turk do it. A hired crowd is there when you need it and (usually) doesn’t bring political agendas to the task. It’s also relatively cheap, and you only pay for work performed.

Thanks to our collaboration with Dartmouth’s Dickey Center, the marginal cost of the human coding isn’t huge, so it’s not clear that Mechanical Turk would offer much advantage on that front. Where it could really help is in routinizing the daily updates. As I mentioned in the initial post, when you depend on human action and have just one or a few people involved, it’s hard to establish a set of routines that covers weekends and college breaks and sick days and is robust to periodic changes in personnel. Primarily for this reason, I hope we’ll be able to run an experiment with Mechanical Turk where we can compare its cost and output to what we’re paying and getting now and see if this strategy might make sense for us.

Don’t Forget About Errors of Omission

Last but not least, a longtime colleague had this to say in an email reacting to the post (hyperlinks added):

You are effectively describing a method for reducing errors of commission, events coded by GDELT as atrocities that, upon closer inspection, should not be. It seems like you also need to examine errors of omission. This is obviously harder. Two possible opportunities would be to compare to either [the PITF Worldwide Atrocities Event Data Set] or to ACLED.  There are two questions. Is GDELT “seeing” the same source info (and my guess is that it is and more, though ACLED covers more than just English sources and I’m not sure where GDELT stands on other languages). Then if so (and there are errors of omission) why aren’t they showing up (coded as different types of events or failed to trigger any coding at all)[?]

It’s true that our efforts so far have focused almost exclusively on avoiding errors of commission, with the important caveat that it’s really our automated filtering process, not GDELT, that commits most of these errors. The basic problem for us is that GDELT, or really the CAMEO scheme on which it’s based, wasn’t designed to spot atrocities per se. As a result, most of what we filter out in our human-coding second stage aren’t things that were miscoded by GDELT. Instead, they’re things that were properly coded by GDELT as various forms of violent action but upon closer inspection don’t appear to involve the additional features of atrocities as we define them.

Of course, that still leaves us with this colleague’s central concern about errors of omission, and on that he’s absolutely right. I have experimented with different actor and event-type criteria to make sure we’re not missing a lot of events of interest in GDELT, but I haven’t yet compared what we’re finding in GDELT to what related databases that use different sources are seeing. Once we accumulate a few months’ worth of data, I think this is something we’re really going to need to do.
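When we do get around to that comparison, the first cut can be mechanical: reduce both data sets to comparable keys and look for reference events with no counterpart in ours. A sketch under the strong simplifying assumption that a same-country, same-day match is good enough (real matching would need fuzzier location criteria and date windows):

```python
def candidate_omissions(ours, reference):
    """Reference-set events with no same-country, same-day match in our data."""
    our_keys = {(e["country"], e["date"]) for e in ours}
    return [e for e in reference if (e["country"], e["date"]) not in our_keys]

# Invented examples standing in for our data and an ACLED- or PITF-style reference set
ours = [{"country": "Sudan", "date": "2014-01-03"}]
reference = [{"country": "Sudan", "date": "2014-01-03"},
             {"country": "CAR",   "date": "2014-01-05"}]
print(candidate_omissions(ours, reference))  # [{'country': 'CAR', 'date': '2014-01-05'}]
```

Each event that survives this screen would then need a human look to decide whether GDELT missed the source story entirely or coded it as a different event type.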

Stay tuned for Take 3…

Using GDELT to Monitor Atrocities, Take 2

Last May, I wrote a post about my preliminary efforts to use a new data set called GDELT to monitor reporting on atrocities around the world in near-real time. Those efforts represent one part of the work I’m doing on a public early-warning system for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide, and they have continued in fits and starts over the ensuing eight months. With help from Dartmouth’s Dickey Center, Palantir, and the GDELT crew, we’ve made a lot of progress. I thought I’d post an update now because I’m excited about the headway we’ve made; I think others might benefit from seeing what we’re doing; and I hope this transparency can help us figure out how to do this task even better.

So, let’s cut to the chase: Here is a screenshot of an interactive map locating the nine events captured in GDELT in the first week of January 2014 that looked like atrocities to us and occurred in a place that the Google Maps API recognized when queried. (One event was left off the map because Google Maps didn’t recognize its reported location.) The size of the bubbles corresponds to the number of civilian deaths, which in this map range from one to 31. To really get a feel for what we’re trying to do, though, head over to the original visualization on CartoDB (here), where you can zoom in and out and click on the bubbles to see a hyperlink to the story from which each event was identified.


Looks simple, right? Well, it turns out it isn’t, not by a long shot.

As this blog’s regular readers know, GDELT uses software to scour the web for new stories about political interactions all around the world and parses those stories to identify and record information about who did or said what to whom, when, and where. It currently covers the period 1979–present and is now updated every day, and each of those daily updates contains some 100,000-140,000 new records. Miraculously, and crucially for a non-profit pilot project like ours, GDELT is also available for free.

The nine events plotted in the map above were sifted from the tens of thousands of records GDELT dumped on us in the first week of 2014. Unfortunately, that data-reduction process is only partially automated.

The first step in that process is the quickest. As originally envisioned back in May, we are using an R script (here) to download GDELT’s daily update file and sift it for events that look, from the event type and actors involved, like they might involve what we consider to be an atrocity—that is, deliberate, deadly violence against one or more noncombatant civilians in the context of a wider political conflict.

Unfortunately, the stack of records that filtering script returns—something like 100-200 records per day—still includes a lot of stuff that doesn’t interest us. Some records are properly coded but involve actions that don’t meet our definition of an atrocity (e.g., clashes between rioters and police or rebels and troops); some involve atrocities but are duplicates of events we’ve already captured; and some are just miscoded (e.g., a mention of the film industry “shooting” movies that gets coded as soldiers shooting civilians).
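The logic of that first-stage filter is straightforward even though the real script has more moving parts. Here is a schematic Python version, with invented stand-ins for the CAMEO event and actor codes we actually use and simplified field names (the real list lives in the R script linked above):

```python
# Illustrative stand-ins, NOT our actual code lists
VIOLENT_EVENT_CODES = {"180", "190", "193", "202"}   # event types suggesting deadly violence
CIVILIAN_ACTOR_CODES = {"CVL", "REF", "OPP"}         # target-actor types suggesting civilians

def looks_like_atrocity(record):
    """First-pass screen: a violent event type aimed at a civilian-type actor."""
    return (record["EventCode"] in VIOLENT_EVENT_CODES
            and record["Actor2Type"] in CIVILIAN_ACTOR_CODES)

daily_update = [
    {"EventCode": "190", "Actor2Type": "CVL"},  # kept for human review
    {"EventCode": "036", "Actor2Type": "GOV"},  # a diplomatic event, dropped
]
kept = [r for r in daily_update if looks_like_atrocity(r)]
print(len(kept))  # 1
```

This is exactly the kind of coarse screen that lets through the clashes, duplicates, and miscodes described above, which is why a second stage is needed.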

After we saw how noisy our data set would be if we stopped screening there, we experimented with a monitoring system that would acknowledge GDELT’s imperfections and try to work with them. As Phil Schrodt recommended at the recent GDELT DC Hackathon, we looked to “embrace the suck.” Instead of trying to use GDELT to generate a reliable chronicle of atrocities around the world, we would watch for interesting and potentially relevant perturbations in the information stream, noise and all, and those perturbations would produce alerts that users of our system could choose to investigate further. Working with Palantir, we built a system that would estimate country-specific prior moving averages of daily event counts returned by our filtering script and would generate an alert whenever a country’s new daily count landed more than two standard deviations above or below that average.
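That alert rule is easy to sketch. A minimal version, assuming a fixed trailing window and a simple sample standard deviation rather than whatever smoothing the Palantir implementation actually used:

```python
from statistics import mean, stdev

def alert(daily_counts, window=30, threshold=2.0):
    """Flag a new daily count landing more than `threshold` sigmas from the window mean."""
    history, today = daily_counts[-(window + 1):-1], daily_counts[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # any deviation from a perfectly flat history is an alert
    return abs(today - mu) > threshold * sigma

# A stable stream of 4-6 filtered events per day, then a spike to 20:
counts = [5, 6, 4, 5, 5, 6, 4, 5, 6, 5, 20]
print(alert(counts, window=10))  # True: the spike lands far outside two sigma
```

Note that the rule fires on unusually quiet days as well as unusually noisy ones, which matches the two-sided trigger described above.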

That system sounded great to most of the data pros in our figurative room, but it turned out to be a non-starter with some other constituencies of importance to us. The issue was credibility. Some of the events causing those perturbations in the GDELT stream were exactly what we were looking for, but others—a pod of beached whales in Brazil, or Congress killing a bill on healthcare reform—were laughably far from the mark. If our supposedly high-tech system confused beached whales and Congressional procedures for mass atrocities, we would risk undercutting the reputation for reliability and technical acumen that we are striving to achieve.

So, back to the drawing board we went. To separate the signal from the static and arrive at something more like that valid chronicle we’d originally envisioned, we decided that we needed to add a second, more laborious step to our data-reduction process. After our R script had done its work, we would review each of the remaining records by hand to decide if it belonged in our data set or not and, when necessary, to correct any fields that appeared to have been miscoded. While we were at it, we would also record the number of deaths each event produced. We wrote a set of rules to guide those decisions; had two people (a Dartmouth undergraduate research assistant and I) apply those rules to the same sets of daily files; and compared notes and made fixes. After a few iterations of that process over a few months, we arrived at the codebook we’re using now (here).

This process radically reduces the amount of data involved. Each of those two steps drops us down multiple orders of magnitude: from 100,000-140,000 records in the daily updates, to about 150 in our auto-filtered set, to just one or two in our hand-filtered set. The figure below illustrates the extent of that reduction. In effect, we’re treating GDELT as a very powerful but error-prone search and coding tool, a source of raw ore that needs refining to become the thing we’re after. This isn’t the only way to use GDELT, of course, but for our monitoring task as presently conceived, it’s the one that we think will work best.


Once that second data-reduction step is done, we still have a few tasks left to enable the kind of mapping and analysis we aim to do. We want to trim the data set to keep only the atrocities we’ve identified, and we need to consolidate the original and corrected fields in those remaining records and geolocate them. All of that work gets done with a second R script (here), which is applied to the spreadsheet the coder saves after completing her work. The much smaller file that script produces is then ready to upload to a repository where it can be combined with other days’ outputs to produce the global chronicle our monitoring project aims to produce.
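The consolidation step can be sketched like this; the field names and the convention that an empty string means “no correction entered” are assumptions for illustration, not the actual layout of the coder’s spreadsheet.

```python
# Sketch of consolidating original and coder-corrected fields: for each
# field the coder may have corrected, keep the corrected value when one
# was entered, otherwise fall back to the original.
def consolidate(record):
    """Merge original and corrected fields into one set of final values."""
    out = {}
    for field in ("actor1", "actor2", "event_code", "location"):
        corrected = record.get(field + "_corrected")
        out[field] = corrected if corrected else record[field]
    return out

rec = {
    "actor1": "GOV", "actor1_corrected": "",
    "actor2": "CVL", "actor2_corrected": "",
    "event_code": "180", "event_code_corrected": "202",  # coder's fix
    "location": "Bangui, CAR", "location_corrected": "",
}
print(consolidate(rec))
```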

From start to finish, each daily update now takes about 45 minutes, give or take 15. We’d like to shrink that further if we can but don’t see any real opportunities to do so at the moment. Perhaps more important, we still have to figure out the bureaucratic procedures that will allow us to squeeze daily updates from a “human in the loop” process in a world where there are weekends and holidays and people get sick and take vacations and sometimes even quit. Finally, we also have not yet built the dashboard that will display and summarize and provide access to these data on our program’s web site, which we expect to launch some time this spring.

We know that the data set this process produces will be incomplete. I am 100-percent certain that during the first week of January 2014, more than 10 events occurred around the world that met our definition of an atrocity. Unfortunately, we can only find things where GDELT looks, and even a scan of every news story produced every day everywhere in the world would fail to see the many atrocities that never make the news.

On the whole, though, I’m excited about the progress we’ve made. As soon as we can launch it, this monitoring process should help advocates and analysts more efficiently track atrocities globally in close to real time. As our data set grows, we also hope it will serve as the foundation for new research on forecasting, explaining, and preventing this kind of violence. Even with its evident shortcomings, we believe this data set will prove to be useful, and as GDELT’s reach continues to expand, so will ours.

PS For a coda discussing the great ideas people had in response to this post, go here.

[Erratum: The original version of this post said there were about 10,000 records in each daily update from GDELT. The actual figure is 100,000-140,000. The error has been corrected and the illustration of data reduction updated accordingly.]

Why More Mass Killings in 2013, and What It Portends for This Year

In a recent post, I noted that 2013 had distinguished itself in a dismal way, by producing more new episodes of mass killing than any other year since the early 1990s. Now let’s talk about why.

Each of these mass killings surely involves unique and specific local processes, and people who study in depth the societies where they are occurring can describe those far better than I can. As someone who believes local politics is always embedded in a global system, however, I don’t think we can fully understand these situations by considering only those idiosyncratic features. Sometimes we see “clusters” where none really exist, but the evidence that we live in a global system leads me to think that isn’t what’s happening here.

To fully understand why a spate of mass killings is happening now, I think it helps to recognize that this cluster is occurring alongside—or, in some cases, in concert with—a spate of state collapses and during a period of unusually high social unrest. Systemic thinking leads me to believe that these processes are interrelated in explicable ways.

Just as there are boom and bust cycles within economies, there seem to be cycles of political (dis)order in the global political economy, too. Economic crunches help spur popular unrest. Economic crunches are often regional or global in nature, and unrest can inspire imitation. These reverberating challenges can shove open doors to institutional change, but they also tend to inspire harsh responses from incumbents intent on preserving the status quo ante. The ensuing clashes present exactly the conditions that are ripest for mass killing. Foreign governments react to these clashes in various ways, sometimes to try to quell the conflict and sometimes to back a favored side. These reactions often beget further reactions, however, and efforts to manufacture a resolution can end up catalyzing wider disorder instead.

In hindsight, I don’t think it’s an accident that the last phase of comparable disorder—the early 1990s—produced two iconic yet seemingly contradictory pieces of writing on political order: Francis Fukuyama’s The End of History and the Last Man, and Robert Kaplan’s “The Coming Anarchy.” A similar dynamic seems to be happening now. Periods of heightened disorder bring heightened uncertainty, with many possibilities both good and bad. All good things do not necessarily arrive together, and the disruptions that are producing some encouraging changes in political institutions at the national and global levels also open the door to horrifying violence.

Of course, in political terms, calendar years are an entirely arbitrary delineation of time. The mass killings I called out in that earlier post weren’t all new in 2013, and the processes generating them don’t reset with the arrival of a new year. In light of the intensification and spread of the now-regional war in Syria; escalating civil wars in Pakistan, Iraq, and Afghanistan; China’s increasingly precarious condition; and the persistence of economic malaise in Europe, among other things, I think there’s a good chance that we still haven’t reached the peak of the current phase of global disorder. And, on mass killing in particular, I suspect that the persistence of this phase will probably continue to produce new episodes at a faster rate than we saw in the previous 20 years.

That’s the bad news. The slightly better news is that, while we (humanity) still aren’t nearly as effective at preventing mass killings as we’d like to be, there are signs that we’re getting better at it. In a recent post on United to End Genocide’s blog, Daniel Sullivan noted “five successes in genocide prevention in 2013,” and I think his list is a good one. Political scientist Bear Braumoeller encourages us to think of the structure of the international system as distributions of features deemed important by the major actors in it. Refracting Sullivan’s post through that lens, we can see how changes in the global distribution of political regime types, of formal and informal interdependencies among states, of ideas about atrocities prevention, and of organizations devoted to advocating for that cause seem to be enabling changes in responses to these episodes that are helping to stop or slow some of them sooner, making them somewhat less deadly on the whole.

The Central African Republic is a telling example. Attacks and clashes there have probably killed thousands over the past year, and even with deeper foreign intervention, the fighting hasn’t yet stopped. Still, in light of the reports we were receiving from people on the scene in early December (see here and here, for example), it’s easy to imagine this situation having spiraled much further downward already, had French forces and additional international assistance not arrived when they did. A similar process may be occurring now in South Sudan. Both cases already involve terrible violence on a large scale, but we should also acknowledge that both could have become much worse—and very likely will, if the braking efforts underway are not sustained or even intensified.

A Notable Year of the Wrong Kind

The year that’s about to end has distinguished itself in at least one way we’d prefer never to see again. By my reckoning, 2013 saw more new mass killings than any year since the early 1990s.

When I say “mass killing,” I mean any episode in which the deliberate actions of state agents or other organizations kill at least 1,000 noncombatant civilians from a discrete group. Mass killings are often but certainly not always perpetrated by states, and the groups they target may be identified in various ways, from their politics to their ethnicity, language, or religion. Thanks to my colleague Ben Valentino, we have a fairly reliable tally of episodes of state-led mass killing around the world since the mid-1940s. Unfortunately, there is no comparable reckoning of mass killings carried out by non-state actors—nearly always rebel groups of some kind—so we can’t make statements about counts and trends as confidently as I would like. Still, we do the best we can with the information we have.

With those definitions and caveats in mind, I would say that in 2013 mass killings began:

Of course, even as these new cases have developed, episodes of mass killings have continued in a number of other places:

In a follow-up post I hope to write soon, I’ll offer some ideas on why 2013 was such a bad year for deliberate mass violence against civilians. In the meantime, if you think I’ve misrepresented any of these cases here or overlooked any others, please use the Comments to set me straight.

One Outsider’s Take on Thailand

Justin Heifetz at the Bangkok Post asked me this morning for some comments on the current political situation in Thailand. Here is a slightly modified version of what I wrote in response to his questions.

I won’t speak to the specifics of Thai culture or social psychological theories of political behavior, because those things are outside my areas of expertise. What I can talk about are the strategic dilemmas that make some countries more susceptible to coups and other breakdowns of democracy than others. Instead of thinking in terms of a “coup culture”, I think it’s useful to ask why the military in the past and opposition parties now might prefer an unelected government to an elected one.

In the case of Thailand, it’s clear that some opposition factions recognize that they cannot win power through fair elections, and those factions are very unhappy with the policies enacted by the party that can. There are two paths out of that conundrum: seize power directly through rebellion, or provoke or facilitate a seizure of power by another faction more sympathetic to your interests—in this and many other cases, the military. Rebellions are very hard to pull off, especially for minority factions, so provoking a coup is often the only viable option left. Apparently, Suthep Thaugsuban and his supporters recognize this logic and are now pursuing just such a strategy.

The big question now is whether or not the military leadership will respond as desired. They would be very likely to do so if they coveted power for themselves, but I think it’s pretty clear from their actions that many of them don’t. I suspect that’s partly because they saw after 2006 that seizing power didn’t really fix anything and carried all kinds of additional economic and reputational costs. If that’s right, then the military will only seize power again if the situation degenerates enough to make the costs of inaction even worse—say, into sustained fighting between rival factions, like we see in Bangladesh right now.

So far, Pheu Thai and its supporters seem to understand this risk and have mostly avoided direct confrontation in the streets. According to Reuters this morning, though, some “red shirt” activists are now threatening to mobilize anew if Suthep & co. do not back down soon. A peaceful demonstration of their numbers would remind the military and other fence-sitters of the electoral and physical power they hold, but it could also devolve into the kind of open conflict that might tempt the military to reassert itself as the guarantor of national order. Back on 1 December, red shirts cut short a rally in a Bangkok stadium after aggressive actions by their anti-government rivals led to two deaths and dozens of injuries, and there is some risk that fresh demonstrations could produce a similar situation.

On how or why this situation has escalated so quickly, I’d say that it didn’t really. This is just the latest flare-up of an underlying process of deep socio-economic and political transformation in Thailand that accelerated in the early 2000s and probably isn’t going to reach a new equilibrium of sorts for at least a few more years. Earlier in this process, the military clearly sided with conservative factions struggling to beat back the political consequences of this transformation for reasons that close observers of Thai politics surely understand much better than I. We’ll see soon if they’ve finally given up on that quixotic project.

Whatever happens this time around, though, the good news is that within a decade or so, Thai politics will probably stabilize into a new normal in which the military no longer acts directly in politics and parts of what’s now Pheu Thai and its coalition compete against each other and the remnants of today’s conservative forces for power through the ballot box.

The Fog of War Is Patchy

Over at Foreign Policy‘s Peace Channel, Sheldon Himmelfarb of USIP has a new post arguing that better communications technologies in the hands of motivated people now give us unprecedented access to information from ongoing armed conflicts.

The crowd, as we saw in the Syrian example, is helping us get data and information from conflict zones. Until recently these regions were dominated by “the fog of war,” which blinded journalists and civilians alike; it took the most intrepid reporters to get any information on what was happening on the ground. But in the past few years, technology has turned conflict zones from data vacuums into data troves, making it possible to render parts of the conflict in real time.

Sheldon is right, but only to a point. If crowdsourcing is the future of conflict monitoring, then the future is already here, as Sheldon notes; it’s just not very evenly distributed. Unfortunately, large swaths of the world remain effectively off the grid on which the production of crowdsourced conflict data depends. Worse, countries’ degree of disconnectedness is at least loosely correlated with their susceptibility to civil violence, so we still have the hardest time observing some of the world’s worst conflicts.

The fighting in the Central African Republic over the past year is a great and terrible case in point. The insurgency that flared there last December drove the president from the country in March, and state security forces disintegrated with his departure. Since then, CAR has descended into a state of lawlessness in which rival militias maraud throughout the country and much of the population has fled their homes in search of whatever security and sustenance they can find.

We know this process is exacting a terrible toll, but just how terrible is even harder to say than usual because very few people on hand have the motive and means to record and report out what they are seeing. At just 23 subscriptions per 100 people, CAR’s mobile-phone penetration rate remains among the lowest on the planet, not far ahead of Cuba’s and North Korea’s (data here). Some journalists and NGOs like Human Rights Watch and Amnesty International have been covering the situation as best they can, but they will be among the first to tell you that their information is woefully incomplete, in part because roads and other transport remain rudimentary. In a must-read recent dispatch on the conflict, anthropologist Louisa Lombard noted that “the French colonists invested very little in infrastructure, and even less has been invested subsequently.”

A week ago, I used Twitter to ask if anyone had managed yet to produce a reasonably reliable estimate of the number of civilian deaths in CAR since last December. The replies I received from some very reputable people and organizations make clear what I mean about how hard it is to observe this conflict.

C.A.R. is an extreme case in this regard, but it’s certainly not the only one of its kind. The same could be said of ongoing episodes of civil violence in D.R.C., Sudan (not just Darfur, but also South Kordofan and Blue Nile), South Sudan, and in the Myanmar-China border region, to name a few. In all of these cases, we know fighting is happening, and we believe civilians are often targeted or otherwise suffering as a result, but our real-time information on the ebb and flow of these conflicts and the tolls they are exacting remains woefully incomplete. Mobile phones and the internet notwithstanding, I don’t expect that to change as quickly as we’d hope.

[N.B. I didn't even try to cover the crucial but distinct problem of verifying the information we do get from the kind of crowdsourcing Sheldon describes. For an entry point to that conversation, see this great blog post by Josh Stearns.]

China’s Accumulating Risk of Crisis

Eurasia Group founder Ian Bremmer has a long piece in the new issue of The National Interest that foretells continued political stability in China in spite of all the recent turbulence in the international system and at home. After cataloging various messes of the past few years—the global financial crisis and U.S. recession, war in Syria, and unrest in the other BRICS, to name a few—Bremmer says

It is all the more remarkable that there’s been so little noise from China, especially since the rising giant has experienced a once-in-a-decade leadership transition, slowing growth and a show trial involving one of the country’s best-known political personalities—all in just the past few months.

Given that Europe and America, China’s largest trade partners, are still struggling to recover their footing, growth is slowing across much of the once-dynamic developing world, and the pace of economic and social change within China itself is gathering speed, it’s easy to wonder if this moment is merely the calm before China’s storm.

Don’t bet on it. For the moment, China is more stable and resilient than many realize, and its political leaders have the tools and resources they need to manage a cooling economy and contain the unrest it might provoke.

Me, I’m not so sure. Every time I peek under another corner of the “authoritarian stability” narrative that blankets many discussions of China, I feel like I see another mess in the making.

That list is not exhaustive. None of these situations seems especially likely to turn into a full-blown rebellion very soon, but that doesn’t mean rebellion in China remains unlikely. That might sound like a contradiction, but it isn’t.

To see why, it helps to think statistically. Because of its size and complexity, China is like a big machine with lots of different modules, any one of which could break down and potentially set off a systemic failure. Think of the prospects for failure in each of those modules as an annual draw from a deck of cards: pull the ace of spades and you get a rebellion; pull anything else and you get more of the same. At 51:1 or about 2 percent, the chances that any one module will fail are quite small. If there are ten modules, though, you’re repeating the draw ten times, and your chances of pulling the ace of spades at least once (assuming the draws are independent) are more like 20 percent than 2. Increase the chances in any one draw—say, count both the king and the ace of spades as a “hit”—and the cumulative probability goes up accordingly. In short, when the risks are additive as I think they are here, it doesn’t take a ton of small probabilities to accumulate into a pretty sizable risk at the systemic level.
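The arithmetic behind that card-draw analogy is easy to check:

```python
# The card-draw arithmetic worked out: if each of n independent "modules"
# fails with probability p in a given year, the chance that at least one
# fails is 1 - (1 - p)**n, which grows quickly with n.
def systemic_risk(p, n):
    """Probability of at least one failure among n independent draws."""
    return 1 - (1 - p) ** n

p_ace = 1 / 52            # one ace of spades in a 52-card deck, ~2%
print(round(systemic_risk(p_ace, 1), 3))   # one module: ~2%
print(round(systemic_risk(p_ace, 10), 3))  # ten modules: ~18%
p_two = 2 / 52            # count the king of spades as a hit, too
print(round(systemic_risk(p_two, 10), 3))  # ten riskier modules: ~32%
```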

What’s more, the likelihoods of these particular events are actually connected in ways that further increase the chances of systemic trouble. As social movement theorists like Sidney Tarrow and Mark Beissinger have shown, successful mobilization in one part of an interconnected system can increase the likelihood of more action elsewhere by changing would-be rebels’ beliefs about the vulnerability of the system, and by starting to change the system itself.

As Bremmer points out, the Communist Party of China has done a remarkable job sustaining its political authority and goosing economic growth as long as it has. One important source of that success has been the Party’s willingness and capacity to learn and adapt as it goes, as evidenced by its sophisticated and always-evolving approach to censorship of social media and its increasing willingness to acknowledge and try to improve on its poor performance on things like air pollution and natural disasters.

Still, when I think of all the ways that system could start to fail and catalog the signs of increased stress on so many of those fronts, I have to conclude that the chances of a wider crisis in China are no longer so small and will only continue to grow. If Bremmer wanted to put a friendly wager on the prospect that China will be governed more or less as it is today to and through the Communist Party’s next National Congress, I’d take that bet.

How Long Will Syria’s Civil War Last? It’s Really Hard to Say

Last week, political scientist Barbara Walter wrote a great post for the blog Political Violence @ a Glance called “The Four Things We Know about How Civil Wars End (and What This Tells Us about Syria),” offering a set of base-rate forecasts about how long Syria’s civil war will last (probably a lot longer) and how it’s likely to end (with a military victory and not a peace agreement).

The post is great because it succeeds in condensing a large and complex literature into a small set of findings directly relevant to an important topic of public concern. It’s no coincidence that this post was written by one of the leading scholars on that subject. A “data scientist” could have looked at the same data sets used in the studies on which Walter bases her summary and not known which statistics would be most informative. Even with the right statistics in hand, a “hacker” probably wouldn’t know much about the relative quality of the different data sources, or the comparative-historical evidence on relevant causal mechanisms—two things that could (and should) inform their thinking about how much confidence to attach to the various results. To me, this is a nice illustration of the point that, even in an era of relentless quantification, subject-matter expertise still matters.

The one thing that seems to have gotten lost in the retellings and retweetings of this distilled evidence, though, is the idea of uncertainty. Apparently inspired by Walter’s post, Max Fisher wrote a similar one for the Washington Post‘s Worldviews blog under the headline “Political science says Syria’s civil war will probably last at least another decade.” Fisher’s prose is appropriately less specific than that (erroneous) headline, but if my Twitter feed is any indication, lots of people read Walter’s and Fisher’s posts as predictions that the Syrian war will probably last 10 years or more in total.*

If you had to bet now on the war’s eventual duration, you’d be right to expect an over-under around 10, but the smart play would probably be not to bet at all, unless you were offered very favorable odds or you had some solid hedges in place. That’s because the statistics Walter and Fisher cite are based on a relatively small number of instances of a complex phenomenon, the origins and dynamics of which we still poorly understand. Under these circumstances, statistical forecasting is inevitably imprecise, and the imprecision only increases the farther we try to peer into the future.

We can visualize that imprecision, and the uncertainty it represents, with something called a prediction interval. A prediction interval is just an estimate of the range in which we expect future values of our quantity of interest to fall with some probability. Prediction intervals are sometimes included in plots of time-series forecasts, and the results typically look like the bell of a trumpet, as shown in the example below. The farther into the future you try to look, the less confidence you should have in your point prediction. When working with noisy data on a stochastic process, it doesn’t take a lot of time slices to reach the point where your prediction interval practically spans the full range of possible values.

[Figure: a time-series forecast whose prediction interval fans out like the bell of a trumpet as the forecast horizon lengthens]

Civil wars are, without question, one of those stochastic processes with noisy data. The averages Walter and Fisher cite are just central tendencies from a pretty heterogeneous set of cases observed over a long period of world history. Using data like these, I think we can be very confident that the war will last at least a few more months and somewhat confident that it will last at least another year or more. Beyond that, though, I’d say the bell of our forecasting trumpet widens very quickly, and I wouldn’t want to hazard a guess if I didn’t have to.
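To see the trumpet’s widening in numbers, here is a toy calculation under the simplifying assumption that the forecast errors accumulate like a random walk; real conflict-duration models are messier, but the square-root widening is the essential point.

```python
# Toy illustration of the trumpet shape: for a random walk whose steps have
# standard deviation sigma, an approximate 95% prediction interval h steps
# ahead is +/- 1.96 * sigma * sqrt(h), so it widens with the horizon.
import math

def interval_halfwidth(sigma, h, z=1.96):
    """Half-width of an approximate 95% prediction interval h steps out."""
    return z * sigma * math.sqrt(h)

sigma = 1.0
for h in (1, 4, 16):
    print(h, round(interval_halfwidth(sigma, h), 2))
# horizon 1 -> +/- 1.96; horizon 4 -> +/- 3.92; horizon 16 -> +/- 7.84
```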

* In fact, neither Walter nor Fisher specifically predicted that the war would last x number of years or more. Here’s what Walter actually wrote:

1. Civil wars don’t end quickly. The average length of civil wars since 1945 has been about 10 years. This suggests that the civil war in Syria is in its early stages, and not in the later stages that tend to encourage combatants to negotiate a settlement.

I think that’s a nice verbal summary of the statistical uncertainty I’m trying to underscore. And here’s what Fisher wrote under that misleading headline:

According to studies of intra-state conflicts since 1945, civil wars tend to last an average of about seven to 12 years. That would put the end of the war somewhere between 2018 and 2023.

Worse, those studies have identified several factors that tend to make civil wars last even longer than the average. A number of those factors appear to apply to Syria, suggesting that this war could be an unusually long one. Of course, those are just estimates based on averages; by definition, half of all civil wars are shorter than the median length, and Syria’s could be one of them. But, based on the political science, Syria has the right conditions to last through President Obama’s tenure and perhaps most or all of his successor’s.

