A Research Note on Updating Coup Forecasts

A new year is about to start, and that means it’s time for me to update my coup forecasts (see here and here for the 2013 and 2012 editions, respectively). The forecasts themselves aren’t quite ready yet—I need to wait until mid-January for updates from Freedom House to arrive—but I am making some changes to my forecasting process that I thought I would go ahead and describe now, because the thinking behind them illustrates some important dilemmas and new opportunities for predictions of many kinds of political events.

When it comes time to build a predictive statistical model of some rare political event, it’s usually not the model specification that gives me headaches. For many events of interest, I think we now have a pretty good understanding of which methods and variables are likely to produce more accurate forecasts.

Instead, it’s the data, or really the lack thereof, that sets me to pulling my hair out. As I discussed in a recent post, things we’d like to include in our models fall into a few general classes in this regard:

  • No data exist (fuggeddaboudit)
  • Data exist for some historical period, but they aren’t updated (“HA-ha!”)
  • Data exist and are updated, but they are patchy and not missing at random (so long, some countries)
  • Data exist and are updated, but not until many months or even years later (Spinning Pinwheel of Death)

In the past, I’ve set aside measures that fall into the first three of those sets but gone ahead and used some from the fourth, if I thought the feature was important enough. To generate forecasts before the original sources updated, I either a) pulled forward the last observed value for each case (if the measure was slow-changing, like a country’s infant mortality rate) or b) hand-coded my own updates (if the measure was liable to change from year to year, like a country’s political regime type).

Now, though, I’ve decided to get out of the “artisanal updating” business, too, for all but the most obvious and uncontroversial things, like which countries recently joined the WTO or held national elections. I’m quitting this business, in part, because it takes a lot of time and the results may be pretty noisy. More important, though, I’m also quitting because it’s not so necessary any more, thanks to timelier updates from some data providers and the arrival of some valuable new data sets.

This commitment to more efficient updating has led me to adopt the following rules of thumb for my 2014 forecasting work:

  • For structural features that don’t change much from year to year (e.g., population size or infant mortality), include the feature and use the last observed value.
  • For variables that can change from year to year in hard-to-predict ways, only include them if the data source is updated in near-real time or, if it’s updated annually, if those updates are delivered within the first few weeks of the new year.
  • In all cases, only use data that are publicly available, to facilitate replication and to encourage more data sharing.
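The first rule of thumb, carrying forward the last observed value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `carry_forward`, the year-to-value dictionary layout, and the numbers in the usage note are all hypothetical, not part of my actual scripts.

```python
def carry_forward(series, target_year):
    """Return the most recent observed value at or before target_year.

    `series` maps year -> value; None marks years the source has not
    published yet. (Hypothetical layout, chosen for illustration.)
    """
    observed = [
        (year, value)
        for year, value in series.items()
        if value is not None and year <= target_year
    ]
    if not observed:
        return None  # nothing has ever been observed, so nothing to carry
    # Pick the entry with the latest year and return its value.
    latest_year, latest_value = max(observed, key=lambda pair: pair[0])
    return latest_value
```

For example, with made-up infant mortality figures `{2010: 41.0, 2011: 39.5, 2012: None, 2013: None}`, calling `carry_forward(series, 2013)` would return the 2011 value. That is tolerable for a slow-moving measure and risky for a volatile one, which is the distinction the rules above are drawing.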

And here are some of the results of applying those rules of thumb to the list of features I’d like to include in my coup forecasting models for 2014.

  • Use Powell and Thyne’s list of coup events instead of Monty Marshall’s. Powell and Thyne’s list is updated throughout the year as events occur, whereas the publicly available version of Marshall’s list is only updated annually, several months after the start of the year. That wouldn’t matter so much if coups were only the dependent variable, but recent coup activity is also an important predictor, so I need the last year’s updates ASAP.
  • Use Freedom House’s Freedom in the World (FIW) data instead of Polity IV to measure countries’ political regime type. Polity IV offers more granular measures of political regime type than Freedom in the World, but Polity updates aren’t posted until spring or summer of the following year, usually more than a third of the way into my annual forecasting window.
  • Use IMF data on economic growth instead of the World Bank’s. The Bank now updates its World Development Indicators a couple of times a year, and there’s a great R package that makes it easy to download the bits you need. That’s wonderful for slow-changing structural features, but it still doesn’t get me data on economic performance as fast as I’d like it. I work around that problem by using the IMF’s World Economic Outlook Database, which includes projections for years for which observed data aren’t yet available and forecasts for several years into the future.
  • Last but not least, use GDELT instead of UCDP/PRIO or Major Episodes of Political Violence (MEPV) to measure civil conflict. Knowing which countries have had civil unrest or violence in the recent past can help us predict coup attempts, but the major publicly available measures of these things are only updated well into the year. GDELT now represents a nice alternative. It covers the whole world, measures lots of different forms of political cooperation and conflict, and is updated daily, so country-year updates are available on January 2. GDELT’s period of observation starts in 1979, so it’s still a stretch to use it in models of super-rare events like mass-killing onsets, where the number of available examples since 1979 on which to train is still relatively small. For less-rare events like coup attempts, though, starting the analysis around 1980 is no problem. (Just don’t forget to normalize the counts!) With some help from John Beieler, I’m already experimenting with adding annual GDELT summaries to my coup forecasting process, and I’m finding that they do improve the model’s out-of-sample predictive power.
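On the normalization point in the GDELT item: one common way to do it, and I am assuming rather than asserting that this is the right choice for any given model, is to divide each country-year's count of conflict events by the total number of events recorded for that same country-year. The function name and the dictionary layout below are hypothetical.

```python
def normalize_counts(conflict_counts, total_counts):
    """Convert raw conflict counts into shares of all recorded events.

    Both arguments map (country, year) -> count. This layout is an
    assumption made for illustration, not GDELT's own file format.
    """
    shares = {}
    for key, count in conflict_counts.items():
        total = total_counts.get(key, 0)
        # Leave the share undefined when no events at all were recorded.
        shares[key] = count / total if total else None
    return shares
```

The point of the division is that the overall volume of recorded events can differ a lot from one period to the next, so a raw count can rise simply because more events of every kind were recorded.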

In all of the forecasting work I do, my long-term goals are 1) to make the forecasts more dynamic by updating them more frequently (e.g., monthly, weekly, or even daily instead of yearly) and 2) to automate that updating process as much as possible. The changes I’m making to my coup forecasting process for 2014 don’t directly accomplish either of these things, but they do take me a few steps in both directions. For example, once GDELT is in the mix, it’s possible to start thinking about how to switch to monthly or even daily updates that rely on a sliding window of recent GDELT tallies. And once I’ve got a coup data set that updates in near-real time, I can imagine pinging that source each day to update the counts of coup attempts in the past several years. I’m still not where I’d like to be, but I think I’m finally stepping onto a path that can carry me there.
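The sliding-window idea is simple enough to sketch. Assuming a list of event dates for one country (a hypothetical input; I have not settled on a format), a count over a trailing window might look like this:

```python
from datetime import date, timedelta


def count_in_window(event_dates, as_of, window_days):
    """Count events in the `window_days` days ending on `as_of` (inclusive)."""
    start = as_of - timedelta(days=window_days)
    # An event counts if it falls after the window start and on or before as_of.
    return sum(1 for d in event_dates if start < d <= as_of)
```

Re-running that with a new `as_of` date each day, week, or month is all a "dynamic" update would require on the data side, provided the underlying event list is itself kept current.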

EVEN BETTER Animated Map of Coup Attempts Worldwide, 1946-2013

[Click here to go straight to the map]

A week ago, I posted an animated map of coup attempts worldwide since 1946 (here). Unfortunately, that map was built from a country-year data set, so we couldn’t see multiple attempts within a single country over the course of a year. As it happens, though, the lists of coup attempts on which that animation was based do specify the dates of those events. So why toss out all that information?

To get a sharper picture of the distribution of coup attempts across space and time, I rebuilt my mashed-up list of coup attempts from the original sources (Powell & Thyne and Marshall), but now with the dates included. Where only a month was given, I pegged the event to the first day of that month. To avoid double-counting, I then deleted events that appeared to be duplicates (same outcome in the same country within a single week). Finally, to get the animation in CartoDB to give a proper sense of elapsed time, I embedded the results in a larger data frame of all dates over the 68-year period observed. You can find the daily data on my Google Drive (here).
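For readers who want to see the logic of those three steps, here is a rough Python sketch. It is not the code I used. The tuple layout, the function names, and the choice to treat "within a single week" as seven days or fewer from the last kept record are all assumptions made for illustration.

```python
from datetime import date, timedelta


def parse_event_date(year, month, day=None):
    """Peg month-only reports to the first day of that month."""
    return date(year, month, day if day is not None else 1)


def drop_duplicates(events, max_gap_days=7):
    """Keep one record per cluster of near-simultaneous matching reports.

    `events` is a list of (country, outcome, date) tuples.
    """
    kept = []
    last_kept = {}  # (country, outcome) -> date of the most recent kept event
    for country, outcome, when in sorted(events, key=lambda e: e[2]):
        previous = last_kept.get((country, outcome))
        if previous is not None and (when - previous).days <= max_gap_days:
            continue  # same outcome, same country, within a week: skip it
        kept.append((country, outcome, when))
        last_kept[(country, outcome)] = when
    return kept


def daily_counts(events, start, end):
    """Embed events in a full daily calendar so empty days show up as zeros."""
    n_days = (end - start).days + 1
    counts = {start + timedelta(days=i): 0 for i in range(n_days)}
    for _, _, when in events:
        if when in counts:
            counts[when] += 1
    return counts
```

The padding step matters for the animation because, without the empty days, the frames would jump from event to event and give no sense of the quiet stretches in between.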

WordPress won’t seem to let me embed the results of my mapping directly in this post, but you can see and interact with the results at CartoDB (here). I think this version shows more clearly how much the rate of coup attempts has slowed in the past couple of decades, and it still does a good job of showing change over time in the geographic distribution of these events.

The two things I can’t figure out how to do so far are 1) to use color to differentiate between successful and failed attempts and 2) to show the year or month and year in the visualization so we know where we are in time. For differentiating by outcome, there’s a variable in the data set that does this, but it looks like the current implementation of the Torque option in CartoDB won’t let me show multiple layers or differentiate between the events by type. On showing the date, I have no clue. If anyone knows how to do either of these things, please let me know.

Singing the Missing-Data Blues

I’m currently in the throes of assembling data to use in forecasts on various forms of political change in countries worldwide for 2014. This labor-intensive process is the not-so-sexy side of “data science” that practitioners like to bang on about if you ask us, but I’m not going to do that here. Instead, I’m going to talk about how hard it is to find data sets that applied forecasters of rare events in international politics can even use in the first place. The steep data demands for predictive models mean that many of the things we’d like to include in our models get left out, and many of the data sets political scientists know and like aren’t useful to applied forecasters.

To see what I’m talking about, let’s assume we’re building a statistical model to forecast some rare event Y in countries worldwide, and we have reason to believe that some variable X should help us predict that Y. If we’re going to include X in our model, we’ll need data, but any old data won’t do. For a measure of X to be useful to an applied forecaster, it has to satisfy a few requirements. This Venn diagram summarizes the four I run into most often:


First, that measure of X has to be internally consistent. Validity is much less of a concern than it is in hypothesis-testing research, since we’re usually not trying to make causal inferences or otherwise build theory. If our measure of X bounces around arbitrarily, though, it’s not going to provide much of a predictive signal, no matter how important the concept underlying X may be. Similarly, if the process by which that measure of X is generated keeps changing (say, national statistical agencies make idiosyncratic revisions to their accounting procedures, or coders keep changing their coding rules), then models based on the earlier versions will quickly break. If we know the source or shape of the variation, we might be able to adjust for it, but we aren’t always so lucky.

Second, to be useful in global forecasting, a data set has to offer global coverage, or something close to it. It’s really as simple as that. In the most commonly used statistical models, if a case is missing data on one or more of the inputs, it will be missing from the outputs, too. This is called listwise deletion, and it means we’ll get no forecast for cases that are missing values on any one of the predictor variables. Some machine-learning techniques can generate estimates in the face of missing data, and there are ways to work around listwise deletion in regression models, too (e.g., create categorical versions of continuous variables and treat missing values as another category). But those workarounds aren’t alchemy, and less information means less accurate forecasts.
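To make the contrast concrete, here is a minimal Python sketch of listwise deletion next to the missing-as-a-category workaround. The row layout, function names, and cutpoints are hypothetical, and real modeling software does the deletion for you, usually without saying much about it.

```python
def listwise_delete(rows, predictors):
    """Drop any case that is missing a value on one or more predictors."""
    return [r for r in rows if all(r.get(p) is not None for p in predictors)]


def bin_with_missing(value, cutpoints):
    """Turn a continuous value into a category, with 'missing' as a level.

    `cutpoints` must be sorted in ascending order.
    """
    if value is None:
        return "missing"
    for i, cut in enumerate(cutpoints):
        if value < cut:
            return f"bin_{i}"
    return f"bin_{len(cutpoints)}"
```

With listwise deletion, a country missing one input simply disappears from the forecast table. With the binned version it stays in, but all the model knows about it on that dimension is that the value was not reported.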

Worse, the holes in our global data sets usually form a pattern, and that pattern is often correlated with the very things we’re trying to predict. For example, the poorest countries in the world are more likely to experience coups, but they are also more likely not to be able to afford the kind of bureaucracy that can produce high-quality economic statistics. Authoritarian regimes with frustrated citizens may be more likely to experience popular uprisings, but many autocrats won’t let survey research firms ask their citizens politically sensitive questions, and many citizens in those regimes would be reluctant to answer those questions candidly anyway. The fact that our data aren’t missing at random compounds the problem, leaving us without estimates for some cases and screwing up our estimates for the rest. Under these circumstances, it’s often best to omit the offending data set from our modeling process entirely, even if the X it’s measuring seems important.

Third and related to no. 2, if our events are rare, then our measure of X needs historical depth, too. To estimate the forecasting model, we want as rich a library of examples as we can get. For events as rare as onsets of violent rebellion or episodes of mass killing, which typically occur in just one or a few countries worldwide each year, we’ll usually need at least a few decades’ worth of data to start getting decent estimates on the things that differentiate the situations where the event occurs from the many others where it doesn’t. Without that historical depth, we run into the same missing-data problems I described in relation to global coverage.

I think this criterion is much tougher to satisfy than many people realize. In the past 10 or 20 years, statistical agencies, academic researchers, and non-governmental organizations have begun producing new or better data sets on all kinds of things that went unmeasured or poorly measured in the past—things like corruption or inflation or unemployment, to name a few that often come up in conversations about what predicts political instability and change. Those new data sets are great for expanding our view of the present, and they will be a boon to researchers of the future. Unfortunately, though, they can’t magically reconstruct the unobserved past, so they still aren’t very useful for predictive models of rare political events.

The fourth and final circle in that Venn diagram may be both the most important and the least appreciated by people who haven’t tried to produce statistical forecasts in real time: we need timely updates. If I can’t depend on the delivery of fresh data on X before or early in my forecasting window, then I can’t update my forecasts while they’re still relevant, and the model is effectively DOA. If X changes slowly, we can usually get away with using the last available observation until the newer stuff shows up. Population size and GDP per capita are a couple of variables for which this kind of extrapolation is generally fine. Likewise, if the variable changes predictably, we might use forecasts of X before the observed values become available. I sometimes do this with GDP growth rates. Observed data for one year aren’t available for many countries until deep into the next year, but the IMF produces decent forecasts of recent and future growth rates that can be used in the interim.

Maddeningly, though, this criterion alone renders many of the data sets scholars have painstakingly constructed for specific research projects useless for predictive modeling. For example, scholars in recent years have created numerous data sets to characterize countries’ national political regimes, a feature that scores of studies have associated with variation in the risk of many forms of political instability and change. Many of these “boutique” data sets on political regimes are based on careful research and coding procedures, cover the whole world, and reach at least several decades or more into the past. Only two of them, though—Polity IV and Freedom House’s Freedom in the World—are routinely updated. As much as I’d like to use unified democracy scores or measures of authoritarian regime type in my models, I can’t without painting myself into a forecasting corner, so I don’t.

As I hope this post has made clear, the set formed by the intersection of these four criteria is a tight little space. The practical requirements of applied forecasting mean that we have to leave out of our models many things that we believe might be useful predictors, no matter how important the relevant concepts might seem. They also mean that our predictive models on many different topics are often built from the same few dozen “usual suspects”—not because we want to, but because we don’t have much choice. Multiple imputation and certain machine-learning techniques can mitigate some of these problems, but they hardly eliminate them, and the missing information affects our forecasts either way. So the next time you’re reading about a global predictive model on international politics and wondering why it doesn’t include something “obvious” like unemployment or income inequality or survey results, know that these steep data requirements are probably the reason.

Animated Map of Coup Attempts Worldwide, 1946-2013

I’m in the throes of updating my data files to prepare for 2014 forecasts of various forms of political change, including coups d’etat. For the past couple of years, I’ve used the coup event list Monty Marshall produces (here) as my primary source on this topic, and I’ve informally cross-referenced Monty’s accounting with the list produced by Jonathan Powell and Clayton Thyne (here).

This year, I decided to quit trying to pick a favorite or adjudicate between the two and just go ahead and mash them up. The two projects use slightly different definitions, but both are basically looking for the same thing: some faction of political insiders (including but not limited to military leaders) seizes executive power at the national level by unconstitutional means that include the use or threat of force.

After stretching the two data sets into country-year format and merging the results, I created separate indicators for successful and failed coups that are scored 1 if either source reports an event of that type and 0 otherwise. For example, Marshall’s data set doesn’t see the removal of Egyptian president Hosni Mubarak from office in 2011 as a coup, but Powell and Thyne’s does, so in my mashed-up version, Egypt gets a 1 for 2011 on the indicator for any successful coups.* The Marshall data set starts in 1946, but Powell and Thyne don’t start until 1950, so my observations for 1946-1949 are based solely on the former. Powell and Thyne update their file on the go, however, whereas Marshall only updates once a year. This means that Powell and Thyne already have most of 2013 covered, so my observations for this year so far are based solely on their reckoning.
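The "1 if either source reports it" rule is just a set union. The actual work was done in the R script linked below; what follows is a Python sketch of the same logic, with a hypothetical tuple layout and function names of my own invention.

```python
def merge_sources(source_a, source_b):
    """Combine two sets of (country, year, outcome) records by union."""
    return set(source_a) | set(source_b)


def indicator(merged, country, year, outcome):
    """Score 1 if either source reported that outcome in that country-year."""
    return 1 if (country, year, outcome) in merged else 0
```

So if one list includes `("Egypt", 2011, "successful")` and the other does not, `indicator(merged, "Egypt", 2011, "successful")` comes back as 1, which is how the Egypt case described above gets its score.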

The bar plot below shows what the data from the combined version look like over time. The trend is basically the same one we’d see from either of the constituent sources. The frequency of coup attempts grew noticeably in the 1960s and 1970s; continued apace through the 1980s and 1990s, but with fewer successes; and then fell sharply in the past two decades.


We can see those time trends and the geographic distribution of these events in the GIF below (you may need to click on it to get it to play). As the maps show, coup events were pretty well scattered across the world in the 1960s and 1970s, but in the past 20 years, they’ve mostly struck in Africa and Asia.


A .csv with the mashed-up data is on Google Drive (here), and you can find the R script I used to make these plots on Github (here).

Update: For a new-and-improved version that uses daily data and is interactive, see this follow-up post.

* This sentence corrects an error I made in the original version of this post. In that version, I stated that Marshall did not consider the 3 July 2013 events in Egypt to include a coup. That was incorrect, and I apologize to him for my error.

China’s Accumulating Risk of Crisis

Eurasia Group founder Ian Bremmer has a long piece in the new issue of The National Interest that foretells continued political stability in China in spite of all the recent turbulence in the international system and at home. After cataloging various messes of the past few years—the global financial crisis and U.S. recession, war in Syria, and unrest in the other BRICS, to name a few—Bremmer says

It is all the more remarkable that there’s been so little noise from China, especially since the rising giant has experienced a once-in-a-decade leadership transition, slowing growth and a show trial involving one of the country’s best-known political personalities—all in just the past few months.

Given that Europe and America, China’s largest trade partners, are still struggling to recover their footing, growth is slowing across much of the once-dynamic developing world, and the pace of economic and social change within China itself is gathering speed, it’s easy to wonder if this moment is merely the calm before China’s storm.

Don’t bet on it. For the moment, China is more stable and resilient than many realize, and its political leaders have the tools and resources they need to manage a cooling economy and contain the unrest it might provoke.

Me, I’m not so sure. Every time I peek under another corner of the “authoritarian stability” narrative that blankets many discussions of China, I feel like I see another mess in the making.

That list is not exhaustive. No one of these situations seems especially likely to turn into a full-blown rebellion very soon, but that doesn’t mean that rebellion in China remains unlikely. That might sound like a contradiction, but it isn’t.

To see why, it helps to think statistically. Because of its size and complexity, China is like a big machine with lots of different modules, any one of which could break down and potentially set off a systemic failure. Think of the prospects for failure in each of those modules as an annual draw from a deck of cards: pull the ace of spades and you get a rebellion; pull anything else and you get more of the same. At 1 in 52, or about 2 percent, the chances that any one module will fail are quite small. If there are ten modules, though, you’re repeating the draw ten times, and your chances of pulling the ace of spades at least once (assuming the draws are independent) are more like 20 percent than 2. Increase the chances in any one draw—say, count both the king and the ace of spades as a “hit”—and the cumulative probability goes up accordingly. In short, when the risks accumulate across modules as I think they do here, it doesn’t take a ton of small probabilities to add up to a pretty sizable risk at the systemic level.
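The arithmetic behind that card analogy is the standard formula for at least one success in n independent trials, 1 - (1 - p)^n. A few lines of Python make it easy to check. The function name is mine, and the independence assumption is the same one flagged in the paragraph above.

```python
def prob_at_least_one(p, n):
    """Chance of at least one hit in n independent draws, each with chance p."""
    return 1 - (1 - p) ** n


one_card = 1 / 52   # ace of spades only
two_cards = 2 / 52  # ace or king of spades

# Ten modules, one annual draw each.
print(prob_at_least_one(one_card, 10))
print(prob_at_least_one(two_cards, 10))
```

With ten modules, the one-card case works out to roughly 18 percent, and counting two cards as a hit pushes it to roughly 32 percent.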

What’s more, the likelihoods of these particular events are actually connected in ways that further increase the chances of systemic trouble. As social movement theorists like Sidney Tarrow and Mark Beissinger have shown, successful mobilization in one part of an interconnected system can increase the likelihood of more action elsewhere by changing would-be rebels’ beliefs about the vulnerability of the system, and by starting to change the system itself.

As Bremmer points out, the Communist Party of China has done a remarkable job sustaining its political authority and goosing economic growth as long as it has. One important source of that success has been the Party’s willingness and capacity to learn and adapt as it goes, as evidenced by its sophisticated and always-evolving approach to censorship of social media and its increasing willingness to acknowledge and try to improve on its poor performance on things like air pollution and natural disasters.

Still, when I think of all the ways that system could start to fail and catalog the signs of increased stress on so many of those fronts, I have to conclude that the chances of a wider crisis in China are no longer so small and will only continue to grow. If Bremmer wanted to put a friendly wager on the prospect that China will be governed more or less as it is today to and through the Communist Party’s next National Congress, I’d take that bet.

Eye Candy for Social Scientists

A few great data visualizations have come across my virtual desk in the past 48 hours. I’ve already shared a couple of them on my Tumblr feed, but since no one actually looks at that, I thought I would post them here, too.

The first comes from Penn State Ph.D. student Josh Stevens, who has created a stunning set of maps and charts showing past and predicted conflict events in Afghanistan. This thing is great on a few levels. First, Josh has taken a massive data set—GDELT—and carefully parsed it to tell a number of interesting stories about where and when conflict has occurred and how those patterns have shifted over time. Second, he’s sticking his neck out and forecasting. That alone separates this visualization from almost every other one I’ve ever seen about political violence. Third, he’s done all this in a visual format that accounts for red/green color blindness. That condition is apparently pretty rare, but Josh’s attention to it is a nice reminder of the value of making your visualization broadly accessible.

Josh Stevens’s Visualization of Material Conflict Events in Afghanistan

The next comes from software engineer Aengus Walton, who has built a web page that graphs hourly reports of air quality in several of China’s largest cities, a problem that I expect to play a role in future social unrest there. The graph is simple, pretty, and interactive. Users can toggle cities on and off and change the time period displayed from a single day to more than a year. What’s really cool about this graph, though, is what you don’t see. These data are reported by U.S. embassies, which tweet them every hour but don’t bother to chart them. Walton has written code that automatically pulls those tweets from the Twitter stream, scrapes them for the relevant data, and updates the graph in sync with the data bursts. I’d like to be able to do that when I grow up.

Aengus Walton’s Graph of Air Pollution in China

The third comes from political science Ph.D. student Felix Haass, who gives us this animated GIF of African countries’ contributions to uniformed U.N. peacekeeping missions since 1991. (See here for the full post in which this was embedded.) I’m a sucker for animated maps, which suggest stories for further investigation that are harder to uncover from big lattices of static small multiples. This one, for example, nicely illustrates how the scale of UNPKO contributions has grown over time, and how a few countries (Egypt, Ethiopia, Ghana, Nigeria, Rwanda, and South Africa) have emerged as major contributors to these missions in the past decade or so.

Felix Haass’ Maps of U.N. PKO Contributions by African States

The last visualization isn’t social science, but I include it as a placeholder for a kind of page I could imagine creating some day. This is a screenshot from Forecast Lines, a weather site that aggregates data from several sources in an elegant way (h/t Trey Causey). The light grey lines show the component forecasts, and the black line shows the single-best forecast produced by an algorithm that combines them. You can toggle between temperature, precipitation, and other measures; you can see the component forecasts as you scroll over the lines; and each page includes forecasts on three time scales (hour, day, week).

forecast lines screenshot

Now: imagine a version of this showing daily forecasts of political conflict at the local level using batch-processed data from sources like GDELT and Bloomberg and Twitter, with the option of toggling over to a mapped version… I find that idea technically and theoretically thrilling and ethically ambiguous, depending, in part, on who is using it to what ends. Whatever you think of it, though, expect to see something like it on your mobile device in the next several years.

Road-Testing GDELT as a Resource for Monitoring Atrocities

As I said here a few weeks ago, I think the Global Data on Events, Location, and Tone (GDELT) is a fantastic new resource that really embodies some of the ways in which technological changes are coming together to open lots of new doors for social-scientific research. GDELT’s promise is obvious: more than 200 million political events from around the world over the past 30 years, all spotted and coded by well-trained software instead of the traditional armies of undergrad RAs, and with daily updates coming online soon. Or, as Adam Elkus’ t-shirt would have it, “200 million observations. Only one boss.”

BUT! Caveat emptor! Like every other data-collection effort ever, GDELT is not alchemy, and it’s important that people planning to use the data, or even just to consume analysis based on it, understand what its limitations are.

I’m starting to get a better feel for those limitations from my own efforts to use GDELT to help observe atrocities around the world, as part of a consulting project I’m doing for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide. The core task of that project is to develop plans for a public early-warning system that would allow us to assess the risk of onsets of atrocities in countries worldwide more accurately and earlier than current practice.

When I heard about GDELT last fall, though, it occurred to me that we could use it (and similar data sets in the pipeline) to support efforts to monitor atrocities as well. The CAMEO coding scheme on which GDELT is based includes a number of event types that correspond to various forms of violent attack and other variables indicating who was attacking whom. If we could develop a filter that reliably pulled events of interest to us from the larger stream of records, we could produce something like a near-real time bulletin on recent violence against civilians around the world. Our record would surely have some blind spots (GDELT only tracks a limited number of news sources, and some atrocities just don’t get reported, period), but I thought it would reliably and efficiently alert us to new episodes of violence against civilians and help us identify trends in ongoing ones.

Well, you know what they say about plans and enemies and first contact. After digging into GDELT, I still think we can accomplish those goals, but it’s going to take more human effort than I originally expected. Put bluntly, GDELT is noisier than I had anticipated, and for the time being the only way I can see to sharpen that signal is to keep a human in the loop.

Imagine (fantasize?) for a moment that there’s a perfect record somewhere of all the political interactions GDELT is trying to identify. For kicks, let’s call it the Encyclopedia Eventum (EE). Like any detection system, GDELT can mess up in two basic ways: 1) errors of omission, in which GDELT fails to spot something that’s in the EE; and 2) errors of commission, in which it mistakenly records an event that isn’t in the EE (or, relatedly, is in the EE but in a different place). We might also call these false negatives and false positives, respectively.
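If we did have that reference record, tallying the two kinds of error would be a matter of set differences. A minimal Python sketch, assuming each event can be reduced to a hashable key such as a (date, type, source, target) tuple (that representation, like the function name, is hypothetical):

```python
def detection_errors(recorded, reference):
    """Split mismatches between a detector's output and a reference record.

    Returns (omissions, commissions): events in the reference that the
    detector missed, and events the detector recorded that the reference
    does not contain.
    """
    recorded, reference = set(recorded), set(reference)
    omissions = reference - recorded    # false negatives
    commissions = recorded - reference  # false positives
    return omissions, commissions
```

Note that an event recorded in the wrong place would show up twice under this scheme, once as a commission and once as an omission, because its key would not match the reference.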

At this point, I can’t say anything about how often GDELT is making errors of omission, because I don’t have that Encyclopedia Eventum handy. A more realistic strategy for assessing the rate of errors of omission would involve comparing a subset of GDELT to another event data set that’s known to be a fairly reliable measure for some time and place of something GDELT is meant to track—say, protest and coercion in Europe—and see how well they match up, but that’s not a trivial task, and I haven’t tried it yet.

Instead, the noise I’m seeing is on the other side of that coin: the errors of commission, or false positives. Here’s what I mean:

To start developing my atrocities-monitoring filter, I downloaded the reduced and compressed version of GDELT recently posted on the Penn State Event Data Project page and pulled the tab-delimited text files for a couple of recent years. I’ve worked with event data before, so I’m familiar with basic issues in their analysis, but every data set has its own idiosyncrasies. After trading emails with a few CAMEO pros and reading Jay Yonamine’s excellent primer on event aggregation strategies, I started tinkering with a function in R that would extract the subset of events that appeared to involve lethal force against civilians. That function would involve rules to select on three features: event type, source (the doer), and target.

  • Event Type. For observing atrocities, type 20 (“Engage in Unconventional Mass Violence”) was an obvious choice. Based on advice from those CAMEO pros, I also focused on 18 (“Assault”) and 19 (“Fight”) but was expecting that I would need to be more restrictive about the subtypes, sources, and targets in those categories to avoid errors of commission.
  • Source. I’m trying to track violence by state and non-state agents, so I focused on GOV (government), MIL (military), COP (police), and SPY (intelligence agencies) for the former and REB (militarized opposition groups) and SEP (separatist groups) for the latter. The big question mark was how to handle records with just a country code (e.g., “SYR” for Syria) and no indication of the source’s type. My CAMEO consultants told me these would usually refer in some way to the state, so I should at least consider including them.
  • Target. To identify violence against civilians, I figured I would get the most mileage out of the OPP (non-violent political opposition), CVL (“civilians,” people in general), and REF (refugees) codes, but I wanted to see if the codes for more specific non-state actors (e.g., LAB for labor, EDU for schools or students, HLH for health care) would also help flag some events of interest.

After tinkering with the data a bit, I decided to write two separate functions, one for events with state perpetrators and another for events with non-state perpetrators. If you’re into that sort of thing, you can see the state-perpetrator version of that filtering function on GitHub, here.
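For readers who would rather see the logic than read about it, here is a rough Python sketch of the state-perpetrator rules described above. It is not a port of the R function on GitHub. The actor-splitting rule (three-letter chunks), the short list of known role codes, and the `include_bare_country` switch are all assumptions made for illustration, and it assumes event codes are stored as zero-padded strings so that the first two characters give the root type.

```python
STATE_ROLES = {"GOV", "MIL", "COP", "SPY"}
CIVILIAN_ROLES = {"OPP", "CVL", "REF"}
# Not the full CAMEO role list, just the codes mentioned in this post.
KNOWN_ROLES = STATE_ROLES | CIVILIAN_ROLES | {"REB", "SEP", "LAB", "EDU", "HLH", "MED"}
ROOT_TYPES = {"18", "19", "20"}  # Assault, Fight, Unconventional Mass Violence


def split_actor(code):
    """Split an actor string like 'USAMIL' into three-letter chunks."""
    return [code[i:i + 3] for i in range(0, len(code), 3)]


def is_state_vs_civilian(event_code, source, target, include_bare_country=False):
    """Return True if a record looks like state violence against civilians."""
    if event_code[:2] not in ROOT_TYPES:
        return False
    state_source = any(part in STATE_ROLES for part in split_actor(source))
    # Assumption: a lone three-letter code that is not a known role is a country.
    if not state_source and include_bare_country:
        state_source = len(source) == 3 and source not in KNOWN_ROLES
    civilian_target = any(part in CIVILIAN_ROLES for part in split_actor(target))
    return state_source and civilian_target


# Toy records as (event code, source, target); illustrative only.
records = [
    ("202", "USAMIL", "VNMCVL"),
    ("193", "USAGOV", "USAMED"),
    ("202", "SYR", "SYRCVL"),
]
kept = [r for r in records if is_state_vs_civilian(*r)]
kept_wide = [r for r in records if is_state_vs_civilian(*r, include_bare_country=True)]
```

Making the bare-country rule a switch keeps that judgment call visible, so it’s easy to run the filter both ways and compare the results.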

When I ran the more than 9 million records in the “2011.reduced.txt” file through that function, I got back 2,958 events. So far, so good. As soon as I started poking around in the results, though, I saw a lot of records that looked suspicious. The current release of GDELT doesn’t include text from or links to the source material, so it’s hard to say for sure what real-world event any one record describes. Still, some of the perpetrator-and-target combos looked odd to me, and web searches for relevant stories either came up empty or reinforced my suspicions that the records were probably errors of commission. Here are a few examples, showing the date, event type, source, and target:

  • 1/8/2011 193 USAGOV USAMED. Type 193 is “Fight with small arms and light weapons,” but I don’t think anyone from the U.S. government actually got in a shootout or knife fight with American journalists that day. In fact, that event-source-target combination popped up a lot in my subset.
  • 1/9/2011 202 USAMIL VNMCVL. Taken on its face, this record says that U.S. military forces killed Vietnamese civilians on January 9, 2011. My hunch is that the story on which this record is based was actually talking about something from the Vietnam War.
  • 4/11/2011 202 RUSSPY POLCVL. This record seems to suggest that Russian intelligence agents “engaged in mass killings” of Polish civilians in central Siberia two years ago. I suspect the story behind this record was actually talking about the Katyn Massacre and associated mass deportations that occurred in April 1940.
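One cheap way to surface patterns like these for human review is to tally the event-source-target combinations in the filtered subset and eyeball the most frequent ones. A minimal Python sketch, assuming each record is a (date, event code, source, target) tuple; the function name and the toy records (including the repeated dates) are made up for illustration.

```python
from collections import Counter


def common_combos(records, top=10):
    """Return the most frequent (event code, source, target) combinations.

    records: iterable of (date, event_code, source, target) tuples.
    """
    counts = Counter((code, src, tgt) for _, code, src, tgt in records)
    return counts.most_common(top)


# Toy records; the repeats and their dates are invented for this example.
toy = [
    ("2011-01-08", "193", "USAGOV", "USAMED"),
    ("2011-02-03", "193", "USAGOV", "USAMED"),
    ("2011-03-17", "193", "USAGOV", "USAMED"),
    ("2011-01-09", "202", "USAMIL", "VNMCVL"),
    ("2011-04-11", "202", "RUSSPY", "POLCVL"),
]
for combo, n in common_combos(toy, top=3):
    print(n, *combo)
```

A combination that shows up far more often than you would expect is a good candidate for a closer look, though a high count by itself doesn’t say whether the records are wrong.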

That’s not to say that all the records looked wacky. Interleaved with these suspicious cases were records representing exactly the kinds of events I was trying to find. For example, my filter also turned up a 202 GOV SYRCVL for June 10, 2011, a day on which one headline blared “Dozens Killed During Syrian Protests.”

Still, it’s immediately clear to me that GDELT’s parsing process is not quite at the stage where we can peruse the codebook like a menu, identify the morsels we’d like to consume, phone our order in, and expect to have exactly the meal we imagined waiting for us when we go to pick it up. There’s lots of valuable information in there, but there’s plenty of chaff, too, and for the time being it’s on us as researchers to take time to try to sort the two out. This sorting will get easier to do if and when the posted version adds information about the source article and relevant text, but “easier” in this case will still require human beings to review the results and do the cross-referencing.

Over time, researchers who work on specific topics (like atrocities, or interstate war, or protest activity in specific countries) will probably be able to develop supplemental coding rules and tweak their filters to automate some of what they learn. I’m also optimistic that the public release of GDELT will accelerate improvements in the software and dictionaries it uses, expanding its reach while shrinking the error rates. In the meantime, researchers are advised to stick to the same practices they’ve always used (or should have, anyway): take time to get to know your data; parse it carefully; and, when there’s no single parsing that’s obviously superior, check the sensitivity of your results to different permutations.

PS. If you have any suggestions on how to improve the code I’m using to spot potential atrocities or otherwise improve the monitoring process I’ve described, please let me know. That’s an ongoing project, and even marginal improvements in the fidelity of the filter would be a big help.

PPS. For more on these issues and the wider future of automated event coding, see this ensuing post from Phil Schrodt on his blog.

In Praise of Fun Projects

Over the past year, I’ve watched a few people I know in digital life sink a fair amount of time into statistical modeling projects that other people might see as “just for fun,” if not downright frivolous. Last April, for example, public-health grad student Brett Keller delivered an epic blog post that used event history models to explore why some competitors survive longer than others in the fictional Hunger Games. More recently, sociology Ph.D. student Alex Hanna has been using the same event history techniques to predict who’ll get booted each week from the reality TV show RuPaul’s Drag Race (see here and here so far). And then there’s Against the Spread, a nascent pro-football forecasting project from sociology Ph.D. candidate Trey Causey, whose dissertation uses natural language processing and agent-based modeling to examine information ecology in authoritarian regimes.

I happen to think these kinds of projects are a great idea, if you can find the time to do them, and if you’re reading this blog post, you probably can. Based on personal experience, I’m a big believer in learning by doing. Concepts don’t stick in my brain when I only read about them; I’ve got to see the concepts in action and attach them to familiar contexts and examples to really see what’s going on. Blog posts like Brett’s and Alex’s are a terrific way to teach yourself new methods by applying them to toy problems where the data sets are small, the domain is familiar and interesting, and the costs of being wrong are negligible.


A bigger project like Trey’s requires you to solve a lot of complex procedural and methodological problems, but all the skills you develop along the way transfer to other domains. If you can build and run a decent forecasting system from scratch for something as complex as pro football, you can do the same for “seriouser” problems, too. I think that demonstrated skill on fun tasks says as much about someone’s ability to execute complex research in the real world as any job talk or publication in a peer-reviewed journal. Done well, these hobby projects can even evolve into rewarding enterprises of their own. Just ask Nate Silver, who kickstarted his now-prodigious career as a statistical forecaster with PECOTA, a baseball forecasting system that he ginned up for fun while working for pay as a consultant.

I suspect that a lot of people in the private sector already get this. Academia, not so much, but then they’re the ones who wind up poorer for it.

Forecasting Politics Is Still Hard to Do (Well)

Last November, after the U.S. elections, I wrote a thing for Foreign Policy about persistent constraints on the accuracy of statistical forecasts of politics. The editors called it “Why the World Can’t Have a Nate Silver,” and the point was that much of what people who follow international affairs care about is still a lot harder to forecast accurately than American presidential elections.

One of the examples I cited in that piece was Silver’s poor performance on the U.K.’s 2010 parliamentary elections. Just two years before his forecasts became a conversation piece in American politics, the guy the Economist called “the finest soothsayer this side of Nostradamus” missed pretty badly in what is arguably another of the most information-rich election environments in the world.

A couple of recent election-forecasting efforts only reinforce the point that, the Internet and polling and “math” notwithstanding, this is still hard to do.

The first example comes from political scientist Chris Hanretty, who applied a statistical model to opinion polls to forecast the outcome of Italy’s parliamentary elections. Hanretty’s algorithm indicated that a coalition of center-left parties was virtually certain to win a majority and form the next government, but that’s not what happened. After the dust had settled, Hanretty sifted through the rubble and concluded that “the predictions I made were off because the polls were off.”

Had the exit polls given us reliable information, I could have made an instant prediction that would have been proved right. As it was, the exit polls were wrong, and badly so. This, to me, suggests that the polling industry has made a collective mistake.

The second recent example comes from doctoral candidate Ken Opalo, who used polling as grist for a statistical mill to forecast the outcome of Kenya’s presidential election. Ken’s forecast indicated that Uhuru Kenyatta would get the most votes but would fall short of the 50-percent-plus-one-vote required to win in the first round, making a run-off “almost inevitable.” In fact, Kenyatta cleared the 50-percent threshold in the first try, making him Kenya’s new president-elect. Once again, noisy polling data was apparently to blame. As Ken noted in a blog post before the results were finalized,

Mr. Kenyatta significantly outperformed the national polls leading to the election. I estimated that the national polls over-estimated Odinga’s support by about 3 percentage points. It appears that I may have underestimated their overestimation. I am also beginning to think that their regional weighting was worse than I thought.

As I see it, both of these forecasts were, as Nate Silver puts it in his book, wrong for the right reasons. Both Hanretty and Opalo built models that used the best and most relevant information available to them in a thoughtful way, and neither forecast was wildly off the mark. Instead, it just so happened that modest errors in the forecasts interacted with each country’s electoral rules to produce categorical outcomes that were quite different from the ones the forecasts had led us to expect.
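A little arithmetic shows how a modest miss can flip a categorical outcome. The Python sketch below assumes the forecast error in a candidate’s vote share is roughly normal and asks how likely the candidate is to clear a threshold. The function name and the numbers are hypothetical; they are not the figures from either forecast.

```python
import math


def prob_clears_threshold(mean_share, sd, threshold=50.0):
    """P(share > threshold) if share ~ Normal(mean_share, sd)."""
    z = (threshold - mean_share) / sd
    # Upper-tail probability of a standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical forecast: 48 percent expected, 2-point standard deviation.
p = prob_clears_threshold(48.0, 2.0)
print(round(p, 3))
```

Under those made-up numbers, the candidate still clears 50 percent roughly 16 percent of the time, so a first-round win would be a surprise but hardly a shock.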

But that’s the rub, isn’t it? Even in the European Union in the Internet age, it’s still hard to predict the outcome of national elections. We’re getting smarter about how to model these things, and our computers can now process more of the models we can imagine, but polling data are still noisy and electoral systems complex.

And that’s elections, where polling data nicely mimic the data-generating process that underlies the events we’re trying to forecast. We don’t have polls telling us what share of the population plans to turn out for anti-government demonstrations or join a rebel group or carry out a coup—and even if we did, we probably wouldn’t trust them. Absent these micro-level data, we turn to proxy measures and indicators of structural opportunities and constraints, but every step away from the choices we’re trying to forecast adds more noise to the result. Agent-based computational models represent a promising alternative, but when it comes to macro-political phenomena like revolutions and state collapses, these systems are still in their infancy.

Don’t get me wrong. I’m thrilled to see more people using statistical models to try to forecast important events in international politics, and I would eagerly pit the forecasts from models like Hanretty’s and Opalo’s against the subjective judgments of individual experts any day. I just think it’s important to avoid prematurely declaring the arrival of a revolution in forecasting political events, to keep reminding ourselves how hard this problem still is. As if the (in)accuracy of our forecasts would let us have it any other way.

Coup Risk in 2013, Mapped My Way

This blog’s gotten a lot more traffic than usual since yesterday, when Max Fisher of the Washington Post called out my 2013 coup forecasts in a post on WorldViews.

I’m grateful for the attention Max has drawn to my work, but if it had been up to me, I would have done the mapping a little differently. As I said to Max in an email from which he later excerpted, the forecasts simply aren’t sharp enough to parse the world as finely as their map did. Our theories of what causes coup attempts are too fuzzy and our measures of the things in those theories are too spotty to estimate the probability of these rare events with that much precision.

But, hey, I’m a data guy. I don’t have to stick to grumbling about the Post’s map; I can make my own! So…

The map below sorts the countries of the world into three groups based on their relative coup risk for 2013: highest (red), moderate (orange), and lowest (beige). I emphasize “relative” because coup attempts are very rare, so the estimated risk of coup attempts in any given country in any single year is pretty small. For example, Guinea-Bissau tops my list for 2013, and the estimated probability of at least one coup attempt occurring there this year is only 25%. Most countries worldwide are under 2%.

Consistent with an emphasis on relative risk, the categories I’ve mapped are based on rank order, not predicted probability. The riskiest fifth of the world (33 countries) makes up the “highest” group, the second fifth the “moderate” group, and the bottom three-fifths the “lowest” group.
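In code, that rank-based binning amounts to something like the following. This is a Python sketch, not the code behind the map; the function name, the cut-point rule (integer division), and the toy scores are assumptions for illustration.

```python
def risk_groups(scores):
    """Assign each country to a group by rank, not by predicted probability.

    scores: dict mapping country -> estimated risk.
    Top fifth -> "highest", second fifth -> "moderate", rest -> "lowest".
    Ties are broken by input order because Python's sort is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    cut_high = n // 5          # end of the riskiest fifth
    cut_mod = (2 * n) // 5     # end of the second fifth
    groups = {}
    for i, country in enumerate(ranked):
        if i < cut_high:
            groups[country] = "highest"
        elif i < cut_mod:
            groups[country] = "moderate"
        else:
            groups[country] = "lowest"
    return groups


# Made-up scores for ten placeholder countries.
toy_scores = {"A": 0.25, "B": 0.10, "C": 0.05, "D": 0.04, "E": 0.03,
              "F": 0.02, "G": 0.015, "H": 0.01, "I": 0.005, "J": 0.001}
groups = risk_groups(toy_scores)
```

Because only the ordering matters, rescaling all of the scores would leave the groups unchanged, which fits the emphasis on relative rather than absolute risk.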

This forecasting process doesn’t have enough of a track record for me to say exactly how those categories relate to real-world risk, but based on my experience working with similar data and models, I would expect roughly four of every five coup attempts to occur in countries identified here as high risk, and the occasional “miss” to come from the moderate-risk set. Only very rarely should coup attempts come from the 100 or so countries in the low-risk group.


FTR, this map was made in R using the ‘rworldmap’ package.

