Road-Testing GDELT as a Resource for Monitoring Atrocities

As I said here a few weeks ago, I think the Global Dataset on Events, Location, and Tone (GDELT) is a fantastic new resource that really embodies some of the ways in which technological changes are coming together to open lots of new doors for social-scientific research. GDELT’s promise is obvious: more than 200 million political events from around the world over the past 30 years, all spotted and coded by well-trained software instead of the traditional armies of undergrad RAs, and with daily updates coming online soon. Or, as Adam Elkus’ t-shirt would have it, “200 million observations. Only one boss.”

BUT! Caveat emptor! Like every other data-collection effort ever, GDELT is not alchemy, and it’s important that people planning to use the data, or even just to consume analysis based on it, understand what its limitations are.

I’m starting to get a better feel for those limitations from my own efforts to use GDELT to help observe atrocities around the world, as part of a consulting project I’m doing for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide. The core task of that project is to develop plans for a public early-warning system that would allow us to assess the risk of onsets of atrocities in countries worldwide more accurately and earlier than current practice.

When I heard about GDELT last fall, though, it occurred to me that we could use it (and similar data sets in the pipeline) to support efforts to monitor atrocities as well. The CAMEO coding scheme on which GDELT is based includes a number of event types that correspond to various forms of violent attack, along with other variables indicating who was attacking whom. If we could develop a filter that reliably pulled events of interest to us from the larger stream of records, we could produce something like a near-real-time bulletin on recent violence against civilians around the world. Our record would surely have some blind spots (GDELT only tracks a limited number of news sources, and some atrocities just don’t get reported, period), but I thought it would reliably and efficiently alert us to new episodes of violence against civilians and help us identify trends in ongoing ones.

Well, you know what they say about plans and enemies and first contact. After digging into GDELT, I still think we can accomplish those goals, but it’s going to take more human effort than I originally expected. Put bluntly, GDELT is noisier than I had anticipated, and for the time being the only way I can see to sharpen that signal is to keep a human in the loop.

Imagine (fantasize?) for a moment that there’s a perfect record somewhere of all the political interactions GDELT is trying to identify. For kicks, let’s call it the Encyclopedia Eventum (EE). Like any detection system, GDELT can mess up in two basic ways: 1) errors of omission, in which GDELT fails to spot something that’s in the EE; and 2) errors of commission, in which it mistakenly records an event that isn’t in the EE (or, relatedly, is in the EE but in a different place). We might also call these false negatives and false positives, respectively.

At this point, I can’t say anything about how often GDELT is making errors of omission, because I don’t have that Encyclopedia Eventum handy. A more realistic strategy for assessing the rate of errors of omission would involve comparing a subset of GDELT to another event data set that’s known to be a fairly reliable measure for some time and place of something GDELT is meant to track—say, protest and coercion in Europe—and see how well they match up, but that’s not a trivial task, and I haven’t tried it yet.
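To make that comparison strategy concrete, here is a minimal sketch in Python of how an omission rate might be estimated against a trusted reference list. Everything in it is an assumption for illustration: the `omission_rate` function, the matching rule (same country, same event type, dates within one day), and the handful of made-up events. A real comparison would need much more careful matching than this.

```python
from datetime import date


def omission_rate(reference, candidate, window_days=1):
    """Share of reference events with no match in the candidate data.

    Events are (day, country, event_type) tuples. Two events "match" when
    country and type are equal and the dates differ by at most window_days.
    This matching rule is an illustrative assumption, not an established standard.
    """
    if not reference:
        return 0.0
    missed = 0
    for ref_day, ref_country, ref_type in reference:
        matched = any(
            country == ref_country
            and event_type == ref_type
            and abs((day - ref_day).days) <= window_days
            for day, country, event_type in candidate
        )
        if not matched:
            missed += 1
    return missed / len(reference)


# Made-up events standing in for a trusted reference data set.
reference_events = [
    (date(2011, 3, 1), "FRA", "protest"),
    (date(2011, 3, 5), "DEU", "protest"),
    (date(2011, 3, 9), "ITA", "coercion"),
    (date(2011, 3, 12), "ESP", "protest"),
]

# Made-up events standing in for the machine-coded data.
machine_events = [
    (date(2011, 3, 2), "FRA", "protest"),    # one day off, still inside the window
    (date(2011, 3, 9), "ITA", "coercion"),
    (date(2011, 3, 20), "POL", "protest"),   # not in the reference list at all
]

rate = omission_rate(reference_events, machine_events)
print(f"Estimated omission rate: {rate:.2f}")
```

Note that the third machine-coded event has no counterpart in the reference list. That does not affect the omission rate, but it is exactly the kind of record that would count as a possible error of commission.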

Instead, the noise I’m seeing is on the other side of that coin: the errors of commission, or false positives. Here’s what I mean:

To start developing my atrocities-monitoring filter, I downloaded the reduced and compressed version of GDELT recently posted on the Penn State Event Data Project page and pulled the tab-delimited text files for a couple of recent years. I’ve worked with event data before, so I’m familiar with basic issues in their analysis, but every data set has its own idiosyncrasies. After trading emails with a few CAMEO pros and reading Jay Yonamine’s excellent primer on event aggregation strategies, I started tinkering with a function in R that would extract the subset of events that appeared to involve lethal force against civilians. That function would involve rules to select on three features: event type, source (the doer), and target.

  • Event Type. For observing atrocities, type 20 (“Engage in Unconventional Mass Violence”) was an obvious choice. Based on advice from those CAMEO pros, I also focused on 18 (“Assault”) and 19 (“Fight”) but was expecting that I would need to be more restrictive about the subtypes, sources, and targets in those categories to avoid errors of commission.
  • Source. I’m trying to track violence by state and non-state agents, so I focused on GOV (government), MIL (military), COP (police), and SPY (intelligence agencies) for the former and REB (militarized opposition groups) and SEP (separatist groups) for the latter. The big question mark was how to handle records with just a country code (e.g., “SYR” for Syria) and no indication of the source’s type. My CAMEO consultants told me these would usually refer in some way to the state, so I should at least consider including them.
  • Target. To identify violence against civilians, I figured I would get the most mileage out of the OPP (non-violent political opposition), CVL (“civilians,” people in general), and REF (refugees) codes, but I wanted to see if the codes for more specific non-state actors (e.g., LAB for labor, EDU for schools or students, HLH for health care) would also help flag some events of interest.

After tinkering with the data a bit, I decided to write two separate functions, one for events with state perpetrators and another for events with non-state perpetrators. If you’re into that sort of thing, you can see the state-perpetrator version of that filtering function on GitHub, here.
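For readers who would rather see the three rules than read about them, here is a rough Python sketch of a state-perpetrator filter. It is not the R function linked above. The column names (`date`, `source`, `target`, `event_code`), the function names, and the sample rows are all assumptions made for illustration, and the sketch assumes event codes are stored as text with any leading zeros intact and that actor codes are built from three-letter chunks.

```python
import csv
import io

VIOLENT_ROOT_CODES = {"18", "19", "20"}      # Assault, Fight, Unconventional Mass Violence
STATE_ROLES = {"GOV", "MIL", "COP", "SPY"}
NONSTATE_ARMED_ROLES = {"REB", "SEP"}
CIVILIAN_ROLES = {"OPP", "CVL", "REF"}
# Partial list of role codes, used only to tell a bare country code from a bare role.
KNOWN_ROLES = STATE_ROLES | NONSTATE_ARMED_ROLES | CIVILIAN_ROLES


def chunks(actor_code):
    """Split an actor code into three-letter pieces: 'SYRCVL' -> ['SYR', 'CVL']."""
    return [actor_code[i:i + 3] for i in range(0, len(actor_code), 3)]


def is_state_attack_on_civilians(row, include_bare_country=False):
    """Apply the event-type, source, and target rules to one record."""
    if row["event_code"][:2] not in VIOLENT_ROOT_CODES:
        return False
    source = chunks(row["source"])
    target = chunks(row["target"])
    state_source = any(part in STATE_ROLES for part in source)
    if include_bare_country and not state_source:
        # Optionally treat a lone country code (e.g. "SYR") as the state.
        state_source = len(source) == 1 and source[0] not in KNOWN_ROLES
    civilian_target = any(part in CIVILIAN_ROLES for part in target)
    return state_source and civilian_target


# Illustrative tab-delimited rows; the layout is hypothetical, not GDELT's real file format.
SAMPLE = (
    "date\tsource\ttarget\tevent_code\n"
    "20110108\tUSAGOV\tUSAMED\t193\n"
    "20110109\tUSAMIL\tVNMCVL\t202\n"
    "20110411\tRUSSPY\tPOLCVL\t202\n"
    "20110610\tGOV\tSYRCVL\t202\n"
    "20110301\tSYR\tSYRCVL\t190\n"
    "20110302\tUSAGOV\tRUSGOV\t042\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
strict = [r for r in rows if is_state_attack_on_civilians(r)]
loose = [r for r in rows if is_state_attack_on_civilians(r, include_bare_country=True)]
print(len(strict), "records with the strict rule;", len(loose), "with bare country codes included")
```

The `include_bare_country` switch mirrors the open question about records that carry only a country code. Keeping that choice as a parameter makes it easy to see how much it changes the result. This sketch checks only the three broad target codes, so a record aimed at a more specific group (such as MED for media) does not pass.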

When I ran the more than 9 million records in the “2011.reduced.txt” file through that function, I got back 2,958 events. So far, so good. As soon as I started poking around in the results, though, I saw a lot of records that looked wacky. The current release of GDELT doesn’t include text from or links to the source material, so it’s hard to say for sure what real-world event any one record describes. Still, some of the perpetrator-and-target combos looked odd to me, and web searches for relevant stories either came up empty or reinforced my suspicions that the records were probably errors of commission. Here are a few examples, showing the date, event type, source, and target:

  • 1/8/2011 193 USAGOV USAMED. Type 193 is “Fight with small arms and light weapons,” but I don’t think anyone from the U.S. government actually got in a shootout or knife fight with American journalists that day. In fact, that event-source-target combination popped up a lot in my subset.
  • 1/9/2011 202 USAMIL VNMCVL. Taken on its face, this record says that U.S. military forces killed Vietnamese civilians on January 9, 2011. My hunch is that the story on which this record is based was actually talking about something from the Vietnam War.
  • 4/11/2011 202 RUSSPY POLCVL. This record seems to suggest that Russian intelligence agents “engaged in mass killings” of Polish civilians in central Siberia two years ago. I suspect the story behind this record was actually talking about the Katyn Massacre and associated mass deportations that occurred in April 1940.

That’s not to say that all the records looked wacky. Interleaved with these suspicious cases were records representing exactly the kinds of events I was trying to find. For example, my filter also turned up a 202 GOV SYRCVL for June 10, 2011, a day on which one headline blared “Dozens Killed During Syrian Protests.”

Still, it’s immediately clear to me that GDELT’s parsing process is not quite at the stage where we can peruse the codebook like a menu, identify the morsels we’d like to consume, phone our order in, and expect to have exactly the meal we imagined waiting for us when we go to pick it up. There’s lots of valuable information in there, but there’s plenty of chaff, too, and for the time being it’s on us as researchers to take time to try to sort the two out. This sorting will get easier to do if and when the posted version adds information about the source article and relevant text, but “easier” in this case will still require human beings to review the results and do the cross-referencing.

Over time, researchers who work on specific topics (like atrocities, or interstate war, or protest activity in specific countries) will probably be able to develop supplemental coding rules and tweak their filters to automate some of what they learn. I’m also optimistic that the public release of GDELT will accelerate improvements in the software and dictionaries it uses, expanding its reach while shrinking the error rates. In the meantime, researchers are advised to stick to the same practices they’ve always used (or should have, anyway): take time to get to know your data; parse it carefully; and, when there’s no single parsing that’s obviously superior, check the sensitivity of your results to different permutations.
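One simple way to run that kind of sensitivity check is to count how many records survive under each combination of plausible filter rules. The Python sketch below does that on a few invented records. The option labels, the `count_matches` helper, and the shortcut of reading the target's role from the last three letters of its code are all assumptions for illustration.

```python
from itertools import product

# Invented (source, target, event_code) records, for illustration only.
events = [
    ("SYRGOV", "SYRCVL", "202"),
    ("SYRMIL", "SYROPP", "190"),
    ("USAGOV", "USAMED", "193"),
    ("SDNMIL", "SDNREF", "183"),
    ("LBYGOV", "LBYCVL", "195"),
]

# Alternative rule sets to compare.
EVENT_TYPE_OPTIONS = {
    "type 20 only": {"20"},
    "types 18-20": {"18", "19", "20"},
}
TARGET_OPTIONS = {
    "CVL only": {"CVL"},
    "CVL, OPP, REF": {"CVL", "OPP", "REF"},
}


def count_matches(records, root_codes, target_roles):
    """Count records whose root event code and target role both pass the rules."""
    return sum(
        1
        for _source, target, code in records
        if code[:2] in root_codes and target[-3:] in target_roles
    )


# One count per combination of event-type rule and target rule.
counts = {
    (type_label, target_label): count_matches(
        events, EVENT_TYPE_OPTIONS[type_label], TARGET_OPTIONS[target_label]
    )
    for type_label, target_label in product(EVENT_TYPE_OPTIONS, TARGET_OPTIONS)
}

for (type_label, target_label), n in counts.items():
    print(f"{type_label:>13} | {target_label:<14} | {n} records")
```

If the counts (or, better, the substantive conclusions drawn from them) swing widely from one row to the next, that is a sign the choice of filter deserves a closer look before the results go anywhere.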

PS. If you have any suggestions on how to improve the code I’m using to spot potential atrocities or otherwise improve the monitoring process I’ve described, please let me know. That’s an ongoing project, and even marginal improvements in the fidelity of the filter would be a big help.

PPS. For more on these issues and the wider future of automated event coding, see this ensuing post from Phil Schrodt on his blog.

How Not to Help a Popular Uprising and Stop Mass Atrocities

More than 5,000 people have been killed and many thousands more detained and sometimes tortured since a nonviolent uprising began in Syria in March 2011. The regime’s sustained brutality in response to this popular challenge clearly deserves to be called a mass killing, and the killing machine so far shows no signs of abating.

Most people can’t witness atrocities on this scale without at least thinking about what might be done to stop them. That impulse has already led foreign governments to take a number of concrete actions to try to punish the perpetrators and protect Syrian civilians. The United States, the European Union, Canada, Turkey, and (most significant) the Arab League have all imposed tough sanctions on the Syrian regime, and those sanctions seem to be taking a real toll. In late December, the Arab League sent a team of observers to Syria to monitor the government’s treatment of nonviolent protesters. So far, that mission seems to be having little or no effect, but the mere fact of an Arab League mission to stop one of its member governments from killing its own people marks an important shift in the region’s international relations.

What hasn’t yet happened, of course, is direct foreign military intervention, at least not at any significant scale. Some elements of the Syrian opposition have called on foreign powers to establish a “no-fly zone” or “safe zones” in the country, and some commentators have called for a “Libyan-style liberation” with U.N. backing, but China and Russia so far have spoiled attempts to pass a Security Council resolution that would legitimate that kind of action.

Of course, the absence of a U.N. resolution doesn’t mean that more forceful intervention can’t happen. In fact, according to a recent report by Josh Rogin on his blog for Foreign Policy, the Obama administration is already “quietly preparing options” to provide more direct support to the Syrian opposition. “After imposing several rounds of financial sanctions on Syrian regime leaders, the focus is now shifting to assisting the opposition directly,” Rogin writes. Among the options reportedly under consideration are…

…establishing a humanitarian corridor or safe zone for civilians in Syria along the Turkish border, extending humanitarian aid to the Syrian rebels, providing medical aid to Syrian clinics, engaging more with the external and internal opposition, forming an international contact group, or appointing a special coordinator for working with the Syrian opposition (as was done in Libya).

I’m a political scientist, not a foreign-policy pro, but my understanding of the politics of authoritarian rule tells me that this kind of prolonged mumbling about maybe intervening, a little bit, sometime soon might just be the worst thing a foreign government can do to try to help an opposition like Syria’s.

The basic problem is that a mumbled threat from a powerful adversary can be scary enough to provoke a response without actually doing anything concrete to help the opposition it’s meant to support. For one thing, fear of future intervention can prod the regime to kill faster in hopes of ending the uprising before any intervention can happen. In what economists call a free-rider problem, hopes for foreign intervention can also lead opposition groups to husband their own resources, thereby diminishing the chances that the revolution will succeed without substantial foreign support. Under certain conditions, the hanging threat of intervention can even give rebel leaders “an incentive to engage in the kinds of provocative actions that make atrocities against their followers more likely in the first place.” To a foreign government hoping to protect civilian lives and catalyze the fall of a dictatorial regime, none of these is a good outcome.

My point here is not to make the case against international support for Syrian opposition groups, although I do have serious doubts about the immediate and long-term effects of foreign military intervention [as discussed in this subsequent post]. And lest there be any doubt: attempts to establish a “no-fly zone,” “safe zones,” or “humanitarian corridors” in Syria would necessarily involve large-scale and risky military operations.

Instead, my point is that vague threats of future action are probably doing more harm than good, so they should stop. Cheap talk may be just that for the talkers, but it can actually be pretty costly to some of the bystanders. For foreign governments that want to see the atrocities against Syrian protesters end, it would be better to hurry up and make a credible threat of decisive action, or to signal clearly that the international cavalry isn’t going to arrive any time soon.

Not Everyone Likes Obama’s Atrocities Prevention Board

Last week, I blogged about President Obama’s new directive identifying the prevention of mass atrocities as a “core national security interest” and establishing an Atrocities Prevention Board to develop and coordinate the administration’s responses to situations where mass killing may occur (link). On Foreign Policy’s web site yesterday, Celeste Ward Gventer offered a dissenting view on this initiative (“Interventionism Run Amok”). She worries that the prioritization of atrocities prevention as a “core national security interest” will pull the U.S. into more military interventions in complex conflicts it mistakenly views through the narrow lens of humanitarian concerns, and, to make matters worse, that those interventions will be ineffective.

I suppose that outcome is possible, but I gather that the point of the president’s Board is precisely to avoid that fate. With more lead time, more coordination, and more creative thinking about ways to discourage atrocities, I think the president hopes to help shrink the odds that future conflicts will turn toward mass killing so that the question of U.S. military intervention does not even come up. In corporate-speak, I think the goal is to try smarter, not harder. The president’s initial reluctance to intervene with force in Libya and the early rejection of U.S. military action as an option in Syria tell me that this administration is looking to avoid gunboat preventionism, not to embrace it.

Can the U.S. Prevent Mass Atrocities? Obama Apparently Thinks So

According to today’s New York Times (link), President Obama is set to issue a presidential directive establishing a new inter-agency panel to help the U.S. government try to prevent mass atrocities. Drawing “officials from the White House, the State Department, the Pentagon and other agencies,” this Atrocities Prevention Board is expected to develop “an early-warning system of potential genocide and other politically driven humanitarian catastrophes” and to come up with “a range of American responses” to those events.

It’s easy to be cynical about the endless proliferation of committees in government, but I think this board’s creation is a significant step. If it spurs improvements in early warning and streamlines the link between warning and preventive action, the board’s existence really could make the U.S. government’s efforts to prevent atrocities more effective. By signaling that the president considers atrocities prevention to be a high priority, the board’s creation could also motivate managers in relevant agencies to devote more resources to the problem.

I don’t know anything about the internal machinations behind it, but I presume the decision to create this new board was influenced by the work of the Genocide Prevention Task Force, a high-level panel convened in 2007 by the U.S. Holocaust Memorial Museum, the American Academy of Diplomacy, and the U.S. Institute of Peace with funding from private foundations. The Genocide Prevention Task Force issued a final report in late 2008 that was chock full of smart recommendations for American policy-makers, and at the top of the task force’s list was the creation of “a new standing interagency mechanism for analysis of threats and coordination of appropriate preventive action.” The Atrocities Prevention Board that President Obama is set to establish would seem to be exactly that.

I wonder if the president’s decision was also catalyzed by current events in Syria, where the Assad regime in the past week has accelerated its killings of unarmed protesters in Hama (link), the focal point of mass atrocities in 1982 in which more than 10,000 Syrians were killed. At this stage, there seems to be little the U.S. government can do short of direct military intervention, for which there is (appropriately) no appetite. When the U.S. government is reduced to verbal jousting in the U.N. Security Council and threats to withhold visas from accused human-rights violators, you know there aren’t a lot of great options on the table. I can imagine that the resulting sense of powerlessness in an administration which has strongly endorsed the international community’s responsibility to prevent mass atrocities might motivate officials to look for ways to strengthen their hand against similar crises in the future.

One big idea embodied in this panel is the hope that early warning of impending atrocities will give the U.S. government more time to formulate and implement preventive measures, and that more lead time will make those efforts more effective. I don’t know a lot about the prevention part of that equation, but I do have experience working on the warning side. As research director for the Political Instability Task Force, I was involved in two projects that used statistical analysis to develop tools for early warning on mass killings: one led by genocide scholar Barbara Harff (link), and another in which I collaborated with Dartmouth professor Ben Valentino and SAIC statisticians Mike Lustik and Hongxia Zhu (link). Based on my experience from those projects, I expect the warning piece of this task will be difficult to do as well as policy-makers would like, but I’m also confident it can be done reasonably well. For reasons I’ve elaborated in an earlier post (here), I doubt we’ll ever be able to develop a warning method that accurately and uniquely identifies all countries at great risk of mass atrocities weeks or months in advance. With events this rare, even the most accurate forecasting methods will produce a fair amount of noise with their signals.
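The arithmetic behind that last point is worth seeing once. The Python sketch below uses Bayes’ rule to compute the share of warnings that turn out to be correct. The base rate, sensitivity, and specificity are purely hypothetical numbers chosen for illustration; they are not estimates from any of the projects mentioned here.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a flagged case is a true case, by Bayes' rule.

    base_rate:   share of all cases in which the event actually occurs
    sensitivity: share of true cases the model flags
    specificity: share of non-cases the model correctly leaves unflagged
    """
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: onsets occur in 2 percent of country-years, and the model
# catches 80 percent of them while wrongly flagging 10 percent of the rest.
ppv = positive_predictive_value(base_rate=0.02, sensitivity=0.80, specificity=0.90)
print(f"Share of warnings that are correct: {ppv:.2f}")
```

Under those assumed numbers, only about 14 percent of warnings would point to a real onset, even though the hypothetical model is quite accurate in the usual sense. The rarer the event, the more false alarms come bundled with each true one.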

Even in the already-difficult world of forecasting political crises, warning on mass atrocities has proved unusually difficult because these killings usually only happen on a large scale in situations where other forms of instability, such as civil wars or recent regime change, are already occurring. Consequently, when you try to develop a model to forecast the onset of mass killings, you mostly get results that look a lot like your models for forecasting those other events. It might be useful to tell interested audiences that general risks of political instability are their best leads on the specific risk of mass atrocities, but that’s probably not the kind of specificity they’re looking for. The two atrocities warning projects on which I’ve worked (see above) dealt with that problem by estimating statistical models from samples that only included countries which had recently experienced an onset of political instability. That strategy gives you a sharper cut on the risks of mass killing than you’d get by looking at all countries all the time, but it doesn’t give you much traction on cases like Syria today, where there was no civil war or adverse regime change before government forces began killing large numbers of unarmed protesters.

To make atrocities warnings more useful to policy-makers who are serious about taking preventive action, we need to improve on these approaches. I know of one work in progress by MIT political science Ph.D. student Chad Hazlett that tries to jettison the conditional design and shows promising results. Better data on the occurrence of atrocities could also help by opening the door to a wider array of analytical methods. On that front, I remain hopeful that PITF’s Worldwide Atrocities Event Data Set (link) will finally get discovered and used by scholars interested in this topic. And, of course, it’s always helpful to bear in mind that a warning tool doesn’t need to be perfect to be good; it just needs to work better than the ad hoc, mostly subjective approaches that are typically used now.

UPDATE: You can find the White House’s fact sheet on the president’s directive here. In that directive, President Obama finds that “preventing mass atrocities and genocide is a core national security interest and a core moral responsibility of the United States of America.” To my knowledge, this statement is unprecedented in the strength of its commitment to atrocities prevention.

UPDATE 2: For a dissenting view on this initiative and my response to it, see this later blog post.
