Challenges in Measuring Violent Conflict, Syria Edition

As part of a larger (but, unfortunately, gated) story on how the terrific new Global Data on Events, Language, and Tone (GDELT) might help social scientists forecast violent conflicts, the New Scientist recently posted some graphics using GDELT to chart the ongoing civil war in Syria. Among those graphics was this time-series plot of violent events per day in Syria since the start of 2011:

Based on that chart, the author of the story (not the producers of GDELT, mind you) wrote:

As Western leaders ponder intervention, the resulting view suggests that the violence has subsided in recent months, from a peak in the third quarter of 2012.

That inference is almost certainly wrong, and why it’s wrong underscores one of the fundamental challenges in using event data—whether it’s collected and coded by software or humans or some combination thereof—to observe the dynamics of violent conflict.

I say that inference is almost certainly wrong because concurrent data on deaths and refugees suggest that violence in Syria has only intensified in the past year. One of the most reputable sources on deaths from the war is the Syria Tracker. A screenshot of their chart of monthly counts of documented killings is shown below. Like GDELT, their data also identify a sharp increase in violence in late 2012. Unlike GDELT, their data indicate that the intensity of the violence has remained very high since then, and that’s true even though the process of documenting killings inevitably lags behind the actual violence.

We see a similar pattern in data from the U.N. High Commissioner for Refugees (UNHCR) on people fleeing the fighting in Syria. If anything, the flow of refugees has only increased in 2013, suggesting that the violence in Syria is hardly abating.

The reason GDELT’s count of violent events has diverged from other measures of the intensity of the violence in Syria in recent months is probably something called “media fatigue.” Data sets of political events generally depend on news sources to spot events of interest, and it turns out that news coverage of large-scale political violence follows a predictable arc. As Deborah Gerner and Phil Schrodt describe in a paper from the late 1990s, press coverage of sustained and intense conflicts is often high when hostilities first break out but then declines steadily thereafter. That decline can happen because editors and readers get bored, burned out, or distracted. It can also happen because the conflict gets so intense that it becomes, in a sense, too dangerous to cover. In the case of Syria, I suspect all of these things are at work.

My point here isn’t to knock GDELT, which is still recording scores or hundreds of events in Syria every day, automatically, using open-source code, and then distributing those data to the public for free. Instead, I’m just trying to remind would-be users of any data set of political events to infer with caution. Event counts are one useful way to track variation over time in political processes we care about, but they’re only one part of the proverbial elephant, and they are inevitably constrained by the limitations of the sources from which they draw. To get a fuller sense of the beast, we need as often as possible to cross-reference those event data with other sources of information. Each of the sources I’ve cited here has its own blind spots and selection biases, but a comparison of trends from all three—and, importantly, an awareness of the likely sources of those biases—is enough to give me confidence that the civil war in Syria is only continuing to intensify. That says something important about Syria, of course, but it also says something important about the risks of drawing conclusions from event counts alone.
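As a rough illustration of that kind of cross-referencing, here is a minimal Python sketch. It rescales two monthly series to their own peaks so they can be compared despite different units, then flags the months where the second indicator sits well above the event count. The function names, the `gap` threshold, and the numbers are all hypothetical; the toy values are invented placeholders, not figures from GDELT, Syria Tracker, or UNHCR.

```python
from typing import List


def rescale_to_peak(series: List[float]) -> List[float]:
    """Express each value as a share of the series' own maximum."""
    peak = max(series)
    if peak <= 0:
        raise ValueError("series needs at least one positive value")
    return [value / peak for value in series]


def divergent_periods(events: List[float], other: List[float],
                      gap: float = 0.25) -> List[int]:
    """Return indices where the other indicator, rescaled to its peak,
    exceeds the rescaled event count by more than `gap`.

    `gap` is an arbitrary illustrative threshold, not an established one.
    """
    if len(events) != len(other):
        raise ValueError("series must cover the same periods")
    scaled_events = rescale_to_peak(events)
    scaled_other = rescale_to_peak(other)
    return [
        i
        for i, (e, o) in enumerate(zip(scaled_events, scaled_other))
        if o - e > gap
    ]


# Invented placeholder numbers for six consecutive months (NOT real data).
toy_events = [100, 200, 400, 300, 200, 150]        # event counts
toy_deaths = [500, 1500, 4000, 4200, 4100, 4300]   # documented killings

print(divergent_periods(toy_events, toy_deaths))
```

A flagged month does not say which series is wrong. It only marks where the sources disagree enough that their respective biases deserve a closer look.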

PS. For a great discussion of other sources of bias in the study of political violence, see Stathis Kalyvas’ 2004 essay on “The Urban Bias in Research on Civil Wars” (PDF).

17 Comments

by Jay Ulfelder on May 16, 2013

Posted in Mass Atrocities, Methods, Uncategorized, Violent Conflict

Tagged Deborah Gerner, GDELT, New Scientist, Phil Schrodt, Stathis Kalyvas, syria

https://dartthrowingchimp.wordpress.com/2013/05/16/challenges-in-measuring-violent-conflict-syria-edition/

Alex
/ May 16, 2013

The social movements literature has a lot of work on news bias in protest coverage as well. The Jennifer Earl et al. review is informative: http://www.annualreviews.org/doi/abs/10.1146/annurev.soc.30.012703.110603

- dartthrowingchimp
  / May 16, 2013
  
  As it happens, most of my own work with event data has focused on nonviolent mobilization, so I’m really glad you brought this up. I probably should have made the connection explicit in the post, but then I haven’t kept pace with the literature since grad school, so I wasn’t sure where to point. Problem solved with your link, so…thank you!
  
Oral Hazard
/ May 16, 2013

To use an extreme example, I wonder if the GDELT collection would record the Hiroshima bomb as a single violent event in Japan. In more realistic terms, one marketplace car bomb with 50 fatalities = one downing of a regime aircraft.

- dartthrowingchimp
  / May 16, 2013
  
  I think the simple answer is yes, it would. That said, you would get some indication of the intensity of the event from duplicate records. Duplicates have traditionally been considered a bad thing in event data sets, but if I understand the coding process correctly, GDELT acknowledges the inevitability of duplicates and tries instead to treat them as a signal for this very purpose.
  
  From my conversations with Phil and Kalev, I also gather that they are working on a way to pull casualty counts from stories as well, so in the future you could use event counts, casualty counts, or some combination thereof as an indicator of conflict intensity. That would partially address your concern about the comparability of individual events. It would not address the deeper issue of selection bias in reporting, including the “media fatigue” problem I mentioned here.
  
  - Felix Haass
    / May 16, 2013
    
    Sorry, I’m late to this, but I just saw the New Scientist article today. I would definitely agree with your conclusion that the violence in Syria has in fact NOT receded. If you combine the insights from the two datasets (GDELT and Syria Tracker), it could even lead to the conclusion that the conflict has become more brutal. Essentially, we’re observing a high number of killings (since August 2012, Syria Tracker) while there has been a decline in the number of reported events (GDELT) during the same time. This suggests a) media fatigue (as you have pointed out, Jay) and/or b) rebels and government troops are killing more people in the same events, i.e. through mass killings, which would be an indicator of a brutalization of the war (and which is also in line with the increasing refugee numbers you provided).
    
    I’m pretty sure it’s mostly a) but to reject b) we would need data on how many people were killed at a given event. I’m not familiar with the Syria Tracker data, so I don’t know if that data is in there. Anybody?
    
    Also, to assess GDELT’s accuracy of location, it would be interesting to see how the two data sets compare in terms of pinning down the location of violence. By simple visual comparison they seem pretty similar, but again, numbers would be great. The New Scientist data is not available on the net, right? Otherwise, the numbers could be crunched comparatively quickly. (I know, technically I could reconstruct the New Scientist data from GDELT, but maybe I’ve just missed it.)
  - dartthrowingchimp
    / May 16, 2013
    
    Great point about the possibility that both trends (fewer events, more deaths) could be “true” at the same time, Felix. Yet another example of the complexity of measuring violence.
    
    On comparing event locations across Syria Tracker and GDELT, it looks like Leetaru and Schrodt gave New Scientist some data that’s not yet in the public posting, which ends in mid-2012. I think their plan is to release a version that goes to the present and start posting daily updates very soon, but last time I checked, they weren’t up yet.
rationalinsurgent
/ May 16, 2013

You’re so awesome.

- dartthrowingchimp
  / May 16, 2013
  
  (blushing) Thanks, Erica!
  
Grant
/ May 17, 2013

Even if they were correct that violence has been receding (which they aren’t), it would not necessarily mean that the war was in any way ‘subsiding’. It could just as easily have meant that there was a brief pause in fighting as forces geared up for offensives, or that after failures to take Damascus the rebels were focusing on softer targets that might not lead to as many immediate casualties, or that an armed force was having trouble moving into certain areas because a vital bridge was out, or something of the sort.

In other words the analysis failed in two completely different ways. They didn’t recognize the problem of the data they were using and they didn’t remember that numbers without context are worthless.

Bear Braumoeller
/ May 19, 2013

The more I look at this graph, the more I don’t get it. Unfortunately, I can’t get to the original article. But I’ve pulled down the publicly available GDELT data (through mid-2012) and restricted it to actor1 or actor2 = Syria, which should be about as generous an inclusion rule as possible… and even when I weight the data by the Goldstein scale, I don’t see any days with 250-500 violent events. The worst (scaled) days clock in at around 120.

- dartthrowingchimp
  / May 20, 2013
  
  It’s hard to know what’s going on because the data since mid-2012 aren’t up yet, but I wonder if they used a shape file to pull events with geocoordinates inside Syria that didn’t have a “SYR” tag in the actor fields.
  