Challenges in Measuring Violent Conflict, Syria Edition

As part of a larger (but, unfortunately, gated) story on how the terrific new Global Database of Events, Language, and Tone (GDELT) might help social scientists forecast violent conflicts, the New Scientist recently posted some graphics using GDELT to chart the ongoing civil war in Syria. Among those graphics was this time-series plot of violent events per day in Syria since the start of 2011:

[Chart: violent events per day in Syria, via New Scientist]

Based on that chart, the author of the story (not the producers of GDELT, mind you) wrote:

As Western leaders ponder intervention, the resulting view suggests that the violence has subsided in recent months, from a peak in the third quarter of 2012.

That inference is almost certainly wrong, and why it’s wrong underscores one of the fundamental challenges in using event data—whether it’s collected and coded by software or humans or some combination thereof—to observe the dynamics of violent conflict.

I say that inference is almost certainly wrong because concurrent data on deaths and refugees suggest that violence in Syria has only intensified in the past year. One of the most reputable sources on deaths from the war is the Syria Tracker. A screenshot of their chart of monthly counts of documented killings is shown below. Like GDELT, their data also identify a sharp increase in violence in late 2012. Unlike GDELT, their data indicate that the intensity of the violence has remained very high since then, and that’s true even though the process of documenting killings inevitably lags behind the actual violence.

[Chart: Syria Tracker monthly counts of documented killings]

We see a similar pattern in data from the U.N. High Commissioner for Refugees (UNHCR) on people fleeing the fighting in Syria. If anything, the flow of refugees has only increased in 2013, suggesting that the violence in Syria is hardly abating.

[Chart: UNHCR data on refugees fleeing Syria]

The reason GDELT’s count of violent events has diverged from other measures of the intensity of the violence in Syria in recent months is probably something called “media fatigue.” Data sets of political events generally depend on news sources to spot events of interest, and it turns out that news coverage of large-scale political violence follows a predictable arc. As Deborah Gerner and Phil Schrodt describe in a paper from the late 1990s, press coverage of a sustained, intense conflict is often high when hostilities first break out but then declines steadily thereafter. That decline can happen because editors and readers get bored, burned out, or distracted. It can also happen because the conflict gets so intense that it becomes, in a sense, too dangerous to cover. In the case of Syria, I suspect all of these things are at work.

My point here isn’t to knock GDELT, which is still recording scores or hundreds of events in Syria every day, automatically, using open-source code, and then distributing those data to the public for free. Instead, I’m just trying to remind would-be users of any data set of political events to infer with caution. Event counts are one useful way to track variation over time in political processes we care about, but they’re only one part of the proverbial elephant, and they are inevitably constrained by the limitations of the sources from which they draw. To get a fuller sense of the beast, we need as often as possible to cross-reference those event data with other sources of information. Each of the sources I’ve cited here has its own blind spots and selection biases, but a comparison of trends from all three—and, importantly, an awareness of the likely sources of those biases—is enough to give me confidence that the civil war in Syria is only continuing to intensify. That says something important about Syria, of course, but it also says something important about the risks of drawing conclusions from event counts alone.

PS. For a great discussion of other sources of bias in the study of political violence, see Stathis Kalyvas’ 2004 essay on “The Urban Bias in Research on Civil Wars” (PDF).



  1. The social movements literature has a lot of work on news bias in protest coverage as well. The Jennifer Earl et al. review is informative.

    • As it happens, most of my own work with event data has focused on nonviolent mobilization, so I’m really glad you brought this up. I probably should have made the connection explicit in the post, but then I haven’t kept pace with the literature since grad school, so I wasn’t sure where to point. Problem solved with your link, so…thank you!

  2. Oral Hazard (May 16, 2013)

    To use an extreme example, I wonder if the GDELT collection would record the Hiroshima bomb as a single violent event in Japan. In more realistic terms, one marketplace car bomb with 50 fatalities = one downing of a regime aircraft.

    • I think the simple answer is yes, it would. That said, you would get some indication of the intensity of the event from duplicate records. Duplicates have traditionally been considered a bad thing in event data sets, but if I understand the coding process correctly, GDELT acknowledges the inevitability of duplicates and tries instead to treat duplication as a signal for this very purpose.

      From my conversations with Phil and Kalev, I also gather that they are working on a way to pull casualty counts from stories as well, so in the future you could use event counts, casualty counts, or some combination thereof as an indicator of conflict intensity. That would partially address your concern about the comparability of individual events. It would not address the deeper issue of selection bias in reporting, including the “media fatigue” problem I mentioned here.
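To make the duplicates-as-signal idea concrete, here is a minimal Python sketch of how one might collapse near-duplicate records into a single event and use the number of collapsed records as a crude intensity score. The field names (date, event_code, lat, lon) and the sample records are simplified illustrations, not GDELT’s actual schema or data:

```python
from collections import Counter

def intensity_by_day(records):
    """Count records per (day, event type, rounded location) key, so an
    event reported by many outlets scores higher than one reported once."""
    keys = [
        (r["date"], r["event_code"], round(r["lat"], 1), round(r["lon"], 1))
        for r in records
    ]
    return Counter(keys)

# Hypothetical records: three reports of one Damascus-area event plus one
# report of a separate event. Codes and coordinates are illustrative only.
records = [
    {"date": "2012-08-01", "event_code": "190", "lat": 33.51, "lon": 36.29},
    {"date": "2012-08-01", "event_code": "190", "lat": 33.52, "lon": 36.31},
    {"date": "2012-08-01", "event_code": "190", "lat": 33.49, "lon": 36.28},
    {"date": "2012-08-01", "event_code": "195", "lat": 36.20, "lon": 37.13},
]
counts = intensity_by_day(records)  # the first event gets a score of 3
```

Rounding coordinates before keying is doing the deduplication work here; how coarsely to round is exactly the kind of judgment call that makes duplicate detection in event data hard.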

      • Sorry, I’m late to this, but I just saw the New Scientist article today. I would definitely agree with your conclusion that the violence in Syria has in fact NOT receded. If you combine the insights from the two datasets (GDELT and Syria Tracker), it could even lead to the conclusion that the conflict has become more brutal. Essentially, we’re observing a high number of killings (since August 2012, Syria Tracker) while there has been a decline in the number of reported events (GDELT) over the same period. This suggests a) media fatigue (as you have pointed out, Jay) and/or b) rebels and government troops are killing more people in the same events, i.e. through mass killings, which would be an indicator of a brutalization of the war (which is also in line with the increasing refugee numbers you provided).

        I’m pretty sure it’s mostly a) but to reject b) we would need data on how many people were killed at a given event. I’m not familiar with the Syria Tracker data, so I don’t know if that data is in there. Anybody?

        Also, to assess the accuracy of GDELT’s locations, it would be interesting to see how the two data sets compare in terms of pinning down the location of violence. By simple visual comparison they seem pretty similar, but again, numbers would be great. The New Scientist data is not available on the net, right? Otherwise, the numbers could be crunched comparatively quickly. (I know, technically I could reconstruct the New Scientist data from GDELT, but maybe I’ve just missed it.)

      • Great point about the possibility that both trends (fewer events, more deaths) could be “true” at the same time, Felix. Yet another example of the complexity of measuring violence.

        On comparing event locations across Syria Tracker and GDELT, it looks like Leetaru and Schrodt gave New Scientist some data that’s not yet in the public posting, which ends in mid-2012. I think their plan is to release a version that goes to the present and start posting daily updates very soon, but last time I checked, they weren’t up yet.
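If both monthly series were in hand, hypothesis (b) above could be probed with a simple deaths-per-reported-event ratio. A minimal Python sketch; every number below is an invented placeholder for illustration, not a real figure from GDELT or Syria Tracker:

```python
# Monthly event counts (GDELT-style) and death counts (Syria Tracker-style).
# All values are made-up placeholders, NOT real figures from either source.
gdelt_events = {"2012-07": 400, "2012-08": 500, "2013-03": 250}
tracker_deaths = {"2012-07": 4000, "2012-08": 5000, "2013-03": 5000}

# Deaths per reported event. A rising ratio alongside falling event counts
# would be consistent with hypothesis (b): more deaths per event, i.e. a
# brutalization of the war rather than a lull in it.
deaths_per_event = {
    month: tracker_deaths[month] / gdelt_events[month]
    for month in gdelt_events
    if month in tracker_deaths
}
```

Of course, if the falling event counts are themselves an artifact of media fatigue, this ratio conflates reporting bias with brutalization, which is the crux of the measurement problem the post describes.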

  3. You’re so awesome.

  4. Grant (May 17, 2013)

    Even if they were correct that violence has been receding (which they aren’t), it would not necessarily mean that the war was in any way ‘subsiding’. It could just as easily have meant that there was a brief pause in fighting as forces geared up for offensives; that, after failures to take Damascus, the rebels were focusing on softer targets that might not produce as many immediate casualties; or that an armed force was having trouble moving into certain areas because a vital bridge was out, or something of the sort.

    In other words, the analysis failed in two completely different ways: they didn’t recognize the problems with the data they were using, and they didn’t remember that numbers without context are worthless.

  5. The more I look at this graph, the more I don’t get it. Unfortunately, I can’t get to the original article. But I’ve pulled down the publicly available GDELT data (through mid-2012) and restricted it to actor1 or actor2 = Syria, which should be about as generous an inclusion rule as possible… and even when I weight the data by the Goldstein scale, I don’t see any days with 250-500 violent events. The worst (scaled) days clock in at around 120.

    • It’s hard to know what’s going on since the data since mid-2012 aren’t up yet, but I wonder if they used a shape file to pull events with geocoordinates inside Syria that didn’t have a “SYR” tag in the actor fields.
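For what it’s worth, the two inclusion rules being compared here can be sketched in Python. The column names follow GDELT’s published event-file headers (assuming I have them right), and the bounding box is a crude stand-in for a proper point-in-polygon test against a shapefile:

```python
def in_syria_by_actor(row):
    # Inclusion rule 1: either actor carries Syria's country code.
    return "SYR" in (row.get("Actor1CountryCode"), row.get("Actor2CountryCode"))

def in_syria_by_geo(row):
    # Inclusion rule 2: event coordinates fall inside a rough bounding box
    # around Syria. A real analysis would test the point against a Syria
    # shapefile polygon instead of this rectangle.
    lat, lon = row["ActionGeo_Lat"], row["ActionGeo_Long"]
    return 32.0 <= lat <= 37.5 and 35.5 <= lon <= 42.5

# A hypothetical event located in Damascus but coded with non-Syrian actors:
# the geographic rule catches it, the actor rule misses it.
row = {"Actor1CountryCode": "USA", "Actor2CountryCode": None,
       "ActionGeo_Lat": 33.5, "ActionGeo_Long": 36.3}
```

That gap between the two rules could account for a sizable difference in daily counts, which might explain why the New Scientist totals run higher than an actor-code filter produces.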

