As part of a larger (but, unfortunately, gated) story on how the terrific new Global Database of Events, Language, and Tone (GDELT) might help social scientists forecast violent conflicts, the New Scientist recently posted some graphics using GDELT to chart the ongoing civil war in Syria. Among those graphics was this time-series plot of violent events per day in Syria since the start of 2011:
Based on that chart, the author of the story (not the producers of GDELT, mind you) wrote:
As Western leaders ponder intervention, the resulting view suggests that the violence has subsided in recent months, from a peak in the third quarter of 2012.
That inference is almost certainly wrong, and why it’s wrong underscores one of the fundamental challenges in using event data—whether it’s collected and coded by software or humans or some combination thereof—to observe the dynamics of violent conflict.
I say that inference is almost certainly wrong because concurrent data on deaths and refugees suggest that violence in Syria has only intensified in the past year. One of the most reputable sources on deaths from the war is the Syria Tracker. A screenshot of their chart of monthly counts of documented killings is shown below. Like GDELT, their data also identify a sharp increase in violence in late 2012. Unlike GDELT, their data indicate that the intensity of the violence has remained very high since then, and that's true even though the process of documenting killings inevitably lags behind the actual violence.
We see a similar pattern in data from the U.N. High Commissioner on Refugees (UNHCR) on people fleeing the fighting in Syria. If anything, the flow of refugees has only increased in 2013, suggesting that the violence in Syria is hardly abating.
The reason GDELT’s count of violent events has diverged from other measures of the intensity of the violence in Syria in recent months is probably something called “media fatigue.” Data sets of political events generally depend on news sources to spot events of interest, and it turns out that news coverage of large-scale political violence follows a predictable arc. As Deborah Gerner and Phil Schrodt describe in a paper from the late 1990s, press coverage of a sustained and intense conflict is often high when hostilities first break out but then declines steadily thereafter. That decline can happen because editors and readers get bored, burned out, or distracted. It can also happen because the conflict gets so intense that it becomes, in a sense, too dangerous to cover. In the case of Syria, I suspect all of these things are at work.
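One rough way to see what media fatigue looks like in practice is to compare news-derived event counts against an independently documented series like monthly killings. The sketch below uses entirely synthetic numbers (not real GDELT or Syria Tracker figures) to illustrate the idea: if event counts fall while documented deaths stay high, the ratio of events to deaths drifts steadily downward, which is the signature of declining coverage rather than declining violence.

```python
# Illustrative only: synthetic monthly series, NOT real GDELT or
# Syria Tracker data. The point is the shape of the comparison,
# not the numbers themselves.
events = [900, 950, 820, 700, 610, 540, 480, 430]          # news-derived event counts
deaths = [3000, 3200, 3100, 3300, 3250, 3400, 3350, 3500]  # documented killings

# Events recorded per documented death: a crude proxy for how much
# coverage the violence is receiving relative to its actual intensity.
coverage_ratio = [e / d for e, d in zip(events, deaths)]

# Count month-over-month declines in that ratio; a near-monotonic
# decline is consistent with media fatigue, not abating violence.
declines = sum(1 for a, b in zip(coverage_ratio, coverage_ratio[1:]) if b < a)
print(f"coverage ratio fell in {declines} of {len(coverage_ratio) - 1} transitions")
```

In this toy series the ratio falls every single month even as deaths rise, which is exactly the pattern that would make a raw event-count chart look like the violence was subsiding.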
My point here isn’t to knock GDELT, which is still recording scores or hundreds of events in Syria every day, automatically, using open-source code, and then distributing those data to the public for free. Instead, I’m just trying to remind would-be users of any data set of political events to draw inferences with caution. Event counts are one useful way to track variation over time in political processes we care about, but they’re only one part of the proverbial elephant, and they are inevitably constrained by the limitations of the sources from which they draw. To get a fuller sense of the beast, we need to cross-reference those event data with other sources of information as often as possible. Each of the sources I’ve cited here has its own blind spots and selection biases, but a comparison of trends from all three—and, importantly, an awareness of the likely sources of those biases—is enough to give me confidence that the civil war in Syria is only continuing to intensify. That says something important about Syria, of course, but it also says something important about the risks of drawing conclusions from event counts alone.
PS. For a great discussion of other sources of bias in the study of political violence, see Stathis Kalyvas’ 2004 essay on “The Urban Bias in Research on Civil Wars” (PDF).