Lost in the Fog of Civil War in Syria

On Twitter a couple of days ago, Adam Elkus called out a recent post on Time magazine’s World blog as evidence of the way that many people’s expectations about the course of Syria’s civil war have zigged and zagged over the past couple of years. “Last year press was convinced Assad was going to fall,” Adam tweeted. “Now it’s that he’s going to win. Neither perspective useful.” To which the eminent civil-war scholar Stathis Kalyvas replied simply, “Agreed.”

There’s a lesson here for anyone trying to glean hints about the course of a civil war from press accounts of a war’s twists and turns. In this case, it’s a lesson I’m learning through negative feedback.

Since early 2012, I’ve been a participant/subject in the Good Judgment Project (GJP), a U.S. government-funded experiment in “wisdom of crowds” forecasting. Over the past year, GJP participants have been asked to estimate the probability of several events related to the conflict in Syria, including the likelihood that Bashar al-Assad would leave office and the likelihood that opposition forces would seize control of the city of Aleppo.

I wouldn’t describe myself as an expert on civil wars, but during my decade of work for the Political Instability Task Force, I spent a lot of time looking at data on the onset, duration, and end of civil wars around the world. From that work, I have a pretty good sense of the typical dynamics of these conflicts. Most of the civil wars that have occurred in the past half-century have lasted for many years. A very small fraction of those wars flared up and then ended within a year. The ones that didn’t end quickly—in other words, the vast majority of these conflicts—almost always dragged on for several more years at least, sometimes even for decades. (I don’t have my own version handy, but see Figure 1 in this paper by Paul Collier and Anke Hoeffler for a graphical representation of this pattern.)

On the whole, I’ve done well in the Good Judgment Project. In the year-long season that ended last month, I ranked fifth among the 303 forecasters in my experimental group, all while the project was producing fairly accurate forecasts on many topics. One thing that’s helped me do well is my adherence to what you might call the forecaster’s version of the Golden Rule: “Don’t neglect the base rate.” And, as I just noted, I’m also quite familiar with the base rates of civil-war duration.

So what did I do when asked by GJP to think about what would happen in Syria? I chucked all that background knowledge out the window and chased the very narrative that Elkus and Kalyvas rightly decry as misleading.

Here’s a chart showing how I assessed Assad’s chances of lasting as president beyond the end of March 2013, starting in June 2012. The actual question asked us to divide the probability of his exiting office across several time periods, but for simplicity’s sake I’ve focused here on the part indicating that he would stick around past April 1. This isn’t the same thing as the probability that the war would end, of course, but it’s closely related, and I considered the two events tightly linked. As you can see, until early 2013, I was pretty confident that Assad’s fall was imminent. In fact, I was so confident that at a couple of points in 2012, I gave him zero chance of hanging on past March of this year, something a trained forecaster really never should do.

[Chart: my GJP estimates of Assad’s chances of remaining president past March 2013, from June 2012 onward]

Now here’s another chart showing my estimates of the likelihood that rebels would seize control of Aleppo before May 1, 2013. The numbers are a little different, but the basic pattern is the same. I started out very confident that the rebels would win the war soon and only swung hard in the opposite direction in early 2013, as the boundaries of the conflict seemed to harden.

[Chart: my GJP estimates of the probability that rebels would seize control of Aleppo before May 1, 2013]

It’s impossible to say what the true probabilities were in this or any other uncertain situation. Maybe Assad and Aleppo really were on the brink of falling for a while and then the unlikely-but-still-possible version happened anyway.

That said, there’s no question that forecasts more tightly tied to the base rate would have scored a lot better in this case. Here’s a chart showing what my estimates might have looked like had I followed that rule, using approximations of the hazard rate from the chart in the Collier and Hoeffler paper. If anything, these numbers overstate the likelihood that a civil war will end at a given point in time.

[Chart: base-rate forecasts for the same question, using approximate hazard rates from Collier and Hoeffler]
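To make the base-rate comparison concrete, here is a minimal Python sketch. It assumes a constant annual hazard of a civil war ending (the 10-percent figure is hypothetical, not a number read off the Collier and Hoeffler chart), treats Assad’s staying in office as equivalent to the war continuing (the post treats the two as closely related, not identical), and scores forecasts with a simple binary version of the Brier score: the squared gap between the forecast probability and what actually happened.

```python
def prob_still_ongoing(annual_hazard: float, years: float) -> float:
    """Chance a war is still going after `years`, assuming a constant
    annual hazard of it ending (a simplifying, hypothetical assumption)."""
    return (1.0 - annual_hazard) ** years


def brier(forecast: float, outcome: int) -> float:
    """Squared error of a binary probability forecast (0 is perfect, 1 is worst)."""
    return (forecast - outcome) ** 2


# Hypothetical inputs, NOT estimates from Collier and Hoeffler:
# a 10-percent annual chance the war ends, over a window of about
# nine months (June 2012 to the end of March 2013).
base_rate = prob_still_ongoing(0.10, 0.75)  # chance Assad is still there
overconfident = 0.0                         # "zero chance" he hangs on

outcome = 1  # he was still in office after March 2013
print(f"base-rate forecast {base_rate:.2f}, Brier {brier(base_rate, outcome):.3f}")
print(f"zero-chance forecast {overconfident:.2f}, Brier {brier(overconfident, outcome):.3f}")
```

Under those assumptions the base-rate forecast earns a Brier score close to zero, while the zero-chance forecast earns 1, the worst possible score on this scale.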

I didn’t keep a log spelling out my reasoning at each step, but I’m pretty confident that my poor performance here is an example of motivated reasoning. I wanted Assad to fall and the pro-democracy protesters who dominated the early stages of the uprising to win, and that desire shaped what I read and then remembered when it came time to forecast. I suspect that many of the pieces I was reading were slanted by similar hopes, creating a sort of analytic cascade similar to the herd behavior thought to drive many financial-market booms and busts. I don’t have the data to prove it, but I’m pretty sure the ups and downs in my forecasts track the evolving narrative in the many newspaper and magazine stories I was reading about the Syrian conflict.

Of course, that kind of herding happens on a lot of topics, and I was usually good at avoiding it. For example, when tensions ratcheted up on the Korean Peninsula earlier this year, I hewed to the base rate and didn’t substantially change my assessment of the risk that real clashes would follow.

What got me in the case of Syria was, I think, a sense of guilt. The Assad government has responded to a legitimate popular challenge with mass atrocities that we routinely read about and sometimes even see. In parts of the country, the resulting conflict is producing scenes of absurd brutality. This isn’t a “problem from hell,” as Samantha Power’s book title would have it; it is a glimpse of hell. And yet, in the face of that horror, I have publicly advocated against American military intervention. Upon reflection, I wonder if my wildly optimistic forecasting about the imminence of Assad’s fall wasn’t my unconscious attempt to escape the discomfort of feeling complicit in the prolongation of that suffering.

As a forecaster, if I were doing these questions over, I would try to discipline myself to attend to the base rate, but I wouldn’t necessarily stop there. As I’ve pointed out in a previous post, the base rate is a valuable anchoring device, but attending to it doesn’t mean automatically ignoring everything else. My preferred approach, when I remember to have one, is to take that base rate as a starting point and then use Bayes’ theorem to update my forecasts in a more disciplined way. Still, I’ll bring a newly skeptical eye to the flurry of stories predicting that Assad’s forces will soon defeat Syria’s rebels and keep their patron in power. Now that we’re a couple of years into the conflict, quantified history tells us that the most likely outcome in any modest slice of time (say, months rather than years) is, tragically, more of the same.
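One way to write down that anchor-then-update approach is Bayes’ theorem in odds form: posterior odds equal prior odds times the likelihood ratio of the new evidence. The sketch below is illustrative only; the 3-percent prior and the likelihood ratio of 2 are hypothetical numbers, not estimates for Syria.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability with Bayes' theorem in odds form.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: a 3-percent base-rate chance that the war ends in
# the next three months, then news judged twice as likely to appear if the
# war really were ending than if it weren't.
prior = 0.03
posterior = bayes_update(prior, 2.0)
print(f"prior {prior:.3f} -> posterior {posterior:.3f}")
```

Even news judged twice as likely under the “war is ending” story only moves a 3-percent prior to a bit under 6 percent, which is the kind of restraint a narrative-driven forecast tends to skip.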

And, as a human, I’ll keep hoping the world will surprise us and take a different turn.

“They Said It Was Going to Rain”

Most Saturdays and some Sundays, I hook up with a bike ride that winds out of DC’s Rock Creek Park into semi-rural Maryland and back again over the course of a few hours. I depend on this ride for hard training and a shot of competition, but I’m a wet-weather wimp and will usually stay home and use the trainer in my basement if it’s raining or probably going to rain. So, one of the first things I do when I get up most weekend mornings is check the hourly forecasts at weather.com and Weather Underground. If there’s much risk of rain, I’ll open the radar map again close to my 9:45 departure and run the animated forecast for the next few hours. If that animation shows yellow or orange blobs swarming my regular route when I’m going to be on it, I almost always stay in.

One recent Sunday, the forecast had me hemming and hawing for a bit before I decided to go. The hourly breakout at weather.com pegged the chance of rain at 70 percent for the first couple of hours I’d be out, but it wasn’t raining at 9:30 and the radar map didn’t look bad, either. Updating completed, out I went.

The weather often dominates conversations at the start and finish of the ride, and on that Sunday two themes rang through the chatter I overheard: we’d gotten really lucky, and weather forecasters are idiots. “They said it was going to rain,” the Greek chorus kept repeating.


But, of course, that’s not what “they” said. In point of fact, meteorologists had pegged the odds of rain at about 2:1. According to those forecasts, it was probably going to rain, but the chances that it would stay dry weren’t so bad, either. I wouldn’t bet my mortgage on a probability of 0.3, but I’m okay with occasionally risking a soggy ride on one.

As a weather-wimpy cyclist, I was happy to catch the lucky break that Sunday. As a guy who sometimes forecasts for a living, I was intrigued by the consistent way in which so many people had distorted that probability. In our heads, the quantified uncertainty we saw in the paper or on the web was transformed into a categorical prediction of rain. A modeler would want some context before judging that forecast: “For all of the hours I said there was a 70-percent chance of rain, how often did rain actually happen?” The intended audience, by contrast, was fine judging that one forecast in isolation and declaring, “Wrong!”
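The modeler’s question is a calibration check. Here is a minimal sketch, assuming a record of hourly forecasts paired with whether it rained in each hour; the record used here is made up for illustration.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group forecasts by stated probability (rounded to the nearest 0.1)
    and report how often the event actually happened in each group."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for p, happened in zip(forecasts, outcomes):
        bucket = round(p, 1)
        counts[bucket] += 1
        hits[bucket] += int(happened)
    return {b: hits[b] / counts[b] for b in sorted(counts)}


# Hypothetical record: ten hours forecast at 70 percent, rain in seven of them.
forecasts = [0.7] * 10
outcomes = [True] * 7 + [False] * 3
print(calibration_table(forecasts, outcomes))
```

For a well-calibrated forecaster, rain should fall in roughly 70 percent of the 70-percent hours, so a dry morning under a 70-percent forecast is not, on its own, evidence that the forecast was wrong.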

That we’re not so great at processing probabilities won’t surprise anyone familiar with psychological research from the past few decades on that subject. Exactly what form that bias takes under what conditions, though, still seems to be something of a mystery. In a New York Times blog post about forecasts of the U.S. presidential election, statistician Andrew Gelman wrote:

What if the weatherman told you there was a 30 percent chance of rain—would you be shocked if it rained that day? No.

Apparently, Gelman hasn’t met the crew from my weekend ride. Gelman goes on to connect his assertion to work by Amos Tversky and Daniel Kahneman on prospect theory, which is based, in part, on the expectation that people systematically overestimate the risk of low-probability events and underestimate the risk of high-probability ones. That expectation, in turn, is based on empirical research that has been replicated elsewhere, as the following chart shows:

[Chart: empirical estimates of probability weighting]

What’s puzzling to me here is that my fellow riders seemed to be distorting things in the opposite direction. Instead of taking a probability of 0.7 and thinking of it as a toss-up as Gelman and that chart predict they would, they had converted it into a sure thing. That’s still bias, of course—just not the kind I would have expected.
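For a sense of the direction Gelman and that chart predict, here is the one-parameter probability weighting function from Tversky and Kahneman’s 1992 version of prospect theory, w(p) = p^γ / (p^γ + (1 − p)^γ)^(1/γ), with γ = 0.61, their estimate for gains. Treating this particular function and parameter as representative of the studies in the chart is an assumption.

```python
def tk_weight(p: float, gamma: float = 0.61) -> float:
    """Tversky-Kahneman (1992) probability weighting function.

    Values of gamma below 1 give the inverse-S shape: small
    probabilities are overweighted, large ones underweighted.
    """
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


for p in (0.05, 0.3, 0.7, 0.95):
    print(f"stated {p:.2f} -> decision weight {tk_weight(p):.2f}")
```

With these settings a stated 5 percent gets a decision weight of roughly 13 percent, while a stated 70 percent gets a weight of roughly 53 percent, close to the toss-up described above and the opposite of what my fellow riders did.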

If there’s a moral to this story, it’s that we still have a lot of work left to do in understanding how we cogitate on uncertainty and what that implies about how we should produce and present probabilistic forecasts. In many domains, we’re getting better and better at the forecasting part, but even very accurate forecasts are only as useful as we make them or let them be. To get from the one to the other, we still need to learn a lot more about how we process and act on that information—not just individually, but also organizationally and socially.

Challenges in Measuring Violent Conflict, Syria Edition

As part of a larger (but, unfortunately, gated) story on how the terrific new Global Data on Events, Location, and Tone (GDELT) might help social scientists forecast violent conflicts, New Scientist recently posted some graphics using GDELT to chart the ongoing civil war in Syria. Among those graphics was this time-series plot of violent events per day in Syria since the start of 2011:

[Chart: GDELT count of violent events per day in Syria since the start of 2011 (New Scientist)]

Based on that chart, the author of the story (not the producers of GDELT, mind you) wrote:

As Western leaders ponder intervention, the resulting view suggests that the violence has subsided in recent months, from a peak in the third quarter of 2012.

That inference is almost certainly wrong, and why it’s wrong underscores one of the fundamental challenges in using event data—whether it’s collected and coded by software or humans or some combination thereof—to observe the dynamics of violent conflict.

I say that inference is almost certainly wrong because concurrent data on deaths and refugees suggest that violence in Syria has only intensified in the past year. One of the most reputable sources on deaths from the war is the Syria Tracker. A screenshot of their chart of monthly counts of documented killings is shown below. Like GDELT, their data also identify a sharp increase in violence in late 2012. Unlike GDELT, their data indicate that the intensity of the violence has remained very high since then, and that’s true even though the process of documenting killings inevitably lags behind the actual violence.

[Chart: Syria Tracker monthly counts of documented killings]

We see a similar pattern in data from the U.N. High Commissioner for Refugees (UNHCR) on people fleeing the fighting in Syria. If anything, the flow of refugees has only increased in 2013, suggesting that the violence in Syria is hardly abating.

[Chart: UNHCR data on refugees fleeing Syria]

The reason GDELT’s count of violent events has diverged from other measures of the intensity of the violence in Syria in recent months is probably something called “media fatigue.” Data sets of political events generally depend on news sources to spot events of interest, and it turns out that news coverage of large-scale political violence follows a predictable arc. As Deborah Gerner and Phil Schrodt describe in a paper from the late 1990s, press coverage of sustained and intense conflicts is often high when hostilities first break out but then declines steadily thereafter. That decline can happen because editors and readers get bored, burned out, or distracted. It can also happen because the conflict gets so intense that it becomes, in a sense, too dangerous to cover. In the case of Syria, I suspect all of these things are at work.
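A toy simulation shows how media fatigue alone could produce the pattern in the New Scientist chart. It assumes, hypothetically, that true violence stays flat while the chance that any given event gets reported decays exponentially from month to month; neither the decay rate nor the starting coverage comes from Gerner and Schrodt or from GDELT.

```python
import math


def expected_reported(true_events, start_coverage=0.9, monthly_decay=0.1):
    """Expected number of events a news-based data set would record when
    each event's chance of being reported declines exponentially over time.
    Both coverage parameters are hypothetical."""
    return [
        n * start_coverage * math.exp(-monthly_decay * month)
        for month, n in enumerate(true_events)
    ]


# Hypothetical: violence holds steady at 1,000 events a month for two years.
true_events = [1000] * 24
reported = expected_reported(true_events)
print(f"month 1: {reported[0]:.0f} reported; month 24: {reported[-1]:.0f} reported")
```

Reported counts fall roughly tenfold over two years even though the underlying violence never changes, which is why a falling event count by itself can’t tell us that the fighting has subsided.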

My point here isn’t to knock GDELT, which is still recording scores or hundreds of events in Syria every day, automatically, using open-source code, and then distributing those data to the public for free. Instead, I’m just trying to remind would-be users of any data set of political events to infer with caution. Event counts are one useful way to track variation over time in political processes we care about, but they’re only one part of the proverbial elephant, and they are inevitably constrained by the limitations of the sources from which they draw. To get a fuller sense of the beast, we need as often as possible to cross-reference those event data with other sources of information. Each of the sources I’ve cited here has its own blind spots and selection biases, but a comparison of trends from all three—and, importantly, an awareness of the likely sources of those biases—is enough to give me confidence that the civil war in Syria is only continuing to intensify. That says something important about Syria, of course, but it also says something important about the risks of drawing conclusions from event counts alone.
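Cross-referencing can be as simple as putting each series on its own scale and asking where the latest value sits relative to that series’ peak. A minimal sketch; the monthly numbers below are invented for illustration and are not the GDELT, Syria Tracker, or UNHCR figures.

```python
def share_of_peak(series):
    """Latest value as a fraction of the series' own maximum."""
    return series[-1] / max(series)


# Hypothetical monthly series, each in its own units; NOT the real
# GDELT, Syria Tracker, or UNHCR figures.
sources = {
    "event counts": [120, 300, 480, 350, 260, 210],
    "documented killings": [800, 2500, 5000, 4800, 5100, 4900],
    "refugee registrations": [20, 60, 110, 180, 260, 350],
}

for name, series in sources.items():
    ratio = share_of_peak(series)
    verdict = "below half of peak" if ratio < 0.5 else "at least half of peak"
    print(f"{name:>22}: {ratio:.2f} of peak ({verdict})")
```

When one source falls well below its peak while the others hold steady or rise, that divergence is a cue to look for a source-specific bias, such as media fatigue, before concluding that the underlying process has changed.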

PS. For a great discussion of other sources of bias in the study of political violence, see Stathis Kalyvas’ 2004 essay on “The Urban Bias in Research on Civil Wars” (PDF).

Dart-Throwing Chimp Does TEDxTbilisi

Last month, I traveled to Georgia (the country) to give a talk at the second annual TEDxTbilisi. In that talk, I used stories about shoddy infrastructure to explore the gap between conventional theories and my own understanding of the things that cause authoritarian regimes to persist and then collapse. My script, for a talk called “Why Dictators Build Stuff that Crumbles,” was basically a mash-up of a couple of blog posts from the past year: one of nearly the same name, and another on why political activism over threats to public health and safety presents authoritarian regimes with special dilemmas.

The event was terrific—full house, great venue, good refreshments—and the small army of volunteers it took to make TEDxTbilisi happen did tremendous work. To readers of this blog, I’d especially recommend these four talks:

* Dato Gogigchaishvili, a Georgian television host and producer, gave a really smart and funny talk that probed the truth and limits of cross-cultural comparisons.

* Rusudan Gotsiridze spoke beautifully and humorously about gender roles through the lens of her own experiences as the first female bishop in Georgia.

* Educators and parents will appreciate the talk by Mark Rein-Hagen, a professional game designer, about learning through playing.

* The theme for TEDxTbilisi this year was “crossroads,” and Donald Rayfield capped the day with a great talk about Georgia’s long and difficult history as a place squished in between other, more powerful states and empires.

Honestly, preparing for the event was a lot harder than I’d expected. Having a blog where I regularly try to present social-science ideas to a broader audience made the initial tasks of identifying a relevant topic and drafting a script easier than they might have been. That part, I actually enjoyed. Much harder for me were committing the talk to memory and rehearsing it enough so that it (hopefully) didn’t look and sound too canned.

I’m sure the memory and delivery parts are easier for some people than others, and I suspect they get easier when you do them routinely. They were new to me, though, and I put a lot of hours into them over the two weeks before the event, reading out loud and then practicing versions of the talk. The closer I got to the trip, the more of my intellectual processing power the preparation seemed to absorb. I was a lousy creative thinker that last week, and once in that home stretch I completely whiffed on a phone call I was supposed to make for work, something I never do. Having been through this once, I’m much more impressed than I used to be with the people who make that kind of performance look natural and effortless.

Finally, I gotta say, the process was exhausting. I am a creature of habit who rarely travels for work and almost never travels overseas. My TEDxTbilisi trip was a five-day blast with opening and closing legs of 24-hour travel to and from a city eight time zones ahead of home. During the three days I was in Tbilisi, the combination of jet lag, noise and cigarette smoke in the hotel, caffeine withdrawal, and anxiety about the impending event meant that I slept poorly. I used to race a lot as a runner and then a cyclist, and one of the big rules of thumb in those worlds is to stick to normal routines as much as possible before important races to keep the stress down and energy and focus up. Here, I’d basically done the opposite, shaking up everything I normally do. If I’d had my druthers, I’d have taken my first crack at this kind of thing under less stressful circumstances.

Of course, in real life you take what you can get, and in TEDxTbilisi I got a great opportunity. I hope you enjoy the talk.

I’m Down with Complexity and All, But…

In a recent Scientific American blog post called “Big Data Needs a Big Theory,” Geoffrey West calls for a unified theory of complex systems that will advance our understanding of, and capacity to predict, stasis and change in many domains. Quoting at length:

The digital revolution is driving much of the increasing complexity and pace of life we are now seeing, but this technology also presents an opportunity… With new computational tools and techniques to digest vast, interrelated databases, researchers and practitioners in science, technology, business and government have begun to bring large-scale simulations and models to bear on questions formerly out of reach of quantitative analysis, such as how cooperation emerges in society, what conditions promote innovation, and how conflicts spread and grow.

The trouble is, we don’t have a unified, conceptual framework for addressing questions of complexity. We don’t know what kind of data we need, nor how much, or what critical questions we should be asking. “Big data” without a “big theory” to go with it loses much of its potency and usefulness, potentially generating new unintended consequences.

When the industrial age focused society’s attention on energy in its many manifestations—steam, chemical, mechanical, and so on—the universal laws of thermodynamics came as a response. We now need to ask if our age can produce universal laws of complexity that integrate energy with information. What are the underlying principles that transcend the extraordinary diversity and historical contingency and interconnectivity of financial markets, populations, ecosystems, war and conflict, pandemics and cancer? An overarching predictive, mathematical framework for complex systems would, in principle, incorporate the dynamics and organization of any complex system in a quantitative, computable framework.

We will probably never make detailed predictions of complex systems, but coarse-grained descriptions that lead to quantitative predictions for essential features are within our grasp. We won’t predict when the next financial crash will occur, but we ought to be able to assign a probability of one occurring in the next few years. The field is in the midst of a broad synthesis of scientific disciplines, helping reverse the trend toward fragmentation and specialization, and is groping toward a more unified, holistic framework for tackling society’s big questions.

Not to put too fine a point on it, but I think that agenda is unrealistic.

I agree with West that human social systems are best understood as complex systems in the technical sense of that term (see here). Still, on the possibility of law-like regularities in complex systems that extend to large-scale human social behavior and are usefully predictive, I’m skeptical. It’s hard for me to imagine what those laws would look like, but then I know that my incapacity to understand the universe is not a reliable indicator of the universe’s inherent regularity or intelligibility.

At the same time, I think West’s analogizing to physics and the laws of thermodynamics ignores the single most important difference between the “natural” sciences and the social sciences, namely, the (in)ability to perform true experiments. (N.B. Humans and their social interactions are, of course, entirely “natural,” too, but these are the terms we conventionally use.) Social scientists can only observe the systems we study; we can’t repeatedly perturb them in specific ways under tightly controlled conditions and see how things play out.

The impossibility of experimentation means we’re never going to be able to see the counterfactuals we’d need to see to make clear and confident inferences about rules or laws. That doesn’t mean we can’t find some robust patterns, but those patterns will never be anywhere near as universal and specific as the laws of thermodynamics.

The fuzziness of our understanding also means that the patterns we do see will have only modest predictive power at best. Those fuzzy patterns will allow us to assess differences in propensities with some success, as they already do now, but they will not lead us to sharply accurate predictions about the timing and details of change.

More important, those patterns themselves will change over time, as the underlying system continues to evolve. As West suggests, the changes that are creating new opportunities for analysis are themselves products of exponential growth in the complexity of human society. It’s an empirical question, I suppose, but I find it hard to believe that the processes which beget conflicts between states in the middle of the twenty-first century—an age of nukes and mega-cities and deep globalization—will resemble the processes that begat World Wars I and II in all but the most banal ways. And, of course, that’s assuming that states in the conventional sense are even still around.

Sovereignty Without Territoriality?

The concentration of manpower was the key to political power in premodern Southeast Asia… This overwhelming concern for obtaining and holding population at the core is shot through every aspect of precolonial statecraft. What Geertz says about Balinese political rivalries—that they were “a struggle more for men than for land”—could apply equally to all of mainland Southeast Asia. This principle animated the conduct of warfare, which was less a grab for distant territory than a quest for captives who could be resettled at the core… Early European officials were frequently astounded by the extremely vague demarcation of territories and provinces in their new colonies and puzzled by an administration of manpower that had little or nothing to do with territorial jurisdiction… As Thongchai Winichakul’s insightful book shows, the Siamese paid more attention to the manpower they could summon than to sovereignty over land that had no value in the absence of labor.

That’s from Chapter 3 (pp. 64-68) of James Scott’s The Art of Not Being Governed. To an inhabitant of the “modern” world who studies international politics, Scott’s description of powerful states that only vaguely demarcated and policed their putative territorial boundaries serves as an intriguing reminder that the fusion of territoriality and political sovereignty we now take for granted is not inevitable. Organizations can exercise, and have exercised, substantial authority over human society without husbanding exclusive control over specific patches of land. Scott sees similar processes at work in nineteenth- and twentieth-century sub-Saharan Africa:

The theme of manpower concentration permeates the literature on indigenous politics: “The drive to acquire relatives, adherents, dependents, retainers, and subjects and to keep them attached to oneself as a kind of social and political ‘capital’ has often been remarked upon as characteristic of African political processes.”… As in Southeast Asia there was little emphasis on sharp territorial boundaries, and the important rights were over people, not places, except for particular ritual sites. The competition for followers, kinsmen, and bondsmen operated at every level.

In fact, I’d say there are at least three interconnected but distinct spaces in which political authority can be organized—physical (territory), social (people), and economic (trade)—and the three don’t necessarily have to hang together. Scott has already described for us states whose sovereignty was rooted primarily in the social and economic realms with less attention to territory.

Contemporary drug cartels arguably exemplify the possibility of organizations that compete for power in trade space without asserting sovereignty over territory or society in the way that modern states do. Large cartels sometimes attempt to establish territorial zones of impunity or even governance, but those efforts often come in response to rivals’ attempts to quash their power in trade space. More important, the point of that territorial control is usually to gain freedom from interference in their economic activities, not to assert the full panoply of political authority we attach to the modern idea of sovereignty. As John Sullivan says of contemporary “criminal insurgencies” in Mexico and elsewhere,

Organized crime groups (gangs and cartels)…usually seek to elude detection and prefer co-opting (corrupting) the instruments of state rather than engaging in direct confrontation… Yet as the current crime wars illustrate, these actors can directly confront the state when their interests are challenged (Bailey & Taylor, 2009). Criminal insurgency is the mechanism of the confrontation with the state that results when relationships between organized crime and the state fall into disequilibrium.

Criminal insurgency presents a challenge to states and communities. Criminal insurgency is different from conventional terrorism and insurgency because the criminal insurgents’ sole political motive is to gain autonomy and economic control over territory. They do so by hollowing out the state and creating criminal enclaves to secure freedom to maneuver.

It’s harder for me to think of an organization that competes for sovereignty in the social realm without seeking control over territory or trade. I suppose organized religion comes closest. Although some hierarchical religious organizations historically have also pursued control over land and trade, in ideological terms, their main claim attaches to the souls of their adherents and nothing else. Ethnicity might fit the bill, too, insofar as leaders of these communities of putative kinship claim authority over members wherever they may be and whatever trade they might take up. It’s also interesting to think about whether or not cyberspace is emerging as a fourth realm for political organization, intertwined with but at least partially independent of the other three, but that’s a question for another day.

What’s confusing to modern ears, I think, is the application of the word “state” to these other things. Scott explicitly did so, and I’m implicitly doing so here. My point in doing so is to highlight that the constructs we call “states” are just one of many organizations constantly competing for power in these various spaces. What’s unique about the modern state is its explicit claim to dominion over all three of those spaces—physical, social, and economic—within a particular set of sharply demarcated borders.

So, let’s flip it around: instead of calling all of these organizations states, let’s reserve that term for the modern thing, but let’s allow Scott’s passage to remind us that states are neither as inevitable nor as successful in their efforts to establish that dominion as we often assume. Instead, they are just one organizational form competing for sovereignty in these various realms, and their success in those struggles is neither as complete nor as final as they would like it to be. The fusion of sovereignty in the modern state is a specific idea, not a natural fact, and a self-serving one at that.

Road-Testing GDELT as a Resource for Monitoring Atrocities

As I said here a few weeks ago, I think the Global Data on Events, Location, and Tone (GDELT) is a fantastic new resource that really embodies some of the ways in which technological changes are coming together to open lots of new doors for social-scientific research. GDELT’s promise is obvious: more than 200 million political events from around the world over the past 30 years, all spotted and coded by well-trained software instead of the traditional armies of undergrad RAs, and with daily updates coming online soon. Or, as Adam Elkus’ t-shirt would have it, “200 million observations. Only one boss.”

BUT! Caveat emptor! Like every other data-collection effort ever, GDELT is not alchemy, and it’s important that people planning to use the data, or even just to consume analysis based on it, understand what its limitations are.

I’m starting to get a better feel for those limitations from my own efforts to use GDELT to help observe atrocities around the world, as part of a consulting project I’m doing for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide. The core task of that project is to develop plans for a public early-warning system that would allow us to assess the risk of onsets of atrocities in countries worldwide more accurately and earlier than current practice.

When I heard about GDELT last fall, though, it occurred to me that we could use it (and similar data sets in the pipeline) to support efforts to monitor atrocities as well. The CAMEO coding scheme on which GDELT is based includes a number of event types that correspond to various forms of violent attack, as well as other variables indicating who was attacking whom. If we could develop a filter that reliably pulled events of interest to us from the larger stream of records, we could produce something like a near-real-time bulletin on recent violence against civilians around the world. Our record would surely have some blind spots (GDELT only tracks a limited number of news sources, and some atrocities just don’t get reported, period), but I thought it would reliably and efficiently alert us to new episodes of violence against civilians and help us identify trends in ongoing ones.

Well, you know what they say about plans and enemies and first contact. After digging into GDELT, I still think we can accomplish those goals, but it’s going to take more human effort than I originally expected. Put bluntly, GDELT is noisier than I had anticipated, and for the time being the only way I can see to sharpen that signal is to keep a human in the loop.

Imagine (fantasize?) for a moment that there’s a perfect record somewhere of all the political interactions GDELT is trying to identify. For kicks, let’s call it the Encyclopedia Eventum (EE). Like any detection system, GDELT can mess up in two basic ways: 1) errors of omission, in which GDELT fails to spot something that’s in the EE; and 2) errors of commission, in which it mistakenly records an event that isn’t in the EE (or, relatedly, is in the EE but in a different place). We might also call these false negatives and false positives, respectively.

At this point, I can’t say anything about how often GDELT is making errors of omission, because I don’t have that Encyclopedia Eventum handy. A more realistic strategy for assessing the rate of errors of omission would be to compare a subset of GDELT to another event data set that’s known to measure something GDELT is meant to track (say, protest and coercion in Europe) fairly reliably for some time and place, and to see how well the two match up. That’s not a trivial task, though, and I haven’t tried it yet.
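Because no Encyclopedia Eventum exists, a reference data set has to stand in for it. The sketch below shows the basic bookkeeping of such a comparison once records from both sources have been reduced to matching keys. The records are toy examples, not real GDELT or reference rows, and exact matching on (date, event code, source, target) is a simplifying assumption; a real comparison would need looser rules for dates, locations, and actor codes.

```python
# Hypothetical sketch: score GDELT-style records against a trusted reference
# data set for the same time and place. Each event is reduced to a
# (date, event_code, source, target) key; exact matching is an assumption.

def compare_to_reference(gdelt_events, reference_events):
    """Count errors of omission and commission relative to a reference set."""
    gdelt = set(gdelt_events)
    reference = set(reference_events)
    omissions = reference - gdelt    # false negatives: in reference, missed by GDELT
    commissions = gdelt - reference  # false positives: in GDELT, not in reference
    return {
        "matches": len(gdelt & reference),
        "omissions": len(omissions),
        "commissions": len(commissions),
    }


# Toy records for illustration only.
reference = [
    ("20110610", "202", "SYRGOV", "SYRCVL"),
    ("20110611", "180", "SYRMIL", "SYRCVL"),
]
gdelt = [
    ("20110610", "202", "SYRGOV", "SYRCVL"),
    ("20110108", "193", "USAGOV", "USAMED"),
]
print(compare_to_reference(gdelt, reference))
# {'matches': 1, 'omissions': 1, 'commissions': 1}
```

Using sets means duplicate records collapse into one, which is itself a choice worth revisiting if the same event can legitimately appear more than once.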

Instead, the noise I’m seeing is on the other side of that coin: the errors of commission, or false positives. Here’s what I mean:

To start developing my atrocities-monitoring filter, I downloaded the reduced and compressed version of GDELT recently posted on the Penn State Event Data Project page and pulled the tab-delimited text files for a couple of recent years. I’ve worked with event data before, so I’m familiar with basic issues in their analysis, but every data set has its own idiosyncrasies. After trading emails with a few CAMEO pros and reading Jay Yonamine’s excellent primer on event aggregation strategies, I started tinkering with a function in R that would extract the subset of events that appeared to involve lethal force against civilians. That function would involve rules to select on three features: event type, source (the doer), and target.

  • Event Type. For observing atrocities, type 20 (“Engage in Unconventional Mass Violence”) was an obvious choice. Based on advice from those CAMEO pros, I also focused on 18 (“Assault”) and 19 (“Fight”) but was expecting that I would need to be more restrictive about the subtypes, sources, and targets in those categories to avoid errors of commission.
  • Source. I’m trying to track violence by state and non-state agents, so for the former I focused on GOV (government), MIL (military), COP (police), and SPY (intelligence agencies), and for the latter on REB (militarized opposition groups) and SEP (separatist groups). The big question mark was how to handle records with just a country code (e.g., “SYR” for Syria) and no indication of the source’s type. My CAMEO consultants told me these would usually refer in some way to the state, so I should at least consider including them.
  • Target. To identify violence against civilians, I figured I would get the most mileage out of the OPP (non-violent political opposition), CVL (“civilians,” people in general), and REF (refugees) codes, but I wanted to see if the codes for more specific non-state actors (e.g., LAB for labor, EDU for schools or students, HLH for health care) would also help flag some events of interest.

After tinkering with the data a bit, I decided to write two separate functions, one for events with state perpetrators and another for events with non-state perpetrators. If you’re into that sort of thing, you can see the state-perpetrator version of that filtering function on GitHub, here.
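The three-part selection logic above (event type, source, target), plus the split by perpetrator, can be sketched in a few lines. This is not the R function on GitHub; it is a minimal Python sketch under stated assumptions: each record is a (date, event_code, source, target) tuple of strings, actor codes chain three-letter chunks (a country code, then type codes, e.g., "SYRGOV"), and MED (media) is an assumed addition to the "e.g." list of specific non-state targets. Subtype restrictions on 18 and 19 are left out.

```python
# Hypothetical sketch, not the author's R code. Assumes records are
# (date, event_code, source, target) tuples and CAMEO-style actor codes
# built from three-letter chunks (e.g., "SYRGOV" = Syria + government).

EVENT_ROOTS = {"18", "19", "20"}  # Assault, Fight, Unconventional Mass Violence
STATE_TYPES = {"GOV", "MIL", "COP", "SPY"}
NONSTATE_TYPES = {"REB", "SEP"}
# MED (media) is an assumption added to the specific non-state targets.
TARGET_TYPES = {"OPP", "CVL", "REF", "LAB", "EDU", "HLH", "MED"}
KNOWN_TYPES = STATE_TYPES | NONSTATE_TYPES | TARGET_TYPES


def chunks(code):
    """Split an actor code like 'SYRGOV' into {'SYR', 'GOV'}."""
    return {code[i:i + 3] for i in range(0, len(code), 3)}


def perpetrator_kind(source, include_bare_country=False):
    """Classify a source as 'state', 'nonstate', or None (not of interest)."""
    parts = chunks(source)
    if parts & STATE_TYPES:
        return "state"
    if parts & NONSTATE_TYPES:
        return "nonstate"
    # A bare country code (e.g., "SYR") is ambiguous; counting it as the
    # state is an opt-in judgment call, not a settled rule.
    if include_bare_country and len(source) == 3 and source not in KNOWN_TYPES:
        return "state"
    return None


def split_candidates(records, include_bare_country=False):
    """Return candidate events of interest, grouped by perpetrator kind."""
    groups = {"state": [], "nonstate": []}
    for record in records:
        _date, event_code, source, target = record
        if event_code[:2] not in EVENT_ROOTS:
            continue
        if not chunks(target) & TARGET_TYPES:
            continue
        kind = perpetrator_kind(source, include_bare_country)
        if kind is not None:
            groups[kind].append(record)
    return groups


records = [  # toy records, not real GDELT rows
    ("20110610", "202", "SYRGOV", "SYRCVL"),
    ("20110411", "202", "RUSSPY", "POLCVL"),
    ("20110108", "193", "USAGOV", "USAMED"),
    ("20110301", "190", "SYR", "SYRCVL"),
    ("20110302", "180", "SYRREB", "SYRCVL"),
]
print({kind: len(rows) for kind, rows in split_candidates(records).items()})
# {'state': 3, 'nonstate': 1}
```

A filter that works only on codes accepts any well-formed record, so it cannot by itself tell whether a record describes something that actually happened on that date.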

When I ran the more than 9 million records in the “2011.reduced.txt” file through that function, I got back 2,958 events. So far, so good. As soon as I started poking around in the results, though, I saw a lot of records that looked wacky. The current release of GDELT doesn’t include text from or links to the source material, so it’s hard to say for sure what real-world event any one record describes. Still, some of the perpetrator-and-target combos looked odd to me, and web searches for relevant stories either came up empty or reinforced my suspicions that the records were probably errors of commission. Here are a few examples, showing the date, event type, source, and target:

  • 1/8/2011 193 USAGOV USAMED. Type 193 is “Fight with small arms and light weapons,” but I don’t think anyone from the U.S. government actually got in a shootout or knife fight with American journalists that day. In fact, that event-source-target combination popped up a lot in my subset.
  • 1/9/2011 202 USAMIL VNMCVL. Taken on its face, this record says that U.S. military forces killed Vietnamese civilians on January 9, 2011. My hunch is that the story on which this record is based was actually talking about something from the Vietnam War.
  • 4/11/2011 202 RUSSPY POLCVL. This record seems to suggest that Russian intelligence agents “engaged in mass killings” of Polish civilians in central Siberia two years ago. I suspect the story behind this record was actually talking about the Katyn Massacre and associated mass deportations that occurred in April 1940.
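One cheap way to decide where to start that kind of manual review is to tally which event-source-target combinations recur most often in the filtered subset, since a combination that keeps popping up (like USAGOV shooting at USAMED) is worth a closer look. The sketch below is a hypothetical helper run on toy records, not real GDELT output.

```python
from collections import Counter


def top_combos(records, n=3):
    """Return the n most frequent (event code, source, target) combinations."""
    counts = Counter((code, source, target) for _date, code, source, target in records)
    return counts.most_common(n)


filtered = [  # toy records standing in for a filtered subset
    ("20110108", "193", "USAGOV", "USAMED"),
    ("20110112", "193", "USAGOV", "USAMED"),
    ("20110120", "193", "USAGOV", "USAMED"),
    ("20110109", "202", "USAMIL", "VNMCVL"),
    ("20110610", "202", "SYRGOV", "SYRCVL"),
    ("20110611", "202", "SYRGOV", "SYRCVL"),
]
for combo, count in top_combos(filtered):
    print(count, *combo)
# 3 193 USAGOV USAMED
# 2 202 SYRGOV SYRCVL
# 1 202 USAMIL VNMCVL
```

Frequency alone doesn't separate signal from noise (a real, ongoing crackdown will also recur), so this only ranks records for a human to check.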

That’s not to say that all the records looked wacky. Interleaved with these suspicious cases were records representing exactly the kinds of events I was trying to find. For example, my filter also turned up a 202 GOV SYRCVL for June 10, 2011, a day on which one headline blared “Dozens Killed During Syrian Protests.”

Still, it’s immediately clear to me that GDELT’s parsing process is not quite at the stage where we can peruse the codebook like a menu, identify the morsels we’d like to consume, phone our order in, and expect to have exactly the meal we imagined waiting for us when we go to pick it up. There’s lots of valuable information in there, but there’s plenty of chaff, too, and for the time being it’s on us as researchers to take time to try to sort the two out. This sorting will get easier to do if and when the posted version adds information about the source article and relevant text, but “easier” in this case will still require human beings to review the results and do the cross-referencing.

Over time, researchers who work on specific topics (like atrocities, or interstate war, or protest activity in specific countries) will probably be able to develop supplemental coding rules and tweak their filters to automate some of what they learn. I’m also optimistic that the public release of GDELT will accelerate improvements to the software and dictionaries it uses, expanding its reach while shrinking the error rates. In the meantime, researchers are advised to stick to the same practices they’ve always used (or should have, anyway): take time to get to know your data; parse it carefully; and, when there’s no single parsing that’s obviously superior, check the sensitivity of your results to different permutations.
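A sensitivity check of that kind can be as simple as rerunning the filter under every combination of the judgment calls and comparing the counts. The sketch below varies two hypothetical choices drawn from the discussion above (whether to include event types 18 and 19 alongside 20, and whether to treat bare country codes as state actors); the simplified filter and the toy records are illustrations, not the author's code or data.

```python
from itertools import product

STATE_TYPES = {"GOV", "MIL", "COP", "SPY"}
CIVILIAN_TARGETS = {"OPP", "CVL", "REF"}
ROOT_OPTIONS = {"20 only": {"20"}, "18-20": {"18", "19", "20"}}


def count_matches(records, roots, include_bare_country):
    """Count records passing one version of a simple state-perpetrator filter."""
    count = 0
    for _date, code, source, target in records:
        if code[:2] not in roots:
            continue
        # Assumes "CCCTTT" codes: country in [0:3], actor type in [3:6].
        is_state = source[3:6] in STATE_TYPES or (include_bare_country and len(source) == 3)
        if is_state and target[3:6] in CIVILIAN_TARGETS:
            count += 1
    return count


def sensitivity(records):
    """Rerun the filter under every combination of the two judgment calls."""
    return {
        (label, bare): count_matches(records, roots, bare)
        for (label, roots), bare in product(ROOT_OPTIONS.items(), (False, True))
    }


records = [  # toy records, not real GDELT rows
    ("20110610", "202", "SYRGOV", "SYRCVL"),
    ("20110301", "190", "SYR", "SYRCVL"),
    ("20110302", "183", "SYRMIL", "SYROPP"),
    ("20110108", "193", "USAGOV", "USAMED"),
]
for (label, bare), n in sensitivity(records).items():
    print(label, bare, n)
```

If a trend or ranking holds across all four versions, it's less likely to be an artifact of one parsing choice; if it flips, that's a sign the choice needs a closer look.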

PS. If you have any suggestions on how to improve the code I’m using to spot potential atrocities or otherwise improve the monitoring process I’ve described, please let me know. That’s an ongoing project, and even marginal improvements in the fidelity of the filter would be a big help.

PPS. For more on these issues and the wider future of automated event coding, see this ensuing post from Phil Schrodt on his blog.

Hello?!? Not All Forecasters Are Strict Positivists

International relations is the most predictively oriented subfield of political science…Yet even in the other empirical subfields, the positivist notion that everything must ultimately be reducible to (knowable) universal laws displays its hold in excrescences such as quadrennial attempts to derive formulae for predicting the next presidential election outcome, usually on the basis of “real” (economic) factors. Even if one follows Milton Friedman (1953) in insisting that the factors expressed by such formulae are not supposed to be actually causing electoral outcomes, but are merely variables that (for some unknown reason) allow us to make good behavioral predictions, in practice one usually wants to know what is actually causing the behavior, and it is all too easy to assume that whatever is causing it—since it seems to be responsible for a behavioral regularity—must be some universal human disposition.

That’s from a 2012 paper by Jeffrey Friedman on Robert Jervis’ 1997 System Effects and the “problem of prediction.” I actually enjoyed the paper on the whole, but this passage encapsulates what drives me nuts about what many people—including many social “scientists”—think it means to try to make forecasts about politics.

Contrary to the assertions of some haters, political scientists almost never make explicit forecasts about the things they study—at least not in print or out loud. Some of that reticence presumably results from the fact that there’s no clear professional benefit to making predictions, and there is some professional risk in doing so and then being wrong.

Some of that reticence, though, also seems to flow from this silly but apparently widely held idea that the very act of forecasting implies that the forecaster accepts the strict positivist premise that “everything must ultimately be reducible to (knowable) universal laws.” To that, I say…

AAUGH!

Probability is a mathematical representation of uncertainty, and a probabilistic forecast explicitly acknowledges that we don’t know for sure what’s going to happen. Instead, it’s an educated guess—or, in Bayesian terms, an informed belief.

Forecasters generally use evidence from the past to educate those guesses, but that act of empiricism in itself does not imply that we presume there are universal laws driving political processes lurking beneath that history. Instead, it’s really just a practical solution to the problem of wanting better information—sometimes to help us plan for the future, and sometimes to try to adjudicate between different ideas about the forces shaping those processes now and in the past.

Empiricism is a practical solution because it works—not perfectly, of course, but, for many problems of interest, a lot better than casting bones or reading entrails or consulting oracles. The handful of forecasters I know all embrace the premises that their efforts are only approximations, and that the world can always change in ways that will render the models we find helpful today less helpful in the future. In the meantime, though, we figure we can nibble away at our ignorance by making structured guesses about that future and seeing which ones turn out to be more reliable than the others. Physicists still aren’t entirely sure how planes manage to fly, but millions of us make a prediction every day that the plane we’re about to board is somehow going to manage that feat. We don’t need to be certain of the underlying law to find that prediction useful.
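One widely used way to see which structured guesses turn out to be more reliable is the Brier score: the mean squared difference between each forecast probability and the 0/1 outcome, where lower is better. The forecasts below are invented purely for illustration; they show a forecaster who hedges sensibly beating one who sounds more certain.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 0, 1]               # what actually happened (hypothetical)
hedged = [0.7, 0.2, 0.3, 0.6]         # probabilistic guesses
overconfident = [1.0, 1.0, 0.0, 0.0]  # all-or-nothing guesses

print(round(brier_score(hedged, outcomes), 3))  # 0.095
print(brier_score(overconfident, outcomes))     # 0.5
```

Nothing in the scoring assumes a universal law behind the outcomes; it only asks which set of guesses tracked reality more closely.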

Finally, I can’t resist: there’s real irony in Friedman’s choice of examples of misguided forecasting projects. To have called efforts to predict the outcome of U.S. presidential elections “excrescences” in the year those excrescences had a kind of popular coming out, well, that’s just unfortunate. I guess Friedman didn’t see that one coming.

A Few Suggestions for Social Scientists New to Twitter

Earlier today, one scholar whose work I greatly admire asked another scholar whose work I greatly admire for advice on how to get started on Twitter. I liked Dan’s response, but I thought I’d take Christian’s query as an open invitation to share a few suggestions of my own. So:

Replace the egg with a picture of you. Seriously, don’t even start following people until you’ve done this. It’s not vain; it’s just letting people know that there’s (probably) a real human on the other end, and letting us know something about how you plan to present yourself in this context. Some people can get away with using cartoons or pictures of their pets or kids, but most of us can’t. So, unless you’re trying to make a very specific statement by doing something different, you probably shouldn’t try.

Decide why you’re using Twitter. If your main goal is to use Twitter as a news feed or to follow other people’s work, then it’s a really easy tool to use. Just poke around until you find people and organizations that routinely cover the issues that interest you, and follow them. If, however, your goal is to develop a professional audience, then you need to put more thought into what you tweet and retweet, and the rest of my suggestions might be useful.

Pick your niche(s). There are a lot of social scientists on Twitter, and many of them are picky about whom they follow. To make it worth people’s while to add you to their feed, pick one or a few of your research interests and focus almost all of your tweets and retweets on them. For example, I’ve tried to limit my tweets to the topics I blog about: democratization, coups, state collapse, forecasting, and a bit of international relations. When I was new to Twitter, I focused especially on democratization and forecasting because those weren’t topics other people were tweeting much about at the time. I think that differentiation made it easier for people to attach an identity to my avatar, and to understand what they would get by following me that they weren’t already getting from the 500 other accounts in their feeds.

Keep the tweet volume low, at least at the start. For a long time, I tried to limit myself to two or three tweets per Twitter session, usually once or twice per day. That made me think carefully about what I tweeted, (hopefully) keeping the quality higher and preventing me from swamping people’s feeds, a big turnoff for many.

Don’t just share the news; augment it. If you’re tweeting a news story or journal article or something, use a short quote or comment that crystallizes the story or tells us something about why you think it’s worth reading. In other words, try to add value. I usually lead with the title, then insert the link, then hang the quote or comment at the end, like this:

But, of course, there are lots of ways to do this. You can also drop the title entirely, like this recent one from Joshua Kucera that got me laughing:

Keep it professional. If you’re thinking of Twitter as an extension of your work, don’t tweet about personal stuff. This is especially important when you’re new to the medium. The occasional reference to your life outside the office can help people feel more connected to you, but please err on the side of reticence. I have chosen not to follow, or have unfollowed, many people because the interesting stuff in their feed was overwhelmed by the personal and trivial (and sometimes just downright gross). At some point, all that jetsam gets in the way of the information I’m actually looking for, so I choose to cut it off.

Related to the previous suggestion, be polite. In theory, this should go without saying, but, hey, this is the Internet. If you’re using Twitter for professional purposes, I think it makes sense to use the same language and demeanor you’d use in the office or at a professional conference. That can include humor and the occasional personal tidbit you’d share in a hallway conversation, but probably not the bar talk, and definitely not the post-conference conversations with your confidantes. It most definitely does not include nastiness or pettiness.

Be generous. Don’t retweet something under your own handle just to troll for RTs. If you want to share something someone else already shared, just pass along his or her tweet. The exception to this rule is when you’re going to add your own comment. Then just be sure to acknowledge the source with a via or h/t (hat tip). If a bunch of people already shared something so you’re not sure whom to credit, the answer is, Don’t share it again.

If you modify someone’s tweet at all before passing it along, use MT. This is a Twitter pet peeve of mine. RT (retweet) should only be used when what follows is a verbatim replication of the original. If you change anything—abbreviate, drop a comma, whatever—use MT (modified tweet) instead.

Finally, know that it’s addictive. I don’t mean fun-and-time-consuming addictive; I mean addictive addictive, like nicotine and booze. Before you dive in, it’s worth considering how that addiction might negatively affect your life and how you plan to deal with it. Just because lots of people do it doesn’t mean it’s good for you. The time you spend on Twitter is time you could have spent doing something else. If that something else is more important and you’re prone to addiction, be careful.

A Brief Exchange on Coups in Africa

When I got up this morning, I had an email in my inbox from Patrick Mathangani, a writer for Kenya’s The Standard. He said he was researching a story on coups in Africa, had found my blog and piece for Foreign Policy on the subject, and wondered if I’d answer a few questions. I thought some of this blog’s readers might be interested in that exchange, too, so here are Mr. Mathangani’s questions and my replies.

In your 2013 forecast, 22 of these countries are in Africa. Checking through data over the years, the continent appears to have had more than its share of coups since the 1950s, perhaps explaining why coups have been seen as an African problem. Your analysis appears to confirm this. What’s your view on this?

I don’t think coups are an African problem so much as they’re a problem of poor countries with weak states, and Africa happens to have more than its fair share of those. We’ve seen the same pattern in every other part of the world, just at different times in history. Latin America, for example, suffered lots of coups in the 1960s and 1970s, but the incidence dropped off sharply in the past couple of decades as most countries in the region got less poor and more democratic—and, crucially, after the Cold War ended and the U.S. and USSR stopped sponsoring or supporting coups in the region as a way to scratch at each other.

I expect we’ll see the same decline in the frequency of coups in Africa as more and more countries get into positive spirals of development. We’ve already seen a decline in the post-Cold War period, probably due to the end of those superpower proxy struggles, and I’m guessing that current patterns of economic growth and democratization will solidify that shift just as they did in Latin America and Europe before.

What, in your view, makes Africa such fertile ground for coups?

I think my answer to number 1 goes about as far as I can on this question. I’m sure there are other aspects, too, but I’ll leave those to the regional pros to address.

This year, we’ve had two distinct political events in Africa that show a sharp contrast and mixed fortunes for the continent’s push for good governance. These are a seamless transition in Kenya, and a coup in CAR. What do these portend for Africa’s future and struggle for democracy?

As William Gibson supposedly said, “The future is already here. It’s just not evenly distributed.” To me, Kenya looks like a state that’s on the edge of that virtuous cycle of development I mentioned earlier, while CAR still isn’t even really a state in the conventional sense.

It’s interesting to see Tanzania, Kenya’s neighbour, at number 22 in your list. Tanzania has been relatively stable, why does it land on the model?

Tanzania ranks relatively high on the list because in spite of its reputation as a stable democracy, it’s got the basic features that have historically been associated with the occurrence of coups. Most notably, it’s got a high infant mortality rate relative to most of the world, political institutions that combine features of democracy and authoritarianism, and sharply polarized politics.

Now, it’s worth underscoring that the risk of a coup attempt in any one country in any given year is generally very low, even in the countries toward the top of those rankings. There are usually only a handful of coups and failed coup attempts worldwide each year, so the best prediction for even the highest-risk countries will almost always be that no coup will occur. If the forecasting models are working well, then all or nearly all of the coup attempts we do see will occur in the couple of dozen countries at the top of the annual rankings. Those rankings most definitely do not mean that we should expect to see coup attempts in all of those countries, and that certainly goes for Tanzania, too.
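The arithmetic behind that point can be sketched with made-up numbers. The 30 countries and the 5% annual risk below are purely hypothetical, not figures from the forecast, and treating countries as independent is a simplifying assumption.

```python
def expected_events(risks):
    """Expected number of coup attempts: the sum of the per-country probabilities."""
    return sum(risks)


def prob_at_least_one(risks):
    """Chance of at least one attempt somewhere, assuming independence."""
    p_none = 1.0
    for p in risks:
        p_none *= 1.0 - p
    return 1.0 - p_none


def best_call(p):
    """The yes/no call most likely to be right for a single country."""
    return "coup attempt" if p > 0.5 else "no coup attempt"


risks = [0.05] * 30  # hypothetical: 30 top-ranked countries at 5% each
print(round(expected_events(risks), 2))    # 1.5
print(round(prob_at_least_one(risks), 2))  # 0.79
print(best_call(0.05))                     # no coup attempt
```

So the best call for each country on the list is "no coup," even though an attempt somewhere among them is more likely than not.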
