Introducing A New Venue for Atrocities Early Warning

Starting today, the bits of this blog on forecasting and monitoring mass atrocities are moving to their proper home, or at least the initial makings of it. Say hi to the (interim) blog of the Early Warning Project.

Since 2012, I have been working as a consultant to the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide (CPG) to help build a new global early-warning system for mass atrocities. As usual, that process is taking longer than we had expected. We now have working versions of the project’s two main forecasting streams—statistical risk assessments and a “wisdom of (expert) crowds” system called an opinion pool—and CPG has hired a full-time staffer (hi, Ali) to manage their day-to-day workings. Unfortunately, though, the web site that will present, discuss, and invite discussion of those forecasts is still under construction. Thanks to Dartmouth’s DALI Lab, we’ve got a great prototype, but there’s finishing work to be done, and doing it takes a while.

Well, delays be damned. We think the content we’re producing is useful now, so we’re not waiting for that site to be finished to start sharing it. Instead, we’re launching this interim blog to go ahead and start doing just that.

When the project’s full-blown web site finally goes up, it will feature a blog, too, and all of the content from this interim venue will migrate there. Until then, if you’re interested in atrocities early warning and prevention—or applied forecasting more generally—please come see what we’re doing, share what you find interesting, and help us think about how to do it even better.

Meanwhile, Dart-Throwing Chimp will keep plugging along on its core themes of democratization, political instability, and forecasting. If you’ve got the interest and the bandwidth, I hope you’ll find time to watch and engage with both channels.

Early Results from a New Atrocities Early Warning System

For the past couple of years, I have been working as a consultant to the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide to help build a new early-warning system for mass atrocities around the world. Six months ago, we started running the second of our two major forecasting streams, a “wisdom of (expert) crowds” platform that aggregates probabilistic forecasts from a pool of topical and area experts on potential events of concern. (See this conference paper for more detail.)

The chart below summarizes the output from that platform on most of the questions we’ve asked so far about potential new episodes of mass killing before 2015. For our early-warning system, we define a mass killing as an episode of sustained violence in which at least 1,000 noncombatant civilians from a discrete group are intentionally killed, usually in a period of a year or less. Each line in the chart shows change over time in the daily average of the inputs from all of the participants who choose to make a forecast on that question. In other words, the line is a mathematical summary of the wisdom of our assembled crowd—now numbering nearly 100—on the risk of a mass killing beginning in each case before the end of 2014. Also:

  • Some of the lines (e.g., South Sudan, Iraq, Pakistan) start further to the right than others because we did not ask about those cases when the system launched but instead added them later, as we continue to do.
  • Two lines—Central African Republic and South Sudan—end early because we saw onsets of mass-killing episodes in those countries. The asterisks indicate the dates on which we made those declarations and therefore closed the relevant questions.
  • Most but not all of these questions ask specifically about state-led mass killings, and some focus on specific target groups (e.g., the Rohingya in Burma) or geographic regions (the North Caucasus in Russia) as indicated.
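The daily crowd average behind each line in the chart is simple to compute. Here is a minimal sketch in Python; the forecaster names, dates, and probabilities are invented for illustration, and the real platform may carry each participant's latest estimate forward rather than averaging only same-day entries:

```python
from collections import defaultdict
from datetime import date

# Hypothetical inputs: (day, forecaster, probability) tuples for one question.
# All names and values here are made up; the actual pool data are not public.
forecasts = [
    (date(2014, 6, 1), "forecaster_a", 0.20),
    (date(2014, 6, 1), "forecaster_b", 0.35),
    (date(2014, 6, 2), "forecaster_a", 0.25),
    (date(2014, 6, 2), "forecaster_c", 0.40),
]

def daily_average(forecasts):
    """Unweighted mean of all probabilities entered on each day."""
    by_day = defaultdict(list)
    for day, _, prob in forecasts:
        by_day[day].append(prob)
    return {day: sum(ps) / len(ps) for day, ps in sorted(by_day.items())}

for day, avg in daily_average(forecasts).items():
    print(day, round(avg, 3))
```

Plotting that dictionary over time produces a line like the ones in the chart.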
Crowd-Estimated Probabilities of Mass-Killing Onset Before 1 January 2015

I look at that chart and conclude that this process is working reasonably well so far. In the six months since we started running this system, the two countries that have seen onsets of mass killing are both ones that our forecasters promptly and consistently put on the high side of 50 percent. Nearly all of the other cases, where mass killings haven’t yet occurred this year, have stuck on the low end of the scale.

I’m also gratified to see that the system is already generating the kind of dynamic output we’d hoped it would, even with fewer than 100 forecasters in the pool. In the past several weeks, the forecasts for both Burma and Iraq have risen sharply, apparently in response to shifts in relevant policies in the former and the escalation of the civil war in the latter. Meanwhile, the forecast for Uighurs in China has risen steadily over the year as a separatist rebellion in Xinjiang Province has escalated and, with it, concerns about a harsh government response. These inflection points and trends can help identify changes in risk that warrant attention from organizations and individuals concerned about preventing or mitigating these potential atrocities.

Finally, I’m also intrigued to see that our opinion pool seems to be sorting cases into a few clusters that could be construed as distinct tiers of concern. Here’s what I have in mind:

  • Above the 50-percent threshold are the high-risk cases, where forecasters assess that mass killing is likely to occur during the specified time frame. These cases won’t necessarily be surprising. Some observers had been warning of the risk of mass atrocities in CAR and South Sudan for months before those episodes began, and the plight of the Rohingya in Burma has been a focal point for many advocacy groups in the past year. Even in supposedly “obvious” cases, however, this system can help by providing a sharper estimate of that risk and a sense of how it is trending over time. In the case of Burma, for example, it is the sharp separation of the past several weeks that marks a shift from possible to likely and thus adds a degree of urgency to that warning.
  • A little farther down the y-axis are the moderate-risk cases—ones that probably won’t suffer mass killing during the period in question but could more readily tip in that direction. In the chart above, Iraq, Sudan, Pakistan, Bangladesh, and Burundi all land in this tier, although Iraq now appears to be sliding into the high-risk group.
  • Clustered toward the bottom are the low-risk cases, where the forecasters seem fairly confident that mass killing will not occur in the near future. In the chart above, Russia, Afghanistan, and Ethiopia land firmly in this set. China (Uighurs) remains closer to this group than to the moderate tier but appears to be creeping upward. We are also running a question about the risk of state-led mass killing in Rwanda before 2015, and it currently lands in this tier, with a forecast of 14 percent.

The system that generates the data behind this chart is password protected, but the point of our project is to make these kinds of forecasts freely available to the global public. We are currently building the web site that will display the forecasts from this opinion pool in real time to all comers and hope to have it ready this fall.

In the meantime, if you think you have relevant knowledge or expertise—maybe you study or work on this topic, or maybe you live or work in parts of the world where risks tend to be higher—and are interested in volunteering as a forecaster, please send an email to us at ewp@ushmm.org.

Alarmed By Iraq

Iraq’s long-running civil war has spread and intensified again over the past year, and the government’s fight against a swelling Sunni insurgency now threatens to devolve into the sort of indiscriminate reprisals that could produce a new episode of state-led mass killing there.

The idea that Iraq could suffer a new wave of mass atrocities at the hands of state security forces or sectarian militias collaborating with them is not far-fetched. According to statistical risk assessments produced for our atrocities early-warning project (here), Iraq is one of the 10 countries worldwide most susceptible to an onset of state-led mass killing, bracketed by places like Syria, Sudan, and the Central African Republic where large-scale atrocities and even genocide are already underway.

Of course, Iraq is already suffering mass atrocities of its own at the hands of insurgent groups who routinely kill large numbers of civilians in indiscriminate attacks, every one of which would stun American or European publics if it happened there. According to the widely respected Iraq Body Count project, the pace of civilian killings in Iraq accelerated sharply in July 2013 after a several-year lull of sorts in which “only” a few hundred civilians were dying from violence each month. Since the middle of last year, the civilian toll has averaged more than 1,000 fatalities per month. That’s well off the pace of 2006-2007, the peak period of civilian casualties under Coalition occupation, but it’s still an astonishing level of violence.

Monthly Counts of Civilian Deaths from Violence in Iraq (Source: Iraq Body Count)

What seems to be increasing now is the risk of additional atrocities perpetrated by the very government that is supposed to be securing civilians against those kinds of attacks. A Sunni insurgency is gaining steam, and the government, in turn, is ratcheting up its efforts to quash the growing threat to its power in worrisome ways. A recent Reuters story summarized the current situation:

In Buhriz and other villages and towns encircling the capital, a pitched battle is underway between the emboldened Islamic State of Iraq and the Levant, the extremist Sunni group that has led a brutal insurgency around Baghdad for more than a year, and Iraqi security forces, who in recent months have employed Shi’ite militias as shock troops.

And this anecdote from the same Reuters story shows how that battle is sometimes playing out:

The Sunni militants who seized the riverside town of Buhriz late last month stayed for several hours. The next morning, after the Sunnis had left, Iraqi security forces and dozens of Shi’ite militia fighters arrived and marched from home to home in search of insurgents and sympathizers in this rural community, dotted by date palms and orange groves.

According to accounts by Shi’ite tribal leaders, two eyewitnesses and politicians, what happened next was brutal.

“There were men in civilian clothes on motorcycles shouting ‘Ali is on your side’,” one man said, referring to a key figure in Shi’ite tradition. “People started fleeing their homes, leaving behind the elders and young men and those who refused to leave. The militias then stormed the houses. They pulled out the young men and summarily executed them.”

Sadly, this escalatory spiral of indiscriminate violence is not uncommon in civil wars. Ben Valentino, a collaborator of mine in the development of this atrocities early-warning project, has written extensively on this topic (see especially here, here, and here). As Ben explained to me via email,

The relationship between counter-insurgency and mass violence against civilians is one of the most well-established findings in the social science literature on political violence. Not all counter-insurgency campaigns lead to mass killing, but when insurgent groups become large and effective enough to seriously threaten the government’s hold on power and when the rebels draw predominantly on local civilians for support, the risks of mass killing are very high. Usually, large-scale violence against civilians is neither the first nor the only tactic that governments use to defeat insurgencies. They may try to focus operations primarily against armed insurgents, or even offer positive incentives to civilians who collaborate with the government. But when less violent methods fail, the temptation to target civilians in the effort to defeat the rebels increases.

Right now, it’s hard to see what’s going to halt or reverse this trend in Iraq. “Things can get much worse from where we are, and more than likely they will,” Daniel Serwer told IRIN News for a story on Iraq’s escalating conflict (here). Other observers quoted in the same story seemed to think that conflict fatigue would keep the conflict from ballooning further, but that hope is hard to square with the escalation of violence that has already occurred over the past year and the fact that Iraq’s civil war never really ended.

In theory, elections are supposed to be a brake on this process, giving rival factions opportunities to compete for power and influence state policy in nonviolent ways. In practice, this often isn’t the case. Instead, Iraq appears to be following the more conventional path in which election winners focus on consolidating their own power instead of governing well, and excluded factions seek other means to advance their interests. Here’s part of how the New York Times set the scene for this week’s elections, which incumbent prime minister Nouri al-Maliki’s coalition is apparently struggling to win:

American intelligence assessments have found that Mr. Maliki’s re-election could increase sectarian tensions and even raise the odds of a civil war, citing his accumulation of power, his failure to compromise with other Iraqi factions—Sunni or Kurd—and his military failures against Islamic extremists. On his watch, Iraq’s American-trained military has been accused by rights groups of serious abuses as it cracks down on militants and opponents of Mr. Maliki’s government, including torture, indiscriminate roundups of Sunnis and demands of bribes to release detainees.

Because Iraq ranked so high in our last statistical risk assessments, we posted a question about it a few months ago on our “wisdom of (expert) crowds” forecasting system. Our pool of forecasters is still relatively small—89 as I write this—but the ones who have weighed in on this topic so far have put it in what I see as a middle tier of concern, where the risk is seen as substantial but not imminent or inevitable. Since January, the pool’s estimated probability of an onset of state-led mass killing in Iraq in 2014 has hovered around 20 percent, alongside countries like Pakistan (23 percent), Bangladesh (20 percent), and Burundi (19 percent) but well behind South Sudan (above 80 percent since December) and Myanmar (43 percent for the risk of a mass killing targeting the Rohingya in particular).

Notably, though, the estimate for Iraq has ticked up a few notches in the past few days to 27 percent as forecasters (including me) have read and discussed some of the pre-election reports mentioned here. I think we are on to something that deserves more scrutiny than it appears to be getting.

Forecasting Round-Up No. 6

The latest in a very occasional series.

1. The Boston Globe ran a story a few days ago about a company that’s developing algorithms to predict which patients in cardiac intensive care units are most likely to take a turn for the worse (here). The point of this exercise is to help doctors and nurses allocate their time and resources more efficiently and, ideally, to give them more lead time to try to stop those bad turns from happening.

The story suffers some rhetorical tics common to press reports on “predictive analytics.” For example, we never hear any specifics about the analytic techniques used or the predictive accuracy of the tool, and the descriptions of machine learning tilt toward the ingenuous (e.g., “The more data fed into the model, the more accurate the prediction becomes”). On the whole, though, I think this article does a nice job representing the promise and reality of this kind of work. The following passage especially resonated with me, because it describes a process for applying these predictions that sounds like the one I have in mind when building my own forecasting tools:

The unit’s medical director, Dr. Melvin Almodovar, uses [the prediction tool] to double-check his own clinical assessment of patients. Etiometry’s founders are careful to note that physicians will always be the ultimate bedside decision makers, using the Stability Index to confirm or inform their own diagnoses.

Butler said that an information-overload environment like the intensive care unit is ideal for a data-driven risk assessment tool, because the patients teeter between life and death. A predictive model can act as an early warning system, pointing out risky changes in multiple vital signs in a more sophisticated way than bedside alarms.

When our predictive models aren’t as accurate as we’d like or don’t yet have a clear track record, this hybrid approach—decisions are informed by the forecasts but not determined by them—is a prudent way to go. In the cardiac intensive care unit, doctors are already applying their own mental models to these data, so the idea of developing explicit algorithms to do the same isn’t a stretch (or shouldn’t be, but…). Unlike those doctors, though, statistical models won’t suffer from low blood sugar or distraction or become emotionally attached to some patients but not others. Also unlike the mental models doctors use now, statistical models will produce explicit forecasts that can be collected and assessed over time. The resulting feedback will give the stats guys many opportunities to improve their models, and the hospital staff a chance to get a feel for the models’ strengths and limitations. When you’re making such weighty decisions, why wouldn’t you want that additional information?
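One standard way to assess collected forecasts over time is the Brier score: the mean squared difference between probability forecasts and the binary outcomes that followed. The numbers below are invented purely to illustrate the calculation:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.
    Lower is better; a constant 50-percent forecast scores 0.25."""
    assert len(forecasts) == len(outcomes)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical track record: four probability forecasts and whether the
# predicted event actually occurred (1) or not (0).
probs = [0.9, 0.2, 0.7, 0.1]
events = [1, 0, 0, 0]
print(brier_score(probs, events))  # 0.1375, better than always saying 50/50
```

Accumulating scores like this is what gives modelers the feedback loop described above, and it is the same metric used to judge accuracy in forecasting tournaments like the one discussed next.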

2. Lyle Ungar recently discussed forecasting with the Machine Intelligence Research Institute (here). The whole thing deserves a read, but I especially liked this framework for thinking about when different methods work best:

I think one can roughly characterize forecasting problems into categories—each requiring different forecasting methods—based, in part, on how much historical data is available.

Some problems, like the geo-political forecasting [the Good Judgment Project is] doing, require lots of information collection and human thought. Prediction markets and team-based forecasts both work well for sifting through the conflicting information about international events. Computer models mostly don’t work as well here—there isn’t a long enough track record of, say, elections or coups in Mali to fit a good statistical model, and it isn’t obvious what other countries are ‘similar.’

Other problems, like predicting energy usage in a given city on a given day, are well suited to statistical models (including neural nets). We know the factors that matter (day of the week, holiday or not, weather, and overall trends), and we have thousands of days of historical observation. Human intuition is not going to beat computers on that problem.

Yet other classes of problems, like economic forecasting (what will the GDP of Germany be next year? What will unemployment in California be in two years?) are somewhere in the middle. One can build big econometric models, but there is still human judgement about the factors that go into them. (What if Merkel changes her mind or Greece suddenly adopts austerity measures?) We don’t have enough historical data to accurately predict economic decisions of politicians.

The bottom line is that if you have lots of data and the world isn’t changing too much, you can use statistical methods. For questions with more uncertainty, human experts become more important.

I might disagree on the particular problem of forecasting coups in Mali, but I think the basic framework that Lyle proposes is right.

3. Speaking of the Good Judgment Project (GJP), a bevy of its researchers, including Ungar, have an article in the March 2014 issue of Psychological Science (here) that shows how certain behavioral interventions can significantly boost the accuracy of forecasts derived from subjective judgments. Here’s the abstract:

Five university-based research groups competed to recruit forecasters, elicit their predictions, and aggregate those predictions to assign the most accurate probabilities to events in a 2-year geopolitical forecasting tournament. Our group tested and found support for three psychological drivers of accuracy: training, teaming, and tracking. Probability training corrected cognitive biases, encouraged forecasters to use reference classes, and provided forecasters with heuristics, such as averaging when multiple estimates were available. Teaming allowed forecasters to share information and discuss the rationales behind their beliefs. Tracking placed the highest performers (top 2% from Year 1) in elite teams that worked together. Results showed that probability training, team collaboration, and tracking improved both calibration and resolution. Forecasting is often viewed as a statistical problem, but forecasts can be improved with behavioral interventions. Training, teaming, and tracking are psychological interventions that dramatically increased the accuracy of forecasts. Statistical algorithms (reported elsewhere) improved the accuracy of the aggregation. Putting both statistics and psychology to work produced the best forecasts 2 years in a row.

The atrocities early-warning project on which I’m working is learning from GJP in real time, and we hope to implement some of these lessons in the opinion pool we’re running (see this conference paper for details).

Speaking of which: If you know something about conflict or atrocities risk or a particular part of the world and are interested in volunteering as a forecaster, please send an email to ewp@ushmm.org.

4. Finally, Daniel Little writes about the partial predictability of social upheaval on his terrific blog, Understanding Society (here). The whole post deserves reading, but here’s the nub (emphasis in the original):

Take unexpected moments of popular uprising—for example, the Arab Spring uprisings or the 2013 riots in Stockholm. Are these best understood as random events, the predictable result of long-running processes, or something else? My preferred answer is something else—in particular, conjunctural intersections of independent streams of causal processes (link). So riots in London or Stockholm are neither fully predictable nor chaotic and random.

This matches my sense of the problem and helps explain why predictive models of these events will never be as accurate as we might like but are still useful, as are properly elicited and combined forecasts from people using their noggins.

The Rwanda Enigma

For analysts and advocates trying to assess risks of future mass atrocities in hopes of preventing them, Rwanda presents an unusual puzzle. Most of the time, specialists in this field readily agree on which countries are especially susceptible to genocide or mass killing, either because those countries are already experiencing large-scale civil conflict or because they are widely considered susceptible to it. Meanwhile, countries that sustain long episodes of peace and steadily grow their economies are generally presumed to have reduced their risk and eventually to have escaped this trap for good.

Contemporary Rwanda is puzzling because it provokes a polarized reaction. Many observers laud Rwanda as one of Africa’s greatest developmental successes, but others warn that it remains dangerously prone to mass atrocities. In a recent essay for African Arguments on how the Rwandan genocide changed the world, Omar McDoom nicely encapsulates this unusual duality:

What has changed inside Rwanda itself since the genocide? The country has enjoyed a remarkable period of social stability. There has not been a serious incident of ethnic violence in Rwanda for nearly two decades. Donors have praised the country’s astonishing development.  Economic growth has averaged over 6% per year, poverty and inequality have declined, child and maternal mortality have improved, and primary education is now universal and free. Rwanda has shown, in defiance of expectations, that an African state can deliver security, public services, and rising prosperity.

Yet, politically, there is some troubling continuity with pre-genocide Rwanda. Power remains concentrated in the hands of a small, powerful ethnic elite led by a charismatic individual with authoritarian tendencies. In form, current president Paul Kagame and his ruling party, the RPF, the heroes who ended the genocide, appear to exercise power in a manner similar to former president Juvenal Habyarimana and his ruling MRND party, the actors closely-tied to those who planned the slaughter. The genocide is testament to what unconstrained power over Rwanda’s unusually efficient state machinery can enable.

That duality also emerges from a comparison of two recent quantitative rankings. On the one hand, the World Bank now ranks Rwanda 32nd on the latest edition of its “ease of doing business” index—not 32nd in Africa, but 32nd of 189 countries worldwide. On the other hand, statistical assessments of the risk of an onset of state-led mass killing identify Rwanda as one of the 25 countries worldwide currently most vulnerable to this kind of catastrophe.

How can both of these things be true? To answer that question, we need to have a clearer sense of where that statistical risk assessment comes from. The number that ranks Rwanda among the 25 countries most susceptible to state-led mass killing is actually an average of forecasts from three models representing a few different ideas about the origins of mass atrocities, all applied to publicly available data from widely used sources.

  • Drawing on work by Barbara Harff and the Political Instability Task Force, the first model emphasizes features of countries’ national politics that hint at a predilection to commit genocide or “politicide,” especially in the context of political instability. Key risk factors in Harff’s model include authoritarian rule, the political salience of elite ethnicity, evidence of an exclusionary elite ideology, and international isolation as measured by trade openness.
  • The second model takes a more instrumental view of mass killing. It uses statistical forecasts of future coup attempts and new civil wars as proxy measures of things that could either spur incumbent rulers to lash out against threats to their power or usher in an insecure new regime that might do the same.
  • The third model is really not a model but a machine-learning process called Random Forests applied to the risk factors identified by the other two. The resulting algorithm is an amalgamation of theory and induction that takes experts’ beliefs about the origins of mass killing as its jumping-off point but also leaves more room for inductive discovery of contingent effects.

All of these models are estimated from historical data that compares cases where state-led mass killings occurred to ones where they didn’t. In essence, we look to the past to identify patterns that will help us spot cases at high risk of mass killing now and in the future. To get our single-best risk assessment—the number that puts Rwanda in the top (or bottom) 25 worldwide—we simply average the forecasts from these three models. We prefer the average to a single model’s output because we know from work in many fields—including meteorology and elections forecasting—that this “ensemble” approach generally produces more accurate assessments than we could expect to get from any one model alone. By combining forecasts, we learn from all three perspectives and hedge against the biases of any one of them.
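The unweighted ensemble described above is just an average of the component models' probabilities for each country. Here is a minimal sketch with made-up numbers and placeholder country names; the actual model outputs are not reproduced here:

```python
# Hypothetical per-model probabilities of mass-killing onset for two countries.
# The model names mirror the three streams described in the text; the values
# are invented for illustration.
model_forecasts = {
    "Country A": {"harff_pitf": 0.12, "elite_threat": 0.09, "random_forest": 0.10},
    "Country B": {"harff_pitf": 0.03, "elite_threat": 0.05, "random_forest": 0.02},
}

def ensemble(per_model):
    """Unweighted average across the component models' probabilities."""
    return sum(per_model.values()) / len(per_model)

risk = {country: ensemble(models) for country, models in model_forecasts.items()}
ranked = sorted(risk, key=risk.get, reverse=True)
print(ranked)  # ['Country A', 'Country B'], highest ensemble risk first
```

Sorting on the averaged probability is what yields rankings like "15th on the PITF/Harff model" or "one of the 25 countries worldwide" cited in this post.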

Rwanda lands in the top 25 worldwide because all three models identify it as a relatively high-risk case. It ranks 15th on the PITF/Harff model, 28th on the “elite threat” model, and 30th on the Random Forest. The PITF/Harff model sees a relatively low risk in Rwanda of the kinds of political instability that typically trigger onsets of genocide or politicide, but it also pegs Rwanda as the kind of regime most likely to resort to mass atrocities if instability were to occur—namely, an autocracy in which elites’ ethnicity is politically salient in a country with a recent history of genocide. Rwanda also scores fairly high on the “elite threat” model because, according to our models of these things, it is at relatively high risk of a new insurgency and moderate risk of a coup attempt. Finally, the Random Forest sees a very low probability of mass killing onset in Rwanda but still pegs it as a riskier case than most.

Our identification of Rwanda as a relatively high-risk case is echoed by some, but not all, of the other occasional global assessments of countries’ susceptibility to mass atrocities. In her own applications of her genocide/politicide model for the task of early warning, Barbara Harff pegged Rwanda as one of the world’s riskiest cases in 2011 but not in 2013. Similarly, the last update of Genocide Watch’s Countries at Risk Report, in 2012, lists Rwanda as one of more than a dozen countries at stage five of seven on the path to genocide, putting it among the 35 countries worldwide at greatest risk. By contrast, the Global Centre for the Responsibility to Protect has not identified Rwanda as a situation of concern in any of its R2P Monitor reports to date, and the Sentinel Project for Genocide Prevention does not list Rwanda among its situations of concern, either. Meanwhile, recent reporting on Rwanda from Human Rights Watch has focused mostly on the pursuit of justice for the 1994 genocide and other kinds of human-rights violations in contemporary Rwanda.

To see what our own pool of experts makes of our statistical risk assessment and to track changes in their views over time, we plan to add a question to our “wisdom of (expert) crowds” forecasting system asking about the prospect of a new state-led mass killing in Rwanda before 2015. If one does not happen, as we hope and expect will be the case, we plan to re-launch the question at the start of next year and will continue to do so as long as our statistical models keep identifying it as a case of concern.

In the meantime, I thought it would be useful to ask a few country experts what they make of this assessment and how a return to mass killing in Rwanda might come about. Some were reluctant to speak on the record, and understandably so. The present government of Rwanda has a history of intimidating individuals it perceives as its critics. As Michaela Wrong describes in a recent piece for Foreign Policy,

A U.S. State Department spokesperson said in mid-January, “We are troubled by the succession of what appear to be politically motivated murders of prominent Rwandan exiles. President Kagame’s recent statements about, quote, ‘consequences’ for those who betray Rwanda are of deep concern to us.”

It is a pattern that suggests the Rwandan government may have come to see the violent silencing of critics—irrespective of geographical location and host country—as a beleaguered country’s prerogative.

Despite these constraints, the impression I get from talking to some experts and reading the work of others is that our risk assessment strikes nearly all of them as plausible. None said that he or she expects an episode of state-led mass killing to begin soon in Rwanda. Consistent with the thinking behind our statistical models, though, many seem to believe that another mass killing could occur in Rwanda, and if one did, it would almost certainly come in reaction to some other rupture in that country’s political stability.

Filip Reyntjens, a professor at the University of Antwerpen who wrote a book on Rwandan politics since the 1994 genocide, was both the most forthright and the most pessimistic in his assessment. Via email, he described Rwanda as

A volcano waiting to erupt. Nearly all field research during the last 15 years points at pervasive structural violence that may, as we know, become physical, acute violence following a trigger. I don’t know what that trigger will be, but I think a palace revolution or a coup d’etat is the most likely scenario. That may create a situation difficult to control.

In a recent essay for Juncture that was adapted for the Huffington Post (here), Phil Clark sounds more optimistic than Reyntjens, but he is not entirely sanguine, either. Clark sees the structure and culture of the country’s ruling party, the Rwandan Patriotic Front (RPF), as the seminal feature of Rwandan politics since the genocide and describes it as a double-edged sword. On the one hand, the RPF’s cohesiveness and dedication to purpose have enabled it, with help from an international community with a guilty conscience, to make “enormous” developmental gains. On the other hand,

The RPF’s desire for internal cohesion has made it suspicious of critical voices within and outside of the party—a feature compounded by Rwanda’s fraught experience of multi-party democracy in the early 1990s, which saw the rise of ethnically driven extremist parties and helped to create an environment conducive to genocide. The RPF’s singular focus on rebuilding the nation and facilitating the return of refugees means it has often viewed dissent as an unaffordable distraction. The disastrous dalliance with multipartyism before the genocide has only added to the deep suspicion of policy based on the open contestation of ideas.

Looking ahead, Clark wonders what happens when that intolerance for dissent bumps up against popular frustrations, as it probably will at some point:

For the moment, there are few signs of large-scale popular discontent with the closed political space. However, any substantial decline in socio-economic conditions in the countryside will challenge this. The RPF’s gamble appears to be that the population will tolerate a lack of national political contestation provided domestic stability and basic living standards are maintained. For now, the RPF seems to have rightly judged the popular mood but that situation may not hold.

Journalist Kris Berwouts portrays similarly ambiguous terrain in a recent piece for the Dutch magazine Mo that also appeared on the blog African Arguments (here). Berwouts quotes David Himbara, a former Rwandan regime insider who left the country in 2010 and has vocally criticized the Kagame government ever since, as telling him that “all society has vanished from Rwanda, mistrust is complete. It has turned Rwanda into a time bomb.” But Berwouts juxtaposes that dire assessment with the cautiously optimistic view of Belgian journalist Marc Hoogsteyns, who has worked in the region for years and has family ties by marriage to its Tutsi community. According to Hoogsteyns,

Rwanda is a beautiful country with many strengths and opportunities, but at the same time it is some kind of African version of Brave New World. People are afraid to talk. But they live more comfortably and safely than ever before, they enjoy high quality education and health care. They are very happy with that. The Tutsi community stands almost entirely behind Kagame and also most Hutu can live with it. They obviously don’t like the fact that they do not count on the political scene, but they can do what they want in all other spheres of live. They can study and do business etcetera. They can deal with the level of repression, because they know that countries such as Burundi, Congo or Kenya are not the slightest bit more democratic. Honestly, if we would have known twenty years ago, just after the genocide, that Rwanda would achieve this in two decades, we would have signed for it immediately.

As people of a certain age in places like Sarajevo or Bamako might testify, though, stability is a funny thing. It’s there until it isn’t, and when it goes, it sometimes goes quickly. In this sense, the political crises that sometimes produce mass killings are more like earthquakes than elections. We can spot the vulnerable structures fairly accurately, but we’re still not very good at anticipating the timing and dynamics of ruptures in them.

In the spirit of that last point, it’s important to acknowledge that the statistical assessment of Rwanda’s risk of mass killing is a blunt piece of information. Although it does specifically indicate a susceptibility to atrocities perpetrated by state security forces or groups acting at their behest, it does not necessarily implicate the RPF as the likely perpetrators. The qualitative assessments discussed above suggest that some experts find that scenario plausible, but it isn’t the only one consistent with our statistical finding. A new regime brought to power by coup or revolution could also become the agent of a new wave of mass atrocities in Rwanda, and the statistical forecast would be just as accurate.

Egypt’s recent past offers a case in point. Our statistical assessments of susceptibility to state-led mass killing in early 2013 identified Egypt as a relatively high-risk case, like Rwanda now. At the time, Mohammed Morsi was president, and one plausible interpretation of that risk assessment might have centered on the threat the Muslim Brotherhood’s supporters posed to Egypt’s Coptic Christians. Fast forward to July 2013, and the mass killing we ended up seeing in Egypt came at the hands of an army and police who snatched power away from Morsi and the Brotherhood and then proceeded to kill hundreds of their unarmed sympathizers. That outcome doesn’t imply that Coptic Christians weren’t at grave risk before the coup, but it should remind us to consider a variety of ways these systemic risks might become manifest.

Still, after conversations with a convenience sample of regional experts, I am left with the impression that the risk our statistical models identify of a new state-led mass killing in Rwanda is real, and that it is possible to imagine the ruling RPF as the agents of such violence.

No one seems to expect the regime to engage in mass violence without provocation, but the possibility of a new Hutu insurgency, and the state’s likely reaction to it, emerged from those conversations as perhaps the most likely scenario. According to some of the experts with whom I spoke, many Rwandan Hutus are growing increasingly frustrated with the RPF regime, and some radical elements of the Hutu diaspora appear to be looking for ways to take up that mantle. The presence of an insurgency is the single most powerful predictor of state-led mass killing, and it does not seem far-fetched to imagine the RPF regime using “scorched earth” tactics in response to the threat or occurrence of attacks on its soldiers and Tutsi citizens. After all, this is the same regime whose soldiers pursued Hutu refugees into Zaire in the mid-1990s and, according to a 2010 U.N. report, participated in the killings of tens of thousands of civilians in war crimes that were arguably genocidal.

Last but not least, we can observe that Rwanda has suffered episodes of mass killing roughly once per generation since independence—in the early 1960s, in 1974, and again in the early 1990s, culminating in the genocide of 1994 and the reprisal killings that followed. History certainly isn’t destiny, but our statistical models confirm that in the case of mass atrocities, it often rhymes.

It saddens me to write this piece about a country that just marked the twentieth anniversary of one of the most lethal genocides since the Holocaust, but the point of our statistical modeling is to see what the data say that our mental models and emotional assessments might overlook. A reprisal of mass killing in Rwanda would be horribly tragic. As Free Africa Foundation president George Ayittey wrote in a recent letter to the Wall Street Journal, however, “The real tragedy of Rwanda is that Mr. Kagame is so consumed by the 1994 genocide that, in his attempt to prevent another one, he is creating the very conditions that led to it.”

Watch Experts’ Beliefs Evolve Over Time

On 15 December 2013, “something” happened in South Sudan that quickly began to spiral into a wider conflict. Prior research tells us that mass killings often occur on the heels of coup attempts and during civil wars, and at the time South Sudan ranked among the world’s countries at greatest risk of state-led mass killing.

Motivated by these two facts, I promptly added a question about South Sudan to the opinion pool we’re running as part of a new atrocities early-warning system for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide (see this recent post for more on that). As it happened, we already had one question running about the possibility of a state-led mass killing in South Sudan targeting the Murle, but the spiraling conflict clearly implied a host of other risks. Posted on 18 December 2013, the new question asked, “Before 1 January 2015, will an episode of mass killing occur in South Sudan?”

The criteria we gave our forecasters to understand what we mean by “mass killing” and how we would decide if one has happened appear under the Background Information header at the bottom of this post. Now, shown below is an animated sequence of kernel density plots of each day’s forecasts from all participants who’d chosen to answer this question. A kernel density plot is like a histogram, but with some nonparametric estimation thrown in to try to get at the distribution of a variable’s “true” values from the sample of observations we’ve got. If that sounds like gibberish to you, just think of the peaks in the plots as clumps of experts who share similar beliefs about the likelihood of mass killing in South Sudan. The taller the peak, the bigger the clump. The farther right the peak, the more likely that clump thinks a mass killing is.
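For readers who want the mechanics, here is a minimal sketch of the kernel density step in Python. Our actual plots were produced separately, and the forecast values below are invented for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Invented forecasts (probabilities of mass killing) from one day's pool
forecasts = np.array([0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.90])

# Fit a Gaussian kernel density estimate to the day's forecasts
kde = gaussian_kde(forecasts)

# Evaluate the smoothed density on a grid over the probability scale;
# peaks in the density mark clumps of forecasters with similar beliefs
grid = np.linspace(0, 1, 101)
density = kde(grid)

print(grid[np.argmax(density)])  # probability where the tallest peak sits
```

Plotting `density` against `grid` for each day and stringing the frames together yields an animation like the one below.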

kplot.ssd.20140205

I see a couple of interesting patterns in those plots. The first is the rapid rightward shift in the distribution’s center of gravity. As the fighting escalated and reports of atrocities began to trickle in (see here for one much-discussed article from the time), many of our forecasters quickly became convinced that a mass killing would occur in South Sudan in the coming year, if one wasn’t occurring already. On 23 December—the date the aforementioned article appeared—the average forecast jumped to approximately 80 percent, and it hasn’t fallen below that level since.

The second pattern that catches my eye is the appearance in January of a long, thin tail in the distribution that reaches into the lower ranges. That shift in the shape of the distribution coincides with stepped-up efforts by U.N. peacekeepers to stem the fighting and the start of direct talks between the warring parties. I can’t say for sure what motivated that shift, but it looks like our forecasters split in their response to those developments. While most remained convinced that a mass killing would occur or had already, a few forecasters were apparently more optimistic about the ability of those peacekeepers or talks or both to avert a full-blown mass killing. A few weeks later, it’s still not clear which view is correct, although a forthcoming report from the U.N. Mission in South Sudan may soon shed more light on this question.

I think this set of plots is interesting on its face for what it tells us about the urgent risk of mass atrocities in South Sudan. At the same time, I also hope this exercise demonstrates the potential to extract useful information from an opinion pool beyond a point-estimate forecast. We know from prior and ongoing research that those point estimates can be quite informative in their own right. Still, by looking at the distribution of participants’ forecasts on a particular question, we can glean something about the degree of uncertainty around an event of interest or concern. By looking for changes in that distribution over time, we can also get a more complete picture of how the group’s beliefs evolve in response to new information than a simple line plot of the average forecast could ever tell us. Look for more of this work as our early-warning system comes online, hopefully in the next few months.

UPDATE (7 Feb): At the urging of Trey Causey, I tried making another version of this animation in which the area under the density plot is filled in. I also decided to add a vertical line to show each day’s average forecast, which is what we currently report as the single-best forecast at any given time. Here’s what that looks like, using data from a question on the risk of a mass killing occurring in the Central African Republic before 2015. We closed this question on 19 December 2013, when it became clear through reporting by Human Rights Watch and others that an episode of mass killing had occurred.

kplot2.car.20140207

Background Information

We will consider a mass killing to have occurred when the deliberate actions of state security forces or other armed groups result in the deaths of at least 1,000 noncombatant civilians over a period of one year or less.

  • A noncombatant civilian is any person who is not a current member of a formal or irregular military organization and who does not apparently pose an immediate threat to the life, physical safety, or property of other people.
  • The reference to deliberate actions distinguishes mass killing from deaths caused by natural disasters, infectious diseases, the accidental killing of civilians during war, or the unanticipated consequences of other government policies. Fatalities should be considered intentional if they result from actions designed to compel or coerce civilian populations to change their behavior against their will, as long as the perpetrators could have reasonably expected that these actions would result in widespread death among the affected populations. Note that this definition also covers deaths caused by other state actions, if, in our judgment, perpetrators enacted policies/actions designed to coerce civilian populations and could have expected that these policies/actions would lead to large numbers of civilian fatalities. Examples of such actions include, but are not limited to: mass starvation or disease-related deaths resulting from the intentional confiscation or destruction of food, medicines, or other healthcare supplies; and deaths occurring during forced relocation or forced labor.
  • To distinguish mass killing from large numbers of unrelated civilian fatalities, the victims of mass killing must appear to be perceived by the perpetrators as belonging to a discrete group. That group may be defined communally (e.g., ethnic or religious), politically (e.g., partisan or ideological), socio-economically (e.g., class or professional), or geographically (e.g., residents of specific villages or regions). In this way, apparently unrelated executions by police or other state agents would not qualify as mass killing, but capital punishment directed against members of a specific political or communal group would.

The determination of whether or not a mass killing has occurred will be made by the administrators of this system using publicly available secondary sources and in consultation with subject-matter experts. Relevant evidence will be summarized in a blog post published when the determination is announced, and any dissenting views will be discussed as well.

Will Unarmed Civilians Soon Get Massacred in Ukraine?

According to one pool of forecasters, most probably not.

As part of a public atrocities early-warning system I am currently helping to build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide (see here), we are running a kind of always-on forecasting survey called an opinion pool. An opinion pool is similar in spirit to a prediction market, but instead of having participants trade shares tied to the occurrence of some future event, we simply ask participants to estimate the probability of each event’s occurrence. In contrast to a traditional survey, every question remains open until the event occurs or the forecasting window closes. This way, participants can update their forecasts as often as they like, as they see or hear relevant information or just change their minds.

With generous support from Inkling, we started up our opinion pool in October, aiming to test and refine it before our larger early-warning system makes its public debut this spring (we hope). So far, we have only recruited opportunistically among colleagues and professional acquaintances, but we already have more than 70 registered participants. In the first four months of operation, we have used the system to ask more than two dozen questions, two of which have since closed because the relevant events occurred (mass killing in CAR and the Geneva II talks on Syria).

Over the next few years, we aim to recruit a large and diverse pool of volunteer forecasters from around the world with some claim to topical expertise or relevant local knowledge. The larger and more diverse our pool, the more accurate we expect our forecasts to be, and the wider the array of questions we can ask. (If you are interested in participating, please drop me a line at ulfelder <at> gmail <dot> com.)

A few days ago and prompted by a couple of our more active members, I posted a question to our pool asking, “Before 1 March 2014, will any massacres occur in Ukraine?” As of this morning, our pool had made a total of 13 forecasts, and the unweighted average of the latest of those estimates from each participating forecaster was just 15 percent. Under the criteria we specified (see Background Information below), this forecast does not address the risk of large-scale violence against or among armed civilians, nor does it exclude the possibility of a series of small but violent encounters that cumulatively produce a comparable or larger death toll. Still, for those of us concerned that security forces or militias will soon kill nonviolent protesters in Ukraine on a large scale, our initial forecast implies that those fears are probably unwarranted.
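For concreteness, here is a toy sketch of that aggregation rule: keep each participant's most recent estimate, then take the unweighted average. The forecast log below is invented, not the pool's actual data:

```python
# Invented forecast log: (forecaster, day, probability). Forecasters can
# update as often as they like; only each person's latest estimate counts.
log = [
    ("a", 1, 0.30), ("b", 1, 0.20),
    ("a", 2, 0.10), ("c", 2, 0.15),
    ("b", 3, 0.20),
]

# Keep the most recent estimate from each forecaster
latest = {}
for who, day, prob in sorted(log, key=lambda rec: rec[1]):
    latest[who] = prob

# The pool's forecast is the unweighted average of those latest estimates
pool_forecast = sum(latest.values()) / len(latest)
print(round(pool_forecast, 2))  # 0.15 with these invented numbers
```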

Crowd-Estimated Probability of Any Massacres in Ukraine Before 1 March 2014


Obviously, we don’t have a crystal ball, and this is just an aggregation of subjective estimates from a small pool of people, none of whom (I think) is on the scene in Ukraine or has inside knowledge of the decision-making of relevant groups. Still, a growing body of evidence shows that aggregations of subjective forecasts like this one can often be usefully accurate (see here), even with a small number of contributing forecasters (see here). On this particular question, I very much hope our crowd is right. Whatever happens in Ukraine over the next few weeks, though, principle and evidence suggest that the method is sound, and we soon expect to be using this system to help assess risks of mass atrocities all over the world in real time.

Background Information

We define a “massacre” as an event that has the following features:

  • At least 10 noncombatant civilians are killed in one location (e.g., neighborhood, town, or village) in less than 48 hours. A noncombatant civilian is any person who is not a current member of a formal or irregular military organization and who does not apparently pose an immediate threat to the life, physical safety, or property of other people.
  • The victims appear to have been the primary target of the violence that killed them.
  • The victims do not appear to have been engaged in violent action or criminal activity when they were killed, unless that violent action was apparently in self-defense.
  • The relevant killings were carried out by individuals affiliated with a social group or organization engaged in a wider political conflict and appear to be connected to each other and to that wider conflict.

Those features will not always be self-evident or uncontroversial, so we use the following series of ad hoc rules to make more consistent judgments about ambiguous events.

  • Police, soldiers, prison guards, and other agents of state security are never considered noncombatant civilians, even if they are killed while off duty or out of uniform.
  • State officials and bureaucrats are not considered civilians when they are apparently targeted because of their professional status (e.g., assassinated).
  • Civilian deaths that occur in the context of operations by uniformed military-service members against enemy combatants are considered collateral damage, not atrocities, and should be excluded unless there is strong evidence that the civilians were targeted deliberately. We will err on the side of assuming that they were not.
  • Deaths from state repression of civilians engaged in nonviolent forms of protest are considered atrocities. Deaths resulting from state repression targeting civilians who were clearly engaged in rioting, looting, attacks on property, or other forms of collective aggression or violence are not.
  • Non-state militant or paramilitary groups, such as militias, gangs, vigilante groups, or raiding parties, are considered combatants, not civilians.

We will use contextual knowledge to determine whether or not a discrete event is linked to a wider conflict or campaign of violence, and we will err on the side of assuming that it is.

Determinations of whether or not a massacre has occurred will be made by the administrator of this system using publicly available secondary sources. Relevant evidence will be summarized in a blog post published when the determination is announced, and any dissenting views will be discussed as well.

Disclosure

I have argued on this blog that scholars have an obligation to disclose potential conflicts of interest when discussing their research, so let me do that again here: For the past two years, I have been paid as a contractor by the U.S. Holocaust Memorial Museum for my work on the atrocities early-warning system discussed in this post. Since the spring of 2013, I have also been paid to write questions for the Good Judgment Project, in which I participated as a forecaster the year before. To the best of my knowledge, I have no financial interests in, and have never received any payments from, any companies that commercially operate prediction markets or opinion pools.

A New Statistical Approach to Assessing Risks of State-Led Mass Killing

Which countries around the world are currently at greatest risk of an onset of state-led mass killing? At the start of the year, I posted results from a wiki survey that asked this question. Now, here in heat-map form are the latest results from a rejiggered statistical process with the same target. You can find a dot plot of these data at the bottom of the post, and the data and code used to generate them are on GitHub.

Estimated Risk of New Episode of State-Led Mass Killing

These assessments represent the unweighted average of probabilistic forecasts from three separate models trained on country-year data covering the period 1960-2011. In all three models, the outcome of interest is the onset of an episode of state-led mass killing, defined as any episode in which the deliberate actions of state agents or other organizations kill at least 1,000 noncombatant civilians from a discrete group. The three models are:

  • PITF/Harff. A logistic regression model approximating the structural model of genocide/politicide risk developed by Barbara Harff for the Political Instability Task Force (PITF). In its published form, the Harff model only applies to countries already experiencing civil war or adverse regime change and produces a single estimate of the risk of a genocide or politicide occurring at some time during that crisis. To build a version of the model that was more dynamic, I constructed an approximation of the PITF’s global model for forecasting political instability and use the natural log of the predicted probabilities it produces as an additional input to the Harff model. This approach mimics the one used by Harff and Ted Gurr in their ongoing application of the genocide/politicide model for risk assessment (see here).
  • Elite Threat. A logistic regression model that uses the natural log of predicted probabilities from two other logistic regression models—one of civil-war onset, the other of coup attempts—as its only inputs. This model is meant to represent the argument put forth by Matt Krain, Ben Valentino, and others that states usually engage in mass killing in response to threats to ruling elites’ hold on power.
  • Random Forest. A machine-learning technique (see here) applied to all of the variables used in the two previous models, plus a few others of possible relevance, using the ‘randomForest’ package in R. A couple of parameters were tuned on the basis of a gridded comparison of forecast accuracy in 10-fold cross-validation.
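To make the combination rule concrete, here is a toy sketch of the ensemble step. The real pipeline is in R, and the probabilities below are invented, not the project's actual estimates:

```python
# Invented component-model probabilities for one country
component_forecasts = {
    "pitf_harff": 0.06,
    "elite_threat": 0.04,
    "random_forest": 0.08,
}

# The combined risk assessment is the unweighted average of the three
ensemble = sum(component_forecasts.values()) / len(component_forecasts)
print(round(ensemble, 2))  # 0.06
```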

The Random Forest proved to be the most accurate of the three models in stratified 10-fold cross-validation. The chart below is a kernel density plot of the areas under the ROC curve for the out-of-sample estimates from that cross-validation drill. As the chart shows, the average AUC for the Random Forest was in the low 0.80s, compared with the high 0.70s for the PITF/Harff and Elite Threat models. As expected, the average of the forecasts from all three performed even better than the best single model, albeit not by much. These out-of-sample accuracy rates aren’t mind-blowing, but they aren’t bad either, and they are as good as or better than many of the ones I’ve seen from similar efforts to anticipate the onset of rare political crises in countries worldwide.
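For anyone curious about the validation mechanics, here is a rough Python analogue of that drill using scikit-learn and synthetic data. This is not the project's actual R code or data, just an illustration of stratified 10-fold cross-validation scored by AUC for two competing models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the country-year data: a rare-event outcome
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.97], random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified 10-fold CV: collect one out-of-sample AUC per model per fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = {name: [] for name in models}
for train, test in cv.split(X, y):
    for name, model in models.items():
        model.fit(X[train], y[train])
        probs = model.predict_proba(X[test])[:, 1]
        aucs[name].append(roc_auc_score(y[test], probs))

for name, scores in aucs.items():
    print(name, round(float(np.mean(scores)), 3))
```

The per-fold AUC scores in `aucs` are exactly the kind of values summarized in the kernel density plot below.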

cpg.statrisk2014.val.auc.by.fold

Distribution of Out-of-Sample AUC Scores by Model in 10-Fold Cross-Validation

The decision to use an unweighted average for the combined forecast might seem simplistic, but it’s actually a principled choice in this instance. When examples of the event of interest are hard to come by and we have reason to believe that the process generating those events may be changing over time, sticking with an unweighted average is a reasonable hedge against risks of over-fitting the ensemble to the idiosyncrasies of the test set used to tune it. For a longer discussion of this point, see pp. 7-8 in the last paper I wrote on this work and the paper by Andreas Graefe referenced therein.

Any close readers of my previous work on this topic over the past couple of years (see here and here) will notice that one model has been dropped from the last version of this ensemble, namely, the one proposed by Michael Colaresi and Sabine Carey in their 2008 article, “To Kill or To Protect” (here). As I was reworking my scripts to make regular updating easier (more on that below), I paid closer attention than I had before to the fact that the Colaresi and Carey model requires a measure of the size of state security forces that is missing for many country-years. In previous iterations, I had worked around that problem by using a categorical version of this variable that treated missingness as a separate category, but this time I noticed that there were fewer than 20 mass-killing onsets in country-years for which I had a valid observation of security-force size. With so few examples, we’re not going to get reliable estimates of any pattern connecting the two. As it happened, this model—which, to be fair to its authors, was not designed to be used as a forecasting device—was also by far the least accurate of the lot in 10-fold cross-validation. Putting two and two together, I decided to consign this one to the scrap heap for now. I still believe that measures of military forces could help us assess risks of mass killing, but we’re going to need more and better data to incorporate that idea into our multimodel ensemble.

The bigger and in some ways more novel change from previous iterations of this work concerns the unorthodox approach I’m now using to make the risk assessments as current as possible. All of the models used to generate these assessments were trained on country-year data, because that’s the only form in which most of the requisite data is produced. To mimic the eventual forecasting process, the inputs to those models are all lagged one year at the model-estimation stage—so, for example, data on risk factors from 1985 are compared with outcomes in 1986, 1986 inputs to 1987 outcomes, and so on.
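In code, that lag structure amounts to pairing year-t inputs with year-t+1 outcomes. A toy sketch in Python, with invented numbers:

```python
# Invented country-year series: risk factors measured in year t are
# paired with the onset indicator observed in year t + 1
years  = [1985, 1986, 1987, 1988]
inputs = [0.2, 0.3, 0.5, 0.4]  # risk-factor values by year
onsets = [0, 0, 1, 0]          # mass-killing onset indicator by year

# Lag the inputs one year relative to the outcomes; each tuple is
# (input year, that year's risk factors, the following year's outcome)
training_pairs = list(zip(years[:-1], inputs[:-1], onsets[1:]))
print(training_pairs)  # [(1985, 0.2, 0), (1986, 0.3, 1), (1987, 0.5, 0)]
```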

If we stick rigidly to that structure at the forecasting stage, then I need data from 2013 to produce 2014 forecasts. Unfortunately, many of the sources for the measures used in these models won’t publish their 2013 data for at least a few more months. Faced with this problem, I could do something like what I aim to do with the coup forecasts I’ll be producing in the next few days—that is, only use data from sources that quickly and reliably update soon after the start of each year. Unfortunately again, though, the only way to do that would be to omit many of the variables most specific to the risk of mass atrocities—things like the occurrence of violent civil conflict or the political salience of elite ethnicity.

So now I’m trying something different. Instead of waiting until every last input has been updated for the previous year and they all neatly align in my rectangular data set, I am simply applying my algorithms to the most recent available observation of each input. It took some trial and error to write, but I now have an R script that automates this process at the country level by pulling the time series for each variable, omitting the missing values, reversing the series order, snipping off the observation at the start of that string, collecting those snippets in a new vector, and running that vector through the previously estimated model objects to get a forecast (see the section of this starting at line 284).

One implicit goal of this approach is to make it easier to jump to batch processing, where the forecasting engine routinely and automatically pings the data sources online and updates whenever any of the requisite inputs has changed. So, for example, when in a few months the vaunted Polity IV Project releases its 2013 update, my forecasting contraption would catch and ingest the new version and the forecasts would change accordingly. I now have scripts that can do the statistical part but am going to be leaning on other folks to automate the wider routine as part of the early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide.

The big upside of this opportunistic approach to updating is that the risk assessments are always as current as possible, conditional on the limitations of the available data. The way I figure, when you don’t have information that’s as fresh as you’d like, use the freshest information you’ve got.

The downside of this approach is that it’s not clear exactly what the outputs from that process represent. Technically, a forecast is a probabilistic statement about the likelihood of a specific event during a specific time period. The outputs from this process are still probabilistic statements about the likelihood of a specific event, but they are no longer anchored to a specific time period. The probabilities mapped at the top of this post mostly use data from 2012, but the inputs for some variables for some cases are a little older, while the inputs for some of the dynamic variables (e.g., GDP growth rates and coup attempts) are essentially current. So are those outputs forecasts for 2013, or for 2014, or something else?

For now, I’m going with “something else” and am thinking of the outputs from this machinery as the most up-to-date statistical risk assessments I can produce, but not forecasts as such. That description will probably sound like fudging to most statisticians, but it’s meant to be an honest reflection of both the strengths and limitations of the underlying approach.

To any gear heads who've read this far: I'd really appreciate hearing your thoughts on this strategy, along with any ideas on other ways to resolve this conundrum or improve any other aspect of this forecasting process. As noted at the top, the data and code used to produce these estimates are posted online. This work is part of a soon-to-launch, public early-warning system, so we hope and expect that these assessments will have some effect on policy and advocacy planning processes. Given that aim, it behooves us to make them as accurate as possible, so I would very much welcome any suggestions on how to do or describe this better.

Finally, and as promised, here is a dot plot of the estimates mapped above. Countries are shown in descending order by estimated risk. The gray dots mark the forecasts from the three component models, and the red dot marks the unweighted average.
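For readers who want to see the arithmetic, the ensemble estimate behind each red dot is just the unweighted mean of the three component-model probabilities, with countries then sorted from highest to lowest. A toy Python sketch (the real work is done in R; the numbers below are made up):

```python
def rank_by_risk(component_forecasts):
    """component_forecasts: {country: [p_model1, p_model2, p_model3]}.

    Returns (country, unweighted average) pairs sorted from highest
    to lowest estimated risk, as in the dot plot.
    """
    avgs = {c: sum(ps) / len(ps) for c, ps in component_forecasts.items()}
    return sorted(avgs.items(), key=lambda kv: kv[1], reverse=True)
```

The unweighted average is a deliberately simple ensemble; with no track record yet on which to base weights, equal weighting is the conservative default.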

[Figure: dot plot of estimated risks (dotplot.20140122)]

PS. In preparation for a presentation on this work at an upcoming workshop, I made a new map of the current assessments that works better, I think, than the one at the top of this post. Instead of coloring by quintiles, this new version (below) groups cases into several bins that roughly represent doublings of risk: less than 1%, 1-2%, 2-4%, 4-8%, and 8-16%. This version more accurately shows that the vast majority of countries are at extremely low risk and more clearly shows variations in risk among the ones that are not.
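The binning scheme behind that new map is easy to reproduce: thresholds that roughly double at each step. A small Python sketch of the idea:

```python
def risk_bin(p):
    """Assign a probability to bins that roughly double in width:
    <1%, 1-2%, 2-4%, 4-8%, 8-16%."""
    edges = [0.01, 0.02, 0.04, 0.08, 0.16]
    labels = ["<1%", "1-2%", "2-4%", "4-8%", "8-16%"]
    for edge, label in zip(edges, labels):
        if p < edge:
            return label
    return ">=16%"  # no country lands here in the current assessments
```

Because most countries fall below 1 percent, a doubling scale spreads the interesting variation across bins where equal-frequency quintiles would bury it.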

[Figure: map, "Estimated Risk of New State-Led Mass Killing"]

Using GDELT to Monitor Atrocities, Take 2

Last May, I wrote a post about my preliminary efforts to use a new data set called GDELT to monitor reporting on atrocities around the world in near-real time. Those efforts represent one part of the work I’m doing on a public early-warning system for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide, and they have continued in fits and starts over the ensuing eight months. With help from Dartmouth’s Dickey Center, Palantir, and the GDELT crew, we’ve made a lot of progress. I thought I’d post an update now because I’m excited about the headway we’ve made; I think others might benefit from seeing what we’re doing; and I hope this transparency can help us figure out how to do this task even better.

So, let's cut to the chase: Here is a screenshot of an interactive map locating nine of the ten events captured in GDELT in the first week of January 2014 that looked like atrocities to us; the tenth was left off the map because the Google Maps API did not recognize its reported location when queried. The size of the bubbles corresponds to the number of civilian deaths, which in this map range from one to 31. To really get a feel for what we're trying to do, though, head over to the original visualization on CartoDB (here), where you can zoom in and out and click on the bubbles to see a hyperlink to the story from which each event was identified.

[Figure: screenshot of the interactive monitoring map (atrocities.monitoring.screenshot.20140113)]

Looks simple, right? Well, it turns out it isn’t, not by a long shot.

As this blog's regular readers know, GDELT uses software to scour the web for new stories about political interactions all around the world and parses those stories to identify and record information about who did or said what to whom, when, and where. It currently covers the period from 1979 to the present and is now updated every day, and each of those daily updates contains some 100,000-140,000 new records. Miraculously, and crucially for a non-profit pilot project like ours, GDELT is also available for free.

The nine events plotted in the map above were sifted from the tens of thousands of records GDELT dumped on us in the first week of 2014. Unfortunately, that data-reduction process is only partially automated.

The first step in that process is the quickest. As originally envisioned back in May, we are using an R script (here) to download GDELT’s daily update file and sift it for events that look, from the event type and actors involved, like they might involve what we consider to be an atrocity—that is, deliberate, deadly violence against one or more noncombatant civilians in the context of a wider political conflict.
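To give a flavor of what that first-pass filter does, here is a toy Python illustration. The actual filter is the R script linked above, and the specific CAMEO event root codes and actor codes shown here are illustrative examples, not our full rule set:

```python
# Hypothetical illustration: keep records whose CAMEO event root code suggests
# deadly force (e.g., 18 "assault", 20 "unconventional mass violence") and whose
# target actor code looks like a civilian group.
VIOLENT_ROOTS = {"18", "20"}
CIVILIAN_MARKERS = {"CVL", "REF", "EDU", "MED"}


def looks_like_atrocity(record):
    """record: dict keyed by GDELT column names for one event."""
    return (record.get("EventRootCode") in VIOLENT_ROOTS
            and any(m in record.get("Actor2Code", "") for m in CIVILIAN_MARKERS))


def filter_daily_update(rows):
    """Sift a daily update down to candidate atrocity records."""
    return [r for r in rows if looks_like_atrocity(r)]
```

Applied to a daily file of 100,000-plus records, a rule set along these lines is what gets us down to the 100-200 candidates per day mentioned below.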

Unfortunately, the stack of records that filtering script returns—something like 100-200 records per day—still includes a lot of stuff that doesn’t interest us. Some records are properly coded but involve actions that don’t meet our definition of an atrocity (e.g., clashes between rioters and police or rebels and troops); some involve atrocities but are duplicates of events we’ve already captured; and some are just miscoded (e.g., a mention of the film industry “shooting” movies that gets coded as soldiers shooting civilians).

After we saw how noisy our data set would be if we stopped screening there, we experimented with a monitoring system that would acknowledge GDELT’s imperfections and try to work with them. As Phil Schrodt recommended at the recent GDELT DC Hackathon, we looked to “embrace the suck.” Instead of trying to use GDELT to generate a reliable chronicle of atrocities around the world, we would watch for interesting and potentially relevant perturbations in the information stream, noise and all, and those perturbations would produce alerts that users of our system could choose to investigate further. Working with Palantir, we built a system that would estimate country-specific prior moving averages of daily event counts returned by our filtering script and would generate an alert whenever a country’s new daily count landed more than two standard deviations above or below that average.
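The alerting rule itself is simple to state. Here is a minimal Python sketch of the idea (the real system was built in Palantir; the exact window handling and edge cases here are my assumptions):

```python
from statistics import mean, stdev


def alert(history, today_count, threshold=2.0):
    """Flag a country-day whose filtered event count falls more than
    `threshold` standard deviations from its trailing mean.

    history: recent daily counts for one country (the trailing window).
    """
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today_count != mu  # any departure from a flat baseline
    return abs(today_count - mu) > threshold * sigma
```

The two-sided test matters: a sudden drop in reporting can be as informative as a spike, for instance when a government shuts journalists out of a conflict zone.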

That system sounded great to most of the data pros in our figurative room, but it turned out to be a non-starter with some other constituencies of importance to us. The issue was credibility. Some of the events causing those perturbations in the GDELT stream were exactly what we were looking for, but others—a pod of beached whales in Brazil, or Congress killing a bill on healthcare reform—were laughably far from the mark. If our supposedly high-tech system confused beached whales and Congressional procedures for mass atrocities, we would risk undercutting the reputation for reliability and technical acumen that we are striving to achieve.

So, back to the drawing board we went. To separate the signal from the static and arrive at something more like that valid chronicle we’d originally envisioned, we decided that we needed to add a second, more laborious step to our data-reduction process. After our R script had done its work, we would review each of the remaining records by hand to decide if it belonged in our data set or not and, when necessary, to correct any fields that appeared to have been miscoded. While we were at it, we would also record the number of deaths each event produced. We wrote a set of rules to guide those decisions; had two people (a Dartmouth undergraduate research assistant and I) apply those rules to the same sets of daily files; and compared notes and made fixes. After a few iterations of that process over a few months, we arrived at the codebook we’re using now (here).

This process radically reduces the amount of data involved. Each of those two steps drops us down multiple orders of magnitude: from 100,000-140,000 records in the daily updates, to about 150 in our auto-filtered set, to just one or two in our hand-filtered set. The figure below illustrates the extent of that reduction. In effect, we’re treating GDELT as a very powerful but error-prone search and coding tool, a source of raw ore that needs refining to become the thing we’re after. This isn’t the only way to use GDELT, of course, but for our monitoring task as presently conceived, it’s the one that we think will work best.

[Figure: illustration of the data reduction process (monitoring.data.reduction.graphic)]

Once that second data-reduction step is done, we still have a few tasks left to enable the kind of mapping and analysis we aim to do. We want to trim the data set to keep only the atrocities we’ve identified, and we need to consolidate the original and corrected fields in those remaining records and geolocate them. All of that work gets done with a second R script (here), which is applied to the spreadsheet the coder saves after completing her work. The much smaller file that script produces is then ready to upload to a repository where it can be combined with other days’ outputs to produce the global chronicle our monitoring project aims to produce.
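The consolidation step boils down to preferring the coder's corrected value wherever one exists and falling back on GDELT's original otherwise. A toy Python sketch (the real script is in R, and the `_corrected` field-naming convention here is an assumption for illustration):

```python
def consolidate(record, fields):
    """Merge original and hand-corrected values for one record.

    For each field, use the coder's corrected value when she supplied
    one; otherwise keep GDELT's original.
    """
    out = {}
    for f in fields:
        corrected = record.get(f + "_corrected")
        out[f] = corrected if corrected not in (None, "") else record[f]
    return out
```

Keeping the original and corrected values in separate columns until this step preserves an audit trail of what the coder changed and why.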

From start to finish, each daily update now takes about 45 minutes, give or take 15. We’d like to shrink that further if we can but don’t see any real opportunities to do so at the moment. Perhaps more important, we still have to figure out the bureaucratic procedures that will allow us to squeeze daily updates from a “human in the loop” process in a world where there are weekends and holidays and people get sick and take vacations and sometimes even quit. Finally, we also have not yet built the dashboard that will display and summarize and provide access to these data on our program’s web site, which we expect to launch some time this spring.

We know that the data set this process produces will be incomplete. I am 100-percent certain that during the first week of January 2014, more than 10 events occurred around the world that met our definition of an atrocity. Unfortunately, we can only find things where GDELT looks, and even a scan of every news story produced every day everywhere in the world would fail to see the many atrocities that never make the news.

On the whole, though, I’m excited about the progress we’ve made. As soon as we can launch it, this monitoring process should help advocates and analysts more efficiently track atrocities globally in close to real time. As our data set grows, we also hope it will serve as the foundation for new research on forecasting, explaining, and preventing this kind of violence. Even with its evident shortcomings, we believe this data set will prove to be useful, and as GDELT’s reach continues to expand, so will ours.

PS: For a coda discussing the great ideas people had in response to this post, go here.

[Erratum: The original version of this post said there were about 10,000 records in each daily update from GDELT. The actual figure is 100,000-140,000. The error has been corrected and the illustration of data reduction updated accordingly.]

Relative Risks of State-Led Mass Killing Onset in 2014: Results from a Wiki Survey

In early December, as part of our ongoing work for the Holocaust Museum’s Center for the Prevention of Genocide, Ben Valentino and I launched a wiki survey to help assess risks of state-led mass killing onsets in 2014 (here).

The survey is now closed and the results are in. Here, according to our self-selected crowd on five continents and the nearly 5,000 pairwise votes it cast, is a map of how the world looks right now on this score. The darker the shade of gray, the greater the relative risk that in 2014 we will see the start of an episode of mass killing in which the deliberate actions of state agents or other groups acting at their behest result in the deaths of at least 1,000 noncombatant civilians from a discrete group over a period of a year or less.

[Figure: world map of relative risk scores (wikisurvey.masskilling.state.2014.map)]

Smaller countries are hard to find on that map, and it’s difficult to compare colors across regions, so here is a dot plot of the same data in rank order. Countries with red dots are ones that had ongoing episodes of state-led mass killing at the end of 2013: DRC, Egypt, Myanmar, Nigeria, North Korea, Sudan, and Syria. It’s possible that these countries will experience additional onsets in 2014, but we wonder if some of our respondents didn’t also conflate the risk of a new onset with the presence or intensity of an ongoing one. Also, there’s an ongoing episode in CAR that was arguably state-led for a time in 2013, but the Séléka militias no longer appear to be acting at the behest of the supposed government, so we didn’t color that dot. And, of course, there are at least a few ongoing episodes of mass killing being perpetrated by non-state actors (see this recent post for some ideas), but that’s not what we asked our crowd to consider in this survey.

[Figure: dot plot of relative risk scores (wikisurvey.masskilling.state.2014.dotplot)]

It is very important to understand that the scores being mapped and plotted here are not probabilities of mass-killing onset. Instead, they are model-based estimates of the probability that the country in question is at greater risk than another country chosen at random. In other words, these scores tell us which countries our crowd thinks we should worry about more, not how likely our crowd thinks a mass-killing onset is.
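To make that distinction concrete, a rough empirical analogue of these scores can be computed straight from the votes as each country's share of pairwise contests won. (All Our Ideas actually fits a statistical model to the vote data rather than taking raw win rates, so this Python sketch is only an illustration of what the scores mean.)

```python
from collections import defaultdict


def pairwise_scores(votes):
    """votes: iterable of (winner, loser) country pairs from the wiki survey.

    Returns each country's share of pairwise contests won -- a rough
    stand-in for the model-based "riskier than a random other country"
    score. Note these are relative standings, not onset probabilities.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {c: wins[c] / appearances[c] for c in appearances}
```

A country could top this ranking while its absolute onset probability remains well under 10 percent, which is exactly why we caution against reading the scores as forecasts.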

We think the results of this survey are useful in their own right, but we also plan to compare them to, and maybe even combine them with, other forecasts of mass killing onsets as part of the public early-warning system we expect to launch later this year.

In the meantime, if you’re interested in tinkering with the scores and our plots of them, you can find the code I used to make the map and dot plot on GitHub (here) and the data in .csv format on my Google Drive (here). If you have better ideas on how to visualize this information, please let us know and share your code.

UPDATE: Bad social scientist! With a tweet, Alex Hanna reminded me that I really need to say more about the survey method and respondents. So:

We used All Our Ideas to conduct this survey, and we embedded that survey in a blog post that defined our terms and explained the process. The blog post was published on December 1, and we publicized it through a few channels, including: a note to participants in a password-protected opinion pool we’re running to forecast various mass atrocities-related events; a posting to a Conflict Research group on Facebook; an email to the president of the American Association of Genocide Scholars asking him to announce it on their listserv; and a few tweets from my Twitter account at the beginning and end of the month. Some of those tweets were retweeted, and I saw a few other people post or tweet their own links to the blog post or survey as well.

As for Alex's specific question about who made up our crowd, the short answer is that we don't and can't know. Participation in All Our Ideas surveys is anonymous, and our blog post was not private. From the vote-level data (here), I can see that we ended the month with 4,929 valid votes from 147 unique voting sessions. I know for a fact that some people voted in more than one session—I cast a small number of votes on a few occasions, and I know at least one colleague voted more than once—so the number of people who participated was some unknown number smaller than 147, all of whom found their way to the survey through those postings and tweets.
