Another Chicken Little Post on China

Last fall, I described what I saw as an “accumulating risk of crisis” in China. Recent developments in two parts of the country only reinforce my sense that the Communist Party of China (CPC) is entering a period during which it will find it increasingly hard to sustain its monopoly on state authority.

The first part of the country drawing fresh attention is Hong Kong, where pro-democracy activists have mobilized a new nonviolent challenge to the Party’s authority in spite of the center’s pointed efforts to discourage them. Organizing under the Occupy Central label, these activists recently held an unofficial referendum that drew nearly 800,000 voters who overwhelmingly endorsed proposals that would allow the public to nominate candidates for elections in 2017—an idea that Beijing has repeatedly and unequivocally rejected. Today, on 1 July, tens of thousands of people marched into the city’s center to press those same demands.

1 July 2014 rally in Hong Kong (AP via BBC News)

The 1 July rally looks set to be one of the island’s largest protests in years, and it comes only weeks after Beijing issued a white paper affirming its “comprehensive jurisdiction” over Hong Kong. Although the official line since the 1997 handover has been “one country, two systems,” the expectation has generally been that national leaders would only tolerate differences that didn’t directly challenge their authority, and the new white paper made that implicit policy a bit clearer. Apparently, though, many Hong Kong residents aren’t willing to leave that assertion unchallenged, and the resulting conflict is almost certain to persist into and beyond those 2017 elections, assuming Beijing doesn’t concede the point before then.

The second restive area is Xinjiang Uyghur Autonomous Region, where Uyghurs have agitated for greater autonomy or outright independence since the area’s incorporation into China in 1949. Over the past year or so, the pace of this conflict has intensified again.

The Chinese government describes this conflict as a fight against terrorism, and some of the recent attacks—see here and here, for example—have targeted and killed large numbers of civilians. As Assaf Moghadam argues in a recent blog post, however, the line between terrorism and insurgency is almost always blurry in practice. Terrorism and insurgency—and, for that matter, campaigns of nonviolent resistance—are all tactical variations on the theme of rebellion. In Xinjiang, we see evidence of a wider insurgency in recent attacks on police stations and security checkpoints, symbols of the “occupying power” and certainly not civilian targets. Some Uyghurs have also engaged in nonviolent protests, although when they have, the police have responded harshly.

In any case, the tactical variation and increased pace and intensity of the clashes leads me to believe that this conflict should now be described as a separatist rebellion, and that this rebellion now poses a significant challenge to the Communist Party. Uyghurs certainly aren’t going to storm the capital, and they are highly unlikely to win sovereignty or independence for Xinjiang as long as the CPC still rules. Nevertheless, the expanding rebellion is taxing the center, and it threatens to make Party leaders look less competent than they would like.

Neither of these conflicts is new, and the Party has weathered flare-ups in both regions before. What is new is their concurrence with each other and with a number of other serious political and economic challenges. As the conflicts in Xinjiang and Hong Kong intensify, China’s real-estate market finally appears to be cooling, with potentially significant effects on the country’s economy, and pollution remains a national crisis that continues to stir sporadic unrest among otherwise “ordinary” citizens. And, of course, Party leaders are simultaneously pursuing an anti-corruption campaign that is hitting higher and higher targets. This campaign is ostensibly intended to bolster the economy and to address popular frustration over abuses of power, but like any purge, it also risks generating fresh enemies.

For reasons Barbara Geddes helps to illuminate (here), consolidated single-party authoritarian regimes like China’s tend to be quite resilient. They persist because they usually do a good job suppressing domestic opponents and co-opting would-be rivals within the ruling party. Single-party regimes are better than others at co-opting internal rivals because, under all but exceptional circumstances, regime survival reliably generates better payoffs for all factions than the alternatives.

Eventually, though, even single-party regimes break down, and when they do, it’s usually in the face of an economic crisis that simultaneously stirs popular frustration and weakens incentives for elites to remain loyal (on this point, see Haggard and Kaufman, too). Exactly how these regimes come undone is a matter of local circumstance and historical accident, but generally speaking, the likelihood increases as popular agitation swells and the array of potential elite defectors widens.

China’s slowing growth rate and snowballing financial troubles indicate that the risk of an economic crisis is still increasing. At the same time, the crises in Hong Kong, Xinjiang, and the many cities and towns where citizens are repeatedly protesting against pollution and corruption suggest that insiders who choose to defect would have plenty of potential allies to choose from. As I’ve said before, I don’t believe that the CPC regime is on the brink of collapse, but I would be surprised to see it survive in its current form—with no legal opposition and direct elections in rural villages only—to and through the Party’s next National Congress, due in 2017.

Refugee Flows and Disorder in the Global System

This

The number of people displaced by violent conflict hit the highest level since World War II at the end of 2013, the head of the United Nations refugee agency, António Guterres, said in a report released on Friday…

Moreover, the impact of conflicts raging this year in Central African Republic, South Sudan, Ukraine and now Iraq threatens to push levels of displacement even higher by the end of 2014, he said.

…is, I think, another manifestation of the trends I discussed in a blog post here last September:

If we think on a systemic scale, it’s easier to see that we are now living through a period of global disorder matched in recent history only by the years surrounding the disintegration of the Soviet Union, and possibly exceeding it. Importantly, it’s not just the spate of state collapses through which this disorder becomes evident, but also the wider wave of protest activity and institutional transformation to which some of those collapses are connected.

If that’s true, then Mr. Guterres is probably right when he predicts that this will get even worse this year, because things still seem to be trending toward disorder. A lot of the transnational activity in response to local manifestations is still deliberately inflammatory (e.g., materiel and cash to rebels in Syria and Iraq, Russian support for separatists in Ukraine), and international efforts to quell some of those manifestations (e.g., UN PKOs in CAR and South Sudan) are struggling. Meanwhile, in what’s probably both a cause and an effect of these processes, global economic growth still has not rebounded as far or as fast as many had expected a year or two ago and remains uncertain and uneven.

In other words, the positive feedback still seems to be outrunning the negative feedback. Until that turns, the systemic processes driving (and being driven by) increased refugee flows will likely continue.

Addendum: The quote at the start of this post contains what I think is an error. A lot of the news stories on this report’s release used phrases like “displaced persons highest since World War II,” so I assumed that the U.N. report included the data on which that statement would be based. It turns out, though, that the report only makes a vague (and arguably misleading) reference to “the post-World War II era.” In fact, the U.N. does not have data to make comparisons on numbers of displaced persons prior to 1989. With the data it does have, the most the UNHCR can say is this, from p. 5: “The 2013 levels of forcible displacement were the highest since at least 1989, the first year that comprehensive statistics on global forced displacement existed.”

The picture also looks a little different from the press release if we adjust for increases in global population. Doing some rough math with the number of displaced persons in this UNHCR chart as the numerator and the U.S. Census Bureau’s mid-year estimates of world population as the denominator, here are some annual statistics on displaced persons as a share of the global population (a rough sketch of that arithmetic appears after the list):

1989: 0.65%
1992: 0.84%
2010: 0.63%
2014: 0.72%
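For the record, here is a minimal sketch of that arithmetic in R. The numbers below are placeholders rather than the actual UNHCR and Census Bureau figures, which you can pull from the sources linked above.

```r
# Placeholder values only; substitute the actual UNHCR displacement totals and
# the Census Bureau's mid-year world population estimates for the years of interest.
displaced <- c(yr_a = 34e6, yr_b = 51e6)    # hypothetical counts of displaced persons
world_pop <- c(yr_a = 5.2e9, yr_b = 7.2e9)  # hypothetical world population estimates

round(100 * displaced / world_pop, 2)       # displaced persons as a % of world population
```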

In no way do I mean to make light of what’s obviously a massive global problem, but as a share of the global population, the latest numbers are not (yet) even the worst since 1989, the first year for which UNHCR has comparable data.

There Is No Such Thing as Civil War

In a 2008 conference paper, Jim Fearon and David Laitin used statistics and case narratives to examine how civil wars around the world since 1955 have ended. They found that deadly fights between central governments and domestic challengers usually only end after an abrupt change in the relative fighting power of one side or the other, and that these abrupt changes are usually brought on by the beginning or end of foreign support. This pattern led them to ruminate thus (emphasis in original):

We were struck by the high frequency of militarily significant foreign support for government and rebels. The evidence suggests that more often than not, civil wars either become – or may even begin as – the object of other states’ foreign policies…Civil wars are normally studied as matters of domestic politics. Future research might make progress by shifting the perspective, and thinking about civil war as international politics by other means.

Their study recently came to mind when I was watching various people on Twitter object to the idea that what’s happening in Ukraine right now could be described as civil war, or at least the possible beginnings of one. Even if some of the separatists mobilizing in eastern Ukraine really were Ukrainian nationals, they argued, the agent provocateur was Russia, so this fight is properly understood as a foreign incursion.

As Jim and David’s paper shows, though, strong foreign hands are a common and often decisive feature of the fights we call civil wars.

In Syria, for example, numerous foreign governments and other external agents are funding, training, equipping, and arming various factions in the armed conflict that’s raged for nearly three years now. Some of that support is overt, but the support we see when we read about the war in the press is surely just a fraction of what’s actually happening. Yet we continue to see the conflict described as a civil war.

In the Central African Republic, it’s Chad that’s played “an ambiguous and powerful role” in the conflict that has precipitated state collapse and ethnic cleansing there. As the New York Times described in April,

[Chad] was accused of supporting the overthrow of the nation’s president, and then later helped remove the rebel who ousted him, making way for a new transitional government. In a statement on Thursday, the Chadian government said that its 850 soldiers had been accused of siding with Muslim militias in sectarian clashes with Christian fighters that have swept the Central African Republic for months.

At least a couple of bordering states are apparently involved in the civil war that’s stricken South Sudan since December. In a May 2014 report, the UN Mission to South Sudan asserted that government forces were receiving support from “armed groups from the Republic of Sudan,” and that “the Government has received support from the Uganda People’s Defence Force (UPDF), notably in Juba and Jonglei State.” The report also claimed that “some Darfuri militias have allied with opposition forces in the northern part of Unity State,” which borders Sudan. And, of course, there is a nearly 8,000-strong UN peacekeeping operation that is arguably shaping the scale of the violence there, even if it isn’t stopping it.

Pick a civil war—any civil war—and you’ll find similar evidence of external involvement. This is what led Jim and David to muse about civil wars as “international politics by other means,” and what led me to the deliberately provocative title of this post. As a researcher, I see analytic value in sometimes distinguishing between interstate and intrastate wars, which may have distinct causes and follow different patterns and may therefore be amenable to different forms of prevention or mitigation. At the same time, I think it’s clear that this distinction is nowhere near as crisp in reality as our labels imply, so we should be mindful to avoid confusing the typology with the reality it crudely describes.

A Useful Data Set on Political Violence that Almost No One Is Using

For the past 10 years, the CIA has overtly funded the production of a publicly available data set on certain atrocities around the world that now covers the period from January 1995 to early 2014 and is still updated on a regular basis. If you work in a relevant field but didn’t know that, you’re not alone.

The data set in question is the Political Instability Task Force’s Worldwide Atrocities Dataset, which records information from several international press sources about situations in which five or more civilians are deliberately killed in the context of some wider political conflict. Each record includes information about who did what to whom, where, and when, along with a brief text description of the event, a citation for the source article(s), and, where relevant, comments from the coder. The data are updated monthly, although those updates are posted on a four-month lag (e.g., data from January become available in May).

The decision to limit collection to events involving at least five fatalities was a pragmatic one. As the data set’s codebook notes,

We attempted at one point to lower this threshold to one and the data collection demands proved completely overwhelming, as this involved assessing every murder and ambiguous accidental death reported anywhere in the world in the international media. “Five” has no underlying theoretical justification; it merely provides a threshold above which we can confidently code all of the reported events given our available resources.

For the past three years, the data set has also fudged this rule to include targeted killings that appear to have a political motive, even when only a single victim is killed. So, for example, killings of lawyers, teachers, religious leaders, election workers, and medical personnel are nearly always recorded, and these events are distinguished from ones involving five or more victims by a “Yes” in a field identifying “Targeted Assassinations” under a “Related Tactics” header.

The data set is compiled from stories appearing in a handful of international press sources that are accessed through Factiva. It is a computer-assisted process. A Boolean keyword search is used to locate potentially relevant articles, and then human coders read those stories and make data from the ones that turn out actually to be relevant. From the beginning, the PITF data set has pulled from Reuters, Agence France-Presse, Associated Press, and the New York Times. Early in the process, BBC World Monitor and CNN were added to the roster, and AllAfrica was also added a few years ago to improve coverage of Africa.

The decision to restrict collection to a relatively small number of sources was also a pragmatic one. Unlike GDELT, for example—the routine production of which is fully automated—the Atrocities Data Set is hand-coded by people reading news stories identified through a keyword search. With people doing the coding, the cost of broadening the search to local and web-based sources is prohibitive. The hope is eventually to automate the process, either as a standalone project or as part of a wider automated event data collection effort. As GDELT shows, though, that’s hard to do well, and that day hasn’t arrived yet.

Computer-assisted coding is far more labor intensive than fully automated coding, but it also carries some advantages. Human coders can still discern better than the best automated coding programs when numerous reports are all referring to the same event, so the PITF data set does a very good job eliminating duplicate records. Also, the “where” part of each record in the PITF data set includes geocoordinates, and its human coders can accurately resolve the location of nearly every event to at least the local administrative area, a task over which fully automated processes sometimes still stumble.

Of course, press reports only capture a fraction of all the atrocities that occur in most conflicts, and journalists writing about hard-to-cover conflicts often describe these situations with stories that summarize episodes of violence (e.g., “Since January, dozens of villagers have been killed…”). The PITF data set tries to accommodate this pattern by recording two distinct kinds of events: 1) incidents, which occur in a single place in a short period of time, usually a single day; and 2) campaigns, which involve the same perpetrator and target group but may occur in multiple places over a longer period of time—usually days but sometimes weeks or months.

The inclusion of these campaigns alongside discrete events allows the data set to capture more information, but it also requires careful attention when using the results. Most statistical applications of data sets like this one involve cross-tabulations of events or deaths at a particular level during some period of time—say, countries and months. That’s relatively easy to do with data on discrete events located in specific places and days. Here, though, researchers have to decide ahead of time if and how they are going to blend information about the two event types. There are two basic options: 1) ignore the campaigns and focus exclusively on the incidents, treating that subset of the data set like a more traditional one and ignoring the additional information; or 2) make a convenient assumption about the distribution of the incidents of which campaigns are implicitly composed and apportion them accordingly.

For example, if we are trying to count monthly deaths from atrocities at the country level, we could assume that deaths from campaigns are distributed evenly over time and assign equal fractions of those deaths to all months over which they extend. So, a campaign in which 30 people were reportedly killed in Somalia between January and March would add 10 deaths to the monthly totals for that country in each of those three months. Alternatively, we could include all of the deaths from a campaign in the month or year in which it began. Either approach takes advantage of the additional information contained in those campaign records, but there is also a risk of double counting, as some of the events recorded as incidents might be part of the violence summarized in the campaign report.
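Here is a minimal sketch of that second option in R. The function and its inputs are hypothetical, and the real records identify dates and death tolls a bit differently, but the logic is the same: spread a campaign’s reported deaths evenly across the months it spans.

```r
library(lubridate)

# Spread a campaign's reported death toll evenly across the months it covers
apportion_campaign <- function(deaths, start_date, end_date) {
  months <- seq(floor_date(as.Date(start_date), "month"),
                floor_date(as.Date(end_date), "month"),
                by = "month")
  data.frame(month = months, deaths = deaths / length(months))
}

# A campaign in which 30 people were reportedly killed between January and March
apportion_campaign(30, "2014-01-10", "2014-03-25")
# returns one row each for January, February, and March, with 10 deaths apiece
```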

It is also important to note that this data set does not record information about atrocities in which the United States is either the alleged perpetrator or the target (e.g., 9/11), because of legal restrictions on the activities of the CIA, which funds the data set’s production. This constraint presumably has a bigger impact on some cases, such as Iraq and Afghanistan, than others.

To provide a sense of what the data set contains and to make it easier for other researchers to use it, I wrote an R script that ingests and cross-tabulates the latest iteration of the data in country-month and country-year bins and then plots some of the results. That script is now posted on Github (here).
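That script does the real work; the toy example below just illustrates the core cross-tabulation, using made-up records and hypothetical column names.

```r
# Toy incident-level records standing in for the real data set
atrocities <- data.frame(
  COUNTRY    = c("Colombia", "Colombia", "Somalia"),
  START_DATE = as.Date(c("2013-05-02", "2013-05-20", "2013-06-11")),
  DEATHS     = c(7, 12, 9)
)

# Bin the incidents into country-month buckets and sum the reported deaths
atrocities$month <- format(atrocities$START_DATE, "%Y-%m")
deaths.monthly <- aggregate(DEATHS ~ COUNTRY + month, data = atrocities, FUN = sum)
deaths.monthly
```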

One way to see how well the data set is capturing the trends we hope it will capture is to compare the figures it produces with ones from data sets in which we already have some confidence. While I was writing this post, Colombian “data enthusiast” Miguel Olaya tweeted a pair of graphs summarizing data on massacres in that country’s long-running civil war. The data behind his graphs come from the Rutas de Conflicto project, an intensive and well-reputed effort to document as many as possible of the massacres that have occurred in Colombia since 1980. Here is a screenshot of Olaya’s graph of the annual death counts from massacres in the Rutas data set since 1995, when the PITF data pick up the story:

Annual Deaths from Massacres in Colombia by Perpetrator (Source: Rutas de Conflicto)

Now here is a graph of deaths from the incidents in the PITF data set:

Annual Deaths from Incidents in the PITF Worldwide Atrocities Dataset, Colombia

Just eyeballing the two charts, the correlation looks pretty good. Both show a sharp increase in the tempo of killing in the mid-1990s; a sustained peak around 2000; a steady decline over the next several years; and a relatively low level of lethality since the mid-2000s. The annual counts from the Rutas data are two or three times larger than the ones from the PITF data during the high-intensity years, but that makes sense when we consider how much deeper of a search that project has conducted. There’s also a dip in the PITF totals in 1999 and 2000 that doesn’t appear in the Rutas data, but the comparisons over the larger span hold up. All things considered, this comparison makes the PITF data look quite good, I think.

Of course, the insurgency in Colombia has garnered better coverage from the international press than conflicts in parts of the world that are even harder to reach or less safe for correspondents than the Colombian highlands. On a couple of recent crises in exceptionally under-covered areas, the PITF data also seem to do a decent job capturing surges in violence, but only when we include campaigns as well as incidents in the counting.

The plots below show monthly death totals from a) incidents only and b) incidents and campaigns combined in the Central African Republic since 1995 and South Sudan since its independence in mid-2011. Here, deaths from campaigns have been assigned to the month in which the campaign reportedly began. In CAR, the data set identifies the upward trend in atrocities through 2013 and into 2014, but the real surge in violence that apparently began in late 2013 is only captured when we include campaigns in the cross-tabulation (the dotted line).

Monthly Deaths from Atrocities in the Central African Republic, Incidents Only and Incidents Plus Campaigns (Source: PITF Worldwide Atrocities Dataset)

The same holds in South Sudan. There, the incident-level data available so far miss the explosion of civilian killings that began in December 2013 and reportedly continue, but the combination of campaign and incident data appears to capture a larger fraction of it, along with a notable spike in July 2013 related to clashes in Jonglei State.

Monthly Deaths from Atrocities in South Sudan, Incidents Only and Incidents Plus Campaigns (Source: PITF Worldwide Atrocities Dataset)

These examples suggest that the PITF Worldwide Atrocities Dataset is doing a good job at capturing trends over time in lethal violence against civilians, even in some of the hardest-to-cover cases. To my knowledge, though, this data set has not been widely used by researchers interested in atrocities or political violence more broadly. Probably its most prominent use to date was in the Model component of the Tech Challenge for Atrocities Prevention, a 2013 crowdsourced competition funded by USAID and Humanity United. That challenge produced some promising results, but it remains one of the few applications of this data set on a subject for which reliable data are scarce. Here’s hoping this post helps to rectify that.

Disclosure: I was employed by SAIC as research director of PITF from 2001 until 2011. During that time, I helped to develop the initial version of this data set and was involved in decisions to fund its continued production. Since 2011, however, I have not been involved in either the production of the data or decisions about its continued funding. I am part of a group that is trying to secure funding for a follow-on project to the Model part of the Tech Challenge for Atrocities Prevention, but that effort would not necessarily depend on this data set.

Conflict Events, Coup Forecasts, and Data Prospecting

Last week, for an upcoming post to the interim blog of the atrocities early-warning project I direct, I got to digging around in ACLED’s conflict event data for the first time. Once I had the data processed, I started wondering if they might help improve forecasts of coup attempts, too. That train of thought led to the preliminary results I’ll describe here, and to a general reminder of the often-frustrating nature of applied statistical forecasting.

ACLED is the Armed Conflict Location & Event Data Project, a U.S. Department of Defense–funded, multi-year endeavor to capture information about instances of political violence in sub-Saharan Africa from 1997 to the present. ACLED’s coders scan an array of print and broadcast sources, identify relevant events from them, and then record those events’ date, location, and form (battle, violence against civilians, or riots/protests); the types of actors involved; whether or not territory changed hands; and the number of fatalities that occurred. Researchers can download all of the project’s data in various formats and structures from the Data page, one of the better ones I’ve seen in political science.

I came to ACLED last week because I wanted to see if violence against civilians in Somalia had waxed, waned, or held steady in recent months. Trying to answer that question with their data meant:

  • Downloading two Excel spreadsheets, Version 4 of the data for 1997-2013 and the Realtime Data file covering (so far) the first five months of this year;
  • Processing and merging those two files, which took a little work because my software had trouble reading the original spreadsheets and the labels and formats differed a bit across them; and
  • Subsetting and summarizing the data on violence against civilians in Somalia, which also took some care because there was an extra space at the end of the relevant label in some of the records.

Once I had done these things, it was easy to generalize the process to the entire data set, producing tables with monthly counts of fatalities and events by type for all African countries from 1997 to the present. And, once I had those country-month counts of conflict events, it was easy to imagine using them to try to help forecast coup attempts in the world’s most coup-prone region. Other things being equal, variations across countries and over time in the frequency of conflict events might tell us a little more about the state of politics in those countries, and therefore where and when coup attempts are more likely to happen.
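For anyone who wants to try something similar, here is a rough sketch of those steps in R. The file names and some column labels are placeholders, and the real spreadsheets need a bit more cleaning than this, but it conveys the gist.

```r
library(readxl)

# Read the two spreadsheets (file names are placeholders)
v4 <- read_excel("ACLED_Version4_1997-2013.xlsx")
rt <- read_excel("ACLED_Realtime_2014.xlsx")

# Harmonize the column labels before stacking; in practice the two files differ a bit
names(rt) <- names(v4)
acled <- rbind(v4, rt)

# Drop the stray trailing spaces that appear in some event-type labels
acled$EVENT_TYPE <- trimws(acled$EVENT_TYPE)

# Monthly counts of events and fatalities from violence against civilians in Somalia
vac.som <- subset(acled, COUNTRY == "Somalia" &
                         EVENT_TYPE == "Violence against civilians")
vac.som$month <- format(as.Date(vac.som$EVENT_DATE), "%Y-%m")
aggregate(cbind(events = 1, FATALITIES) ~ month, data = vac.som, FUN = sum)
```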

Well, in this case, it turns out they don’t tell us much more. The plot below shows ROC curves and the areas under those curves for the out-of-sample predictions from a five-fold cross-validation exercise involving a few country-month models of coup attempts. The Base Model includes: national political regime type (the categorization scheme from PITF’s global instability model applied to Polity 3d, the spell-file version); time since last change in Polity score (in days, logged); infant mortality rate (relative to the annual global median, logged); and an indicator for any coup attempts in the previous 24 months (yes/no). The three other models add logged sums of counts of ACLED events by type—battles, violence against civilians, or riots/protests—in the same country over the previous three, six, or 12 months, respectively. These are all logistic regression models, and the dependent variable is a binary one indicating whether or not any coup attempts (successful or failed) occurred in that country during that month, according to Powell and Thyne.
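For readers who want to see the mechanics, here is a minimal sketch of that cross-validation routine in R with toy data. The variable names are hypothetical stand-ins for the inputs described above, and the real exercise uses the full model specifications and Powell and Thyne’s coup data.

```r
library(pROC)
set.seed(20140610)

# Toy country-month data standing in for the real inputs
n <- 2000
dat <- data.frame(
  coup      = rbinom(n, 1, 0.02),   # any coup attempt this month?
  infmort   = rnorm(n),             # logged infant mortality, relative to the global median
  anycoup24 = rbinom(n, 1, 0.10),   # any coup attempt in the prior 24 months?
  battles6  = rpois(n, 3)           # ACLED battle count over the prior 6 months
)

# Five-fold cross-validation of one specification
dat$fold <- sample(rep(1:5, length.out = n))
cv_probs <- rep(NA_real_, n)
for (k in 1:5) {
  m <- glm(coup ~ infmort + anycoup24 + log1p(battles6),
           family = binomial, data = dat[dat$fold != k, ])
  cv_probs[dat$fold == k] <- predict(m, newdata = dat[dat$fold == k, ],
                                     type = "response")
}

# Out-of-sample discrimination for this specification
roc_obj <- roc(dat$coup, cv_probs)
auc(roc_obj)      # area under the ROC curve
# plot(roc_obj)   # draws the ROC curve
```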

ROC Curves and AUC Scores from Five-Fold Cross-Validation of Coup Models Without and With ACLED Event Counts

As the chart shows, adding the conflict event counts to the base model seems to buy us a smidgen more discriminatory power, but not enough to have confidence that they would routinely lead to more accurate forecasts. Intriguingly, the crossing of the ROC curves suggests that the base model, which emphasizes structural conditions, is actually a little better at identifying the most coup-prone countries. The addition of conflict event counts to the model leads to some under-prediction of coups in that high-risk set, but the balance tips the other way in countries with less structural vulnerability. In the aggregate, though, there is virtually no difference in discriminatory power between the base model and the ones that add the conflict event counts.

There are, of course, many other ways to group and slice ACLED’s data, but the rarity of coups leads me to believe that narrower cuts or alternative operationalizations aren’t likely to produce stronger predictive signals. In Africa since 1997, there are only 36 country-months with coup attempts, according to Powell and Thyne. When the events are this rare and complex and the examples this few, there’s really not much point in going beyond the most direct measures. Under these circumstances, we’re unlikely to discover finer patterns, and if we do, we probably shouldn’t have much confidence in them. There are also other models and techniques to try, but I’m dubious for the same reasons. (FWIW, I did try Random Forests and got virtually identical accuracy.)

So those are the preliminary results from this specific exercise. (The R scripts I used are on Github, here). I think those results are interesting in their own right, but the process involved in getting to them is also a great example of the often-frustrating nature of applied statistical forecasting. I spent a few hours each day for three days straight getting from the thought of exploring ACLED to the results described here. Nearly all of that time was spent processing data; only the last half-hour or so involved any modeling. As is often the case, a lot of that data-processing time was really just me staring at my monitor trying to think of another way to solve some problem I’d already tried and failed to solve.

In my experience, that kind of null result is where nearly all statistical forecasting ideas end. Even when you’re lucky enough to have the data to pursue them, few of your ideas pan out. But panning is the right metaphor, I think. Most of the work is repetitive and frustrating, but every so often you catch a nice nugget. Those nuggets tempt you to keep looking for more, and once in a great while, they can make you rich.

Alarmed By Iraq

Iraq’s long-running civil war has spread and intensified again over the past year, and the government’s fight against a swelling Sunni insurgency now threatens to devolve into the sort of indiscriminate reprisals that could produce a new episode of state-led mass killing there.

The idea that Iraq could suffer a new wave of mass atrocities at the hands of state security forces or sectarian militias collaborating with them is not far-fetched. According to statistical risk assessments produced for our atrocities early-warning project (here), Iraq is one of the 10 countries worldwide most susceptible to an onset of state-led mass killing, bracketed by places like Syria, Sudan, and the Central African Republic, where large-scale atrocities and even genocide are already underway.

Of course, Iraq is already suffering mass atrocities of its own at the hands of insurgent groups who routinely kill large numbers of civilians in indiscriminate attacks, every one of which would stun American or European publics if it happened there. According to the widely respected Iraq Body Count project, the pace of civilian killings in Iraq accelerated sharply in July 2013 after a several-year lull of sorts in which “only” a few hundred civilians were dying from violence each month. Since the middle of last year, the civilian toll has averaged more than 1,000 fatalities per month. That’s well off the pace of 2006-2007, the peak period of civilian casualties under Coalition occupation, but it’s still an astonishing level of violence.

Monthly Counts of Civilian Deaths from Violence in Iraq (Source: Iraq Body Count)

What seems to be increasing now is the risk of additional atrocities perpetrated by the very government that is supposed to be securing civilians against those kinds of attacks. A Sunni insurgency is gaining steam, and the government, in turn, is ratcheting up its efforts to quash the growing threat to its power in worrisome ways. A recent Reuters story summarized the current situation:

In Buhriz and other villages and towns encircling the capital, a pitched battle is underway between the emboldened Islamic State of Iraq and the Levant, the extremist Sunni group that has led a brutal insurgency around Baghdad for more than a year, and Iraqi security forces, who in recent months have employed Shi’ite militias as shock troops.

And this anecdote from the same Reuters story shows how that battle is sometimes playing out:

The Sunni militants who seized the riverside town of Buhriz late last month stayed for several hours. The next morning, after the Sunnis had left, Iraqi security forces and dozens of Shi’ite militia fighters arrived and marched from home to home in search of insurgents and sympathizers in this rural community, dotted by date palms and orange groves.

According to accounts by Shi’ite tribal leaders, two eyewitnesses and politicians, what happened next was brutal.

“There were men in civilian clothes on motorcycles shouting ‘Ali is on your side’,” one man said, referring to a key figure in Shi’ite tradition. “People started fleeing their homes, leaving behind the elders and young men and those who refused to leave. The militias then stormed the houses. They pulled out the young men and summarily executed them.”

Sadly, this escalatory spiral of indiscriminate violence is not uncommon in civil wars. Ben Valentino, a collaborator of mine in the development of this atrocities early-warning project, has written extensively on this topic (see especially here, here, and here). As Ben explained to me via email,

The relationship between counter-insurgency and mass violence against civilians is one of the most well-established findings in the social science literature on political violence. Not all counter-insurgency campaigns lead to mass killing, but when insurgent groups become large and effective enough to seriously threaten the government’s hold on power and when the rebels draw predominantly on local civilians for support, the risks of mass killing are very high. Usually, large-scale violence against civilians is neither the first nor the only tactic that governments use to defeat insurgencies. They may try to focus operations primarily against armed insurgents, or even offer positive incentives to civilians who collaborate with the government. But when less violent methods fail, the temptation to target civilians in the effort to defeat the rebels increases.

Right now, it’s hard to see what’s going to halt or reverse this trend in Iraq. “Things can get much worse from where we are, and more than likely they will,” Daniel Serwer told IRIN News for a story on Iraq’s escalating conflict (here). Other observers quoted in the same story seemed to think that conflict fatigue would keep the conflict from ballooning further, but that hope is hard to square with the escalation of violence that has already occurred over the past year and the fact that Iraq’s civil war never really ended.

In theory, elections are supposed to be a brake on this process, giving rival factions opportunities to compete for power and influence state policy in nonviolent ways. In practice, this often isn’t the case. Instead, Iraq appears to be following the more conventional path in which election winners focus on consolidating their own power instead of governing well, and excluded factions seek other means to advance their interests. Here’s part of how the New York Times set the scene for this week’s elections, which incumbent prime minister Nouri al-Maliki’s coalition is apparently struggling to win:

American intelligence assessments have found that Mr. Maliki’s re-election could increase sectarian tensions and even raise the odds of a civil war, citing his accumulation of power, his failure to compromise with other Iraqi factions—Sunni or Kurd—and his military failures against Islamic extremists. On his watch, Iraq’s American-trained military has been accused by rights groups of serious abuses as it cracks down on militants and opponents of Mr. Maliki’s government, including torture, indiscriminate roundups of Sunnis and demands of bribes to release detainees.

Because Iraq ranked so high in our last statistical risk assessments, we posted a question about it a few months ago on our “wisdom of (expert) crowds” forecasting system. Our pool of forecasters is still relatively small—89 as I write this—but the ones who have weighed in on this topic so far have put it in what I see as a middle tier of concern, where the risk is seen as substantial but not imminent or inevitable. Since January, the pool’s estimated probability of an onset of state-led mass killing in Iraq in 2014 has hovered around 20 percent, alongside countries like Pakistan (23 percent), Bangladesh (20 percent), and Burundi (19 percent) but well behind South Sudan (above 80 percent since December) and Myanmar (43 percent for the risk of a mass killing targeting the Rohingya in particular).

Notably, though, the estimate for Iraq has ticked up a few notches in the past few days to 27 percent as forecasters (including me) have read and discussed some of the pre-election reports mentioned here. I think we are on to something that deserves more scrutiny than it appears to be getting.

Whither Organized Violence?

The Human Security Research Group has just published the latest in its series of now-annual reports on “trends in organized violence around the world,” and it’s essential reading for anyone deeply interested in armed conflict and other forms of political violence. You can find the PDF here.

The 2013 edition takes Steven Pinker’s Better Angels as its muse and largely concurs with Pinker’s conclusions. I’ll sheepishly admit that I haven’t read Pinker’s book (yet), so I’m not going to engage directly in that debate. Instead, I’ll call attention to what the report’s authors infer from their research about future trends in political violence. Here’s how that bit starts, on p. 18:

The most encouraging data from the modern era come from the post–World War II years. This period includes the dramatic decline in the number and deadliness of international wars since the end of World War II and the reversal of the decades-long increase in civil war numbers that followed the end of the Cold War in the early 1990s.

What are the chances that these positive changes will be sustained? No one really knows. There are too many future unknowns to make predictions with any degree of confidence.

On that point, political scientist Bear Braumoeller would agree. In an interview last year for Popular Science (here), Kelsey Atherton asked Braumoeller about Braumoeller’s assertion in a recent paper (here) that it will take 150 years to know if the downward trend in warfare that Pinker and others have identified is holding. Braumoeller replied:

Some of this literature points to “the long peace” of post-World War II. Obviously we haven’t stopped fighting wars entirely, so what they’re referring to is the absence of really really big wars like World War I and World War II. Those wars would have to be absent for like 70 to 75 more years for us to have confidence that there’s been a change in the baseline rate of really really big wars.

That’s sort of a separate question from how we know whether there are trends in warfare in general. We need to understand that war and peace are both stochastic processes. We need a big enough sample to rule out the historical average, which is about one or two big wars per century. We just haven’t had enough time since World War I and World War II to rule out the possibility that nothing’s changed.

I suspect that the authors of the Human Security Report would not dispute that claim, but after carefully reviewing Pinker’s and their own evidence, they do see causes for cautious optimism. Here I’ll quote at length, because I think it’s important to see the full array of forces taken into consideration to increase our confidence in the validity of the authors’ cautious speculations.

The case for pessimism about the global security future is well rehearsed and has considerable support within the research community. Major sources of concern include the possibility of outbreaks of nuclear terrorism, a massive transnational upsurge of lethal Islamist radicalism, or wars triggered by mass droughts and population movements driven by climate change.

Pinker notes reasons for concern about each of these potential future threats but also skepticism about the more extreme claims of the conflict pessimists. Other possible drivers of global violence include the political crises that could follow the collapse of the international financial system and destabilizing shifts in the global balance of economic and military power—the latter being a major concern of realist scholars worried about the economic and military rise of China.

But focusing exclusively on factors and processes that may increase the risks of large-scale violence around the world, while ignoring those that decrease it, also almost certainly leads to unduly pessimistic conclusions.

In the current era, factors and processes that reduce the risks of violence not only include the enduring impact of the long-term trends identified in Better Angels but also the disappearance of two major drivers of warfare in the post–World War II period—colonialism and the Cold War. Other post–World War II changes that have reduced the risks of war include the entrenchment of the global norm against interstate warfare except in self-defence or with the authority of the UN Security Council; the intensification of economic and financial interdependence that increases the costs and decreases the benefits of cross-border warfare; the spread of stable democracies; and the caution-inducing impact of nuclear weapons on relations between the major powers.

With respect to civil wars, the emergent and still-growing system of global security governance discussed in Chapter 1 has clearly helped reduce the number of intrastate conflicts since the end of the Cold War. And, at what might be called the “structural” level, we have witnessed steady increases in national incomes across the developing world. This is important because one of the strongest findings from econometric research on the causes of war is that the risk of civil wars declines as national incomes—and hence governance and other capacities—increase. Chapter 1 reports on a remarkable recent statistical study by the Peace Research Institute, Oslo (PRIO) that found that if current trends in key structural variables are sustained, the proportion of the world’s countries afflicted by civil wars will halve by 2050.

Such an outcome is far from certain, of course, and for reasons that have yet to be imagined, as well as those canvassed by the conflict pessimists. But, thanks in substantial part to Steven Pinker’s extraordinary research, there are now compelling reasons for believing that the historical decline in violence is both real and remarkably large—and also that the future may well be less violent than the past.

After reading the new Human Security Report, I remain a short-term pessimist and long-term optimist. As I’ve said in a few recent posts (see especially this one), I think we’re currently in the thick of a period of systemic instability that will continue to produce mass protests, state collapse, mass killing, and other forms of political instability at higher rates than we’ve seen since the early 1990s for at least the next year or two.

At the same time, I don’t think this local upswing marks a deeper reversal of the long-term trend that Pinker identifies, and that the Human Security Report confirms. Instead, I believe that the global political economy is continuing to evolve in a direction that makes political violence less common and less lethal. This system creep is evident not only in the aforementioned trends in armed violence, but also in concurrent and presumably interconnected trends in democratization, socio-economic development, and global governance. Until we see significant and sustained reversals in most or all of these trends, I will remain optimistic about the directionality of the underlying processes of which these data can give us only glimpses.

A New Statistical Approach to Assessing Risks of State-Led Mass Killing

Which countries around the world are currently at greatest risk of an onset of state-led mass killing? At the start of the year, I posted results from a wiki survey that asked this question. Now, here in heat-map form are the latest results from a rejiggered statistical process with the same target. You can find a dot plot of these data at the bottom of the post, and the data and code used to generate them are on GitHub.

Estimated Risk of New Episode of State-Led Mass Killing

These assessments represent the unweighted average of probabilistic forecasts from three separate models trained on country-year data covering the period 1960-2011. In all three models, the outcome of interest is the onset of an episode of state-led mass killing, defined as any episode in which the deliberate actions of state agents or other organizations kill at least 1,000 noncombatant civilians from a discrete group. The three models are:

  • PITF/Harff. A logistic regression model approximating the structural model of genocide/politicide risk developed by Barbara Harff for the Political Instability Task Force (PITF). In its published form, the Harff model only applies to countries already experiencing civil war or adverse regime change and produces a single estimate of the risk of a genocide or politicide occurring at some time during that crisis. To build a version of the model that was more dynamic, I constructed an approximation of the PITF’s global model for forecasting political instability and use the natural log of the predicted probabilities it produces as an additional input to the Harff model. This approach mimics the one used by Harff and Ted Gurr in their ongoing application of the genocide/politicide model for risk assessment (see here).
  • Elite Threat. A logistic regression model that uses the natural log of predicted probabilities from two other logistic regression models—one of civil-war onset, the other of coup attempts—as its only inputs. This model is meant to represent the argument put forth by Matt Krain, Ben Valentino, and others that states usually engage in mass killing in response to threats to ruling elites’ hold on power.
  • Random Forest. A machine-learning technique (see here) applied to all of the variables used in the two previous models, plus a few others of possible relevance, using the ‘randomForest’ package in R. A couple of parameters were tuned on the basis of a gridded comparison of forecast accuracy in 10-fold cross-validation (a minimal sketch of this piece follows the list).
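Here is a minimal sketch of that last piece with toy data. It is not the model behind the assessments above, which draws on all of the variables from the other two models, and for brevity it compares settings using the out-of-bag error rather than the 10-fold cross-validation used in practice.

```r
library(randomForest)
set.seed(20140101)

# Toy country-year data standing in for the real inputs (all names hypothetical)
n <- 3000
d <- data.frame(
  onset    = factor(rbinom(n, 1, 0.01)),  # onset of state-led mass killing
  instab   = rnorm(n),                    # logged probability of political instability
  civilwar = rbinom(n, 1, 0.10),          # violent civil conflict underway?
  elitesal = rbinom(n, 1, 0.20)           # political salience of elite ethnicity
)

# Simple grid over mtry, scored here by out-of-bag error
for (m in 1:3) {
  rf <- randomForest(onset ~ ., data = d, mtry = m, ntree = 1000)
  print(c(mtry = m, oob_error = unname(rf$err.rate[1000, "OOB"])))
}
```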

The Random Forest proved to be the most accurate of the three models in stratified 10-fold cross-validation. The chart below is a kernel density plot of the areas under the ROC curve for the out-of-sample estimates from that cross-validation drill. As the chart shows, the average AUC for the Random Forest was in the low 0.80s, compared with the high 0.70s for the PITF/Harff and Elite Threat models. As expected, the average of the forecasts from all three performed even better than the best single model, albeit not by much. These out-of-sample accuracy rates aren’t mind blowing, but they aren’t bad either, and they are as good or better than many of the ones I’ve seen from similar efforts to anticipate the onset of rare political crises in countries worldwide.

Distribution of Out-of-Sample AUC Scores by Model in 10-Fold Cross-Validation

The decision to use an unweighted average for the combined forecast might seem simplistic, but it’s actually a principled choice in this instance. When examples of the event of interest are hard to come by and we have reason to believe that the process generating those events may be changing over time, sticking with an unweighted average is a reasonable hedge against risks of over-fitting the ensemble to the idiosyncrasies of the test set used to tune it. For a longer discussion of this point, see pp. 7-8 in the last paper I wrote on this work and the paper by Andreas Graefe referenced therein.
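In code, the ensemble step really is as simple as it sounds. Something like this, where the columns hold each model’s predicted probabilities for the same country-years (the values here are made up):

```r
# Hypothetical predicted probabilities from the three component models
preds <- data.frame(
  harff  = c(0.02, 0.10, 0.01),
  elite  = c(0.05, 0.08, 0.02),
  forest = c(0.03, 0.15, 0.01)
)
preds$ensemble <- rowMeans(preds[, c("harff", "elite", "forest")])
preds
```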

Any close readers of my previous work on this topic over the past couple of years (see here and here) will notice that one model has been dropped from the last version of this ensemble, namely, the one proposed by Michael Colaresi and Sabine Carey in their 2008 article, “To Kill or To Protect” (here). As I was reworking my scripts to make regular updating easier (more on that below), I paid closer attention than I had before to the fact that the Colaresi and Carey model requires a measure of the size of state security forces that is missing for many country-years. In previous iterations, I had worked around that problem by using a categorical version of this variable that treated missingness as a separate category, but this time I noticed that there were fewer than 20 mass-killing onsets in country-years for which I had a valid observation of security-force size. With so few examples, we’re not going to get reliable estimates of any pattern connecting the two. As it happened, this model—which, to be fair to its authors, was not designed to be used as a forecasting device—was also by far the least accurate of the lot in 10-fold cross-validation. Putting two and two together, I decided to consign this one to the scrap heap for now. I still believe that measures of military forces could help us assess risks of mass killing, but we’re going to need more and better data to incorporate that idea into our multimodel ensemble.

The bigger and in some ways more novel change from previous iterations of this work concerns the unorthodox approach I’m now using to make the risk assessments as current as possible. All of the models used to generate these assessments were trained on country-year data, because that’s the only form in which most of the requisite data is produced. To mimic the eventual forecasting process, the inputs to those models are all lagged one year at the model-estimation stage—so, for example, data on risk factors from 1985 are compared with outcomes in 1986, 1986 inputs to 1987 outcomes, and so on.
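Put another way, the outcome is led by one year within each country before estimation. Here is a minimal sketch of that alignment with hypothetical column names:

```r
# Toy country-year data; 'x' stands in for the various risk factors
cy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(1985:1987, times = 2),
  x       = rnorm(6),
  onset   = rbinom(6, 1, 0.1)
)
cy <- cy[order(cy$country, cy$year), ]

# Pair year-t predictors with the year-t+1 outcome within each country
cy$onset_next <- ave(cy$onset, cy$country, FUN = function(z) c(z[-1], NA))

# Estimation then regresses onset_next on x, dropping the trailing NAs, e.g.:
# glm(onset_next ~ x, family = binomial, data = cy)
```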

If we stick rigidly to that structure at the forecasting stage, then I need data from 2013 to produce 2014 forecasts. Unfortunately, many of the sources for the measures used in these models won’t publish their 2013 data for at least a few more months. Faced with this problem, I could do something like what I aim to do with the coup forecasts I’ll be producing in the next few days—that is, only use data from sources that quickly and reliably update soon after the start of each year. Unfortunately again, though, the only way to do that would be to omit many of the variables most specific to the risk of mass atrocities—things like the occurrence of violent civil conflict or the political salience of elite ethnicity.

So now I’m trying something different. Instead of waiting until every last input has been updated for the previous year and they all neatly align in my rectangular data set, I am simply applying my algorithms to the most recent available observation of each input. It took some trial and error to write, but I now have an R script that automates this process at the country level by pulling the time series for each variable, omitting the missing values, reversing the series order, snipping off the observation at the start of that string, collecting those snippets in a new vector, and running that vector through the previously estimated model objects to get a forecast (see the section of that script starting at line 284).
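Stripped way down, the core of that routine looks something like the sketch below. The data frame and column names are hypothetical; the real script loops over countries and variables and then hands the resulting vector to the saved model objects.

```r
# Return the most recent non-missing value in a series
latest_obs <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  rev(x)[1]
}

# Toy series for one country; the latest Polity observation isn't out yet
one_country <- data.frame(
  year   = 2008:2013,
  polity = c(7, 7, 7, 8, 8, NA),
  growth = c(2.1, 3.0, 2.5, 2.2, 1.9, 1.7)
)

inputs <- sapply(one_country[, c("polity", "growth")], latest_obs)
inputs  # these snippets would then be run through predict() with the saved models
```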

One implicit goal of this approach is to make it easier to jump to batch processing, where the forecasting engine routinely and automatically pings the data sources online and updates whenever any of the requisite inputs has changed. So, for example, when in a few months the vaunted Polity IV Project releases its 2013 update, my forecasting contraption would catch and ingest the new version and the forecasts would change accordingly. I now have scripts that can do the statistical part but am going to be leaning on other folks to automate the wider routine as part of the early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide.

The big upside of this opportunistic approach to updating is that the risk assessments are always as current as possible, conditional on the limitations of the available data. The way I figure, when you don’t have information that’s as fresh as you’d like, use the freshest information you’ve got.

The downside of this approach is that it’s not clear exactly what the outputs from that process represent. Technically, a forecast is a probabilistic statement about the likelihood of a specific event during a specific time period. The outputs from this process are still probabilistic statements about the likelihood of a specific event, but they are no longer anchored to a specific time period. The probabilities mapped at the top of this post mostly use data from 2012, but the inputs for some variables for some cases are a little older, while the inputs for some of the dynamic variables (e.g., GDP growth rates and coup attempts) are essentially current. So are those outputs forecasts for 2013, or for 2014, or something else?

For now, I’m going with “something else” and am thinking of the outputs from this machinery as the most up-to-date statistical risk assessments I can produce, but not forecasts as such. That description will probably sound like fudging to most statisticians, but it’s meant to be an honest reflection of both the strengths and limitations of the underlying approach.

Any gear heads who’ve read this far, I’d really appreciate hearing your thoughts on this strategy and any ideas you might have on other ways to resolve this conundrum, or any other aspect of this forecasting process. As noted at the top, the data and code used to produce these estimates are posted online. This work is part of a soon-to-launch, public early-warning system, so we hope and expect that they will have some effect on policy and advocacy planning processes. Given that aim, it behooves us to do whatever we can to make them as accurate as possible, so I would very much welcome any suggestions on how to do or describe this better.

Finally and as promised, here is a dot plot of the estimates mapped above. Countries are shown in descending order by estimated risk. The gray dots mark the forecasts from the three component models, and the red dot marks the unweighted average.

Estimated Risk of New Episode of State-Led Mass Killing, by Country

PS. In preparation for a presentation on this work at an upcoming workshop, I made a new map of the current assessments that works better, I think, than the one at the top of this post. Instead of coloring by quintiles, this new version (below) groups cases into several bins that roughly represent doublings of risk: less than 1%, 1-2%, 2-4%, 4-8%, and 8-16%. This version more accurately shows that the vast majority of countries are at extremely low risk and more clearly shows variations in risk among the ones that are not.

Estimated Risk of New State-Led Mass Killing

A Coda to “Using GDELT to Monitor Atrocities, Take 2”

I love doing research in the Internet Age. As I’d hoped it would, my post yesterday on the latest iteration of the atrocities-monitoring system we have in the works has already sparked a lot of really helpful responses. Some of those responses are captured in comments on the post, but not all of them are. So, partly as a public good and partly for my own record-keeping, I thought I’d write a coda to that post enumerating the leads it generated and some of my reactions to them.

Give the Machines Another Shot at It

As a way to reduce or even eliminate the burden placed on our human(s) in the loop, several people suggested something we’ve been considering for a while: use machine-learning techniques to develop classifiers that can be used to further reduce the data left after our first round of filtering. These classifiers could consider all of the features in GDELT, not just the event and actor types we’re using in our R script now. If we’re feeling really ambitious, we could go all the way back to the source stories and use natural-language processing to look for additional discriminatory power there. This second round might not eliminate the need for human review, but it certainly could lighten the load.

The comment threads on this topic (here and here) nicely capture what I see as the promise and likely limitations of this strategy, so I won’t belabor it here. For now, I’ll just note that how well this would work is an empirical question, and it’s one we hope to get a chance to answer once we’ve accumulated enough screened data to give those classifiers a fighting chance.
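To make the idea a bit more concrete, here is a hedged sketch of what such a second-round classifier could look like in R. The file names are hypothetical, and the handful of GDELT fields used as features are just plausible examples, not a considered feature set.

```r
# Sketch of a second-round filter: learn from hand-screened records which
# auto-filtered events are likely to be confirmed as atrocities. File names
# and the feature list are illustrative, not the project's actual setup.
library(randomForest)

screened <- read.csv("screened_events.csv")   # past records with a human keep/discard label
screened$keep <- factor(screened$keep)        # 1 = confirmed atrocity, 0 = discarded

fit <- randomForest(keep ~ QuadClass + GoldsteinScale + NumMentions + NumSources + AvgTone,
                    data = screened, ntree = 500)

new_batch <- read.csv("todays_filtered_events.csv")
new_batch$p_atrocity <- predict(fit, new_batch, type = "prob")[, "1"]

# Pass only the higher-scoring records on to the human coder.
to_review <- subset(new_batch, p_atrocity > 0.10)
```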

Leverage GDELT’s Global Knowledge Graph

Related to the first idea, GDELT co-creator Kalev Leetaru has suggested on a couple of occasions that we think about ways to bring the recently created GDELT Global Knowledge Graph (GKG) to bear on our filtering task. As Kalev describes in a post on the GDELT blog, GKG consists of two data streams, one that records mentions of various counts and another that captures connections in each day’s news between “persons, organizations, locations, emotions, themes, counts, events, and sources.” That second stream in particular includes a bunch of data points that we can connect to specific event records and thus use as additional features in the kind of classifiers described under the previous header. In response to my post, Kalev sent this email to me and a few colleagues:

I ran some very very quick numbers on the human coding results Jay sent me where a human coded 922 articles covering 9 days of GDELT events and coded 26 of them as atrocities. Of course, 26 records isn’t enough to get any kind of statistical latch onto to build a training model, but the spectral response of the various GKG themes is quite informative. For events tagged as being an atrocity, themes such as ETHNICITY, RELIGION, HUMAN_RIGHTS, and a variety of functional actors like Villagers, Doctors, Prophets, Activists, show up in the top themes, whereas in the non-atrocities the roles are primarily political leaders, military personnel, authorities, etc. As just a simple example, the HUMAN_RIGHTS theme appeared in just 6% of non-atrocities, but 30% of atrocities, while Activists show up in 33% of atrocities compared with just 4% of non-atrocities, and the list goes on.

Again, 26 articles isn’t enough to build a model on, but just glancing over the breakdown of the GKG themes for the two, there is a really strong and clear breakage between the two across the entire set of themes, and the breakdown fits precisely what Bayesian classifiers like (they are the most accurate for this kind of separation task and outperform SVM and random forest).

So, Jay, the bottom line is that if you can start recording each day the list of articles that you guys review and the ones you flag as an atrocity and give me a nice dataset over time, should be pretty easy to dramatically filter these down for you at the very least.

As I’ve said throughout this process, it’s not that event data can’t do what is needed, it’s that often you have to bring additional signals into the mix to accomplish your goals when the thing you’re after requires signals beyond what the event records are capturing.

What Kalev suggests at the end there—keep a record of all the events we review and the decisions we make on them—is what we’re doing now, and I hope we can expand on his experiment in the next several months.
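For completeness, a bare-bones version of the kind of Bayesian classifier Kalev has in mind might look something like the sketch below. The file names and the THEME_* indicator columns are assumptions about how the merged data would be laid out, not the actual format.

```r
# Sketch of a naive Bayes classifier on GKG theme indicators. The files and the
# THEME_* column layout are assumed for illustration.
library(e1071)

train <- read.csv("screened_events_with_gkg_themes.csv")
train$atrocity <- factor(train$atrocity)   # human label: 1 = atrocity, 0 = not

# One 0/1 indicator per GKG theme, e.g., THEME_HUMAN_RIGHTS, THEME_ETHNICITY.
theme_cols <- grep("^THEME_", names(train), value = TRUE)
train[theme_cols] <- lapply(train[theme_cols], factor, levels = c("0", "1"))

nb <- naiveBayes(train[, theme_cols], train$atrocity)

# Posterior probability that each new record describes an atrocity.
new_day <- read.csv("todays_events_with_gkg_themes.csv")
new_day[theme_cols] <- lapply(new_day[theme_cols], factor, levels = c("0", "1"))
p_atrocity <- predict(nb, new_day[, theme_cols], type = "raw")[, "1"]
```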

Crowdsource It

Jim Walsh left a thoughtful comment suggesting that we crowdsource the human coding:

Seems to me like a lot of people might be willing to volunteer their time for this important issue–human rights activists and NGO types, area experts, professors and their students (who might even get some credit and learn about coding). If you had a large enough cadre of volunteers, could assign many (10 or more?) to each day’s data and generate some sort of average or modal response. Would need someone to organize the volunteers, and I’m not sure how this would be implemented online, but might be do-able.

As I said in my reply to him, this is an approach we’ve considered but rejected for now. We’re eager to take advantage of the wisdom of interested crowds and are already doing so in big ways on other parts of our early-warning system, but I have two major concerns about how well it would work for this particular task.

The first is the recruiting problem, and here I see a Catch-22: people are less inclined to do this if they don’t believe the system works, but it’s hard to convince them that the system works if we don’t already have a crowd involved to make it go. This recruiting problem becomes especially acute in a system with time-sensitive deliverables. If we promise daily updates, we need to produce daily updates, and it’s hard to do that reliably if we depend on self-organized labor.

My second concern is the principal-agent problem. Our goal is to make reliable and valid data in a timely way, but there are surely people out there who would bring goals to the process that might not align with ours. Imagine, for example, that Absurdistan appears in the filtered-but-not-yet-coded data to be committing atrocities, but citizens (or even paid agents) of Absurdistan don’t like that idea and so organize to vote those events out of the data set. It’s possible that our project would be too far under the radar for anyone to bother, but our ambitions are larger than that, so we don’t want to assume that will be true. If we succeed at attracting the kind of attention we hope to attract, the deeply political and often controversial nature of our subject matter would make crowdsourcing this task more vulnerable to this kind of failure.

Use Mechanical Turk

Both of the concerns I have about the downsides of crowdsourcing the human-coding stage could be addressed by Ryan Briggs’ suggestion via Twitter to have Amazon Mechanical Turk do it. A hired crowd is there when you need it and (usually) doesn’t bring political agendas to the task. It’s also relatively cheap, and you only pay for work performed.

Thanks to our collaboration with Dartmouth’s Dickey Center, the marginal cost of the human coding isn’t huge, so it’s not clear that Mechanical Turk would offer much advantage on that front. Where it could really help is in routinizing the daily updates. As I mentioned in the initial post, when you depend on human action and have just one or a few people involved, it’s hard to establish a set of routines that covers weekends and college breaks and sick days and is robust to periodic changes in personnel. Primarily for this reason, I hope we’ll be able to run an experiment with Mechanical Turk where we can compare its cost and output to what we’re paying and getting now and see if this strategy might make sense for us.

Don’t Forget About Errors of Omission

Last but not least, a longtime colleague had this to say in an email reacting to the post (hyperlinks added):

You are effectively describing a method for reducing errors of commission, events coded by GDELT as atrocities that, upon closer inspection, should not be. It seems like you also need to examine errors of omission. This is obviously harder. Two possible opportunities would be to compare to either [the PITF Worldwide Atrocities Event Data Set] or to ACLED.  There are two questions. Is GDELT “seeing” the same source info (and my guess is that it is and more, though ACLED covers more than just English sources and I’m not sure where GDELT stands on other languages). Then if so (and there are errors of omission) why aren’t they showing up (coded as different types of events or failed to trigger any coding at all)[?]

It’s true that our efforts so far have focused almost exclusively on avoiding errors of commission, with the important caveat that it’s really our automated filtering process, not GDELT, that commits most of these errors. The basic problem for us is that GDELT, or really the CAMEO scheme on which it’s based, wasn’t designed to spot atrocities per se. As a result, most of what we filter out in our human-coding second stage aren’t things that were miscoded by GDELT. Instead, they’re things that were properly coded by GDELT as various forms of violent action but upon closer inspection don’t appear to involve the additional features of atrocities as we define them.

Of course, that still leaves us with this colleague’s central concern about errors of omission, and on that he’s absolutely right. I have experimented with different actor and event-type criteria to make sure we’re not missing a lot of events of interest in GDELT, but I haven’t yet compared what we’re finding in GDELT to what related databases that use different sources are seeing. Once we accumulate a few months’ worth of data, I think this is something we’re really going to need to do.
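When we get there, a simple first cut might be an anti-join of country-days: which events appear in a reference set like ACLED or the PITF data but have no counterpart in ours? Here is a rough sketch, with hypothetical file and column names.

```r
# Sketch of a first pass at errors of omission: country-days present in a
# reference data set but absent from ours. File and column names are assumed.
library(dplyr)

ours <- read.csv("our_atrocities.csv")      # assumed columns: country, date
ref  <- read.csv("reference_events.csv")    # assumed columns: country, date

missed <- ref %>%
  distinct(country, date) %>%
  anti_join(distinct(ours, country, date), by = c("country", "date"))

nrow(missed)  # candidate misses to inspect by hand against the source stories
```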

Stay tuned for Take 3…

Using GDELT to Monitor Atrocities, Take 2

Last May, I wrote a post about my preliminary efforts to use a new data set called GDELT to monitor reporting on atrocities around the world in near-real time. Those efforts represent one part of the work I’m doing on a public early-warning system for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide, and they have continued in fits and starts over the ensuing eight months. With help from Dartmouth’s Dickey Center, Palantir, and the GDELT crew, we’ve made a lot of progress. I thought I’d post an update now because I’m excited about the headway we’ve made; I think others might benefit from seeing what we’re doing; and I hope this transparency can help us figure out how to do this task even better.

So, let’s cut to the chase: Here is a screenshot of an interactive map locating the nine events captured in GDELT in the first week of January 2014 that looked like atrocities to us and occurred in a place that the Google Maps API recognized when queried. (One event was left off the map because Google Maps didn’t recognize its reported location.) The size of the bubbles corresponds to the number of civilian deaths, which in this map range from one to 31. To really get a feel for what we’re trying to do, though, head over to the original visualization on CartoDB (here), where you can zoom in and out and click on the bubbles to see a hyperlink to the story from which each event was identified.

[Screenshot: interactive map of atrocities identified in GDELT, first week of January 2014]

Looks simple, right? Well, it turns out it isn’t, not by a long shot.

As this blog’s regular readers know, GDELT uses software to scour the web for new stories about political interactions all around the world and parses those stories to identify and record information about who did or said what to whom, when, and where. It currently covers the period 1979–present and is now updated every day, and each of those daily updates contains some 100,000-140,000 new records. Miraculously, and crucially for a non-profit pilot project like ours, GDELT is also available for free.

The nine events plotted in the map above were sifted from the tens of thousands of records GDELT dumped on us in the first week of 2014. Unfortunately, that data-reduction process is only partially automated.

The first step in that process is the quickest. As originally envisioned back in May, we are using an R script (here) to download GDELT’s daily update file and sift it for events that look, from the event type and actors involved, like they might involve what we consider to be an atrocity—that is, deliberate, deadly violence against one or more noncombatant civilians in the context of a wider political conflict.
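For readers who don’t click through to the script, the gist of that step looks roughly like the sketch below. This is not the actual script: the file-naming pattern follows GDELT’s daily updates, the header file is assumed to have been saved locally, and the event and actor codes are illustrative rather than our real criteria.

```r
# Rough sketch of the first-stage filter (not the script linked above).
# Download yesterday's GDELT daily update and keep records whose CAMEO root
# codes and target-actor types suggest possible violence against civilians.
day <- format(Sys.Date() - 1, "%Y%m%d")
url <- paste0("http://data.gdeltproject.org/events/", day, ".export.CSV.zip")

zipfile <- tempfile(fileext = ".zip")
download.file(url, zipfile, quiet = TRUE)
gdelt <- read.delim(unz(zipfile, paste0(day, ".export.CSV")),
                    header = FALSE, stringsAsFactors = FALSE)

# The daily files ship without a header row; here the published column names
# are assumed to be saved locally in gdelt_header.txt.
names(gdelt) <- scan("gdelt_header.txt", what = "character", quiet = TRUE)

# Illustrative criteria only: violent event root codes with civilian-type targets.
candidates <- subset(gdelt,
                     EventRootCode %in% c(18, 19, 20) &
                     Actor2Type1Code %in% c("CVL", "REF", "EDU"))

write.csv(candidates, paste0("filtered_", day, ".csv"), row.names = FALSE)
```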

Unfortunately, the stack of records that filtering script returns—something like 100-200 records per day—still includes a lot of stuff that doesn’t interest us. Some records are properly coded but involve actions that don’t meet our definition of an atrocity (e.g., clashes between rioters and police or rebels and troops); some involve atrocities but are duplicates of events we’ve already captured; and some are just miscoded (e.g., a mention of the film industry “shooting” movies that gets coded as soldiers shooting civilians).

After we saw how noisy our data set would be if we stopped screening there, we experimented with a monitoring system that would acknowledge GDELT’s imperfections and try to work with them. As Phil Schrodt recommended at the recent GDELT DC Hackathon, we looked to “embrace the suck.” Instead of trying to use GDELT to generate a reliable chronicle of atrocities around the world, we would watch for interesting and potentially relevant perturbations in the information stream, noise and all, and those perturbations would produce alerts that users of our system could choose to investigate further. Working with Palantir, we built a system that would estimate country-specific prior moving averages of daily event counts returned by our filtering script and would generate an alert whenever a country’s new daily count landed more than two standard deviations above or below that average.
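In spirit, the alert rule was something like the sketch below. This is a reconstruction for illustration, not the system we actually built with Palantir, and the data frame of daily filtered counts is hypothetical.

```r
# Sketch of the alert logic described above: flag a country-day whose filtered
# event count sits more than two standard deviations from its trailing average.
# 'daily_counts' is a hypothetical data frame with columns country, date, n.
library(zoo)

flag_anomalies <- function(daily_counts, window = 30) {
  daily_counts <- daily_counts[order(daily_counts$country, daily_counts$date), ]
  pieces <- lapply(split(daily_counts, daily_counts$country), function(d) {
    mu   <- zoo::rollapplyr(d$n, window, mean, fill = NA)
    sdev <- zoo::rollapplyr(d$n, window, sd,   fill = NA)
    lag1 <- function(x) c(NA, head(x, -1))
    # Compare today's count to the window ending the previous day.
    d$alert <- abs(d$n - lag1(mu)) > 2 * lag1(sdev)
    d
  })
  do.call(rbind, pieces)
}
```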

That system sounded great to most of the data pros in our figurative room, but it turned out to be a non-starter with some other constituencies of importance to us. The issue was credibility. Some of the events causing those perturbations in the GDELT stream were exactly what we were looking for, but others—a pod of beached whales in Brazil, or Congress killing a bill on healthcare reform—were laughably far from the mark. If our supposedly high-tech system confused beached whales and Congressional procedures for mass atrocities, we would risk undercutting the reputation for reliability and technical acumen that we are striving to achieve.

So, back to the drawing board we went. To separate the signal from the static and arrive at something more like that valid chronicle we’d originally envisioned, we decided that we needed to add a second, more laborious step to our data-reduction process. After our R script had done its work, we would review each of the remaining records by hand to decide if it belonged in our data set or not and, when necessary, to correct any fields that appeared to have been miscoded. While we were at it, we would also record the number of deaths each event produced. We wrote a set of rules to guide those decisions; had two people (a Dartmouth undergraduate research assistant and me) apply those rules to the same sets of daily files; and compared notes and made fixes. After a few iterations of that process over a few months, we arrived at the codebook we’re using now (here).

This process radically reduces the amount of data involved. Each of those two steps drops us down multiple orders of magnitude: from 100,000-140,000 records in the daily updates, to about 150 in our auto-filtered set, to just one or two in our hand-filtered set. The figure below illustrates the extent of that reduction. In effect, we’re treating GDELT as a very powerful but error-prone search and coding tool, a source of raw ore that needs refining to become the thing we’re after. This isn’t the only way to use GDELT, of course, but for our monitoring task as presently conceived, it’s the one that we think will work best.

[Figure: illustration of the data reduction at each filtering step]

Once that second data-reduction step is done, we still have a few tasks left to enable the kind of mapping and analysis we aim to do. We want to trim the data set to keep only the atrocities we’ve identified, and we need to consolidate the original and corrected fields in those remaining records and geolocate them. All of that work gets done with a second R script (here), which is applied to the spreadsheet the coder saves after completing her work. The much smaller file that script produces is then ready to upload to a repository where it can be combined with other days’ outputs to produce the global chronicle our monitoring project aims to produce.
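In outline, that second script does something like the sketch below. To be clear, this is not the linked script: the column names are placeholders, and the geocoding call is just one plausible way to hit the Google Maps geocoding endpoint from R.

```r
# Sketch of the post-coding cleanup: keep confirmed atrocities, prefer the
# coder's corrected fields over the originals, and geocode the location.
# Column names and the geocoding details are illustrative assumptions.
library(httr)
library(jsonlite)

coded <- read.csv("coded_events_today.csv", stringsAsFactors = FALSE)
keep  <- subset(coded, coder_decision == "atrocity")

# Use the coder's corrected location where one was entered; otherwise keep GDELT's.
keep$location <- ifelse(is.na(keep$location_corrected) | keep$location_corrected == "",
                        keep$location_original, keep$location_corrected)

geocode <- function(place) {
  resp <- GET("https://maps.googleapis.com/maps/api/geocode/json",
              query = list(address = place))  # an API key may also be required
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  if (length(parsed$results) == 0) return(c(NA_real_, NA_real_))
  loc <- parsed$results$geometry$location[1, ]
  c(loc$lat, loc$lng)
}

coords <- t(vapply(keep$location, geocode, numeric(2)))
keep$lat <- coords[, 1]
keep$lon <- coords[, 2]

write.csv(keep, "atrocities_today_geocoded.csv", row.names = FALSE)
```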

From start to finish, each daily update now takes about 45 minutes, give or take 15. We’d like to shrink that further if we can but don’t see any real opportunities to do so at the moment. Perhaps more important, we still have to figure out the bureaucratic procedures that will allow us to squeeze daily updates from a “human in the loop” process in a world where there are weekends and holidays and people get sick and take vacations and sometimes even quit. Finally, we also have not yet built the dashboard that will display and summarize and provide access to these data on our program’s web site, which we expect to launch some time this spring.

We know that the data set this process produces will be incomplete. I am 100-percent certain that during the first week of January 2014, more than 10 events occurred around the world that met our definition of an atrocity. Unfortunately, we can only find things where GDELT looks, and even a scan of every news story produced every day everywhere in the world would fail to see the many atrocities that never make the news.

On the whole, though, I’m excited about the progress we’ve made. As soon as we can launch it, this monitoring process should help advocates and analysts more efficiently track atrocities globally in close to real time. As our data set grows, we also hope it will serve as the foundation for new research on forecasting, explaining, and preventing this kind of violence. Even with its evident shortcomings, we believe this data set will prove to be useful, and as GDELT’s reach continues to expand, so will ours.

PS For a coda discussing the great ideas people had in response to this post, go here.

[Erratum: The original version of this post said there were about 10,000 records in each daily update from GDELT. The actual figure is 100,000-140,000. The error has been corrected and the illustration of data reduction updated accordingly.]
