Another Tottering Step Toward a New Era of Data-Making

Ken Benoit, Drew Conway, Benjamin Lauderdale, Michael Laver, and Slava Mikhaylov have an article forthcoming in the American Political Science Review that knocked my socks off when I read it this morning. Here is the abstract from the ungated version I saw:

Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of non-experts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

The data-making strategy they develop is really innovative, and the cost of implementing it is, I estimate from the relevant tidbits in the paper, 2–3 orders of magnitude lower than the cost of the traditional expert-centric approach. In other words, this is potentially a BIG DEAL for social-science data-making, which, as Sinan Aral reminds us, is a BIG DEAL for doing better social science.

That said, I do wonder how much structure is baked into the manifesto-coding task that isn’t there in most data-making problems, and that makes it especially well suited to the process the authors develop. In the exercise the paper describes:

  1. The relevant corpus (party manifestos) is self-evident, finite, and not too large;
  2. The concepts of interest (economic vs. social policy, left vs. right) are fairly intuitive; and
  3. The inferential task is naturally “fractal”; that is, the concepts of interest inhere in individual sentences (and maybe even words) as well as whole documents.

None of those attributes holds when it comes to coding latent socio-political structural features like de facto forms of government (a.k.a. regime type) or whether or not a country is in a state of civil war. These features are fundamental to analyses of international politics, but the high cost of producing them means that we sometimes don’t get them at all, and when we do, we usually don’t get them updated as quickly or as often as we would need to do more dynamic analysis and prediction. Maybe it’s my lack of imagination, but I can’t quite see how to extend the authors’ approach to those topics without stretching it past the breaking point. I can think of ways to keep the corpus manageable, but the concepts are not as intuitive, and the inferential task is not fractal. Ditto for coding event data, where I suspect that 2 from the list above would mostly hold; 3 would sometimes hold; but 1 absolutely would not.*

In short, I’m ga-ga about this paper and the directions in which it points us, but I’m not ready yet to declare imminent victory in the struggle to drag political science into a new and much healthier era of data-making. (Fool me once…)

* If you think I’m overlooking something here, please leave a comment explaining how you think it might be do-able.

Visualizing Strike Activity in China

In my last post, I suggested that the likelihood of social unrest in China is probably higher than a glance at national economic statistics would suggest, because those statistics conceal the fact that economic malaise is hitting some areas much harder than others and local pockets of unrest can have national effects (ask Mikhail Gorbachev about that one). Near the end of the post, I effectively repeated this mistake by showing a chart that summarized strike activity over the past few years…at the national level.

So, what does the picture look like if we disaggregate that national summary?

The best current data on strike activity in China come from China Labour Bulletin (CLB), a Hong Kong–based NGO that collects incident reports from various Chinese-language sources, compiles them in a public data set, and visualizes them in an online map. Those data include a few fields that allow us to disaggregate our analysis, including the province in which an incident occurred (Location), the industry involved (Industry), and the claims strikers made (Demands). On May 28, I downloaded a spreadsheet with data for all available dates (January 2011 to the present) for all types of incidents and wrote an R script that uses small multiples to compare strike activity across groups within each of those categories.
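A minimal sketch of that small-multiples step appears below. It assumes the CLB export has been saved as clb.csv with Date and Location columns; the file name, column names, and date format are my guesses at the layout, not CLB’s documented schema, and the actual script surely differs in its details.

    library(dplyr)
    library(ggplot2)

    # Assumed layout: one row per incident, with a 'Date' column (YYYY-MM-DD)
    # and a 'Location' column naming the province.
    clb <- read.csv("clb.csv", stringsAsFactors = FALSE)
    clb$month <- as.Date(cut(as.Date(clb$Date), breaks = "month"))

    monthly <- clb %>%
      count(Location, month, name = "incidents")

    # Small multiples: one panel per province, drawn to a common x-axis so
    # that trends can be compared at a glance
    ggplot(monthly, aes(month, incidents)) +
      geom_line() +
      facet_wrap(~ Location, ncol = 4) +
      theme_minimal()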

First, here’s the picture by province. This chart shows that Guangdong has been China’s most strike-prone province over the past several years, but several other provinces have seen large increases in labor unrest in the past two years, including Henan, Hebei, Hubei, Shandong, Sichuan, and Jiangsu. Right now, I don’t have monthly or quarterly province-level data on population size and economic growth to model the relationship among these things, but a quick eyeballing of the chart from the FT in my last post indicates that these more strike-prone provinces skew toward the lower end of the range of recent GDP growth rates, as we would expect.

[Figure: small multiples of strike activity by province]

Now here’s the picture by industry. This chart makes clear that almost all of the surge in strike activity in the past year has come from two sectors: manufacturing and construction. Strikes in the manufacturing sector have been trending upward for a while, but the construction sector really got hit by a wave in just the past year that crested around the time of the Lunar New Year in early 2015. Other sectors also show signs of increased activity in recent months, though, including services, mining, and education, and the transportation sector routinely contributes a non-negligible slice of the national total.

[Figure: small multiples of strike activity by industry]

And, finally, we can compare trends over time in strikers’ demands. This analysis took a little more work, because the CLB data on Demands do not follow best coding practices, in which a set of categories is established a priori and each demand is assigned to one of those categories. In the CLB data, the Demands field is a set of comma-delimited phrases that are mostly but not entirely standardized (e.g., “wage arrears” and “social security” but also “reduction of their operating territory” and “gas-filing problem and too many un-licensed cars”). So, to aggregate the data on this dimension, I created a few categories of my own and used regular-expression searches to find the records that belonged in them. For example, all events for which the Demands field included “wage arrear”, “pay”, “compensation”, “bonus” or “ot” got lumped together in a Pay category, while events involving claims marked as “social security” or “pension” got combined in a Social Security category (see the R script for details, and the stylized version below).
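Here is a stylized version of that lumping step. The patterns are approximations of the searches described above, and the Demands column name comes from the CLB export; note the word boundaries around “ot”, without which the pattern would match inside unrelated words.

    # Stylized regex categorization of the Demands field (patterns approximate)
    demands <- tolower(clb$Demands)
    clb$pay <- grepl("wage arrear|pay|compensation|bonus|\\bot\\b", demands)
    clb$social.security <- grepl("social security|pension", demands)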

The results appear below. As CLB has reported, almost all of the strike activity in China is over pay, usually wage arrears. There’s been an uptick in strikes over layoffs in early 2015, but getting paid better, sooner, or at all for work performed is by far the chief concern of strikers in China, according to these data.

[Figure: small multiples of strike activity by strikers’ demands]

In closing, a couple of caveats.

First, we know these data are incomplete, and we know that we don’t know exactly how they are incomplete, because there is no “true” record to which they can be compared. It’s possible that the apparent increase in strike activity in the past year or two is really the result of more frequent reporting or more aggressive data collection on a constant or declining stock of events.

I doubt that’s what’s happening here, though, for two reasons. One, other sources have reported the Chinese government has actually gotten more aggressive about censoring reports of social unrest in the past two years, so if anything we should expect the selection bias from that process to bend the trend in the opposite direction. Two, theory derived from historical observation suggests that strike activity should increase as the economy slows and the labor market tightens, and the observed data are consistent with those expectations. So, while the CLB data are surely incomplete, we have reason to believe that the trends they show are real.

Second, the problem I originally identified at the national level also applies at these levels. China’s provinces are larger than many countries in the world, and industry segments like construction and manufacturing contain a tremendous variety of activities. To really escape the ecological fallacy, we would need to drill down much further to the level of specific towns, factories, or even individuals. As academics would say, though, that task lies beyond the scope of the current blog post.

In China, Don’t Mistake the Trees for the Forest

Anyone who pays much attention to news of the world knows that China’s economy is cooling a bit. Official statistics—which probably aren’t true but may still be useful—show annual growth slowing from over 7.5 to around 7 percent or lower and staying there for a while.

For economists, the big question seems to be whether or not policy-makers can control the descent and avoid a hard landing or crash. Meanwhile, political scientists and sociologists wonder whether or not that economic slowdown will spur social unrest that could produce a national political crisis or reform. Most of what I remember reading on the topic has suggested that the risk of large-scale social unrest will remain low as long as China avoids the worst-case economic scenarios. GDP growth in the 6–7 percent range would be a letdown, but it’s still pretty solid compared to most places and is hardly a crisis.

I don’t know enough about economics to wade into that field’s debate, but I do wonder if an ecological fallacy might be leading many political scientists to underestimate the likelihood of significant social unrest in China in response to this economic slowdown. We commit an ecological fallacy when we assume that the characteristics of individuals in a group match the central tendencies of that group—for example, assuming that a kid you meet from a wealthy, high-performing high school is rich and will score well on the SAT. Put another way, an ecological fallacy involves mistakenly assuming that each tree shares the characteristic features of the forest they comprise.

Now consider the chart below, from a recent article in the Financial Times about the uneven distribution of economic malaise across China’s provinces. As the story notes, “The slowdown has affected some areas far worse than others. Perhaps predictably, the worst-hit places are those that can least afford it.”

The chart reminds us that China is a large and heterogeneous country—and, as it happens, social unrest isn’t a national referendum. You don’t need a majority vote from a whole country to get popular protest that can threaten to reorder national politics; you just need to reach a critical point, and that point can often be reached with a very small fraction of the total population. So, instead of looking at national tendencies to infer national risk, we should look at the tails of the relevant distributions to see if they’re getting thicker or longer. The people and places at the wrong ends of those distributions represent pockets of potential unrest; other things being equal, the more of them there are, the greater the cumulative probability of relevant action.

So how do things look in that thickening tail? Here again is that recent story in the FT:

Last month more than 30 provincial taxi drivers drank poison and collapsed together on the busiest shopping street in Beijing in a dramatic protest against economic and working conditions in their home town.

The drivers, who the police say all survived, were from Suifenhe, a city on the Russian border in the northeastern province of Heilongjiang…

Heilongjiang is among the poorest performers. While national nominal growth slipped to 5.8 per cent in the first quarter compared with a year earlier — its lowest level since the global financial crisis — the province’s nominal GDP actually contracted, by 3.2 per cent.

In the provincial capital of Harbin, signs of economic malaise are everywhere.

The relatively small, ritual protest described at the start of that block quote wouldn’t seem to pose much threat to Communist Party rule, but then neither did Mohamed Bouazizi’s self-immolation in Tunisia in December 2010.

Meanwhile, as the chart below shows, data collected by China Labour Bulletin show that the incidence of strikes and other forms of labor unrest has increased in China in the past year. Each such incident is arguably another roll of the dice that could blow up into a larger and longer episode. Any one event is extremely unlikely to catalyze a larger campaign that might reshape national politics in a significant way, but the more trials run, the higher the cumulative probability.

[Figure: monthly counts of labor incidents in China, January 2012–May 2015 (data source: China Labour Bulletin)]

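The arithmetic behind that last point is worth making explicit. With a small, entirely made-up per-incident probability of escalation, the chance that at least one of n incidents blows up climbs quickly as n grows:

    # Illustrative arithmetic only: if each incident independently carried a
    # 1-in-1,000 chance of escalating, the chance that at least one of n
    # incidents escalates is 1 - (1 - p)^n.
    p <- 0.001
    n <- c(100, 500, 2000)
    round(1 - (1 - p)^n, 2)  # 0.10 0.39 0.86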

The point of this post is to remind myself and anyone bothering to read it that statistics describing the national economy in the aggregate aren’t a reliable guide to the likelihood of those individual events, and thus of a larger and more disruptive episode, because they conceal important variation in the distribution they summarize. I suspect that most China experts already think in these terms, but I think most generalists (like me) do not. I also suspect that this sub-national variation is one reason why statistical models using country-year data generally find weak association between things like economic growth and inflation on the one hand and demonstrations and strikes on the other. Maybe with better data in the future, we’ll find stronger affirmation of the belief many of us hold that economic distress has a strong effect on the likelihood of social unrest, because we won’t be forced into an ecological fallacy by the limits of available information.

Oh, and by the way: the same goes for Russia.

About That Apparent Decline in Violent Conflict…

Is violent conflict declining, or isn’t it? I’ve written here and elsewhere about evidence that warfare and mass atrocities have waned significantly in recent decades, at least when measured by the number of people killed in those episodes. Not everyone sees the world the same way, though. Bear Braumoeller asserts that, to understand how war-prone the world is, we should look at how likely countries are to use force against politically relevant rivals, and by this measure the rate of warfare has held pretty steady over the past two centuries. Tanisha Fazal argues that wars have become less lethal without becoming less frequent because of medical advances that help keep more people in war zones alive. Where I have emphasized war’s lethal consequences, these two authors emphasize war’s likelihood, and their arguments suggest that violent conflict hasn’t really waned the way I’ve alleged it has.

This week, we got another important contribution to the wider debate in which my shallow contributions are situated. In an updated working paper, Pasquale Cirillo and Nassim Nicholas Taleb claim to show that

Violence is much more severe than it seems from conventional analyses and the prevailing “long peace” theory which claims that violence has declined… Contrary to current discussions…1) the risk of violent conflict has not been decreasing, but is rather underestimated by techniques relying on naive year-on-year changes in the mean, or using sample mean as an estimator of the true mean of an extremely fat-tailed phenomenon; 2) armed conflicts have memoryless inter-arrival times, thus incompatible with the idea of a time trend.

Let me say up front that I only have a weak understanding of the extreme value theory (EVT) models used in Cirillo and Taleb’s paper. I’m a political scientist who uses statistical methods, not a statistician, and I have neither studied nor tried to use the specific techniques they employ.

Bearing that in mind, I think the paper successfully undercuts the most optimistic view about the future of violent conflict—that violent conflict has inexorably and permanently declined—but then I don’t know many people who actually hold that view. Most of the work on this topic distinguishes between the observed fact of a substantial decline in the rate of deaths from political violence and the underlying risk of those deaths and the conflicts that produce them. We can (partly) see the former, but we can’t see the latter; instead, we have to try to infer it from the conflicts that occur. Observed history is, in a sense, a single sample drawn from a distribution of many possible histories, and, like all samples, this one is only a jittery snapshot of the deeper data-generating process in which we’re really interested. What Cirillo and Taleb purport to show is that long sequences of relative peace like the one we have seen in recent history are wholly consistent with a data-generating process in which the risk of war and death from it have not really changed at all.
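A toy simulation, mine and not theirs, illustrates why sample means mislead for fat-tailed quantities: the running mean of a heavy-tailed series lurches with each rare extreme, so a long quiet stretch says little about the underlying mean.

    # Toy illustration of the fat-tail point (not Cirillo and Taleb's method)
    set.seed(1)
    x <- (1 - runif(5000))^(-1 / 1.5)  # Pareto(1, alpha = 1.5); true mean = 3
    running.mean <- cumsum(x) / seq_along(x)
    plot(running.mean, type = "l", xlab = "observations", ylab = "running mean")
    abline(h = 3, lty = 2)  # the true mean, approached only fitfully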

Of course, the fact that a decades-long decline in violent conflict like the one we’ve seen since World War II could happen by chance doesn’t necessarily mean that it is happening by chance. The situation is not dissimilar to one we see in sports when a batter or shooter seems to go cold for a while. Oftentimes that cold streak will turn out to be part of the normal variation in performance, and the athlete will eventually regress to the mean—but not every time. Sometimes, athletes really do get and stay worse, maybe because of aging or an injury or some other life change, and the cold streak we see is the leading edge of that sustained decline. The hard part is telling in real time which process is happening. To try to do that, we might look for evidence of those plausible causes, but humans are notoriously good at spotting patterns where there are none, and at telling ourselves stories about why those patterns are occurring that turn out to be bunk.

The same logic applies to thinking about trends in violent conflict. Maybe the downward trend in observed death rates is just a chance occurrence in an unchanged system, but maybe it isn’t. And, as Andrew Gelman told Zach Beauchamp, the statistics alone can’t answer this question. Cirillo and Taleb’s analysis, and Braumoeller’s before it, imply that the history we’ve seen in the recent past is about as likely as any other, but that fact isn’t proof of its randomness. Just as rare events sometimes happen, so do systemic changes.

Claims that “This time really is different” are usually wrong, so I think the onus is on people who believe the underlying risk of war is declining to make a compelling argument about why that’s true. When I say “compelling,” I mean an argument that a) identifies specific causal mechanisms and b) musters evidence of change over time in the presence or prevalence of those mechanisms. That’s what Steven Pinker tries at great length to do in The Better Angels of Our Nature, and what Joshua Goldstein did in Winning the War on War.

My own thinking about this issue connects the observed decline in the intensity of violent conflict to the rapid increase in the past 100+ years in the size and complexity of the global economy and the changes in political and social institutions that have co-occurred with it. No, globalization is not new, and it certainly didn’t stop the last two world wars. Still, I wonder if the profound changes of the past two centuries are accumulating into a global systemic transformation akin to the one that occurred locally in now-wealthy societies in which organized violent conflict has become exceptionally rare. Proponents of democratic peace theory see a similar pattern in the recent evidence, but I think they are too quick to give credit for that pattern to one particular stream of change that may be as much consequence as cause of the deeper systemic transformation. I also realize that this systemic transformation is producing negative externalities—climate change and heightened risks of global pandemics, to name two—that could offset the positive externalities or even lead to sharp breaks in other directions.

It’s impossible to say which, if any, of these versions is “true,” but the key point is that we can find real-world evidence of mechanisms that could be driving down the underlying risk of violent conflict. That evidence, in turn, might strengthen our confidence in the belief that the observed pattern has meaning, even if it doesn’t and can’t prove that meaning or any of the specific explanations for it.

Finally, without deeply understanding the models Cirillo and Taleb used, I also wondered when I first read their new paper if their findings weren’t partly an artifact of those models, or maybe some assumptions the authors made when specifying them. The next day, David Roodman wrote something that strengthened this source of uncertainty. According to Roodman, the extreme value theory (EVT) models employed by Cirillo and Taleb can be used to test for time trends, but the ones described in this new paper don’t. Instead, Cirillo and Taleb specify their models in a way that assumes there is no time trend and then use them to confirm that there isn’t. “It seems to me,” Roodman writes, “that if Cirillo and Taleb want to rule out a time trend according to their own standard of evidence, then they should introduce one in their EVT models and test whether it is statistically distinguishable from zero.”
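I won’t pretend to reproduce their EVT machinery, but the general logic of the test Roodman describes is standard. In a drastically simplified setting, you could model yearly conflict counts as Poisson with a year term and ask whether that term is distinguishable from zero; the counts below are fabricated, trendless noise, stand-ins for real data.

    # Drastically simplified stand-in for a time-trend test, not the paper's
    # EVT models; 'counts' is fabricated, trendless Poisson noise.
    set.seed(1)
    year <- 1946:2014
    counts <- rpois(length(year), lambda = 5)
    m0 <- glm(counts ~ 1, family = poisson)
    m1 <- glm(counts ~ year, family = poisson)
    anova(m0, m1, test = "Chisq")  # a small p-value would favor a time trend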

If Roodman is correct on this point, and if Cirillo and Taleb were to do what he recommends and still find no evidence of a time trend, I would update my beliefs accordingly. In other words, I would worry a little more than I do now about the risk of much larger and deadlier wars occurring again in my expected lifetime.

An Applied Forecaster’s Bad Dream

This is the sort of thing that freaks me out every time I’m getting ready to deliver or post a new set of forecasts:

In its 2015 States of Fragility report, the Organization for Economic Co-operation and Development (OECD) decided to complicate its usual one-dimensional list of fragile states by assessing five dimensions of fragility: Violence, Justice, Institutions, Economic Foundations and Resilience…

Unfortunately, something went wrong during the calculations. In my attempts to replicate the assessment, I found that the OECD misclassified a large number of states.

That’s from a Monkey Cage post by Thomas Leo Scherer, published today. Here, per Scherer, is why those errors matter:

Recent research by Judith Kelley and Beth Simmons shows that international indicators are an influential policy tool. Indicators focus international attention on low performers to positive and negative effect. They cause governments in poorly ranked countries to take action to raise their scores when they realize they are being monitored or as domestic actors mobilize and demand change after learning how they rate versus other countries. Given their potential reach, indicators should be handled with care.

For individuals or organizations involved in scientific or public endeavors, the best way to mitigate that risk is transparency. We can and should argue about concepts, measures, and model choices, but given a particular set of those elements, we should all get essentially the same results. When one or more of those elements is hidden, we can’t fully understand what the reported results represent, and researchers who want to improve the design by critiquing and perhaps extending it are forced to shadow-box. Also, individuals and organizations can double- and triple-check their own work, but errors are almost inevitable. When getting the best possible answers matters more than the risk of being seen making mistakes, then transparency is the way to go. This is why the Early Warning Project shares the data and code used to produce its statistical risk assessments in a public repository, and why Reinhart and Rogoff probably (hopefully?) wish they’d done something similar.

Of course, even though transparency improves the probability of catching errors and improving on our designs, it doesn’t automatically produce those goods. What’s more, we can know that we’re doing the right thing and still dread the public discovery of an error. Add to that risk the near-certainty of other researchers scoffing at your terrible code, and it’s easy to see why even the best practices won’t keep you from breaking out in a cold sweat each time you hit “Send” or “Publish” on a new piece of work.


Polity Meets Joy Division

The Center for Systemic Peace posted its annual update of the Polity data set on Friday, here. The data set now covers the period 1800–2014.

For those of you who haven’t already fled the page to go download the data and who aren’t political scientists: Polity measures patterns of political authority in all countries with populations larger than 500,000. It is one of the most widely used data sets in the fields of comparative politics and international relations. Polity is also tremendously useful in forecasts of rare political crises—partly because it measures some very important things, but also because it is updated every year on a fairly predictable schedule. Thanks to PITF and CSP for that.

I thought I would mark the occasion by visualizing Polity in a new way (for me, at least). In the past, I’ve used heat maps (here and here) and line plots of summary statistics. This time, I wanted to try something other than a heat map that would show change over time in a distribution, instead of just a central tendency. Weakly inspired by the often-imitated cover of Joy Division’s 1979 album Unknown Pleasures, here’s what I got. Each line in this chart is a kernel density plot of one year’s Polity scores, which range from -10 to 10 and are meant to indicate how democratic a country’s politics are. The small number of cases with special codes that don’t fit on this scale (-66, -77, and -88) have been set aside.

[Figure: stacked kernel density plots of annual Polity score distributions, 1800–2014]
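For anyone who wants to try this at home, here is a minimal sketch of how a chart like this can be drawn in base R. It assumes a data frame named polity with year and polity2 columns and the special codes already dropped; those names and the local file are my assumptions, and the actual script probably differs.

    # Sketch of the stacked-density chart; 'polity' with 'year' and 'polity2'
    # columns is assumed, special codes (-66, -77, -88) already dropped.
    # polity <- read.csv("p4v2014.csv")  # hypothetical local export
    years <- sort(unique(polity$year))
    offset <- 0.02
    plot(NULL, xlim = c(-10, 10), ylim = c(0, length(years) * offset + 0.4),
         xlab = "Polity score", ylab = "", yaxt = "n")
    for (i in seq_along(years)) {
      d <- density(polity$polity2[polity$year == years[i]], from = -10, to = 10)
      lines(d$x, d$y + (length(years) - i) * offset)  # one stacked line per year
    }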

The chart shows once again that the world has become much more democratic in the past half-century, with most of those gains occurring in the past 30 years. In the early 1960s, the distribution of national political regimes was bimodal, with authoritarian regimes outnumbering the more-democratic ones. As recently as the early 1970s, most regimes still fell toward the authoritarian end of the scale. Starting in the late 1980s, though, the authoritarian peak eroded quickly, and the balance of the distribution shifted toward the democratic end. Despite continuing talk of a democratic recession, the (political) world in 2014 is still mostly composed of relatively democratic regimes, and this data set doesn’t show much change in that basic pattern over the past decade.


The Myth of Comprehensive Data

“What about using Twitter sentiment?”

That suggestion came to me from someone at a recent Data Science DC meetup, after I’d given a short talk on assessing risks of mass atrocities for the Early Warning Project, and as the next speaker started his presentation on predicting social unrest. I had devoted the first half of my presentation to a digression of sorts, talking about how the persistent scarcity of relevant public data still makes it impossible to produce global forecasts of rare political crises—things like coups, insurgencies, regime breakdowns, and mass atrocities—that are as sharp and dynamic as we would like.

The meetup wasn’t the first time I’d heard that suggestion, and I think all of the well-intentioned people who have made it to me have believed that data derived from Twitter would escape or overcome those constraints. In fact, the Twitter stream embodies them. Over the past two decades, technological, economic, and political changes have produced an astonishing surge in the amount of information available from and about the world, but that surge has not occurred evenly around the globe.

Think of the availability of data as plant life in a rugged landscape, where dry peaks are places of data scarcity and fertile valleys represent data-rich environments. The technological developments of the past 20 years are like a weather pattern that keeps dumping more and more rain on that topography. That rain falls unevenly across the landscape, however, and it doesn’t have the same effect everywhere it lands. As a result, plants still struggle to grow on many of those rocky peaks, and much of the new growth occurs where water already collected and flora were already flourishing.

The Twitter stream exemplifies this uneven distribution of data in a couple of important ways. Take a look at the map below, a screenshot I took after letting Tweetping run for about 16 hours spanning May 6–7, 2015. The brighter the glow, the more Twitter activity Tweetping saw.

[Figure: Tweetping screenshot of global Twitter activity, May 6–7, 2015]

Some of the spatial variation in that map reflects differences in the distribution of human populations, but not all of it. Here’s a map of population density, produced by Daysleeper using data from CIESIN (source). If you compare this one to the map of Twitter usage, you’ll see that they align pretty well in Europe, the Americas, and some parts of Asia. In Africa and other parts of Asia, though, not so much. If it were just a matter of population density, then India and eastern China should burn brightest, but they—and especially China—are relatively dark compared to “the West.” Meanwhile, in Africa, we see pockets of activity, but there are whole swathes of the continent that are populated as densely as or more densely than the brighter parts of South America, but from which we see virtually no Twitter activity.

[Figure: map of world population density]

So why are some pockets of human settlement less visible than others? Two forces stand out: wealth and politics.

First and most obvious, access to Twitter depends on electricity and telecommunications infrastructure and gadgets and literacy and health and time, all of which are much scarcer in poorer parts of the world than they are in richer places. The map below shows lights at night, as seen from space by U.S. satellites 20 years ago and then mapped by NASA (source). These light patterns are sometimes used as a proxy for economic development (e.g., here).

[Figure: NASA composite image of Earth’s lights at night]

This view of the world helps explain some of the holes in our map of Twitter activity, but not all of it. For example, many of the densely populated parts of Africa don’t light up much at night, just as they don’t on Tweetping, because they lack the relevant infrastructure and power production. Even 20 years ago, though, India and China looked much brighter through this lens than they do on our Twitter usage map.

So what else is going on? The intensity and character of Twitter usage also depends on freedoms of information and speech—the ability and desire to access the platform and to speak openly on it—and this political layer keeps other areas in the dark in that Tweetping map. China, North Korea, Cuba, Ethiopia, Eritrea—if you’re trying to anticipate important political crises, these are all countries you would want to track closely, but Twitter is barely used or unavailable in all of them as a direct or indirect consequence of public policy. And, of course, there are also many places where Twitter is accessible and used but censorship distorts the content of the stream. For example, Saudi Arabia lights up pretty well on the Twitter-usage map, but it’s hard to imagine people speaking freely on it when a tweet can land you in prison.

Clearly, wealth and political constraints still strongly shape the view of the world we can get from new data sources like Twitter. Contrary to the heavily marketed myth of “comprehensive data,” poverty and repression continue to hide large swathes of the world from our digital sight, or to distort the glimpses we get of them.

Unfortunately for efforts to forecast rare political crises, those two structural features that so strongly shape the production and quality of data also correlate with the risks we want to anticipate. The map below shows the Early Warning Project‘s most recent statistical assessments of the risk of onsets of state-led mass-killing episodes. Now flash back to the visualization of Twitter usage above, and you’ll see that many of the countries colored most brightly on this map are among the darkest on that one. Even in 2015, the places about which we most need more information to sharpen our forecasts of rare political crises are the ones that are still hardest to see.

[Figure: map of Early Warning Project statistical risk assessments for state-led mass killing, 2014]

Statistically, this is the second-worst of all possible worlds, the worst one being the total absence of information. Data are missing not at random, and the processes producing those gaps are the same ones that put places at greater risk of mass atrocities and other political calamities. This association means that models we estimate with those data will often be misleading. There are ways to mitigate these problems, but they aren’t necessarily simple, cheap, or effective, and that’s before we even start in on the challenges of extracting useful measures from something as heterogeneous and complex as the Twitter stream.
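A quick simulation with made-up variables shows the problem: when observations go missing precisely where the outcome is worst, the relationship estimated from the surviving data shrinks toward zero.

    # Toy illustration of data missing not at random: the worst episodes are
    # the least likely to be observed, attenuating the fitted slope.
    set.seed(1)
    repression <- rnorm(1000)
    unrest <- repression + rnorm(1000)
    observed <- runif(1000) > pnorm(unrest - 1)  # worst cases least visible
    coef(lm(unrest ~ repression))                     # full data: slope near 1
    coef(lm(unrest ~ repression, subset = observed))  # surviving data: smaller slope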

So that’s what I see when I hear people suggest that social media or Google Trends or other forms of “digital exhaust” have mooted the data problems about which I so often complain. Lots of organizations are spending a lot of money trying to overcome these problems, but the political and economic topography producing them does not readily yield. The Internet is part of this complex adaptive system, not a space outside it, and its power to transform that system is neither as strong nor as fast-acting as many of us—especially in the richer and freer parts of the world—presume.

To Realize the QDDR’s Early-Warning Goal, Invest in Data-Making

The U.S. Department of State dropped its second Quadrennial Diplomacy and Development Review, or QDDR, last week (here). Modeled on the Defense Department’s Quadrennial Defense Review, the QDDR lays out the department’s big-picture concerns and objectives so that—in theory—they can guide planning and shape day-to-day decision-making.

The new QDDR establishes four main goals, one of which is to “strengthen our ability to prevent and respond to internal conflict, atrocities, and fragility.” To help do that, the State Department plans to “increase [its] use of early warning analysis to drive early action on fragility and conflict.” Specifically, State says it will:

  1. Improve our use of tools for analyzing, tracking, and forecasting fragility and conflict, leveraging improvements in analytical capabilities;
  2. Provide more timely and accurate assessments to chiefs of mission and senior decision-makers;
  3. Increase use of early warning data and conflict and fragility assessments in our strategic planning and programming;
  4. Ensure that significant early warning shifts trigger senior-level review of the mission’s strategy and, if necessary, adjustments; and
  5. Train and deploy conflict-specific diplomatic expertise to support countries at risk of conflict or atrocities, including conflict negotiation and mediation expertise for use at posts.

Unsurprisingly, that plan sounds great to me. We can’t now, and never will be able to, predict precisely where and when violent conflict and atrocities will occur, but we can assess risks with enough accuracy and lead time to enable better strategic planning and programming. These forecasts don’t have to be perfect to be earlier, clearer, and more reliable than the traditional practices of deferring to individual country or regional analysts or just reacting to the news.

Of course, quite a bit of well-designed conflict forecasting is already happening, much of it paid for by the U.S. government. To name a few of the relevant efforts: The Political Instability Task Force (PITF) and the Worldwide Integrated Crisis Early Warning System (W-ICEWS) routinely update forecasts of various forms of political crisis for U.S. government customers. IARPA’s Open Source Indicators (OSI) and Aggregative Contingent Estimation (ACE) programs are simultaneously producing forecasts now and discovering ways to make future forecasts even better. Meanwhile, outside the U.S. government, the European Union has recently developed its own Global Conflict Risk Index (GCRI), and the Early Warning Project now assesses risks of mass atrocities in countries worldwide.

That so much thoughtful risk assessment is being done now doesn’t mean it’s a bad idea to start new projects. If there are any iron laws of forecasting hard-to-predict processes like political violence, one of them is that combinations of forecasts from numerous sources should be more accurate than forecasts from a single model or person or framework. Some of the existing projects already do this kind of combining themselves, but combinations of combinations will often be even better.
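A toy demonstration of that iron law, with simulated forecasters: the simple average of several noisy probability forecasts earns a better (lower) Brier score than the typical individual forecast.

    # Combining forecasts: five noisy forecasters vs. their simple average
    set.seed(1)
    n <- 1000
    true.p <- runif(n)
    outcome <- rbinom(n, 1, true.p)
    forecasts <- sapply(1:5, function(i) plogis(qlogis(true.p) + rnorm(n)))
    brier <- function(f) mean((f - outcome)^2)
    mean(apply(forecasts, 2, brier))  # mean Brier score across individuals
    brier(rowMeans(forecasts))        # Brier score of the average: lower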

Still, if I had to channel the intention expressed in this part of the QDDR into a single activity, it would not be the construction of new models, at least not initially. Instead, it would be data-making. Social science is not Newtonian physics, but it’s not astrology, either. Smart people have been studying politics for a long time, and collectively they have developed a fair number of useful ideas about what causes or precedes violent conflict. But, if you can’t track the things those theorists tell you to track, then your forecasts are going to suffer. To improve significantly on the predictive models of political violence we have now, I think we need better inputs most of all.

When I say “better” inputs, I have a few things in mind. In some cases, we need to build data sets from scratch. When I was updating my coup forecasts earlier this year, a number of people wondered why I didn’t include measures of civil-military relations, which are obviously relevant to this particular risk. The answer was simple: because global data on that topic don’t exist. If we aren’t measuring it, we can’t use it in our forecasts, and the list of relevant features that falls into this set is surprisingly long.

In other cases, we need to revive dormant data sets. Social scientists often build “boutique” data sets for specific research projects, run the tests they want to run on them, and then move on to the next project. Sometimes, the tests they or others run suggest that some features captured in those data sets would make useful predictors. Those discoveries are great in principle, but if those data sets aren’t being updated, then applied forecasters can’t use that knowledge. To get better forecasts, we need to invest in picking up where those boutique data sets left off so we can incorporate their insights into our applications.

Finally and in almost all cases, we need to observe things more frequently. Most of the data available now to most conflict forecasters are only updated once each year, often on a several-month delay and sometimes as much as two years later (e.g., data describing 2014 become available in 2016). That schedule is fine for basic research, but it is crummy for applied forecasting. If we want to be able to give assessments and warnings that are as current as possible to those “chiefs of mission and senior decision-makers” mentioned in the QDDR, then we need to build models with data that are updated as frequently as possible. Daily or weekly updates are ideal, but monthly updates would suffice in many cases and would mark a huge improvement over the status quo.

As I said at the start, we’re never going to get models that reliably tell us far in advance exactly where and when violent conflicts and mass atrocities will erupt. I am confident, however, that we can assess these risks even more accurately than we do now, but only if we start making more, and better versions, of the data our theories tell us we need.

I’ll end with a final plea to any public servants who might be reading this: if you do invest in developing better inputs, please make the results freely available to the public. When you share your data, you give the crowd a chance to help you spot and fix your mistakes, to experiment with various techniques, and to think about what else you might consider, all at no additional cost to you. What’s not to like about that?

An Updated Look at Trends in Political Violence

The Center for Systemic Peace (CSP) has just posted an updated version of its Major Episodes of Political Violence data set, which now covers the period 1946–2014. That data set includes scalar measures of the magnitude of several forms of political violence between and within states. Per the codebook (PDF):

Magnitude scores reflect multiple factors including state capabilities, interactive intensity (means and goals), area and scope of death and destruction, population displacement, and episode duration. Scores are considered to be consistently assigned (i.e., comparable) across episode types and for all states directly involved.

For each country in each year, the magnitude scores range from 0 to 10. The chart below shows annual global sums of those scores for conflicts between and within states (i.e., the INTTOT and CIVTOT columns in the source data).

[Figure: annual global sums of MEPV magnitude scores for interstate and civil conflict, 1946–2014]
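The aggregation behind that chart is straightforward. Here is a sketch; INTTOT and CIVTOT are the source columns named above, while the file name and the YEAR column are my assumptions about a local copy of the data.

    # Annual global sums of interstate and civil-conflict magnitude scores
    mepv <- read.csv("mepv.csv")  # hypothetical local export of the CSP file
    global <- aggregate(cbind(INTTOT, CIVTOT) ~ YEAR, data = mepv, FUN = sum)
    plot(global$YEAR, global$CIVTOT, type = "l", col = "darkorange",
         xlab = "year", ylab = "annual global sum of magnitude scores")
    lines(global$YEAR, global$INTTOT, col = "navy")
    legend("topright", c("civil conflict", "interstate conflict"),
           col = c("darkorange", "navy"), lty = 1, bty = "n")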

Consistent with other measures, CSP’s data show an increase in violent political conflict in the past few years. At the same time, those data also indicate that, even at the end of 2014, the scale of conflict worldwide remained well below the peak levels observed in the latter decades of the Cold War and its immediate aftermath. That finding provides no comfort to the people directly affected by the fighting ongoing today. Still, it should (but probably won’t) throw another blanket over hyperbolic statements about the world being more unstable than ever before.

If we look at the trends by region, we see what most avid newsreaders would expect to see. The chart below uses the U.S. State Department’s regional designations. It confirms that the recent increase in conflict within states (the orange lines) has mostly come from Africa and the Middle East. Conflicts persist in the Americas and East and South Asia, but their magnitude has generally diminished in recent years. Europe and Eurasia supplies the least violent conflict of any region, but the war in Ukraine—designated a civil conflict by this source and assigned a magnitude score of 2—increased that supply in 2014.

[Figure: annual sums of MEPV magnitude scores by region and conflict type]

CSP saw almost no interstate conflict around the world in 2014. The global score of 1 accrues from U.S. operations in Afghanistan. When interstate conflict has occurred in the post–Cold War period, it has mostly come from Africa and the Middle East, too, but East Asia was also a major contributor as recently as the 1980s.

For a complete list of the episodes of political violence observed by CSP in this data set, go here. For CSP’s analysis of trends in these data, go here.

Walling Ourselves Off

In the past two weeks, more than a thousand people have died trying to cross the Mediterranean Sea from Africa to Europe on often-overloaded boats. In 2014, more than three thousand perished on this crossing.

Each individual migrant’s motives are unique and unknowable, but this collective surge in deaths clearly stems, in part, from the disorder engulfing parts of North Africa and the Middle East. Civil war and state collapse have expanded the incentives and opportunities to flee, and the increased flow of migrants along dangerous routes has, predictably, led to a surge in accidental deaths.

Of course, those deaths also owe something to the policies of the countries toward which the overloaded boats sail. European governments—many of them presiding over anemic growth and unemployment crises of their own—do not have open borders, and they have responded ambivalently or coolly to this spate of arrivals. Italy, where many of these boats land, had run a widely praised search-and-rescue program for a couple of years, but that effort was replaced in late 2014 by a smaller and so far less successful EU program. Most observers lament the drownings, but some also worry that a more effective rescue scheme will encourage more people to attempt the crossing, or to get into the sordid business of ferrying others.

Humans have always, and often literally, built walls to keep outsiders out. Leslie Chang’s Factory Girls examines China’s current wave of urban migration, but she also dug into her own family’s history in that country and found this:

In 1644, the Manchus, an ethnic group living on China’s northeastern frontier, conquered China and established the Qing Dynasty. Soon thereafter, the Qing rulers declared Manchuria off-limits to the Han Chinese, the majority ethnic group of the rest of the country. Their aim was to monopolize the region’s natural resources and to preserve their homeland: As long as the frontier remained intact, they believed, their people would retain their vitality and forestall the corruption and decadence by which dynasties inevitably fell. To seal off Manchuria, the emperors ordered the construction of a two-hundred-mile mud wall planted with willow trees. It stretched from the Great Wall northeast through most of present-day Liaoning and Jilin provinces, with fortified checkpoints along its length.

The border was called the Willow Palisade, and it was even more porous than the Great Wall. It was completed in 1681, and perhaps twenty years later my ancestor breached it to settle in Liutai, which means “sixth post”—one of the fortified towers that was built expressly to keep out people like him.

An article by Sarah Stillman in this week’s New Yorker describes how, over the past 15 years, the U.S. has adopted tougher measures to keep migrants from crossing illegally into the U.S. from Mexico in spite of the U.S. economy’s continued dependence on more immigrant labor than our government will legally allow to enter. These measures, which include the construction of hundreds of miles of fence, apparently have slowed the rate of illegal crossings. At the same time, they have encouraged the expansion of the human-smuggling business, catalyzed the growth of criminal rackets that extort the families of kidnapped migrants for ransom, and, as in the Mediterranean, contributed to a significant increase in the number of deaths occurring en route.

[Photo: On the US-Mexico border. Photo by Anthony Suau for TIME.]

This impulse is not specific to rich countries. In South Africa, at least seven people have been killed this month in violent attacks on immigrants and their businesses in parts of Durban and Johannesburg. Among the governments publicly condemning these attacks is Nigeria’s. In the early 1980s, the Nigerian government expelled millions of West African migrants from its territory, “blaming them for widespread unemployment and crime” after a slump in oil prices pushed Nigeria’s economy into a downward spiral.

This impulse runs deep. A study published in 1997 found that drivers at a shopping mall left their parking spaces more slowly when another car was waiting near that space than they did when no one was around, even though that delay was costly for both parties. The study’s authors attributed that finding to territorial behavior—”marking or defending a location in order to indicate a presumed right to a particular place.”

This behavior may be instinctual, but that doesn’t mean it’s just. Physical or legal, these walls implicitly assign different values to the lives of the people on either side of them. According to liberalism—and to many other moral philosophies—this gradation of human life is wrong. We should not confuse the accident of our birth on the richer or safer side of those walls with a moral right to exclusively enjoy that relative wealth or safety. The intended and unintended consequences of any policy change need to be weighed alongside the desired end state, but at a minimum, alternatives should be considered. The status quo is shameful.

Some economists also argue that the status quo is unnecessarily costly. In a 2011 paper in the Journal of Economic Perspectives, Michael Clemens estimated that barriers to emigration have a much larger damping effect on the global economy than barriers to capital and trade do.

How large are the economic losses caused by barriers to emigration? Research on this question has been distinguished by its rarity and obscurity, but the few estimates we have should make economists’ jaws hit their desks. When it comes to policies that restrict emigration, there appear to be trillion-dollar bills on the sidewalk.

I hope I live to see that claim tested.
