How Likely Is (Nuclear) War Between the United States and Russia?

Last week, Vox ran a long piece by Max Fisher claiming that “the prospect of a major war, even a nuclear war, in Europe has become thinkable, [experts] warn, even plausible.” Without ever clarifying what “thinkable” or “plausible” mean in this context, Fisher seems to be arguing that, while still unlikely, the probability of a nuclear war between the United States and Russia is no longer small and is rising.

I finished Fisher’s piece and wondered: Is that true? As someone who’s worked on a couple of projects (here and here) that use “wisdom of crowds” methods to make educated guesses about how likely various geopolitical events are, I know that one way to try to answer that question is to ask a bunch of informed people for their best estimates and then average them.

So, on Thursday morning, I went to SurveyMonkey and set up a two-question survey that asks respondents to assess the likelihood of war between the United States and Russia before 2020 and, if war were to happen, the likelihood that one or both sides would use nuclear weapons. To elicit responses, I tweeted the link once and posted it to the Conflict Research Group on Facebook and the IRstudies subreddit. The survey is still running [UPDATE: It’s now closed, because SurveyMonkey won’t show me more than the first 100 responses without a paid subscription], but 100 people have taken it so far, and here are the results—first, on the risk of war:

[Figure: distribution of survey responses on the risk of war]

And then on the risk that one or both sides would use nuclear weapons, conditional on the occurrence of war:

[Figure: distribution of survey responses on the risk of nuclear use, conditional on war]

These results come from a convenience sample, so we shouldn’t put too much stock in them. Still, my confidence in their reliability got a boost when I learned yesterday that a recent survey of international-relations experts around the world asked an almost-identical question about the risk of a war and obtained similar results. In its 2014 survey, the TRIP project asked: “How likely is war between the United States and Russia over the next decade? Please use the 0–10 scale with 10 indicating that war will definitely occur.” They got 2,040 valid responses to that question, and here’s how they were distributed:

[Figure: distribution of TRIP survey responses on the likelihood of US/Russia war over the next decade]

Those results are centered a little further to the right than the ones from my survey, but TRIP asked about a longer time period (“next decade” vs. “before 2020”), and those additional five years could explain the difference. It’s also important to note that the scales aren’t directly comparable; where the TRIP survey’s bins implicitly lie on a linear scale, mine were labeled to give respondents more options toward the extremes (e.g., “Certainly not” and “Almost certainly not”).

In light of that corroborating evidence, let’s assume for the moment that the responses to my survey are not junk. So then, how likely is a US/Russia war in the next several years, and how likely is it that such a war would go nuclear if it happened? To get to estimated probabilities of those events, I did two things:

  1. Assuming that the likelihoods implicit in my survey’s labels follow a logistic curve, I converted them to predicted probabilities as follows: p(war) = exp(response – 5)/(1 + exp(response – 5)). That rule produces the following sequence for the 0–10 bins: 0.007, 0.018, 0.047, 0.119, 0.269, 0.500, 0.731, 0.881, 0.953, 0.982, 0.993.

  2. I calculated the unweighted average of those predicted probabilities.

Here are the estimates that process produced, rounded up to the nearest whole percentage point:

  • Probability of war: 11%
  • Probability that one or both sides will use nuclear weapons, conditional on war: 18%

To translate those figures into a single number representing the crowd’s estimate of the probability of nuclear war between the US and Russia before 2020, we take their product: 2%.
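
For anyone who wants to replicate or tweak that procedure, here is a minimal sketch in Python. The conversion rule is the one described above; the response vectors are invented stand-ins for the actual survey data, which aren’t reproduced here:

```python
import math

def bin_to_prob(response):
    # The conversion rule from the post: p = exp(r - 5) / (1 + exp(r - 5)) for bins 0-10.
    return math.exp(response - 5) / (1 + math.exp(response - 5))

# Invented response vectors standing in for the 100 actual survey responses.
war_responses = [0, 1, 1, 1, 2, 2, 3, 3, 4, 6]
nuke_responses = [1, 2, 2, 3, 3, 4, 4, 5, 6, 8]

# Unweighted averages of the implied probabilities, then their product.
p_war = sum(bin_to_prob(r) for r in war_responses) / len(war_responses)
p_nuke_given_war = sum(bin_to_prob(r) for r in nuke_responses) / len(nuke_responses)

print(f"p(war) = {p_war:.2f}")
print(f"p(nuclear use | war) = {p_nuke_given_war:.2f}")
print(f"p(nuclear war) = {p_war * p_nuke_given_war:.2f}")
```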

Is that number different from what Max Fisher had in mind when he wrote that a nuclear war between the US and Russia is now “thinkable,” “plausible,” and “more likely than you think”? I don’t know. To me, “thinkable” and “plausible” seem about as specific as “possible,” a descriptor that applies to almost any geopolitical event you can imagine. I think Max’s chief concern in writing that piece was to draw attention to a risk that he believes to be dangerously under-appreciated, but it would be nice if he had asked his sources to be more specific about just how likely they think this calamity is.

More important, is that estimate “true”? As Ralph Atkins argued in a recent Financial Times piece about estimating the odds of Grexit, it’s impossible to say. For unprecedented and at least partially unique events like these—an exit from the euro zone, or a nuclear war between major powers—we can never know the event-generating process well enough to estimate their probabilities with high confidence. What we get instead are summaries of people’s current beliefs about those events’ likelihood. That’s highly imperfect, but it’s still informative in its own way.

2015 Tour de France Predictions

I like to ride bikes, I like to watch the pros race their bikes, and I make forecasts for a living, so I thought it would be fun to try to predict the outcome of this year’s Tour de France, which starts this Saturday and ends on July 26. I’m also interested in continuing to explore the predictive power of pairwise wiki surveys, a crowdsourcing tool that I’ve previously used to try to forecast mass-killing onsets, coup attempts, and pro football games, and that ESPN recently used to rank NBA draft prospects.

So, a couple of weeks ago, I used All Our Ideas to create a survey that asks, “Which rider is more likely to win the 2015 Tour de France?” I seeded the survey with the names of 11 riders—the 10 seen by bookmakers at Paddy Power as the most likely winners, plus Peter Sagan, because he’s fun to watch. I posted a link to the survey on Tumblr and trolled for respondents on Twitter and Facebook. The survey got off to a slow start, but then someone posted a link to it in the r/cycling subreddit, and the votes came pouring in. As of this afternoon, the survey had garnered more than 4,000 votes in 181 unique user sessions that came from five continents (see the map below). The crowd also added a handful of other riders to the set under consideration, bringing the list up to 16.

[Map: locations of the wiki survey’s user sessions]

So how does that self-selected crowd handicap the race? The dot plot below shows the riders in descending order by their survey scores, which range from 0 to 100 and indicate the probability that that rider would beat a randomly chosen other rider for a randomly chosen respondent. In contrast to Paddy Power, which currently shows Chris Froome as the clear favorite and gives Nairo Quintana a slight edge over Alberto Contador, this survey sees Contador as the most likely winner (survey score of 90), followed closely by Froome (87) and, at a bit more distance, by Quintana (80). Both sources put Vincenzo Nibali as the fourth likeliest (73), with Tejay van Garderen (65) and Thibaut Pinot (51) in the next two spots, although Paddy Power has those two in the opposite order. Below that, the distances between riders’ chances get smaller, but the wiki survey’s results still approximate the handicapping of the real-money markets pretty well.

[Figure: dot plot of riders’ wiki survey scores]

There are at least a couple of ways to try to squeeze some meaning out of those scores. One is to read the chart as a predicted finishing order for the 16 riders listed. That’s useful for something like a bike race, where we—well, some of us, anyway—care not only about who wins, but also about where the other riders will finish.

We can also try to convert those scores to predicted probabilities of winning. The chart below shows what happens when we do that by dividing each rider’s score by the sum of all scores and then multiplying the result by 100. The probabilities this produces are all pretty low and more tightly bunched than seems reasonable, but I’m not sure how else to do this conversion. I tried squaring and cubing the scores; the results came closer to what the betting-market odds suggest are the “right” values, but I couldn’t think of a principled reason to do that, so I’m not showing those here. If you know a better way to get from those model scores to well-calibrated win probabilities, please let me know in the comments.

[Figure: predicted win probabilities derived from the wiki survey scores]
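
For what it’s worth, here is that conversion in Python, using only the six scores quoted above; because the real denominator is the sum of all 16 riders’ scores, these illustrative probabilities run higher than the ones in the chart:

```python
# Survey scores quoted in the text; the full survey covered 16 riders, so this
# six-rider denominator inflates the resulting "probabilities."
scores = {"Contador": 90, "Froome": 87, "Quintana": 80,
          "Nibali": 73, "van Garderen": 65, "Pinot": 51}

total = sum(scores.values())
win_probs = {rider: 100 * score / total for rider, score in scores.items()}

for rider, prob in sorted(win_probs.items(), key=lambda kv: -kv[1]):
    print(f"{rider}: {prob:.1f}%")
```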

So that’s what the survey says. After the Tour concludes in a few weeks, I’ll report back on how the survey’s predictions fared. Meanwhile, here’s wishing the athletes a crash-, injury-, and drug-free tour. Judging by the other big races I’ve seen so far this year, it should be a great one to watch.

The Birth of Crowdsourcing?

From p. 106 of the first paperback edition of The Professor and the Madman, a slightly overwrought but enjoyable history of the origins of the Oxford English Dictionary, found on the shelf of a vacation rental:

The new venture that [Richard Chenevix] Trench seemed now to be proposing would demonstrate not merely the meaning but the history of meaning, the life story of each word. And that would mean the reading of everything and the quoting of everything that showed anything of the history of the words that were to be cited. The task would be gigantic, monumental, and—according to the conventional thinking of the times—impossible.

Except that here Trench presented an idea, an idea that—to those ranks of conservative and frock-coated men who sat silently in the [London Library] on that dank and foggy evening [in 1857]—was potentially dangerous and revolutionary. But it was the idea that in the end made the whole venture possible.

The undertaking of the scheme, he said, was beyond the ability of any one man. To peruse all of English literature—and to comb the London and New York newspapers and the most literate of the magazines and journals—must be instead “the combined action of many.” It would be necessary to recruit a team—moreover, a huge one—probably comprising hundreds and hundreds of unpaid amateurs, all of them working as volunteers.

The audience murmured with surprise. Such an idea, obvious though it may sound today, had never been put forward before. But then, some members said as the meeting was breaking up, it did have some real merit.

And here’s what that crowdsourcing process ended up looking like in practice:

[Frederick] Furnivall then issued a circular calling for volunteer readers. They could select from which period of history they would like to read books—from 1250 to 1526, the year of the New English Testament; from then to 1674, the year when Milton died; or from 1674 to what was then the present day. Each period, it was felt, represented the existence of different trends in the development of the language.

The volunteers’ duties were simple enough, if onerous. They would write to the society offering their services in reading certain books; they would be asked to read and make word-lists of all that they read, and would then be asked to look, super-specifically, for certain words that currently interested the dictionary team. Each volunteer would take a slip of paper, write at its top left-hand side the target word, and below, also on the left, the date of the details that followed: These were, in order, the title of the book or paper, its volume and page number, and then, below that, the full sentence that illustrated the use of the target word. It was a technique that has been undertaken by lexicographers to the present day.

Herbert Coleridge became the first editor of what was to be called A New English Dictionary on Historical Principles. He undertook as his first task what may seem prosaic in the extreme: the design of a small stack of oak-board pigeonholes, nine holes wide and six high, which could accommodate the anticipated sixty to one hundred thousand slips of paper that would come in from the volunteers. He estimated that the first volume of the dictionary would be available to the world within two years. “And were it not for the dilatoriness of many contributors,” he wrote, clearly in a tetchy mood, “I should not hesitate to name an earlier period.”

Everything about these forecasts was magnificently wrong. In the end more than six million slips of paper came in from the volunteers; and Coleridge’s dreamy estimate that it might take two years to have the first salable section of the dictionary off the presses—for it was to be sold in parts, to help keep revenues coming in—was wrong by a factor of ten. It was this kind of woefully naive underestimate—of work, of time, of money—that at first so hindered the dictionary’s advance. No one had a clue what they were up against: They were marching blindfolded through molasses.

So, even with all those innovations, this undertaking also produced a textbook example of the planning fallacy. I wonder how quickly and cheaply the task could have been completed with Mechanical Turk, or with some brush-clearing assistance from text mining?

A Skeptical Note on Policy-Prescriptive Political Science

My sometimes-colleague Michael Horowitz wrote a great piece for War on the Rocks last week on what “policy relevance” means for political scientists who study international affairs, and the different forms that relevance can take. Among the dimensions of policy relevance he calls out is the idea of “policy actionability”:

Policy actionability refers to a recommendation that is possible to implement for the target of the recommendation. Most academic work is not policy actionable, fundamentally. For example, implications from international relations research are things such as whether countries with high male-to-female ratios are more likely to start military conflicts or that countries that acquire nuclear weapons become harder to coerce.

As Michael notes, most scholarship isn’t “actionable” in this way, and isn’t meant to be. In my experience, though, there is plenty of demand in Washington and elsewhere for policy-actionable research on international affairs, and there is a subset of scholars who, in pursuit of relevance, do try to extract policy prescriptions from their studies.

As an empiricist, I welcome both of those things—in principle. Unfortunately, the recommendations that scholars offer rarely follow directly from their research. Instead, they almost always require some additional, often-heroic assumptions, and those additional assumptions render the whole endeavor deeply problematic. For example, Michael observes that most statistical studies identify average effects—other things being equal, a unit change in x is associated with some amount of change in y—and points out that the effects in any particular case will still be highly uncertain.

That’s true for a lot of what we study, but it’s only the half of it. Even more significant, I think, are the following three assumptions, which implicitly underpin the “policy implications” sections in a lot of the work on international affairs that tries to convert comparative analysis (statistical or not) into policy recommendations:

  • Attempts to induce a change in x in the prescribed direction will actually produce the desired change in x;
  • Attempts to induce a change in x in the prescribed direction will not produce significant and negative unintended consequences; and
  • If it does occur, a change in x induced by the policy actor to whom the scholar is making recommendations will have the same effect on y as previous changes in x that occurred for various other reasons.

The last assumption isn’t so problematic when the study in question looked specifically at policy actions by that same policy actor, but that’s almost never the case in international relations and other fields using observational data to study macro-political behavior. Instead, we’re more likely to have a study that looked at something like GDP growth rates, female literacy, or the density of “civil society” organizations that the policy audience does not control and does not know how to control. Under these circumstances, all three of those assumptions must hold for the research to be neatly “actionable,” and I bet most social scientists will tell you that at least one and probably two or three of them usually don’t.

With so much uncertainty and so much at stake, I wind up thinking that, unless their research designs have carefully addressed these assumptions, scholars—in their roles as scientists, not as citizens or advocates—should avoid that last mile and leave it to the elected officials and bureaucrats hired for that purpose. That’s hard to do when we care about the policies involved and get asked to offer “expert” advice, but “I don’t know” or “That’s not my area of expertise” will almost always be a more honest answer in these situations.

 

One Measure By Which Things Have Recently Gotten Worse

The United Nations’ refugee agency today released its annual report on people displaced by war around the world, and the news is bad:

The number of people forcibly displaced at the end of 2014 had risen to a staggering 59.5 million compared to 51.2 million a year earlier and 37.5 million a decade ago.

The increase represents the biggest leap ever seen in a single year. Moreover, the report said the situation was likely to worsen still further.

The report focuses on raw estimates of displaced persons, but I think it makes more sense to look at this group as a share of world population. The number of people on the planet has increased by more than half a billion in the past decade, so we might expect to see some growth in the number of forcibly displaced persons even if the amount of conflict worldwide had held steady. The chart below plots annual totals from the UNHCR report as a share of mid-year world population, as estimated by the U.S. Census Bureau (here).

[Figure: forcibly displaced persons as a share of world population]
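
The calculation behind that chart is simple. Here is an illustrative version in Python that uses the three UNHCR totals quoted above and round-number stand-ins for the Census Bureau’s mid-year population estimates; swap in the real series before trusting the output:

```python
# UNHCR totals quoted above (millions of forcibly displaced persons).
displaced_millions = {2004: 37.5, 2013: 51.2, 2014: 59.5}
# Rough stand-ins for the Census Bureau's mid-year world-population estimates (billions);
# replace these with the actual series for a real calculation.
world_pop_billions = {2004: 6.4, 2013: 7.1, 2014: 7.2}

for year, displaced in sorted(displaced_millions.items()):
    share = (displaced * 1e6) / (world_pop_billions[year] * 1e9)
    print(f"{year}: {100 * share:.2f}% of world population forcibly displaced")
```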

The number of observations in this time series is too small to use Bayesian change point detection to estimate the likelihood that the upturn after 2012 marks a change in the underlying data-generating process. I’m not sure we need that kind of firepower, though. After holding more or less steady for at least six years, the share of world population forcibly displaced by war has increased by more than 50 percent in just two years, from about one in every 200 people to one in every 133. Equally important, reports from field workers indicate that this problem only continues to grow in 2015. I don’t think I would call this upturn a “paradigm change,” as UN High Commissioner for Refugees António Guterres did, but there is little doubt that the problem of displacement by war has worsened significantly since 2012.

In historical terms, just how bad is it? Unfortunately, it’s impossible to say for sure. The time series in the UNHCR report only starts in 2004, and a note warns that methodological changes in 2007 render the data before that year incomparable to the more recent estimates. The UNHCR describes the 2014 figure as “the highest level ever recorded,” and that’s technically true but not very informative when recording started only recently. A longer time series assembled by the Center for Systemic Peace (here) supports the claim that the latest raw estimate is the largest ever, but as a share of world population, it’s probably still a bit lower than the levels seen in the post–Cold War tumult of the early 1990s (see here).

Other relevant data affirm the view that, while clearly worsening, the intensity of armed conflict around the world is not at historically high levels, not even for the past few decades. Here is a plot of annual counts of battle-related deaths (low, high, and best estimates) according to the latest edition of UCDP’s data set on that topic (here), which covers the period 1989–2013. Note that these figures have not been adjusted for changes in world population.

[Figure: Annual estimates of battle-related deaths worldwide, 1989-2013 (data source: UCDP)]

We see a similar pattern in the Center for Systemic Peace’s Major Episodes of Political Violence data set (second row here), which covers the whole post-WWII period. For the chart below, I have separately summed the data set’s scalar measure of conflict intensity for two types of conflict, civil and interstate (see the codebook for details). Like the UCDP data, these figures show a local increase in the past few years that nevertheless remains well below the prior peak, which came when the Soviet Union fell apart.

[Figure: Annual intensity of political violence worldwide, 1946-2014 (data source: CSP)]

And, for longer-term perspective, it always helps to take another look at this one, from an earlier UCDP report:

[Figure: PRIO battle death trends]

I’ll wrap this up by pinning a note to something I see when comparing the shorter-term UCDP estimates to the UNHCR estimates on forcibly displaced persons: adjusting for population, it looks like armed conflicts may be killing fewer people but displacing more of them than they used to. That impression is bolstered by a glance at UCDP data on trends in deaths from “intentional attacks on civilians by governments and formally organized armed groups,” which UCDP calls “one-sided violence” (here). As the plot below shows, the recent upsurge in warfare has not yet produced a large increase in the incidence of these killings, either. The line is bending upward, but it remains close to historical lows.

[Figure: Estimated annual deaths from one-sided violence, 1989-2013 (Source: UCDP)]

So, in the tumult of the past few years, it looks like the rate of population displacement has surged while the rate of battle deaths has risen more slowly and the rate of one-sided violence targeting civilians hasn’t risen much at all. If that’s true, then why? Improvements in medical care in conflict zones are probably part of the story, but I wonder if changes in norms and values, and in the international institutions and practices instantiating them, aren’t also shaping these trends. Governments that in the past might have wantonly killed populations they regarded as threats now seem more inclined to press those populations by other means—not always, but more often. Meanwhile, international organizations are readier than ever to assist those groups under pressure by feeding and sheltering them, drawing attention to their miseries, and sometimes even protecting them. The trend may be fragile, and the causality is impossible to untangle with confidence, but it deserves contemplation.

From China, Another Strike Against Legitimacy

I’ve groused on this blog before (here and here) about the trouble with “legitimacy” as a causal mechanism in theories of political stability and change, and I’ve pointed to Xavier Marquez’s now-published paper as the most cogent expression of this contrarian view to date.

Well, here is a fresh piece of empirical evidence against the utility of this concept: according to a new Global Working Paper from Brookings, the citizens of China who have benefited the most from that country’s remarkable economic growth in recent decades are, on average, its least happy. As one of the paper’s authors describes in a blog post about their research,

We find that the standard determinants of well-being are the same for China as they are for most countries around the world. At the same time, China stands out in that unhappiness and reported mental health problems are highest among the cohorts who either have or are positioned to benefit from the transition and related growth—a clear progress paradox. These are urban residents, the more educated, those who work in the private sector, and those who report to have insufficient leisure time and rest.

These survey results contradict the “performance legitimacy” story that many observers use to explain how the Chinese Communist Party has managed to avoid significant revolutionary threats since 1989 (see here, for example). In that story, Chinese citizens choose not to demand political liberalization because they are satisfied with the government’s economic performance. In effect, they accept material gains in lieu of political voice.

Now, though, we learn that the cohorts in which contentious collective action is most likely to emerge—educated urbanites—are also, on average, the country’s least happy people. The authors also report (p. 14) that, in China, “the effect of income increases on life satisfaction are limited.” A legitimacy-based theory predicts that the CCP is surviving because it is making and keeping its citizens happy; instead, we see that it is surviving in spite of deepening unhappiness among key cohorts.

To me, this case further bares the specious logic behind most legitimacy-based explanations for political continuity. We believe that rebellion is an expression of popular dissatisfaction, a kind of referendum in the streets; we observe stability; so, we reason backwards from the absence of rebellion to the absence of dissatisfaction, sprinkle a little normative dust on it, and arrive at a positive concept called legitimacy. Formally, this is a fallacy of affirmative conclusion from a negative premise: happy citizens don’t rebel, no rebellion is occurring, therefore citizens must be happy. Informally, I think it’s a qualitative version of the “story time” process in which statistical modelers often indulge: get a surprising result, then make up a richer explanation for it that feels right.

I don’t mean to suggest that popular attitudes are irrelevant to political stasis and change, or that the durability of specific political regimes has nothing to do with the affinity between their institutional forms and the cultural contexts in which they’re operating. Like Xavier, though, I do believe that the conventional concept of legitimacy is too big and fuzzy to have any real explanatory power, and I think this new evidence from China reminds us of that point. If we want to understand how political regimes persist and when they break down, we need to identify mechanisms that are more specific than this one, and to embed them in theories that allow for more complexity.

A Plea for More Prediction

The second Annual Bank Conference on Africa happened in Berkeley, CA, earlier this week, and the World Bank’s Development Impact blog has an outstanding summary of the 50-odd papers presented there. If you have to pick between reading this post and that one, go there.

One paper on that roster that caught my eye revisits the choice of statistical models for the study of civil wars. As authors John Paul Dunne and Nan Tian describe, the default choice is logistic regression, although probit gets a little playing time, too. They argue, however, that a zero-inflated Poisson (ZIP) model matches the data-generating process better than either of these traditional picks, and they show that this choice affects what we learn about the causes of civil conflict.

Having worked on statistical models of civil conflict for nearly 20 years, I have some opinions on that model-choice issue, but those aren’t what I want to discuss right now. Instead, I want to wonder aloud why more researchers don’t use prediction as the yardstick—or at least one of the yardsticks—for adjudicating these model comparisons.

In their paper, Dunne and Tian stake their claim about the superiority of ZIP to logit and probit on comparisons of Akaike information criteria (AIC) and Vuong tests. Okay, but if their goal is to see if ZIP fits the underlying data-generating process better than those other choices, what better way to find out than by comparing out-of-sample predictive power?

Prediction is fundamental to the accumulation of scientific knowledge. The better we understand why and how something happens, the more accurate our predictions of it should be. When we estimate models from observational data and only look at how well our models fit the data from which they were estimated, we learn some things about the structure of that data set, but we don’t learn how well those things generalize to other relevant data sets. If we believe that the world isn’t deterministic—that the observed data are just one of many possible realizations of the world—then we need to care about that ability to generalize, because that generalization and the discovery of its current limits is the heart of the scientific enterprise.

From a scientific standpoint, the ideal world would be one in which we could estimate models representing rival theories, then compare the accuracy of the predictions they generate across a large number of relevant “trials” as they unfold in real time. That’s difficult for scholars studying big but rare events like civil wars and wars between states, though; a lot of time has to pass before we’ll see enough new examples to make a statistically powerful comparison across models.

But, hey, there’s an app for that—cross-validation! Instead of using all the data in the initial estimation, hold some out to use as a test set for the models we get from the rest. Better yet, split the data into several equally-sized folds and then iterate the training and testing across all possible groupings of them (k-fold cross-validation). Even better, repeat that process a bunch of times and compare distributions of the resulting statistics.
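
As a sketch of that routine, here is a repeated k-fold comparison using scikit-learn, with synthetic rare-event data and two off-the-shelf classifiers standing in for the ZIP, logit, and probit models at issue (ZIP isn’t implemented in scikit-learn, so this illustrates the procedure, not the paper’s models):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic rare-event data standing in for a country-year civil-war dataset.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.97],
                           random_state=0)

# Five folds, repeated twenty times, so we get a distribution of out-of-sample scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
for name, model in [("logit", LogisticRegression(max_iter=1000)),
                    ("naive Bayes", GaussianNB())]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_log_loss")
    print(f"{name}: mean out-of-sample log loss = {-scores.mean():.3f} "
          f"(sd {scores.std():.3f})")
```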

Prediction is the gold standard in most scientific fields, and cross-validation is standard practice in many areas of applied forecasting, because they are more informative than in-sample tests. For some reason, political science still mostly eschews both.* Here’s hoping that changes soon.

* For some recent exceptions to this rule on topics in world politics, see Ward, Greenhill, and Bakke and Blair, Blattman, and Hartman on predicting civil conflict; Chadefaux on warning signs of interstate war; Hill and Jones on state repression; and Chenoweth and me on the onset of nonviolent campaigns.

Another Tottering Step Toward a New Era of Data-Making

Ken Benoit, Drew Conway, Benjamin Lauderdale, Michael Laver, and Slava Mikhaylov have an article forthcoming in the American Political Science Review that knocked my socks off when I read it this morning. Here is the abstract from the ungated version I saw:

Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of non-experts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

The data-making strategy they develop is really innovative, and the cost of implementing it is, I estimate from the relevant tidbits in the paper, 2–3 orders of magnitude lower than the cost of the traditional expert-centric approach. In other words, this is potentially a BIG DEAL for social-science data-making, which, as Sinan Aral reminds us, is a BIG DEAL for doing better social science.

That said, I do wonder how much structure is baked into the manifesto-coding task that isn’t there in most data-making problems, and that makes it especially well suited to the process the authors develop. In the exercise the paper describes:

  1. The relevant corpus (party manifestos) is self-evident, finite, and not too large;
  2. The concepts of interest (economic vs. social policy, left vs. right) are fairly intuitive; and
  3. The inferential task is naturally “fractal”; that is, the concepts of interest inhere in individual sentences (and maybe even words) as well as whole documents.

None of those attributes holds when it comes to coding latent socio-political structural features like de facto forms of government (a.k.a. regime type) or whether or not a country is in a state of civil war. These features are fundamental to analyses of international politics, but the high cost of producing them means that we sometimes don’t get them at all, and when we do, we usually don’t get them updated as quickly or as often as we would need to do more dynamic analysis and prediction. Maybe it’s my lack of imagination, but I can’t quite see how to extend the authors’ approach to those topics without stretching it past the breaking point. I can think of ways to keep the corpus manageable, but the concepts are not as intuitive, and the inferential task is not fractal. Ditto for coding event data, where I suspect that 2 from the list above would mostly hold; 3 would sometimes hold; but 1 absolutely would not.*

In short, I’m ga-ga about this paper and the directions in which it points us, but I’m not ready yet to declare imminent victory in the struggle to drag political science into a new and much healthier era of data-making. (Fool me once…)

* If you think I’m overlooking something here, please leave a comment explaining how you think it might be do-able.

Visualizing Strike Activity in China

In my last post, I suggested that the likelihood of social unrest in China is probably higher than a glance at national economic statistics would suggest, because those statistics conceal the fact that economic malaise is hitting some areas much harder than others and local pockets of unrest can have national effects (ask Mikhail Gorbachev about that one). Near the end of the post, I effectively repeated this mistake by showing a chart that summarized strike activity over the past few years…at the national level.

So, what does the picture look like if we disaggregate that national summary?

The best current data on strike activity in China come from China Labour Bulletin (CLB), a Hong Kong–based NGO that collects incident reports from various Chinese-language sources, compiles them in a public data set, and visualizes them in an online map. Those data include a few fields that allow us to disaggregate our analysis, including the province in which an incident occurred (Location), the industry involved (Industry), and the claims strikers made (Demands). On May 28, I downloaded a spreadsheet with data for all available dates (January 2011 to the present) for all types of incidents and wrote an R script that uses small multiples to compare strike activity across groups within each of those categories.
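
The analysis itself was done with an R script. A rough Python sketch of the same small-multiples step might look like the following, where the file name and the “Date” column are assumptions about the spreadsheet’s layout and “Location” is the province field mentioned above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# "clb_strikes.xlsx" and the "Date" column name are assumptions about the
# downloaded spreadsheet; "Location" is the province field described in the post.
strikes = pd.read_excel("clb_strikes.xlsx", parse_dates=["Date"])

# Monthly incident counts, one column per province.
monthly = (strikes
           .groupby([pd.Grouper(key="Date", freq="M"), "Location"])
           .size()
           .unstack(fill_value=0))

# One small panel per province, sharing axes so levels are comparable.
monthly.plot(subplots=True, sharex=True, sharey=True,
             figsize=(8, 1.2 * monthly.shape[1]))
plt.tight_layout()
plt.savefig("strikes_by_province.png")
```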

First, here’s the picture by province. This chart shows that Guangdong has been China’s most strike-prone province over the past several years, but several other provinces have seen large increases in labor unrest in the past two years, including Henan, Hebei, Hubei, Shandong, Sichuan, and Jiangsu. Right now, I don’t have monthly or quarterly province-level data on population size and economic growth to model the relationship among these things, but a quick eyeballing of the chart from the FT in my last post indicates that these more strike-prone provinces skew toward the lower end of the range of recent GDP growth rates, as we would expect.

[Figure: strike activity by province (small multiples)]

Now here’s the picture by industry. This chart makes clear that almost all of the surge in strike activity in the past year has come from two sectors: manufacturing and construction. Strikes in the manufacturing sector have been trending upward for a while, but the construction sector really got hit by a wave in just the past year that crested around the time of the Lunar New Year in early 2015. Other sectors also show signs of increased activity in recent months, though, including services, mining, and education, and the transportation sector routinely contributes a non-negligible slice of the national total.

[Figure: strike activity by industry (small multiples)]

And, finally, we can compare trends over time in strikers’ demands. This analysis took a little more work, because the CLB data on Demands do not follow best coding practices in which a set of categories is established a priori and each demand is assigned to one of those categories. In the CLB data, the Demands field is a set of comma-delimited phrases that are mostly but not entirely standardized (e.g., “wage arrears” and “social security” but also “reduction of their operating territory” and “gas-filing problem and too many un-licensed cars”). So, to aggregate the data on this dimension, I created a few categories of my own and used searches for regular expressions to find records that belonged in them. For example, all events for which the Demands field included “wage arrear”, “pay”, “compensation”, “bonus” or “ot” got lumped together in a Pay category, while events involving claims marked as “social security” or “pension” got combined in a Social Security category (see the R script for details).
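
Here is a stripped-down sketch of that bucketing step in Python. The Pay and Social Security patterns follow the search terms named above; the Layoffs pattern is an illustrative guess at how a third category might be defined:

```python
import re

# Pay and Social Security follow the search terms mentioned in the text;
# Layoffs is an illustrative guess at a third category.
categories = {
    "Pay": r"wage arrear|pay|compensation|bonus|\bot\b",
    "Social Security": r"social security|pension",
    "Layoffs": r"layoff|redundanc",
}

def classify(demands):
    """Return every category whose pattern matches the comma-delimited Demands string."""
    text = str(demands).lower()
    return [cat for cat, pattern in categories.items() if re.search(pattern, text)]

print(classify("wage arrears, social security"))  # ['Pay', 'Social Security']
```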

The results appear below. As CLB has reported, almost all of the strike activity in China is over pay, usually wage arrears. There’s been an uptick in strikes over layoffs in early 2015, but getting paid better, sooner, or at all for work performed is by far the chief concern of strikers in China, according to these data.

[Figure: strike activity by strikers’ demands (small multiples)]

In closing, a couple of caveats.

First, we know these data are incomplete, and we know that we don’t know exactly how they are incomplete, because there is no “true” record to which they can be compared. It’s possible that the apparent increase in strike activity in the past year or two is really the result of more frequent reporting or more aggressive data collection on a constant or declining stock of events.

I doubt that’s what’s happening here, though, for two reasons. One, other sources have reported the Chinese government has actually gotten more aggressive about censoring reports of social unrest in the past two years, so if anything we should expect the selection bias from that process to bend the trend in the opposite direction. Two, theory derived from historical observation suggests that strike activity should increase as the economy slows and the labor market tightens, and the observed data are consistent with those expectations. So, while the CLB data are surely incomplete, we have reason to believe that the trends they show are real.

Second, the problem I originally identified at the national level also applies at these levels. China’s provinces are larger than many countries in the world, and industry segments like construction and manufacturing contain a tremendous variety of activities. To really escape the ecological fallacy, we would need to drill down much further to the level of specific towns, factories, or even individuals. As academics would say, though, that task lies beyond the scope of the current blog post.

In China, Don’t Mistake the Trees for the Forest

Anyone who pays much attention to news of the world knows that China’s economy is cooling a bit. Official statistics—which probably aren’t true but may still be useful—show annual growth slowing from over 7.5 to around 7 percent or lower and staying there for a while.

For economists, the big question seems to be whether or not policy-makers can control the descent and avoid a hard landing or crash. Meanwhile, political scientists and sociologists wonder whether or not that economic slowdown will spur social unrest that could produce a national political crisis or reform. Most of what I remember reading on the topic has suggested that the risk of large-scale social unrest will remain low as long as China avoids the worst-case economic scenarios. GDP growth in the 6–7 percent range would be a letdown, but it’s still pretty solid compared to most places and is hardly a crisis.

I don’t know enough about economics to wade into that field’s debate, but I do wonder if an ecological fallacy might be leading many political scientists to underestimate the likelihood of significant social unrest in China in response to this economic slowdown. We commit an ecological fallacy when we assume that the characteristics of individuals in a group match the central tendencies of that group—for example, assuming that a kid you meet from a wealthy, high-performing high school is rich and will score well on the SAT. Put another way, an ecological fallacy involves mistakenly assuming that each tree shares the characteristic features of the forest it belongs to.

Now consider the chart below, from a recent article in the Financial Times about the uneven distribution of economic malaise across China’s provinces. As the story notes, “The slowdown has affected some areas far worse than others. Perhaps predictably, the worst-hit places are those that can least afford it.”

The chart reminds us that China is a large and heterogeneous country—and, as it happens, social unrest isn’t a national referendum. You don’t need a majority vote from a whole country to get popular protest that can threaten to reorder national politics; you just need to reach a critical point, and that point can often be reached with a very small fraction of the total population. So, instead of looking at national tendencies to infer national risk, we should look at the tails of the relevant distributions to see if they’re getting thicker or longer. The people and places at the wrong ends of those distributions represent pockets of potential unrest; other things being equal, the more of them there are, the greater the cumulative probability of relevant action.

So how do things look in that thickening tail? Here again is that recent story in the FT:

Last month more than 30 provincial taxi drivers drank poison and collapsed together on the busiest shopping street in Beijing in a dramatic protest against economic and working conditions in their home town.

The drivers, who the police say all survived, were from Suifenhe, a city on the Russian border in the northeastern province of Heilongjiang…

Heilongjiang is among the poorest performers. While national nominal growth slipped to 5.8 per cent in the first quarter compared with a year earlier — its lowest level since the global financial crisis — the province’s nominal GDP actually contracted, by 3.2 per cent.

In the provincial capital of Harbin, signs of economic malaise are everywhere.

The relatively small, ritual protest described at the start of that block quote wouldn’t seem to pose much threat to Communist Party rule, but then neither did Mohamed Bouazizi’s self-immolation in Tunisia in December 2010.

Meanwhile, as the chart below shows, data collected by China Labour Bulletin show that the incidence of strikes and other forms of labor unrest has increased in China in the past year. Each such incident is arguably another roll of the dice that could blow up into a larger and longer episode. Any one event is extremely unlikely to catalyze a larger campaign that might reshape national politics in a significant way, but the more trials run, the higher the cumulative probability.

[Figure: Monthly counts of labor incidents in China, January 2012-May 2015 (data source: China Labor Bulletin)]
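
To put some purely illustrative numbers on that “more trials” logic: suppose each incident independently carried a one-in-a-thousand chance of escalating into a larger episode (an invented figure, not an estimate from the CLB data). The chance that at least one of n incidents escalates then rises quickly with n:

```python
# Purely illustrative arithmetic: p is an invented per-incident chance of escalation.
p = 0.001
for n in (10, 100, 500, 1000):
    print(f"{n} incidents: {1 - (1 - p) ** n:.1%} chance that at least one escalates")
```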

The point of this post is to remind myself and anyone bothering to read it that statistics describing the national economy in the aggregate aren’t a reliable guide to the likelihood of those individual events, and thus of a larger and more disruptive episode, because they conceal important variation in the distribution they summarize. I suspect that most China experts already think in these terms, but I think most generalists (like me) do not. I also suspect that this sub-national variation is one reason why statistical models using country-year data generally find weak association between things like economic growth and inflation on the one hand and demonstrations and strikes on the other. Maybe with better data in the future, we’ll find stronger affirmation of the belief many of us hold that economic distress has a strong effect on the likelihood of social unrest, because we won’t be forced into an ecological fallacy by the limits of available information.

Oh, and by the way: the same goes for Russia.
