2015 Tour de France Predictions

I like to ride bikes, I like to watch the pros race their bikes, and I make forecasts for a living, so I thought it would be fun to try to predict the outcome of this year’s Tour de France, which starts this Saturday and ends on July 26. I’m also interested in continuing to explore the predictive power of pairwise wiki surveys, a crowdsourcing tool that I’ve previously used to try to forecast mass-killing onsets, coup attempts, and pro football games, and that ESPN recently used to rank NBA draft prospects.

So, a couple of weeks ago, I used All Our Ideas to create a survey that asks, “Which rider is more likely to win the 2015 Tour de France?” I seeded the survey with the names of 11 riders—the 10 seen by bookmakers at Paddy Power as the most likely winners, plus Peter Sagan because he’s fun to watch. I posted a link to the survey on Tumblr and trolled for respondents on Twitter and Facebook. The survey got off to a slow start, but then someone posted a link to it in the r/cycling subreddit, and the votes came pouring in. As of this afternoon, the survey had garnered more than 4,000 votes in 181 unique user sessions that came from five continents (see the map below). The crowd also added a handful of other riders to the set under consideration, bringing the list up to 16.

[Figure: tourdefrance.2015.votemap (map of survey sessions by location)]

So how does that self-selected crowd handicap the race? The dot plot below shows the riders in descending order by their survey scores, which range from 0 to 100 and indicate the probability that that rider would beat a randomly chosen other rider for a randomly chosen respondent. In contrast to Paddy Power, which currently shows Chris Froome as the clear favorite and gives Nairo Quintana a slight edge over Alberto Contador, this survey sees Contador as the most likely winner (survey score of 90), followed closely by Froome (87) and a little further by Quintana (80). Both sources put Vincenzo Nibali as fourth likeliest (73) and Tejay van Garderen (65) and Thibaut Pinot (51) in the next two spots, although Paddy Power has them in the opposite order. Below that, the distances between riders’ chances get smaller, but the wiki survey’s results still approximate the handicapping of the real-money markets pretty well.

[Figure: tourdefrance.2015.scores (dot plot of riders’ survey scores)]

There are at least a couple of ways to try to squeeze some meaning out of those scores. One is to read the chart as a predicted finishing order for the 16 riders listed. That’s useful for something like a bike race, where we—well, some of us, anyway—care not only about who wins but also about where the other riders finish.

We can also try to convert those scores to predicted probabilities of winning. The chart below shows what happens when we do that by dividing each rider’s score by the sum of all scores and then multiplying the result by 100. The probabilities this produces are all pretty low and more tightly bunched than seems reasonable, but I’m not sure how else to do this conversion. I tried squaring and cubing the scores; the results came closer to what the betting-market odds suggest are the “right” values, but I couldn’t think of a principled reason to do that, so I’m not showing those here. If you know a better way to get from those model scores to well-calibrated win probabilities, please let me know in the comments.
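To make the arithmetic concrete, here is a minimal R sketch of that conversion, using the six scores quoted above as stand-ins for the full 16-rider set; the power transformation at the end is the ad hoc sharpening I mention but decided not to use.

```r
# Minimal sketch (not the original script): convert wiki-survey scores into
# rough win probabilities by normalizing them to sum to 100. The six scores
# below are the ones quoted in the text; the real chart covers 16 riders.
scores <- c(Contador = 90, Froome = 87, Quintana = 80,
            Nibali = 73, vanGarderen = 65, Pinot = 51)

win_prob <- 100 * scores / sum(scores)  # simple proportional conversion
round(win_prob, 1)

# Ad hoc alternative mentioned above: raise scores to a power before
# normalizing to sharpen the distribution (no principled basis for k).
k <- 3
round(100 * scores^k / sum(scores^k), 1)
```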

[Figure: tourdefrance.2015.winprobs (predicted win probabilities)]

So that’s what the survey says. After the Tour concludes in a few weeks, I’ll report back on how the survey’s predictions fared. Meanwhile, here’s wishing the athletes a crash-, injury-, and drug-free Tour. Judging by the other big races I’ve seen so far this year, it should be a great one to watch.

The Birth of Crowdsourcing?

From p. 106 of the first paperback edition of The Professor and the Madman, a slightly overwrought but enjoyable history of the origins of the Oxford English Dictionary, found on the shelf of a vacation rental:

The new venture that [Richard Chenevix] Trench seemed now to be proposing would demonstrate not merely the meaning but the history of meaning, the life story of each word. And that would mean the reading of everything and the quoting of everything that showed anything of the history of the words that were to be cited. The task would be gigantic, monumental, and—according to the conventional thinking of the times—impossible.

Except that here Trench presented an idea, an idea that—to those ranks of conservative and frock-coated men who sat silently in the [London Library] on that dank and foggy evening [in 1857]—was potentially dangerous and revolutionary. But it was the idea that in the end made the whole venture possible.

The undertaking of the scheme, he said, was beyond the ability of any one man. To peruse all of English literature—and to comb the London and New York newspapers and the most literate of the magazines and journals—must be instead “the combined action of many.” It would be necessary to recruit a team—moreover, a huge one—probably comprising hundreds and hundreds of unpaid amateurs, all of them working as volunteers.

The audience murmured with surprise. Such an idea, obvious though it may sound today, had never been put forward before. But then, some members said as the meeting was breaking up, it did have some real merit.

And here’s what that crowdsourcing process ended up looking like in practice:

[Frederick] Furnivall then issued a circular calling for volunteer readers. They could select from which period of history they would like to read books—from 1250 to 1526, the year of the New English Testament; from then to 1674, the year when Milton died; or from 1674 to what was then the present day. Each period, it was felt, represented the existence of different trends in the development of the language.

The volunteers’ duties were simple enough, if onerous. They would write to the society offering their services in reading certain books; they would be asked to read and make word-lists of all that they read, and would then be asked to look, super-specifically, for certain words that currently interested the dictionary team. Each volunteer would take a slip of paper, write at its top left-hand side the target word, and below, also on the left, the date of the details that followed: These were, in order, the title of the book or paper, its volume and page number, and then, below that, the full sentence that illustrated the use of the target word. It was a technique that has been undertaken by lexicographers to the present day.

Herbert Coleridge became the first editor of what was to be called A New English Dictionary on Historical Principles. He undertook as his first task what may seem prosaic in the extreme: the design of a small stack of oak-board pigeonholes, nine holes wide and six high, which could accommodate the anticipated sixty to one hundred thousand slips of paper that would come in from the volunteers. He estimated that the first volume of the dictionary would be available to the world within two years. “And were it not for the dilatoriness of many contributors,” he wrote, clearly in a tetchy mood, “I should not hesitate to name an earlier period.”

Everything about these forecasts was magnificently wrong. In the end more than six million slips of paper came in from the volunteers; and Coleridge’s dreamy estimate that it might take two years to have the first salable section of the dictionary off the presses—for it was to be sold in parts, to help keep revenues coming in—was wrong by a factor of ten. It was this kind of woefully naive underestimate—of work, of time, of money—that at first so hindered the dictionary’s advance. No one had a clue what they were up against: They were marching blindfolded through molasses.

So, even with all those innovations, this undertaking also produced a textbook example of the planning fallacy. I wonder how quickly and cheaply the task could have been completed with Mechanical Turk, or with some brush-clearing assistance from text mining?

A Skeptical Note on Policy-Prescriptive Political Science

My sometimes-colleague Michael Horowitz wrote a great piece for War on the Rocks last week on what “policy relevance” means for political scientists who study international affairs, and the different forms that relevance can take. Among the dimensions of policy relevance he calls out is the idea of “policy actionability”:

Policy actionability refers to a recommendation that is possible to implement for the target of the recommendation. Most academic work is not policy actionable, fundamentally. For example, implications from international relations research are things such as whether countries with high male-to-female ratios are more likely to start military conflicts or that countries that acquire nuclear weapons become harder to coerce.

As Michael notes, most scholarship isn’t “actionable” in this way, and isn’t meant to be. In my experience, though, there is plenty of demand in Washington and elsewhere for policy-actionable research on international affairs, and there is a subset of scholars who, in pursuit of relevance, do try to extract policy prescriptions from their studies.

As an empiricist, I welcome both of those things—in principle. Unfortunately, the recommendations that scholars offer rarely follow directly from their research. Instead, they almost always require some additional, often-heroic assumptions, and those additional assumptions render the whole endeavor deeply problematic. For example, Michael observes that most statistical studies identify average effects—other things being equal, a unit change in x is associated with some amount of change in y—and points out that the effects in any particular case will still be highly uncertain.

That’s true for a lot of what we study, but it’s only the half of it. Even more significant, I think, are the following three assumptions, which implicitly underpin the “policy implications” sections in a lot of the work on international affairs that tries to convert comparative analysis (statistical or not) into policy recommendations:

  • Attempts to induce a change in x in the prescribed direction will actually produce the desired change in x;
  • Attempts to induce a change in x in the prescribed direction will not produce significant and negative unintended consequences; and
  • If it does occur, a change in x induced by the policy actor to whom the scholar is making recommendations will have the same effect as previous changes in x that occurred for various other reasons.

The last assumption isn’t so problematic when the study in question looked specifically at policy actions by that same policy actor, but that’s almost never the case in international relations and other fields using observational data to study macro-political behavior. Instead, we’re more likely to have a study that looked at something like GDP growth rates, female literacy, or the density of “civil society” organizations that the policy audience does not control and does not know how to control. Under these circumstances, all three of those assumptions must hold for the research to be neatly “actionable,” and I bet most social scientists will tell you that at least one and probably two or three of them usually don’t.

With so much uncertainty and so much at stake, I wind up thinking that, unless their research designs have carefully addressed these assumptions, scholars—in their roles as scientists, not as citizens or advocates—should avoid that last mile and leave it to the elected officials and bureaucrats hired for that purpose. That’s hard to do when we care about the policies involved and get asked to offer “expert” advice, but “I don’t know” or “That’s not my area of expertise” will almost always be a more honest answer in these situations.

 

One Measure By Which Things Have Recently Gotten Worse

The United Nations’ refugee agency today released its annual report on people displaced by war around the world, and the news is bad:

The number of people forcibly displaced at the end of 2014 had risen to a staggering 59.5 million compared to 51.2 million a year earlier and 37.5 million a decade ago.

The increase represents the biggest leap ever seen in a single year. Moreover, the report said the situation was likely to worsen still further.

The report focuses on raw estimates of displaced persons, but I think it makes more sense to look at this group as a share of world population. The number of people on the planet has increased by more than half a billion in the past decade, so we might expect to see some growth in the number of forcibly displaced persons even if the amount of conflict worldwide had held steady. The chart below plots annual totals from the UNHCR report as a share of mid-year world population, as estimated by the U.S. Census Bureau (here).
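For anyone who wants to replicate the adjustment, here is a minimal R sketch. The displaced-person totals come from the UNHCR report quoted above, while the world-population figures are rough placeholders for the Census Bureau series and should be swapped out for the real estimates before drawing any conclusions.

```r
# Minimal sketch of the population adjustment described above. The UNHCR
# totals are from the report quoted earlier; the world-population values are
# rough placeholders for the U.S. Census Bureau mid-year estimates.
displaced_share <- function(displaced, world_pop) 100 * displaced / world_pop

displaced <- c(`2004` = 37.5e6, `2013` = 51.2e6, `2014` = 59.5e6)  # persons
world_pop <- c(`2004` = 6.4e9,  `2013` = 7.1e9,  `2014` = 7.2e9)   # approximate

round(displaced_share(displaced, world_pop), 2)  # percent of world population
```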

[Figure: unhcr.refugee.trends (forcibly displaced persons as a share of world population)]

The number of observations in this time series is too small to use Bayesian change point detection to estimate the likelihood that the upturn after 2012 marks a change in the underlying data-generating process. I’m not sure we need that kind of firepower, though. After holding more or less steady for at least six years, the share of world population forcibly displaced by war has increased by more than 50 percent in just two years, from about one of every 200 people to one of every 133 people. Equally important, reports from field workers indicate that this problem only continues to grow in 2015. I don’t think I would call this upturn a “paradigm change,” as UN High Commissioner for Refugees António Guterres did, but there is little doubt that the problem of displacement by war has worsened significantly since 2012.

In historical terms, just how bad is it? Unfortunately, it’s impossible to say for sure. The time series in the UNHCR report only starts in 2004, and a note warns that methodological changes in 2007 render the data before that year incomparable to the more recent estimates. The UNHCR describes the 2014 figure as “the highest level ever recorded,” and that’s technically true but not very informative when recording started only recently. A longer time series assembled by the Center for Systemic Peace (here) supports the claim that the latest raw estimate is the largest ever, but as a share of world population, it’s probably still a bit lower than the levels seen in the post–Cold War tumult of the early 1990s (see here).

Other relevant data affirm the view that, while clearly worsening, the intensity of armed conflict around the world is not at historically high levels, not even for the past few decades. Here is a plot of annual counts of battle-related deaths (low, high, and best estimates) according to the latest edition of UCDP’s data set on that topic (here), which covers the period 1989–2013. Note that these figures have not been adjusted for changes in world population.

Annual estimates of battle-related deaths worldwide, 1989-2013 (data source: UCDP)

We see a similar pattern in the Center for Systemic Peace’s Major Episodes of Political Violence data set (second row here), which covers the whole post-WWII period. For the chart below, I have separately summed the data set’s scalar measure of conflict intensity for two types of conflict, civil and interstate (see the codebook for details). Like the UCDP data, these figures show a local increase in the past few years that nevertheless remains well below the prior peak, which came when the Soviet Union fell apart.
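The aggregation behind that chart is simple. Here is a hedged sketch of how it could be done in R; the file name and column names are my assumptions about the CSP country-year file’s layout, not a quotation of the actual script.

```r
# A hedged sketch of the aggregation described above; 'mepv.csv' and the
# column names are assumptions about the CSP file's layout, not the actual
# script used for the chart.
library(dplyr)

mepv <- read.csv("mepv.csv", stringsAsFactors = FALSE)

annual <- mepv %>%
  group_by(year) %>%
  summarise(civil      = sum(civtot, na.rm = TRUE),   # societal/civil conflicts
            interstate = sum(inttot, na.rm = TRUE))   # interstate conflicts
```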

Annual intensity of political violence worldwide, 1946-2014 (data source: CSP)

And, for longer-term perspective, it always helps to take another look at this one, from an earlier UCDP report:

[Figure: PRIO battle death trends]

I’ll wrap this up by pinning a note in something I see when comparing the shorter-term UCDP estimates to the UNHCR estimates on forcibly displaced persons: adjusting for population, it looks like armed conflicts may be killing fewer but displacing more than they used to. That impression is bolstered by a glance at UCDP data on trends in deaths from “intentional attacks on civilians by governments and formally organized armed groups,” which UCDP calls “one-sided violence” (here).  As the plot below shows, the recent upsurge in warfare has not yet produced a large increase in the incidence of these killings, either. The line is bending upward, but it remains close to historical lows.

Estimated annual deaths from one-sided violence, 1989-2013 (Source: UCDP)

So, in the tumult of the past few years, it looks like the rate of population displacement has surged while the rate of battle deaths has risen more slowly and the rate of one-sided violence targeting civilians hasn’t risen much at all. If that’s true, then why? Improvements in medical care in conflict zones are probably part of the story, but I wonder if changes in norms and values, and in the international institutions and practices instantiating them, aren’t also shaping these trends. Governments that in the past might have wantonly killed populations they regarded as threats now seem more inclined to press those populations by other means—not always, but more often. Meanwhile, international organizations are readier than ever to assist those groups under pressure by feeding and sheltering them, drawing attention to their miseries, and sometimes even protecting them. The trend may be fragile, and the causality is impossible to untangle with confidence, but it deserves contemplation.

From China, Another Strike Against Legitimacy

I’ve groused on this blog before (here and here) about the trouble with “legitimacy” as a causal mechanism in theories of political stability and change, and I’ve pointed to Xavier Marquez’s now-published paper as the most cogent expression of this contrarian view to date.

Well, here is a fresh piece of empirical evidence against the utility of this concept: according to a new Global Working Paper from Brookings, the citizens of China who have benefited the most from that country’s remarkable economic growth in recent decades are, on average, its least happy. As one of the paper’s authors describes in a blog post about their research,

We find that the standard determinants of well-being are the same for China as they are for most countries around the world. At the same time, China stands out in that unhappiness and reported mental health problems are highest among the cohorts who either have or are positioned to benefit from the transition and related growth—a clear progress paradox. These are urban residents, the more educated, those who work in the private sector, and those who report to have insufficient leisure time and rest.

These survey results contradict the “performance legitimacy” story that many observers use to explain how the Chinese Communist Party has managed to avoid significant revolutionary threats since 1989 (see here, for example). In that story, Chinese citizens choose not to demand political liberalization because they are satisfied with the government’s economic performance. In effect, they accept material gains in lieu of political voice.

Now, though, we learn that the cohort in which contentious collective action is most likely to emerge—educated urbanites—are also, on average, the country’s least happy people. The authors also report (p. 14) that, in China, “the effect of income increases on life satisfaction are limited.” A legitimacy-based theory predicts that the CCP is surviving because it is making and keeping its citizens happy; instead, we see that it is surviving in spite of deepening unhappiness among key cohorts.

To me, this case further bares the specious logic behind most legitimacy-based explanations for political continuity. We believe that rebellion is an expression of popular dissatisfaction, a kind of referendum in the streets; we observe stability; so, we reason backwards from the absence of rebellion to the absence of dissatisfaction, sprinkle a little normative dust on it, and arrive at a positive concept called legitimacy. Formally, this is a fallacy of affirmative conclusion from a negative premise: happy citizens don’t rebel, no rebellion is occurring, therefore citizens must be happy. Informally, I think it’s a qualitative version of the “story time” process in which statistical modelers often indulge: get a surprising result, then make up a richer explanation for it that feels right.

I don’t mean to suggest that popular attitudes are irrelevant to political stasis and change, or that the durability of specific political regimes has nothing to do with the affinity between their institutional forms and the cultural contexts in which they’re operating. Like Xavier, though, I do believe that the conventional concept of legitimacy is too big and fuzzy to have any real explanatory power, and I think this new evidence from China reminds us of that point. If we want to understand how political regimes persist and when they break down, we need to identify mechanisms that are more specific than this one, and to embed them in theories that allow for more complexity.

A Plea for More Prediction

The second Annual Bank Conference on Africa happened in Berkeley, CA, earlier this week, and the World Bank’s Development Impact blog has an outstanding summary of the 50-odd papers presented there. If you have to pick between reading this post and that one, go there.

One paper on that roster that caught my eye revisits the choice of statistical models for the study of civil wars. As authors John Paul Dunne and Nan Tian describe, the default choice is logistic regression, although probit gets a little playing time, too. They argue, however, that a zero-inflated Poisson (ZIP) model matches the data-generating process better than either of these traditional picks, and they show that this choice affects what we learn about the causes of civil conflict.

Having worked on statistical models of civil conflict for nearly 20 years, I have some opinions on that model-choice issue, but those aren’t what I want to discuss right now. Instead, I want to wonder aloud why more researchers don’t use prediction as the yardstick—or at least one of the yardsticks—for adjudicating these model comparisons.

In their paper, Dunne and Tian stake their claim about the superiority of ZIP to logit and probit on comparisons of Akaike information criteria (AIC) and Vuong tests. Okay, but if their goal is to see if ZIP fits the underlying data-generating process better than those other choices, what better way to find out than by comparing out-of-sample predictive power?

Prediction is fundamental to the accumulation of scientific knowledge. The better we understand why and how something happens, the more accurate our predictions of it should be. When we estimate models from observational data and only look at how well our models fit the data from which they were estimated, we learn some things about the structure of that data set, but we don’t learn how well those things generalize to other relevant data sets. If we believe that the world isn’t deterministic—that the observed data are just one of many possible realizations of the world—then we need to care about that ability to generalize, because that generalization and the discovery of its current limits is the heart of the scientific enterprise.

From a scientific standpoint, the ideal world would be one in which we could estimate models representing rival theories, then compare the accuracy of the predictions they generate across a large number of relevant “trials” as they unfold in real time. That’s difficult for scholars studying big but rare events like civil wars and wars between states, though; a lot of time has to pass before we’ll see enough new examples to make a statistically powerful comparison across models.

But, hey, there’s an app for that—cross-validation! Instead of using all the data in the initial estimation, hold some out to use as a test set for the models we get from the rest. Better yet, split the data into several equally sized folds and then iterate the training and testing across all possible groupings of them (k-fold cross-validation). Even better, repeat that process a bunch of times and compare distributions of the resulting statistics.
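Here is a rough sketch of what that repeated k-fold routine might look like in R for a rare-event outcome like the ones discussed above. The data frame, outcome, and predictors are invented placeholders, and I use a simple logit and Brier scores purely for illustration; the same scaffolding would work for comparing rival specifications.

```r
# Hedged sketch of repeated k-fold cross-validation for a rare-event model.
# The data, outcome, and predictors are invented placeholders, not the data
# or models from the paper discussed above.
set.seed(42)

k <- 5      # number of folds
reps <- 10  # number of times to repeat the random k-fold split

n <- 2000   # toy stand-in for a country-year panel
conflict_data <- data.frame(
  onset  = rbinom(n, 1, 0.05),  # rare binary outcome
  growth = rnorm(n),
  gdppc  = rnorm(n)
)

brier <- matrix(NA, nrow = reps, ncol = k)  # out-of-sample Brier scores

for (r in 1:reps) {
  folds <- sample(rep(1:k, length.out = n))  # random fold assignment
  for (f in 1:k) {
    train <- conflict_data[folds != f, ]
    test  <- conflict_data[folds == f, ]
    fit   <- glm(onset ~ growth + gdppc, data = train, family = binomial)
    p     <- predict(fit, newdata = test, type = "response")
    brier[r, f] <- mean((p - test$onset)^2)  # lower is better
  }
}

summary(as.vector(brier))  # compare this distribution across rival models
```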

Prediction is the gold standard in most scientific fields, and cross-validation is standard practice in many areas of applied forecasting, because they are more informative than in-sample tests. For some reason, political science still mostly eschews both.* Here’s hoping that changes soon.

* For some recent exceptions to this rule on topics in world politics, see Ward, Greenhill, and Bakke and Blair, Blattman, and Hartman on predicting civil conflict; Chadefaux on warning signs of interstate war; Hill and Jones on state repression; and Chenoweth and me on the onset of nonviolent campaigns.

Another Tottering Step Toward a New Era of Data-Making

Ken Benoit, Drew Conway, Benjamin Lauderdale, Michael Laver, and Slava Mikhaylov have an article forthcoming in the American Political Science Review that knocked my socks off when I read it this morning. Here is the abstract from the ungated version I saw:

Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of non-experts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

The data-making strategy they develop is really innovative, and the cost of implementing it is, I estimate from the relevant tidbits in the paper, 2–3 orders of magnitude lower than the cost of the traditional expert-centric approach. In other words, this is potentially a BIG DEAL for social-science data-making, which, as Sinan Aral reminds us, is a BIG DEAL for doing better social science.

That said, I do wonder how much structure is baked into the manifesto-coding task that isn’t there in most data-making problems, and that makes it especially well suited to the process the authors develop. In the exercise the paper describes:

  1. The relevant corpus (party manifestos) is self-evident, finite, and not too large;
  2. The concepts of interest (economic vs. social policy, left vs. right) are fairly intuitive; and
  3. The inferential task is naturally “fractal”; that is, the concepts of interest inhere in individual sentences (and maybe even words) as well as whole documents.

None of those attributes holds when it comes to coding latent socio-political structural features like de facto forms of government (a.k.a. regime type) or whether or not a country is in a state of civil war. These features are fundamental to analyses of international politics, but the high cost of producing them means that we sometimes don’t get them at all, and when we do, we usually don’t get them updated as quickly or as often as we would need to do more dynamic analysis and prediction. Maybe it’s my lack of imagination, but I can’t quite see how to extend the authors’ approach to those topics without stretching it past the breaking point. I can think of ways to keep the corpus manageable, but the concepts are not as intuitive, and the inferential task is not fractal. Ditto for coding event data, where I suspect that 2 from the list above would mostly hold; 3 would sometimes hold; but 1 absolutely would not.*

In short, I’m ga-ga about this paper and the directions in which it points us, but I’m not ready yet to declare imminent victory in the struggle to drag political science into a new and much healthier era of data-making. (Fool me once…)

* If you think I’m overlooking something here, please leave a comment explaining how you think it might be do-able.

Visualizing Strike Activity in China

In my last post, I suggested that the likelihood of social unrest in China is probably higher than a glance at national economic statistics would suggest, because those statistics conceal the fact that economic malaise is hitting some areas much harder than others and local pockets of unrest can have national effects (ask Mikhail Gorbachev about that one). Near the end of the post, I effectively repeated this mistake by showing a chart that summarized strike activity over the past few years…at the national level.

So, what does the picture look like if we disaggregate that national summary?

The best current data on strike activity in China come from China Labour Bulletin (CLB), a Hong Kong–based NGO that collects incident reports from various Chinese-language sources, compiles them in a public data set, and visualizes them in an online map. Those data include a few fields that allow us to disaggregate our analysis, including the province in which an incident occurred (Location), the industry involved (Industry), and the claims strikers made (Demands). On May 28, I downloaded a spreadsheet with data for all available dates (January 2011 to the present) for all types of incidents and wrote an R script that uses small multiples to compare strike activity across groups within each of those categories.
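The script itself isn’t reproduced here, but the gist is easy to sketch. The snippet below shows the general approach under some assumptions: the file name and the Date column are placeholders, while Location is the province field named above.

```r
# A rough sketch of the approach described above (not the original script):
# count incidents by month within each province and draw small multiples.
# The file name and the 'Date' column are assumptions; 'Location' is the
# province field named in the text.
library(dplyr)
library(ggplot2)

clb <- read.csv("clb_incidents.csv", stringsAsFactors = FALSE)

monthly <- clb %>%
  mutate(month = as.Date(cut(as.Date(Date), "month"))) %>%  # floor to month
  count(Location, month)

ggplot(monthly, aes(x = month, y = n)) +
  geom_line() +
  facet_wrap(~ Location, scales = "free_y") +  # one small panel per province
  labs(x = NULL, y = "Recorded incidents per month")
```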

First, here’s the picture by province. This chart shows that Guangdong has been China’s most strike-prone province over the past several years, but several other provinces have seen large increases in labor unrest in the past two years, including Henan, Hebei, Hubei, Shandong, Sichuan, and Jiangsu. Right now, I don’t have monthly or quarterly province-level data on population size and economic growth to model the relationship among these things, but a quick eyeballing of the chart from the FT in my last post indicates that these more strike-prone provinces skew toward the lower end of the range of recent GDP growth rates, as we would expect.

[Figure: sparklines.province (strike activity by province)]

Now here’s the picture by industry. This chart makes clear that almost all of the surge in strike activity in the past year has come from two sectors: manufacturing and construction. Strikes in the manufacturing sector have been trending upward for a while, but the construction sector really got hit by a wave in just the past year that crested around the time of the Lunar New Year in early 2015. Other sectors also show signs of increased activity in recent months, though, including services, mining, and education, and the transportation sector routinely contributes a non-negligible slice of the national total.

[Figure: sparklines.industry (strike activity by industry)]

And, finally, we can compare trends over time in strikers’ demands. This analysis took a little more work, because the CLB data on Demands do not follow best coding practices in which a set of categories is established a priori and each demand is assigned to one of those categories. In the CLB data, the Demands field is a set of comma-delimited phrases that are mostly but not entirely standardized (e.g., “wage arrears” and “social security” but also “reduction of their operating territory” and “gas-filing problem and too many un-licensed cars”). So, to aggregate the data on this dimension, I created a few categories of my own and used searches for regular expressions to find records that belonged in them. For example, all events for which the Demands field included “wage arrear”, “pay”, “compensation”, “bonus” or “ot” got lumped together in a Pay category, while events involving claims marked as “social security” or “pension” got combined in a Social Security category (see the R script for details).
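Here is an illustrative approximation of that regular-expression matching, continuing the sketch above; the patterns echo the examples just mentioned, but the categories and exact expressions in the actual script differed in the details.

```r
# An illustrative approximation of the regular-expression matching described
# above (not the actual script). Assumes the 'clb' data frame from the
# earlier sketch, with its free-text 'Demands' field.
demands <- tolower(clb$Demands)

pay             <- grepl("wage arrear|pay|compensation|bonus|\\bot\\b", demands)
social_security <- grepl("social security|pension", demands)
layoffs         <- grepl("layoff|lay-off|redundan", demands)

# monthly counts of incidents that include a pay-related demand
pay_by_month <- table(format(as.Date(clb$Date), "%Y-%m")[pay])
```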

The results appear below. As CLB has reported, almost all of the strike activity in China is over pay, usually wage arrears. There’s been an uptick in strikes over layoffs in early 2015, but getting paid better, sooner, or at all for work performed is by far the chief concern of strikers in China, according to these data.

[Figure: sparklines.demands (strike activity by type of demand)]

In closing, a couple of caveats.

First, we know these data are incomplete, and we know that we don’t know exactly how they are incomplete, because there is no “true” record to which they can be compared. It’s possible that the apparent increase in strike activity in the past year or two is really the result of more frequent reporting or more aggressive data collection on a constant or declining stock of events.

I doubt that’s what’s happening here, though, for two reasons. One, other sources have reported the Chinese government has actually gotten more aggressive about censoring reports of social unrest in the past two years, so if anything we should expect the selection bias from that process to bend the trend in the opposite direction. Two, theory derived from historical observation suggests that strike activity should increase as the economy slows and the labor market tightens, and the observed data are consistent with those expectations. So, while the CLB data are surely incomplete, we have reason to believe that the trends they show are real.

Second, the problem I originally identified at the national level also applies at these levels. China’s provinces are larger than many countries in the world, and industry segments like construction and manufacturing contain a tremendous variety of activities. To really escape the ecological fallacy, we would need to drill down much further to the level of specific towns, factories, or even individuals. As academics would say, though, that task lies beyond the scope of the current blog post.

In China, Don’t Mistake the Trees for the Forest

Anyone who pays much attention to news of the world knows that China’s economy is cooling a bit. Official statistics—which probably aren’t true but may still be useful—show annual growth slowing from over 7.5 to around 7 percent or lower and staying there for a while.

For economists, the big question seems to be whether or not policy-makers can control the descent and avoid a hard landing or crash. Meanwhile, political scientists and sociologists wonder whether or not that economic slowdown will spur social unrest that could produce a national political crisis or reform. Most of what I remember reading on the topic has suggested that the risk of large-scale social unrest will remain low as long as China avoids the worst-case economic scenarios. GDP growth in the 6–7 percent range would be a letdown, but it’s still pretty solid compared to most places and is hardly a crisis.

I don’t know enough about economics to wade into that field’s debate, but I do wonder if an ecological fallacy might be leading many political scientists to underestimate the likelihood of significant social unrest in China in response to this economic slowdown. We commit an ecological fallacy when we assume that the characteristics of individuals in a group match the central tendencies of that group—for example, assuming that a kid you meet from a wealthy, high-performing high school is rich and will score well on the SAT. Put another way, an ecological fallacy involves mistakenly assuming that each tree shares the characteristic features of the forest those trees compose.

Now consider the chart below, from a recent article in the Financial Times about the uneven distribution of economic malaise across China’s provinces. As the story notes, “The slowdown has affected some areas far worse than others. Perhaps predictably, the worst-hit places are those that can least afford it.”

The chart reminds us that China is a large and heterogeneous country—and, as it happens, social unrest isn’t a national referendum. You don’t need a majority vote from a whole country to get popular protest that can threaten to reorder national politics; you just need to reach a critical point, and that point can often be reached with a very small fraction of the total population. So, instead of looking at national tendencies to infer national risk, we should look at the tails of the relevant distributions to see if they’re getting thicker or longer. The people and places at the wrong ends of those distributions represent pockets of potential unrest; other things being equal, the more of them there are, the greater the cumulative probability of relevant action.

So how do things look in that thickening tail? Here again is that recent story in the FT:

Last month more than 30 provincial taxi drivers drank poison and collapsed together on the busiest shopping street in Beijing in a dramatic protest against economic and working conditions in their home town.

The drivers, who the police say all survived, were from Suifenhe, a city on the Russian border in the northeastern province of Heilongjiang…

Heilongjiang is among the poorest performers. While national nominal growth slipped to 5.8 per cent in the first quarter compared with a year earlier — its lowest level since the global financial crisis — the province’s nominal GDP actually contracted, by 3.2 per cent.

In the provincial capital of Harbin, signs of economic malaise are everywhere.

The relatively small, ritual protest described at the start of that block quote wouldn’t seem to pose much threat to Communist Party rule, but then neither did Mohamed Bouazizi’s self-immolation in Tunisia in December 2010.

Meanwhile, as the chart below shows, data collected by China Labor Bulletin show that the incidence of strikes and other forms of labor unrest has increased in China in the past year. Each such incident is arguably another roll of the dice that could blow up into a larger and longer episode. Any one event is extremely unlikely to catalyze a larger campaign that might reshape national politics in a significant way, but the more trials run, the higher the cumulative probability.
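A toy calculation makes that dice-rolling logic concrete; the per-incident probability here is invented purely for illustration, not estimated from any data.

```r
# Toy illustration of the "more rolls of the dice" point; the per-incident
# escalation probability is invented, not estimated from any data.
p <- 0.001                    # chance that any single incident escalates
n <- c(100, 500, 1000, 2000)  # number of incidents in a given period

# probability that at least one incident escalates, assuming independence
round(1 - (1 - p)^n, 3)
```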

Monthly counts of labor incidents in China, January 2012-May 2015 (data source: China Labor Bulletin)

The point of this post is to remind myself and anyone bothering to read it that statistics describing the national economy in the aggregate aren’t a reliable guide to the likelihood of those individual events, and thus of a larger and more disruptive episode, because they conceal important variation in the distribution they summarize. I suspect that most China experts already think in these terms, but I think most generalists (like me) do not. I also suspect that this sub-national variation is one reason why statistical models using country-year data generally find weak association between things like economic growth and inflation on the one hand and demonstrations and strikes on the other. Maybe with better data in the future, we’ll find stronger affirmation of the belief many of us hold that economic distress has a strong effect on the likelihood of social unrest, because we won’t be forced into an ecological fallacy by the limits of available information.

Oh, and by the way: the same goes for Russia.

About That Apparent Decline in Violent Conflict…

Is violent conflict declining, or isn’t it? I’ve written here and elsewhere about evidence that warfare and mass atrocities have waned significantly in recent decades, at least when measured by the number of people killed in those episodes. Not everyone sees the world the same way, though. Bear Braumoeller asserts that, to understand how war prone the world is, we should look at how likely countries are to use force against politically relevant rivals, and by this measure the rate of warfare has held pretty steady over the past two centuries. Tanisha Fazal argues that wars have become less lethal without becoming less frequent because of medical advances that help keep more people in war zones alive. Where I have emphasized war’s lethal consequences, these two authors emphasize war’s likelihood, but their arguments suggest that violent conflict hasn’t really waned the way I’ve alleged it has.

This week, we got another important contribution to the wider debate in which my shallow contributions are situated. In an updated working paper, Pasquale Cirillo and Nassim Nicholas Taleb claim to show that

Violence is much more severe than it seems from conventional analyses and the prevailing “long peace” theory which claims that violence has declined… Contrary to current discussions…1) the risk of violent conflict has not been decreasing, but is rather underestimated by techniques relying on naive year-on-year changes in the mean, or using sample mean as an estimator of the true mean of an extremely fat-tailed phenomenon; 2) armed conflicts have memoryless inter-arrival times, thus incompatible with the idea of a time trend.

Let me say up front that I only have a weak understanding of the extreme value theory (EVT) models used in Cirillo and Taleb’s paper. I’m a political scientist who uses statistical methods, not a statistician, and I have neither studied nor tried to use the specific techniques they employ.

Bearing that in mind, I think the paper successfully undercuts the most optimistic view about the future of violent conflict—that violent conflict has inexorably and permanently declined—but then I don’t know many people who actually hold that view. Most of the work on this topic distinguishes between the observed fact of a substantial decline in the rate of deaths from political violence and the underlying risk of those deaths and the conflicts that produce them. We can (partly) see the former, but we can’t see the latter; instead, we have to try to infer it from the conflicts that occur. Observed history is, in a sense, a single sample drawn from a distribution of many possible histories, and, like all samples, this one is only a jittery snapshot of the deeper data-generating process in which we’re really interested. What Cirillo and Taleb purport to show is that long sequences of relative peace like the one we have seen in recent history are wholly consistent with a data-generating process in which the risk of war and death from it have not really changed at all.
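A crude simulation, which is emphatically not Cirillo and Taleb’s method, illustrates the point: in a stationary process with memoryless war onsets and fat-tailed war sizes, long stretches without a catastrophic war turn up by chance fairly often. Every parameter below is invented for illustration.

```r
# Toy simulation (not Cirillo and Taleb's method): a stationary process with
# memoryless onsets and fat-tailed war sizes still produces long "peaceful"
# stretches by chance. All parameter values are invented for illustration.
set.seed(1)

sims  <- 10000
years <- 70

quiet_runs <- replicate(sims, {
  onsets <- rbinom(years, 1, 0.3)              # wars start memorylessly
  sizes  <- onsets * exp(rnorm(years, 10, 3))  # heavy right tail of deaths
  r <- rle(sizes < 1e7)                        # runs of years without a huge war
  max(c(0, r$lengths[r$values]))               # longest such run
})

mean(quiet_runs >= 40)  # share of simulated histories with a 40+ year lull
```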

Of course, the fact that a decades-long decline in violent conflict like the one we’ve seen since World War II could happen by chance doesn’t necessarily mean that it is happening by chance. The situation is not dissimilar to one we see in sports when a batter or shooter seems to go cold for a while. Oftentimes that cold streak will turn out to be part of the normal variation in performance, and the athlete will eventually regress to the mean—but not every time. Sometimes, athletes really do get and stay worse, maybe because of aging or an injury or some other life change, and the cold streak we see is the leading edge of that sustained decline. The hard part is telling in real time which process is happening. To try to do that, we might look for evidence of those plausible causes, but humans are notoriously good at spotting patterns where there are none, and at telling ourselves stories about why those patterns are occurring that turn out to be bunk.

The same logic applies to thinking about trends in violent conflict. Maybe the downward trend in observed death rates is just a chance occurrence in an unchanged system, but maybe it isn’t. And, as Andrew Gelman told Zach Beauchamp, the statistics alone can’t answer this question. Cirillo and Taleb’s analysis, and Braumoeller’s before it, imply that the history we’ve seen in the recent past  is about as likely as any other, but that fact isn’t proof of its randomness. Just as rare events sometimes happen, so do systemic changes.

Claims that “This time really is different” are usually wrong, so I think the onus is on people who believe the underlying risk of war is declining to make a compelling argument about why that’s true. When I say “compelling,” I mean an argument that a) identifies specific causal mechanisms and b) musters evidence of change over time in the presence or prevalence of those mechanisms. That’s what Steven Pinker tries at great length to do in The Better Angels of Our Nature, and what Joshua Goldstein did in Winning the War on War.

My own thinking about this issue connects the observed decline in the intensity of violent conflict to the rapid increase in the past 100+ years in the size and complexity of the global economy and the changes in political and social institutions that have co-occurred with it. No, globalization is not new, and it certainly didn’t stop the last two world wars. Still, I wonder if the profound changes of the past two centuries are accumulating into a global systemic transformation akin to the one that occurred locally in now-wealthy societies in which organized violent conflict has become exceptionally rare. Proponents of democratic peace theory see a similar pattern in the recent evidence, but I think they are too quick to give credit for that pattern to one particular stream of change that may be as much consequence as cause of the deeper systemic transformation. I also realize that this systemic transformation is producing negative externalities—climate change and heightened risks of global pandemics, to name two—that could offset the positive externalities or even lead to sharp breaks in other directions.

It’s impossible to say which, if any, of these versions is “true,” but the key point is that we can find real-world evidence of mechanisms that could be driving down the underlying risk of violent conflict. That evidence, in turn, might strengthen our confidence in the belief that the observed pattern has meaning, even if it doesn’t and can’t prove that meaning or any of the specific explanations for it.

Finally, without deeply understanding the models Cirillo and Taleb used, I also wondered when I first read their new paper if their findings weren’t partly an artifact of those models, or maybe some assumptions the authors made when specifying them. The next day, David Roodman wrote something that strengthened this source of uncertainty. According to Roodman, the extreme value theory (EVT) models employed by Cirillo and Taleb can be used to test for time trends, but the ones described in this new paper don’t. Instead, Cirillo and Taleb specify their models in a way that assumes there is no time trend and then use them to confirm that there isn’t. “It seems to me,” Roodman writes, “that if Cirillo and Taleb want to rule out a time trend according to their own standard of evidence, then they should introduce one in their EVT models and test whether it is statistically distinguishable from zero.”

If Roodman is correct on this point, and if Cirillo and Taleb were to do what he recommends and still find no evidence of a time trend, I would update my beliefs accordingly. In other words, I would worry a little more than I do now about the risk of much larger and deadlier wars occurring again in my expected lifetime.
