The Steep Slope of the Data Revolution’s Second Derivative

Most of the talk about a social science “data revolution” has emphasized rapid increases in the quantity of data available to us. Some of that talk has also focused on changes in the quality of those data, including new ideas about how to separate the wheat from the chaff in situations where there’s a lot of grain to thresh. So far, though, we seem to be talking much less about the rate of change in those changes, or what calculus calls the second derivative.

Lately, the slope of this second derivative has been pretty steep. It’s not just that we now have much more, and in some cases much better, data. The sources and content of those data sets are often fast-moving targets, too. The whole environment is growing and churning at an accelerating pace, and that’s simultaneously exhilarating and frustrating.

It’s frustrating because data sets that evolve as we use them create a number of analytical problems that we don’t get from stable measurement tools. Most important, evolving data sets make it hard to compare observations across time, and longitudinal analysis is the crux of most social-scientific research. Paul Pierson explains why in his terrific 2004 book, Politics in Time:

Why do social scientists need to focus on how processes unfold over significant stretches of time? First, because many social processes are path dependent, in which case the key causes are temporally removed from their continuing effects… Second, because sequencing—the temporal order of events or processes—can be a crucial determinant of important social outcomes. Third, because many important social causes and outcomes are slow-moving—they take place over quite extended periods of time and are only likely to be adequately explained (or in some cases even observed in the first place) if analysts are specifically attending to that possibility.

When our measurement systems evolve as we use them, changes in the data we receive might reflect shifts in the underlying phenomenon. They also might reflect changes in the methods and mechanisms by which we observe and record information about that phenomenon, however, and it’s often impossible to tease the one out from the other.

A recent study by David Lazer, Gary King, Ryan Kennedy, and Alessandro Vespignani on what Google Flu Trends (GFT) teaches us about “traps in Big Data analysis” offers a nice case in point. Developed in the late 2000s by Google engineers and researchers at the Centers for Disease Control and Prevention, GFT uses data on Google search queries to help detect flu epidemics (see this paper). As Lazer and his co-authors describe, GFT initially showed great promise as a forecasting tool, and its success spurred excitement about the power of new data streams to shed light on important social processes. For the past few years, though, the tool has worked poorly on its own, and Lazer & co. believe that changes in Google’s search software are the reason. The problem, for researchers anyway, is that

The Google search algorithm is not a static entity—the company is constantly testing and improving search. For example, the official Google search blog reported 86 changes in June and July 2012 alone (SM). Search patterns are the result of thousands of decisions made by the company’s programmers in various sub-units and by millions of consumers worldwide.

Google keeps tinkering with its search software because that’s what its business entails, but we can expect to see more frequent changes in some data sets specific to social science, too. One of the developments about which I’m most excited is the recent formation of the Open Event Data Alliance (OEDA) and the forthcoming initial release of the machine-coded political event data it plans to produce, hopefully this summer. As its name implies, OEDA plans to make not just its data but also its code freely available to the public in order to grow a community of users who can help improve and expand the software. That crowdsourcing will surely accelerate the development of the scraping and coding machinery, but it also ensures that the data OEDA produces will be a moving target for a while in ways that will complicate attempts to analyze it.

If these accelerated changes are challenging for basic researchers, they’re even tougher on applied researchers, who have to show and use their work in real time. So what’s an applied researcher to do when their data-gathering instruments are frequently changing, and often in opaque and unpredictable ways?

First, it seems prudent to build systems that are modular, so that a failure in one part of the system can be identified and corrected without having to rebuild the whole edifice. In the atrocities early-warning system I’m helping to build right now, we’re doing this by creating a few subsystems with some overlap in their functions. If one part doesn’t pan out or suddenly breaks, we can lean on the others while we repair or retool.
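One way to picture this kind of overlap is a fallback chain: try the preferred subsystem first, and if it fails or returns something implausible, log the failure and lean on the next one. The sketch below is a minimal illustration under assumed names; `forecast_with_fallback`, `event_data_model`, and `survey_model` are hypothetical and are not the actual components of the early-warning system described here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A hypothetical subsystem: takes a country name, returns a risk score in [0, 1],
# or raises if its data feed or code has broken.
Subsystem = Callable[[str], float]


def forecast_with_fallback(
    country: str, subsystems: List[Tuple[str, Subsystem]]
) -> Tuple[Optional[str], Optional[float], Dict[str, str]]:
    """Try overlapping subsystems in priority order.

    Returns (name of the subsystem used, its score, failure log). The failure
    log records which parts broke, so they can be repaired one at a time.
    """
    failures: Dict[str, str] = {}
    for name, run in subsystems:
        try:
            score = run(country)
        except Exception as exc:  # record the breakage and move on
            failures[name] = f"{type(exc).__name__}: {exc}"
            continue
        if not 0.0 <= score <= 1.0:  # treat implausible output as a failure too
            failures[name] = f"out-of-range score {score!r}"
            continue
        return name, score, failures
    return None, None, failures  # every subsystem failed


# Usage with two hypothetical subsystems.
def event_data_model(country: str) -> float:
    raise RuntimeError("source format changed")  # simulates an upstream change


def survey_model(country: str) -> float:
    return 0.12  # placeholder score


used, score, log = forecast_with_fallback(
    "Examplestan", [("event data", event_data_model), ("expert survey", survey_model)]
)
print(used, score)  # expert survey 0.12
print(log)  # {'event data': 'RuntimeError: source format changed'}
```

The design choice here is that a broken part degrades the output rather than stopping it, while the failure log points directly at what needs retooling.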

Second, it’s also a good idea to embed those technical systems in organizational procedures that emphasize frequent checking and fast adaptation. One way to do this is to share your data and code and to discuss your work often with outsiders as you go, so you can catch mistakes, spot alternatives, and see these changes coming before you get too far down any one path. Using open-source statistical software like R is also helpful in this regard, because it lets you take advantage of new features and crowd-sourced fixes as they bubble up.

Last and fuzziest, I think it helps to embrace the idea that your work doesn’t really belong to you or your organization but is just one tiny part of a larger ecosystem that you’re hoping to see evolve in a particular direction. What worked one month might not work the next, and you’ll never know exactly what effect you’re having, but that’s okay if you recognize that it’s not really supposed to be about you. Just keep up as best you can, don’t get too heavily invested in any one approach or idea, and try to enjoy the ride.

Proto-Democratization in China?

Yesterday’s Washington Post included a story that seemed to portray the Chinese Communist Party’s increasingly sophisticated efforts to track and respond to popular opinion as a kind of democratization. Reporter Simon Denyer wrote:

The government is trying to understand public opinion on an unprecedented scale. In response to government demand, opinion monitoring centers have sprung up in state-run news organizations and universities to mine and interpret the vast rivers of chatter on the Internet. At the same time, the authorities are hiring firms to poll people about everything from traffic management to tax policy.

According to Denyer, these endeavors represent a significant change from the past and are having a real impact on decision-making:

The idea of actually listening to the opinions of the Chinese people is a radical departure for a Communist dictatorship more used to persecuting ordinary citizens for their criticism…Increasingly, public opposition to a proposal can shape policy, although not yet on issues vital to the party’s interests, such as political reform.

When I read the article, it reminded me of a school of thought in Soviet studies that saw important (if underdeveloped) features of democracy in the workings of the Communist Party of the Soviet Union (CPSU). By incorporating an array of interest groups and creating channels for members of those groups to transmit their concerns to Soviet leaders, the thinking went, the CPSU after Stalin had built a form of organized pluralism that wasn’t as different from Western democracy as we conventionally thought. Other Sovietologists, however, countered that these claims about interest-group politics missed the forest for the trees. In a society that still had gulags and secret police and sharp limits on public speech, they argued, the hints of pluralism and responsiveness that some saw in CPSU politics were overwhelmed by the enduring organizational and cultural legacies of totalitarianism.

So who was right, and what does this tell us about China today? I think Charles Tilly’s ideas about democracy provide a useful fulcrum here. In contrast to procedural definitions of democracy that start (and sometimes end) with elections, Tilly jumps up one level of abstraction to emphasize the broader issue of consultation. In his words (2007: 13-14),

A regime is democratic to the degree that political relations between the state and its citizens feature broad, equal, protected, and mutually binding consultation.

In their classic discussion of what democracy is and is not, Philippe Schmitter and Terry Karl get at the same general idea with their emphasis on the principle of accountability. Still, I think Tilly’s notion of consultation better matches what most of us have in mind because of its more affirmative connotations. To me, accountability implies responsibility after the fact, the idea of holding someone to account for things he or she has already done. By contrast, consultation connotes a much wider array of interactions at all stages of the policy-making process (setting an agenda, formulating options, debating those options, making a decision, and evaluating the results) that better accords with the notion of government of, by, and for citizens.

At this level of abstraction, there’s no single form of consultation that is necessary and sufficient to qualify a regime as democratic, no single route across that threshold, and no point in time at which the process is completed. Elections are the chief mechanism we use today, but they are not the only form of routinized consultation that is possible or that matters. Political philosophers continue to discuss the merits of alternatives like deliberative or direct democracy, and some observers argue that new communications technologies are making these alternatives more realistic for large societies than ever.

So, back to that China story. Using Tilly’s definition as a prism, I think it’s easier to see why those social-media monitoring efforts and polling firms in China call democracy to mind, but also what’s different about them. Asking people what they think and listening and responding to their online chatter are forms of consultation, but this consultation isn’t protected, equal, or binding. It’s not protected because Chinese citizens still face harsh punishment for speaking out on sensitive topics. The state still chooses who gets to speak about what, and transgressions of those boundaries carry steep costs. The consultation isn’t equal because not everyone can participate. According to Denyer, “Chinese villagers, who still account for nearly half of the population, are not comfortable expressing their views to strangers and are generally not active online.”

Finally and probably most important, the consultation isn’t binding because the state decides when it will respond to what it hears, and citizens still have no way to hold it accountable for those choices. This is why competitive elections are so important. Without a formal mechanism that gives all citizens a chance to reward or punish political decision-makers for their behavior, the Chinese Communist Party can continue to cherry-pick its “listening” efforts in ways that are meant to maximize its own corporate interests without really attending to citizens’ preferences. There may be an element of democratization in these polling and eavesdropping endeavors, but if so, it’s an awfully thin and fragile form of it.

PS. For an excellent academic treatment of China’s online monitoring efforts, see “How Censorship in China Allows Government Criticism but Silences Collective Expression” by Gary King, Jennifer Pan, and Margaret E. Roberts. For a nice review of the debate over pluralism in the USSR, see this 1984 paper by Jeffrey Hahn.

Why the Communist Party of China Is Right to Worry about Popular Protests

China’s rulers are very nervous about collective action by their own citizens, and they have reason to be. Statistical forecasting of democratic transitions supports the supposition that, far more than leadership change or a slumping economy, the mobilization of nonviolent uprisings is what could tip China toward deep political reform. In the short term, the most likely outcome under all scenarios is a continuation of Communist rule, but the path to democratization seems almost certain to run through popular protests.

We know that the Communist Party of China (CPC) is very worried about collective action because they’re showing it. According to a recent study by social scientists Gary King, Jennifer Pan, and Molly Roberts of more than 1,400 social-media services in China,

Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future — and, as such, seem to clearly expose government intent, such as examples we offer where sharp increases in censorship presage government action outside the Internet.

We also know that, in spite of these efforts, popular protests are still happening in China, and their frequency seems to be increasing, particularly around issues of environmental protection and public health. According to a recent post on the International Herald Tribune’s IHT Rendezvous blog,

Although there are tens of thousands of civic protests every year in China, most are small-scale, ineffectual and officially smothered. But high profile demonstrations over environmental issues are occurring with more regularity, size, violence and political oomph.

That last point about the size, violence, and “political oomph” of these popular challenges was driven home by photographs from a recent protest in the eastern Chinese city of Qidong, where citizens confronted local authorities over their plans to dump waste water from a paper plant and compelled them to reverse course.

How much of a threat does contentious collective action really pose to Communist Party rule, though? To try to answer this question, I used a statistical model designed to predict switches from authoritarian to democratic rule to estimate the likelihood of that event’s occurrence in China under various alternative scenarios. Technical details follow at the end of this post, but here I’ll simply note that the model controls for several risk factors widely thought to influence the odds of a democratic transition, including prior experience with democracy, the duration of authoritarian rule, natural-resource wealth, and the end of the Cold War. On top of those structural conditions, the model also considers the following more dynamic factors (with the “other things being equal” caveat attached to all of the following statements).

  • Leadership Change. Democratic transitions are more than three times as likely for at least a few years after a new leader takes office as they are under longer-tenured leaders.
  • Economic Growth. As expected, transitions are more likely when growth is slower.
  • Civil Liberties. Also as expected, transitions are more likely to occur in autocracies that impose fewer restrictions on freedoms of speech, association, and assembly.
  • Nonviolent Rebellion. Autocracies are more than eight times as likely to transition to democracy when challenged by nonviolent civil-resistance movements as they are when these organized popular challenges are absent.
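In a logistic regression, a multiplier like “three times as likely” most naturally corresponds to an odds ratio, exp(β), for the relevant coefficient. Assuming that is how these figures should be read (the post does not say whether they are odds ratios or ratios of predicted probabilities), the sketch below shows why the distinction hardly matters for rare events like democratic transitions but matters a lot for common ones. The function name `shift_probability` is illustrative.

```python
import math


def shift_probability(p: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability p, as a logit model does."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    new_odds = odds_ratio * p / (1.0 - p)
    return new_odds / (1.0 + new_odds)


# An odds ratio of 3 corresponds to a logit coefficient of log(3), about 1.1.
beta = math.log(3)

# For a rare event, tripling the odds roughly triples the probability...
rare = shift_probability(0.001, math.exp(beta))
print(rare / 0.001)  # just under 3

# ...but for a common event it does not: 0.5 becomes 0.75, a 1.5x increase.
print(shift_probability(0.5, 3.0))  # 0.75
```

Because baseline transition risks are tiny, odds ratios and probability ratios are nearly interchangeable in this setting.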

To see what this model says about prospects for democratization in China, I fed it values of the relevant variables under several different scenarios, starting with one representing the current state of play. In all of the scenarios, China experiences a leadership change in 2012 as expected, an event that should already more than triple its risk of democratization.

  • Baseline. GDP growth hits the Party’s latest target of about 7.5 percent, civil liberties remain unchanged, and no civil-resistance movements emerge in 2012.
  • Slow Growth. GDP growth slumps to a more bearish 5 percent in 2012, but civil liberties remain unchanged and no civil-resistance movements emerge.
  • Modest Liberalization. Civil liberties expand slightly, moving from 6 to 5 on Freedom House’s inverted seven-point scale, but growth hits current targets and no civil-resistance movements arise.
  • Popular Challenge. One or more nonviolent movements emerge, even as growth reaches current targets and no political liberalization occurs.
  • Popular Challenge and Slow Growth. Growth slows to 5 percent and one or more nonviolent movements emerge while the Party holds steady on civil liberties.
  • Crackdown. Growth slows to 5 percent and nonviolent movements emerge, but the Party responds by tightening the screws, dropping the country’s civil-liberties score from 6 to 7.
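One tidy way to set up scenarios like these is to define the baseline once and express every other scenario as a set of changes to it, so it is obvious which factors move in each case. The sketch below does that with plain Python dictionaries; the variable names are hypothetical, and only the covariates these scenarios vary are included (the structural controls would stay fixed across all of them).

```python
BASELINE = {
    "leadership_change": 1,  # new leader in 2012, present in every scenario
    "gdp_growth": 7.5,       # annual GDP growth, percent
    "civil_liberties": 6,    # Freedom House scale: 1 = most free, 7 = least free
    "civil_resistance": 0,   # 1 if one or more nonviolent movements emerge
}

# Each scenario lists only the values that differ from the baseline.
OVERRIDES = {
    "Baseline": {},
    "Slow Growth": {"gdp_growth": 5.0},
    "Modest Liberalization": {"civil_liberties": 5},
    "Popular Challenge": {"civil_resistance": 1},
    "Popular Challenge and Slow Growth": {"gdp_growth": 5.0, "civil_resistance": 1},
    "Crackdown": {"gdp_growth": 5.0, "civil_resistance": 1, "civil_liberties": 7},
}

# Merge each set of overrides on top of the baseline values.
SCENARIOS = {name: {**BASELINE, **changes} for name, changes in OVERRIDES.items()}

for name, values in SCENARIOS.items():
    print(f"{name}: {values}")
```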

Now, here are the predicted probabilities of a democratic transition occurring in 2013 that we get when we plug in the numbers for these various scenarios.

A few things stand out to me from that chart.

First and foremost, the likelihood of a transition to democracy occurring before the end of 2013 appears to be quite small in absolute terms, and that doesn’t change much under any of these scenarios. The predicted probabilities in the chart range from less than 0.001 under the baseline scenario (which already takes into account the leadership change that’s occurring this year) to a maximum of roughly 0.005 under the popular challenge-plus-slow growth scenario. Because democratic transitions are so rare (on average, only one or two of these happen worldwide in any given year), the forecasts this model produces are always skewed toward zero. Even taking that downward bias into account, however, these numbers are pretty small. To put them in perspective: a country would need to score about 0.05 to land in the top fifth of all authoritarian regimes in any given year, and nearly all transitions have historically happened in countries somewhere in that fifth. In short, continuation of the status quo is by far the most likely outcome in China over the next year, and we shouldn’t lose sight of that.

Bearing that in mind, I’m struck by how little the forecasts are moved by the ongoing leadership change and the prospect of a sharp economic slowdown. The former is already factored into the baseline forecast, which hovers perilously close to zero. As for the GDP growth rate, a drop from 7.5 percent to 5 percent would be a tremendous slump for China, unprecedented in its recent history, but the model suggests that variations of a few percentage points have not historically had much effect on the odds of regime change.

Last but not least, the chart clearly shows how strong the historical association is between the organization of civil resistance to authoritarian rule and the occurrence of democratic transitions, and what that pattern suggests about how democratization is likely to come about in China. The few scenarios that finally push the forecast upward all involve popular mobilization, and even a crackdown in response to that kind of agitation doesn’t do much to reverse that push.

Thus, of all the things that might happen in China in the next several months, the one that would probably have the biggest impact on near-term prospects for a democratic transition is the successful organization of a civil-resistance movement calling for fundamental changes to China’s political system. Intriguingly, statistical modeling of the conditions under which these movements get started suggests that, of all the countries in the world, China—because of its size and socio-economic development—is the most likely place for this kind of movement to emerge.


The model I used to generate these scenarios is a logistic regression model that was estimated with the ‘glm’ command in R from data for all authoritarian regimes in the world during the period 1972-2008. In the jargon of event history analysis, this is a “discrete-time logit” model that considers the risk of a democratic transition in annual slices while controlling for duration dependency, parameterized here as the natural log of the authoritarian regime’s lifespan in years, interacted with a binary indicator for countries that have attempted democracy before. The resulting model includes the following parameters:
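A discrete-time logit works on data with one row per regime per year. The sketch below shows one way to expand regime spells into that format, computing the natural log of regime duration and its interaction with prior democratic experience. The `RegimeSpell` fields and the 1972-2008 window defaults are assumptions for illustration, not the structure of the PITF data; rows built this way could then be passed to a logistic regression such as R’s `glm` with `family = binomial`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegimeSpell:
    country: str
    start_year: int           # first year of this authoritarian regime
    end_year: int             # last year observed under this regime
    ended_in_democracy: bool  # True if the spell ended in a democratic transition
    prior_democracy: bool     # any history of democracy before this spell


def to_country_years(spell: RegimeSpell, first: int = 1972, last: int = 2008) -> List[Dict]:
    """Expand one regime spell into annual rows inside the observation window."""
    rows = []
    for year in range(max(spell.start_year, first), min(spell.end_year, last) + 1):
        # Regime age counts from the regime's start, even if it predates the window.
        duration = year - spell.start_year + 1
        log_duration = math.log(duration)
        history = int(spell.prior_democracy)
        rows.append({
            "country": spell.country,
            "year": year,
            "prior_democracy": history,
            "log_duration": log_duration,
            "history_x_log_duration": history * log_duration,
            # Outcome is 1 only in the year the transition happens; a spell that
            # continues past the window is right-censored and never scores 1.
            "transition": int(spell.ended_in_democracy and year == spell.end_year),
        })
    return rows


# Usage: a hypothetical regime that began in 1965 and democratized in 1990.
rows = to_country_years(RegimeSpell("Examplestan", 1965, 1990, True, False))
print(len(rows), rows[0]["year"], rows[-1]["transition"])  # 19 1972 1
```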

p(transition to democracy | authoritarian rule) = f { any history of democracy + log(regime duration) + [history of democracy * log(regime duration)] + post-Cold War period + energy and mineral extraction as a % of GNI + civil-liberties index + annual GDP growth + any civil-resistance movements + any leadership changes in past three years }
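Since the model is a logistic regression, the f in that expression is the logistic function applied to an intercept plus a weighted sum of the listed terms. The sketch below spells that out in Python. The variable names are hypothetical, and the coefficient values in the usage example are made-up placeholders, not the estimates behind the forecasts in this post.

```python
import math
from typing import Dict


def model_terms(x: Dict[str, float]) -> Dict[str, float]:
    """Build the right-hand-side terms of the transition model for one country-year."""
    log_duration = math.log(x["regime_duration"])  # duration in years, >= 1
    return {
        "prior_democracy": x["prior_democracy"],  # 0/1
        "log_duration": log_duration,
        "prior_democracy_x_log_duration": x["prior_democracy"] * log_duration,
        "post_cold_war": x["post_cold_war"],  # 0/1
        "resource_rents_pct_gni": x["resource_rents_pct_gni"],
        "civil_liberties": x["civil_liberties"],  # 1 (most free) to 7 (least free)
        "gdp_growth": x["gdp_growth"],  # percent
        "civil_resistance": x["civil_resistance"],  # 0/1
        "leadership_change_3yr": x["leadership_change_3yr"],  # 0/1
    }


def predict_transition(x: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Logistic function of the intercept plus the weighted model terms."""
    eta = coefs["intercept"] + sum(coefs[k] * v for k, v in model_terms(x).items())
    return 1.0 / (1.0 + math.exp(-eta))


# Usage with PLACEHOLDER coefficients (made up, not real estimates).
placeholder_coefs = {
    "intercept": -5.0, "prior_democracy": 0.5, "log_duration": -0.3,
    "prior_democracy_x_log_duration": 0.1, "post_cold_war": 0.5,
    "resource_rents_pct_gni": -0.02, "civil_liberties": -0.3, "gdp_growth": -0.03,
    "civil_resistance": 2.0, "leadership_change_3yr": 1.1,
}
country_year = {
    "regime_duration": 60, "prior_democracy": 0, "post_cold_war": 1,
    "resource_rents_pct_gni": 3.0, "civil_liberties": 6, "gdp_growth": 7.5,
    "civil_resistance": 0, "leadership_change_3yr": 1,
}
quiet = predict_transition(country_year, placeholder_coefs)
challenged = predict_transition({**country_year, "civil_resistance": 1}, placeholder_coefs)
print(quiet < challenged)  # True, because the placeholder resistance coefficient is positive
```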

The data on regimes and regime transitions used in this analysis come from a data set I created for the Political Instability Task Force (PITF). The indicator of civil-resistance movements was taken from a data set created by Erica Chenoweth and Maria Stephan for their widely cited work on “why civil resistance works.” Data on GDP growth and energy and mineral extraction come from the World Bank’s World Development Indicators. The civil-liberties index is produced by Freedom House, and the indicator of leadership change is based on another data set created for PITF, this one by Monty Marshall.

If you are interested in seeing detailed results from this analysis or the data used in it, please email me at ulfelder <at> gmail.

  • Follow Dart-Throwing Chimp on