In Praise of Fun Projects

Over the past year, I’ve watched a few people I know in digital life sink a fair amount of time into statistical modeling projects that other people might see as “just for fun,” if not downright frivolous. Last April, for example, public-health grad student Brett Keller delivered an epic blog post that used event history models to explore why some competitors survive longer than others in the fictional Hunger Games. More recently, sociology Ph.D. student Alex Hanna has been using the same event history techniques to predict who’ll get booted each week from the reality TV show RuPaul’s Drag Race (see here and here so far). And then there’s Against the Spread, a nascent pro-football forecasting project from sociology Ph.D. candidate Trey Causey, whose dissertation uses natural language processing and agent-based modeling to examine information ecology in authoritarian regimes.

I happen to think these kinds of projects are a great idea, if you can find the time to do them–and if you’re reading this blog post, you probably can. Based on personal experience, I’m a big believer in learning by doing. Concepts don’t stick in my brain when I only read about them; I’ve got to see the concepts in action and attach them to familiar contexts and examples to really see what’s going on. Blog posts like Brett’s and Alex’s are a terrific way to teach yourself new methods by applying them to toy problems where the data sets are small, the domain is familiar and interesting, and the costs of being wrong are negligible.


A bigger project like Trey’s requires you to solve a lot of complex procedural and methodological problems, but all the skills you develop along the way transfer to other domains. If you can build and run a decent forecasting system from scratch for something as complex as pro football, you can do the same for “seriouser” problems, too. I think that demonstrated skill on fun tasks says as much about someone’s ability to execute complex research in the real world as any job talk or publication in a peer-reviewed journal. Done well, these hobby projects can even evolve into rewarding enterprises of their own. Just ask Nate Silver, who kickstarted his now-prodigious career as a statistical forecaster with PECOTA, a baseball forecasting system that he ginned up for fun while working for pay as a consultant.

I suspect that a lot of people in the private sector already get this. Academia, not so much, but then academics are the ones who wind up poorer for it.

Episodes of Democracy and Autocracy: A New Data Set

To look for patterns in the occurrence of transitions to democracy and democratic breakdowns around the world over time, we need reliable observations of where and when those events have happened. Most statistical analyses of democratic transitions in the past 15 years have used either Polity or the Democracy-Dictatorship (DD) data set to do that. As part of my work for the Political Instability Task Force (PITF), I developed yet another data set on episodes of democratic and authoritarian government in countries worldwide with populations larger than 500,000. The results of that work—I’m calling it the Democracy/Autocracy Data Set (DAD)—are now publicly available on the Dataverse Network, a data-sharing platform operated by Harvard University’s Institute for Quantitative Social Science.*

Like DD, DAD sorts cases annually into two bins: democracies and non-democracies. Countries are identified as democracies when they satisfy all of several criteria, like items on a checklist. Cases that fail to satisfy one or more of those criteria are identified as non-democracies. Those criteria are meant to be indicative of four broader conditions essential to democracy:

  • Elected officials rule. Representatives chosen by citizens actually make policy, and unelected individuals, bodies, and organizations cannot veto those representatives’ decisions.
  • Elections are fair and competitive. The process by which citizens elect their rulers provides voters with meaningful choice and is free from deliberate fraud or abuse.
  • Politics is inclusive. Adult citizens have equal rights to vote and participate in government and fair opportunity to exercise those rights.
  • Civil liberties are protected. Freedoms of speech, association, and assembly give citizens the chance to deliberate on their interests, to organize in pursuit of those interests, and to monitor the performance of their elected representatives and the bureaucracies on which those officials depend.
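For readers who think better in code, here is a minimal sketch of that checklist logic. It is a simplification: it collapses the underlying criteria into one flag per broad condition, and the class and field names are hypothetical rather than variable names from the data set.

```python
from dataclasses import dataclass


@dataclass
class CountryYear:
    # Hypothetical flags, one per broad condition listed above.
    elected_officials_rule: bool
    elections_fair_and_competitive: bool
    politics_inclusive: bool
    civil_liberties_protected: bool


def is_democracy(cy: CountryYear) -> bool:
    """Checklist rule: a case is a democracy only if every flag is True."""
    return all((
        cy.elected_officials_rule,
        cy.elections_fair_and_competitive,
        cy.politics_inclusive,
        cy.civil_liberties_protected,
    ))
```

Failing any single item is enough to land a case in the non-democracy bin, which is what makes the coding dichotomous rather than a sliding scale.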

Conceptually, these conditions are very similar to the ones used in constructing the DD data set. So why bother doing it all over again? The impetus to re-invent this particular wheel came from concerns I had about the effects of a couple of ancillary rules the makers of the original DD data set used to make decisions about ambiguous cases. As I saw it, those rules systematically skewed the resulting data in ways that are especially problematic for the kind of survival analysis those authors and many others have done with them. I won’t belabor the issue here, but interested readers can find more on the subject in this paper of mine on SSRN.

DAD was designed with survival analysis in mind, so it includes duration of current status, indicators of change from current status, and running counts of past events of both types (transitions to and from democracy). Importantly, those running counts include episodes before 1955, so at least that portion of the data set is not left-censored. Unlike DD, DAD does not differentiate within the two bins among types of democracy and dictatorship. Also unlike DD, however, DAD does track times to first alternations in power within democratic episodes—by individual chief executive and by ruling party/coalition—and it differentiates among democratic breakdowns by their form: executive coup (a.k.a. consolidation of incumbent advantage), military coup, rebellion, or other.
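To make those survival-analysis fields concrete, here is a rough sketch of how duration, a change indicator, and running event counts could be derived from one country's annual democracy/non-democracy series. The function name, the dictionary keys, and the conventions (duration counts the current year, the change flag marks the first year of the new status) are assumptions for illustration, not the coding rules documented for DAD.

```python
def survival_fields(status):
    """Derive survival-analysis fields for one country.

    status: list of (year, is_democracy) tuples in year order.
    Returns one dict per year with assumed, illustrative field names.
    """
    rows = []
    duration = 0           # years in current status, including this one
    to_democracy = 0       # transitions to democracy before this year
    from_democracy = 0     # democratic breakdowns before this year
    previous = None
    for year, dem in status:
        changed = previous is not None and dem != previous
        duration = 1 if changed else duration + 1
        rows.append({
            "year": year,
            "democracy": dem,
            "duration": duration,
            "changed": changed,
            "prior_to_democracy": to_democracy,
            "prior_from_democracy": from_democracy,
        })
        # Update the running counts only after recording the year, so the
        # stored counts reflect past events rather than the current one.
        if changed and dem:
            to_democracy += 1
        elif changed:
            from_democracy += 1
        previous = dem
    return rows


example = survival_fields([
    (1990, False), (1991, False), (1992, True), (1993, True), (1994, False),
])
```

Note that the sketch starts every counter at zero in the first observed year, which is exactly the left-censoring problem that counting pre-1955 episodes is meant to avoid.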

As a kind of bonus, DAD also includes annual data on each country’s participation in a host of regional and global intergovernmental organizations and treaty regimes. I used those data for this paper, which looks at the effects of international integration on prospects for transitions to and from democracy. Those data are also available as a standalone data set through ICPSR (link).

For other published or publicly available research I’ve done with DAD, see here, here, here, here, and here.

Based on my experience working with Polity, DD, and Freedom House’s Freedom in the World data, I can say a little bit about how the various sources compare to one another. In its calls on which regimes are democratic, DAD is closest to Freedom House’s annual list of electoral democracies. DD is generally more cautious than DAD, identifying as dictatorships some cases where DAD sees (usually short-lived) spells of democratic government that ended with a consolidation of incumbent advantage. Polity runs the opposite way, identifying as more democratic than autocratic many cases where DAD sees an autocracy (e.g. Russia and Armenia today).

At present, I am not planning to update DAD. Still, I hope it’s a useful resource, and I welcome comments and criticisms. Again, you can find the data set and supporting documentation on the Dataverse Network.

* This research was conducted for the Political Instability Task Force (PITF). The PITF is funded by the Central Intelligence Agency (CIA). The views expressed herein are the author’s alone and do not necessarily represent the views of the Task Force or the U.S. Government.

More on Democratic Consolidation and Time

In yesterday’s post, I used survival analysis to look at the relationship between the age of a democratic regime and the risk of democratic breakdown. From those estimates, I concluded that traditional thinking about that relationship was probably wrong. Other things being equal, democracies are actually more likely to fail as time passes, at least up until their late teens or early 20s.

That post drew some incisive comments from Joe Wright, a professor at Pennsylvania State University who has done some terrific work with similar data and methods. One of Joe’s recommendations was to look at the same relationship in subsets of democracies to see if there are any telling variations.

So, let’s do it.

In his comments, Joe suggested subsetting either by the type of democracy (presidential vs. parliamentary) or by the type of authoritarian regime that immediately preceded each episode of democracy. Unfortunately, the data set I’m using doesn’t include information about either of those, so I’m going to take a different tack. Here, I’m going to compare baseline hazards (the term of art for the relationship we’re plotting) across groups defined by: a) whether or not there has been at least one transfer of power between political parties since the “birth” of the democratic regime; and b) the degree of protection for civil liberties (categorized as low, medium, and high).

The figure below juxtaposes plots for groups defined by any alternation in power. In democracies where no alternation in power has yet occurred (the plot on the left), we see essentially the same pattern we saw in the full sample. In democracies where at least one alternation has already occurred (on the right), we see no real association between regime age and risk.

The next figure does the same thing with groups defined by the strength of civil liberties. Here, I split the sample into three groups: low (5-7 on Freedom House’s scale); medium (3-4); and high (1-2). As it happens, there are only two instances of democratic breakdown during my period of observation in the last of those bins–which Freedom House calls “consolidated democracies”–so I’ve set that group aside and just plotted baseline hazards for the low and medium groups. The results are similar. Once again, the pattern in the higher-risk subset–the democracies with the weakest protections for civil liberties–is of increasing risk over time, while the pattern in the lower-risk subset is of no real association. (When comparing these plots to each other and to the preceding ones, note that the scales on the vertical axes sometimes vary.)
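As a rough illustration of the grouping step, the sketch below bins Freedom House civil liberties scores using the cut points given above and then computes a crude life-table hazard (breakdowns divided by democracy-years at risk) for each group at each regime age. This is a simplified stand-in, not the estimator behind the plots, and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def liberties_group(score):
    """Map a Freedom House civil liberties score (1-7) to a group."""
    if 1 <= score <= 2:
        return "high"
    if 3 <= score <= 4:
        return "medium"
    if 5 <= score <= 7:
        return "low"
    raise ValueError(f"score out of range: {score}")


def hazard_by_age(records):
    """Crude hazard by group and regime age.

    records: iterable of (group, age, failed) tuples, one per democracy-year,
    where failed is True if the democracy broke down in that year.
    Returns {group: {age: breakdowns / democracy-years at risk}}.
    """
    at_risk = defaultdict(lambda: defaultdict(int))
    events = defaultdict(lambda: defaultdict(int))
    for group, age, failed in records:
        at_risk[group][age] += 1
        events[group][age] += int(failed)
    return {
        group: {age: events[group][age] / n for age, n in sorted(ages.items())}
        for group, ages in at_risk.items()
    }
```

With only two breakdowns in the high group, any per-age ratio computed this way would be mostly zeros and noise, which is why that group is set aside.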

I read those results as evidence that traditional thinking about the relationship between the passage of time and democratic consolidation is biased by a selection effect. Yes, the risk of a reversion to authoritarian rule is lower in older democracies than it is in younger ones, but that’s really because the democracies most susceptible to breakdown have already been weeded out. For fragile democracies, the risk of breakdown actually increases over time, unless and until they manage to transform themselves into lower-risk cases by producing an alternation in power or deepening protections for civil liberties. After that, there is essentially no association between the passage of time and the prospects for regime survival. (For a classic illustration of selection bias at work, see here.)
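The weeding-out logic can be checked with a toy calculation. The sketch below mixes two groups with invented hazard rates: a fragile group whose risk rises with age and a robust group whose risk is low and flat. All of the numbers are assumptions chosen for illustration, not estimates from the data, and the sketch leaves out the possibility that fragile cases convert into robust ones.

```python
def hazard(group, age):
    """Invented annual breakdown risks, for illustration only."""
    if group == "fragile":
        return min(0.02 + 0.01 * age, 0.20)  # rises with age, capped at 0.20
    return 0.005                             # robust: low and flat


def pooled_hazard(max_age=40, share_fragile=0.5):
    """Hazard by age in the pooled population of surviving democracies."""
    alive = {"fragile": share_fragile, "robust": 1.0 - share_fragile}
    pooled = []
    for age in range(1, max_age + 1):
        failures = {g: alive[g] * hazard(g, age) for g in alive}
        pooled.append(sum(failures.values()) / sum(alive.values()))
        for g in alive:
            alive[g] -= failures[g]  # survivors carry over to the next age
    return pooled


pooled = pooled_hazard()
```

In this setup the pooled hazard climbs for roughly the first decade and then falls below its starting level by age 40, even though neither group's own hazard ever declines. The apparent payoff to age comes entirely from fragile cases dropping out of the sample.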
