Down the Country-Month Rabbit Hole

Some big things happened in the world this week. Iran and the P5+1 agreed on a framework for a nuclear deal, and the agreement looks good. In a presidential election in Nigeria—the world’s seventh-most populous country, and one that few observers would have tagged as a democracy before last weekend—incumbent Goodluck Jonathan lost and then promptly and peacefully conceded defeat. The trickle of countries joining China’s new Asian Infrastructure Investment Bank turned into a torrent.

All of those things happened, but you won’t read more about them here, because I have spent the better part of the past week down a different rabbit hole. Last Friday, after years of almosts and any-time-nows, the event data produced for the Integrated Conflict Early Warning System (ICEWS) finally landed in the public domain, and I have been busy trying to figure out how to put them to use.

ICEWS isn’t the first publicly available trove of political event data, but it compares favorably to the field’s first mover, GDELT, and it currently covers a much longer time span than the other recent entrant, Phoenix.

The public release of ICEWS is exciting because it opens the door wider to dynamic modeling of world politics. Right now, nearly all of the data sets employed in statistical studies of politics around the globe use country-years as their units of observation. That’s not bad if you’re primarily interested in the effects or predictive power of structural features, but it’s pretty awful for explaining and anticipating faster-changing phenomena, like social unrest or violent conflict. GDELT broke the lock on that door, but its high noise-to-signal ratio and the opacity of its coding process have deterred me from investing too much time in developing monitoring or forecasting systems that depend on it.

With ICEWS on the Dataverse, that changes. I think we now have a critical mass of data sets in the public domain that: a) reliably cover important topics for the whole world over many years; b) are routinely updated; and, crucially, c) can be parsed to the month or even the week or day to reward investments in more dynamic modeling. Other suspects fitting this description include:

  • The spell-file version of Polity, which measures national patterns of political authority;
  • Lists of coup attempts maintained by Jonathan Powell and Clayton Thyne (here) and the Center for Systemic Peace (here); and
  • The PITF Worldwide Atrocities Event Dataset, which records information about events involving the deliberate killing of five or more noncombatant civilians (more on it here).

We also have high-quality data sets on national elections (here) and leadership changes (here, described here) that aren’t routinely updated by their sources but would be relatively easy to code by hand for applied forecasting.

With ICEWS, there is, of course, a catch. The public version of the project’s event data set will be updated monthly, but on a one-year delay. For example, when the archive was first posted in March, it ran through February 2014. On April 1, the Lockheed team added March 2014. This delay won’t matter much for scholars doing retrospective analyses, but it’s a critical flaw, if not a fatal one, for applied forecasters who can’t afford to pay—what, probably hundreds of thousands of dollars?—for a real-time subscription.

Fortunately, we might have a workaround. Phil Schrodt has played a huge role in the creation of the field of machine-coded political event data, including GDELT and ICEWS, and he is now part of the crew building Phoenix. In a blog post published the day ICEWS dropped, Phil suggested that Phoenix and ICEWS data will probably look enough alike to allow substituting the former for the latter, perhaps with some careful calibration. As Phil says, we won’t know for sure until we have a wider overlap between the two and can see how well this works in practice, but the possibility is promising enough for me to dig in.

And what does that mean? Well, a week has now passed since ICEWS hit the Dataverse, and so far I have:

  • Written an R function that creates a table of valid country-months for a user-specified time period, to use as scaffolding in the construction and agglomeration of country-month data sets;
  • Written scripts that call that function and some others to ingest and then parse or aggregate the other data sets I mentioned to the country-month level;
  • Worked out a strategy, and written the code, to partition the data into training and test sets for a project on predicting violence against civilians; and
  • Spent a lot of time staring at the screen thinking about, and a little time coding, ways to aggregate, reduce, and otherwise pre-process the ICEWS events and Polity data for that work on violence against civilians and beyond.
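The country-month scaffolding idea in that first bullet is simple enough to sketch. Here is a minimal Python version of the concept (my actual function is written in R, and the country codes and date ranges below are invented for illustration):

```python
from itertools import product

def country_month_scaffold(countries, start_year, end_year):
    """Build a table of valid country-months to use as scaffolding when
    merging data sets of different shapes. 'countries' maps a country code
    to the (first, last) years in which it existed as a sovereign state."""
    rows = []
    for code, (first, last) in countries.items():
        for year, month in product(range(start_year, end_year + 1), range(1, 13)):
            # Only keep months in which the country was a valid case
            if first <= year <= last:
                rows.append((code, year, month))
    return rows

# Toy example: South Sudan only becomes a valid case in 2011
countries = {"SSD": (2011, 2015), "USA": (1960, 2015)}
scaffold = country_month_scaffold(countries, 2010, 2012)
```

Once that scaffold exists, each ingested data set can be left-joined onto it, which keeps the merged country-month panel rectangular and makes invalid country-months impossible to introduce by accident.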

What I haven’t done yet—T plus seven days and counting—is any modeling. How’s that for push-button, Big Data magic?

A New Statistical Approach to Assessing Risks of State-Led Mass Killing

Which countries around the world are currently at greatest risk of an onset of state-led mass killing? At the start of the year, I posted results from a wiki survey that asked this question. Now, here in heat-map form are the latest results from a rejiggered statistical process with the same target. You can find a dot plot of these data at the bottom of the post, and the data and code used to generate them are on GitHub.

Estimated Risk of New Episode of State-Led Mass Killing

These assessments represent the unweighted average of probabilistic forecasts from three separate models trained on country-year data covering the period 1960-2011. In all three models, the outcome of interest is the onset of an episode of state-led mass killing, defined as any episode in which the deliberate actions of state agents or other organizations kill at least 1,000 noncombatant civilians from a discrete group. The three models are:

  • PITF/Harff. A logistic regression model approximating the structural model of genocide/politicide risk developed by Barbara Harff for the Political Instability Task Force (PITF). In its published form, the Harff model only applies to countries already experiencing civil war or adverse regime change and produces a single estimate of the risk of a genocide or politicide occurring at some time during that crisis. To build a version of the model that was more dynamic, I constructed an approximation of the PITF’s global model for forecasting political instability and use the natural log of the predicted probabilities it produces as an additional input to the Harff model. This approach mimics the one used by Harff and Ted Gurr in their ongoing application of the genocide/politicide model for risk assessment (see here).
  • Elite Threat. A logistic regression model that uses the natural log of predicted probabilities from two other logistic regression models—one of civil-war onset, the other of coup attempts—as its only inputs. This model is meant to represent the argument put forth by Matt Krain, Ben Valentino, and others that states usually engage in mass killing in response to threats to ruling elites’ hold on power.
  • Random Forest. A machine-learning technique (see here) applied to all of the variables used in the two previous models, plus a few others of possible relevance, using the “randomForest” package in R. A couple of parameters were tuned on the basis of a gridded comparison of forecast accuracy in 10-fold cross-validation.

The Random Forest proved to be the most accurate of the three models in stratified 10-fold cross-validation. The chart below is a kernel density plot of the areas under the ROC curve for the out-of-sample estimates from that cross-validation drill. As the chart shows, the average AUC for the Random Forest was in the low 0.80s, compared with the high 0.70s for the PITF/Harff and Elite Threat models. As expected, the average of the forecasts from all three performed even better than the best single model, albeit not by much. These out-of-sample accuracy rates aren’t mind-blowing, but they aren’t bad either, and they are as good as or better than many of the ones I’ve seen from similar efforts to anticipate the onset of rare political crises in countries worldwide.
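For readers who want to replicate the spirit of that drill, here is a rough Python sketch using scikit-learn on synthetic data (the real work was done in R on the actual country-year data, so everything below—the fake data, the model settings—is purely illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the country-year data: one informative feature,
# nine noise features, and a rare-ish positive class
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=2, size=1000) > 2.5).astype(int)

# Stratified folds keep the rare onsets spread evenly across the 10 folds
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
aucs = cross_val_score(RandomForestClassifier(random_state=1), X, y,
                       cv=cv, scoring="roc_auc")
```

The ten `aucs` values are what get fed into a kernel density plot like the one below; the same routine run per model gives the model-by-model comparison.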

Distribution of Out-of-Sample AUC Scores by Model in 10-Fold Cross-Validation

The decision to use an unweighted average for the combined forecast might seem simplistic, but it’s actually a principled choice in this instance. When examples of the event of interest are hard to come by and we have reason to believe that the process generating those events may be changing over time, sticking with an unweighted average is a reasonable hedge against risks of over-fitting the ensemble to the idiosyncrasies of the test set used to tune it. For a longer discussion of this point, see pp. 7-8 in the last paper I wrote on this work and the paper by Andreas Graefe referenced therein.
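In code, the unweighted average is exactly as plain as it sounds. A toy sketch with hypothetical probabilities for two countries:

```python
def ensemble_forecast(pitf_harff, elite_threat, random_forest):
    """Combine forecasts from the three component models with an unweighted
    average -- a hedge against over-fitting ensemble weights to the test set
    when examples of the event are rare."""
    return [(a + b + c) / 3
            for a, b, c in zip(pitf_harff, elite_threat, random_forest)]

# Hypothetical forecasts for two countries from the three models
combined = ensemble_forecast([0.02, 0.10], [0.04, 0.08], [0.03, 0.12])
# combined -> approximately [0.03, 0.10]
```

Weighted ensembles only pay off when there is enough data to estimate the weights reliably; with a few dozen onsets in half a century, there isn’t.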

Any close readers of my previous work on this topic over the past couple of years (see here and here) will notice that one model has been dropped from the last version of this ensemble, namely, the one proposed by Michael Colaresi and Sabine Carey in their 2008 article, “To Kill or To Protect” (here). As I was reworking my scripts to make regular updating easier (more on that below), I paid closer attention than I had before to the fact that the Colaresi and Carey model requires a measure of the size of state security forces that is missing for many country-years. In previous iterations, I had worked around that problem by using a categorical version of this variable that treated missingness as a separate category, but this time I noticed that there were fewer than 20 mass-killing onsets in country-years for which I had a valid observation of security-force size. With so few examples, we’re not going to get reliable estimates of any pattern connecting the two. As it happened, this model—which, to be fair to its authors, was not designed to be used as a forecasting device—was also by far the least accurate of the lot in 10-fold cross-validation. Putting two and two together, I decided to consign this one to the scrap heap for now. I still believe that measures of military forces could help us assess risks of mass killing, but we’re going to need more and better data to incorporate that idea into our multimodel ensemble.

The bigger and in some ways more novel change from previous iterations of this work concerns the unorthodox approach I’m now using to make the risk assessments as current as possible. All of the models used to generate these assessments were trained on country-year data, because that’s the only form in which most of the requisite data is produced. To mimic the eventual forecasting process, the inputs to those models are all lagged one year at the model-estimation stage—so, for example, data on risk factors from 1985 are compared with outcomes in 1986, 1986 inputs to 1987 outcomes, and so on.
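That lag structure is just a one-step shift when the data are laid out by country-year. A toy Python sketch (the years, inputs, and outcomes here are invented):

```python
def lag_inputs(years, x, y):
    """Pair each year's risk factors with the NEXT year's outcome, mimicking
    the eventual forecasting setup at the model-estimation stage."""
    pairs = []
    for i in range(len(years) - 1):
        # features observed in year t, outcome observed in year t+1
        pairs.append((years[i], x[i], y[i + 1]))
    return pairs

pairs = lag_inputs([1985, 1986, 1987], x=[0.2, 0.3, 0.1], y=[0, 1, 0])
# pairs -> [(1985, 0.2, 1), (1986, 0.3, 0)]
```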

If we stick rigidly to that structure at the forecasting stage, then I need data from 2013 to produce 2014 forecasts. Unfortunately, many of the sources for the measures used in these models won’t publish their 2013 data for at least a few more months. Faced with this problem, I could do something like what I aim to do with the coup forecasts I’ll be producing in the next few days—that is, only use data from sources that quickly and reliably update soon after the start of each year. Unfortunately again, though, the only way to do that would be to omit many of the variables most specific to the risk of mass atrocities—things like the occurrence of violent civil conflict or the political salience of elite ethnicity.

So now I’m trying something different. Instead of waiting until every last input has been updated for the previous year and they all neatly align in my rectangular data set, I am simply applying my algorithms to the most recent available observation of each input. It took some trial and error to write, but I now have an R script that automates this process at the country level by pulling the time series for each variable, omitting the missing values, reversing the series order, snipping off the observation at the start of that string, collecting those snippets in a new vector, and running that vector through the previously estimated model objects to get a forecast (see the section of this script starting at line 284).
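The gist of that routine translates to a few lines of Python (my real script is in R, and the variable names below are invented). Taking the last non-missing value of each series is equivalent to the drop-reverse-snip sequence just described:

```python
import math

def latest_observations(series_by_var):
    """For each input variable, grab the most recent non-missing value.
    Each series is assumed to run oldest-first; dropping NAs and taking
    the final element matches the reverse-and-snip routine in the text."""
    newest = {}
    for var, series in series_by_var.items():
        observed = [v for v in series if v is not None and not math.isnan(v)]
        newest[var] = observed[-1] if observed else float("nan")
    return newest

# 'polity' hasn't been updated for the latest year; 'gdp_growth' has
inputs = latest_observations({
    "polity": [7, 7, 8, float("nan")],
    "gdp_growth": [2.1, 1.4, 3.0, 2.5],
})
```

The resulting vector of freshest-available values is what then gets run through the previously estimated model objects.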

One implicit goal of this approach is to make it easier to jump to batch processing, where the forecasting engine routinely and automatically pings the data sources online and updates whenever any of the requisite inputs has changed. So, for example, when in a few months the vaunted Polity IV Project releases its 2013 update, my forecasting contraption would catch and ingest the new version and the forecasts would change accordingly. I now have scripts that can do the statistical part but am going to be leaning on other folks to automate the wider routine as part of the early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide.

The big upside of this opportunistic approach to updating is that the risk assessments are always as current as possible, conditional on the limitations of the available data. The way I figure, when you don’t have information that’s as fresh as you’d like, use the freshest information you’ve got.

The downside of this approach is that it’s not clear exactly what the outputs from that process represent. Technically, a forecast is a probabilistic statement about the likelihood of a specific event during a specific time period. The outputs from this process are still probabilistic statements about the likelihood of a specific event, but they are no longer anchored to a specific time period. The probabilities mapped at the top of this post mostly use data from 2012, but the inputs for some variables for some cases are a little older, while the inputs for some of the dynamic variables (e.g., GDP growth rates and coup attempts) are essentially current. So are those outputs forecasts for 2013, or for 2014, or something else?

For now, I’m going with “something else” and am thinking of the outputs from this machinery as the most up-to-date statistical risk assessments I can produce, but not forecasts as such. That description will probably sound like fudging to most statisticians, but it’s meant to be an honest reflection of both the strengths and limitations of the underlying approach.

Any gear heads who’ve read this far, I’d really appreciate hearing your thoughts on this strategy and any ideas you might have on other ways to resolve this conundrum, or any other aspect of this forecasting process. As noted at the top, the data and code used to produce these estimates are posted online. This work is part of a soon-to-launch, public early-warning system, so we hope and expect that they will have some effect on policy and advocacy planning processes. Given that aim, it behooves us to do whatever we can to make them as accurate as possible, so I would very much welcome any suggestions on how to do or describe this better.

Finally and as promised, here is a dot plot of the estimates mapped above. Countries are shown in descending order by estimated risk. The gray dots mark the forecasts from the three component models, and the red dot marks the unweighted average.


PS. In preparation for a presentation on this work at an upcoming workshop, I made a new map of the current assessments that works better, I think, than the one at the top of this post. Instead of coloring by quintiles, this new version (below) groups cases into several bins that roughly represent doublings of risk: less than 1%, 1-2%, 2-4%, 4-8%, and 8-16%. This version more accurately shows that the vast majority of countries are at extremely low risk and more clearly shows variations in risk among the ones that are not.

Estimated Risk of New State-Led Mass Killing


A Research Note on Updating Coup Forecasts

A new year is about to start, and that means it’s time for me to update my coup forecasts (see here and here for the 2013 and 2012 editions, respectively). The forecasts themselves aren’t quite ready yet—I need to wait until mid-January for updates from Freedom House to arrive—but I am making some changes to my forecasting process that I thought I would go ahead and describe now, because the thinking behind them illustrates some important dilemmas and new opportunities for predictions of many kinds of political events.

When it comes time to build a predictive statistical model of some rare political event, it’s usually not the model specification that gives me headaches. For many events of interest, I think we now have a pretty good understanding of which methods and variables are likely to produce more accurate forecasts.

Instead, it’s the data, or really the lack thereof, that sets me to pulling my hair out. As I discussed in a recent post, things we’d like to include in our models fall into a few general classes in this regard:

  • No data exist (fuggeddaboudit)
  • Data exist for some historical period, but they aren’t updated (“HA-ha!”)
  • Data exist and are updated, but they are patchy and not missing at random (so long, some countries)
  • Data exist and are updated, but not until many months or even years later (Spinning Pinwheel of Death)

In the past, I’ve set aside measures that fall into the first three of those sets but gone ahead and used some from the fourth, if I thought the feature was important enough. To generate forecasts before the original sources updated, I either a) pulled forward the last observed value for each case (if the measure was slow-changing, like a country’s infant mortality rate) or b) hand-coded my own updates (if the measure was liable to change from year to year, like a country’s political regime type).

Now, though, I’ve decided to get out of the “artisanal updating” business, too, for all but the most obvious and uncontroversial things, like which countries recently joined the WTO or held national elections. I’m quitting this business, in part, because it takes a lot of time and the results may be pretty noisy. More important, though, I’m also quitting because it’s not so necessary any more, thanks to timelier updates from some data providers and the arrival of some valuable new data sets.

This commitment to more efficient updating has led me to adopt the following rules of thumb for my 2014 forecasting work:

  • For structural features that don’t change much from year to year (e.g., population size or infant mortality), include the feature and use the last observed value.
  • For variables that can change from year to year in hard-to-predict ways, only include them if the data source is updated in near-real time or, if it’s updated annually, if those updates are delivered within the first few weeks of the new year.
  • In all cases, only use data that are publicly available, to facilitate replication and to encourage more data sharing.

And here are some of the results of applying those rules of thumb to the list of features I’d like to include in my coup forecasting models for 2014.

  • Use Powell and Thyne’s list of coup events instead of Monty Marshall’s. Powell and Thyne’s list is updated throughout the year as events occur, whereas the publicly available version of Marshall’s list is only updated annually, several months after the start of the year. That wouldn’t matter so much if coups were only the dependent variable, but recent coup activity is also an important predictor, so I need the last year’s updates ASAP.
  • Use Freedom House’s Freedom in the World (FIW) data instead of Polity IV to measure countries’ political regime type. Polity IV offers more granular measures of political regime type than Freedom in the World, but Polity updates aren’t posted until spring or summer of the following year, usually more than a third of the way into my annual forecasting window.
  • Use IMF data on economic growth instead of the World Bank’s. The Bank now updates its World Development Indicators a couple of times a year, and there’s a great R package that makes it easy to download the bits you need. That’s wonderful for slow-changing structural features, but it still doesn’t get me data on economic performance as fast as I’d like. I work around that problem by using the IMF’s World Economic Outlook Database, which includes projections for years for which observed data aren’t yet available and forecasts for several years into the future.
  • Last but not least, use GDELT instead of UCDP/PRIO or Major Episodes of Political Violence (MEPV) to measure civil conflict. Knowing which countries have had civil unrest or violence in the recent past can help us predict coup attempts, but the major publicly available measures of these things are only updated well into the year. GDELT now represents a nice alternative. It covers the whole world, measures lots of different forms of political cooperation and conflict, and is updated daily, so country-year updates are available on January 2. GDELT’s period of observation starts in 1979, so it’s still a stretch to use it in models of super-rare events like mass-killing onsets, where the number of available examples since 1979 on which to train is still relatively small. For less-rare events like coup attempts, though, starting the analysis around 1980 is no problem. (Just don’t forget to normalize them!) With some help from John Beieler, I’m already experimenting with adding annual GDELT summaries to my coup forecasting process, and I’m finding that they do improve the model’s out-of-sample predictive power.
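On that normalization point: GDELT’s underlying news volume grows over time, so a raw count of conflict events means something different in 1980 than it does in 2013. One common fix is to scale each country-year tally by the total number of events recorded that year. A toy sketch of the idea (the counts are invented):

```python
def normalize_counts(conflict_events, total_events):
    """Scale raw conflict-event counts by total recorded events, so that
    growth in GDELT's source volume doesn't masquerade as rising conflict."""
    return [c / t if t else 0.0 for c, t in zip(conflict_events, total_events)]

# The same raw count means far less in a year with 10x the total coverage
shares = normalize_counts([50, 50], [10_000, 100_000])
# shares -> [0.005, 0.0005]
```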

In all of the forecasting work I do, my long-term goals are 1) to make the forecasts more dynamic by updating them more frequently (e.g., monthly, weekly, or even daily instead of yearly) and 2) to automate that updating process as much as possible. The changes I’m making to my coup forecasting process for 2014 don’t directly accomplish either of these things, but they do take me a few steps in both directions. For example, once GDELT is in the mix, it’s possible to start thinking about how to switch to monthly or even daily updates that rely on a sliding window of recent GDELT tallies. And once I’ve got a coup data set that updates in near-real time, I can imagine pinging that source each day to update the counts of coup attempts in the past several years. I’m still not where I’d like to be, but I think I’m finally stepping onto a path that can carry me there.

Singing the Missing-Data Blues

I’m currently in the throes of assembling data to use in forecasts on various forms of political change in countries worldwide for 2014. This labor-intensive process is the not-so-sexy side of “data science” that practitioners like to bang on about if you ask us, but I’m not going to do that here. Instead, I’m going to talk about how hard it is to find data sets that applied forecasters of rare events in international politics can even use in the first place. The steep data demands for predictive models mean that many of the things we’d like to include in our models get left out, and many of the data sets political scientists know and like aren’t useful to applied forecasters.

To see what I’m talking about, let’s assume we’re building a statistical model to forecast some rare event Y in countries worldwide, and we have reason to believe that some variable X should help us predict that Y. If we’re going to include X in our model, we’ll need data, but any old data won’t do. For a measure of X to be useful to an applied forecaster, it has to satisfy a few requirements. This Venn diagram summarizes the four I run into most often:


First, that measure of X has to be internally consistent. Validity is much less of a concern than it is in hypothesis-testing research, since we’re usually not trying to make causal inferences or otherwise build theory. If our measure of X bounces around arbitrarily, though, it’s not going to provide much of a predictive signal, no matter how important the concept underlying X may be. Similarly, if the process by which that measure of X is generated keeps changing—say, national statistical agencies make idiosyncratic revisions to their accounting procedures, or coders keep changing their coding rules—then models based on the earlier versions will quickly break. If we know the source or shape of the variation, we might be able to adjust for it, but we aren’t always so lucky.

Second, to be useful in global forecasting, a data set has to offer global coverage, or something close to it. It’s really as simple as that. In the most commonly used statistical models, if a case is missing data on one or more of the inputs, it will be missing from the outputs, too. This is called listwise deletion, and it means we’ll get no forecast for cases that are missing values on any one of the predictor variables. Some machine-learning techniques can generate estimates in the face of missing data, and there are ways to work around listwise deletion in regression models, too (e.g., create categorical versions of continuous variables and treat missing values as another category). But those workarounds aren’t alchemy, and less information means less accurate forecasts.
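That categorical workaround for listwise deletion is easy to illustrate. A minimal Python sketch (the cutpoints and labels here are invented; any binning scheme works the same way):

```python
def bin_with_missing(values, cutpoints):
    """Turn a continuous predictor into categories, treating missing values
    as their own category so those cases survive listwise deletion."""
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        else:
            # Label by how many cutpoints the value meets or exceeds
            labels.append("bin_%d" % sum(v >= c for c in cutpoints))
    return labels

# A measure like infant mortality, with a couple of country-years unobserved
cats = bin_with_missing([12.0, None, 85.5], cutpoints=[25, 75])
# cats -> ['bin_0', 'missing', 'bin_2']
```

The cost is obvious: we throw away within-bin variation and treat “missing” as informative in itself, which it often is, but not always in a stable way.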

Worse, the holes in our global data sets usually form a pattern, and that pattern is often correlated with the very things we’re trying to predict. For example, the poorest countries in the world are more likely to experience coups, but they are also more likely not to be able to afford the kind of bureaucracy that can produce high-quality economic statistics. Authoritarian regimes with frustrated citizens may be more likely to experience popular uprisings, but many autocrats won’t let survey research firms ask their citizens politically sensitive questions, and many citizens in those regimes would be reluctant to answer those questions candidly anyway. The fact that our data aren’t missing at random compounds the problem, leaving us without estimates for some cases and screwing up our estimates for the rest. Under these circumstances, it’s often best to omit the offending data set from our modeling process entirely, even if the X it’s measuring seems important.

Third and related to no. 2, if our events are rare, then our measure of X needs historical depth, too. To estimate the forecasting model, we want as rich a library of examples as we can get. For events as rare as onsets of violent rebellion or episodes of mass killing, which typically occur in just one or a few countries worldwide each year, we’ll usually need at least a few decades’ worth of data to start getting decent estimates on the things that differentiate the situations where the event occurs from the many others where it doesn’t. Without that historical depth, we run into the same missing-data problems I described in relation to global coverage.

I think this criterion is much tougher to satisfy than many people realize. In the past 10 or 20 years, statistical agencies, academic researchers, and non-governmental organizations have begun producing new or better data sets on all kinds of things that went unmeasured or poorly measured in the past—things like corruption or inflation or unemployment, to name a few that often come up in conversations about what predicts political instability and change. Those new data sets are great for expanding our view of the present, and they will be a boon to researchers of the future. Unfortunately, though, they can’t magically reconstruct the unobserved past, so they still aren’t very useful for predictive models of rare political events.

The fourth and final circle in that Venn diagram may be both the most important and the least appreciated by people who haven’t tried to produce statistical forecasts in real time: we need timely updates. If I can’t depend on the delivery of fresh data on X before or early in my forecasting window, then I can’t update my forecasts while they’re still relevant, and the model is effectively DOA. If X changes slowly, we can usually get away with using the last available observation until the newer stuff shows up. Population size and GDP per capita are a couple of variables for which this kind of extrapolation is generally fine. Likewise, if the variable changes predictably, we might use forecasts of X before the observed values become available. I sometimes do this with GDP growth rates. Observed data for one year aren’t available for many countries until deep into the next year, but the IMF produces decent forecasts of recent and future growth rates that can be used in the interim.
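The pull-forward trick for slow-moving variables amounts to a last-observation-carried-forward fill. A minimal sketch (the figures are invented):

```python
def carry_forward(series):
    """Fill gaps in a slow-moving series (e.g., population or GDP per capita)
    by pulling the last observed value forward until fresh data arrive."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# The two most recent years aren't published yet; reuse the last figure
pop = carry_forward([22.1, 22.4, None, None])
# pop -> [22.1, 22.4, 22.4, 22.4]
```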

Maddeningly, though, this criterion alone renders many of the data sets scholars have painstakingly constructed for specific research projects useless for predictive modeling. For example, scholars in recent years have created numerous data sets to characterize countries’ national political regimes, a feature that scores of studies have associated with variation in the risk of many forms of political instability and change. Many of these “boutique” data sets on political regimes are based on careful research and coding procedures, cover the whole world, and reach at least several decades or more into the past. Only two of them, though—Polity IV and Freedom House’s Freedom in the World—are routinely updated. As much as I’d like to use unified democracy scores or measures of authoritarian regime type in my models, I can’t without painting myself into a forecasting corner, so I don’t.

As I hope this post has made clear, the set formed by the intersection of these four criteria is a tight little space. The practical requirements of applied forecasting mean that we have to leave out of our models many things that we believe might be useful predictors, no matter how important the relevant concepts might seem. They also mean that our predictive models on many different topics are often built from the same few dozen “usual suspects”—not because we want to, but because we don’t have much choice. Multiple imputation and certain machine-learning techniques can mitigate some of these problems, but they hardly eliminate them, and the missing information affects our forecasts either way. So the next time you’re reading about a global predictive model on international politics and wondering why it doesn’t include something “obvious” like unemployment or income inequality or survey results, know that these steep data requirements are probably the reason.

A Rumble of State Collapses

The past couple of years have produced an unusually large number of collapsed states around the world, and I think it’s worth pondering why.

As noted in a previous post, when I say “state collapse,” I mean this:

A state collapse occurs when a sovereign state fails to provide public order in at least one-half of its territory or in its capital city for at least 30 consecutive days. A sovereign state is regarded as failing to provide public order in a particular area when a) an organized challenger, usually a rebel group or regional government, effectively controls that area; b) lawlessness pervades in that area; or c) both. A state is considered sovereign when it is granted membership in the U.N. General Assembly.

The concepts used in this definition are very hard to observe, so I prefer to make probabilistic instead of categorical judgments about which states have crossed this imaginary threshold. In other words, I think state collapse is more usefully treated as a fuzzy set instead of a crisp one, so that’s what I’ll do here.

At the start of 2011, there was only one state I would have confidently identified as collapsed: Somalia. Several more were plausibly collapsed or close to it—Afghanistan, Central African Republic (CAR), and Democratic Republic of Congo (DRC) come to mind—but only Somalia was plainly over the line.

By my reckoning, four states almost certainly collapsed in 2011-2012—Libya, Mali, Syria, and Yemen—and Central African Republic probably did. That’s a four- or five-fold increase in the prevalence of state collapse in just two years. In all five cases, collapse was precipitated by the territorial gains of armed challengers. So far, only three of the five states’ governments have fallen, but Assad and Bozizé have both seen the reach of their authority greatly circumscribed, and my guess is that neither will survive politically through the end of 2013.

I don’t have historical data to which I can directly compare these observations, but Polity’s “interregnum” (-77) indicator offers a useful (if imperfect) proxy. The column chart below plots annual counts of Polity interregnums (interregna? interregni? what language is this, anyway?) since 1945. A quick glance at the chart indicates that both the incidence and prevalence of state collapse seen in the past two years—which aren’t shown in the plot because Polity hasn’t yet been updated to the present—are historically rare. The only comparable period in the past half-century came in the early 1990s, on the heels of the USSR’s disintegration. (For those of you wondering, the uptick in 2010 comes from Haiti and Ivory Coast. I hadn’t thought of those as collapsed states, and their addition to the tally would only make the past few years look that much more exceptional.)
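The tally behind a chart like this is easy to reproduce. Here’s a minimal Python sketch, assuming Polity-style records in which a score of -77 flags an interregnum; the rows below are made up for illustration, not actual Polity data:

```python
from collections import Counter

# Hypothetical Polity-style records: (country, year, Polity-style code),
# where -77 marks an interregnum. These rows are invented examples.
records = [
    ("Somalia", 1991, -77), ("Somalia", 1992, -77),
    ("Liberia", 1991, -77),
    ("Haiti", 2010, -77), ("Ivory Coast", 2010, -77),
    ("Norway", 1991, 10),
]

# Annual prevalence: how many states were coded -77 in each year.
interregnums_by_year = Counter(
    year for _, year, code in records if code == -77
)

print(sorted(interregnums_by_year.items()))
# [(1991, 2), (1992, 1), (2010, 2)]
```

Plotting those yearly counts as columns gives the chart below.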

Annual Counts of Polity Interregnums, 1946-2010

I still don’t understand this phenomenon well enough to say anything with assurance about why this “rumble” of state collapses is occurring right now, but I have some hunches. At the systemic level, I suspect that shifts in the relative power of big states are partly responsible for this pattern. Political authority is, in many ways, a confidence game, and growing uncertainty about major powers’ will and ability to support the status quo may be increasing the risk of state collapse in countries and regions where that support has been especially instrumental.

Second and related is the problem of contagion. The collapses of the past two years are clearly interconnected. Successful revolutions in Tunisia and Egypt spurred popular uprisings in many Arab countries, including Libya, Syria, and Yemen. Libya’s disintegration fanned the rebellion that precipitated a coup and then collapse in Mali. Only CAR seems disconnected from the Arab Spring, and I wonder if the rebels there didn’t time their offensive, in part, to take advantage of the region’s current distraction with the turmoil to its northwest.

Surely there are many other forces at work, too, most of them local and none of them deterministic. Still, I think these two make a pretty good starting point, and they suggest that the current rumble probably isn’t over yet.

What Darwin Teaches Us about Political Regime Types

Here’s a paragraph, from a 2011 paper by Ian Lustick, that I really wish I’d written. It’s long, yes, but it rewards careful reading.

One might naively imagine Darwin’s theory of the “origin of species” to be “only” about animals and plants, not human affairs, and therefore presume its irrelevance for politics. But what are species? The reason Darwin’s classic is entitled Origin of Species and not Origin of the Species is because his argument contradicted the essentialist belief that a specific, finite, and unchanging set of categories of kinds had been primordially established. Instead, the theory contends, “species” are analytic categories invented by observers to correspond with stabilized patterns of exhibited characteristics. They are no different in ontological status than “varieties” within them, which are always candidates for being reclassified as species. These categories are, in essence, institutionalized ways of imagining the world. They are institutionalizations of difference that, although neither primordial nor permanent, exert influence on the futures the world can take—both the world of science and the world science seeks to understand. In other words, “species” are “institutions”: crystallized boundaries among “kinds”, constructed as boundaries that interrupt fields of vast and complex patterns of variation. These institutionalized distinctions then operate with consequences beyond the arbitrariness of their location and history to shape, via rules (constraints on interactions), prospects for future kinds of change.

This is one of the big ideas to which I was trying to allude in a post I wrote a couple of months ago on “complexity politics”, and in an ensuing post that used animated heat maps to trace gross variations in forms of government over the past 211 years. Political regime types are the species of comparative politics. They are “analytic categories invented by observers to correspond with stabilized patterns of exhibited characteristics.” In short, they are institutionalized ways of thinking about political institutions. The patterns they describe may be real, but they are not essential. They’re not the natural contours of the moon’s surface; they’re the faces we sometimes see in them.

Mary Goodden’s Taxonomy of Video Games

If we could just twist our mental kaleidoscopes a bit, we might find different things in the same landscape. One way to do that would be to use a different set of measures. For the past 20 years or so, political scientists have relied almost exclusively on the same two data sets—Polity and Freedom House’s Freedom in the World—to describe and compare national political regimes in anything other than prose. These data sets are very useful, but they are also profoundly conventional. Polity offers a bit more detail than Freedom House on specific features of national politics, but the two are essentially operationalizing the same assumptions about the underlying taxonomy of forms of government.

Given that fact, it’s hard to see how further distillations of those data sets might surprise us in any deep way. A new project called Varieties of Democracy (V-Dem) promises to bring fresh grist to the mill by greatly expanding the number of institutional elements we can track, but it is still inherently orthodox. Its creators aren’t trying to reinvent the taxonomy; they’re looking to do a better job locating individual cases in the prevailing one. That’s a worthy and important endeavor, but it’s not going to produce the kind of gestalt shift I’m talking about here.

New methods of automated text analysis just might. My knowledge of this field is quite limited, but I’m intrigued by the possibilities of applying unsupervised learning techniques, such as latent Dirichlet allocation (LDA), to the problem of identifying political forms and associating specific cases with them. In contrast to conventional measurement strategies, LDA doesn’t oblige us to specify a taxonomy ahead of time and then look for instances of the things in it. Instead, LDA assumes there is an infinite mixture of overlapping but latent categories out there, and these latent categories are partially revealed by characteristic patterns in the ways we talk and write about the world.
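To make that less abstract, here’s a toy sketch of the idea using scikit-learn’s off-the-shelf LDA implementation—my choice for illustration, not a method the field has settled on. The “country reports” and the number of topics below are invented assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus of invented "country reports"; in practice these would be
# news stories or analytic texts about each country's politics.
docs = [
    "elections parliament opposition vote coalition",
    "vote elections campaign parliament debate",
    "army coup junta curfew crackdown",
    "crackdown army arrests junta protests",
]

# Bag-of-words counts, then LDA with a chosen number of latent "regimes."
# Note that the number of topics (2 here) is an analyst's assumption,
# not something LDA discovers on its own.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Each document gets a mixture over topics rather than a single crisp
# label: one row per document, one column per latent category.
print(doc_topics.shape)  # (4, 2); each row sums to 1
```

The payoff is in that last line: cases come out as overlapping mixtures of latent forms, not as members of exactly one predefined box.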

Unsupervised learning is still constrained by the documents we choose to include and the language we use in them, but it should help us find patterns in the practice of politics that our conventional taxonomies overlook. I hope to get some funding to try this approach in the near future, and if that happens, I’m genuinely excited to see what we find.

211 Years of Political Evolution in 60 Seconds — New and Improved!!

The heat maps used in the animation I posted yesterday plotted change over time in counts of countries in each cell of a two-dimensional space representing different kinds of political institutions. Over the 211 years in question, however, the number of countries in the world has grown dramatically, from about 50 in 1800 to well over 150 in 2011. For that reason, a couple of commenters wondered whether we would see something different if we plotted proportions instead of counts, using the size of the total population as the denominator in each cell. Proportions better fit the ideas behind a fitness landscape, so I added a line to my code and gave it a whirl. Here’s what I got:

To my eye, there aren’t any big differences in the patterns we see here compared with the ones based on counts. Re-watching the animation today, though, here are a few other things that caught my attention:

  • The predominance in the mid-1800s of intermediate forms combining authoritarian selection with highly polarized political participation—what Polity calls “factionalism.” This peak in the middle left of the heat maps shows how popular mobilization generally led to competitive elections, and not the other way around. As historian Sean Wilentz wrote, “Democracy is never a gift bestowed…It must always be fought for.” It also reminds us that popular mobilization was initially quite polarized in the “developed” world (ha!), just as it often is in poorer countries today.
  • The wide variety of intermediate forms present in the early 1900s. Here we see a bunch of cases in the upper left-hand quadrant, combining authoritarian selection procedures with open and well-regulated participation. This is a combination we almost never see nowadays. It looks like there were some interesting experiments occurring in the wake of the industrial explosion that occurred in richer countries in the latter half of the nineteenth century.
  • The sharp bifurcation of the fitness landscape after World War II. Before the war, the peak in the lower left-hand corner representing closed dictatorships had shrunk, and there seemed to be more action in the upper left and lower right quadrants. After the war, the peak in the lower left rose again and remained there until around 1990. This pattern makes clearer that the evolution of the past two centuries has not been a steady march toward democracy. It’s interesting—and potentially chilling—to contemplate how much the fitness landscape of the past 70 years might have differed had World War II taken different turns.

211 Years of Political Evolution in 60 Seconds

The GIF below—click on it to make it play—animates a series of 211 heat maps summarizing annual data on national political regimes around the world from 1800 to 2010. The space in the heat maps represents two of the “concept” variables from the Polity IV data set—executive recruitment and political competition—that roughly correspond to the dimensions of contestation and participation Robert Dahl uses to define modern regime types. In the animated maps, the lower left is least democratic, and the upper right is most democratic. The darker the grey, the higher the number of cases in that cell. [NB. For a version that uses proportions instead of raw counts and some additional thoughts on patterns over time, see this short follow-up post.]

[Fellow propeller-heads: I built this in R with helpful suggestions from Trey Causey and Tom Parris along the way. The heat maps were made with a function appropriately called ‘heatmap’, and I used the ‘animation’ package to compile those images into a .gif. Ping me if you’d like to see the script.]
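For anyone who’d rather not ping me, the core of each frame is just a two-dimensional histogram. Here’s a rough Python analogue of that step—not my actual R script; the country scores below are invented, and compiling the frames into a GIF is left aside:

```python
import numpy as np

# Hypothetical country scores for a single year on Polity's two "concept"
# dimensions (executive recruitment: 1-8, political competition: 1-10).
# These values are invented for illustration.
exec_recruitment = np.array([1, 1, 2, 7, 8, 8, 8])
political_comp   = np.array([1, 2, 1, 9, 10, 10, 9])

# Bin countries into a 2-D grid, one cell per (recruitment, competition)
# combination; the counts give the height of the fitness landscape
# for that year. Bin edges straddle each integer score.
grid, _, _ = np.histogram2d(
    exec_recruitment, political_comp,
    bins=[np.arange(0.5, 9), np.arange(0.5, 11)],
)

print(grid.shape, grid.sum())  # (8, 10) 7.0 -- one frame, seven countries
```

Repeating this for each year gives the stack of frames, and dividing each grid by that year’s country count turns the raw counts into the proportions used in the follow-up version.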

I made this animation because I think it supports the idea, discussed briefly in my last post, that political development is an evolutionary process. Evolutionary processes feed on diversity and mutation, but the results of evolution are not randomly distributed. Borrowing from Daniel Dennett, we can imagine evolution occurring in a multidimensional design space that contains all possible combinations of a particular set of building blocks. In biology, those building blocks are genes; in politics, they might be simple rules.

For present purposes, let’s imagine that there are only two dimensions in this design space. Those two dimensions suggest a map of the design space that evolutionary biologists call a fitness landscape. The topography of this landscape is determined by the fitness of specific combinations, as indicated by sizes of the relevant populations. That’s what the heat maps in the animation above are showing.

The existence of the system is a matter of chance, but once an evolutionary system emerges, we can expect to see certain patterns. The selection pressures present in any particular environment mean that some combinations will be fitter than others, producing visible and often durable peaks in that fitness landscape. Mutation—and, in the case of social technologies like government, deliberate tinkering—will keep producing new varieties, but most won’t be fit enough for the environment of the day to survive and spread. As a result, most of the variation will cluster around the existing peaks, because small differences in design will often (but not always!) produce small differences in fitness.

When selection pressures change, however, the designs embodied in the previous peaks will often become less fit, and new designs will emerge as stronger competitors. Importantly, though, that transition from the old peaks to new ones usually won’t be smooth and direct. Instead, as Niles Eldredge and Stephen Jay Gould describe in their model of punctuated equilibrium, we can expect to see bursts of diversity as the evolutionary engine “searches” for new forms that better fit the changing environment. As the selection pressures settle into a new normal, the fitness landscape should also settle back into the familiar pattern of clearer peaks and valleys.

The two Polity variables used here are, of course, gross and conceptually biased simplifications of complex phenomena. Underlying each of these dimensions are a few component variables that are themselves simplifications of complex sets of written and unwritten rules. Still, the Polity data are the best we’ve got right now for observing change over a long period of time, and it’s pretty hard for us humans to visualize four- or seven- or thirty-dimensional space. So, for now, I’m using these two summary indices to get a very rough map of the design space for modern political institutions.

Maybe it’s confirmation bias at work, but when I watch the animation above, I see the patterns evolutionary theorists tell me I should see. In 1800, the fitness landscape is dominated by a single peak representing highly undemocratic regimes—mostly monarchies with virtually no popular participation. If we could extend the movie back several more centuries, we would see the same pattern holding through the entirety of human civilization since our hunter-gatherer days.

Pretty soon after we drop in to watch, however, things start to move. In the early 1800s, a couple of new lumps rise as popular participation expands in some regimes. Most countries still select their rulers by hereditary lineage or other closed means (the peak in the middle left), but some start using competitive elections to pick their governments. By the late nineteenth century, a second peak has clearly emerged in the upper right-hand corner, where rulers are chosen through competitive elections with broad participation. [NB: I think Polity rushes things a bit here by ignoring the disenfranchisement of women, but we go to publish with the data we’ve got, not the data we’d like.]

Through most of the twentieth century, the same general pattern holds. There’s a fair amount of variation, but most regimes are concentrated in the same few patches of the design space. At the end of the twentieth and start of the twenty-first centuries, however, we see a burst of diversity. The authoritarian peak shrinks, the democratic peak holds, and large swathes of the design space that have rarely been occupied bubble with activity.

To my eye, this very recent phase looks like one of Eldredge and Gould’s punctuation marks, that is, an episode of heightened diversity caused by a significant shift in selection pressures. Most observers of international politics won’t be surprised to see this pattern, and many of them would probably attribute it to the end of the Cold War. I’m not so sure. I’m more inclined to see the collapse of the Soviet Union and the expansion in the diversity of political forms as twin consequences of deeper changes in the global system that seem to be favoring democratic forms over authoritarian ones. What new peaks we’ll see when the system settles down again—and on what heretofore hidden dimensions of political design space they might draw—is impossible to know, but it sure is fascinating to watch.

