In a recent post, I described an ongoing project in which Shahryar Minhas, Mike Ward, and I are using text mining and machine learning to produce fuzzy-set measures of various political regime types for all countries of the world. As part of the NSF-funded MADCOW project,* our ultimate goal is to devise a process that routinely updates those data in near-real time at low cost. We’re not there yet, but our preliminary results are promising, and we plan to keep tinkering.

One of the crucial choices we had to make in our initial analysis was how to measure each regime type for the machine-learning phase of the process. This choice is important because our models are only going to be as good as the data from which they’re derived. If the targets in that machine-learning process don’t reliably represent the concepts we have in mind, then the resulting models will be looking for the wrong things.

For our first cut, we decided to use dichotomous measures of several regime types, and to base those dichotomous measures on stringent criteria. So, for example, we identified as democracies only those cases with a score of 10, the maximum, on Polity’s scalar measure of democracy. For military rule, we only coded as 1 those cases where two major data sets agreed that a regime was authoritarian and only military-led, with no hybrids or modifiers. Even though the targets of our machine-learning process were crisply bivalent, we could get fuzzy-set measures from our classifiers by looking at the probabilities of class membership they produce.
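The stringent coding rules described above can be sketched in a few lines of Python. This is only an illustration: the function names, the argument names, and the string labels for the two military-rule sources are hypothetical, since the post does not say how the underlying data sets encode their categories.

```python
def is_democracy(polity_democ):
    """Code a case as 1 only at the maximum (10) of Polity's democracy scale."""
    return 1 if polity_democ == 10 else 0


def is_military(label_a, label_b):
    """Code a case as 1 only when both sources call it pure military rule.

    The "military" label is a placeholder; hybrids such as
    "military-personal" are deliberately coded 0.
    """
    return 1 if label_a == "military" and label_b == "military" else 0


# A case scoring 9 on Polity is close to the top but still coded 0
near_miss = is_democracy(9)
# A hybrid label in either source keeps the case out of the military set
hybrid = is_military("military", "military-personal")
```

The point of rules this strict is that the 1s are cases almost nobody would dispute, which leaves the ambiguity to show up later in the classifier's predicted probabilities rather than in the targets.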

In future iterations, though, I’m hoping we’ll get a chance to experiment with targets that are themselves fuzzy or that just take advantage of a larger information set. Bayesian measurement error models offer a great way to generate those targets.

Imagine that you have a set of cases that may or may not belong in some category of interest—say, democracy. Now imagine that you’ve got a set of experts who vote yes (1) or no (0) on the status of each of those cases and don’t always agree. We can get a simple estimate of the probability that a given case is a democracy by averaging the experts’ votes, and that’s not necessarily a bad idea. If, however, we suspect that some experts are more error prone than others, and that the nature of those errors follows certain patterns, then we can do better with a model that gleans those patterns from the data and adjusts the averaging accordingly. That’s exactly what a Bayesian measurement error model does. Instead of an unweighted average of the experts’ votes, we get an inverse-error-rate-weighted average, which should be more reliable than the unweighted version if the assumption about predictable patterns in those errors is largely correct.
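To make the contrast with simple averaging concrete, here is a minimal Python sketch. It is not the Bayesian model Sean estimated. It swaps in a simpler relative, a two-class latent class model fit by expectation-maximization with add-one smoothing, that captures the same intuition: learn each source's error rates from the pattern of agreement and discount the unreliable ones. The function name, the toy votes, and the smoothing choice are all assumptions made for illustration.

```python
import math


def estimate_membership(votes, n_iter=50):
    """Estimate P(case is in the set) from 0/1/None votes with a simple EM loop.

    votes: one row per case, one entry per source
    (1 = yes, 0 = no, None = source does not cover the case).
    """
    n_sources = len(votes[0])
    # Start from the unweighted average of the available votes
    p = []
    for row in votes:
        seen = [v for v in row if v is not None]
        p.append(sum(seen) / len(seen) if seen else 0.5)

    for _ in range(n_iter):
        # M-step: re-estimate the base rate and each source's hit rates,
        # with add-one smoothing so nothing reaches exactly 0 or 1
        base = (sum(p) + 1) / (len(p) + 2)
        sens, spec = [], []
        for j in range(n_sources):
            hit1 = tot1 = hit0 = tot0 = 0.0
            for row, pi in zip(votes, p):
                v = row[j]
                if v is None:
                    continue
                hit1 += pi * v
                tot1 += pi
                hit0 += (1 - pi) * (1 - v)
                tot0 += 1 - pi
            sens.append((hit1 + 1) / (tot1 + 2))
            spec.append((hit0 + 1) / (tot0 + 2))

        # E-step: update each case's membership probability
        new_p = []
        for row in votes:
            log1, log0 = math.log(base), math.log(1 - base)
            for j, v in enumerate(row):
                if v is None:
                    continue
                log1 += math.log(sens[j] if v == 1 else 1 - sens[j])
                log0 += math.log(spec[j] if v == 0 else 1 - spec[j])
            new_p.append(1 / (1 + math.exp(min(log0 - log1, 700))))
        p = new_p
    return p, sens, spec


# Invented votes: four sources that always agree and a fifth that is erratic
votes = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
probs, sens, spec = estimate_membership(votes)
raw = [sum(row) / len(row) for row in votes]  # unweighted averages
```

In this toy example the second case gets an unweighted average of 0.8, but the model pushes its probability much closer to 1 because the lone dissenting vote comes from the source it has learned to trust least. A fully Bayesian version would also put priors on the error rates and return a posterior distribution for each case rather than a single number.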

I’m not trained in Bayesian data analysis and don’t know my way around the software used to estimate these models, so I sought and received generous help on this task from Sean J. Taylor. I compiled yes/no measures of democracy from five country-year data sets that ostensibly use similar definitions and coding criteria:

- Cheibub, Gandhi, and Vreeland’s Democracy and Dictatorship (DD) data set, 1946–2008 (here);
- Boix, Miller, and Rosato’s dichotomous coding of democracy, 1800–2007 (here);
- A binary indicator of democracy derived from Polity IV using the Political Instability Task Force’s coding rules, 1800–2013;
- The lists of electoral democracies in Freedom House’s annual *Freedom in the World* reports, 1989–2013; and
- My own Democracy/Autocracy data set, 1955–2010 (here).

Sean took those five columns of zeroes and ones and used them to estimate a model with no prior assumptions about the five sources’ relative reliability. James Melton, Stephen Meserve, and Daniel Pemstein use the same technique to produce the terrific Unified Democracy Scores. What we’re doing is a little different, though. Where their approach treats democracy as a scalar concept and estimates a composite index from several measures, we’re accepting the binary conceptualization underlying our five sources and estimating the probability that a country qualifies as a democracy. In fuzzy-set terms, this probability represents a case’s degree of membership in the democracy set, not how democratic it is.

The distinction between a country’s degree of membership in that set and its degree of democracy is subtle but potentially meaningful, and the former will sometimes be a better fit for an analytic task than the latter. For example, if you’re looking to distinguish categorically between democracies and autocracies in order to estimate the difference in some other quantity across the two sets, it makes more sense to base that split on a probabilistic measure of set membership than an arbitrarily chosen cut point on a scalar measure of democracy-ness. You would still need to choose a threshold, but “greater than 0.5” has a natural interpretation (“probably a democracy”) that suits the task in a way that an arbitrary cut point on an index doesn’t. And, of course, you could still perform a sensitivity analysis by moving the cut point around and seeing how much that choice affects your results.
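The sensitivity analysis mentioned above is easy to sketch. The Python below splits cases at several cut points on the membership probability and reports the difference in the mean of some other quantity across the two groups. The function name, the grid of cut points, and the data are invented for illustration.

```python
from statistics import mean


def gap_by_cut_point(probs, outcome, cut_points=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return {cut point: mean(outcome | p > cut) - mean(outcome | p <= cut)}."""
    results = {}
    for cut in cut_points:
        members = [y for p, y in zip(probs, outcome) if p > cut]
        others = [y for p, y in zip(probs, outcome) if p <= cut]
        # Leave the gap undefined if either group is empty
        results[cut] = mean(members) - mean(others) if members and others else None
    return results


# Invented membership probabilities and an invented outcome variable
probs = [0.02, 0.10, 0.45, 0.55, 0.65, 0.90, 0.98]
outcome = [1.0, 2.0, 3.0, 5.0, 6.0, 8.0, 9.0]
gaps = gap_by_cut_point(probs, outcome)
```

If the estimated gap barely moves as the cut point shifts between 0.3 and 0.7, the substantive conclusion does not hinge on how the hazy cases are assigned.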

So that’s the theory, anyway. What about the implementation?

I’m excited to report that the estimates from our initial measurement model of democracy look great to me. As someone who has spent a lot of hours wringing my hands over the need to make binary calls on many ambiguous regimes (Russia in the late 1990s? Venezuela under Hugo Chavez? Bangladesh between coups?), I think these estimates are accurately distinguishing the hazy cases from the rest and even doing a good job estimating the extent of that uncertainty.

As a first check, let’s take a look at the distribution of the estimated probabilities. The histogram below shows the estimates for the period 1989–2007, the only years for which we have inputs from all five of the source data sets. Voilà, the distribution has the expected shape. Most countries most of the time are readily identified as democracies or non-democracies, but the membership status of a sizable subset of country-years is more uncertain.
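The 1989–2007 window follows directly from the coverage years of the five sources listed earlier: it runs from the latest start year to the earliest end year. A quick check in Python, where the short dictionary keys are just labels chosen for this example:

```python
# Coverage (first year, last year) of each source, as given in the list above
coverage = {
    "DD": (1946, 2008),
    "Boix-Miller-Rosato": (1800, 2007),
    "Polity IV / PITF rules": (1800, 2013),
    "Freedom House": (1989, 2013),
    "Democracy/Autocracy": (1955, 2010),
}

# The common window runs from the latest start to the earliest end
common_start = max(first for first, _ in coverage.values())
common_end = min(last for _, last in coverage.values())
```

Freedom House's lists set the lower bound and the Boix, Miller, and Rosato data set the upper bound.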

Of course, we can and should also look at the estimates for specific cases. I know a little more about countries that emerged from the collapse of the Soviet Union than I do about the rest of the world, so I like to start there when eyeballing regime data. The chart below compares scores for several of those countries that have exhibited more variation over the past 20+ years. Most of the rest of the post-Soviet states are slammed up against 1 (Estonia, Latvia, and Lithuania) or 0 (e.g., Uzbekistan, Turkmenistan, Tajikistan), so I left them off the chart. I also limited the range of years to the ones for which data are available from all five sources. By drawing strength from other years and countries, the model can produce estimates for cases with fewer or even no inputs. Still, the estimates will be less reliable for those cases, so I thought I would focus for now on the estimates based on a common set of “votes.”

Those estimates look about right to me. For example, Georgia’s status is ambiguous and trending less likely until the Rose Revolution of 2003, after which point it’s probably but not certainly a democracy, and the trend bends down again soon thereafter. Meanwhile, Russia is fairly confidently identified as a democracy after the constitutional crisis of 1993, but its status becomes uncertain around the passage of power from Yeltsin to Putin and then solidifies as most likely authoritarian by the mid-2000s. Finally, Armenia was one of the cases I found most difficult to code when building the Democracy/Autocracy data set for the Political Instability Task Force, so I’m gratified to see its probability of democracy oscillating around 0.5 throughout.

One nice feature of a Bayesian measurement error model is that, in addition to estimating the scores, we can also estimate credible intervals (the Bayesian counterpart of confidence intervals) to help quantify our uncertainty about those scores. The plot below shows Armenia’s trend line with the upper and lower bounds of a 90-percent credible interval. Here, it’s even easier to see just how unclear this country’s democracy status has been since it regained independence. From 1991 until at least 2007, its 90-percent interval straddled the toss-up line. How’s that for uncertain?
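For readers who have not worked with this kind of output, here is a rough sketch of how a 90-percent interval like the one described above can be read off a set of posterior draws for a single country-year. The draws are simulated from an arbitrary Beta distribution purely for illustration. They are not estimates from the model, and the function names are hypothetical.

```python
import random
from statistics import quantiles


def interval_90(draws):
    """Return the 5th and 95th percentiles of a list of posterior draws."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%
    cuts = quantiles(draws, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def straddles(interval, line=0.5):
    """True if the interval contains the toss-up line."""
    lower, upper = interval
    return lower < line < upper


random.seed(1)
# Simulated draws for one ambiguous country-year (invented, centered near 0.5)
draws = [random.betavariate(5, 5) for _ in range(2000)]
lower, upper = interval_90(draws)
ambiguous = straddles((lower, upper))
```

A case whose interval sits entirely above or below 0.5 can be called with some confidence. One that straddles the line is a toss-up in a fairly literal sense.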

Sean and I are still talking about ways to tweak this process, but I think the data it’s producing are already useful and interesting. I’m considering using these estimates in a predictive model of coup attempts and seeing if and how the results differ from ones based on the Polity index and the Unified Democracy Scores. Meanwhile, the rest of the MADCOW crew and I are now talking about applying the same process to dichotomous indicators of military rule, one-party rule, personal rule, and monarchy and then experimenting with machine-learning processes that use the results as their targets. There are lots of moving parts in our regime data-making process, and this one isn’t necessarily the highest priority, but it would be great to get to follow this path and see where it leads.

* NSF Award 1259190, Collaborative Research: Automated Real-time Production of Political Indicators

## Andy

/ August 31, 2014

Hi Jay,

I really enjoyed reading this and seeing your presentation at APSA. I’ve been thinking more about the discussant’s comments and the conversation we had on Twitter afterward (https://twitter.com/jay_ulfelder/status/505392596102172672), and I think part of the issue for people with the “membership in set X” vs. “degree of X” distinction is that we’re all very used to thinking about degrees of democracy. Treating democracy as a crisp and well-defined set is something that we’ve all been taught not to do, which makes it easy to treat the probability of being a democracy (or a Polity = 10 democracy) as a degree. Having the monarchy and military rule applications in the paper helps to get away from this, but I bet it’s the democracy part that people are getting hung up on.

Another potential application for this that just came to me would be to generate the monthly ICEWS ground truths (international crisis, ethnic and religious violence, domestic political crisis, rebellion, and insurgency). They’re crisply defined (according to them), and there’s obviously uncertainty around them, which they acknowledge to a degree by going back and revising previous months’ classifications. Having these ratings, with explicit uncertainty measures, available publicly, and updated frequently, would be great both as inputs for forecasts and as the thing to try to forecast.

What are your next plans for the project? Have you tried using news articles yet? I’m really looking forward to seeing where this goes.

## dartthrowingchimp

/ August 31, 2014

Thanks for following up here, Andy.

On the dichotomous vs. continuous measurement of democracy, I still buy Collier and Adcock’s argument that the choice depends on what you’re planning to use it for and why. So uni-dimensional scalar measures make sense for many applications, but a binary measure makes more sense for some, and multi-dimensional measures (like V-Dem) will make more sense for others. On when a binary measure would be required, event history analysis of regime survival is a good example. For this to work, you need cases to enter and exit the risk set, and that requires an “Is it or isn’t it?” call.

On the ICEWS events, yes, absolutely, this would be a great application of this approach and is actually part of what the larger MADCOW project endeavors to do. Coup attempts are one application that is near and dear to my heart. The measurement-model approach described here could be applied to the Marshall, Powell and Thyne, and Banks data sets, for example, and then the text-mining-and-machine-learning process could be applied to a binary target based on the resulting probability estimates (e.g., “probably a coup attempt” as indicated by p > 0.5).

On what’s next for our project, Mike and Shahryar and I will be discussing that soon. And no, we haven’t tried using news articles yet.