Mining Texts to Generate Fuzzy Measures of Political Regime Type at Low Cost

Political scientists use the term “regime type” to refer to the formal and informal structure of a country’s government. Of course, “government” entails a lot of things, so discussions of regime type focus more specifically on how rulers are selected and how their authority is organized and exercised. The chief distinction in contemporary work on regime type is between democracies and non-democracies, but there’s some really good work on variations of non-democracy as well (see here and here, for example).

Unfortunately, measuring regime type is hard, and conventional measures of regime type suffer from one or two crucial drawbacks.

First, many of the data sets we have now represent regime types or their components with bivalent categorical measures that sweep meaningful uncertainty under the rug. Specific countries at specific times are identified as fitting into one and only one category, even when researchers knowledgeable about those cases might be unsure or disagree about where they belong. For example, all of the data sets that distinguish categorically between democracies and non-democracies—like this one, this one, and this one—agree that Norway is the former and Saudi Arabia the latter, but they sometimes diverge on the classification of countries like Russia, Venezuela, and Pakistan, and rightly so.

Importantly, the degree of our uncertainty about where a case belongs may itself be correlated with many of the things that researchers use data on regime type to study. As a result, findings and forecasts derived from those data are likely to be sensitive to those bivalent calls in ways that are hard to understand when that uncertainty is ignored. In principle, it should be possible to make that uncertainty explicit by reporting the probability that a case belongs in a specific set instead of making a crisp yes/no decision, but that’s not what most of the data sets we have now do.

Second, virtually all of the existing measures are expensive to produce. These data sets are coded either by hand or through expert surveys, and routinely covering the world this way takes a lot of time and resources. (I say this from knowledge of the budgets for the production of some of these data sets, and from personal experience.) Partly because these data are so costly to make, many of these measures aren’t regularly updated. And, if the data aren’t regularly updated, we can’t use them to generate the real-time forecasts that offer the toughest test of our theories and are of practical value to some audiences.

As part of the NSF-funded MADCOW project*, Michael D. (Mike) Ward, Philip Schrodt, and I are exploring ways to use text mining and machine learning to generate measures of regime type that are fuzzier in a good way from a process that is mostly automated. These measures would explicitly represent uncertainty about where specific cases belong by reporting the probability that a certain case fits a certain regime type instead of forcing an either/or decision. Because the process of generating these measures would be mostly automated, they would be much cheaper to produce than the hand-coded or survey-based data sets we use now, and they could be updated in near-real time as relevant texts become available.

At this week’s annual meeting of the American Political Science Association, I’ll be presenting a paper—co-authored with Mike and Shahryar Minhas of Duke University’s WardLab—that describes preliminary results from this endeavor. Shahryar, Mike, and I started by selecting a corpus of familiar and well-structured texts describing politics and human-rights practices each year in all countries worldwide: the U.S. State Department’s Country Reports on Human Rights Practices, and Freedom House’s Freedom in the World. After pre-processing those texts in a few conventional ways, we dumped the two reports for each country-year into a single bag of words and used text mining to extract features from those bags in the form of vectorized tokens that may be grossly described as word counts. (See this recent post for some things I learned from that process.) Next, we used those vectorized tokens as inputs to a series of binary classification models representing a few different ideal-typical regime types as observed in few widely used, human-coded data sets. Finally, we applied those classification models to a test set of country-years held out at the start to assess the models’ ability to classify regime types in cases they had not previously “seen.” The picture below illustrates the process and shows how we hope eventually to develop models that can be applied to recent documents to generate new regime data in near-real time.

Overview of MADCOW Regime Classification Process

Overview of MADCOW Regime Classification Process

Our initial results demonstrate that this strategy can work. Our classifiers perform well out of sample, achieving high or very high precision and recall scores in cross-validation on all four of the regime types we have tried to measure so far: democracy, monarchy, military rule, and one-party rule. The separation plots below are based on out-of-sample results from support vector machines trained on data from the 1990s and most of the 2000s and then applied to new data from the most recent few years available. When a classifier works perfectly, all of the red bars in the separation plot will appear to the right of all of the pink bars, and the black line denoting the probability of a “yes” case will jump from 0 to 1 at the point of separation. These classifiers aren’t perfect, but they seem to be working very well.

 

prelim.democracy.svm.sepplot

prelim.military.svm.sepplot

prelim.monarchy.svm.sepplot

prelim.oneparty.svm.sepplot

Of course, what most of us want to do when we find a new data set is to see how it characterizes cases we know. We can do that here with heat maps of the confidence scores from the support vector machines. The maps below show the values from the most recent year available for two of the four regime types: 2012 for democracy and 2010 for military rule. These SVM confidence scores indicate the distance and direction of each case from the hyperplane used to classify the set of observations into 0s and 1s. The probabilities used in the separation plots are derived from them, but we choose to map the raw confidence scores because they exhibit more variance than the probabilities and are therefore easier to visualize in this form.

prelim.democracy.svmcomf.worldmap.2012

prelim.military.svmcomf.worldmap.2010

 

On the whole, cases fall out as we would expect them to. The democracy classifier confidently identifies Western Europe, Canada, Australia, and New Zealand as democracies; shows interesting variations in Eastern Europe and Latin America; and confidently identifies nearly all of the rest of the world as non-democracies (defined for this task as a Polity score of 10). Meanwhile, the military rule classifier sees Myanmar, Pakistan, and (more surprisingly) Algeria as likely examples in 2010, and is less certain about the absence of military rule in several West African and Middle Eastern countries than in the rest of the world.

These preliminary results demonstrate that it is possible to generate probabilistic measures of regime type from publicly available texts at relatively low cost. That does not mean we’re fully satisfied with the output and ready to move to routine data production, however. For now, we’re looking at a couple of ways to improve the process.

First, the texts included in the relatively small corpus we have assembled so far only cover a narrow set of human-rights practices and political procedures. In future iterations, we plan to expand the corpus to include annual or occasional reports that discuss a broader range of features in each country’s national politics. Eventually, we hope to add news stories to the mix. If we can develop models that perform well on an amalgamation of occasional reports and news stories, we will be able to implement this process in near-real time, constantly updating probabilistic measures of regime type for all countries of the world at very low cost.

Second, the stringent criteria we used to observe each regime type in constructing the binary indicators on which the classifiers are trained also appear to be shaping the results in undesirable ways. We started this project with a belief that membership in these regime categories is inherently fuzzy, and we are trying to build a process that uses text mining to estimate degrees of membership in those fuzzy sets. If set membership is inherently ambiguous in a fair number of cases, then our approximation of a membership function should be bimodal, but not too neatly so. Most cases most of the time can be placed confidently at one end of the range of degrees of membership or the other, but there is considerable uncertainty at any moment in time about a non-trivial number of cases, and our estimates should reflect that fact.

If that’s right, then our initial estimates are probably too tidy, and we suspect that the stringent operationalization of each regime type in the training data is partly to blame. In future iterations, we plan to experiment with less stringent criteria—for example, by identifying a case as military rule if any of our sources tags it as such. With help from Sean J. Taylor, we’re also looking at ways we might use Bayesian measurement error models to derive fuzzy measures of regime type from multiple categorical data sets, and then use that fuzzy measure as the target in our machine-learning process.

So, stay tuned for more, and if you’ll be at APSA this week, please come to our Friday-morning panel and let us know what you think.

* NSF Award 1259190, Collaborative Research: Automated Real-time Production of Political Indicators

EVEN BETTER Animated Map of Coup Attempts Worldwide, 1946-2013

[Click here to go straight to the map]

A week ago, I posted an animated map of coup attempts worldwide since 1946 (here). Unfortunately, those maps were built from a country-year data set, so we couldn’t see multiple attempts within a single country over the course of a year. As it happens, though, the lists of coup attempts on which that animation was based does specify the dates of those events. So why toss out all that information?

To get a sharper picture of the distribution of coup attempts across space and time, I rebuilt my mashed-up list of coup attempts from the original sources (Powell & Thyne and Marshall), but now with the dates included. Where only a month was given, I pegged the event to the first day of that month. To avoid double-counting, I then deleted events that appeared to be duplicates (same outcome in the same country within a single week). Finally, to get the animation in CartoDB to give a proper sense of elapsed time, I embedded the results in a larger data frame of all dates over the 68-year period observed. You can find the daily data on my Google Drive (here).

WordPress won’t seem to let me embed the results of my mapping directly in this post, but you can see and interact with the results at CartoDB (here). I think this version shows more clearly how much the rate of coup attempts has slowed in the past couple of decades, and it still does a good job of showing change over time in the geographic distribution of these events.

The two things I can’t figure out how to do so far are 1) to use color to differentiate between successful and failed attempts and 2) to show the year or month and year in the visualization so we know where we are in time. For differentiating by outcome, there’s a variable in the data set that does this, but it looks like the current implementation of the Torque option in CartoDB won’t let me show multiple layers or differentiate between the events by type. On showing the date, I have no clue. If anyone knows how to do either of these things, please let me know.

Playing Telephone with Data Science

You know the telephone game, where a bunch of people sit in a circle or around a table and pass a whispered sentence from person to person until it comes back to the one who started it and they say the version they heard out loud and you all crack up at how garbled it got?

Well, I wonder if John Beieler is cracking up or crying right now, because the same thing is happening with a visualization he created using data from the recently released Global Dataset on Events, Language, and Tone, a.k.a. GDELT.

Back at the end of July, John posted a terrific animated set of maps of protest activity worldwide since 1979. In a previous post on a single slice of the data used in that animation, John was careful to attach a number of caveats to the work: the maps only include events covered in the sources GDELT scours, GDELT sometimes codes events that didn’t happen, GDELT sometimes struggles to put events in their proper geographic location, event labels in the CAMEO event classification scheme GDELT uses doesn’t always mean what you think they mean, counts of events don’t tell you anything about the size or duration of the events being counted, etc., etc.  In the blogged cover letter for the animated series, John added one more very important caveat about the apparent increase in the incidence of protest activity over time:

When dealing with the time-series of data, however, one additional, and very important, point also applies. The number of events recorded in GDELT grows exponentially over time, as noted in the paper introducing the dataset. This means that over time there appears to be a steady increase in events, but this should not be mistaken as a rise in the actual amount of behavior X (protest behavior in this case). Instead, due to changes in reporting and the digital recording of news stories, it is simply the case that there are more events of every type over time. In some preliminary work that is not yet publicly released, protest behavior seems to remain relatively constant over time as a percentage of the total number of events. This means that while there was an explosion of protest activity in the Middle East, and elsewhere, during the past few years, identifying visible patterns is a tricky endeavor due to the nature of the underlying data.

John’s post deservedly caught the eye of J. Dana Stuster, an assistant editor at Foreign Policy, who wrote a bit about it last week. Stuster’s piece was careful to repeat many of John’s caveats, but the headline—“Mapped: Every Protest on the Planet since 1979″—got sloppy, essentially shedding several of the most important qualifiers. As John had taken pains to note, what we see in the maps is not all that there is, and some of what’s shown in the maps didn’t really happen.

Well, you can probably where this is going.  Not long after that Foreign Policy piece appeared, I saw this tweet from Chelsea Clinton:

In fewer than 140 characters, Clinton impressively managed to put back the caveat Foreign Policy had dropped in its headline about press coverage vs. reality, but the message had already been garbled, and now it was going viral. Fast forward to this past weekend, when the phrase “Watch a Jaw-dropping Visualization of Every Protest since 1979” made repeated appearances in my Twitter timeline. This next iteration came from Ultraculture blogger Jason Louv, and it included this bit:

Also fruitful: Comparing this data with media coverage and treatment of protest. Why is it easy to think of the 1960s and 70s as a time of dissent and our time as a more ordered, controlled and conformist period when the data so clearly shows that there is no comparison in how much protest there is now compared to then? Media distortion much?

So now we get a version that ignores both the caveat about GDELT’s coverage not being exhaustive or perfect and the related one about the apparent increase in protest volume over time being at least in part an artifact of “changes in reporting and the digital recording of news stories.” What started out as a simple proof-of-concept exercise—“The areas that are ‘bright’ are those that would generally be expected to be so,” John wrote in his initial post—had been twisted into a definitive visual record of protest activity around the world in the past 35 years.

As someone who thinks that GDELT is an analytical gusher and believes that it’s useful and important to make work like this accessible to broader audiences, I don’t know what to learn from this example. John was as careful as could be, but the work still mutated as it spread. How do you prevent this from happening, or at least mitigate the damage when it does?

If anyone’s got some ideas, I’d love to hear them.

Coup Risk in 2013, Mapped My Way

This blog’s gotten a lot more traffic than usual since yesterday, when Max Fisher of the Washington Post called out my 2013 coup forecasts in a post on WorldViews.

I’m grateful for the attention Max has drawn to my work, but if it had been up to me, I would have done the mapping a little differently. As I said to Max in an email from which he later excerpted, the forecasts simply aren’t sharp enough to parse the world as finely as their map did. Our theories of what causes coup attempts are too fuzzy and our measures of the things in those theories are too spotty to estimate the probability of these rare events with that much precision.

But, hey, I’m a data guy. I don’t have to stick to grumbling about the Post‘s map; I can make my own! So…

The map below sorts the countries of the world into three groups based on their relative coup risk for 2013: highest (red), moderate (orange), and lowest (beige). I emphasize “relative” because coup attempts are very rare, so the estimated risk of coup attempts in any given country in any single year is pretty small. For example, Guinea-Bissau tops my list for 2013, and the estimated probability of at least one coup attempt occurring there this year is only 25%. Most countries worldwide are under 2%.

Consistent with an emphasis on relative risk, the categories I’ve mapped are based on rank order, not predicted probability. The riskiest fifth of the world (33 countries) makes up the “highest” group, the second fifth the “moderate” group, and the bottom three-fifths the “lowest” group.

This forecasting process doesn’t have enough of track record for me to say exactly how those categories relate to real-world risk, but based on my experience working with similar data and models, I would expect roughly four of every five coup attempts to occur in countries identified here as high risk, and the occasional “miss” to come from the moderate-risk set. Only very rarely should coup attempts come from the 100 or so countries in the low-risk group.

coup_risk_map_2013

FTR, this map was made in R using the ‘rworldmap‘ package.

Follow

Get every new post delivered to your Inbox.

Join 6,748 other followers

%d bloggers like this: