A Postscript on Measuring Change Over Time in Freedom in the World

After publishing yesterday’s post on Freedom House’s latest Freedom in the World report (here), I thought some more about better ways to measure what I think Freedom House implies it’s measuring with its annual counts of country-level gains and declines. The problem with those counts is that they don’t account for the magnitude of the changes they represent. That’s like keeping track of how a poker player is doing by counting bets won and bets lost without regard to their value. If we want to assess the current state of the system and compare it to earlier states, the size of those gains and declines matters, too.

With that in mind, my first idea was to sum the raw annual changes in countries’ “freedom” scores by year, where the freedom score is just the sum of those 7-point political rights and civil liberties indices. Let’s imagine a year in which three countries saw a 1-point decline in their freedom scores; one country saw a 1-point gain; and one country saw a 3-point gain. Using Freedom House’s measure, that would look like a bad year, with declines outnumbering gains 3 to 2. Using the sum of the raw changes, however, it would look like a good year, with a net change in freedom scores of +1.

Okay, so here’s a plot of those sums of raw annual changes in freedom scores since 1982, when Freedom House rejiggered the timing of its survey.[1] I’ve marked the nine-year period that Freedom House calls out in its report as an unbroken run of bad news, with declines outnumbering gains every year since 2006. As the plot shows, when we account for the magnitude of those gains and losses, things don’t look so grim. In most of those nine years, losses did outweigh gains, but the net loss was rarely large, and two of the nine years actually saw net gains by this measure.

Annual global sums of raw yearly changes in Freedom House freedom scores (inverted), 1983-2014

After I’d generated that plot, though, I worried that the sum of those raw annual changes still ignored another important dimension: population size. As I understand it, the big question Freedom House is trying to address with its annual report is: “How free is the world?” If we want to answer that question from a classical liberal perspective—and that’s where I think Freedom House is coming from—then individual people, not states, need to be our unit of observation.

Imagine a world with five countries where half the global population lives in one country and the other half is evenly divided between the other four. Now let’s imagine that the one really big country is maximally unfree while the other four countries are maximally free. If we compare scores (or changes in them) by country, things look great; 80 percent of the world is super-free! Meanwhile, though, half the world’s population lives under total dictatorship. An international relations theorist might care more about the distribution of states, but a liberal should care more about the distribution of people.

To take a look at things from this perspective, I decided to generate a scalar measure of freedom in the world system that sums country scores weighted by their share of the global population.[2] To make the result easier to interpret, I started by rescaling the country-level “freedom scores” from 14-2 to 0-10, with 10 indicating most free. A world in which all countries are fully free (according to Freedom House) would score a perfect 10 on this scale, and changes in large countries will move the index more than changes in small ones.

Okay, so here’s a plot of the results for the entire run of Freedom House’s data set, 1972–2014. (Again, 1981 is missing because that’s when Freedom House paused to align their reports with the calendar year.)  Things look pretty different than they do when we count gains and declines or even sum raw changes by country, don’t they?

A population-weighted annual scalar measure of freedom in the world, 1972-2014

The first things that jumped out at me were the sharp declines in the mid-1970s and again in the late 1980s and early 1990s. At first I thought I must have messed up the math, because everyone knows things got a lot better when Communism crumbled in Eastern Europe and the Soviet Union, right? It turns out, though, that those swings are driven by changes in China and India, which together account for approximately one-third of the global population. In 1989, after Tiananmen Square, China’s score dropped from a 6/6 (or 1.67 on my 10-point scalar version) to 7/7 (or 0). At the time, China contained nearly one-quarter of the world’s population, so that slump more than offset the (often modest) gains made in the countries touched by the so-called fourth wave of democratic transitions. In 1998, China inched back up to 7/6 (0.83), and the global measure moved with it. Meanwhile, India dropped from 2/3 (7.5) to 3/4 (5.8) in 1991, and then again from 3/4 to 4/4 (5.0) in 1993, but it bumped back up to 2/4 (6.67) in 1996 and then 2/3 (7.5) in 1998. The global gains and losses produced by the shifts in those two countries don’t fully align with the conventional narrative about trends in democratization in the past few decades, but I think they do provide a more accurate measure of overall freedom in the world if we care about people instead of states, as liberalism encourages us to do.

Of course, the other thing that caught my eye in that second chart was the more-or-less flat line for the past decade. When we consider the distribution of the world’s population across all those countries where Freedom House tallies gains and declines, it’s hard to find evidence of the extended democratic recession they and others describe. In fact, the only notable downturn in that whole run comes in 2014, when the global score dropped from 5.2 to 5.1. To my mind, that recent downturn marks a worrying development, but it’s harder to notice when we’ve been hearing cries of “Wolf!” for the eight years before.

NOTES

[1] For the #Rstats crowd: I used the slide function in the package DataCombine to get one-year lags of those indices by country; then I created a new variable representing the difference between the annual score for the current and previous year; then I used ddply from the plyr package to create a data frame with the annual global sums of those differences. Script on GitHub here.
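
For those who want to see it in code, here is a minimal sketch of that pipeline, assuming a country-year data frame fh with columns country, year, and total (the 2–14 sum of the two indices); the data-frame and column names are mine, not Freedom House’s.

```r
library(DataCombine)  # for slide()
library(plyr)         # for ddply()

# One-year lag of each country's freedom score
fh <- slide(fh, Var = "total", TimeVar = "year", GroupVar = "country",
            NewVar = "total.lag", slideBy = -1)

# Raw annual change in the 2-14 score (the plot inverts the sign so that
# net gains in freedom point up)
fh$change <- fh$total - fh$total.lag

# Global sum of those raw changes by year
net.change <- ddply(fh, .(year), summarise,
                    net = sum(change, na.rm = TRUE))
```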

[2] Here, I used the WDI package to get country-year data on population size; used ddply to calculate world population by year; merged those global sums back into the country-year data; used those sums as the denominator in a new variable indicating a country’s share of the global population; and then used ddply again to get a table with the sum of the products of those population weights and the freedom scores. Again, script on GitHub here (same one as before).
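
And here is a sketch of the population-weighting step described in this note, again with hypothetical names; it assumes the same fh data frame plus an iso2c country code for merging, and uses the World Bank’s total-population series (SP.POP.TOTL) via the WDI package.

```r
library(WDI)
library(plyr)

# Country-year population from the World Development Indicators
pop <- WDI(country = "all", indicator = "SP.POP.TOTL",
           start = 1972, end = 2014)
names(pop)[names(pop) == "SP.POP.TOTL"] <- "pop"

# Merging onto the freedom data keeps only real countries and drops
# the regional aggregates that WDI also returns
fh <- merge(fh, pop[, c("iso2c", "year", "pop")], by = c("iso2c", "year"))

# World population by year, merged back in to get each country's share
world <- ddply(fh, .(year), summarise, world.pop = sum(pop, na.rm = TRUE))
fh <- merge(fh, world, by = "year")
fh$pop.share <- fh$pop / fh$world.pop

# Rescale the 2-14 score to 0-10 (10 = most free) and sum weighted scores
fh$score10 <- (14 - fh$total) * (10 / 12)
index <- ddply(fh, .(year), summarise,
               world.score = sum(score10 * pop.share, na.rm = TRUE))
```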

Statistical Assessments of Coup Risk for 2015

Which countries around the world are more likely to see coup attempts in 2015?

For the fourth year in a row, I’ve used statistical models to generate one answer to that question, where a coup is defined more or less as a forceful seizure of national political authority by military or political insiders. (I say “more or less” because I’m blending data from two sources with slightly different definitions; see below for details.) A coup doesn’t need to succeed to count as an attempt, but it does need to involve public action; alleged plots and rumors of plots don’t qualify. Neither do insurgencies or foreign invasions, which by definition involve military or political outsiders. The heat map below shows variation in estimated coup risk for 2015, with countries colored by quintiles (fifths).

forecast.heatmap.2015

The dot plot below shows the estimates and their 90-percent confidence intervals (CIs) for the 40 countries with the highest estimated risk. The estimates are the unweighted average of forecasts from two logistic regression models; more on those in a sec. To get CIs for estimates from those two models, I took a cue from a forthcoming article by Lyon, Wintle, and Burgman (fourth publication listed here; the version I downloaded last year has apparently been taken down, and I can’t find another) and just averaged the CIs from the two models.
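
Here is roughly what that averaging looks like in code, assuming two fitted glm objects (mod1, mod2) and a data frame newdat holding the 2015 predictor values; intervals are built on the link scale for each model, converted to probabilities, and then the two models’ estimates and bounds are averaged.

```r
# Predicted probability and 90-percent CI from one logistic regression
ci90 <- function(model, newdata) {
  p <- predict(model, newdata = newdata, type = "link", se.fit = TRUE)
  z <- qnorm(0.95)  # two-sided 90% interval
  data.frame(est   = plogis(p$fit),
             lower = plogis(p$fit - z * p$se.fit),
             upper = plogis(p$fit + z * p$se.fit))
}

f1 <- ci90(mod1, newdat)
f2 <- ci90(mod2, newdat)

# Unweighted average of the two models' point estimates and interval bounds
forecast <- data.frame(country = newdat$country,
                       est   = (f1$est   + f2$est)   / 2,
                       lower = (f1$lower + f2$lower) / 2,
                       upper = (f1$upper + f2$upper) / 2)
forecast <- forecast[order(-forecast$est), ]
```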

forecast.dotplot.2015

I’ve consistently used simple two- or three-model ensembles to generate these coup forecasts, usually pairing a logistic regression model with an implementation of Random Forests on the same or similar data. This year, I decided to use only a pair of logistic regression models representing somewhat different ideas about coup risk. Consistent with findings from other work in which I’ve been involved (here), k-fold cross-validation told me that Random Forests wasn’t really boosting forecast accuracy, and sticking to logistic regression makes it possible to get and average those CIs. The first model matches one I used last year, and it includes the following covariates:

  • Infant mortality rate. Deaths of children under age 1 per 1,000 live births, relative to the annual global median, logged. This measure primarily reflects national wealth but is also sensitive to variations in quality of life produced by things like corruption and inequality. (Source: U.S. Census Bureau)
  • Recent coup activity. A yes/no indicator of whether or not there have been any coup attempts in that country in the past five years. I’ve tried logged event counts and longer windows, but this simple version contains as much predictive signal as any other. (Sources: Center for Systemic Peace and Powell and Thyne)
  • Political regime type. Following Fearon and Laitin (here), a categorical measure differentiating between autocracies, anocracies, democracies, and other forms. (Source: Center for Systemic Peace, with hard-coded updates for 2014)
  • Regime durability. The “number of years since the last substantive change in authority characteristics (defined as a 3-point change in the POLITY score).” (Source: Center for Systemic Peace, with hard-coded updates for 2014)
  • Election year. A yes/no indicator for whether or not any national elections (executive, legislative, or general) are scheduled to take place during the forecast year. (Source: NELDA, with hard-coded updates for 2011–2015)
  • Economic growth. The previous year’s annual GDP growth rate. To dampen the effects of extreme values on the model estimates, I take the square root of the absolute value and then multiply that by -1 for cases where the raw value is less than 0; a one-line version of this transformation appears after the list. (Source: IMF)
  • Political salience of elite ethnicity. A yes/no indicator for whether or not the ethnic identity of national leaders is politically salient. (Source: PITF, with hard-coded updates for 2014)
  • Violent civil conflict. A yes/no indicator for whether or not any major armed civil or ethnic conflict is occurring in the country. (Source: Center for Systemic Peace, with hard-coded updates for 2014)
  • Country age. Years since country creation or independence, logged. (Source: me)
  • Coup-tagion. Two variables representing (logged) counts of coup attempts during the previous year in other countries around the world and in the same geographic region. (Source: me)
  • Post–Cold War period. A binary variable marking years after the disintegration of the USSR in 1991.
  • Colonial heritage. Three separate binary indicators identifying countries that were last colonized by Great Britain, France, or Spain. (Source: me)
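
The growth transformation mentioned in the list is a one-liner; growth here is a hypothetical vector of raw GDP growth rates.

```r
# Signed square root: dampens extreme values while preserving the sign
growth.t <- sqrt(abs(growth)) * ifelse(growth < 0, -1, 1)
```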

The second model takes advantage of new data from Geddes, Wright, and Frantz on autocratic regime types (here) to consider how qualitative differences in political authority structures and leadership might shape coup risk—both directly, and indirectly by mitigating or amplifying the effects of other things. Here’s the full list of covariates in this one, with a model-formula sketch after the list:

  • Infant mortality rate. Deaths of children under age 1 per 1,000 live births, relative to the annual global median, logged. This measure primarily reflects national wealth but is also sensitive to variations in quality of life produced by things like corruption and inequality. (Source: U.S. Census Bureau)
  • Recent coup activity. A yes/no indicator of whether or not there have been any coup attempts in that country in the past five years. I’ve tried logged event counts and longer windows, but this simple version contains as much predictive signal as any other. (Sources: Center for Systemic Peace and Powell and Thyne)
  • Regime type. Using the binary indicators included in the aforementioned data from Geddes, Wright, and Frantz with hard-coded updates for the period 2011–2014, a series of variables differentiating between the following:
    • Democracies
    • Military autocracies
    • One-party autocracies
    • Personalist autocracies
    • Monarchies
  • Regime duration. Number of years since the last change in political regime type, logged. (Source: Geddes, Wright, and Frantz, with hard-coded updates for the period 2011–2014)
  • Regime type * regime duration. Interactions to condition the effect of regime duration on regime type.
  • Leader’s tenure. Number of years the current chief executive has held that office, logged. (Source: PITF, with hard-coded updates for 2014)
  • Regime type * leader’s tenure. Interactions to condition the effect of leader’s tenure on regime type.
  • Election year. A yes/no indicator for whether or not any national elections (executive, legislative, or general) are scheduled to take place during the forecast year. (Source: NELDA, with hard-coded updates for 2011–2015)
  • Regime type * election year. Interactions to condition the effect of election years on regime type.
  • Economic growth. The previous year’s annual GDP growth rate. To dampen the effects of extreme values on the model estimates, I take the square root of the absolute value and then multiply that by -1 for cases where the raw value is less than 0. (Source: IMF)
  • Regime type * economic growth. Interactions to condition the effect of economic growth on regime type.
  • Post–Cold War period. A binary variable marking years after the disintegration of the USSR in 1991.
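
In glm terms, that second model amounts to something like the formula below. This is a sketch with variable names of my own choosing, not the exact specification.

```r
mod2 <- glm(coup.attempt ~ log(infant.mortality) + recent.coup +
              regime.type * log1p(regime.duration) +
              regime.type * log1p(leader.tenure) +
              regime.type * election.year +
              regime.type * growth.t +
              post.cold.war,
            family = binomial(link = "logit"),
            data = training.data)
```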

As I’ve done for the past couple of years, I used event lists from two sources—the Center for Systemic Peace (about halfway down the page here) and Jonathan Powell and Clayton Thyne (Dataset 3 here)—to generate the historical data on which those models were trained. Country-years are the unit of observation in this analysis, so a country-year is scored 1 if either CSP or P&T saw any coup attempts there during those 12 months and 0 otherwise. The plot below shows annual counts of successful and failed coup attempts in countries worldwide from 1946 through 2014 according to the two data sources. There is a fair amount of variance in the annual counts and the specific events that comprise them, but the basic trend over time is the same. The incidence of coup attempts rose in the 1950s; spiked in the early 1960s; remained relatively high throughout the rest of the Cold War; declined in the 1990s, after the Cold War ended; and has remained relatively low throughout the 2000s and 2010s.
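
Here is a sketch of that scoring step, assuming country-year data frames csp and pt from the two sources, each with a 0/1 column (csp.coup, pt.coup) marking years with at least one coup attempt; all names here are hypothetical.

```r
# Combine the two event lists; a country-year is a coup year if either
# source records an attempt there
coups <- merge(csp, pt, by = c("country", "year"), all = TRUE)
coups$csp.coup[is.na(coups$csp.coup)] <- 0
coups$pt.coup[is.na(coups$pt.coup)]   <- 0
coups$any.attempt <- as.integer(coups$csp.coup == 1 | coups$pt.coup == 1)
```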

Annual counts of coup events worldwide from two data sources, 1946-2014

I’ve been posting annual statistical assessments of coup risk on this blog since early 2012; see here, here, and here for the previous three iterations. I have rejiggered the modeling a bit each year, but the basic process (and the person designing and implementing it) has remained the same. So, how accurate have these forecasts been?

The table below reports areas under the ROC curve (AUC) and Brier scores (the 0–1 version) for the forecasts from each of those years and their averages, using the two coup event data sources alone and together as different versions of the observed ground truth. Focusing on the “either” columns, because that’s what I’m usually using when estimating the models, we can see that the average accuracy—AUC in the low 0.80s and Brier score of about 0.03—is comparable to what we see in many other country-year forecasts of rare political events using a variety of modeling techniques (see here). With the AUC, we can also see a downward trend over time. With so few events involved, though, three years is too few to confidently deduce a trend, and those averages are consistent with what I typically see in k-fold cross-validation. So, at this point, I suspect those swings are just normal variation.
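
For reference, here is how those two statistics can be computed for one year’s forecasts, assuming a vector of predicted probabilities p and a 0/1 outcome vector y; I’m using the pROC package for the AUC, but other packages would work as well.

```r
library(pROC)

# Area under the ROC curve
auc.value <- as.numeric(auc(roc(response = y, predictor = p)))

# Brier score, the 0-1 version: mean squared error of the probabilities
brier <- mean((p - y)^2)
```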

AUC and Brier scores for coup forecasts posted on Dart-Throwing Chimp, 2012-2014, by coup event data source

The separation plot designed by Greenhill, Ward, and Sacks (here) offers a nice way to visualize the accuracy of these forecasts. The ones below show the three annual slices using the “either” version of the outcome, and they reinforce the story told in the table: the forecasts have correctly identified most of the countries that saw coup attempts in the past three years as relatively high-risk cases, but the accuracy has declined over time. Let’s define a surprise as a case that fell outside the top 30 of the ordered forecasts but still saw a coup attempt. In 2012, just one of four countries that saw coup attempts was a surprise: Papua New Guinea, ranked 48. In 2013, that number increased to two of five (Eritrea at 51 and Egypt at 58), and in 2014 it rose to three of five (Burkina Faso at 42, Ukraine at 57, and the Gambia at 68). Again, though, the average accuracy across the three years is consistent with what I typically see in k-fold cross-validation of these kinds of models in the historical data, so I don’t think we should make too much of that apparent time trend just yet.
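
The separationplot package by the same authors draws these; a minimal call looks something like this, again with hypothetical vectors p (forecasts) and y (0/1 outcomes).

```r
library(separationplot)

# Forecasts are sorted left to right from lowest to highest; observed coup
# attempts appear as dark vertical bands
separationplot(pred = p, actual = y)
```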

cou.scoring.sepplot.2012 cou.scoring.sepplot.2013 cou.scoring.sepplot.2014

This year, for the first time, I am also running an experiment in crowdsourcing coup risk assessments by way of a pairwise wiki survey (survey here, blog post explaining it here, and preliminary results discussed here). My long-term goal is to repeat this process numerous times on this topic and some others (for example, onsets of state-led mass killing episodes) to see how the accuracy of the two approaches compares and how their output might be combined. Statistical forecasts are usually much more accurate than human judgment, but that advantage may be reduced or eliminated when we aggregate judgments from large and diverse crowds, or when we don’t have data on important features to use in those statistical models. Models that use annual data also suffer in comparison to crowdsourcing processes that can update continuously, as that wiki survey does (albeit with a lot of inertia).

We can’t incorporate the output from that wiki survey into the statistical ensemble, because the survey doesn’t generate predicted probabilities; it only assesses relative risk. We can, however, compare the rank orderings the two methods produce. The plot below juxtaposes the rankings produced by the statistical models (left) with the ones from the wiki survey (right). About 500 votes have been cast since I wrote up the preliminary results, but I’m going to keep things simple for now and use the preliminary survey results I already wrote up. The colored arrows identify cases ranked at least 10 spots higher (red) or lower (blue) by the crowd than the statistical models. As the plot shows, there are many differences between the two, even toward the top of the rankings where the differences in statistical estimates are bigger and therefore more meaningful. For example, the crowd sees Nigeria, Libya, and Venezuela as top 10 risks while the statistical models do not; of those three, only Nigeria ranks in the top 30 on the statistical forecasts. Meanwhile, the crowd pushes Niger and Guinea-Bissau out of the top 10 down to the 20s, and it sees Madagascar, Afghanistan, Egypt, and Ivory Coast as much lower risks than the models do. Come 2016, it will be interesting to see which version was more accurate.

coup.forecast.comparison.2015

If you are interested in getting hold of the data or R scripts used to produce these forecasts and figures, please send me an email at ulfelder at gmail dot com.

A Crowd’s-Eye View of Coup Risk in 2015

A couple of weeks ago (here), I used the blog to launch an experiment in crowdsourcing assessments of coup risk for 2015 by way of a pairwise wiki survey. The survey is still open and will stay that way until the end of the year, but with nearly 2,700 pairwise votes already cast, I thought it was a good time to take stock of the results so far.

Before discussing those results, though, let me say thank you to all the people who voted in the survey or shared the link. These data don’t materialize from thin air. They only exist because busy people contributed their knowledge and time, and I really appreciate all of those contributions.

Okay, so, what does that self-assembled crowd think about relative risks of coup attempts in 2015? The figure below maps the country scores produced from the votes cast so far. Darker grey indicates higher risk. PLEASE NOTE: Those scores fall on a 0–100 scale, but they are not estimated probabilities of a coup attempt. Instead, they are only measures of relative risk, because that’s all we can get from a pairwise wiki survey. Coup attempts are rare events—in most recent years, we’ve seen fewer than a handful of them worldwide—so the safe bet for nearly every country every year is that there won’t be any coup attempts this year.

wikisurvey.couprisk.2015.map

Smaller countries can be hard to find on that map, and small differences in scores can be hard to discern, so I also like to have a list of the results to peruse. Here’s a dot plot with countries in descending order by survey score. (It’d be nice to make this table sortable so you could also look for countries alphabetically, but my Internet fu is not up to that task.)

wikisurvey.couprisk.2015.dotplot

This survey is open to the public, and participants may cast as many votes as they like in as many sessions as they like. The scores summarized above come from nearly 2,700 votes cast between the morning of January 3, when I published the blog post about the survey, and the morning of January 14, when I downloaded a report on the current results. At present, this blog has a few thousand followers on WordPress and a few hundred email subscribers. I also publicized the survey twice on Twitter, where I have approximately 6,000 followers: once when I published the initial blog post, and again on January 13. As the plot below shows, participation spiked around both of those pushes and was low otherwise.

votesovertime.20150114

The survey instrument does not collect identifying information about participants, so it is impossible to describe the make-up of the crowd. What we do know is that those votes came from about 100 unique user sessions. Some people probably participated more than once—I know that I cast a dozen or so votes on a few occasions—so 100 unique sessions probably works out to something like 80 or 90 individuals. But that’s a guess.

usersessions.20150114

We also know that those votes came from lots of different parts of the world. As the map below shows, most of the votes came from the U.S., Europe, and Australia, but there were also pockets of activity in the Middle East (especially Israel), Latin America (Brazil and Argentina), Africa (Cote d’Ivoire and Rwanda), and Asia (Thailand and Bangladesh).

votemap.20150114

I’ll talk a little more about the substance of these results when I publish my statistical assessments of coup risk for 2015, hopefully in the next week or so. Meanwhile, number-crunchers can get a .csv with the data used to generate the map and table in this post from my Google Drive (here) and the R script from GitHub (here). If you’re interested in seeing the raw vote-level data from which those scores were generated, drop me a line.

2014 NFL Football Season Predictions

Professional (American) football season starts tonight when the Green Bay Packers visit last year’s champs, the Seattle Seahawks, for a Thursday-night opener thing that still seems weird to me. (SUNDAY, people. Pro football is played on Sunday.) So, who’s likely to win?

With the final preseason scores from our pairwise wiki survey in hand, we can generate a prediction for that game, along with all 255 other regular-season contests on the 2014 schedule. As I described in a recent post, this wiki survey offers a novel way to crowdsource the problem of estimating team strength before the season starts. We can use last year’s preseason survey data and game results to estimate a simple statistical model that accounts for two teams’ strength differential and home-field advantage. Then, we can apply that model to this year’s survey results to get game-level forecasts.

In the last post, I used the initial model estimates to generate predicted net scores (home minus visitor) and confidence intervals. This time, I thought I’d push it a little further and use predictive simulations. Following Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models (2009), I generated 1,000 simulated net scores for each game and then summarized the distributions of those scores to get my statistics of interest.

The means of those simulated net scores for each game represent point estimates of the outcome, and the variance of those distributions gives us another way to compute confidence intervals. Those means and confidence intervals closely approximate the ones we’d get from a one-shot application of the predictive model to the 2014 survey results, however, so there’s no real new information there.

What those distributions let us do that’s new is compute win probabilities. The share of simulated net scores above 0 gives us an estimate of the probability of a home-team win, and 1 minus that estimate gives us the probability of a visiting-team win.
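
Here is a compressed sketch of that chain, from model fit to win probabilities. It assumes a 2013 training frame games13 with the realized net score and the two teams’ preseason survey scores, and a 2014 frame games14 with this year’s survey scores; all names are mine, and the real script on GitHub differs in the details.

```r
library(arm)  # for sim(), following Gelman and Hill

# Net score (home minus visitor) as a function of the teams' survey-score
# differential; the intercept picks up average home-field advantage
fit <- lm(net.score ~ I(home.score - away.score), data = games13)

# 1,000 draws of the coefficients and residual sd, then simulated net
# scores for every 2014 game
n.sims <- 1000
draws <- sim(fit, n.sims = n.sims)
X <- cbind(1, games14$home.score - games14$away.score)
mu <- X %*% t(draws@coef)                      # games x sims matrix of means
net.sim <- mu + rnorm(length(mu), mean = 0,
                      sd = rep(draws@sigma, each = nrow(X)))

# Point estimate and win probability for each game
games14$pred.net   <- rowMeans(net.sim)
games14$p.home.win <- rowMeans(net.sim > 0)
```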

A couple of pictures make this idea clearer. First, here’s a histogram of the simulated net scores for tonight’s Packers-Seahawks game. The Packers fared pretty well in the preseason wiki survey, ranking 5th overall with a score of 77.5 out of 100. The defending-champion Seahawks got the highest score in the survey, however—a whopping 92.6—and they have home-field advantage, which is worth about 3.1 points on average, according to my model. In my predictive simulations, 673 of the 1,000 games had a net score above 0, suggesting a win probability of 67%, or 2:1 odds, in favor of the Seahawks. The mean predicted net score is 5.8, which is pretty darn close to the current spread of -5.5.

Seattle Seahawks.Green Bay Packers

Things look a little tighter for the Bengals-Ravens contest, which I’ll be attending with my younger son on Sunday in our once-annual pilgrimage to M&T Bank Stadium. The Ravens wound up 10th in the wiki survey with a score of 60.5, but the Bengals are just a few rungs down the ladder, in 13th, with a score of 54.7. Add in home-field advantage, though, and the simulations give the Ravens a win probability of 62%, or about 3:2 odds. Here, the mean net score is 3.6, noticeably higher than the current spread of -1.5 but on the same side of the win/loss line. (N.B. Because the two teams’ survey scores are so close, the tables turn when Cincinnati hosts in Week 8, and the predicted probability of a home win is 57%.)

Baltimore Ravens.Cincinnati Bengals

Once we’ve got those win probabilities ginned up, we can use them to move from game-level to season-level forecasts. It’s tempting to think of the wiki survey results as season-level forecasts already, but what they don’t do is account for variation in strength of schedule. Other things being equal, a strong team with a really tough schedule might not be expected to do much better than a mediocre team with a relatively easy schedule. The model-based simulations refract those survey results through the 2014 schedule to give us a clearer picture of what we can expect to happen on the field this year.

The table below (made with the handy ‘textplot’ command in R’s gplots package) turns the predictive simulations into season-level forecasts for all 32 teams.* I calculated two versions of a season summary and juxtaposed them to the wiki survey scores and resulting rankings. Here’s what’s in the table:

  • WikiRank shows each team’s ranking in the final preseason wiki survey results.
  • WikiScore shows the score on which that ranking is based.
  • WinCount counts the number of games in which each team has a win probability above 0.5. This process gives us a familiar number, the first half of a predicted W-L record, but it also throws out a lot of information by treating forecasts close to 0.5 the same as ones where we’re more confident in our prediction of the winner.
  • WinSum is the sum of each team’s win probabilities across the 16 games. This expected number of wins is a better estimate of each team’s anticipated results than WinCount, but it’s also a less familiar one, so I thought I would show both. (A short sketch of both calculations follows this list.)
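
Both columns fall straight out of the game-level win probabilities. Here is a sketch, assuming the games14 frame from the earlier simulation sketch with columns home, away, and p.home.win:

```r
library(plyr)

# One row per team per game, with that team's win probability
team.games <- rbind(
  data.frame(team = games14$home, p.win = games14$p.home.win),
  data.frame(team = games14$away, p.win = 1 - games14$p.home.win))

season <- ddply(team.games, .(team), summarise,
                WinCount = sum(p.win > 0.5),  # games in which the team is favored
                WinSum   = sum(p.win))        # expected number of wins
season <- season[order(-season$WinSum), ]
```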

Teams appear in the table in descending order of WinSum, which I consider the single-best estimate in this table of a team’s 2014 performance. It’s interesting (to me, anyway) to see how the rank order changes from the survey to the win totals because of differences in strength of schedule. So, for example, the Patriots ranked 4th in the wiki survey, but they get the second-highest expected number of wins this year (9.8), just behind the Seahawks (9.9). Meanwhile, the Steelers finished 16th in the wiki survey, but they rank 11th in expected number of wins with an 8.4. That’s a smidgen better than the Cincinnati Bengals (8.3) and not much worse than the Baltimore Ravens (9.0), suggesting an even tighter battle for the AFC North division title than the wiki survey results alone would imply.

2014 NFL Season-Level Forecasts from 1,000 Predictive Simulations Using Preseason Wiki Survey Results and Home-Field Advantage

There are a lot of other interesting quantities we could extract from the results of the game-level simulations, but that’s all I’ve got time to do now. If you want to poke around in the original data and simulation results, you can find them all in a .csv on my Google Drive (here). I’ve also posted a version of the R script I used to generate the game-level and season-level forecasts on GitHub (here).

At this point, I don’t have plans to try to update the forecasts during the season, but I will be seeing how the preseason predictions fare and occasionally reporting the results here. Meanwhile, if you have suggestions on other ways to use these data or to improve these forecasts, please leave a comment here on the blog.

* The version of this table I initially posted had an error in the WikiRank column where 18 was skipped and the rankings ran to 33. This version corrects that error. Thanks to commenter C.P. Liberatore for pointing it out.

Uncertainty About How Best to Convey Uncertainty

NPR News ran a series of stories this week under the header Risk and Reason, on “how well we understand and act on probabilities.” I thought the series nicely represented how uncertain we are about how best to convey forecasts to people who might want to use them. There really is no clear standard here, even though it is clear that the choices we make in presenting forecasts and other statistics on risks to their intended consumers strongly shape what they hear.

This uncertainty about how best to convey forecasts was on full display in the piece on how CIA analysts convey predictive assessments (here). Ken Pollack, a former analyst who now teaches intelligence analysis, tells NPR that, at CIA, “There was a real injunction that no one should ever use numbers to explain probability.” Asked why, he says that,

Assigning numerical probability suggests a much greater degree of certainty than you ever want to convey to a policymaker. What we are doing is inherently difficult. Some might even say it’s impossible. We’re trying to predict the future. And, you know, saying to someone that there’s a 67 percent chance that this is going to happen, that sounds really precise. And that makes it seem like we really know what’s going to happen. And the truth is that we really don’t.

In that same segment, though, Dartmouth professor Jeff Friedman, who studies decision-making about national security issues, says we should provide a numeric point estimate of an event’s likelihood, along with some information about our confidence in that estimate and how malleable it may be. (See this paper by Friedman and Richard Zeckhauser for a fuller treatment of this argument.) The U.S. Food and Drug Administration apparently agrees; according to the same NPR story, the FDA “prefers numbers and urges drug companies to give numerical values for risk—and to avoid using vague terms such as ‘rare, infrequent and frequent.'”

Instead of numbers, Pollack advocates for using words: “Almost certainly or highly likely or likely or very unlikely,” he tells NPR. As noted by one of the other stories in the series (here), however—on the use of probabilities in medical decision-making—words and phrases are ambiguous, too, and that ambiguity can be just as problematic.

Doctors, including Leigh Simmons, typically prefer words. Simmons is an internist and part of a group practice that provides primary care at Mass General. “As doctors we tend to often use words like, ‘very small risk,’ ‘very unlikely,’ ‘very rare,’ ‘very likely,’ ‘high risk,’ ” she says.

But those words can be unclear to a patient.

“People may hear ‘small risk,’ and what they hear is very different from what I’ve got in my mind,” she says. “Or what’s a very small risk to me, it’s a very big deal to you if it’s happened to a family member.”

Intelligence analysts have sometimes tried to remove that ambiguity by standardizing the language they use to convey likelihoods, most famously in Sherman Kent’s “Words of Estimative Probability.” It’s not clear to me, though, how effective this approach is. For one thing, consumers are often lazy about trying to understand just what information they’re being given, and templates like Kent’s don’t automatically solve that problem. This laziness came across most clearly in NPR’s Risk and Reason segment on meteorology (here). Many of us routinely consume probabilistic forecasts of rainfall and make decisions in response to them, but it turns out that few of us understand what those forecasts actually mean. With Kent’s words of estimative probability, I suspect that many readers of the products that use them haven’t memorized the table that spells out their meaning and don’t bother to consult it when they come across those phrases, even when it’s reproduced in the same document.

Equally important, a template that works well for some situations won’t necessarily work for all. I’m thinking in particular of forecasts on the kinds of low-probability, high-impact events that I usually analyze and that are essential to the CIA’s work, too. Here, what look like small differences in probability can sometimes be very meaningful. For example, imagine that it’s August 2001 and you have three different assessments of the risk of a major terrorist attack on U.S. soil in the next few months. One pegs the risk at 1 in 1,000; another at 1 in 100; and another at 1 in 10. Using Kent’s table, all three of those assessments would get translated into a statement that the event is “almost certainly not” going to happen, but I imagine that most U.S. decision-makers would have felt very differently about risks of 0.1%, 1%, and 10% with a threat of that kind.

There are lots of rare but important events that inhabit this corner of the probability space: nuclear accidents, extreme weather events, medical treatments, and mass atrocities, to name a few. We could create a separate lexicon for assessments in these areas, as the European Medicines Agency has done for adverse reactions to medical therapies (here, via NPR). I worry, though, that we ask too much of consumers of these and other forecasts if we expect them to remember multiple lexicons and to correctly code-switch between them. We also know that the relevant scale will differ across audiences, even on the same topic. For example, an individual patient considering a medical treatment might not care much about the difference between a mortality risk of 1 in 1,000 and 1 in 10,000, but a drug company and the regulators charged with overseeing them hopefully do.

If there’s a general lesson here, it’s that producers of probabilistic forecasts should think carefully about how best to convey their estimates to specific audiences. In practice, that means thinking about the nature of the decision processes those forecasts are meant to inform and, if possible, trying different approaches and checking to find out how each is being understood. Ideally, consumers of those forecasts should also take time to try to educate themselves on what they’re getting. I’m not optimistic that many will do that, but we should at least make it as easy as possible for them to do so.

Is the World Boiling Over or Just Getting Back to Normal?

Here’s a plot of observed and “predicted” rates of political instability onset around the world from 1956 to 2012, the most recent year for which I now have data. The dots are the annual rates, and the lines are smoothing curves fitted from those annual rates using local regression (or loess).

  • The observed rates come from the U.S. government-funded Political Instability Task Force (PITF), which identifies political instability through the occurrence of civil war, state collapse, contested state break-up, abrupt declines in democracy, or genocide or politicide. The observed rate is just the number of onsets that occurred that year divided by the number of countries in the world at the time.
  • The “predicted” probabilities come from an approximation of a model the PITF developed to assess risks of instability onset in countries worldwide. That model includes measures of infant mortality, political regime type, state-led communal discrimination, armed conflict in nearby states, and geographic region. (See this 2010 journal article on which I was a co-author for more info.) In the plot, the “predicted” rate (green) is the sum of the predicted probabilities for the year divided by the number of countries with predicted probabilities that year. I put predicted in quotes because these are in-sample estimates and not actual forecasts. (A short code sketch of these rate calculations and smoothing curves follows this list.)
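
Here is that sketch, assuming a country-year data frame pitf with a 0/1 onset indicator (onset) and an in-sample predicted probability (p.hat), with the resulting rates frame sorted by year; the names are mine.

```r
library(plyr)

# Observed rate: onsets that year / countries in the system that year.
# "Predicted" rate: sum of predicted probabilities / countries scored.
rates <- ddply(pitf, .(year), summarise,
               observed  = sum(onset, na.rm = TRUE) / sum(!is.na(onset)),
               predicted = sum(p.hat, na.rm = TRUE) / sum(!is.na(p.hat)))

# Annual rates as dots, loess smoothing curves as lines
plot(rates$year, rates$observed, pch = 20, col = "darkred",
     xlab = "Year", ylab = "Rate of instability onset")
points(rates$year, rates$predicted, pch = 20, col = "darkgreen")
lines(rates$year, predict(loess(observed ~ year, data = rates)), col = "darkred")
lines(rates$year, predict(loess(predicted ~ year, data = rates)), col = "darkgreen")
```
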
Observed and Predicted Rates of Political Instability Onset Worldwide, 1956-2012

I see a couple of interesting things in that plot.

First, these data suggest that the anomaly we need to work harder to explain isn’t the present but the recent past. As the right-most third of the plot shows, the observed incidence of political instability was unusually low in the 1990s and 2000s. For the previous several decades, the average annual rate of instability onset was about 4 percent. Apart from some big spikes around decolonization and the end of the Cold War, the trend over time was pretty flat. Then came the past 20 years, when the annual rate has hovered around 2 percent, and the peaks have barely reached the Cold War–era average. In the context of the past half-century, then, any upticks we’ve seen in the past few years don’t seem so unusual. To answer the question in this post’s title, it looks like the world isn’t boiling over after all. Instead, it looks more like we’re returning to a state of affairs that was, until recently, normal.

Second, the differences between the observed and “predicted” rates suggest that the recent window of comparative stability can’t be explained by generic trends in the structural factors that best predict instability. If anything, the opposite is true. According to our structural model of instability risk, we should have seen an increase in the rate of these crises in the past 20 years, as more countries moved from dictatorial regimes to various transitional and hybrid forms of government. Instead, we saw the opposite. He or she who can explain why that’s so with a theory that accurately predicts where this trend is now headed deserves a…well, whatever prize political scientists would get if we had our own Fields Medal.

For the latest data on the political instability events PITF tracks, see the Center for Systemic Peace’s data page. For the data and code used to approximate the PITF’s global instability model, see this GitHub repository of mine.

The Evolution of Political Regimes, Freedom House Version

A year and a half ago, I posted animated heat maps that used Polity data to look at the evolution of national political regimes at the global level over the past two centuries (here and here). Polity hasn’t posted new data for 2013 yet, but Freedom House (sort of) has, so I thought I’d apply the same template to Freedom House’s measures of political rights and civil liberties and see what stories emerged.

The result is shown below. Here are a few things to keep in mind when watching it:

  • The cells in each frame represent annual proportions of all national political regimes worldwide. The darker the gray, the larger the share of the world’s regimes that year. (A sketch of how these proportions are computed follows this list.)
  • Freedom House’s historical depth is much shallower than Polity’s—coverage begins in 1972 instead of 1800—so we’re missing most of the story the Polity version told about the advent and spread of contemporary democracy in the 19th and 20th centuries. Oh, well.
  • The order of the Freedom House indices is counter-intuitive. A rating of 1 is most liberal (“freest”) and 7 is least. So in these plots, the upper right-hand corner is where you’d find most of Europe and North America today, and the lower left-hand corner is where you’ll find what Freedom House calls “the worst of the worst.”
  • One year (1981) is missing because Freedom House made some changes to its process around that time that meant they effectively skipped a year.
  • For details on what the two measures are meant to represent and how they are produced, see Freedom House’s Methodology Fact Sheet.

freedomhouse.heatmap.20140213

Now here are a few things that occur to me when watching it.

  • The core trend is clear and unsurprising. Over the past four decades, national political regimes around the world have trended more liberal (see this post for more on that). We can see that here in the fading of the cells in the lower left and the flow of that color toward the upper right.
  • You have to look a little harder for it, but I think I can see the slippage that Freedom House emphasizes in its recent reports, too. Compared with the 1970s, 1980s, and even 1990s, the distributions of the past several years still look quite liberal, but it’s also evident that national political regimes aren’t marching inexorably into that upper right-hand corner. Whether that’s just the random part of a process that remains fundamentally unchanged or the start of a sustained slide from a historical peak, we’ll just have to wait and see. (My money’s on the former.)
  • These plots also show just how tightly coupled these two indices are. Most of the cells far from the heavily populated diagonal never register a single case. This visual pattern reinforces the idea that these two indices aren’t really measuring independent aspects of governance. Instead, they look more like two expressions of a common underlying process. (For deep thoughts on these measurement issues, see Munck and Verkuilen 2002 and Coppedge et al. 2011 [gated, sorry].)

You can find the R script used to produce this .gif on GitHub (here) and the data set used by that script on Google Drive (here). Freedom House hasn’t yet released the 2013 data in tabular format, so I typed those up myself and then merged the results with a table created from last year’s spreadsheet.
