The Worst World EVER…in the Past 5 or 10 Years

A couple of months ago, the head of the UN’s refugee agency announced that, in 2013, “the number of people displaced by violent conflict hit the highest level since World War II,” and he noted that the number was still growing in 2014.

A few days ago, under the headline “Countries in Crisis at Record High,” Foreign Policy’s The Cable reported that the UN’s Inter-Agency Standing Committee for the first time ever had identified four situations worldwide—Syria, Iraq, South Sudan, and Central African Republic—as level 3 humanitarian emergencies, its highest (worst) designation.

Today, the Guardian reported that “last year was the most dangerous on record for humanitarian workers, with 155 killed, 171 seriously wounded and 134 kidnapped as they attempted to help others in some of the world’s most dangerous places.”

If you read those stories, you might infer that the world has become more insecure than ever, or at least the most insecure it’s been since the last world war. That would be reasonable, but probably also wrong. These press accounts of record-breaking trends often omit or underplay a crucial detail: the data series on which these claims rely don’t extend very far into the past.

In fact, we don’t know how the current number of displaced persons compares to all years since World War II, because the UN only has data on that since 1989. In absolute terms, the number of refugees worldwide is now the largest it’s been since record-keeping began 25 years ago. Measured as a share of global population, however, the number of displaced persons in 2013 had not yet matched the peak of the early 1990s (see the Addendum here).

The Cable accurately states that having four situations designated as level-3 humanitarian disasters by the UN is “unprecedented,” but we only learn late in the story that the system which makes these designations has only existed for a few years. In other words, unprecedented…since 2011.

Finally, while the Guardian correctly reports that 2013 was the most dangerous year on record for aid workers, it fails to note that those records only reach back to the late 1990s.

I don’t mean to make light of worrisome trends in the international system or any of the terrible conflicts driving them. From the measures I track—see here and here, for example, and here for an earlier post on causes—I’d say that global levels of instability and violent conflict are high and waxing, but they have not yet exceeded the peaks we saw in the early 1990s and probably the 1960s. Meanwhile, the share of states worldwide that are electoral democracies remains historically high, and the share of the world’s population living in poverty has declined dramatically in the past few decades. The financial crisis of 2008 set off a severe and persistent global recession, but that collapse could have been much worse, and institutions of global governance deserve some credit for helping to stave off an even deeper failure.

How can all of these things be true at the same time? It’s a bit like climate change. Just as one or even a few unusually cool years wouldn’t reverse or disprove the clear long-term trend toward a hotter planet, an extended phase of elevated disorder and violence doesn’t instantly undo the long-term trends toward a more peaceful and prosperous human society. We are currently witnessing (or suffering) a local upswing in disorder that includes numerous horrific crises, but in global historical terms, the world has not fallen apart.

Of course, if it’s a mistake to infer global collapse from these local trends, it’s also a mistake to infer that global collapse is impossible from the fact that it hasn’t occurred already. The war that is already consuming Syria and Iraq is responsible for a substantial share of the recent increase in refugee flows and casualties, and it could spread further and burn hotter for some time to come. Probably more worrisome to watchers of long-term trends in international relations, the crisis in Ukraine and recent spate of confrontations between China and its neighbors remind us that war between major powers could happen again, and this time those powers would both or all have nuclear weapons. Last but not least, climate change seems to be accelerating with consequences unknown.

Those are all important sources of elevated uncertainty, but uncertainty and breakdown are not the same thing. Although those press stories describing unprecedented crises are all covering important situations and trends, I think their historical perspective is too shallow. I’m forty-four years old. The global system is less orderly than it’s been in a while, but it’s still not worse than it’s ever been in my lifetime, and it’s still nowhere near as bad as it was when my parents were born. I won’t stop worrying or working on ways to try to make things a tiny bit better, but I will keep that frame of reference in mind.

Notes From a First Foray into Text Mining

Guess what? Text mining isn’t push-button, data-making magic, either. As Phil Schrodt likes to say, there is no Data Fairy.

[Image: the Data Fairy meme]

I’m quickly learning this point from my first real foray into text mining. Under a grant from the National Science Foundation, I’m working with Phil Schrodt and Mike Ward to use these techniques to develop new measures of several things, including national political regime type.

I wish I could say that I’m doing the programming for this task, but I’m not there yet. For the regime-data project, the heavy lifting is being done by Shahryar Minhas, a sharp and able Ph.D. student in political science at Duke University, where Mike leads the WardLab. Shahryar and I are scheduled to present preliminary results from this project at the upcoming Annual Meeting of the American Political Science Association in Washington, DC (see here for details).

When we started work on the project, I imagined a relatively simple and mostly automatic process running from location and ingestion of the relevant texts to data extraction, model training, and, finally, data production. Now that we’re actually doing it, though, I’m finding that, as always, the devil is in the details. Here are just a few of the difficulties and decision points we’ve had to confront so far.

First, the structure of the documents available online often makes it difficult to scrape and organize them. We initially hoped to include annual reports on politics and human-rights practices from four or five different organizations, but some of the ones we wanted weren’t posted online in a format we could readily scrape. At least one was scrapable but not organized by country, so we couldn’t properly group the text for analysis. In the end, we wound up with just two sets of documents in our initial corpus: the U.S. State Department’s Country Reports on Human Rights Practices, and Freedom House’s annual Freedom in the World documents.

Differences in naming conventions almost tripped us up, too. For our first pass at the problem, we are trying to create country-year data, so we want to treat all of the documents describing a particular country in a particular year as a single bag of words. As it happens, the State Department labels its human rights reports for the year on which they report, whereas Freedom House labels its Freedom in the World report for the year in which it’s released. So, for example, both organizations have already issued their reports on conditions in 2013, but Freedom House dates that report to 2014 while State dates its version to 2013. Fortunately, we knew this and made a simple adjustment before blending the texts. If we hadn’t known about this difference in naming conventions, however, we would have ended up combining reports for different years from the two sources and made a mess of the analysis.
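
To make that adjustment concrete, here is a minimal sketch in R of the kind of realignment involved. The data frames, contents, and column names are hypothetical stand-ins, not our actual pipeline:

    # Hypothetical document tables: one row per report
    # State labels reports by the year covered; Freedom House by the year released
    state <- data.frame(country = c("Albania", "Albania"),
                        report_year = c(2012, 2013),
                        text = c("state text on 2012", "state text on 2013"),
                        stringsAsFactors = FALSE)
    fh <- data.frame(country = c("Albania", "Albania"),
                     release_year = c(2013, 2014),
                     text = c("fh text on 2012", "fh text on 2013"),
                     stringsAsFactors = FALSE)

    # Shift the Freedom House release year back one year so both sources
    # point to the year in which the conditions were actually observed
    fh$report_year <- fh$release_year - 1

    # Stack the sources and collapse to one bag of words per country-year
    docs <- rbind(state[, c("country", "report_year", "text")],
                  fh[, c("country", "report_year", "text")])
    country_year <- aggregate(text ~ country + report_year, data = docs,
                              FUN = paste, collapse = " ")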

Once ingested, those documents include some text that isn’t relevant to our task, or that is relevant but the meaning of which is tacit. Common stop words like “the”, “a”, and “an” are obvious and easy to remove. More challenging are the names of people, places, and organizations. For our regime-data task, we’re interested in the abstract roles behind some of those proper names—president, prime minister, ruling party, opposition party, and so on—rather than the names themselves, but text mining can’t automatically derive the one for the other.
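
For readers new to these tools, the stop-word step really is the easy part. Here is a minimal sketch using R’s tm package, with a throwaway two-sentence corpus standing in for our actual reports:

    library(tm)  # install.packages("tm") if needed

    corpus <- VCorpus(VectorSource(c("The president dissolved the parliament.",
                                     "An opposition party boycotted the election.")))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stripWhitespace)

    sapply(corpus, as.character)  # inspect the cleaned text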

For our initial analysis, we decided to omit all proper names and acronyms to focus the classification models on the most general language. In future iterations, though, it would be neat if we could borrow dictionaries developed for related tasks and use them to replace those proper names with more general markers. For example, in a report or story on Russia, Vladimir Putin might get translated into <head of government>, the FSB into <police>, and Chechen Republic of Ichkeria into <rebel group>. This approach would preserve the valuable tacit information in those names while making it explicit and uniform for the pattern-recognition stage.
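
Here is a toy version of what that substitution might look like in R. The dictionary below is something I made up for illustration, not one of the borrowed dictionaries we have in mind:

    # Toy dictionary mapping proper names to generic role markers
    role_dictionary <- c("Vladimir Putin" = "<head of government>",
                         "FSB"            = "<police>",
                         "United Russia"  = "<ruling party>")

    text <- "Vladimir Putin met with FSB officials and United Russia deputies."

    # Replace each name with its role marker
    for (name in names(role_dictionary)) {
      text <- gsub(name, role_dictionary[[name]], text, fixed = TRUE)
    }
    text
    # "<head of government> met with <police> officials and <ruling party> deputies."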

That’s not all, but it’s enough to make the point. These things are always harder than they look, and text mining is no exception. In any case, we’ve now run this gantlet once and made our way to an encouraging set of initial results. I’ll post something about those results closer to the conference when the paper describing them is ready for public consumption. In the meantime, though, I wanted to share a few of the things I’ve already learned about these techniques with others who might be thinking about applying them, or who already do and can commiserate.

Turning Crowdsourced Preseason NFL Strength Ratings into Game-Level Forecasts

For the past week, nearly all of my mental energy has gone into the Early Warning Project and a paper for the upcoming APSA Annual Meeting here in Washington, DC. Over the weekend, though, I found some time for a toy project on forecasting pro-football games. Here are the results.

The starting point for this toy project is a pairwise wiki survey that turns a crowd’s beliefs about relative team strength into scalar ratings. Regular readers will recall that I first experimented with one of these before the 2013-2014 NFL season, and the predictive power wasn’t terrible, especially considering that the number of participants was small and the ratings were completed before the season started.

This year, to try to boost participation and attract a more knowledgeable crowd of respondents, I paired with Trey Causey to announce the survey on his pro-football analytics blog, The Spread. The response has been solid so far. Since the survey went up, the crowd—that’s you!—has cast nearly 3,400 votes in more than 100 unique user sessions (see the Data Visualizations section here).

The survey will stay open throughout the season, but that doesn’t mean it’s too early to start seeing what it’s telling us. One thing I’ve already noticed is that the crowd does seem to be updating in response to preseason action. For example, before the first round of games, I noticed that the Baltimore Ravens, my family’s favorites, were running mid-pack with a rating of about 50. After they trounced the 49ers in their preseason opener, however, the Ravens jumped to the upper third with a rating of 59. (You can always see up-to-the-moment survey results here, and you can cast your own votes here.)

The wiki survey is a neat way to measure team strength. On their own, though, those ratings don’t tell us what we really want to know, which is how each game is likely to turn out, or how well our team might be expected to do this season. The relationship between relative strength and game outcomes should be pretty strong, but we might want to consider other factors, too, like home-field advantage. To turn a strength rating into a season-level forecast for a single team, we need to consider the specifics of its schedule. In game play, it’s relative strength that matters, and some teams will have tougher schedules than others.

A statistical model is the best way I can think to turn ratings into game forecasts. To get a model to apply to this season’s ratings, I estimated a simple linear one from last year’s preseason ratings and the results of all 256 regular-season games (found online in .csv format here). The model estimates net score (home minus visitor) from just one feature, the difference between the two teams’ preseason ratings (again, home minus visitor). Because the net scores are all ordered the same way and the model also includes an intercept, though, it implicitly accounts for home-field advantage as well.
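
In R, that model is a one-liner once the game data are loaded. Here is a sketch with hypothetical file and column names rather than the actual files I used:

    # games2013: one row per 2013 regular-season game, with final scores and
    # the two teams' preseason wiki-survey ratings (hypothetical file and columns)
    games2013 <- read.csv("nfl_2013_games.csv")

    games2013$net_score   <- games2013$home_score  - games2013$visitor_score
    games2013$rating_diff <- games2013$home_rating - games2013$visitor_rating

    fit <- lm(net_score ~ rating_diff, data = games2013)
    summary(fit)  # intercept ~ home-field advantage; slope ~ points per rating point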

The scatterplot below shows the raw data on those two dimensions from the 2013 season. The model estimated from these data has an intercept of 3.1 and a slope of 0.1 on the rating differential. In other words, the model identifies a net home-field advantage of 3 points—consistent with the conventional wisdom—and it suggests that every point of advantage on the wiki-survey ratings translates into a net swing of one-tenth of a point on the field. I also tried a generalized additive model with smoothing splines to see if the association between the survey-score differential and net game score was nonlinear, but as the scatterplot suggests, it doesn’t seem to be.
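
For the curious, that nonlinearity check takes only a couple of lines with the mgcv package. This sketch reuses the hypothetical games2013 data frame and the fit object from the sketch above:

    library(mgcv)

    fit_gam <- gam(net_score ~ s(rating_diff), data = games2013)
    summary(fit_gam)   # an effective df near 1 for s(rating_diff) implies a nearly linear fit
    AIC(fit, fit_gam)  # little or no improvement supports sticking with the linear model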

2013 NFL Games Arranged by Net Game Score and Preseason Wiki Survey Rating Differentials

In sample, the linear model’s accuracy was good, not great. If we convert the net scores the model postdicts to binary outcomes and compare those postdictions to actual outcomes, we see that the model correctly classifies 60 percent of the games. That’s in sample, but it’s also based on nothing more than home-field advantage and a single preseason rating for each team from a survey with a small number of respondents. So, all things considered, it looks like a potentially useful starting point.
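
Here is the gist of that in-sample check, continuing the hypothetical objects from the sketches above:

    # Convert postdicted and actual net scores to binary home-win outcomes
    postdicted_home_win <- as.numeric(fitted(fit) > 0)
    actual_home_win     <- as.numeric(games2013$net_score > 0)

    mean(postdicted_home_win == actual_home_win)  # share of games classified correctly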

Whatever its limitations, that model gives us the tool we need to convert 2014 wiki survey results into game-level predictions. To do that, we also need a complete 2014 schedule. I couldn’t find one in .csv format, but I found something close (here) that I saved as text, manually cleaned in a minute or so (deleted extra header rows, fixed remaining header), and then loaded and merged with a .csv of the latest survey scores downloaded from the manager’s view of the survey page on All Our Ideas.
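
The merge itself is straightforward once the schedule and the ratings share a team-name column. Again, a sketch with hypothetical file and column names:

    schedule2014 <- read.csv("nfl_2014_schedule.csv")   # the cleaned schedule
    ratings2014  <- read.csv("wiki_survey_scores.csv")  # scores from All Our Ideas

    # Attach each team's current rating to its games, once as home, once as visitor
    schedule2014 <- merge(schedule2014, ratings2014, by.x = "home_team", by.y = "team")
    names(schedule2014)[names(schedule2014) == "rating"] <- "home_rating"
    schedule2014 <- merge(schedule2014, ratings2014, by.x = "visitor_team", by.y = "team")
    names(schedule2014)[names(schedule2014) == "rating"] <- "visitor_rating"

    schedule2014$rating_diff <- schedule2014$home_rating - schedule2014$visitor_rating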

I’m not going to post forecasts for all 256 games—at least not now, with three more preseason games to learn from and, hopefully, lots of votes yet to be cast. To give you a feel for how the model is working, though, I’ll show a couple of cuts on those very preliminary results.

The first is a set of forecasts for all Week 1 games. The labels show Visitor-Home, and the net score is ordered the same way. So, a predicted net score greater than 0 means the home team (second in the paired label) is expected to win, while a predicted net score below 0 means the visitor (first in the paired label) is expected to win. The lines around the point predictions represent 90-percent confidence intervals, giving us a partial sense of the uncertainty around these estimates.
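
Producing those point predictions and intervals is a single predict() call on the model estimated from the 2013 data. This sketch continues the hypothetical objects above; swapping interval = "prediction" in for "confidence" would give wider, game-level intervals:

    week1 <- schedule2014[schedule2014$week == 1, ]

    forecasts <- predict(fit, newdata = week1, interval = "confidence", level = 0.90)
    cbind(week1[, c("visitor_team", "home_team")], round(forecasts, 1))
    # 'fit' > 0 favors the home team; 'lwr' and 'upr' bound the 90-percent interval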

Week 1 Game Forecasts from Preseason Wiki Survey Results on 10 August 2014

Of course, as a fan of a particular team, I’m most interested in what the model says about how my guys are going to do this season. The next plot shows predictions for all 16 of Baltimore’s games. Unfortunately, the plotting command orders the data by label, and my R skills and available time aren’t sufficient to reorder them by week, but the information is all there. In this plot, the dots for the point predictions are colored red if they predict a Baltimore win and black for an expected loss. The good news for Ravens fans is that this plot suggests an 11-5 season, good enough for a playoff berth. The bad news is that an 8-8 season also lies within the 90-percent confidence intervals, so the playoffs don’t look like a lock.
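
One way to get those season tallies directly from the model rather than by eyeballing the plot is to count the games Baltimore is predicted to win at the point estimates and then at the unfavorable ends of the intervals. A sketch, still using the hypothetical objects above:

    bal <- schedule2014[schedule2014$home_team == "Baltimore Ravens" |
                        schedule2014$visitor_team == "Baltimore Ravens", ]
    pred <- predict(fit, newdata = bal, interval = "confidence", level = 0.90)

    # A predicted net score > 0 favors the home team
    bal_is_home <- bal$home_team == "Baltimore Ravens"
    wins_point <- sum(ifelse(bal_is_home, pred[, "fit"] > 0, pred[, "fit"] < 0))
    wins_floor <- sum(ifelse(bal_is_home, pred[, "lwr"] > 0, pred[, "upr"] < 0))
    c(point = wins_point, floor = wins_floor)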

2014 Game-Level Forecasts for the Baltimore Ravens from 10 August 2014 Wiki Survey Scores

So that’s where the toy project stands now. My intuition tells me that the predicted net scores aren’t as well calibrated as I’d like, and the estimated confidence intervals surely understate the true uncertainty around each game (“On any given Sunday…”). Still, I think this exercise demonstrates the potential of this forecasting process. If I were a betting man, I wouldn’t lay money on these estimates. As an applied forecaster, though, I can imagine using these predictions as priors in a more elaborate process that incorporates additional and, ideally, more dynamic information about each team and game situation over the course of the season. Maybe my doppelganger can take that up while I get back to my day job…

Postscript. After I published this post, Jeff Fogle suggested via Twitter that I compare the Week 1 forecasts to the current betting lines for those games. The plot below shows the median point spread from an NFL odds-aggregating site as blue dots on top of the statistical forecasts already shown above. As you can see, the statistical forecasts are tracking the betting lines pretty closely. There’s only one game—Carolina at Tampa Bay—where the predictions from the two series fall on different sides of the win/loss line, and it’s a game the statistical model essentially sees as a toss-up. It’s also reassuring that there isn’t a consistent direction to the differences, so the statistical process doesn’t seem to be biased in some fundamental way.

Week 1 Game-Level Forecasts Compared to Median Point Spread from Betting Sites on 11 August 2014

Forecasting Round-Up No. 7

1. I got excited when I heard on Twitter yesterday about a machine-learning process that turns out to be very good at predicting U.S. Supreme Court decisions (blog post here, paper here). I got even more excited when I saw that the guys who built that process have also been running a play-money prediction market on the same problem for the past several years, and that the most accurate forecasters in that market have done even better than that model (here). It sounds like they are now thinking about more rigorous ways to compare and cross-pollinate the two. That’s part of what we’re trying to do with the Early Warning Project, so I hope that they do and we can learn from their findings.

2. A paper in the current issue of the Journal of Personality and Social Psychology (here, but paywalled; hat-tip to James Igoe Walsh) adds to the growing pile of evidence on the forecasting power of crowds, with an interesting additional finding on the willingness of others to trust and use those forecasts:

We introduce the select-crowd strategy, which ranks judges based on a cue to ability (e.g., the accuracy of several recent judgments) and averages the opinions of the top judges, such as the top 5. Through both simulation and an analysis of 90 archival data sets, we show that select crowds of 5 knowledgeable judges yield very accurate judgments across a wide range of possible settings—the strategy is both accurate and robust. Following this, we examine how people prefer to use information from a crowd. Previous research suggests that people are distrustful of crowds and of mechanical processes such as averaging. We show in 3 experiments that, as expected, people are drawn to experts and dislike crowd averages—but, critically, they view the select-crowd strategy favorably and are willing to use it. The select-crowd strategy is thus accurate, robust, and appealing as a mechanism for helping individuals tap collective wisdom.
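
The strategy itself is easy to express in code. Here is a toy simulation in R of my own devising (not the authors’ data or code), just to show the mechanics of ranking judges on recent accuracy and averaging the top five:

    set.seed(42)
    n_judges <- 20
    truth <- 100

    # Each judge has a persistent personal bias plus noise, so past accuracy
    # carries some signal about current accuracy
    bias <- rnorm(n_judges, mean = 0, sd = 10)
    past_estimates    <- truth + bias + rnorm(n_judges, sd = 5)
    current_estimates <- truth + bias + rnorm(n_judges, sd = 5)

    # Rank judges by the accuracy of their past estimates, then average the top 5
    top5 <- order(abs(past_estimates - truth))[1:5]
    select_crowd <- mean(current_estimates[top5])
    whole_crowd  <- mean(current_estimates)

    abs(select_crowd - truth)  # error of the select crowd
    abs(whole_crowd - truth)   # error of the simple average of all judges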

3. Adam Elkus recently spotlighted two interesting papers involving agent-based modeling (ABM) and forecasting.

  • The first (here) “presents a set of guidelines, imported from the field of forecasting, that can help social simulation and, more specifically, agent-based modelling practitioners to improve the predictive performance and the robustness of their models.”
  • The second (here), from 2009 but new to me, describes an experiment in deriving an agent-based model of political conflict from event data. The results were pretty good; a model built from event data and then tweaked by a subject-matter expert was as accurate as one built entirely by hand, and the hybrid model took much less time to construct.

4. Nautilus ran a great piece on Lewis Fry Richardson, a pioneer in weather forecasting who also applied his considerable intellect to predicting violent conflict. As the story notes,

At the turn of the last century, the notion that the laws of physics could be used to predict weather was a tantalizing new idea. The general idea—model the current state of the weather, then apply the laws of physics to calculate its future state—had been described by the pioneering Norwegian meteorologist Vilhelm Bjerknes. In principle, Bjerknes held, good data could be plugged into equations that described changes in air pressure, temperature, density, humidity, and wind velocity. In practice, however, the turbulence of the atmosphere made the relationships among these variables so shifty and complicated that the relevant equations could not be solved. The mathematics required to produce even an initial description of the atmosphere over a region (what Bjerknes called the “diagnostic” step) were massively difficult.

Richardson helped solve that problem in weather forecasting by breaking the task into many more manageable parts—atmospheric cells, in this case—and thinking carefully about how those parts fit together. I wonder if we will see similar advances in forecasts of social behavior in the next 100 years. I doubt it, but the trajectory of weather prediction over the past century should remind us to remain open to the possibility.

5. Last, a bit of fun: Please help Trey Causey and me forecast the relative strength of this year’s NFL teams by voting in this pairwise wiki survey! I did this exercise last year, and the results weren’t bad, even though the crowd was pretty small and probably not especially expert. Let’s see what happens if more people participate, shall we?

In Praise of a Measured Response to the Ukraine Crisis

Yesterday afternoon, I tweeted that the Obama administration wasn’t getting enough credit for its measured response to the Ukraine crisis so far, asserting that sanctions were really hurting Russia and noting that “we”—by which I meant the United States—were not directly at war.

Not long after I said that, someone I follow tweeted that he hadn’t seen a compelling explanation of how sanctions are supposed to work in this case. That’s an important question, and one I also haven’t seen or heard answered in depth. I don’t know how U.S. or European officials see this process beyond what they say in public, but I thought I would try to spell out the logic as a way to back up my own assertion in support of the approach the U.S. and its allies have pursued so far.

I’ll start by clarifying what I’m talking about. When I say “Ukraine crisis,” I am referring to the tensions created by Russia’s annexation of Crimea and its evident and ongoing support for a separatist rebellion in eastern Ukraine. These actions are only the latest in a long series of interactions with the U.S. and Europe in Russia’s “near abroad,” but their extremity and the aggressive rhetoric and action that has accompanied them have sharply amplified tensions between the larger powers that abut Ukraine on either side. For the first time in a while, there has been open talk of a shooting war between Russia and NATO. Whatever you make of the events that led to it and however you assign credit or blame for them, this state of affairs represents a significant and undesirable escalation.

Faced with this crisis, the U.S. and its NATO allies have three basic options: compel, cajole, or impel.

Compel in this case means to push Russia out of Ukraine by force—in other words, to go to war. So far, the U.S. and Europe appear to have concluded—correctly, in my opinion—that Russia’s annexation of Crimea and its support for separatists in eastern Ukraine do not warrant a direct military response. The likely and possible costs of war between two nuclear powers are simply too great to bear for the sake of Ukraine’s autonomy or territorial integrity.

Cajoling would mean persuading Russian leaders to reverse course through positive incentives—carrots of some kind. It’s hard to imagine what the U.S. and E.U. could offer that would have the desired effect, however. Russian leaders consider Ukraine a vital interest, and the West has nothing comparably valuable to offer in exchange. More important, the act of making such an offer would reward Russia for its aggression, setting a precedent that could encourage Russia to grab for more and could also affect other countries’ perceptions of the U.S.’s tolerance for seizures of territory.

That leaves impel—to impose costs on Russia to the point where its leaders feel obliged to change course. The chief tools that U.S. and European leaders have to impose costs on Russia are economic and financial sanctions. Those leaders are using these tools, and they seem to be having the desired effect. Sanctions are encouraging capital flight, raising the costs of borrowing, increasing inflation, and slowing Russia’s already-anemic economic growth (see here and here for some details). Investors, bankers, and consumers are partly responding to the specific constraints of sanctions, but they are also responding to the broader economic uncertainty associated with those sanctions and the threat of wider war they imply. “It’s pure geopolitical risk,” one analyst told Bloomberg.

These costs can directly and indirectly shape Russian policy. They can directly affect Russian policy if and as the present leadership comes to view them as unbearable, or at least not worth the trade-offs against other policy objectives. That seems unlikely in the short term but increasingly likely over the long term, if the sanctions are sustained and markets continue to react so negatively. Sustained capital flight, rising inflation, and slower growth will gradually shrink Russia’s domestic policy options and its international power by eroding its fiscal health, and at some point these costs should come to outweigh the putative gains of territorial expansion and stronger leverage over Ukrainian policy.

These costs can also indirectly affect Russian policy by increasing the risk of internal instability. In authoritarian regimes, significant reforms usually occur in the face of popular unrest that may or may not be egged on by elites who defect from the ruling coalition. We are already seeing signs of infighting among regime insiders, and rising inflation and slowing growth should increase the probability of popular unrest.

To date, sanctions have not dented Putin’s soaring approval rating, but social unrest is not a referendum. Unrest only requires a small but motivated segment of the population to get started, and once it starts, its very occurrence can help persuade others to follow. I still wouldn’t bet on Putin’s downfall in the near future, but I believe the threat of significant domestic instability is rising, and I think that Putin & co. will eventually care more about this domestic risk than the rewards of continued adventurism abroad. In fact, I think we see some evidence that Putin & co. are already worrying more about this risk in their ever-expanding crackdown on domestic media and their recent moves to strengthen punishment for unauthorized street rallies and, ironically, calls for separatism. Even if this mobilization does not come, the increased threat of it should weigh on the Russian administration’s decision-making.

In my tweet on the topic, I credited the Obama administration for using measured rhetoric and shrewd policy in response to this crisis. Importantly, though, the success of this approach also depends heavily on cooperation among the U.S. and the E.U., and that seems to be happening. It’s not clear who deserves the credit for driving this process, but as one anonymous tweeter pointed out, the downing of flight MH17 appears to have played a role in deepening it.

Concerns are growing that sanctions may, in a sense, be too successful. Some observers fear that apparent capitulation to the U.S. and Europe would cost Russian leaders too much at home at a time when nationalist fervor has reached fever pitch. Confronted with a choice between wider war abroad or a veritable lynch mob at home, Putin & co. will, they argue, choose the former.

I think that this line of reasoning overstates the extent to which the Russian administration’s hands are tied at home. Putin & co. are arguably no more captive to the reinvigorated radical-nationalist fringe than they were to the liberal fringe that briefly threatened to oust them after the last presidential election.

Still, it is at least a plausible scenario, and the U.S. and E.U. have to be prepared for the possibility that Russian aggression will get worse before it gets better. This is where rhetorical and logistical efforts to bolster NATO are so important, and that’s just what NATO has been doing. NATO is predicated on a promise of collective defense; an attack on any one member state is regarded as an attack on all. By strengthening Russian policy-makers’ beliefs that this promise is credible, NATO can lead them to fear that escalations beyond certain thresholds will carry extreme costs and even threaten their very survival. So far, that’s just what the alliance has been doing with a steady flow of words and actions. Russian policy-makers could still choose wider war for various reasons, but theory and experience suggest that they are less likely to do so than they would be in the absence of this response.

In sum, given a short menu of unpalatable options, I think that the Obama administration and its European allies have chosen the best line of action and, so far, made the most of it. To expect Russia quickly to reverse course by withdrawing from Crimea and stopping its rabble-rousing in eastern Ukraine without being compelled by force to do so is unrealistic. The steady, measured approach the U.S. and E.U. have adopted appears to be having the intended effects. Russia could still react to the rising structural pressures on it by lashing out, but NATO is taking careful steps to discourage that response and to prepare for it if it comes. Under such lousy circumstances, I think this is about as well as we could expect the Obama administration and its E.U. counterparts to do.

Uncertainty About How Best to Convey Uncertainty

NPR News ran a series of stories this week under the header Risk and Reason, on “how well we understand and act on probabilities.” I thought the series nicely represented how uncertain we are about how best to convey forecasts to people who might want to use them. There really is no clear standard here, even though it is clear that the choices we make in presenting forecasts and other statistics on risks to their intended consumers strongly shape what they hear.

This uncertainty about how best to convey forecasts was on full display in the piece on how CIA analysts convey predictive assessments (here). Ken Pollack, a former analyst who now teaches intelligence analysis, tells NPR that, at CIA, “There was a real injunction that no one should ever use numbers to explain probability.” Asked why, he says that,

Assigning numerical probability suggests a much greater degree of certainty than you ever want to convey to a policymaker. What we are doing is inherently difficult. Some might even say it’s impossible. We’re trying to predict the future. And, you know, saying to someone that there’s a 67 percent chance that this is going to happen, that sounds really precise. And that makes it seem like we really know what’s going to happen. And the truth is that we really don’t.

In that same segment, though, Dartmouth professor Jeff Friedman, who studies decision-making about national security issues, says we should provide a numeric point estimate of an event’s likelihood, along with some information about our confidence in that estimate and how malleable it may be. (See this paper by Friedman and Richard Zeckhauser for a fuller treatment of this argument.) The U.S. Food and Drug Administration apparently agrees; according to the same NPR story, the FDA “prefers numbers and urges drug companies to give numerical values for risk—and to avoid using vague terms such as ‘rare, infrequent and frequent.'”

Instead of numbers, Pollack advocates for using words: “Almost certainly or highly likely or likely or very unlikely,” he tells NPR. As noted by one of the other stories in the series (here), however—on the use of probabilities in medical decision-making—words and phrases are ambiguous, too, and that ambiguity can be just as problematic.

Doctors, including Leigh Simmons, typically prefer words. Simmons is an internist and part of a group practice that provides primary care at Mass General. “As doctors we tend to often use words like, ‘very small risk,’ ‘very unlikely,’ ‘very rare,’ ‘very likely,’ ‘high risk,’ ” she says.

But those words can be unclear to a patient.

“People may hear ‘small risk,’ and what they hear is very different from what I’ve got in my mind,” she says. “Or what’s a very small risk to me, it’s a very big deal to you if it’s happened to a family member.”

Intelligence analysts have sometimes tried to remove that ambiguity by standardizing the language they use to convey likelihoods, most famously in Sherman Kent’s “Words of Estimative Probability.” It’s not clear to me, though, how effective this approach is. For one thing, consumers are often lazy about trying to understand just what information they’re being given, and templates like Kent’s don’t automatically solve that problem. This laziness came across most clearly in NPR’s Risk and Reason segment on meteorology (here). Many of us routinely consume probabilistic forecasts of rainfall and make decisions in response to them, but it turns out that few of us understand what those forecasts actually mean. With Kent’s words of estimative probability, I suspect that many readers of the products that use them haven’t memorized the table that spells out their meaning and don’t bother to consult it when they come across those phrases, even when it’s reproduced in the same document.

Equally important, a template that works well for some situations won’t necessarily work for all. I’m thinking in particular of forecasts on the kinds of low-probability, high-impact events that I usually analyze and that are essential to the CIA’s work, too. Here, what look like small differences in probability can sometimes be very meaningful. For example, imagine that it’s August 2001 and you have three different assessments of the risk of a major terrorist attack on U.S. soil in the next few months. One pegs the risk at 1 in 1,000; another at 1 in 100; and another at 1 in 10. Using Kent’s table, all three of those assessments would get translated into a statement that the event is “almost certainly not” going to happen, but I imagine that most U.S. decision-makers would have felt very differently about risks of 0.1%, 1%, and 10% with a threat of that kind.

There are lots of rare but important events that inhabit this corner of the probability space: nuclear accidents, extreme weather events, medical treatments, and mass atrocities, to name a few. We could create a separate lexicon for assessments in these areas, as the European Medicines Agency has done for adverse reactions to medical therapies (here, via NPR). I worry, though, that we ask too much of consumers of these and other forecasts if we expect them to remember multiple lexicons and to correctly code-switch between them. We also know that the relevant scale will differ across audiences, even on the same topic. For example, an individual patient considering a medical treatment might not care much about the difference between a mortality risk of 1 in 1,000 and 1 in 10,000, but a drug company and the regulators charged with overseeing them hopefully do.

If there’s a general lesson here, it’s that producers of probabilistic forecasts should think carefully about how best to convey their estimates to specific audiences. In practice, that means thinking about the nature of the decision processes those forecasts are meant to inform and, if possible, trying different approaches and checking to find out how each is being understood. Ideally, consumers of those forecasts should also take time to try to educate themselves on what they’re getting. I’m not optimistic that many will do that, but we should at least make it as easy as possible for them to do so.

Indonesia’s Elections Offer Some Light in the Recent Gloom

The past couple of weeks have delivered plenty of terrible news, so I thought I would take a moment to call out a significant positive development: Indonesia held a presidential election early this month; there were no coup attempts and little violence associated with that balloting; and the contest was finally won by the guy who wasn’t threatening to dismantle democracy.

By my reckoning, this outcome should increase our confidence that Indonesia now deserves to be called a consolidated democracy, where “consolidated” just means that the risk of a reversion to authoritarian rule is low. Democracies are most susceptible to those reversions in their first 15–20 years (here and here), especially when they are poor and haven’t yet seen power passed from one party to another (here).

Indonesia now looks reasonably solid on all of those counts. The current democratic episode began nearly 15 years ago, in 1999, and the country has elected three presidents from as many parties since then—four if we count the president-elect. Indonesia certainly isn’t a rich country, but it’s not exactly poor any more, either. With a GDP per capita of approximately $3,500, it now lands near the high end of the World Bank’s “lower middle income” tier. Together, those features don’t describe a regime that we would expect to be immune from authoritarian reversal, but the elections that just occurred put that system through a major stress test, and it appears to have passed.

Some observers would argue that the country’s democratic regime already crossed the “consolidated” threshold years ago. When I described Indonesia as a newly consolidated democracy on Twitter, Indonesia specialist Jeremy Menchik noted that colleagues William Liddle and Saiful Mujani had identified Indonesia as being consolidated since 2004 and said that he agreed with them. Meanwhile, democratization experts often use the occurrence of one or two peaceful transfers of power as a rule of thumb for declaring democracies consolidated, and Indonesia had passed both of those tests before the latest election campaign even began.

Of course, it’s easy to say in hindsight that the risk of an authoritarian reversal in Indonesia around this election was low. We shouldn’t forget, though, that there was a lot of anxiety during the campaign about how the eventual loser, Prabowo Subianto, might dismantle democracy if he were elected, and in the end he only lost by a few percentage points. What’s more, the kind of “reforms” at which Prabowo hinted are just the sorts of things that have undone many other attempts at democracy in the past couple of decades. There were also rumors of coup plots, especially during the nerve-wracking last few weeks of the campaign until the official results were announced (see here, for example). Some seasoned observers of Indonesian politics with whom I spoke were confident at the time that those plots would not come to pass, but the fact that those rumors existed and were anxiously discussed in some quarters suggests that they were at least plausible, even if they weren’t probable. Last but not least, statistical modeling by Milan Svolik suggests that a middle-income presidential democracy like Indonesia’s won’t really be “cured” of its risk of authoritarian reversal until it gets much wealthier (see the actuarial tables on p. 43 in this excellent paper, which was later published in the American Political Science Review).

Even bearing those facts and Milan’s tables in mind, I think it’s fair to say that Indonesia now qualifies as a consolidated democracy, in the specific sense that the risk of an authoritarian reversal is now quite small and will remain so. If that’s right, then four of the world’s five most populous countries now fit under that label. The democratic regimes in India, the United States, Indonesia, and Brazil—roughly 2 billion citizens among them—all have lots of flaws, but the increased prevalence and persistence of democracy among the world’s largest countries is still a very big deal in the long course of human affairs. And, who knows, maybe China will finally join them in the not-too-distant future?

In Applied Forecasting, Keep It Simple

One of the lessons I think I’ve learned from the nearly 15 years I’ve spent developing statistical models to forecast rare political events is: keep it simple unless and until you’re compelled to do otherwise.

The fact that the events we want to forecast emerge from extremely complex systems doesn’t mean that the models we build to forecast them need to be extremely complex as well. In a sense, the unintelligible complexity of the causal processes relieves us from the imperative to follow that path. We know our models can’t even begin to capture the true data-generating process. So, we can and usually should think instead about looking for measures that capture relevant concepts in a coarse way and then use simple model forms to combine those measures.

A few experiences and readings have especially shaped my thinking on this issue.

  • When I worked on the Political Instability Task Force (PITF), my colleagues and I found that a logistic regression model with just four variables did a pretty good job assessing relative risks of a few forms of major political crisis in countries worldwide (see here, or ungated here). In fact, one of the four variables in that model—an indicator that four or more bordering countries have ongoing major armed conflicts—has almost no variance, so it’s effectively a three-variable model. We tried adding a lot of other things that were suggested by a lot of smart people, but none of them really improved the model’s predictive power. (There were also a lot of things we couldn’t even try because the requisite data don’t exist, but that’s a different story.)
  • Toward the end of my time with PITF, we ran a “tournament of methods” to compare the predictive power of several statistical techniques that varied in their complexity, from logistic regression to Bayesian hierarchical models with spatial measures (see here for the write-up). We found that the more complex approaches usually didn’t outperform the simpler ones, and when they did, it wasn’t by much. What mattered most for predictive accuracy was finding the inputs with the most forecasting power. Once we had those, the functional form and complexity of the model didn’t make much difference.
  • As Andreas Graefe describes (here), models that assign equal weights to all predictors often forecast at least as accurately as multiple regression models that estimate weights from historical data. “Such findings have led researchers to conclude that the weighting of variables is secondary for the accuracy of forecasts,” Graefe writes. “Once the relevant variables are included and their directional impact on the criterion is specified, the magnitudes of effects are not very important.” A toy version of that comparison appears in the sketch below.
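
To make Graefe’s point concrete, here is a toy comparison in R of my own construction with simulated data (nothing from his paper): fit a regression in a training window, then compare its out-of-sample error to that of a unit-weighted index built from the same standardized predictors.

    set.seed(1)
    n <- 60
    dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
    dat$y <- 0.6 * dat$x1 + 0.4 * dat$x2 + 0.2 * dat$x3 + rnorm(n)

    train <- 1:40
    test  <- 41:60

    # Regression-estimated weights
    reg_fit  <- lm(y ~ x1 + x2 + x3, data = dat[train, ])
    reg_pred <- predict(reg_fit, newdata = dat[test, ])

    # Equal weights: standardize the predictors, sum them, rescale to the outcome
    dat$index <- rowSums(scale(dat[, c("x1", "x2", "x3")]))
    ew_fit    <- lm(y ~ index, data = dat[train, ])
    ew_pred   <- predict(ew_fit, newdata = dat[test, ])

    # Out-of-sample mean absolute error for each approach
    c(regression    = mean(abs(dat$y[test] - reg_pred)),
      equal_weights = mean(abs(dat$y[test] - ew_pred)))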

Of course, there will be some situations in which complexity adds value, so it’s worth exploring those ideas when we have a theoretical rationale and the coding skills, data, and time needed to pursue them. In general, though, I am convinced that we should always try simpler forms first and only abandon them if and when we discover that more complex forms significantly increase forecasting power.

Importantly, the evidence for that judgment should come from out-of-sample validation—ideally, from forecasts made about events that hadn’t yet happened. Models with more variables and more complex forms will often score better than simpler ones when applied to the data from which they were derived, but this will usually turn out to be a result of overfitting. If the more complex approach isn’t significantly better at real-time forecasting, it should probably be set aside until it is.
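
In practice, that check can be as simple as holding out the most recent years, fitting the competing models on the earlier ones, and comparing their out-of-sample scores. A schematic sketch with simulated data and made-up variable names:

    set.seed(2)
    n <- 500
    sim <- data.frame(year = sample(1996:2012, n, replace = TRUE),
                      x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
    sim$onset <- rbinom(n, 1, plogis(-2 + 0.8 * sim$x1 + 0.5 * sim$x2))

    train <- sim[sim$year <= 2005, ]  # fit on earlier years
    test  <- sim[sim$year  > 2005, ]  # evaluate on later ones

    simple_fit  <- glm(onset ~ x1 + x2,           data = train, family = binomial)
    complex_fit <- glm(onset ~ x1 * x2 + x3 + x4, data = train, family = binomial)

    # Out-of-sample Brier score for each model (lower is better)
    brier <- function(fit) {
      p <- predict(fit, newdata = test, type = "response")
      mean((p - test$onset)^2)
    }
    c(simple = brier(simple_fit), complex = brier(complex_fit))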

Oh, and a corollary: if you have to choose between a) building more complex models, or even just applying lots of techniques to the same data, and b) testing other theoretically relevant variables for predictive power, do (b).

Russia Throws Cuba a Lifeline

Russia has just reinvigorated its relationship with Cuba, and I suspect that this renewed friendship of convenience will help Cuba’s Communist regime stick around longer than it would have without it.

A few things happened, all apparently part of an elaborate quid pro quo. First, while visiting Cuba last week, Russian president Vladimir Putin announced that his country was forgiving nearly all of Cuba’s lingering Soviet-era debt to Russia, or more than $30 billion. Then, a few days later, reports emerged that Cuba had agreed to allow Russia to re-open a large Soviet-era intelligence-gathering facility used to surveil the United States during the Cold War. While in Havana, Putin also spoke of reviving broader military and technological cooperation with Cuba, although he did not say exactly what that would entail. Last but not least, Russia and Cuba reportedly also signed some significant economic contracts, including ones that would allow Russian oil companies to explore Cuban waters.

Putin’s government seems to be responding in kind to what it perceives as a deepening U.S. threat on its own borders, and this is important in its own right. As a specialist on the survival and transformation of authoritarian regimes, though, I am also interested in how this reinvigorated relationship affects prospects for political change in Cuba.

Consolidated single-party regimes, like Cuba’s, are the most durable kind of autocracies, but when they do break down, it’s usually an economic or fiscal crisis that sets the process in motion. Slumping state revenues shrink the dole that encourages various factions within the party to stay loyal to the ruling elite, while wider economic problems also give ordinary citizens stronger motivations to demand reform. When frustrated citizens and disgruntled insiders find each other, the effect can be especially potent. Economic crisis doesn’t guarantee the collapse of single-party regimes, but it does significantly increase the probability of its occurrence.

The Soviet Union bankrolled Havana for many years, and the Cuban economy has been limping along since that funding stream disappeared along with the country that provided it. In 2011, the Communist Party of Cuba finally responded to that malaise as formal theory leads us to expect that it would: by experimenting with some limited forms of economic liberalization. These reforms are significant, but as far as I can tell, they have not yet led to the kind of economic renewal that would give the ruling party a serious boost.

One of the reasons the Cuban regime managed to delay those reforms for so long was the largesse it received from its close friends in Venezuela. As I discussed in a post here last year, Hugo Chavez’s government used its oil boom to help finance the Cuban regime at a time when Havana would otherwise have been hard pressed to search for new sources of revenue.

With Hugo Chavez dead and Venezuela’s economy in crisis, however, this support has become unreliable. I had expected this uncertainty to increase pressure on the Communist Party of Cuba to expand its liberalization in search of new revenues, and for that expanded liberalization, in turn, to improve prospects for popular mobilization and elite defections that could lead to broader political reforms.

The renewed embrace from Russia now has me revisiting that expectation. The forgiveness of more than $30 billion in debt should provide an immediate boost to Cuba’s finances, but I’m also intrigued by the talk of new oil concessions. For years, the Cuban government has seemed to be hoping that hydrocarbons under its waters would provide it with a new fiscal lifeline. That hasn’t happened yet, but it sounds like Moscow and Havana increasingly see prospects for mutual gains in this sphere. Of course, it will also be important to see what other forms of economic and military support are on offer from Moscow and how quickly they might arrive.

None of these developments magically resolves the fundamental flaws in Cuba’s political economy, and so far the government shows no signs of rolling back the process of limited liberalization it has already begun. What’s more, Russia also has economic problems of its own, so it’s not clear how much help it can offer and how long it will be able to sustain that support. Even so, these developments probably do shrink the probability that the Cuban economy will tip soon into a deeper crisis, and with it the near-term prospects for a broader political transformation.

We Are All Victorians

“We have no idea, now, of who or what the inhabitants of our future might be. In that sense, we have no future. Not in the sense that our grandparents had a future, or thought they did. Fully imagined cultural futures were the luxury of another day, one in which ‘now’ was of some greater duration. For us, of course, things can change so abruptly, so violently, so profoundly, that futures like our grandparents’ have insufficient ‘now’ to stand on. We have no future because our present is too volatile… We have only risk management. The spinning of the given moment’s scenarios. Pattern recognition.”

That’s the fictional Hubertus Bigend sounding off in Chapter Six of William Gibson’s fantastic 2003 novel, Pattern Recognition. Gibson is best known as an author of science fiction set in the not-too-distant future. As that passage suggests, though, he is not exclusively interested in looking forward. In Gibson’s renderings, future and past might exist in some natural sense, but our ideas of them can only exist in the present, which is inherently and perpetually liminal.

In Chapter Six, the conversation continues:

“Do we have a past, then?” Stonestreet asks.

“History is a best-guess narrative about what happened and when,” Bigend says, his eyes narrowing. “Who did what to whom. With what. Who won. Who lost. Who mutated. Who became extinct.”

“The future is there,” Cayce hears herself say, “looking back at us. Trying to make sense of the fiction we will have become. And from where they are, the past behind us will look nothing at all like the past we imagine behind us now.”

“You sound oracular.” White teeth.

“I only know that the one constant in history is change: The past changes. Our version of the past will interest the future to about the extent we’re interested in whatever past the Victorians believed in. It simply won’t seem very relevant.”

I read that passage and I picture a timeline flipped vertical and frayed at both ends. Instead of a flow of time from left to right, we have only the floating point of the present, with ideas about the future and past radiating outwards and nothing to which we can moor any of it.

In a recent interview with David Wallace-Wells for the Paris Review, Gibson revisits this theme when asked about science fiction as futurism.

Of course, all fiction is speculative, and all history, too—endlessly subject to revision. Particularly given all of the emerging technology today, in a hundred years the long span of human history will look fabulously different from the version we have now. If things go on the way they’re going, and technology keeps emerging, we’ll eventually have a near-total sorting of humanity’s attic.

In my lifetime I’ve been able to watch completely different narratives of history emerge. The history now of what World War II was about and how it actually took place is radically different from the history I was taught in elementary school. If you read the Victorians writing about themselves, they’re describing something that never existed. The Victorians didn’t think of themselves as sexually repressed, and they didn’t think of themselves as racist. They didn’t think of themselves as colonialists. They thought of themselves as the crown of creation.

Of course, we might be Victorians, too.

Of course we are. How could we not be?

That idea generally fascinates me, but it also specifically interests me as a social scientist. As discussed in a recent post, causal inference in the social sciences depends on counterfactual reasoning—that is, imagining versions of the past and future that we did not see.

Gibson’s rendering of time reminds us that this is even harder than we like to pretend. It’s not just that we can’t see the alternative histories we would need to compare to our lived history in order to establish causality with any confidence. We can’t even see that lived history clearly. The history we think we see is a pattern that is inexorably constructed from materials available in the present. Our constant disdain for most past versions of those renderings should give us additional pause when attempting to draw inferences from current ones.
