How Makers of Foreign Policy Use Statistical Forecasts: They Don’t, Really

The current issue of Foreign Policy magazine includes a short piece I wrote on how statistical models can be useful for forecasting coups d’etat. With the March coup in Mali as a hook, the piece aims to show that number-crunching can sometimes do a good job assessing risks of rare events that might otherwise present themselves as strategic surprises.

In fact, statistical forecasting of international politics is a relatively young field, and decision-makers in government and the private sector have traditionally relied on subject-matter experts to prognosticate on events of interest. Unfortunately, expert judgment does not work nearly as well as a forecasting tool as we might hope or expect.

In a comprehensive study of expert political judgment, Philip Tetlock finds that forecasts made by human experts on a wide variety of political phenomena are barely better than random guesses, and they are routinely bested by statistical algorithms that simply extrapolate from recent trends. Some groups of experts perform better than others—the experts’ cognitive style is especially relevant, and feedback and knowledge of base rates can help, too—but even the best-performing sets of experts fail to match the accuracy of those simple statistical algorithms.

The finding that models outperform subjective judgments at forecasting has been confirmed repeatedly by other researchers, including one prominent 2004 study which showed that a simple statistical model could predict the outcomes of U.S. Supreme Court cases much more accurately than a large assemblage of legal experts.

Because statistical forecasts are potentially so useful, you would think that policy makers and the analysts who inform them would routinely use them. That, however, would be a bad bet. I spoke with several former U.S. policy and intelligence officials, and all of them agreed that policymakers make little use of these tools and the “watch lists” they are often used to produce. A few of those former officials noted some variation in the application of these techniques across segments of the government—military leaders seem to be more receptive to statistical forecasting than civilian ones—but, broadly speaking, sentiment runs strongly against applied modeling.

If the evidence in favor of statistical risk assessment is so strong, why is it such a tough sell?

Part of the answer surely lies in a general tendency humans have to discount or ignore evidence that doesn’t match our current beliefs. Psychologists call this tendency confirmation bias, and it affects how we respond when models produce forecasts that contradict our expectations about the future. In theory, this is when models are most useful; in practice, it may also be when they’re hardest to sell.

Jeremy Weinstein, a professor of political science at Stanford University, served as Director for Development and Democracy on the National Security Council staff at the White House from 2009 until 2011. When I asked him why statistical forecasts don’t get used more in foreign-policy decision-making, he replied, “I only recall seeing the use of quantitative assessments in one context. And in that case, I think they were accepted by folks because they generated predictions consistent with people’s priors. I’m skeptical that they would have been valued the same if they had generated surprising predictions. For example, if a quantitative model suggests instability in a country that no one is invested in following or one everyone believes is stable, I think the likely instinct of policymakers is to question the value of the model.”

The pattern of confirmation bias extends to the bigger picture on the relative efficacy of models and experts. When asked about why policymakers don’t pay more attention to quantitative risk assessments, Anne-Marie Slaughter, former director of Policy Planning at State, responded: “You may believe that [statistical forecasts] have a better track record than expert judgment, but that is not a widely shared view. Changing minds has to come first, then changing resources.”

Where Weinstein and Slaughter note doubts about the value of the forecasts, others see deeper obstacles in the organizational culture of the intelligence community. Ken Knight, now Analytic Director at Centra Technology, spent the better part of a 30-year career in government working on risk assessment, including several years in the 2000s as National Intelligence Officer for Warning. According to Knight, “Part of it is the analytic community that I grew up in. There was very little in the way of quantitative analytic techniques that was taught to me as an analyst in the courses I took. There is this bias that says this stuff is too complex to model…People are just really skeptical that this is going to tell them something they don’t already know.”

This organizational bias may simply reflect some deep grooves in human cognition. Psychological research shows that our minds routinely ignore statistical facts about groups or populations while gobbling up or even cranking out causal stories that purport to explain those facts. These different responses appear to be built-in features of the automatic and unconscious thinking that dominates our cognition. Because of them, our minds “can deal with stories in which the elements are causally linked,” Daniel Kahneman writes, but they are “weak in statistical reasoning.”

Of course, cognitive bias and organizational culture aren’t the only reasons statistical risk assessments don’t always get traction in the intelligence-production process. Stephen Krasner, a predecessor of Slaughter’s as director of Policy Planning at State, noted in an email exchange that there’s often a mismatch between the things these models can warn about and the kinds of questions policymakers are often trying to answer. Krasner’s point was echoed in a recent column by CNAS senior fellow Andrew Exum, who notes that “intelligence organizations are normally asked to answer questions regarding both capability and intent.” To that very short list, I would add “probability,” but the important point here is that estimating the likelihood of events of concern is just one part of what these organizations are asked to do, and often not the most prominent one.

Clearly, there are a host of reasons why policy-makers might not see statistical forecasts as a valuable resource. Some are rooted in cognitive bias and organizational culture, while others are related to the nature of the problems they’re trying to solve.

That said, I suspect that modelers also share some of the blame for the chilly reception their forecasts receive. When modelers are building their forecasting tools, I suspect they often imagine their watch lists landing directly on the desks of policymakers with global concerns who are looking to take preventive action or to nudge along events they’d like to see happen. “Tell me the 10 countries where civil war is most likely,” we might imagine the president saying, “so I know where to send my diplomats and position my ships now.”

In reality, the policy process is much more reactive, and by the time something has landed on the desks of the most senior decision-makers, the opportunity for useful strategic warning is often gone. What’s more, in the rare instances where quantitative forecasts do land on policy-makers’ desks, analysts may not be thrilled to see those watch lists cutting to the front of the line and competing directly with them for the scarce attention of their “customers.”

In this environment, modelers could try to make their forecasts more valuable by designing them for, and targeting them at, people earlier in the analytical process—that is, lower in the bureaucracy. Quantitative risk assessments should be more useful to the analysts, desk officers, and deputies who may be able to raise warning flags earlier and who will be called upon when their country of interest pops into the news. Statistical forecasts of relevant events can shape those specialists’ thinking about what the major risks are in their areas of concern, hopefully spurring them to revisit their assumptions in cases where the forecast diverges significantly from their own expectations. Statistical forecasts can also give those specialists some indication on how various risks might increase or decrease as other conditions change. In this model, the point isn’t to replace or overrule the analyst’s judgment, but rather to shape and inform it.

Even without strategic redirection among modelers, though, it’s possible that broader cultural trends will at least erode resistance to statistical risk assessment among senior decision-makers and the analysts who support them. Advances in computing and communications technology are spurring the rise of Big Data and even talk of a new “age of the algorithm.” The discourse often gets a bit heady, but there’s no question that statistical thinking is making new inroads into many fields. In medicine, for example—another area where subjective judgment is prized and decisions can have life-or-death consequences—improvements in data and analysis are combining with easier access to the results to encourage practitioners to lean more heavily on statistical risk assessments in their decisions about diagnosis and treatment. If the hidebound world of medicine can find new value in statistical modeling, who knows, maybe foreign policy won’t be too far behind.

Advertisements
  • Author

  • Follow me on Twitter

  • Follow Dart-Throwing Chimp on WordPress.com
  • Enter your email address to follow this blog and receive notifications of new posts by email.

    Join 13,647 other followers

  • Archives

  • Advertisements
%d bloggers like this: