How Makers of Foreign Policy Use Statistical Forecasts: They Don’t, Really

The current issue of Foreign Policy magazine includes a short piece I wrote on how statistical models can be useful for forecasting coups d’etat. With the March coup in Mali as a hook, the piece aims to show that number-crunching can sometimes do a good job assessing risks of rare events that might otherwise present themselves as strategic surprises.

In fact, statistical forecasting of international politics is a relatively young field, and decision-makers in government and the private sector have traditionally relied on subject-matter experts to prognosticate on events of interest. Unfortunately, as a forecasting tool, expert judgment does not work nearly as well as we might hope or expect.

In a comprehensive study of expert political judgment, Philip Tetlock finds that forecasts made by human experts on a wide variety of political phenomena are barely better than random guesses, and they are routinely bested by statistical algorithms that simply extrapolate from recent trends. Some groups of experts perform better than others—the experts’ cognitive style is especially relevant, and feedback and knowledge of base rates can help, too—but even the best-performing sets of experts fail to match the accuracy of those simple statistical algorithms.
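
To make that benchmark concrete, here is a minimal sketch of the kind of crude trend-extrapolation rule those comparisons involve; the series and the specific rule are invented for illustration and are not the algorithms from Tetlock’s study.

```python
# A toy "extrapolate from recent trends" baseline: predict that the next value
# of a series equals the last observed value plus the average of its recent changes.
# Purely illustrative; not the specific algorithms used in Tetlock's comparisons.

def naive_trend_forecast(series, window=3):
    """Forecast the next value from the current level and the recent trend."""
    recent = series[-(window + 1):]
    changes = [b - a for a, b in zip(recent[:-1], recent[1:])]
    return series[-1] + sum(changes) / len(changes)

# Hypothetical annual values of some indicator of interest.
history = [12, 14, 13, 17, 19, 22]
print(naive_trend_forecast(history))  # 25.0: last value plus the average recent change
```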

The finding that models outperform subjective judgments at forecasting has been confirmed repeatedly by other researchers, including one prominent 2004 study which showed that a simple statistical model could predict the outcomes of U.S. Supreme Court cases much more accurately than a large assemblage of legal experts.

Because statistical forecasts are potentially so useful, you would think that policymakers and the analysts who inform them would routinely use them. That, however, would be a bad bet. I spoke with several former U.S. policy and intelligence officials, and all of them agreed that policymakers make little use of these tools and the “watch lists” they are often used to produce. A few of those former officials noted some variation in the application of these techniques across segments of the government—military leaders seem to be more receptive to statistical forecasting than civilian ones—but, broadly speaking, sentiment runs strongly against applied modeling.

If the evidence in favor of statistical risk assessment is so strong, why is it such a tough sell?

Part of the answer surely lies in a general tendency humans have to discount or ignore evidence that doesn’t match our current beliefs. Psychologists call this tendency confirmation bias, and it affects how we respond when models produce forecasts that contradict our expectations about the future. In theory, this is when models are most useful; in practice, it may also be when they’re hardest to sell.

Jeremy Weinstein, a professor of political science at Stanford University, served as Director for Development and Democracy on the National Security Council staff at the White House from 2009 until 2011. When I asked him why statistical forecasts don’t get used more in foreign-policy decision-making, he replied, “I only recall seeing the use of quantitative assessments in one context. And in that case, I think they were accepted by folks because they generated predictions consistent with people’s priors. I’m skeptical that they would have been valued the same if they had generated surprising predictions. For example, if a quantitative model suggests instability in a country that no one is invested in following or one everyone believes is stable, I think the likely instinct of policymakers is to question the value of the model.”

The pattern of confirmation bias extends to the bigger picture on the relative efficacy of models and experts. When asked about why policymakers don’t pay more attention to quantitative risk assessments, Anne-Marie Slaughter, former director of Policy Planning at State, responded: “You may believe that [statistical forecasts] have a better track record than expert judgment, but that is not a widely shared view. Changing minds has to come first, then changing resources.”

Where Weinstein and Slaughter note doubts about the value of the forecasts, others see deeper obstacles in the organizational culture of the intelligence community. Ken Knight, now Analytic Director at Centra Technology, spent the better part of a 30-year career in government working on risk assessment, including several years in the 2000s as National Intelligence Officer for Warning. According to Knight, “Part of it is the analytic community that I grew up in. There was very little in the way of quantitative analytic techniques that was taught to me as an analyst in the courses I took. There is this bias that says this stuff is too complex to model…People are just really skeptical that this is going to tell them something they don’t already know.”

This organizational bias may simply reflect some deep grooves in human cognition. Psychological research shows that our minds routinely ignore statistical facts about groups or populations while gobbling up or even cranking out causal stories that purport to explain those facts. These different responses appear to be built-in features of the automatic and unconscious thinking that dominates our cognition. Because of them, our minds “can deal with stories in which the elements are causally linked,” Daniel Kahneman writes, but they are “weak in statistical reasoning.”

Of course, cognitive bias and organizational culture aren’t the only reasons statistical risk assessments don’t always get traction in the intelligence-production process. Stephen Krasner, a predecessor of Slaughter’s as director of Policy Planning at State, noted in an email exchange that there’s often a mismatch between the things these models can warn about and the kinds of questions policymakers are often trying to answer. Krasner’s point was echoed in a recent column by CNAS senior fellow Andrew Exum, who notes that “intelligence organizations are normally asked to answer questions regarding both capability and intent.” To that very short list, I would add “probability,” but the important point here is that estimating the likelihood of events of concern is just one part of what these organizations are asked to do, and often not the most prominent one.

Clearly, there are a host of reasons why policymakers might not see statistical forecasts as a valuable resource. Some are rooted in cognitive bias and organizational culture, while others are related to the nature of the problems they’re trying to solve.

That said, I suspect that modelers also share some of the blame for the chilly reception their forecasts receive. When modelers are building their forecasting tools, I suspect they often imagine their watch lists landing directly on the desks of policymakers with global concerns who are looking to take preventive action or to nudge along events they’d like to see happen. “Tell me the 10 countries where civil war is most likely,” we might imagine the president saying, “so I know where to send my diplomats and position my ships now.”

In reality, the policy process is much more reactive, and by the time something has landed on the desks of the most senior decision-makers, the opportunity for useful strategic warning is often gone. What’s more, in the rare instances where quantitative forecasts do land on policymakers’ desks, analysts may not be thrilled to see those watch lists cutting to the front of the line and competing directly with them for the scarce attention of their “customers.”

In this environment, modelers could try to make their forecasts more valuable by designing them for, and targeting them at, people earlier in the analytical process—that is, lower in the bureaucracy. Quantitative risk assessments should be more useful to the analysts, desk officers, and deputies who may be able to raise warning flags earlier and who will be called upon when their country of interest pops into the news. Statistical forecasts of relevant events can shape those specialists’ thinking about what the major risks are in their areas of concern, hopefully spurring them to revisit their assumptions in cases where the forecast diverges significantly from their own expectations. Statistical forecasts can also give those specialists some indication on how various risks might increase or decrease as other conditions change. In this model, the point isn’t to replace or overrule the analyst’s judgment, but rather to shape and inform it.

Even without strategic redirection among modelers, though, it’s possible that broader cultural trends will at least erode resistance to statistical risk assessment among senior decision-makers and the analysts who support them. Advances in computing and communications technology are spurring the rise of Big Data and even talk of a new “age of the algorithm.” The discourse often gets a bit heady, but there’s no question that statistical thinking is making new inroads into many fields. In medicine, for example—another area where subjective judgment is prized and decisions can have life-or-death consequences—improvements in data and analysis are combining with easier access to the results to encourage practitioners to lean more heavily on statistical risk assessments in their decisions about diagnosis and treatment. If the hidebound world of medicine can find new value in statistical modeling, who knows, maybe foreign policy won’t be too far behind.


39 Comments

  1. Tammy  /  June 18, 2012

    Jay–

    Thought-provoking, as usual. I like the notion that the value of a predictive model is high as an additional tool for analysts earlier in the process. That said, it seems the comparison to the application of “big data” in medicine is flawed. Aren’t there inherent differences in the underlying nature of what’s being measured and predicted? Cheers

    • Thanks for reading, Tammy. Even though human bodies and bodies politic work very differently, I think the comparison with risk assessment in medicine works because we still poorly understand the causes of many diseases, just as we still poorly understand the causes of many forms of political change. That’s precisely why model-based risk assessment is proving so useful in some areas of medicine–there’s still a ton we don’t know, so the recognition of even fairly simple patterns can still lead to marginal gains. In the long run, I’d expect statistical risk assessments in medicine to be more accurate than the ones in politics, but both fields have lots of room for improvement now.

  2. Rex Brynen  /  June 18, 2012

    Jay:

    This is an excellent piece–you really should write up an expanded version of it for Perspectives on Politics or similar. I’ve offered a few thoughts of my own below–although, as any simple statistical algorithm would have predicted, they pretty much echo comments I made on an earlier blogpost on early warning and mass atrocity!

    1) What do the numbers actually show?

    I love Tetlock’s work–indeed, I’m one of the volunteer forecasters in the current Good Judgment study–but I don’t think it has been established whether his reported findings on the predictive accuracy of pundits apply to the predictive accuracy of the foreign policy and intelligence process. So far as I am aware, the only open source information on the latter is by David Mandel at Defence Research and Development Canada. His data on the performance of intel analysts (http://www7.nationalacademies.org/bbcss/DNI_Mandel_Slides.pdf) comes up with much higher numbers than Tetlock did, partly for methodological reasons (Tetlock measures the accuracy of multiple analysts on a single event, while Mandel measures the accuracy of the self-generated predictions made by individual analysts), and partly because of the nature of the intel process (which is, in theory at least, designed to offset the most common problems that arise in making predictions).

    One might also argue that intel analysts have access to superior classified information upon which to base their judgments. However, having seen a great deal of classified information and read many hundreds of intel assessments, I don’t think that is what makes the difference.

    2) The sales job

    Part of the reason why quantitative models have made so few inroads is, as you suggest, cultural resistance. Part of the reason is also a terrible selling job by their proponents, coupled with some really facile quantitative work (like much of the index-based early-warning stuff you warned against a few blogposts ago) that really damages the field in general.

    Advocacy for quantitative methods needs to be done in a way that sells them as a useful adjunct to traditional methods, not as the formula-that-will-replace-the-seasoned-analyst. The latter approach not only raises hackles, but it is also quite silly, given that no single model can replace the intellectual flexibility of an analyst (who might, in a single afternoon, go from developing an analysis of medium-term stability in country X to writing a quick, urgent brief for decision makers on the immediate implications of the car accident that just killed political leader Y).

    It also needs to be recognized that foreign policy and intel analysts are often concerned not only with predicting an outcome but also with implicitly or explicitly modelling the process whereby that outcome might occur, something diplomats need to know in order to affect outcomes and intel analysts need to know in order to compare competing hypotheses, flag early-warning indicators, develop branching conditional predictions, etc. If a quantitative model doesn’t offer some plausible reason WHY it predicts certain outcomes (that is, the causal logic behind the statistical correlations), it is much less likely to be accepted by folks who spend all day trying to identify causal relationships.

    On the positive side, analysts are trained to welcome mixed methods and to view confirmation bias as one of the deadly sins. (They are also warned that reflexive straight-line prediction is a deadly sin, however.) Given that, quantitative models that offer alternative perspectives ought to be an easy sell if that is what they are sold as. To return to the Mali example, one could easily imagine an analyst meeting organized around the prediction, in which it was used as a spur for discussion even among those who disagreed with the finding. (This technique is already used quite a bit in red-teaming or alternative-futures exercises, but with a qualitative rather than quantitative point of departure.)

    Finally, as you note in the blogpost, the foreign policy process is a rather messy and complicated sausage factory, in which both information requirements and decision-making are driven by a great many things. Most of the time, no one actually asks “where will there be a coup this year?” (and even if they knew, they would not necessarily make major policy adjustments a priori). Models need to be designed, sold, and used within the complex foreign policy processes that we have, not offered up based on the implicit but erroneous assumption that foreign policy-making is a parsimonious, hyper-rational version of chess.

    I’ll be doing work again this summer (for David Mandel’s project) measuring predictive accuracy in classified intelligence assessments–if you want to discuss these issues further, drop me an email.

    cheers,

    Rex

    • Rex, thank you very much for dropping all this knowledge here. If you’d ever like to turn your thoughts on Mandel’s research into a guest post, I’d love to have it.

  3. Jay,
    To follow up on Rex’s comment, in our study of strategic intelligence forecasts, analysts are much better than chimp strategies. I recently employed three versions of the chimp: (a) the chimp on the fence, who always picks p = .5; (b) the extreme chimp, who randomly guesses p = 1 or p = 0; and (c) the chimp with 9 lives, who picks one of the 9 p levels that analysts use (namely, 0, 1, 2-3, 4, 5, 6, 7-8, 9, and 10 out of 10). These strategies produce abysmal Brier scores of .25, .52, and .34, respectively. In contrast, analysts, based on 1,075 forecasts, had a Brier score of .09 and a difficulty-adjusted Brier score (as Tetlock describes in his technical appendix) of .62. For those unfamiliar with this measure, it equals 0 when analysts’ predictions do no better than predicting the base rate and 1 when forecasting is perfect. A difficulty-adjusted Brier score of .62 shows considerable skill.

    Finally, the source of inaccuracy is not what the judgment literature would suggest. The calibration curve shows a pattern of under-extremity bias, which in this case is consistent with timid probabilistic forecasts; in other words, under-, not over-, confidence. Over 90% of analysts’ forecasts (excluding a small number of 5/10s) are on the correct side of 50/50. Analysts in our study discriminate very well, but they couch their forecasts in more uncertainty than they need to. They are better forecasters than they think.
    David

    • David, thanks very much for that amplification. Will the details of your study be made public at some point, or have they been already?

      Thinking of political forecasting as a whole, I wonder if Tetlock’s study has become something of a curse as well as a blessing. It’s a blessing because it gave us a set of benchmarks that are based on meticulously collected evidence across a wide range of topics. It’s something of a curse, too, though, because its success makes it very hard to get away from the simplest representations of those benchmarks to talk about the subject with more nuance.

      For researchers like you, that means it’s really hard to dislodge people from the view that “expert judgment sucks,” even when evidence shows—as yours apparently does—that many experts can do markedly better than chance.

      Meanwhile, as a statistical forecaster, I’m bothered that the statistical models used in Tetlock’s comparisons were just about the simplest ones possible, merely extrapolating forecasts for the near future from levels and trends of the recent past. On many topics, I would expect modelers with topical knowledge to be able to build models that could do much better than those. I understand that Phil’s goal wasn’t to run a horse race between humans and models, but to the extent that his study is often interpreted in those terms, it’s too bad that the modeling wasn’t more ambitious as well.

      I raise this now because my post was about how statistical models can be useful forecasting tools, but somehow we’ve wound up discussing whether or not human experts can forecast better than chimps. You’ve shown that they can, and that’s very important, but I also wonder how those Brier scores would compare to ones from carefully-built algorithms. That’s another important question, and I hope some of the ongoing research in this field is taking a look at that, too.
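
      For readers who want to see the mechanics behind those numbers, here is a quick sketch of how a Brier score is computed for binary events, and why always guessing 0.5 scores .25. The forecasts and outcomes are invented for illustration; they are not drawn from David’s data or from Tetlock’s.

```python
# Brier score for probabilistic forecasts of binary events: the mean squared
# difference between the forecast probability and the outcome (coded 1 or 0).
# Lower is better. The forecasts and outcomes here are invented for illustration.

def brier_score(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.9, 0.2, 0.7, 0.1, 0.6]  # hypothetical probabilistic forecasts
outcomes = [1, 0, 1, 0, 1]             # what actually happened

print(brier_score(forecasts, outcomes))     # 0.062 for these forecasts
print(brier_score([0.5] * 5, outcomes))     # 0.25, the "chimp on the fence"
```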

  4. Rex Brynen  /  June 19, 2012

    It certainly would be interesting to run a test of model predictions versus analyst predictions over a given time period, and see how they stack up. Indeed, if you want to put forward a model, Jay, we could probably do a small-n competition with a few analysts for 2013.

    Of course, if one looks at the content of most intel assessments, I suspect that 90% of the specific predictions being made aren’t things that a model would capture. In other words, while it is super important, the question “how likely is this regime to survive 2013 intact” (etc) represents only a small proportion of the predictions contained in the typical strategic-level analysis.

    • On your second point here, Rex, of course that’s true, and I tried to acknowledge that in the post with the bit about capability vs. intent vs. probability. As I see it, this is another reason for analysts to embrace these tools instead of dismissing them. Even when they work wonderfully, forecasting models aren’t going to take anyone’s job, because they can’t answer most of the questions analysts get asked.

  5. Rex Brynen  /  June 19, 2012

    It does occur to me that we really ought to have some chimps posting on this. I’m sure they could develop cogent arguments for ape-driven analytical methods too.

  6. Jason Fritz  /  June 19, 2012

    Jay – excellent piece. I don’t have much to add, but I thought this line funny: “You may believe that [statistical forecasts] have a better track record than expert judgment, but that is not a widely shared view.” I assume it’s not widely shared by experts with judgments.

  7. ADTS  /  June 19, 2012

    Jay

    Great stuff – glad I found this – still need a bit more time to read and absorb.

    I was recently reading S&P’s methodology in light of their downgrading of US debt. What struck me was how much they tried to obfuscate the extent to which their ostensibly quantitative and objective methodology was in fact qualitative and subjective. Forget about the IC and USG for a minute – what about the private sector (EIU, Oxan, S&P, etc.)?

    Also, I have not checked the hyperlinks, but thoughts in re Feder (and BDM, Policon/Factions) and his piece in the Annual Review of Political Science a while back now? Rely on large n (i.e., statistical data analysis) or formal modeling (i.e., game theory predicated on RCT assumptions)?

    Finally, why *not* supplement with qualitative expert analysis – is that not how one develops one’s variables in any event?

    Best
    ADTS

    • Regarding private-sector risk assessment, I don’t know the specifics of those firms’ methods, so it’s hard to say. My general impression, though, is that they usually involve quantifications of judgment or the construction of indices based on subjective judgments about which factors matter and how much, neither of which is what I’m talking about when I say “statistical risk assessment.” Just because it’s numeric doesn’t mean it’s reliable or useful.

      As for supplementing or combining statistical forecasts with subjective judgments, there are different ways to do that, and there’s some evidence that some ways work better than others. In some cases, you wind up with a better forecast; in others, you wind up replacing a good forecast with a bad one.
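
      To give a concrete sense of what combining can mean, one simple approach is a weighted average of the two probabilities, sometimes called a linear opinion pool; the weight and the numbers below are made up for illustration, and in practice you would choose the weight by checking accuracy on past forecasts.

```python
# A simple way to combine a statistical forecast with an expert's subjective
# probability: a weighted average ("linear opinion pool"). The weight and the
# probabilities are invented for illustration.

def combine(p_model, p_expert, w_model=0.6):
    """Weighted average of a model forecast and an expert forecast."""
    return w_model * p_model + (1 - w_model) * p_expert

print(combine(p_model=0.30, p_expert=0.10))  # ~0.22 for these made-up inputs
```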

      That’s a different problem from identifying which variables might be useful predictors in the first place. On that question, I think it’s clear that domain knowledge is very helpful. You’re probably going to get more quickly to a model that’s also more reliable when you start by asking experts what risk factors to consider than when you dive blindly into a mass of data.

      • ADTS  /  June 20, 2012

        Jay

        Briefly, I concur. What struck me was what I did *not* see: sensitivity estimates, intercoder reliability statistics. It was a very superficial exercise in *appearing* “quantitative,” to me at least.

        ADTS

  8. ADTS  /  June 19, 2012

    Jay

    In re Brynen, 6/19/2012, Kobrin, “Managing Political Risk Assessment” (I think) notes most political risk assessment focuses on catastrophic events – political violence like riots, coups, etc. – while most of the risk is far more mundane and far less dramatic – exchange rates, non-tariff barriers to trade (for example, regulations and taxes), etc. Any thoughts on a good way to test models – some valid construction of a predicted/observed index taking into account the nature of the examined phenomena (in other words, appropriately weighting “big” events like political violence and “small” events like regulatory or policy shifts)?

    Best
    ADTS

    • I think it’s really just a matter of developing separate models for separate events. There’s no reason we can’t build a statistical model of exchange rates or regulatory shifts, too.

  9. Rex Brynen  /  June 19, 2012

    I think this overstates the predictive accuracy (and predictive utility) of BBdeM’s work, but relevant nonetheless: http://salises.mona.uwi.edu/sa61a/Rational%20Choice%20models.pdf

    There is also research to suggest that rational choice theorists are no better than students using unaided judgment in predicting outcomes, while methodologies using interactive role-playing (that is, gaming) are significantly better than both:

    gt_update_in_IJF21.pdf

    Then again, the FP and IC community uses much less analytical gaming than people might think–and very little indeed outside the US, in my experience.

  10. Jay — the findings will be summarized in a comprehensive tech report with more focused peer-reviewed articles, I hope, to follow. We are still running analyses.

    Phil’s study is phenomenal in many ways. How many psychologists run 20-year studies on topics of such importance while taking great methodological care? Still, each study has its own particular set of features. Readers of the final product seldom think about how each of those factors can affect the findings. Phil’s study has greater internal validity than our analyst study, and some of his methodological checks and balances, if applied to our analysts, might yield different results. In particular, he took care to structure questions in such a way that the possible outcomes were exhaustive and mutually exclusive, and so that, after the fact, all reasonable people should at least agree on what happened, even if it disconfirmed their expectation. In our study, about 15% of outcomes were uncodable and excluded from analysis. On the other hand, our study was about as high as you can get in terms of external validity, since the analyses were conducted on real intelligence assessments by analysts who were doing their jobs, not taking part in an expert survey. When Phil posed questions to his experts, they presumably drew on their knowledge and beliefs and formulated a prediction. They didn’t, however, go to work on the question, as analysts would. Imagine how well our intelligence analysts might have done if they had been given questions to which they had to provide immediate answers without researching the question. Sure, their area expertise would be an asset, but I’m not sure a decision maker would appreciate a judgment that preceded question-specific analysis. Both my study and Phil’s are expert studies, but the subtle and not-so-subtle differences shouldn’t be overlooked.

    As for the issue of prediction models, I think the question of how an intelligence organization would implement them needs to be addressed. Otherwise, we won’t get past making human forecasters who actually do offer predictions to decision makers feel defensive towards, or peeved at, statistical modellers who keep insisting they can do better. Will analysts be building those models? Not the current set (of analysts). So, who will build them, who will use them once they’re built, and how will their output be integrated into reporting formats that organizations or end users are committed to sticking with? In other words, how extreme would organizational changes have to be before such models were usable as more than demonstrations of human imperfection in forecasting? On a fixed budget, would organizations reduce the number of intelligence analysts so that they could develop statistical analysis cells? Would the new arrangements create new interoperability issues? Would improvements in forecast accuracy offset the unintended negative consequences? In other words, would the real application of such models end up being like Jervis’s example (in Tetlock & Belkin, 1996, e.g.) of double-hulled compartments in shipping vessels? What are the nth-order effects (good and bad) that escape our prefactual thoughts about an IC that would use statistical models as standard practice?

    David

    • ADTS  /  June 20, 2012

      David,

      I think Klein, “Sources of Power” did a decent, roughly analogous “experiment” (probably not an appropriate word, but please be lenient) with respect to forecasting the Polish economy circa 1990 (?). The person with both the analytic construct and the domain knowledge – an expatriate Polish economist who was faculty at a liberal arts college, if I recall correctly – did best; indeed, he was the only one who did well at all.

      ADTS

    • David, I could not agree more with your points about the organizational implications. I guess my hope is that senior managers in the relevant agencies will see the potential and will start trying proactively to sort out the issues you raise, instead of continuing to treat the forecasts these tools can produce as just another bit of information in the intel stream.

      • Jay, I’m skeptical about that happening. Many senior managers in the IC aren’t even comfortable with the premise that analysts are in the business of making predictions, and they are even less comfortable with the notion of analysis as an applied science. They tend to believe that their clients prefer narrative accounts, that they distrust the precision of point predictions, but that they lack either the skill or the patience to interpret confidence intervals. I think until some rather senior clients disabuse them of these notions, IC practice will continue more or less as usual. If any such attempt were to be proactive on the part of the IC, the scientific community should work just as hard on the implementation problems as on the modelling problems (drawing on different pools of expertise, of course).

  11. ADTS  /  June 20, 2012

    Jay

    Apologies for geeking out so hard and hogging your comments, but these are questions that have been pent up for so long. 🙂

    1) Do you have any thoughts about economists who perform political risk analysis at economic forecasting houses, investment banks, etc., as opposed to people with formal training in political science? I thought Drezner raised good points after the S&P downgrade about the economists performing poor political science.

    2) What do you think are the best commercially available forecasting data sets in terms of both underlying validity/reliability as well as predictive/explanatory power? Someone whose opinion I respect thinks well of Political Risk Services, ICRG/Coplin-O’Leary, but it seems to me that on this topic, opinions are like elbows: everyone has at least two (to use the sanitized version of the saying).

    ADTS

    • Re 1, I don’t know enough about the modeling those folks do to say. That said, I do know that “political risk” is a funny animal; it means different things to different people, and even the pros often don’t define it very clearly. So, for some definitions–say, sovereign default–it may be that economists have more relevant expertise than political scientists, and vice versa (e.g., on coups).

      Re 2, that depends entirely on what you’re trying to forecast. In general, though, I look for data that’s produced on a reliable schedule by people who are transparent about its construction. I like to know what the numbers I’m using actually represent, and I like to know that I’ll get them when I need them to generate a forecast.

  12. ADTS  /  June 20, 2012

    Jay

    Appreciate your thorough replies to each of my inquiries *so* much – thank you!

    ADTS

  13. ADTS  /  June 21, 2012

    I tried to reply to David Mandel’s response, but the software does not seem to let me.

    What would be required to make decisionmakers amenable to statistically-derived insight?

    Should there be a course in the basics – assessments of probability (e.g., p values), assessments of substantive significance (e.g., beta coefficients), etc.?

    How realistic is coercing successful adults into taking Statistics 101?

    ADTS

    • Da  /  June 21, 2012

      As Rex has noted, much of intelligence reporting actually isn’t in the form of what a statistical model would produce. I really think that attention to the issue isn’t a priority for management in the IC. If the NSA was breathing down the DNI’s neck to “use scientific models for forecasting,” then the ADNI for Analysis would probably get tasked to “make it so” and, likewise, IARPA PMs would get that sort of thing added pretty quickly to their funding priority list. Demands from powerful sources, not stats courses for directors, will make this sort of endeavor more likely. IC reform is largely reactive; I don’t see directors voluntarily geeking up any time soon. If this sort of endeavor becomes standard practice, I think the IC will have to hire new analysts with a different skill set and way of seeing the problem space. Just as global terrorism prompted the construction of more transnational units in the IC, whatever might prompt analytic reform in this general direction will require new units where the analysts know more about Bayes than bays. If anyone thinks that the traditional analyst will add all these statistical models to their “analytic toolkit,” I think they are deluding themselves. This goes beyond the issue of technical competence to the issue of professional accountability. Why would I, as an analyst, use something I don’t understand well and might be called to task on? Imagine how it would look if a director asks, “Well, how did you arrive at that estimate?” And, the analyst replies, “Well, sir, these data sources were inputted into the model, which uses various algorithms I don’t really understand to produce a forecast that behavioral scientists in academia assure us works even better than traditional analysis by humans. I then take the forecast output and work it into my report. That’s what I do, sir.” I wouldn’t want to be *that* analyst.

      David

      • Rex Brynen  /  June 21, 2012

        DA’s comment on accountability is a really important one (“This goes beyond the issue of technical competence to the issue of professional accountability. Why would I, as an analyst, use something I don’t understand well and might be called to task on? Imagine how it would look if a director asks, “Well, how did you arrive at that estimate?” And, the analyst replies, “Well, sir, these data sources were inputted into the model, which uses various algorithms I don’t really understand to produce a forecast…”), and I might even take it a step further.

        The very best IC managers I’ve worked with have really pushed analysts to unpack their assumptions, and the assumptions in those assumptions–to think about why the moving parts supposedly work in a certain way, and whether there are other drivers that might be off the radar screen. Analysts sometimes hate it: it is a pain in the neck to send what you think is a well-polished draft upwards, only for the boss to send it back looking like a disassembled lawn mower, with a series of difficult questions about the implicit or explicit logic of your argument and the quality of your data. However, the process dramatically improves the quality of analysis (for reasons that Tetlock’s work on expert judgment can readily explain).

        It follows, therefore, that quantitative methods must not be excessively black-boxed if they are to find greater take-up in the IC. Users need to understand what they are doing, why, and what their intellectual underpinnings are. Their operation also needs to be understood by those who didn’t take (or can no longer remember) a grad-level stats course.

        Great discussion, folks–and thanks, Jay, for starting it off.

      • Yes, that lines up well with some directors’ views that externalizing evidence, hypotheses, and reasoning is very important. Externalizing the audit trail. As well, IARPA is funding a project to develop and test an argument-mapping course for analysts aimed at improving their critical thinking skills by aiding that externalizing-of-argumentation process. I’m involved with that effort and will be trying to set up a course in Ottawa for Canadian analysts in the next year.

        David (apparently, also known as DA when I work on a government computer)

  14. ADTS  /  June 21, 2012

    Thanks to David, Rex and Jay.

    I like Rex’s point about assumptions. I know that a lot of times, behavioral economics and finance (e.g., Kahneman and Tversky and Thaler, say) seems a much more accurate description of my own behavior as well as the behavior of others; in many ways (although not to sound dismissive of Nobel-worthy work) it seems intuitive. Perhaps accordingly, I can easily see any number of standard RCT assumptions dismissed handily if the premises are subjected to any real scrutiny:

    People’s preferences are rank-ordered and transitive. They are?

    People’s preferences do not change over time. Really?

    People maximize self-interest, and wealth maximization is an easy and good proxy for this. It is?

    This leads me to another question:

    Would one have any easier time defending RCT-based game theory or large n statistical analysis? I would think the latter, but would be curious to get the perspective of others.

    Thanks again,

    ADTS

  15. ADTS  /  June 21, 2012

    David

    The Jervis cite is “System Effects,” I think; it is on the first page and is itself cited from a semipop magazine like “Scientific American” (again, I think).

    When you say “our study” or “our article,” could you please provide a citation or a link to a working paper, etc.? I see you are hyperlinked, but I am too lazy to scroll through a CV if I can avoid it.

    🙂

    ADTS

    • The Jervis reference I had in mind was from his thought piece in Phil Tetlock and Aaron Belkin’s 1996 edited book, Counterfactual Thought Experiments in World Politics. I suspect Jervis made similar points in other publications.

      As for “our study”, as I mentioned to Jay, there will be a tech report coming, but at present the only document that describes the study is my presentation slides from recent conferences. I believe that Rex provided a link in his initial post to this thread to a slide deck I had used when I spoke at the National Academies in 2009, but the sample of forecasts was much smaller then. The more recent slides are not on the internet.

  16. ADTS  /  June 22, 2012

    Jay

    I realize the party is over.

    Still, I perceive four primary elements (or independent variables) in your underlying model:

    1) Wealth (e.g., poor/not poor)
    2) Government (e.g., autocratic/democratic)
    3) Political violence (e.g., insurgency/peace)
    4) Region (e.g., West Africa/Western Europe)

    Is this the basic model for predicting coups? What are the inputs used (e.g., GDP per capita for wealth)? Is this different from political instability/violence/risk broadly construed, or is the dependent variable (that is to say, coups) more narrowly defined?

    Thanks
    ADTS

    • My initial post on coup risk details the inputs used. Unsurprisingly, there’s a lot of overlap in risk factors across various forms of political instability when you’re looking at country-years. That’s partly because the structural conditions conducive to these various things are similar, and it’s partly because there aren’t a million things for which we have sufficient data to try.
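
      To make the country-year setup concrete, here is a bare-bones sketch of the general approach: a logistic regression of a binary coup indicator on a few structural risk factors. The variable names and data are invented for illustration; this is not the actual specification or data from that post.

```python
# Bare-bones sketch of a country-year coup model: logistic regression of a
# binary "coup attempt" indicator on a few structural risk factors. The data
# and variable names are invented; this is not the actual specification.
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    "infant_mortality": [95, 12, 60, 8, 110, 45],  # rough proxy for wealth/development
    "anocracy":         [1, 0, 1, 0, 1, 0],        # mixed regime type
    "recent_coup":      [1, 0, 0, 0, 1, 0],        # coup activity in the past few years
    "coup_attempt":     [1, 0, 1, 0, 1, 0],        # outcome in the following year
})

features = ["infant_mortality", "anocracy", "recent_coup"]
model = LogisticRegression().fit(data[features], data["coup_attempt"])

# Forecast for a hypothetical new country-year.
new_case = pd.DataFrame({"infant_mortality": [80], "anocracy": [1], "recent_coup": [0]})
print(model.predict_proba(new_case)[0, 1])  # estimated probability of a coup attempt
```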

