An Applied Forecaster’s Bad Dream

This is the sort of thing that freaks me out every time I’m getting ready to deliver or post a new set of forecasts:

In its 2015 States of Fragility report, the Organization for Economic Co-operation and Development (OECD) decided to complicate its usual one-dimensional list of fragile states by assessing five dimensions of fragility: Violence, Justice, Institutions, Economic Foundations and Resilience…

Unfortunately, something went wrong during the calculations. In my attempts to replicate the assessment, I found that the OECD misclassified a large number of states.

That’s from a Monkey Cage post by Thomas Leo Scherer, published today. Here, per Scherer, is why those errors matter:

Recent research by Judith Kelley and Beth Simmons shows that international indicators are an influential policy tool. Indicators focus international attention on low performers to positive and negative effect. They cause governments in poorly ranked countries to take action to raise their scores when they realize they are being monitored or as domestic actors mobilize and demand change after learning how they rate versus other countries. Given their potential reach, indicators should be handled with care.

For individuals or organizations involved in scientific or public endeavors, the best way to mitigate that risk is transparency. We can and should argue about concepts, measures, and model choices, but given a particular set of those elements, we should all get essentially the same results. When one or more of those elements is hidden, we can’t fully understand what the reported results represent, and researchers who want to improve the design by critiquing and perhaps extending it are forced to shadow-box. Also, individuals and organizations can double- and triple-check their own work, but errors are almost inevitable. When getting the best possible answers matters more than the risk of being seen making mistakes, then transparency is the way to go. This is why the Early Warning Project shares the data and code used to produce its statistical risk assessments in a public repository, and why Reinhart and Rogoff probably (hopefully?) wish they’d done something similar.
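To make the replication point concrete, here is a minimal sketch of what a fully scripted, shareable assessment can look like. The file name, indicator names, and aggregation rule below are hypothetical, not the Early Warning Project’s (or the OECD’s) actual method; the point is that when both the data and a script like this are public, anyone can re-run the calculation and check the published numbers.

```python
# Minimal sketch of a fully scripted, shareable risk assessment.
# File name, indicator names, and the aggregation rule are hypothetical;
# this is NOT the Early Warning Project's actual pipeline. The point is
# that when the data and the script are both public, anyone can re-run
# the calculation and check the published numbers.
import pandas as pd

INDICATORS = ["violence", "justice", "institutions", "economy", "resilience"]

def score_countries(path="fragility_indicators.csv"):
    df = pd.read_csv(path)
    # Rescale each indicator to [0, 1] so the components are comparable.
    for col in INDICATORS:
        df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())
    # Here, a country's score is simply the mean of its rescaled indicators.
    df["risk_score"] = df[INDICATORS].mean(axis=1)
    return df.sort_values("risk_score", ascending=False)

if __name__ == "__main__":
    ranked = score_countries()
    ranked.to_csv("risk_scores.csv", index=False)  # publish alongside the code
```

Replication then becomes a one-command exercise rather than an attempt to reverse-engineer someone else’s spreadsheet.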

Of course, even though transparency improves the probability of catching errors and improving on our designs, it doesn’t automatically produce those goods. What’s more, we can know that we’re doing the right thing and still dread the public discovery of an error. Add to that risk the near-certainty of other researchers scoffing at your terrible code, and it’s easy to see why even the best practices won’t keep you from breaking out in a cold sweat each time you hit “Send” or “Publish” on a new piece of work.

 

Reactions to Reflections on the Arab Uprisings

Yesterday, Marc Lynch posted a thoughtful and candid set of reflections on how political scientists who specialize in the Middle East performed as analysts and forecasters during the Arab uprisings, not before them, the subject on which most of the retrospectives have focused thus far. The background to the post is a set of memos Marc commissioned from the contributors to a volume he edited on the origins of the uprisings. As Marc summarizes, their self-criticism is tough:

We paid too much attention to the activists and not enough to the authoritarians; we understated the importance of identity politics; we assumed too quickly that successful popular uprisings would lead to a democratic transition; we under-estimated the key role of international and regional factors in domestic outcomes; we took for granted a second wave of uprisings, which thus far has yet to materialize; we understated the risk of state failure and over-stated the possibility of democratic consensus.

Social scientists and other professional analysts of world affairs should read the whole thing—if not for the specifics, then as an example of how to assess and try to learn from your own mistakes. Here, I’d like to focus on three points that jumped out at me as I read it.

The first is the power of motivated reasoning—”the unconscious tendency of individuals to process information in a manner that suits some end or goal extrinsic to the formation of accurate beliefs.” When we try to forecast politics in real time, we tend to conflate our feelings about specific events or trends with their likelihood. After noting that he and his colleagues over-predicted democratization, Marc observes:

One point that emerged in the workshop discussions is the extent to which we became too emotionally attached to particular actors or policies. Caught up in the rush of events, and often deeply identifying with our networks of friends and colleagues involved in these politics, we may have allowed hope or passion to cloud our better comparative judgment.

That pattern sounds a lot like the one I saw in my own thinking when I realized that my initial forecasts about the duration and outcome of the Syrian civil war had missed badly.

This tendency is probably ubiquitous, but it’s also one about which we can actually do something, even if we can’t eliminate it. Whenever we’re formulating an analysis or prediction, we can start by asking ourselves what result we hope to see and why, and we can think about how that desire might relate to the conclusions we’re reaching. We can try to imagine how someone with different motivations might view the same situation, or just seek out examples of those alternative views. Finally, we can weight or adjust our own analysis accordingly. Basically, we can try to replicate in our own analysis what “wisdom of crowds” systems do to great effect on a larger scale. This exercise can’t fully escape the cognitive traps to which it responds, but I think it can at least mitigate their influence.
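As a toy illustration of that exercise (the probabilities below are invented, not drawn from any real forecast), imagine averaging your own estimate with the estimate you think a less invested observer would give and with the relevant historical base rate:

```python
# Toy illustration of a one-person "wisdom of crowds" adjustment.
# All numbers are hypothetical.
my_forecast = 0.80       # the probability I assign, perhaps inflated by hope
imagined_skeptic = 0.35  # how a less invested observer might see the same case
base_rate = 0.25         # historical frequency of outcomes like this one

forecasts = [my_forecast, imagined_skeptic, base_rate]
adjusted = sum(forecasts) / len(forecasts)
print(f"Adjusted probability: {adjusted:.2f}")  # 0.47, pulled back toward the base rate
```

A simple average is crude, but it drags an enthusiasm-inflated number back toward what the evidence and the base rate can support, which is all this exercise is meant to do.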

Second, Marc’s reflections also underscore our tendency to underestimate the prevalence of inertia in politics, especially during what seem like exceptional times. As I recently wrote, our analytical eyes are drawn to the spectacular and dynamic, but on short time scales at least, continuity is the norm. Observers hoping for change in the countries touched by the Arab uprisings would have done well to remember this fact—and surely some did—when they were trying to assess how much structural change those uprisings would actually produce.

My last point concerns the power of social scientists to shape these processes as they unfold. In reflecting on his own analysis, Marc notes that he correctly saw how the absence of agreement on the basic rules of politics would complicate transitions, but he “was less successful in figuring out how to overcome these problems.” Marc aptly dubs this uncertainty Calvinball, and he concludes:

I’m more convinced than ever that moving beyond Calvinball is essential for any successful transition, but what makes a transitional constitutional design process work—or fail—needs a lot more attention.

Actually, I don’t think the problem is a lack of attention. How to escape this uncertainty in a liberal direction has been a central concern of scholarship on democratization, and of the field of applied democracy promotion that has grown up alongside it, for decades now. Giuseppe di Palma’s 1990 book, To Craft Democracies, remains a leading example of the kind of advocacy-cum-scholarship this field has produced, but there are countless “lessons learned” white papers and “best practices” policy briefs to go with it.

No, the real problem is that transitional periods are irreducibly fraught with the uncertainties Marc rightly spotlighted, and there simply are no deus-ex-machina resolutions to them. When scholars and practitioners do get involved, we are absorbed into the politics we mean to “correct,” and most of us aren’t nearly as adept in that field as we are in our own. After a couple of decades of closely watching these transitions and the efforts of various parties to point them in particular directions, I have come to believe that this is one of those things social science can help us understand but not “fix.”

The Ethics of Political Science in Practice

As citizens and as engaged intellectuals, we all have the right—indeed, an obligation—to make moral judgments and act based on those convictions. As political scientists, however, we have a unique set of potential contributions and constraints. Political scientists do not typically have anything of distinctive value to add to a chorus of moral condemnation or declarations of normative solidarity. What we do have, hopefully, is the methodological training, empirical knowledge and comparative insight to offer informed assessments about alternative courses of action on contentious issues. Our primary ethical commitment as political scientists, therefore, must be to get the theory and the empirical evidence right, and to clearly communicate those findings to relevant audiences—however unpalatable or inconclusive they might be.

That’s a manifesto of sorts, nested in a great post by Marc Lynch at the Monkey Cage. Marc’s post focuses on analysis of the Middle East, but everything he writes generalizes to the whole discipline.

I’ve written a couple of posts on this theme, too:

  • “This Is Not a Drill,” on the challenges of doing what Marc proposes in the midst of fast-moving and politically charged events with weighty consequences; and
  • “Advocascience,” on the ways that researchers’ political and moral commitments shape our analyses, sometimes but not always intentionally.

Putting all of those pieces together, I’d say that I wholeheartedly agree with Marc in principle, but I also believe this is extremely difficult to do in practice. We can—and, I think, should—aspire to this posture, but we can never quite achieve it.

That applies to forecasting, too, by the way. Coincidentally, I saw this great bit this morning in the Letter from the Editors for a new special issue of The Appendix, on “futures of the past”:

Prediction is a political act. Imagined futures can be powerful tools for social change, but they can also reproduce the injustices of the present.

Concern about this possibility played a role in my decision to leave my old job, helping to produce forecasts of political instability around the world for private consumption by the U.S. government. It is also part of what attracts me to my current work on a public early-warning system for mass atrocities. By making the same forecasts available to all comers, I hope that we can mitigate that downside risk in an area where the immorality of the acts being considered is unambiguous.

As a social scientist, though, I also understand that we’ll never know for sure what good or ill effects our individual and collective efforts had. We won’t know because we can’t observe the “control” worlds we would need to confidently establish cause and effect, and we won’t know because the world we seek to understand keeps changing, sometimes even in response to our own actions. This is the paradox at the core of applied, empirical social science, and it is inescapable.

Yes, Forecasting Conflict Can Help Make Better Foreign Policy Decisions

At the Monkey Cage, Idean Salehyan has a guest post that asks, “Can forecasting conflict help to make better foreign policy decisions?” I started to respond in a comment there, but as my comment ballooned into several paragraphs and started to include hyperlinks, I figured I’d go ahead and blog it.

Let me preface my response by saying that I’ve spent most of my 16-year career since graduate school doing statistical forecasting for the U.S. government and now for wider audiences, and I plan and expect to continue doing this kind of work for a while. That means I have a lot of experience doing it and thinking about how and why to do it, but it also means that I’m financially invested in an affirmative answer to Salehyan’s rhetorical question. Make of that what you will.

So, on to the substance. Salehyan’s main concern is actually an ethical one, not the pragmatic one I inferred when I first saw the title of his post. When Salehyan asks about making decisions “better,” he doesn’t just mean more effective. In his view,

Scholars cannot be aloof from the real-world implications of their work, but must think carefully about the potential uses of forecasts…If social scientists will not use their research to engage in policy debates about when to strike, provide aid, deploy troops, and so on, others will do so for them.  Conflict forecasting should not be seen as value-neutral by the academic community—it will certainly not be seen as such by others.

On this point, I agree completely, but I don’t think there’s anything unique about conflict forecasting in this regard. No scholarship is entirely value neutral, and research on causal inference informs policy decisions, too. In fact, my experience is that policy frames suggested by compelling causal analysis have deeper and more durable influence than statistical forecasts, which most policymakers still seem inclined to ignore.

One prominent example comes from the research program that emerged in the 2000s on the relationship between natural resources and the occurrence and persistence of armed conflict. After Paul Collier and Anke Hoeffler famously identified “greed” as an important impetus to civil war (here), numerous scholars showed that some rebel groups were using “lootable” resources to finance their insurgencies. These studies helped inspire advocacy campaigns that led, among other things, to U.S. legislation aimed at restricting trade in “conflict minerals” from the Democratic Republic of Congo. Now, several years later, other scholars and advocates have convincingly shown that this legislation was counterproductive. According to Laura Seay (here), the U.S. law

has created a de facto ban on Congolese mineral exports, put anywhere from tens of thousands up to 2 million Congolese miners out of work in the eastern Congo, and, despite ending most of the trade in Congolese conflict minerals, done little to improve the security situation or the daily lives of most Congolese.

Those are dire consequences, and forecasting is nowhere in sight. I don’t blame Collier and Hoeffler or the scholars who followed their intellectual lead on this topic for Dodd-Frank 1502, but I do hope and expect that those scholars will participate in the public conversation around related policy choices.

Ultimately, we all have a professional and ethical responsibility for the consequences of our work. For statistical forecasters, I think this means, among other things, a responsibility to be honest about the limitations, and to attend to the uses, of the forecasts we produce. The fact that we use mathematical equations to generate our forecasts and we can quantify our uncertainty doesn’t always mean that our forecasts are more accurate or more precise than what pundits offer, and it’s incumbent on us to convey those limitations. It’s easy to model things. It’s hard to model them well, and sometimes hard to spot the difference. We need to try to recognize which of those worlds we’re in and to communicate our conclusions about those aspects of our work along with our forecasts. (N.B. It would be nice if more pundits tried to abide by this rule as well. Alas, as Phil Tetlock points out in Expert Political Judgment, the market for this kind of information rewards other things.)
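One practical way to check which of those worlds we’re in is to score our forecasts against observed outcomes and against a crude baseline. Here is a minimal sketch using the Brier score, the mean squared difference between forecast probabilities and realized outcomes; the numbers are hypothetical, and the Brier score is just one of several reasonable metrics.

```python
# Compare a model's forecasts to a constant base-rate baseline using the
# Brier score (lower is better). All numbers here are hypothetical.
def brier(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes = [1, 0, 0, 1, 0]                # 1 = the event occurred
model_probs = [0.7, 0.2, 0.1, 0.6, 0.3]   # the statistical model's forecasts
baseline = [0.4] * len(outcomes)          # just forecast the base rate every time

print(brier(model_probs, outcomes))  # 0.078
print(brier(baseline, outcomes))     # 0.240
```

If the model can’t beat a naive baseline on checks like this, that’s a limitation we owe it to our audiences to report along with the forecasts themselves.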

Salehyan doesn’t just make this general point, however. He also argues that scholars who produce statistical forecasts have a special obligation to attend to the ethics of policy informed by their work because, in his view, they are likely to be more influential.

The same scientific precision that makes statistical forecasts better than ‘gut feelings’ makes it even more imperative for scholars to engage in policy debates.  Because statistical forecasts are seen as more scientific and valid they are likely to carry greater weight in the policy community.  I would expect—indeed hope—that scholars care about how their research is used, or misused, by decision makers.  But claims to objectivity and coolheaded scientific-ness make many academics reluctant to advocate for or against a policy position.

In my experience and the experience of every policy veteran with whom I’ve ever spoken about the subject, Salehyan’s conjecture that “statistical forecasts are likely to carry greater weight in the policy community” is flat wrong. In many ways, the intellectual culture within the U.S. intelligence and policy communities mirrors the intellectual culture of the larger society from which their members are drawn. If you want to know how those communities react to statistical forecasts of the things they care about, just take a look at the public discussion around Nate Silver’s election forecasts. The fact that statistical forecasts aren’t blithely and blindly accepted doesn’t absolve statistical forecasters of responsibility for their work. Ethically speaking, though, it matters that we’re nowhere close to the world Salehyan imagines in which the layers of deliberation disappear and a single statistical forecast drives a specific foreign policy decision.

Look, these decisions are going to be made whether or not we produce statistical forecasts, and when they are made, they will be informed by many things, of which forecasts—statistical or otherwise—will be only one. That doesn’t relieve the forecaster of ethical responsibility for the potential consequences of his or her work. It just means that the forecaster doesn’t have a unique obligation in this regard. In fact, if anything, I would think we have an ethical obligation to help make those forecasts as accurate as we can in order to reduce as much as we can the uncertainty about this one small piece of the decision process. It’s a policymaker’s job to confront these kinds of decisions, and their choices are going to be informed by expectations about the probability of various alternative futures. Given that fact, wouldn’t we rather those expectations be as well informed as possible? I sure think so, and I’m not the only one.
