The Siren Song of Certainty

Statistician William Briggs ran a great post today under the headline “Uncertainty Is an Impossible Sell” (h/t Danilo Freire on Facebook). Read the whole thing, but here’s the money quote:

If you want to set up business as a data scientist (the newfangled term by which statisticians are beginning to call themselves), the lesson is this: promise the moon and charge like you’re actually going there. Failure is rarely punished and never remembered.

Sad but true. Sad because the siren song of certainty tempts us into wasteful spending and poorly informed decision-making and makes it tougher for honest brokers to compete in the marketplace of paid work and ideas.

Here’s Daniel Kahneman on the latter point in Thinking, Fast and Slow (pp. 262–263):

Optimism is highly valued, socially and in the market; people and firms reward the providers of dangerously misleading information more than they reward truth tellers…

Experts who acknowledge the full extent of their ignorance may expect to be replaced by more confident competitors, who are better able to gain the trust of clients. An unbiased appreciation of uncertainty is a cornerstone of rationality—but it is not what people and organizations want. Extreme uncertainty is paralyzing under dangerous circumstances, and the admission that one is merely guessing is especially unacceptable when the stakes are high. Acting on pretended knowledge is often the preferred solution.

Remember, the con in con artist is short for confidence. The more excited or flattered or assured a forecaster or other expert makes you feel, the more skeptical you should probably be.

What are all these violent images doing to us?

Early this morning, I got up, made some coffee, sat down at my desk, and opened Twitter to read the news and pass some time before I had to leave for a conference. One of the first things I saw in my timeline was a still from a video of what was described in the tweet as an ISIS fighter executing a group of Syrian soldiers. The soldiers lay on their stomachs in the dirt, mostly undressed, hands on their heads. They were arranged in a tightly packed row, arms and legs sometimes overlapping. The apparent killer stood midway down the row, his gun pointed down, smoke coming from its barrel.

That experience led me to this pair of tweets:

[embedded tweet 1]

[embedded tweet 2]

If you don’t use Twitter, you probably don’t know that, starting in 2013, Twitter tweaked its software so that photos and other images embedded in tweets would automatically appear in users’ timelines. Before that change, you had to click on a link to open an embedded image. Now, if you follow someone who appends an image to his or her tweet, you instantly see the image when the tweet appears in your timeline. The system also includes a filter of sorts that’s supposed to inform you before showing media that may be sensitive, but it doesn’t seem to be very reliable at screening for violence, and it can be turned off.

As I said this morning, I think the automatic display of embedded images is great for sharing certain kinds of information, like data visualizations. Now, tweets can become charticles.

I am increasingly convinced, though, that this feature becomes deeply problematic when people choose to share disturbing images. After I tweeted my complaint, Werner de Pooter pointed out a recent study on the effects of frequent exposure to graphic depictions of violence on the psychological health of journalists. The study’s authors found that daily exposure to violent images was associated with higher scores on several indices of psychological distress and depression. The authors conclude:

Given that good journalism depends on healthy journalists, news organisations will need to look anew at what can be done to offset the risks inherent in viewing User Generated Content material [which includes graphic violence]. Our findings, in need of replication, suggest that reducing the frequency of exposure may be one way to go.

I mostly use Twitter to discover stories and ideas I don’t see in regular news outlets, to connect with colleagues, and to promote my own work. Because I study political violence and atrocities, a fair share of my feed deals with potentially disturbing material. Where that material used to arrive only as text, it increasingly includes photos and video clips of violent or brutal acts as well. I am starting to wonder how routine exposure to those images may be affecting my mental health. The study de Pooter pointed out has only strengthened that concern.

I also wonder if the emotional power of those images is distorting our collective sense of the state of the world. Psychologists talk about the availability heuristic, a cognitive shortcut in which the ease of recalling examples of certain things drives our expectations about the likelihood or risk of those things. As Daniel Kahneman describes on p. 138 of Thinking, Fast and Slow,

Unusual events (such as botulism) attract disproportionate attention and are consequently perceived as less unusual than they really are. The world in our heads is not a precise replica of reality; our expectations about the frequency of events are distorted by the prevalence and emotional intensity of the messages to which we are exposed.

When those images of brutal violence pop into our view, they grab our attention, pack a lot of emotional intensity, and are often hard to shake. The availability heuristic implies that frequent exposure to those images leads us to overestimate the threat or risk of things associated with them.

This process could even be playing some marginal role in a recent uptick in stories about how the world is coming undone. According to Twitter, its platform now has more than 270 million monthly active users. Many journalists and researchers covering world affairs probably fall in that 270 million. I suspect that those journalists and researchers spend more time watching their timelines than the average user, and they are probably more likely to turn off that “sensitive content” warning, too.

Meanwhile, smartphones and easier Internet access make it increasingly likely that acts of violence will be recorded and then shared through those media, and Twitter’s default settings now make it more likely that we see them when they are. Presumably, some of the organizations perpetrating this violence—and, sometimes, ones trying to mobilize action to stop it—are aware of the effects these images can have and deliberately push them to us to try to elicit that response.

As a result, many writers and analysts are now seeing much more of this material than they used to, even just a year or two ago. Whatever the actual state of the world, this sudden increase in exposure to disturbing material could be convincing many of us that the world is scarier and therefore more dangerous than ever before.

This process could have larger consequences. For example, lately I’ve had trouble getting thoughts of James Foley’s killing out of my mind, even though I never watched the video of it. What about the journalists and policymakers and others who did see those images? How did that exposure affect them, and how much is that emotional response shaping the public conversation about the threat the Islamic State poses and how our governments should respond to it?

I’m not sure what to do about this problem. As an individual, I can choose to unfollow people who share these images or spend less time on Twitter, but both of those actions carry some professional costs as well. The thought of avoiding these images also makes me feel guilty, as if I am failing the people whose suffering they depict and the ones who could be next. By hiding from those images, do I become complicit in the wider violence and injustice they represent?

As an organization, Twitter could decide to revert to the old no-show default, but that almost certainly won’t happen. I suspect this isn’t an issue for the vast majority of users, and it’s hard to imagine any social-media platform retreating from visual content as sites like Instagram and Snapchat grow quickly. Twitter could also try to remove embedded images that contain potentially disturbing material. As a fan of unfettered speech, though, I don’t find that approach appealing, either, and the unreliability of the current warning system suggests it probably wouldn’t work so well anyway.

In light of all that uncertainty, I’ll conclude with an observation instead of a solution: this is one hell of a huge psychological experiment we’re running right now, and its consequences for our own mental health and how we perceive the world around us may be more substantial than we realize.

How Circumspect Should Quantitative Forecasters Be?

Yesterday, I participated in a panel discussion on the use of technology to prevent and document mass atrocities as part of an event at American University’s Washington College of Law to commemorate the Rwandan genocide.* In my prepared remarks, I talked about the atrocities early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide. The chief outputs of that system are probabilistic forecasts, some from statistical models and others from a “wisdom of (expert) crowds” system called an opinion pool.

After I’d described that project, one of the other panelists, Patrick Ball, executive director of the Human Rights Data Analysis Group, had this to say via Google Hangout:

As someone who uses machine learning to build statistical models—that’s what I do all day long, that’s my job—I’m very skeptical that models about conflict, about highly rare events that have very complicated and situation-unique antecedents are forecastable. I worry about early warning because when we build models we listen to people less. I know that, from my work with the U.N., when we have a room full of people who know an awful lot about what’s going on on the ground, a graph—when someone puts a graph on the table, everybody stops thinking. They just look at the graph. And that worries me a lot.

In 1994, human-rights experts warned the world about what was happening [in Rwanda]. No one listened. So as we, as technologists and people who like technology, when we ask questions of data, we have to make sure that if anybody is going to listen to us, we’d better be giving them the right answers.

Maybe I was being vain, but I heard that part of Patrick’s remarks as a rebuke of our early-warning project and pretty much every other algorithm-driven atrocities and conflict forecasting endeavor out there. I responded by acknowledging that our forecasts are far from perfect, but I also asserted that we have reason to believe they will usually be at least marginally better than the status quo, so they’re worth doing and sharing anyway.

A few minutes later, Patrick came back with this:

When we build technology for human rights, I think we need to be somewhat thoughtful about how our less technical colleagues are going to hear the things that we say. In a lot of meetings over a lot of years, I’ve listened to very sophisticated, thoughtful legal, qualitative, ethnographic arguments about very specific events occurring on the ground. But almost inevitably, when someone proposes some kind of quantitative analysis, all that thoughtful reasoning escapes the room… The practical effect of introducing any kind of quantitative argument is that it displaces the other arguments that are on the table. And we are naive to think otherwise.

What that means is that the stakes for getting these kinds of claims right are very high. If we make quantitative claims and we’re wrong—because our sampling foundations are weak, because our model is inappropriate, because we misinterpreted the error around our claim, or for any other reason—we can do a lot of harm.

From that combination of uncertainty and the possibility for harm, Patrick concludes that quantitative forecasters have a special responsibility to be circumspect in the presentation of their work:

I propose that one of the foundations of any kind of quantitative claims-making is that we need to have very strict validation before we propose a conclusion to be used by our broader community. There are all kinds of rules about validation in model-building. We know a lot about it. We have a lot of contexts in which we have ground truth. We have a lot of historical detail. Some of that historical detail is itself beset by these sampling problems, but we have opportunities to do validation. And I think that any argument, any claim that we make—especially to non-technical audiences—should lead with that validation rather than leaving it to the technical detail. By avoiding discussing the technical problems in front of non-technical audiences, we’re hiding stuff that might not be working. So I warn us all to be much stricter.

Patrick has applied statistical methods to human-rights matters for a long time, and his combined understanding of the statistics and the advocacy issues is as good as, if not better than, almost anyone else’s. Still, what he described about how people respond to quantitative arguments is pretty much the exact opposite of my experience over 15 years of working on statistical forecasts of various forms of political violence and change. Many of the audiences to which I’ve presented that work have been deeply skeptical of efforts to forecast political behavior. Like Patrick, many listeners have asserted that politics is fundamentally unquantifiable and unpredictable. Statistical forecasts in particular are often derided for connoting a level of precision that’s impossible to achieve and for being too far removed from the messy reality of specific places to produce useful information. Even in cases where we can demonstrate that the models are pretty good at distinguishing high-risk cases from low-risk ones, that evidence usually fails to persuade many listeners, who appear to reject the work on principle.

I hear loud echoes of my experiences in Daniel Kahneman’s discussion of clinical psychologists’ hostility to algorithms and enduring prejudice in favor of clinical judgment, even in situations where the former is demonstrably superior to the latter. On p. 228 of Thinking, Fast and Slow, Kahneman observes that this prejudice “is an attitude we can all recognize.”

When a human competes with a machine, whether it is John Henry a-hammerin’ on the mountain or the chess genius Garry Kasparov facing off against the computer Deep Blue, our sympathies lie with our fellow human. The aversion to algorithms making decisions that affect humans is rooted in the strong preference that many people have for the natural over the synthetic or artificial.

Kahneman further reports that

The prejudice against algorithms is magnified when the decisions are consequential. [Psychologist Paul] Meehl remarked, ‘I do not quite know how to alleviate the horror some clinicians seem to experience when they envisage a treatable case being denied treatment because a ‘blind, mechanical’ equation misclassifies him.’ In contrast, Meehl and other proponents of algorithms have argued strongly that it is unethical to rely on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes. Their rational argument is compelling, but it runs against a stubborn psychological reality: for most people, the cause of a mistake matters. The story of a child dying because an algorithm made a mistake is more poignant than the story of the same tragedy occurring as a result of human error, and the difference in emotional intensity is readily translated into a moral preference.

If our distaste for algorithms is more emotional than rational, then why do forecasters who use them have a special obligation, as Patrick asserts, to lead presentations of their work with a discussion of the “technical problems” when experts offering intuitive judgments almost never do? I’m uncomfortable with that requirement, because I think it unfairly handicaps algorithmic forecasts in what is, frankly, a competition for attention against approaches that are often demonstrably less reliable but whose use also has real-world consequences. This isn’t a choice between action and inaction; it’s a trolley problem. Plenty of harm is already happening on the current track, and better forecasts could help reduce that harm. Under these circumstances, I think we behave ethically when we encourage the use of our forecasts in honest but persuasive ways.

If we could choose between forecasting and not forecasting, then I would be happier to set a high bar for predictive claims-making and let the validation to which Patrick alluded determine whether or not we’re going to try forecasting at all. Unfortunately, that’s not the world we inhabit. Instead, we live in a world in which governments and other organizations are constantly making plans, and those plans incorporate beliefs about future states of the world.

Conventionally, those beliefs are heavily influenced by the judgments of a small number of experts elicited in unstructured ways. That approach probably works fine in some fields, but geopolitics is not one of them. In this arena, statistical models and carefully designed procedures for eliciting and combining expert judgments will also produce forecasts that are uncertain and imperfect, but those algorithm-driven forecasts will usually be more accurate than the conventional approach of querying one or a few experts and blending their views in our heads (see here and here for some relevant evidence).
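
To make “carefully designed procedures for eliciting and combining expert judgments” a bit more concrete, here is a minimal sketch of an unweighted linear opinion pool, one of the simplest ways to combine expert probability forecasts. The function and the numbers are mine, for illustration only; they are not drawn from the Center’s actual system.

```python
# Minimal sketch of an unweighted linear opinion pool: each expert reports a
# probability that some event will occur (say, onset of mass killing in a
# given country within a year), and the pooled forecast is a weighted average.
# The experts and numbers below are hypothetical, for illustration only.

def linear_opinion_pool(probabilities, weights=None):
    """Combine expert probabilities into one forecast via a weighted average."""
    if weights is None:
        weights = [1.0 / len(probabilities)] * len(probabilities)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, probabilities))

expert_forecasts = [0.05, 0.10, 0.25, 0.02]  # four hypothetical experts
print(round(linear_opinion_pool(expert_forecasts), 3))  # 0.105
```

Real opinion pools typically weight and recalibrate judgments in more sophisticated ways, but even a simple average like this tends to beat relying on any single expert, which is the point of the comparison above.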

We also know that most of those subject-matter experts don’t abide by the rules Patrick proposes for quantitative forecasters. Anyone who’s ever watched cable news or read an op-ed—or, for that matter, attended a panel discussion—knows that experts often convey their judgments with little or no discussion of their cognitive biases and sources of uncertainty.

As it happens, that confidence is persuasive. As Kahneman writes (p. 263),

Experts who acknowledge the full extent of their ignorance may expect to be replaced by more confident competitors who are better able to gain the trust of clients. An unbiased appreciation of uncertainty is a cornerstone of rationality—but it is not what people and organizations want. Extreme uncertainty is paralyzing under dangerous circumstances, and the admission that one is merely guessing is especially unacceptable when the stakes are high. Acting on pretended knowledge is often the preferred solution.

The allure of confidence is dysfunctional in many analytic contexts, but it’s also not something we can wish away. And if confidence often trumps content, then I think we do our work and our audiences a disservice when we hem and haw about the validity of our forecasts as long as the other guys don’t. Instead, I believe we are behaving ethically when we present imperfect but carefully derived forecasts in a confident manner. We should be transparent about the limitations of the data and methods, and we should assess the accuracy of our forecasts and share what we learn. Until we all agree to play by the same rules, though, I don’t think quantitative forecasters have a special obligation to lead with the limitations of their work, thus conceding a persuasive advantage to intuitive forecasters who will fill that space and whose prognostications we can expect to be less reliable than ours.

* You can replay a webcast of that event here. Our panel runs from 1:00:00 to 2:47:00.

There Are No “Best Practices” for Democratic Transitions

I’ve read two pieces in the past two days that have tried to draw lessons from one or more cases about how policy-makers and practitioners can improve the odds that ongoing or future democratic transitions will succeed by following certain rules or formulas. They’ve got my hackles up, so I figured I’d use the blog to think through why.

The first of the two pieces was a post by Daniel Brumberg on Foreign Policy‘s Middle East Channel blog entitled “Will Egypt’s Agony Save the Arab Spring?” In that post, Brumberg looks to Egypt’s failure and “the ups and downs of political change in the wider Arab world” to derive six “lessons or rules” for leaders in other transitional cases. I won’t recapitulate Brumberg’s lessons here, but what caught my eye was the frequent use of prescriptive language, like “must be” and “should,” and the related emphasis on the “will and capacity of rival opposition leaders” as the crucial explanatory variable.

The second piece came in this morning’s New York Times, which included an op-ed by Jonathan Tepperman, managing editor of Foreign Affairs, entitled “Can Egypt Learn from Thailand?” As Tepperman notes, Thailand has a long history of military coups, and politics has been sharply polarized there for years, but it’s still managed to make it through a rough patch that began in the mid-2000s with just the one coup in 2006 and no civil war between rival national factions. How?

The formula turns out to be deceptively simple: provide decent, clean governance, compromise with your enemies and focus on the economy.

This approach is common in the field of comparative democratization, and I’ve even done a bit of it myself. I think scholars who want to make their work on democratization useful to policy-makers and other practitioners often feel compelled to go beyond description and explanation into prescription, and these lists of “best practices” are a familiar and accessible form in which to deliver this kind of advice. In the business world, the archetype is the white paper based on case studies of one or a few successful firms or entrepreneurs: look what Google or Facebook or Chipotle did and do it, too. In comparative democratization, we often get studies that find things that happened in successful cases but not in failed ones (or vice versa) and then advise practitioners to manufacture the good ones (e.g., pacts, fast economic growth) and avoid the bad (e.g., corruption, repression).

Unfortunately, I think these “best practices” pieces almost invariably succumb to what Nassim Taleb calls the narrative fallacy, as described here by Daniel Kahneman (p. 199):

Narrative fallacies arise inevitably from our continuous attempt to make sense of the world. The explanatory stories that people find compelling are simple; are concrete rather than abstract; assign a larger role to talent, stupidity, and intentions than to luck; and focus on a few striking events that happened rather than on the countless events that failed to happen.

The narrative fallacy is intertwined with outcome bias. Per Kahneman (p. 203),

We are prone to blame decision makers for good decisions that worked out badly and to give them too little credit for successful moves that appear obvious only after the fact… Actions that seem prudent in foresight can look irresponsibly negligent in hindsight [and vice versa].

When I read Tepperman’s “deceptively simple” formula for the survival of democracy and absence of civil war in Thailand, I wondered how confident he was seven or five or two years ago that Yingluck Shinawatra was doing the right things, and that they weren’t going to blow up in her and everyone else’s faces. I also wonder how realistic he thinks it would have been for Morsi and co. to have “provide[d] decent, clean governance” and “focus[ed] on the economy” in ways that would have worked and wouldn’t have sparked backlashes or fresh problems of their own.

Brumberg’s essay gets a little more distance from outcome bias than Tepperman’s does, but I think it still greatly overstates the power of agency and isn’t sufficiently sympathetic to the complexity of the politics within and between relevant organizations in transitional periods.

In Egypt, for example, it’s tempting to pin all the blame for the exclusion of political rivals from President Morsi’s cabinet, the failure to overhaul the country’s police and security forces, and the broader failure “to forge a common vision of political community” (Brumberg’s words) on the personal shortcomings of Morsi and Egypt’s civilian political leaders, but we have to wonder: given the context, who would have chosen differently, and how likely is it that those choices would have produced very different outcomes? Egypt’s economy is suffering from serious structural problems that will probably take many years to untangle, and anyone who thinks he or she knows how to quickly fix those problems is either delusional or works at the IMF. Presidents almost never include opposition leaders in their cabinets; would doing so in Egypt really have catalyzed consensus, or would it just have led to a wave of frustrated resignations a few months down the road? Attempting to overhaul state security forces might have helped avert a coup and prevent the mass killing we’re seeing now, but it might also have provoked a backlash that would have lured the military back out of the barracks even sooner. And in how many countries in the world do political rivals have a “common vision of political community”? We sure don’t in the United States, and I’m hard pressed to think of how any set of politicians here could manufacture one. So why should I expect politicians in Egypt or Tunisia or Libya to be able to pull this off?

Instead of advice, I’ll close with an observation: many of the supposed failures of leadership we often see in cases where coups or rebellions led new democracies back to authoritarian rule or even state collapse are, in fact, inherent to the politics of democratic transitions. The profound economic problems that often help create openings for democratization don’t disappear just because elected officials start trying harder. The distrust between political factions that haven’t yet been given any reason to believe their rivals won’t usurp power at the first chance they get isn’t something that good intentions can easily overcome. As much as I might want to glean a set of “best practices” from the many cases I’ve studied, the single generalization I feel most comfortable making is that the forces which finally tip some cases toward democratic consolidation remain a mystery, and until we understand them better, we can’t pretend to know how to control them.

N.B. For a lengthy exposition of the opposing view on this topic, read Giuseppe Di Palma’s To Craft Democracies. For Di Palma, “Democratization is ultimately a matter of political crafting,” and “democracies can be made (or unmade) in the act of making them.”

Legitimacy Revisited…and Still Found Wanting

The more I think about it, the more convinced I become that “legitimacy” is a solution to a theoretical puzzle that isn’t really so puzzling.

One of the central concerns of contemporary political science is political development—that is, understanding how and why different systems of government emerge, survive, and change.  Many of the theories we’ve crafted to address this topic start by assuming that those dynamics depend, in no small part, on the consent of the governed. Yes, all states sometimes coerce subjects into obedience, but coercion alone can’t explain why people don’t more often ignore or overthrow governments that fail to make them as happy as they could be. Taxes are costly, there are always some laws we don’t like, and subjects usually outnumber state security forces by a large margin.

Legitimacy is the idea we’ve concocted to fill that space between the amount of cooperation we think we can explain with coercion and the amount of cooperation we actually see. In its contemporary form, legitimacy has two layers. The first and supposedly deeper layer is a moral judgment about the justice of the current form of government; the second, surface layer is an instrumental judgment about the utility that government is providing. If we imagine the relationship between a state and its subjects as a marriage of sorts, we might think of the two layers of legitimacy as answers to two different questions: “Do you deserve my love?” and “What have you done for me lately?”

This two-layered notion of legitimacy is made clearest in contemporary thinking about the origins and survival of democratic regimes. According to Larry Diamond, Juan Linz, and Seymour Martin Lipset in Politics in Developing Countries (p. 9, emphasis mine),

All governments rest on some mixture of coercion and consent, but democracies are unique in the degree to which their stability depends on the consent of a majority of those governed…Democratic stability requires a widespread belief among elites and masses in the legitimacy of the democratic system: that it is the best form of government (or the “least evil”), “that in spite of shortcomings and failures, the existing political institutions are better than any others that might be established,” and hence that the democratic regime is morally entitled to demand obedience—to tax and draft, to make laws and enforce them, even “if necessary, by the use of force.”

Democratic legitimacy derives, when it is most stable and secure, from an intrinsic value commitment rooted in the political culture at all levels of society, but it is also shaped (particularly in the early years of democracy) by the performance of the democratic regime, both economically and politically (through the “maintenance of civil order, personal security, adjudication and arbitration of conflicts, and a minimum of predictability in the making and implementing of decisions”). Historically, the more successful a regime has been in providing what people want, the greater and more deeply rooted tends to be its legitimacy. A long record of successful performance tends to build a large reservoir of legitimacy, enabling the system better to endure crises and challenges.

So, to recap, legitimacy is a common answer to a question about the roots of consent, and this question about consent, in turn, emerges from a particular understanding of the relationship between governments and subjects. We think that forms of government only survive so long as subjects choose to keep cooperating, and we expect that subjects will only choose to keep cooperating as long as their moral beliefs and evaluations of regime performance tell them it is in their interest to do so. The math is a bit fuzzy, but the two layers of legitimacy are basically additive. As long as the sum of the moral and instrumental judgments is above some threshold, people will cooperate.
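
To pin down that fuzzy math, here is one minimal way to write the additive claim; the notation is mine, not drawn from the authors quoted above.

```latex
% Minimal formalization of the additive model of legitimacy (my notation):
\[
\text{cooperate}_i \iff m_i + u_i > \tau ,
\]
% where m_i is person i's moral judgment of the regime's rightness, u_i is
% i's instrumental judgment of its recent performance, and \tau is some
% threshold below which cooperation gives way to resistance or exit.
```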

But what if this underlying model isn’t true? What if people actually don’t scan the world that way and actively choose between cooperation and rebellion on a regular basis? What if most of us are just busy getting on with our lives, operating on something more like autopilot, unconcerned with this world of high politics as long as it doesn’t disrupt our local routines and compel us to attend to it?

The more I read about how we as humans actually think—and the more I reflect on my own lived experience—the more convinced I become that the “active optimizer” assumption on which the puzzle of consent depends is bunk. As Daniel Kahneman describes in Thinking, Fast and Slow (pp. 394-395),

Our emotional state is largely determined by what we attend to, and we are normally focused on our current activity and immediate environment. There are exceptions, where the quality of subjective experience is dominated by recurrent thoughts rather than by the events of the moment. When happily in love, we may feel joy even when caught in traffic, and if grieving, we may remain depressed when watching a funny movie. In normal circumstances, however, we draw pleasure and pain from what is happening at the moment, if we attend to it.

One big reason “we are normally focused on our current activity and immediate environment” is that we are creatures of habit and routine with limited cognitive resources. Most of the time, most of us don’t have the energy or the impetus to attend to big, hard, abstract questions about the morality of the current form of government, the available alternatives, and ways to get from one to the other. As Kahneman surmises (p. 354),

We normally experience life in the between-subjects mode, in which contrasting alternatives that might change your mind are absent, and of course [what you see is all there is]. As a consequence, the beliefs that you endorse when you reflect about morality do not necessarily govern your emotional reactions, and the moral intuitions that come to your mind in different situations are not internally consistent.

Put all of this together, and it looks like the active assessments of moral and instrumental value on which “legitimacy” supposedly depends are rarely made, and when they are made, they’re highly contingent. We mostly take things as they come and add the stories and meaning when prompted to do so. A lot of what looks like consent is just people going about their local business in a highly path-dependent world. If you ask us questions about various forms of government, we’ll offer answers, but those answers aren’t very reliable indicators of what’s actually guiding our behavior before or after you asked.

Put another way, I’m saying that the survival of political regimes depends not only on coercion and consent, but also, in large part, on inattention and indifference.

I think we find this hard to accept because (when we bother to think about it) we’ve bought the Hobbesian idea that, without a sovereign state, there would be no order. Hobbes’ State of Nature is philosophically useful, but empirically it’s absurd. As James Scott observes (p. 3) in The Art of Not Being Governed,

Until shortly before the common era, the very last 1 percent of human history, the social landscape consisted of elementary, self-governing, kinship units that might, occasionally, cooperate in hunting, feasting, skirmishing, trading, and peacemaking. It did not contain anything that one could call a state. In other words, living in the absence of state structures has been the standard human condition.

Clearly, nation-states aren’t the “natural” condition of the human animal, and they certainly aren’t a prerequisite for cooperation. Instead, they are a specific social technology that has emerged very recently and has so far proven highly effective at organizing coercive power and, in some cases, at helping to solve certain dilemmas of coordination and cooperation. But that doesn’t mean that we need to refer to national political regimes to explain all coordination and cooperation that happens within their territorial boundaries.

The irrelevance of legitimacy is the other side of that coin. We don’t need to refer to states to explain most of the cooperation that occurs among their putative subjects. Likewise, we don’t need a whole lot of consent to explain why those subjects don’t spend more time trying to change the forms of the nation-states they inhabit. We’ve concocted legitimacy to explain why people seemingly choose to go along with governments that don’t meet their expectations, when really most of the time people are just stumbling from immediate task to task, largely indifferent to the state-level politics on which we focus in our theories of regime survival and change. “Legitimacy” is a hypothesis in response to a question predicated on the false belief that we’re routinely more attentive to, and active in, this arena than we really are.

Big Data Won’t Kill the Theory Star

A few years ago, Wired editor Chris Anderson trolled the scientific world with an essay called “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” After talking about the fantastic growth in the scale and specificity of data that was occurring at the time—and that growth has only gotten a lot faster since—Anderson argued that

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

In other words, with data this rich, theory becomes superfluous.

Like many of my colleagues, I think Anderson is wrong about the increasing irrelevance of theory. Mark Graham explains why in a year-old post on the Guardian‘s Datablog:

We may one day get to the point where sufficient quantities of big data can be harvested to answer all of the social questions that most concern us. I doubt it though. There will always be digital divides; always be uneven data shadows; and always be biases in how information and technology are used and produced.

And so we shouldn’t forget the important role of specialists to contextualise and offer insights into what our data do, and maybe more importantly, don’t tell us.

At the same time, I also worry that we’re overreacting to Anderson and his ilk by dismissing Big Data as nothing but marketing hype. From my low perch in one small corner of the social-science world, I get the sense that anyone who sounds excited about Big Data is widely seen as either a fool or a huckster. As Christopher Zorn wrote on Twitter this morning, “‘Big data is dead’ is the geek-hipster equivalent of ‘I stopped liking that band before you even heard of them.’”

Of course, I say that as one of those people who’s really excited about the social-scientific potential these data represent. I think a lot of people who dismiss Big Data as marketing hype misunderstand the status quo in social science. If you don’t regularly try to use data to test and develop hypotheses about things like stasis and change in political institutions or the ebb and flow of political violence around the world, you might not realize how scarce and noisy the data we have now really are. On many things our mental models tell us to care about, we simply don’t have reliable measures.

Take, for example, the widely held belief that urban poverty and unemployment drive political unrest in poor countries. Is this true? Well, who knows? For most poor countries, the data we have on income are sparse and often unreliable, and we don’t have any data on unemployment, ever. And that’s at the national level. The micro-level data we’d need to link individuals’ income and employment status to their participation in political mobilization and violence? Apart from a few projects on specific cases (e.g., here and here), fuggeddaboudit.

Lacking the data we need to properly test our models, we fill the space with stories. As Daniel Kahneman describes on p. 201 of Thinking, Fast and Slow,

You cannot help dealing with the limited information you have as if it were all there is to know. You build the best possible story from the information available to you, and if it is a good story, you believe it. Paradoxically, it is easier to construct a coherent story when you know little, when there are fewer pieces to fit into the puzzle. Our comforting conviction that the world makes sense rests on a secure foundation: our almost unlimited ability to ignore our ignorance.

When that’s the state of the art, more data can only make things better. Sure, some researchers will poke around in these data sets until they find “statistically significant” associations and then pretend that’s what they expected to find the whole time. But, as Phil Schrodt points out, plenty of people are already doing that now.

Meanwhile, other researchers with important but unproven ideas about social phenomena will finally get a chance to test and refine those ideas in ways they’d never been able to do before. Barefoot empiricism will play a role, too, but science has always been an iterative process that way, bouncing around between induction and deduction until it hits on something that works. If the switch from data-poor to data-rich social science brings more of that, I feel lucky to be present for its arrival.

Forecasting Round-Up No. 3

1. Mike Ward and six colleagues recently posted a new working paper on “the next generation of crisis prediction.” The paper echoes themes that Mike and Nils Metternich sounded in a recent Foreign Policy piece responding to one I wrote a few days earlier, about the challenges of forecasting rare political events around the world. Here’s a snippet from the paper’s intro:

We argue that conflict research in political science can be improved by more, not less, attention to predictions. The increasing availability of disaggregated data and advanced estimation techniques are making forecasts of conflict more accurate and precise. In addition, we argue that forecasting helps to prevent overfitting, and can be used both to validate models, and inform policy makers.

I agree with everything the authors say about the scientific value and policy relevance of forecasting, and I think the modeling they’re doing on civil wars is really good. There were two things I especially appreciated about the new paper.

First, their modeling is really ambitious. In contrast to most recent statistical work on civil wars, they don’t limit their analysis to conflict onset, termination, or duration, and they don’t use country-years as their unit of observation. Instead, they look at country-months, and they try to tackle the more intuitive but also more difficult problem of predicting where civil wars will be occurring, whether or not one is already ongoing.

This version of the problem is harder because the factors that affect the risk of conflict onset might not be the same ones that affect the risk of conflict continuation, and even when they are, they might not affect those two risks in the same way. As a result, it’s hard to specify a single model that can reliably anticipate continuity in, and changes from, both forms of the status quo (conflict or no conflict).

The difficulty of this problem is evident in the out-of-sample accuracy of the model these authors have developed. The performance statistics are excellent on the whole, but that’s mostly because the model is accurately forecasting that whatever is happening in one month will continue to happen in the next. Not surprisingly, the model’s ability to anticipate transitions is apparently weaker. Of the five civil-war onsets that occurred in the test set, only two “arguably…rise to probability levels that are heuristic,” as the authors put it.
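
To see why persistence alone can generate excellent-looking performance statistics, consider a toy calculation (the numbers are made up, not taken from the paper): a “forecast” that simply carries last month’s status forward for every country-month.

```python
# Toy illustration with hypothetical numbers (not from the Ward et al. paper):
# when transitions are rare, a forecast that just repeats last month's status
# scores very well on overall accuracy while anticipating zero onsets.

total = 1000               # country-months in a hypothetical test set
onsets_missed = 5          # months where conflict began (persistence says "no conflict")
terminations_missed = 5    # months where conflict ended (persistence says "conflict")
correct = total - onsets_missed - terminations_missed

print(f"overall accuracy: {correct / total:.1%}")   # 99.0%
print(f"onsets correctly anticipated: 0 of {onsets_missed}")
```

That is why the authors’ separate attention to onsets matters more than the headline statistics, and why the follow-up work Mike describes below is worth watching.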

I emailed Mike to ask about this issue, and he said they were working on it:

Although the paper doesn’t go into it, in a separate part of this effort we actually do have separate models for onset and continuation, and they do reasonably well.  We are at work on terminations, and developing a new methodology that predicts onsets, duration, and continuation in a single (complicated!) model.  But that is down the line a bit.

Second and even more exciting to me, the authors close the paper with real, honest-to-goodness forecasts. Using the most recent data available when the paper was written, the authors generate predicted probabilities of civil war for the next six months: October 2012 through March 2013. That’s the first time I’ve seen that done in an academic paper about something other than an election, and I hope it sets a precedent that others will follow.

2. Over at Red (team) Analysis, Helene Lavoix appropriately pats The Economist on the back for publicly evaluating the accuracy of the predictions they made in their “World in 2012” issue. You can read the Economist‘s own wrap-up here, but I want to highlight one of the points Helene raised in her discussion of it. Toward the end of her post, in a section called “Black swans or biases?”, she quotes this bit from the Economist:

As ever, we failed at big events that came out of the blue. We did not foresee the LIBOR scandal, for example, or the Bo Xilai affair in China or Hurricane Sandy.

As Helene argues, though, it’s not self-evident that these events were really so surprising—in their specifics, yes, but not in the more general sense of the possibility of events like these occurring sometime this year. On Sandy, for example, she notes that

Any attention paid to climate change, to the statistics and documents produced by Munich-re…or Allianz, for example, to say nothing about the host of related scientific studies, show that extreme weather events have become a reality and we are to expect more of them and more often, including in the so-called rich countries.

This discussion underscores the importance of being clear about what kind of forecasting we’re trying to do, and why. Sometimes the specifics will matter a great deal. In other cases, though, we may have reason to be more concerned with risks of a more general kind, and we may need to broaden our lens accordingly. Or, as Helene writes,

The methodological problem we are facing here is as follows: Are we trying to predict discrete events (hard but not impossible, however with some constraints and limitations according to cases) or are we trying to foresee dynamics, possibilities? The answer to this question will depend upon the type of actions that should follow from the anticipation, as predictions or foresight are not done in a vacuum but to allow for the best handling of change.

3. Last but by no means least, Edge.org has just posted an interview with psychologist Phil Tetlock about his groundbreaking and ongoing research on how people forecast, how accurate (or not) their forecasts are, and whether or not we can learn to do this task better. [Disclosure: I am one of hundreds of subjects in Phil’s contribution to the IARPA tournament, the Good Judgment Project.] On the subject of learning, the conventional wisdom is pessimistic, so I was very interested to read this bit (emphasis added):

Is world politics like a poker game? This is what, in a sense, we are exploring in the IARPA forecasting tournament. You can make a good case that history is different and it poses unique challenges. This is an empirical question of whether people can learn to become better at these types of tasks. We now have a significant amount of evidence on this, and the evidence is that people can learn to become better [forecasters]. It’s a slow process. It requires a lot of hard work, but some of our forecasters have really risen to the challenge in a remarkable way and are generating forecasts that are far more accurate than I would have ever supposed possible from past research in this area.

And bonus alert: the interview is introduced by Daniel Kahneman, Nobel laureate and author of one of my favorite books from the past few years, Thinking, Fast and Slow.

N.B. In case you’re wondering, you can find Forecasting Round-Up Nos. 1 and 2 here and here.

Complexity Politics: Some Preliminary Ideas

As regular readers of this blog will know, I have become interested of late in applying ideas from complexity theory to politics. I’m hardly the first person to have this thought, but I’ve been surprised by how little published political science I’ve been able to find that goes beyond loose metaphors and really digs into the study of complex adaptive systems to try to explain specific macro-political phenomena.

To start thinking about how that might be done, I’ve been reading: Miller & Page on complex adaptive systems; Gould and Mayr on evolution; Kahneman on human cognition; Beinhocker on the economy; Ostrom on institutions; Bates, Fukuyama, and North, Wallis, & Weingast on the long course of political development; and Taleb on the predictability of unpredictability.

The single most-stimulating thing I’ve read so far is Eric Beinhocker’s The Origin of Wealth, which provides a thorough but accessible introduction to the principles of complex adaptive systems and then attempts to re-imagine the entirety of economics through that prism. Beinhocker dubs his reworked discipline Complexity Economics, so I thought I would borrow that phraseology and talk about Complexity Politics. Where Beinhocker asks, “Where does wealth come from, and why did it grow explosively in the past few hundred years?” I want to know: Where does government come from? Why does it take so many different forms, and why do those forms change over time? More specifically, why is democracy so prevalent nowadays? How long is that pattern going to last, and what comes next?

In the spirit of web logging circa 2003, I thought I would use this platform to sketch out a rough map of the terrain I’m trying to explore in hopes of stimulating conversation with other social scientists, modelers, and anyone else interested in the subject. Some of these probably won’t make sense to people who aren’t already familiar with complexity theory, but, hey, you can’t blame a guy for trying.

Anyway,  here in very loose order are some of the thoughts I’ve had so far.

1. Political systems aren’t “like” complex adaptive systems. They are complex adaptive systems, and those systems are embedded in a much larger system that “exists in the real physical world,” to borrow Beinhocker’s phrase. The human part of this larger system also encompasses the economy and non-economic forms of social interaction (like friendship), and the political part is not prior to, outside, or above the others, even if it sometimes aspires or claims to be. These various streams of human activity don’t just affect each other; they are all parts of a single system in which human activity as a whole is embedded and of which it is just one small part.

2. Political development doesn’t just resemble an evolutionary process. These systems are evolutionary systems, and political organization co-evolves with the economy and culture and the physical and biological environments in which all this behavior occurs. As a result, changes in physical and social technologies and the wider ecology of any of these other systems will affect politics, and vice versa.

3. In light of humans’ evolutionary trajectory, some form of hierarchical organization of our social activity is virtually inevitable, but that does not mean that the specific forms we see today were inevitable. The basic theme of organization for cooperation, and the never-ending tension between cooperation and conflict, may be “natural,” but the specific organizational expressions of these themes are not. There is no utopia or other optimal form, just an unending process of variation, replication, and selection.

4. In the human portion of this system, governments are the political equivalent of firms in the economy—organizations that bring together multiple “businesses” in pursuit of some wider goal(s). There is a great deal of isomorphism in which “businesses” governments pursue, but, as the unending arguments in American politics over the proper purpose and size of government show, this debate is not settled. In other words, there is no natural or obvious answer to the question, “What do governments do?”

5. So what is government, anyway? The defining feature of government as a social technology is the claim to the authority to make rules affecting people who are not parties to the rule-making process. Economic exchange is based on trade or contracts, both of which involve all parties choosing “freely” to make the exchange. Governments, by contrast, are defined by their assertion of the authority to compel behavior by all individuals of a certain class. In the system of government that has developed so far, the relevant classes are defined primarily by territory, but this is not the only structure possible.

6. The defining features of government are: a) procedures for selecting rule-makers, b) procedures for making rules, c) some capacity to implement those rules, and d) some capacity to enforce those rules. Variation in the form (and therefore fitness) of governments occurs along these four dimensions, each of which has many components and sub-components that also vary widely (e.g., electoral systems in democracies).

7. Because they must enforce the rules they make, all governments depend to some extent on coercion. In this sense, all governments depend on people skilled in violence, and on physical technologies—including weapons—that enable monitoring and enforcement. As relevant physical technologies emerge and evolve, governments will often evolve with them.

8. States are a particular form of government connected to the contemporary organization of politics at the global level. (I wrote more about that here.) As Edward Carr wrote in a recent blog post, however, “Many of the global poor live beyond the reach of the state.” In other words, states are just one part of the global political landscape, and not all social behavior within their borders necessarily falls under their hierarchical structures. It’s really a matter of degree, and for a non-trivial proportion of the human population, the degree is approximately zero. On this point, see also Steve Inskeep’s work on cities in “developing” countries.

9. The economy, by contrast, is effectively ubiquitous in human society. This means that efforts to understand the emergence and evolution of government should presume that governments emerged to serve economic ends and not vice versa. Once government emerged as a social technology, path dependence kicked in, and the two began co-evolving. But the economic roots of government should not be ignored. You can’t explain or understand politics without reference to the economy.

10. Governments operate on many different geographic scales. The presumption (or assertion) by many actors at the national and international scale is that governments at these different levels are nested in a clear hierarchy: local, regional, national. In practice, though, these organizations often don’t operate that way, and the array of governments around the world is really interconnected through a mixture of hierarchical and dense networks that often overlap.

11. Once the social technology of government had emerged, it began to evolve, too. Evolution involves variation, selection, and replication. Adaptation occurs as selection and replication amplify fitter variations. In political space, rules are the building blocks, governments are the “readers” that give form to different arrangements of rules, and institutions are the results on which selection pressures act. As with other social technologies, change primarily occurs through human agency, some of it with clear intention and some of it more experimental. Mutations may also occur as a result of ambiguities inherent in language.

12. Regime types are like species. They aren’t crisp categories so much as recognizable peaks in multidimensional space defined by possible combinations of political DNA. One implication of this observation is that we may get better insights from inductive scans of this multidimensional space than we do from efforts to match real-world cases to deductively defined ideal types. After all, those deductively defined forms are just ideas, and those ideas are just another stream in the same co-evolving system.

13. Like anything else, forms of government vary in their fitness, and fitness is always situational. The evolution of forms of government should follow the usual patterns of s-curves and punctuated equilibria. There will be periods of relative stability in the system when specific combinations with a fitness edge will come to dominate, and there will be periods of rapid change when lots of experimentation and churn will occur. During the more stable phases, hedgehog-like forms that do the “fit” things well will predominate. During periods of phase shift, fox-like organizations that internalize experimentation will survive more readily.

14. Re (13), it’s unclear if democracy is the former or the latter, but I’m inclined to see it as the latter. The last 200 years have been a period of rapid change in human society, and democracy is proliferating because it is fitter than authoritarian rule in this highly uncertain environment. If that’s right, then we would expect to see something other than democracy come to dominate the political landscape whenever this period of phase shift comes to an end. I have no idea when that might be or what the world will look like when that happens, and therefore I have no idea what organizational forms might be fitter in that new era.

15. Ditto for territoriality as the basis for defining the boundaries of governments as political organizations. To imagine what a non-territorial form of political organization might look like, we can consider possibilities for political organization in cyberspace. As more and more exchange migrates to cyberspace, pressures to organize in that domain will increase. States are currently trying to maintain control of that process, and their efforts to do so are facilitated by the dependency of cyberspace on a physical infrastructure. If and when that infrastructure becomes sufficiently non-hierarchical and resilient, I expect we’ll see the center of gravity for governance shift to that (non-territorial) domain. The physical element of coercion will keep territoriality relevant, but there are ways other than direct violence to coerce (e.g., delete bank accounts, revoke accesses or permissions, block signals), and developments in physical technologies (e.g., remotely operated weapons) may also make territoriality less relevant.

16. One of the few “laws” of political behavior is Michels’s Iron Law of Oligarchy, which implies that political organizations invariably become more bureaucratic and self-protective as they grow and gain power. Any attempt to trace political development through the lens of complex adaptive systems needs to show how this pattern emerges from the process. It’s easy to imagine a connection between this pattern and things like loss aversion and the biological drive to dominate reproduction, but it would be useful to see if we can induce the emergence of this pattern from agent-based models with realistic simplifying assumptions (a toy sketch of what I mean follows this list).
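
To make that last point a bit more concrete, here is a deliberately minimal sketch, in Python, of the kind of toy agent-based model I have in mind. Everything in it—the number of agents, the size of the incumbency edge, the rule that influence begets office—is an illustrative assumption, not a claim about how such a model should be specified.

```python
# Minimal, hypothetical agent-based sketch of oligarchic concentration.
# All parameters and rules are illustrative assumptions.
import random

N_AGENTS = 100          # members of a political organization
N_ROUNDS = 500          # election/decision cycles
INCUMBENCY_EDGE = 0.05  # assumed per-round advantage from holding office

def simulate(seed=0):
    random.seed(seed)
    influence = [1.0] * N_AGENTS  # everyone starts with equal influence

    for _ in range(N_ROUNDS):
        # Leadership goes to agents drawn with probability proportional to
        # current influence ("resources beget office"). Draws are with
        # replacement, so a round may yield fewer than 5 distinct leaders.
        total = sum(influence)
        weights = [w / total for w in influence]
        leaders = set(random.choices(range(N_AGENTS), weights=weights, k=5))

        # Office-holders convert position into a small additional edge.
        for i in leaders:
            influence[i] *= (1 + INCUMBENCY_EDGE)

    # Share of total influence held by the top 5 percent of agents.
    top = sorted(influence, reverse=True)[: N_AGENTS // 20]
    return sum(top) / sum(influence)

if __name__ == "__main__":
    print(f"Top-5% influence share after {N_ROUNDS} rounds: {simulate():.2f}")
```

Even with rules this simple, influence tends to concentrate in a small share of agents over time, which is at least suggestive of the pattern Michels described. The real test would be whether that concentration survives under more realistic assumptions.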

So that’s where I’m starting from. I hope to dig deeper into some of these ideas in future blog posts. Meanwhile, if you have any reactions or you can point me toward relevant books or articles, please leave a comment or send me an email.

“Muslim Rage!” as an Availability Cascade

How do we make sense of sensationalism like the “MUSLIM RAGE” headline on this week’s Newsweek cover? Here’s one idea:

An availability cascade is a self-sustaining chain of events, which may start from media reports of a relatively minor event and lead up to public panic and large-scale government action. On some occasions, a media story about a risk catches the attention of a segment of the public, which becomes aroused and worried. This emotional reaction becomes a story in itself, prompting additional coverage in the media, which in turn produces greater concern and involvement. The cycle is sometimes sped along deliberately by ‘availability entrepreneurs,’ individuals or organizations who work to ensure a continuous flow of worrying news. The danger is increasingly exaggerated as the media compete for attention-grabbing headlines. Scientists and others who try to dampen the increasing fear and revulsion attract little attention, most of it hostile; anyone who claims that the danger is overstated is suspected of association with a ‘heinous cover-up.’ The issue becomes politically important because it is on everyone’s mind, and the response of the political system is guided by the intensity of public sentiment. The availability cascade has now reset priorities. Other risks, and other ways that resources could be applied for the public good, all have faded into the background.

That’s Daniel Kahneman in Thinking, Fast and Slow (p. 142), and I think this vignette nicely describes the frenzied American reaction to the wave of violent attacks on U.S. diplomatic outposts that began a few days ago. The term “availability cascade” was coined in a 1999 paper by Timur Kuran and Cass Sunstein, and it’s rooted in a cognitive bias psychologists call the availability heuristic: a human tendency to judge the risk of an event by the ease with which vivid examples spring to mind. Recent and dramatic events are easier to recall, and wall-to-wall multimedia coverage keeps those events fresh in our minds. The resulting cascade is a form of herd behavior, the complex process that also contributes to things like bubbles and crashes in the stock market—and, arguably, the anti-American riots that have the U.S. in a tizzy right now.

By describing our response as an availability cascade, I don’t mean to imply that these events are unimportant. Attacks on diplomatic posts are a big deal in international politics, and numerous people have died in the ensuing violence, including the U.S. ambassador to Libya. As such, these events will and should have real consequences, hopefully to include fresh thinking about how to conduct diplomacy in environments with weak or fragmented security services and powerful anti-American groups.

Rather, my point is that these events probably aren’t the political earthquake we’re making them out to be, and our herd-like response may lead us further astray. For starters, most of the recent protests haven’t been that large. In a very helpful blog post, political scientist Megan Rief compares the size of the protests in the past week to the size of the early gatherings in the so-called Arab Spring and shows that the recent events have generally been much smaller. She also spotlights a specific choice being made by media outlets—the chief “availability entrepreneurs” in this cascade—that is shaping our impressions of the threat these events pose:

It is interesting to observe how media images of the crowds at Tahrir square in early 2011 were presented in wide-angle format, while the current spate of protest images are closely cropped around smaller, violent groups of people, giving the impression that the crowds are large and menacing.

I think it’s also useful to keep in mind that we’ve seen similar waves of unrest a couple of times in the past several years, and each time, things have returned more or less to normal within a couple of weeks. The first wave occurred in 2006 over the publication of “blasphemous” cartoons, and another struck in 2010 over American “pastor” Terry Jones’ call to mark the anniversary of 9/11 by burning the Koran. The short life span of those previous waves doesn’t guarantee that the current one won’t drag on or even escalate further, but it suggests that escalation is unlikely.

Meanwhile, the frenzied reaction is having real-world consequences. On his blog a couple of days ago, longtime Middle East observer Juan Cole lists the “top ten likely consequences of Muslim anti-U.S. embassy riots,” including further declines in tourism to Egypt and Tunisia, countries whose already-struggling economies depend heavily on those foreign visitors, and deeper U.S. entanglement in the domestic politics of Yemen. At Foreign Policy, Josh Keating discusses the effects of terrorism on the design of America’s overseas outposts and asks if the U.S. can keep its diplomats safe without walling them off from the societies they’re supposed to be engaging.

More broadly, this cascade is threatening to reconfigure American public opinion, and through it American foreign policy, in ways that we might later regret. In the online magazine Jadaliyya, Bassam Haddad appropriately bemoans the spate of stories questioning support for recent changes in the politics of the Middle East and North Africa under “casually barbaric” headlines like, “Was the Arab Spring really worth it?” If anything, the U.S. government is traditionally guilty of overreach in these parts of the world: deeply enmeshing itself in the domestic politics of many Arab countries, striking at targets imperfectly identified in secret, and even trying desperately to “reshape the narrative” in societies where anti-Americanism runs deep and wide. Still, I think we also risk under-reaching if we let the opportunistic behavior of a few “availability entrepreneurs” in predominantly Muslim countries and in our own media reconfigure our government’s approach to whole swathes of the world at the very moment those societies are struggling to institutionalize the political values we so loudly claim to espouse.

Here’s hoping that cooler heads prevail.

How Makers of Foreign Policy Use Statistical Forecasts: They Don’t, Really

The current issue of Foreign Policy magazine includes a short piece I wrote on how statistical models can be useful for forecasting coups d’etat. With the March coup in Mali as a hook, the piece aims to show that number-crunching can sometimes do a good job assessing risks of rare events that might otherwise present themselves as strategic surprises.

In fact, statistical forecasting of international politics is a relatively young field, and decision-makers in government and the private sector have traditionally relied on subject-matter experts to prognosticate on events of interest. Unfortunately, expert judgment does not work nearly as well as we might hope or expect when used as a forecasting tool.

In a comprehensive study of expert political judgment, Philip Tetlock finds that forecasts made by human experts on a wide variety of political phenomena are barely better than random guesses, and they are routinely bested by statistical algorithms that simply extrapolate from recent trends. Some groups of experts perform better than others—the experts’ cognitive style is especially relevant, and feedback and knowledge of base rates can help, too—but even the best-performing sets of experts fail to match the accuracy of those simple statistical algorithms.
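
For a sense of what “simply extrapolate from recent trends” means here, the snippet below sketches one such baseline: forecast the next value of a series by extending its average recent change. The series and numbers are made up purely for illustration, not taken from Tetlock’s study.

```python
# A minimal sketch of a "dumb" extrapolation baseline: project the next
# value by extending the average change over a short recent window.
def trend_extrapolation(series, window=3):
    """Forecast the next value by extending the average recent change."""
    recent = series[-window:]
    avg_change = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + avg_change

protest_counts = [12, 15, 14, 18, 21]        # hypothetical monthly event counts
print(trend_extrapolation(protest_counts))   # -> 24.5
```

Baselines like this have no theory behind them at all, which is exactly what makes the comparison to expert judgment so humbling.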

The finding that models outperform subjective judgments at forecasting has been confirmed repeatedly by other researchers, including one prominent 2004 study which showed that a simple statistical model could predict the outcomes of U.S. Supreme Court cases much more accurately than a large assemblage of legal experts.

Because statistical forecasts are potentially so useful, you would think that policy makers and the analysts who inform them would routinely use them. That, however, would be a bad bet. I spoke with several former U.S. policy and intelligence officials, and all of them agreed that policymakers make little use of these tools and the “watch lists” they are often used to produce. A few of those former officials noted some variation in the application of these techniques across segments of the government—military leaders seem to be more receptive to statistical forecasting than civilian ones—but, broadly speaking, sentiment runs strongly against applied modeling.

If the evidence in favor of statistical risk assessment is so strong, why is it such a tough sell?

Part of the answer surely lies in a general tendency humans have to discount or ignore evidence that doesn’t match our current beliefs. Psychologists call this tendency confirmation bias, and it affects how we respond when models produce forecasts that contradict our expectations about the future. In theory, this is when models are most useful; in practice, it may also be when they’re hardest to sell.

Jeremy Weinstein, a professor of political science at Stanford University, served as Director for Development and Democracy on the National Security Council staff at the White House from 2009 until 2011. When I asked him why statistical forecasts don’t get used more in foreign-policy decision-making, he replied, “I only recall seeing the use of quantitative assessments in one context. And in that case, I think they were accepted by folks because they generated predictions consistent with people’s priors. I’m skeptical that they would have been valued the same if they had generated surprising predictions. For example, if a quantitative model suggests instability in a country that no one is invested in following or one everyone believes is stable, I think the likely instinct of policymakers is to question the value of the model.”

The pattern of confirmation bias extends to the bigger picture on the relative efficacy of models and experts. When asked about why policymakers don’t pay more attention to quantitative risk assessments, Anne-Marie Slaughter, former director of Policy Planning at State, responded: “You may believe that [statistical forecasts] have a better track record than expert judgment, but that is not a widely shared view. Changing minds has to come first, then changing resources.”

Where Weinstein and Slaughter note doubts about the value of the forecasts, others see deeper obstacles in the organizational culture of the intelligence community. Ken Knight, now Analytic Director at Centra Technology, spent the better part of a 30-year career in government working on risk assessment, including several years in the 2000s as National Intelligence Officer for Warning. According to Knight, “Part of it is the analytic community that I grew up in. There was very little in the way of quantitative analytic techniques that was taught to me as an analyst in the courses I took. There is this bias that says this stuff is too complex to model…People are just really skeptical that this is going to tell them something they don’t already know.”

This organizational bias may simply reflect some deep grooves in human cognition. Psychological research shows that our minds routinely ignore statistical facts about groups or populations while gobbling up or even cranking out causal stories that purport to explain those facts. These different responses appear to be built-in features of the automatic and unconscious thinking that dominates our cognition. Because of them, our minds “can deal with stories in which the elements are causally linked,” Daniel Kahneman writes, but they are “weak in statistical reasoning.”

Of course, cognitive bias and organizational culture aren’t the only reasons statistical risk assessments don’t always get traction in the intelligence-production process. Stephen Krasner, a predecessor of Slaughter’s as director of Policy Planning at State, noted in an email exchange that there’s often a mismatch between the things these models can warn about and the kinds of questions policymakers are often trying to answer. Krasner’s point was echoed in a recent column by CNAS senior fellow Andrew Exum, who notes that “intelligence organizations are normally asked to answer questions regarding both capability and intent.” To that very short list, I would add “probability,” but the important point here is that estimating the likelihood of events of concern is just one part of what these organizations are asked to do, and often not the most prominent one.

Clearly, there are a host of reasons why policy-makers might not see statistical forecasts as a valuable resource. Some are rooted in cognitive bias and organizational culture, while others are related to the nature of the problems they’re trying to solve.

That said, I suspect that modelers also share some of the blame for the chilly reception their forecasts receive. When modelers are building their forecasting tools, I suspect they often imagine their watch lists landing directly on the desks of policymakers with global concerns who are looking to take preventive action or to nudge along events they’d like to see happen. “Tell me the 10 countries where civil war is most likely,” we might imagine the president saying, “so I know where to send my diplomats and position my ships now.”

In reality, the policy process is much more reactive, and by the time something has landed on the desks of the most senior decision-makers, the opportunity for useful strategic warning is often gone. What’s more, in the rare instances where quantitative forecasts do land on policy-makers’ desks, analysts may not be thrilled to see those watch lists cutting to the front of the line and competing directly with them for the scarce attention of their “customers.”

In this environment, modelers could try to make their forecasts more valuable by designing them for, and targeting them at, people earlier in the analytical process—that is, lower in the bureaucracy. Quantitative risk assessments should be more useful to the analysts, desk officers, and deputies who may be able to raise warning flags earlier and who will be called upon when their country of interest pops into the news. Statistical forecasts of relevant events can shape those specialists’ thinking about what the major risks are in their areas of concern, hopefully spurring them to revisit their assumptions in cases where the forecast diverges significantly from their own expectations. Statistical forecasts can also give those specialists some indication on how various risks might increase or decrease as other conditions change. In this model, the point isn’t to replace or overrule the analyst’s judgment, but rather to shape and inform it.
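
To illustrate what that kind of informing might look like in practice, here is a hedged sketch of how an analyst could interrogate a fitted risk model. The coefficients and variables below are invented for the example—they are not drawn from any real coup model—but they show how a predicted probability shifts when one condition changes while the others are held fixed.

```python
# Hedged sketch: reading a (hypothetical) fitted logistic-regression risk
# model by varying one condition at a time. Coefficients are invented for
# illustration and do not come from any real model.
import math

COEFS = {
    "intercept": -4.0,
    "recent_coup": 1.2,    # 1 if a coup attempt occurred in the last 5 years
    "gdp_growth": -0.15,   # annual GDP growth, in percentage points
    "election_year": 0.6,  # 1 if a national election is scheduled
}

def coup_probability(recent_coup, gdp_growth, election_year):
    z = (COEFS["intercept"]
         + COEFS["recent_coup"] * recent_coup
         + COEFS["gdp_growth"] * gdp_growth
         + COEFS["election_year"] * election_year)
    return 1 / (1 + math.exp(-z))

# Same country profile, with and without an economic contraction.
print(coup_probability(recent_coup=1, gdp_growth=4.0, election_year=1))   # ~0.06
print(coup_probability(recent_coup=1, gdp_growth=-2.0, election_year=1))  # ~0.13
```

The point of an exercise like this isn’t the specific numbers; it’s that a desk officer can see which assumptions are doing the work and ask whether they match what she knows about her country.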

Even without strategic redirection among modelers, though, it’s possible that broader cultural trends will at least erode resistance to statistical risk assessment among senior decision-makers and the analysts who support them. Advances in computing and communications technology are spurring the rise of Big Data and even talk of a new “age of the algorithm.” The discourse often gets a bit heady, but there’s no question that statistical thinking is making new inroads into many fields. In medicine, for example—another area where subjective judgment is prized and decisions can have life-or-death consequences—improvements in data and analysis are combining with easier access to the results to encourage practitioners to lean more heavily on statistical risk assessments in their decisions about diagnosis and treatment. If the hidebound world of medicine can find new value in statistical modeling, who knows, maybe foreign policy won’t be too far behind.
