Scientific Updating as a Social Process

Cognitive theories predict that even experts cope with the complexities and ambiguities of world politics by resorting to theory-driven heuristics that allow them: (a) to make confident counterfactual inferences about what would have happened had history gone down a different path (plausible pasts); (b) to generate predictions about what might yet happen (probable futures); (c) to defend both counterfactual beliefs and conditional forecasts from potentially disconfirming data. An interrelated series of studies test these predictions by assessing correlations between ideological world view and beliefs about counterfactual histories (Studies 1 and 2), experimentally manipulating the results of hypothetical archival discoveries bearing on those counterfactual beliefs (Studies 3-5), and by exploring experts’ reactions to the confirmation or disconfirmation of conditional forecasts (Studies 6-12). The results revealed that experts neutralize dissonant data and preserve confidence in their prior assessments by resorting to a complex battery of belief-system defenses that, epistemologically defensible or not, make learning from history a slow process and defections from theoretical camps a rarity.

That’s the abstract to a 1999 AJPS paper by Phil Tetlock (emphasis added; ungated PDF here). Or, as Phil writes in the body of the paper,

The three sets of studies underscore how easy it is even for sophisticated professionals to slip into borderline tautological patterns of thinking about complex path-dependent systems that unfold once and only once. The risk of circularity is particularly pronounced when we examine reasoning about ideologically charged historical counterfactuals.

As noted in a recent post, ongoing debates over who “lost” Iraq or how direct U.S. military intervention in Syria might or might not have prevented wider war in the Middle East are current cases in point.

This morning, though, I’m intrigued by Phil’s point about the rarity of defections from theoretical camps tied to wider belief systems. If that’s right—and I have no reason to doubt that it is—then we should not put much faith in any one expert’s ability to update his or her scientific understanding in response to new information. In other words, we shouldn’t expect science to happen at the level of the individual. Instead, we should look wherever possible at the distribution of beliefs across a community of experts and hope that social cognition is more powerful than our individual minds are.

This evidence should also affect our thinking about how scientific change occurs. In The Structure of Scientific Revolutions, Thomas Kuhn (p. 19 in the 2nd Edition) described scientific revolutions as a process that happens at both the individual and social levels:

When, in the development of a natural science, an individual or group first produces a synthesis able to attract most of the next generation’s practitioners, the older schools gradually disappear. In part their disappearance is caused by their members’ conversion to the new paradigm. But there are always some men who cling to one or another of the older views, and they are simply read out of the profession, which thereafter ignores their work.

If I’m reading Tetlock’s paper right, though, then this description is only partly correct. In reality, scientists who are personally and professionally (or cognitively and emotionally?) invested in existing theories probably don’t convert to new ones very often. Instead, the recruitment mechanism Kuhn also mentions is probably the more relevant one. If we could reliably measure it, the churn rate associated with specific belief clusters would be a fascinating thing to watch.

Asking the Right Questions

This is a cross-post from the Good Judgment Project’s blog.

I came to the Good Judgment Project (GJP) two years ago, in Season 2, as a forecaster, excited about contributing to an important research project and curious to learn more about my skill at prediction. I did pretty well at the latter, and GJP did very well at the former. I’m also a political scientist with more time on my hands than many of my colleagues, because I work as an independent consultant and didn’t have a full plate at that point. So, in Season 3, the project hired me to work as one of its lead question writers.

Going into that role, I had anticipated that one of the main challenges would be negotiating what Phil Tetlock calls the “rigor-relevance trade-off”—finding questions that are relevant to the project’s U.S. government sponsors and can be answered as unambiguously as possible. That forecast was correct, but even armed with that information, I failed to anticipate just how hard it often is to strike this balance.

The rigor-relevance trade-off exists because most of the big questions about global politics concern latent variables. Sometimes we care about specific political events because of their direct consequences, but more often we care about those events because of what they reveal to us about deeper forces shaping the world. For example, we can’t just ask if China will become more cooperative or more belligerent, because cooperation and belligerence are abstractions that we can’t directly observe. Instead, we have to find events or processes that (a) we can see and (b) are diagnostic of that latent quality. We can tell, for instance, when China issues another statement reiterating its claim to the Senkaku Islands, but that happens a lot, so it doesn’t give us much new information about China’s posture. If China were to fire on Japanese aircraft or vessels in the vicinity of the islands—or, for that matter, to renounce its claim to them—now that would be interesting.
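One way to make “diagnostic” concrete is to think in terms of likelihood ratios: an observable event tells us something about the latent quality only insofar as it is more probable in one state of the world than in another. Here is a minimal sketch of that logic, with probabilities that are entirely made up for illustration:

```python
# Rough illustration of why some observable events are more "diagnostic"
# of a latent quality than others. All probabilities here are made up.

def posterior(prior, p_if_belligerent, p_if_cooperative):
    """P(China is turning belligerent | we observed the event)."""
    numerator = prior * p_if_belligerent
    return numerator / (numerator + (1 - prior) * p_if_cooperative)

prior = 0.30  # hypothetical prior that China is turning belligerent

# A routine statement restating the claim to the Senkakus is about as
# likely either way, so observing one barely moves the needle.
print(round(posterior(prior, 0.95, 0.90), 2))   # 0.31

# Firing on Japanese aircraft or vessels would be far more likely if China
# were turning belligerent, so observing it would shift beliefs sharply.
print(round(posterior(prior, 0.20, 0.01), 2))   # 0.90
```

The frequent, cheap event is nearly as likely either way, so it barely moves the posterior; the rare, costly one moves it a lot.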

It’s tempting to forgo some rigor and ask directly about the latent stuff, but doing so is problematic. For the forecast’s consumers, we need to be able to explain clearly what a forecast does and does not cover, so they can use the information appropriately. As forecasters, we need to understand what we’re being asked to anticipate so we can think clearly about the forces and pathways that might or might not produce the relevant outcome. And then there’s the matter of scoring the results. If we can’t agree on what eventually happened, we won’t agree on the accuracy of the predictions. Then the consumers don’t know how reliable those forecasts are, the producers don’t get the feedback they need, and everyone gets frustrated and demotivated.
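For what it’s worth, tournaments like this one typically score probabilistic forecasts against the resolved outcome with something like a Brier score: the squared gap between the probability you gave and what actually happened, coded 1 or 0. A toy example with hypothetical forecasts shows how much rides on that coding:

```python
# Brier score for a yes/no question: mean squared gap between the forecast
# probabilities and the outcome (1 if the event occurred, 0 if it did not).
# The forecast values below are hypothetical.

def brier(forecasts, outcome):
    return sum((p - outcome) ** 2 for p in forecasts) / len(forecasts)

daily_forecasts = [0.6, 0.7, 0.8]  # one forecaster's probabilities over time

# If the resolvers agree the event occurred, this forecaster looks sharp...
print(round(brier(daily_forecasts, 1), 3))  # 0.097

# ...but if they decide it did not occur, the same numbers look terrible.
print(round(brier(daily_forecasts, 0), 3))  # 0.497
```

Same forecasts, opposite verdicts, which is why an ambiguous resolution poisons the whole feedback loop.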

It’s harder to formulate rigorous questions than many people realize until they try to do it, even on things that seem like they should be easy to spot. Take coups. It’s not surprising that the U.S. government might be keen on anticipating coups in various countries for various reasons. It is, however, surprisingly hard to define a “coup” in such a way that virtually everyone would agree on whether or not one had occurred.

In the past few years, Egypt has served up a couple of relevant examples. Was the departure of Hosni Mubarak in 2011 a coup? On that question, two prominent scholarly projects that use similar definitions to track coups and coup attempts couldn’t agree. Where one source saw an “overt attempt by the military or other elites within the state apparatus to unseat the sitting head of state using unconstitutional means,” the other saw the voluntary resignation of a chief executive due to a loss of his authority and a prompt return to civilian-led government. And what about the ouster of Mohammed Morsi in July 2013? On that, those academic sources could readily agree, but many Egyptians who applauded Morsi’s removal—and, notably, the U.S. government—could not.
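To see how much the coding rules drive the call, imagine writing each definition down as an explicit checklist and running the Mubarak case through both. The sketch below is a deliberately crude simplification for illustration, not either project’s actual coding scheme:

```python
# Two toy "coup" checklists applied to the same case. These are crude
# simplifications for illustration, not any project's actual coding rules.

mubarak_2011 = {
    "perpetrators_in_state_apparatus": True,   # the military leadership
    "target_sitting_head_of_state": True,
    "unconstitutional_means": True,            # one reading of the ouster
    "voluntary_resignation": True,             # the competing reading
}

def coup_by_definition_a(case):
    # Overt attempt by the military or other state elites to unseat the
    # sitting head of state using unconstitutional means.
    return (case["perpetrators_in_state_apparatus"]
            and case["target_sitting_head_of_state"]
            and case["unconstitutional_means"])

def coup_by_definition_b(case):
    # Same core idea, but a voluntary resignation followed by a prompt
    # return to civilian-led government does not count.
    return coup_by_definition_a(case) and not case["voluntary_resignation"]

print(coup_by_definition_a(mubarak_2011))  # True
print(coup_by_definition_b(mubarak_2011))  # False
```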

We ran into a similar problem with Russian military intervention in Ukraine. Not long after Russia annexed Crimea, GJP posted a question asking whether or not Russian armed forces would invade the eastern Ukrainian cities of Kharkiv or Donetsk before 1 May 2014. The arrival of Russian forces in Ukrainian cities would obviously be relevant to U.S. policy audiences, and with Ukraine under such close international scrutiny, it seemed like that turn of events would be relatively easy to observe as well.

Unfortunately, that hasn’t been the case. As Mark Galeotti explained in a mid-April blog post,

When the so-called “little green men” deployed in Crimea, they were very obviously Russian forces, simply without their insignia. They wore Russian uniforms, followed Russian tactics and carried the latest, standard Russian weapons.

However, the situation in eastern Ukraine is much less clear. U.S. Secretary of State John Kerry has asserted that it was “clear that Russian special forces and agents have been the catalyst behind the chaos of the last 24 hours.” However, it is hard to find categorical evidence of this.

Even evidence that seemed incontrovertible when it emerged, like video of a self-proclaimed Russian lieutenant colonel in the Ukrainian city of Horlivka, has often been debunked.

This doesn’t mean we were wrong to ask about Russian intervention in eastern Ukraine. If anything, the intensity of the debate over whether or not it has happened simply confirms how relevant the topic is. Instead, it implies that we chose the wrong markers for it. We correctly anticipated that further Russian intervention was possible if not probable, but we—like many others—failed to anticipate the unconventional forms that intervention would take.

Both of these examples show how hard it can be to formulate rigorous questions for forecasting tournaments, even on topics that are of keen interest to everyone involved and seem like naturals for the task. In an ideal world, we could focus exclusively on relevance and ask directly about all the deeper forces we want to understand and anticipate. As usual, though, that ideal world isn’t the one we inhabit. Instead, we struggle to find events and processes whose outcomes we can discern and that also reveal something telling about those deeper forces at play.

 
