This is a cross-post from the Good Judgment Project’s blog.
I came to the Good Judgment Project (GJP) two years ago, in Season 2, as a forecaster, excited about contributing to an important research project and curious to learn more about my skill at prediction. I did pretty well at the latter, and GJP did very well at the former. I’m also a political scientist who happened to have more time on my hands than many of my colleagues, because I work as an independent consultant and didn’t have a full plate at that point. So, in Season 3, the project hired me to work as one of its lead question writers.
Going into that role, I had anticipated that one of the main challenges would be negotiating what Phil Tetlock calls the “rigor-relevance trade-off”—finding questions that are relevant to the project’s U.S. government sponsors and can be answered as unambiguously as possible. That forecast was correct, but even armed with that information, I failed to anticipate just how hard it often is to strike this balance.
The rigor-relevance trade-off exists because most of the big questions about global politics concern latent variables. Sometimes we care about specific political events because of their direct consequences, but more often we care about those events because of what they reveal to us about deeper forces shaping the world. For example, we can’t just ask if China will become more cooperative or more belligerent, because cooperation and belligerence are abstractions that we can’t directly observe. Instead, we have to find events or processes that (a) we can see and (b) that are diagnostic of that latent quality. For example, we can tell when China issues another statement reiterating its claim to the Senkaku Islands, but that happens a lot, so it doesn’t give us much new information about China’s posture. If China were to fire on Japanese aircraft or vessels in the vicinity of the islands—or, for that matter, to renounce its claim to them—now that would be interesting.
It’s tempting to forgo some rigor and ask directly about the latent stuff, but doing so creates problems. For the forecast’s consumers, we need to be able to explain clearly what a forecast does and does not cover, so they can use the information appropriately. As forecasters, we need to understand what we’re being asked to anticipate so we can think clearly about the forces and pathways that might or might not produce the relevant outcome. And then there’s the matter of scoring the results. If we can’t agree on what eventually happened, we won’t agree on the accuracy of the predictions. Then the consumers don’t know how reliable those forecasts are, the producers don’t get the feedback they need, and everyone gets frustrated and demotivated.
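The scoring problem is concrete. GJP evaluates forecasters with Brier scores, the squared difference between the probability a forecaster assigned and the outcome that actually occurred. A minimal sketch (two-outcome form; the probabilities here are illustrative, not real tournament data) shows why an ambiguous resolution matters so much: the same forecast earns sharply different scores depending on which way the question is judged to have resolved.

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and the realized
    outcome (1 = the event happened, 0 = it did not). Lower is better."""
    return (forecast - outcome) ** 2

# Suppose a forecaster assigns a 70% probability to "a coup will occur."
p = 0.7

# If the resolvers can't agree on whether a coup happened, the same
# forecast gets scored in two very different ways:
score_if_coup = brier_score(p, 1)     # about 0.09 -- a good forecast
score_if_no_coup = brier_score(p, 0)  # about 0.49 -- a poor forecast
```

With an unambiguous question, everyone agrees which of the two scores applies; with an ambiguous one, the forecaster's track record itself becomes a matter of dispute.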
Formulating rigorous questions is harder than most people realize until they try it, even for events that seem like they should be easy to spot. Take coups. It’s not surprising that the U.S. government might be keen on anticipating coups in various countries for various reasons. It is, however, surprisingly hard to define a “coup” in such a way that virtually everyone would agree on whether or not one had occurred.
In the past few years, Egypt has served up a couple of relevant examples. Was the departure of Hosni Mubarak in 2011 a coup? On that question, two prominent scholarly projects that use similar definitions to track coups and coup attempts couldn’t agree. Where one source saw an “overt attempt by the military or other elites within the state apparatus to unseat the sitting head of state using unconstitutional means,” the other saw the voluntary resignation of a chief executive due to a loss of his authority and a prompt return to civilian-led government. And what about the ouster of Mohammed Morsi in July 2013? On that, those academic sources could readily agree, but many Egyptians who applauded Morsi’s removal—and, notably, the U.S. government—could not.
We see something similar on Russian military intervention in Ukraine. Not long after Russia annexed Crimea, GJP posted a question asking whether or not Russian armed forces would invade the eastern Ukrainian cities of Kharkiv or Donetsk before 1 May 2014. The arrival of Russian forces in Ukrainian cities would obviously be relevant to U.S. policy audiences, and with Ukraine under such close international scrutiny, it seemed like that turn of events would be relatively easy to observe as well.
Unfortunately, that hasn’t been the case. As Mark Galeotti explained in a mid-April blog post,
When the so-called “little green men” deployed in Crimea, they were very obviously Russian forces, simply without their insignia. They wore Russian uniforms, followed Russian tactics and carried the latest, standard Russian weapons.
However, the situation in eastern Ukraine is much less clear. U.S. Secretary of State John Kerry has asserted that it was “clear that Russian special forces and agents have been the catalyst behind the chaos of the last 24 hours.” However, it is hard to find categorical evidence of this.
Even evidence that seemed incontrovertible when it emerged, like video of a self-proclaimed Russian lieutenant colonel in the Ukrainian city of Horlivka, has often been debunked.
This doesn’t mean we were wrong to ask about Russian intervention in eastern Ukraine. If anything, the intensity of the debate over whether or not that’s happened simply confirms how relevant this topic was. Instead, it implies that we chose the wrong markers for it. We correctly anticipated that further Russian intervention was possible if not probable, but we—like many others—failed to anticipate the unconventional forms that intervention would take.
Both of these examples show how hard it can be to formulate rigorous questions for forecasting tournaments, even on topics that are of keen interest to everyone involved and seem like naturals for the task. In an ideal world, we could focus exclusively on relevance and ask directly about all the deeper forces we want to understand and anticipate. As usual, though, that ideal world isn’t the one we inhabit. Instead, we struggle to find events and processes whose outcomes we can discern that will also reveal something telling about those deeper forces at play.