Yesterday, I participated in a panel discussion on the use of technology to prevent and document mass atrocities as part of an event at American University’s Washington College of Law to commemorate the Rwandan genocide.* In my prepared remarks, I talked about the atrocities early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide. The chief outputs of that system are probabilistic forecasts, some from statistical models and others from a “wisdom of (expert) crowds” system called an opinion pool.
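(For readers who haven’t run across the term, an opinion pool is just a rule for combining many people’s probability judgments about the same event into a single forecast. The sketch below shows the simplest version, a linear pool that takes a weighted average of the experts’ probabilities. The numbers, weights, and function are invented for illustration and aren’t drawn from the Center’s actual system.)

```python
# A minimal sketch of a linear opinion pool: each expert assigns a probability
# to the same event, and the pool combines them as a weighted average.
# The probabilities and weights here are invented for illustration only.

def linear_opinion_pool(probabilities, weights=None):
    """Combine expert probabilities into one pooled forecast."""
    if weights is None:
        weights = [1.0] * len(probabilities)   # default: weight everyone equally
    total = sum(weights)
    weights = [w / total for w in weights]     # normalize so the weights sum to 1
    return sum(w * p for w, p in zip(weights, probabilities))

# Five hypothetical experts' probabilities for the same event:
expert_probs = [0.05, 0.10, 0.02, 0.20, 0.08]
print(linear_opinion_pool(expert_probs))       # about 0.09, the simple average
```

Equal weights are the usual starting point; weighting experts by their past accuracy is a common refinement.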
After I’d described that project, one of the other panelists, Patrick Ball, executive director of the Human Rights Data Analysis Group, had this to say via Google Hangout:
As someone who uses machine learning to build statistical models—that’s what I do all day long, that’s my job—I’m very skeptical that models about conflict, about highly rare events that have very complicated and situation-unique antecedents are forecastable. I worry about early warning because when we build models we listen to people less. I know that, from my work with the U.N., when we have a room full of people who know an awful lot about what’s going on on the ground, a graph—when someone puts a graph on the table, everybody stops thinking. They just look at the graph. And that worries me a lot.
In 1994, human-rights experts warned the world about what was happening [in Rwanda]. No one listened. So as we, as technologists and people who like technology, when we ask questions of data, we have to make sure that if anybody is going to listen to us, we’d better be giving them the right answers.
Maybe I was being vain, but I heard that part of Patrick’s remarks as a rebuke of our early-warning project and pretty much every other algorithm-driven atrocities and conflict forecasting endeavor out there. I responded by acknowledging that our forecasts are far from perfect, but I also asserted that we have reason to believe they will usually be at least marginally better than the status quo, so they’re worth doing and sharing anyway.
A few minutes later, Patrick came back with this:
When we build technology for human rights, I think we need to be somewhat thoughtful about how our less technical colleagues are going to hear the things that we say. In a lot of meetings over a lot of years, I’ve listened to very sophisticated, thoughtful legal, qualitative, ethnographic arguments about very specific events occurring on the ground. But almost inevitably, when someone proposes some kind of quantitative analysis, all that thoughtful reasoning escapes the room… The practical effect of introducing any kind of quantitative argument is that it displaces the other arguments that are on the table. And we are naive to think otherwise.
What that means is that the stakes for getting these kinds of claims right are very high. If we make quantitative claims and we’re wrong—because our sampling foundations are weak, because our model is inappropriate, because we misinterpreted the error around our claim, or for any other reason—we can do a lot of harm.
From that combination of uncertainty and the possibility of harm, Patrick concludes that quantitative forecasters have a special responsibility to be circumspect in the presentation of their work:
I propose that one of the foundations of any kind of quantitative claims-making is that we need to have very strict validation before we propose a conclusion to be used by our broader community. There are all kinds of rules about validation in model-building. We know a lot about it. We have a lot of contexts in which we have ground truth. We have a lot of historical detail. Some of that historical detail is itself beset by these sampling problems, but we have opportunities to do validation. And I think that any argument, any claim that we make—especially to non-technical audiences—should lead with that validation rather than leaving it to the technical detail. By avoiding discussing the technical problems in front of non-technical audiences, we’re hiding stuff that might not be working. So I warn us all to be much stricter.
Patrick has applied statistical methods to human-rights matters for a long time, and his combined understanding of the statistics and the advocacy issues is as good as, if not better than, almost anyone else’s. Still, what he described about how people respond to quantitative arguments is pretty much the exact opposite of my experience over 15 years of working on statistical forecasts of various forms of political violence and change. Many of the audiences to which I’ve presented that work have been deeply skeptical of efforts to forecast political behavior. Like Patrick, many listeners have asserted that politics is fundamentally unquantifiable and unpredictable. Statistical forecasts in particular are often derided for connoting a level of precision that’s impossible to achieve and for being too far removed from the messy reality of specific places to produce useful information. Even in cases where we can demonstrate that the models are pretty good at distinguishing high-risk cases from low-risk ones, that evidence usually fails to persuade many listeners, who appear to reject the work on principle.
I hear loud echoes of my experiences in Daniel Kahneman’s discussion of clinical psychologists’ hostility to algorithms and enduring prejudice in favor of clinical judgment, even in situations where the former is demonstrably superior to the latter. On p. 228 of Thinking, Fast and Slow, Kahneman observes that this prejudice “is an attitude we can all recognize.”
When a human competes with a machine, whether it is John Henry a-hammerin’ on the mountain or the chess genius Garry Kasparov facing off against the computer Deep Blue, our sympathies lie with our fellow human. The aversion to algorithms making decisions that affect humans is rooted in the strong preference that many people have for the natural over the synthetic or artificial.
Kahneman further reports that
The prejudice against algorithms is magnified when the decisions are consequential. [Psychologist Paul] Meehl remarked, ‘I do not quite know how to alleviate the horror some clinicians seem to experience when they envisage a treatable case being denied treatment because a ‘blind, mechanical’ equation misclassifies him.’ In contrast, Meehl and other proponents of algorithms have argued strongly that it is unethical to rely on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes. Their rational argument is compelling, but it runs against a stubborn psychological reality: for most people, the cause of a mistake matters. The story of a child dying because an algorithm made a mistake is more poignant than the story of the same tragedy occurring as a result of human error, and the difference in emotional intensity is readily translated into a moral preference.
If our distaste for algorithms is more emotional than rational, then why do forecasters who use them have a special obligation, as Patrick asserts, to lead presentations of their work with a discussion of the “technical problems” when experts offering intuitive judgments almost never do? I’m uncomfortable with that requirement, because I think it unfairly handicaps algorithmic forecasts in what is, frankly, a competition for attention against approaches that are often demonstrably less reliable but also have real-world consequences. This isn’t a choice between action and inaction; it’s a trolley problem. Plenty of harm is already happening on the current track, and better forecasts could help reduce that harm. Under these circumstances, I think we behave ethically when we encourage the use of our forecasts in honest but persuasive ways.
If we could choose between forecasting and not forecasting, then I would be happier to set a high bar for predictive claims-making and let the validation to which Patrick alluded determine whether or not we’re going to try forecasting at all. Unfortunately, that’s not the world we inhabit. Instead, we live in a world in which governments and other organizations are constantly making plans, and those plans incorporate beliefs about future states of the world.
Conventionally, those beliefs are heavily influenced by the judgments of a small number of experts elicited in unstructured ways. That approach probably works fine in some fields, but geopolitics is not one of them. In this arena, statistical models and carefully designed procedures for eliciting and combining expert judgments will also produce forecasts that are uncertain and imperfect, but those algorithm-driven forecasts will usually be more accurate than the conventional approach of querying one or a few experts and blending their views in our heads (see here and here for some relevant evidence).
We also know that most of those subject-matter experts don’t abide by the rules Patrick proposes for quantitative forecasters. Anyone who’s ever watched cable news or read an op-ed—or, for that matter, attended a panel discussion—knows that experts often convey their judgments with little or no discussion of their cognitive biases and sources of uncertainty.
As it happens, that confidence is persuasive. As Kahneman writes (p. 263),
Experts who acknowledge the full extent of their ignorance may expect to be replaced by more confident competitors who are better able to gain the trust of clients. An unbiased appreciation of uncertainty is a cornerstone of rationality—but it is not what people and organizations want. Extreme uncertainty is paralyzing under dangerous circumstances, and the admission that one is merely guessing is especially unacceptable when the stakes are high. Acting on pretended knowledge is often the preferred solution.
The allure of confidence is dysfunctional in many analytic contexts, but it’s also not something we can wish away. And if confidence often trumps content, then I think we do our work and our audiences a disservice when we hem and haw about the validity of our forecasts as long as the other guys don’t. Instead, I believe we are behaving ethically when we present imperfect but carefully derived forecasts in a confident manner. We should be transparent about the limitations of the data and methods, and we should assess the accuracy of our forecasts and share what we learn. Until we all agree to play by the same rules, though, I don’t think quantitative forecasters have a special obligation to lead with the limitations of their work, thus conceding a persuasive advantage to intuitive forecasters who will fill that space and whose prognostications we can expect to be less reliable than ours.
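To be concrete about what assessing accuracy can look like, here is a minimal sketch using the Brier score, the mean squared difference between forecast probabilities and observed outcomes. The forecasts and outcomes below are made up for illustration, and the base-rate comparison is just one simple benchmark, not a full validation protocol.

```python
# A small sketch of one way to check probabilistic forecasts after the fact:
# the Brier score, i.e., the mean squared difference between forecast
# probabilities and observed outcomes (1 = the event happened, 0 = it didn't).
# The forecasts and outcomes below are invented for illustration only.

def brier_score(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.09, 0.30, 0.02, 0.65, 0.10]   # probabilities issued for five cases
outcomes  = [0,    1,    0,    1,    0]      # what actually happened

print(brier_score(forecasts, outcomes))                    # our score (lower is better)
base_rate = sum(outcomes) / len(outcomes)                  # naive benchmark forecast
print(brier_score([base_rate] * len(outcomes), outcomes))  # constant base-rate score
```

Beating the constant base-rate benchmark is a low bar, but it is an informative one, and sharing numbers like these is exactly the kind of transparency I have in mind.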
* You can replay a webcast of that event here. Our panel runs from 1:00:00 to 2:47:00.