How Circumspect Should Quantitative Forecasters Be?

Yesterday, I participated in a panel discussion on the use of technology to prevent and document mass atrocities as part of an event at American University’s Washington College of Law to commemorate the Rwandan genocide.* In my prepared remarks, I talked about the atrocities early-warning system I’m helping build for the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide. The chief outputs of that system are probabilistic forecasts, some from statistical models and others from a “wisdom of (expert) crowds” system called an opinion pool.
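For readers who haven’t encountered the term, an opinion pool is just a rule for combining several experts’ probability judgments about the same event into one forecast. The sketch below shows the simplest version, a (possibly weighted) linear opinion pool; the function, weights, and probabilities are illustrative stand-ins, not pieces of the actual system.

```python
# Minimal sketch of a linear opinion pool: a weighted average of experts'
# probability estimates for the same event. All numbers are made up.

def linear_opinion_pool(probs, weights=None):
    """Combine expert probabilities into one pooled probability."""
    if weights is None:
        weights = [1.0] * len(probs)          # default: equal weights
    total = float(sum(weights))
    return sum(p * w for p, w in zip(probs, weights)) / total

# Three hypothetical experts judge the chance of an onset within a year.
expert_probs = [0.10, 0.25, 0.05]
print(linear_opinion_pool(expert_probs))             # simple average: ~0.133
print(linear_opinion_pool(expert_probs, [2, 1, 1]))  # upweight the first expert: 0.125
```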

After I’d described that project, one of the other panelists, Patrick Ball, executive director of Human Rights Data Analysis Group, had this to say via Google Hangout:

As someone who uses machine learning to build statistical models—that’s what I do all day long, that’s my job—I’m very skeptical that models about conflict, about highly rare events that have very complicated and situation-unique antecedents are forecastable. I worry about early warning because when we build models we listen to people less. I know that, from my work with the U.N., when we have a room full of people who know an awful lot about what’s going on on the ground, a graph—when someone puts a graph on the table, everybody stops thinking. They just look at the graph. And that worries me a lot.

In 1994, human-rights experts warned the world about what was happening [in Rwanda]. No one listened. So as we, as technologists and people who like technology, when we ask questions of data, we have to make sure that if anybody is going to listen to us, we’d better be giving them the right answers.

Maybe I was being vain, but I heard that part of Patrick’s remarks as a rebuke of our early-warning project and of pretty much every other algorithm-driven atrocity- and conflict-forecasting endeavor out there. I responded by acknowledging that our forecasts are far from perfect, but I also asserted that we have reason to believe they will usually be at least marginally better than the status quo, so they’re worth making and sharing anyway.

A few minutes later, Patrick came back with this:

When we build technology for human rights, I think we need to be somewhat thoughtful about how our less technical colleagues are going to hear the things that we say. In a lot of meetings over a lot of years, I’ve listened to very sophisticated, thoughtful legal, qualitative, ethnographic arguments about very specific events occurring on the ground. But almost inevitably, when someone proposes some kind of quantitative analysis, all that thoughtful reasoning escapes the room… The practical effect of introducing any kind of quantitative argument is that it displaces the other arguments that are on the table. And we are naive to think otherwise.

What that means is that the stakes for getting these kinds of claims right are very high. If we make quantitative claims and we’re wrong—because our sampling foundations are weak, because our model is inappropriate, because we misinterpreted the error around our claim, or for any other reason—we can do a lot of harm.

From that combination of uncertainty and potential for harm, Patrick concludes that quantitative forecasters have a special responsibility to be circumspect in presenting their work:

I propose that one of the foundations of any kind of quantitative claims-making is that we need to have very strict validation before we propose a conclusion to be used by our broader community. There are all kinds of rules about validation in model-building. We know a lot about it. We have a lot of contexts in which we have ground truth. We have a lot of historical detail. Some of that historical detail is itself beset by these sampling problems, but we have opportunities to do validation. And I think that any argument, any claim that we make—especially to non-technical audiences—should lead with that validation rather than leaving it to the technical detail. By avoiding discussing the technical problems in front of non-technical audiences, we’re hiding stuff that might not be working. So I warn us all to be much stricter.
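To make the kind of validation Patrick describes a bit more concrete: in forecasting work it usually means fitting a model on cases where we already know the outcome and then scoring its predictions on held-out cases it never saw. Here is a minimal, purely illustrative sketch; the features, outcomes, and logistic model are invented for the example and have nothing to do with any real early-warning system.

```python
# Minimal sketch of out-of-sample validation: fit a model on cases with known
# outcomes, then score its forecasts on held-out cases. Everything here
# (features, outcomes, the logistic model) is invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                    # made-up country-year features
y = (rng.random(200) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)   # made-up onsets

# Pretend the first 150 cases are earlier years and the rest are later ones.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = LogisticRegression().fit(X_train, y_train)
p_test = model.predict_proba(X_test)[:, 1]        # out-of-sample forecasts

brier = np.mean((p_test - y_test) ** 2)           # score against held-out truth
print(round(brier, 3))
```

In practice the split would follow time (train on earlier years, forecast later ones), and the resulting scores would be reported alongside the forecasts.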

Patrick has applied statistical methods to human-rights matters for a long time, and his combined understanding of the statistics and the advocacy issues is as good as, if not better than, almost anyone else’s. Still, what he described about how people respond to quantitative arguments is pretty much the exact opposite of my experience over 15 years of working on statistical forecasts of various forms of political violence and change. Many of the audiences to which I’ve presented that work have been deeply skeptical of efforts to forecast political behavior. Like Patrick, many listeners have asserted that politics is fundamentally unquantifiable and unpredictable. Statistical forecasts in particular are often derided for connoting a level of precision that’s impossible to achieve and for being too far removed from the messy reality of specific places to produce useful information. Even in cases where we can demonstrate that the models are pretty good at distinguishing high-risk cases from low-risk ones, that evidence usually fails to persuade many listeners, who appear to reject the work on principle.
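By “distinguishing high-risk cases from low-risk ones” I mean discrimination in the statistical sense: if you pick one case that went on to experience an event and one that did not, how often did the model give the former the higher forecast? A minimal sketch of that check, the area under the ROC curve, with forecasts and outcomes that are entirely made up:

```python
# Minimal sketch of a discrimination check: the area under the ROC curve (AUC),
# i.e., how often a case that experienced an event received a higher forecast
# than a case that did not. The forecasts and outcomes below are made up.

def auc(forecasts, outcomes):
    pos = [f for f, y in zip(forecasts, outcomes) if y == 1]
    neg = [f for f, y in zip(forecasts, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / float(len(pos) * len(neg))

forecasts = [0.02, 0.40, 0.10, 0.75, 0.05]   # predicted probabilities
outcomes  = [0,    1,    0,    1,    0]      # observed onsets (1) or not (0)
print(auc(forecasts, outcomes))  # 1.0 here; 0.5 would be no better than chance
```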

I hear loud echoes of my experiences in Daniel Kahneman’s discussion of clinical psychologists’ hostility to algorithms and their enduring prejudice in favor of clinical judgment, even in situations where the former are demonstrably superior to the latter. On p. 228 of Thinking, Fast and Slow, Kahneman observes that this prejudice “is an attitude we can all recognize.”

When a human competes with a machine, whether it is John Henry a-hammerin’ on the mountain or the chess genius Garry Kasparov facing off against the computer Deep Blue, our sympathies lie with our fellow human. The aversion to algorithms making decisions that affect humans is rooted in the strong preference that many people have for the natural over the synthetic or artificial.

Kahneman further reports that

The prejudice against algorithms is magnified when the decisions are consequential. [Psychologist Paul] Meehl remarked, “I do not quite know how to alleviate the horror some clinicians seem to experience when they envisage a treatable case being denied treatment because a ‘blind, mechanical’ equation misclassifies him.” In contrast, Meehl and other proponents of algorithms have argued strongly that it is unethical to rely on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes. Their rational argument is compelling, but it runs against a stubborn psychological reality: for most people, the cause of a mistake matters. The story of a child dying because an algorithm made a mistake is more poignant than the story of the same tragedy occurring as a result of human error, and the difference in emotional intensity is readily translated into a moral preference.

If our distaste for algorithms is more emotional than rational, then why do forecasters who use them have a special obligation, as Patrick asserts, to lead presentations of their work with a discussion of the “technical problems” when experts offering intuitive judgments almost never do? I’m uncomfortable with that requirement, because I think it unfairly handicaps algorithmic forecasts in what is, frankly, a competition for attention against approaches that are often demonstrably less reliable but that still shape real-world decisions. This isn’t a choice between action and inaction; it’s a trolley problem. Plenty of harm is already happening on the current track, and better forecasts could help reduce that harm. Under these circumstances, I think we behave ethically when we encourage the use of our forecasts in honest but persuasive ways.

If we could choose between forecasting and not forecasting, then I would be happier to set a high bar for predictive claims-making and let the validation to which Patrick alluded determine whether or not we’re going to try forecasting at all. Unfortunately, that’s not the world we inhabit. Instead, we live in a world in which governments and other organizations are constantly making plans, and those plans incorporate beliefs about future states of the world.

Conventionally, those beliefs are heavily influenced by the judgments of a small number of experts elicited in unstructured ways. That approach probably works fine in some fields, but geopolitics is not one of them. In this arena, statistical models and carefully designed procedures for eliciting and combining expert judgments will also produce forecasts that are uncertain and imperfect, but those algorithm-driven forecasts will usually be more accurate than the conventional approach of querying one or a few experts and blending their views in our heads (see here and here for some relevant evidence).

We also know that most of those subject-matter experts don’t abide by the rules Patrick proposes for quantitative forecasters. Anyone who’s ever watched cable news or read an op-ed—or, for that matter, attended a panel discussion—knows that experts often convey their judgments with little or no discussion of their cognitive biases and sources of uncertainty.

As it happens, that confidence is persuasive. As Kahneman writes (p. 263),

Experts who acknowledge the full extent of their ignorance may expect to be replaced by more confident competitors who are better able to gain the trust of clients. An unbiased appreciation of uncertainty is a cornerstone of rationality—but it is not what people and organizations want. Extreme uncertainty is paralyzing under dangerous circumstances, and the admission that one is merely guessing is especially unacceptable when the stakes are high. Acting on pretended knowledge is often the preferred solution.

The allure of confidence is dysfunctional in many analytic contexts, but it’s also not something we can wish away. And if confidence often trumps content, then I think we do our work and our audiences a disservice when we hem and haw about the validity of our forecasts as long as the other guys don’t. Instead, I believe we are behaving ethically when we present imperfect but carefully derived forecasts in a confident manner. We should be transparent about the limitations of the data and methods, and we should assess the accuracy of our forecasts and share what we learn. Until we all agree to play by the same rules, though, I don’t think quantitative forecasters have a special obligation to lead with the limitations of their work, thus conceding a persuasive advantage to intuitive forecasters who will fill that space and whose prognostications we can expect to be less reliable than ours.
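On that last point about assessing accuracy: a standard yardstick for probabilistic forecasts is the Brier score, the mean squared difference between each forecast and what actually happened (0 is perfect; a constant 50/50 forecast earns 0.25). A minimal sketch, with forecasts and outcomes invented purely for illustration:

```python
# Minimal sketch of accuracy assessment with the Brier score: the mean squared
# difference between probabilistic forecasts and observed outcomes.
# Lower is better; the forecasts and outcomes here are invented for illustration.

def brier_score(forecasts, outcomes):
    return sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / len(forecasts)

model_forecasts  = [0.05, 0.60, 0.10, 0.80]   # hypothetical statistical model
expert_forecasts = [0.20, 0.30, 0.40, 0.50]   # hypothetical unstructured judgment
outcomes         = [0,    1,    0,    1]      # what actually happened

print(brier_score(model_forecasts, outcomes))   # ~0.053
print(brier_score(expert_forecasts, outcomes))  # ~0.235
```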

* You can replay a webcast of that event here. Our panel runs from 1:00:00 to 2:47:00.

19 Comments

  1. Rex Brynen / April 9, 2014

    I would certainly say that my experience has been much closer to yours than to Ball’s, namely that policy-makers and intel analysts are extremely dubious about (and resistant to) quantitative data analysis on issues like this. Indeed, such are the biases that my current advice to foreign ministry desk officers and intel analysts wanting to flag warning signs would be to make the case in their reporting almost exclusively on qualitative grounds, even if quantitative data analysis provides part of the basis for the judgment.

  2. I also find the comment that quantitative models are taken too seriously very odd, and my experience, while more limited than Jay’s, is consistent with his: quantitative models are held to a far higher standard than qualitative forecasts, and most people are looking for any excuse to dismiss the models so they can get back to telling stories and applying theories that have been dead wrong for decades. And yes, read Kahneman, for numerous reasons, but in particular for the number of times he says variations on “Amos and I expected this result to have a major impact on the profession, and no one paid any attention to it.”

    Quantitative models *should* be useful precisely because they are different and are about the only way, short of contacting the sentient slime molds on Beta Centauri C and asking for their perspective, of getting an “opinion” in the room that has a distinctly different set of information-processing biases than the human brain has. But they still need to be used intelligently: they are neither “garbage in, garbage out” nor “garbage in, gospel out” but a tool.

  3. I echo previous commenters, even though my field (water governance) seems to be way more open to quantitative modeling now than it ever used to be. I am pushing to move forward in a different way (e.g., increasing the number of cases and relying less on single case studies). I don’t do forecasting, but if there’s a field where I feel quantitative methods give us A LOT of insight, it’s precisely conflict and violence.

  4. Thanks for posting this. The discussion with Patrick was very interesting. And I think his argument, or at least his intuitions, are sound. Namely, by putting together a model, we can often fool ourselves into thinking that we’ve tamed uncertainty, right? It seems like, well, now this confused, stumbling in the dark feeling is not so bad, I have a *model*, and semi-enlightened people sometimes take it even more seriously than whoever produced it, because they don’t grasp all the nasty assumptions you had to make — the general precarious nature of the whole thing. Basically, the same sort of suspicion that skeptics have about the predictive value of a lot of economics models and whatever.

    So I think this suspicion is okay, but I’d also agree that human judgment is even worse. We’ve got Tetlock’s *Political Judgment* which, at least for me, basically confirmed everything I suspected, except it’s 10x worse. Television audiences want confident pundits making bold predictions, and so on. But this doesn’t just disappear when you’re not on screen, yeah? Everyone weighs confident-sounding predictions more highly than uncertain sounding ones. (Thanks evolution.)

    And the problem is even worse because, as you point out, human intuition is to distrust unfeeling algorithms when, if you look at a lot of the research, it’s sort of like “humans suck, models suck, but humans suck more.” That’s my feeling anyways. But few people know this. They haven’t read the research, and they’re stuck with the default intuitions. I mean, hell, I was like this, too.

    You see it everywhere. College admissions are a big one. No administrator ever talks about the predictive value of their admissions process, and it’s always a big point of pride when a school says, “We have humans responsible for the admission process.” You know, colleges will go out of their way to tell kids that they’re not just getting fed into an algorithm — presumably because feeling, thinking, soul-having humans have some special juice that quantitative methods don’t have.

    Or I stumbled on an interview with Herbert Simon yesterday, and the interviewer is just terrible. Things like, “Estimating probabilities is one thing; applying human wisdom is another. Do you accept the idea of human wisdom?” and “Vincent Van Gogh’s great creativity supposedly sprang from his tortured soul. A computer couldn’t have a soul, could it?”

    So, what should be done about it? I don’t know. That’s a much harder problem than just pointing out flaws.

  5. David / April 9, 2014

    Do you use an alternative futures analysis technique? It would seem to me using this technique, with weighted factors, would be the answer.

  6. Jonas / April 9, 2014

    People don’t understand quantitative analysis, so its credibility is treated in a polarized manner. Either they refuse to believe it, or they’re completely smitten by it. You’re pointing out the case where they reject it, while Patrick Ball is pointing out the case where people surrender their judgment to it. The recent financial crisis is a good example where, with a little encouragement from self-interest, model-calculated risk assessments overwhelmed all other voices.

  7. I agree with Ball inasmuch as I feel that statistical models, especially those that attempt forecasting, should at all times be very explicitly validated, and the assumptions underlying these models should be made clearer than they sometimes are. To present the results produced by a forecasting model without having all its parameters clearly laid out as well is to deliberately downplay the considerable degree of uncertainty that also goes into making such a model (much like adbge above me also commented). Apart from inference-related errors, I think the fact that Ball is as familiar as he is with the types of datasets that are typically used for these models (such as GDELT, to name but one) makes him all the more wary of those political scientists who claim to have reliable forecasting models, even though these models might make more objective sense than arguments based purely in (often very flawed) theory. I do think it’s counterproductive not to stimulate political forecasting (especially through continually improving the quality of event datasets) in the way that he does, but he has a point when it comes to being up-front about what can and what can’t be said with certainty.

    Even though I understand your last point about having to compete with overly confident pundits etc., I don’t think it’s wise for political scientists to go about things the same way, as I think it would only open the field up to even more criticisms to the effect that political scientists can’t be rigorous, or that they make predictive claims about things that simply cannot be predicted (not my opinion). That said, as long as forecasts are accompanied by proper explanations and the data that were used to inform them, I don’t see a problem with stating that quantitative models are more reliable when it comes to forecasting than people, it just needs to be worded carefully.

  8. Dan B. / April 10, 2014

    Ever heard of taking the high road? I don’t see how stooping to the level of media talking head morons is going to advance your profession.

    “Maybe I was being vain, but I heard that part of Patrick’s remarks as a rebuke of our early-warning project and pretty much every other algorithm-driven atrocities and conflict forecasting endeavor out there.”

    Kahneman, System 1, you’re reacting as if your very survival is at stake.

    • Thanks for commenting. From your reaction, I guess I wasn’t clear enough about what I am advocating. I’m not proposing that quantitative forecasters skip validation or hide the results. Instead, I’m arguing that we shouldn’t feel obliged to *lead* with the caveats.

      To your point about just taking the high road, I think that’s too simple a way of thinking about it. This situation involves ambiguous and competing imperatives with potentially large social costs, so it’s not so obvious where the high road lies. I conclude that it lies in being transparent about your methods but delivering your findings with confidence befitting their relative merits. I don’t advocate deliberately deceiving anyone.

      Last, I realize that this is a blog comment thread, but I would appreciate it if you’d drop the caustic snark next time, or would at least put your name on comments with that tone so you can stand behind them.

  9. May Sams / April 10, 2014

    This post presents the case for good leadership. While in the moment, a leader must be firm and decisive. A good leader also takes into account when he/she is wrong and makes improvements for the future. You make a good case. In today’s society we rely on performance reviews to tell us whether to put our faith in a particular person or machine. The person or machine in question will always put his/her best qualities on the table (through a resume, an ad, etc.), and it is up to the consumer to be informed on the negative qualities (through reviews, references, word of mouth) as well. We hire for strengths and hope those pull through over weaknesses when the moment comes.

  10. I’d echo and expand on the point made by May above – in an environment where there are competing views from imperfect sources, the responsibility lies with the decision maker.

    It is therefore, in an upwards cascade, the responsibility of the appointing body to ensure the decision maker is someone of sufficient experience and education to understand the value and shortcomings of the analyses they are presented with, and to form a valid judgement.

    This recasts the problem into two parts. The first is to get the people who understand quant out of the lab and the classroom and into policy making, to redress a balance that is currently skewed; otherwise the under- or over-reliance on analytics is a lottery, which suits no one.

    The other is fundamental education. While the maths is a mystical art that the general public do not understand, they will either revere it or dismiss it, rather than according its insights their due weight.

  11. The clear advantage of quantitative models is that there is a very large professional community — statisticians generally, and social science statisticians in particular — who can evaluate a model and its strengths and weaknesses, provided the model and the data are open. That’s not to say this will always be done, though with the emerging very strong norms for replication in political science — replication involving not just the data set but also the code used to transform it (usually R now, so there is a common platform), which is the source of endless potential errors — it can be done much more easily than even a decade ago. Those norms will presumably also be acceptable in the NGO community, whereas “hide and hoard” still seems to prevail in various government agencies, though even there one is seeing at least push-back, if not, so far, action on this. If the “With enough eyeballs, all bugs are shallow” mechanism applies to the models, this will lead, over time, to the NGO models actually being better than the government models except in those areas where the classified data sources provide an edge.

    That same statistical community also has lots of shared norms for evaluating models that have explicit measures of error (and *explicit* measures of error are a huge advantage of the formal approach: these are almost impossible to get for qualitative forecasts). The problem is that humans in general do poorly dealing with probabilities (again, read Kahneman. Or watch how many people buy lottery tickets the next time you stop at the local Quik-Trip), and in the US at least there is a serious snobbishness against people who understand mathematics — count the number of times you’ve heard someone brag “I can barely balance my checkbook” vs bragging “I can barely read a menu.” To say nothing of the dominant meme that anyone who understands anything beyond the use of a calculator is either a mildly amusing autistic or a thoroughly obnoxious sociopath. So even when quantitative information could be very useful — for example to President Romney’s advisors — it tends to be dismissed by the guys on the golf course disparaging the “quants” whose idea of a wild evening is playing video games in their parents’ basement.

    Or, apparently in Ball’s case, the quantitative models are accepted uncritically, which is just as bad.

    So we’re swimming upstream here folks…

  12. Jay, you make a lot of great points. I’m sure we can all think of situations in which statistics were misunderstood or abused and situations which benefit from more good statistics.

    If you and Ball were to continue your discussion publicly, I’d ask both of you to (a) specify when each type of situation is more common, and (b) describe how to improve on both problems.

  13. This is an interesting post. I read Kahneman’s book last year and am still puzzling over some of the situations he puts forward. It should be compulsory reading for policy-makers, although the biases he so elegantly describes are difficult to avoid even when you are aware that you are likely to be swayed by them. In geopolitics, algorithms could prove very useful in predicting the levels of poverty that will lead to revolution, for example. It seems sinister, but if you design your algorithms around self-limiting questions they are not as good as expert opinions.

  14. Reblogged this on Inarticulate ramblings of a management consultant and commented:
    Great article on the relative and perceived value of quantitative forecasting. I found the section relating to our psychological response particularly relevant. Worth a read

Pingbacks:

  1. Do Quantitative Forecasters Have Special Obligations to Policy Advisees? | The Couch Psychologist
  2. Interesting links for, ahem, “forecasting Fridays” | Pseudotrue News
  3. Models, modesty, and moral methodology | Theory, Evolution, and Games Group
