Repeat after Me: Correlation Does Not Prove Causation

It’s not hard to imagine that some of the people who consume statistical forecasts of political crises might also look to statisticians for evidence-based insights into the causes of those events and what they (the consumers) might do to shape the risk of their occurrence. It’s perfectly reasonable to expect social scientists to try to answer all three of those questions (what will happen, why, and what might be done about it), but we shouldn’t expect those answers to come from the same studies, because the standards of evidence involved differ in crucial ways.

The standard of evidence for forecasting is out-of-sample accuracy, plain and simple. If a variable makes our predictive models more accurate, we have good reason to use it. We might choose to restrict our search to variables that are plausibly related to the event or outcome of interest in order to avoid basing models on chance associations that disappear when more data is used, but theoretical plausibility should never be mistaken for strong evidence of causality.
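To make that standard concrete, here’s a minimal sketch in Python (using scikit-learn on entirely synthetic data; the variable names are illustrative, not real conflict indicators) of how a candidate predictor earns its keep: it stays in the model only if it improves accuracy on cases the model never saw.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic data: 'income' genuinely predicts conflict onset; 'noise' does not.
income = rng.normal(size=n)
noise = rng.normal(size=n)
risk = 1 / (1 + np.exp(-(-1.0 - 1.5 * income)))  # lower income -> higher risk
conflict = rng.binomial(1, risk)

X = np.column_stack([income, noise])
X_train, X_test, y_train, y_test = train_test_split(
    X, conflict, test_size=0.5, random_state=0)

# Judge each specification by its accuracy on the held-out half (here, AUC).
for cols, label in [([1], "noise only"), ([0, 1], "income + noise")]:
    model = LogisticRegression().fit(X_train[:, cols], y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test[:, cols])[:, 1])
    print(f"{label}: test AUC = {auc:.2f}")
```

The variable that improves out-of-sample AUC earns a place in the model; nothing in that test says anything about why it works.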

The standard of evidence for causal analysis is much higher. Here, we have to identify a credible mechanism linking the cause to the effect and then rule out plausible alternative explanations. In other words, we have to show not only that X is associated with Y, but also that the association between X and Y cannot be explained by any number of other confounding factors, some of which we know about and some of which we probably do not. This standard is extremely hard to attain, especially in the social sciences, where controlled experimentation is usually impossible, unethical, or both.
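To see why association alone can’t meet that standard, consider a toy simulation (Python with NumPy; X, Y, and Z are hypothetical variables, not anything from the conflict literature) in which X has no causal effect on Y whatsoever, yet the two are strongly correlated because both are driven by a lurking confounder Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Z confounds the X-Y relationship: it drives both, while X has no
# causal effect on Y at all.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)

# The raw correlation is strong despite the absence of causation.
print("corr(X, Y):", round(np.corrcoef(x, y)[0, 1], 2))  # ~0.8

# Controlling for Z (regressing it out of both series) makes the
# association vanish -- but only because we happened to observe Z.
x_resid = x - np.polyfit(z, x, 1)[0] * z
y_resid = y - np.polyfit(z, y, 1)[0] * z
print("corr(X, Y | Z):", round(np.corrcoef(x_resid, y_resid)[0, 1], 2))  # ~0.0
```

The catch, of course, is that with real observational data we rarely get to observe all of the Zs.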

The standard of evidence for analysis of the effects of interventions is comparable to the standard for causal analysis, but the requisite evidence is usually even scarcer. Consumers of statistical forecasts sometimes assume that deliberate manipulations of factors identified as precursors to, or even causes of, events of interest will have the same effects as “organic” changes in those factors. This assumption is incorrect and possibly even dangerous. To paraphrase the late, great statistician John Tukey: if you want to know what happens when you change something, you have to try changing it.
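Tukey’s point falls straight out of the same kind of toy setup (again Python with NumPy, again hypothetical variables): the slope estimated from observational data suggests that raising X would raise Y, but when you actually intervene and set X yourself, as an experiment would, Y doesn’t budge.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Observational world: a hidden factor Z drives both X and Y.
z = rng.normal(size=n)
x_obs = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)
print("slope of Y on X (observed):",
      round(np.polyfit(x_obs, y, 1)[0], 2))  # ~0.8

# Interventional world: we set X ourselves, severing its link to Z,
# while Y continues to depend only on Z.
x_do = rng.normal(size=n)
print("slope of Y on X (intervention):",
      round(np.polyfit(x_do, y, 1)[0], 2))  # ~0.0
```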

These divergent standards of evidence lead social scientists working on different parts of the problem to adopt very different research designs. We can get to a useful forecasting model without addressing all of the concerns required to do good causal analysis, and if the forecasts are useful in their own right, it makes sense to do so. What we must not do, however, is take those models designed to maximize out-of-sample predictive accuracy and use them to draw confident inferences about causation. The findings from the forecasting analysis might offer some interesting hints about cause and effect, but those hints should always be tested against the stronger standard of evidence before serving as the basis for decisions about interventions.

An example helps make the point clearer. If you could only use one variable to predict which countries will experience civil violence in the near future and which ones will not, you would do well to choose a measure of per capita income or median quality of life. Virtually all cross-national observational studies of civil war find a strong association between poverty and the risk of conflict, including the closest thing I’ve seen to a meta-analysis on the subject, a 2006 paper by Håvard Hegre and Nicholas Sambanis. Because this association is so powerful, a quality-of-life measure can be a very useful variable for predicting the occurrence of civil violence (although not necessarily the most useful one; on this point, see this paper which I co-authored with many members of the Political Instability Task Force, or its pre-publication version on SSRN).

But what does that association tell us about the causes of civil violence, really? The problem is that numerous theories of civil violence lead us to expect to see this relationship, and the evidence in cross-national observational studies is just too coarse to help us adjudicate among them. For starters, we have grievance-based theories, which see the roots of violence in poor people’s anger and frustration over their meager living conditions and the exploitation that creates or perpetuates them (cf. Ted Gurr’s classic text, Why Men Rebel). More recently, some economists have argued that poverty breeds civil violence by lowering the opportunity costs of participating in armed conflict. The basic idea is that poor people have little left to lose, so the risks of fighting don’t look so bad. Construed as a form of employment, participation in a rebel militia that gets fed regularly and enjoys opportunities to loot as it fights can even look pretty lucrative (cf. Paul Collier and Anke Hoeffler’s influential article, “Greed and Grievance in Civil War”). Finally, still other scholars interpret poverty in models of civil violence as a feature of the state rather than its citizens. Seen from this perspective, differences in per capita income produce and reflect underlying differences in the state’s capacity to maintain order, and weak states are more susceptible to armed challenges than strong ones (cf. Jim Fearon and David Laitin’s “Ethnicity, Insurgency, and Civil War”).

All three of these theories imply that variation in the occurrence of civil violence should be associated with variation in per capita income. So which is it? Grievance or greed? Citizens or states? Without sharper evidence on the causal mechanisms each of these approaches suggests and research designs that control more effectively for potential confounding factors, we can’t even begin to answer that question in an intelligent way.

The problem gets even harder when the conversation shifts from causes to solutions. If people fight the state because they are sick of being poor, we might try to reduce the incidence of civil violence through aid programs that give poor people money or jobs. But what if those programs generate their own resentments among people who don’t receive any aid, or who think they are receiving less than their fair share? If, on the other hand, conflict is about state weakness, then aid-givers might focus on building the capacity of important state agencies and security services. But what if those aid programs create new tensions among elites interested in grabbing a slice of the limited aid pie, and those tensions increase the risk of conflict and coups? The only way to know with confidence whether these various approaches will help or hurt is to try them and see what happens, and even then the picture is not always clear, because there are so many potentially confounding factors at work. (For some details on efforts to untangle the links between poverty and violence, see this recent talk by Yale economist Chris Blattman.)

My point here is not to discourage consumers of political forecasts from asking social scientists about the causes of the events that interest them and things they might do to affect the odds that those events will occur. My point instead is to warn against trying to learn all of those things from the same piece of research. Findings from one type of study can sometimes improve future iterations of the others, but there is no single design that can effectively answer all three questions at once. We take big intellectual, practical, and even moral risks when we ignore that fact and try to pull causal inferences and policy recommendations out of forecasting tools.
