All posts tagged Big Data

Big Data Doesn’t Automatically Produce Better Predictions

At FiveThirtyEight, Neil Payne and Rob Arthur report on an intriguing puzzle:

In an age of unprecedented baseball data, we somehow appear to be getting worse at knowing which teams are — and will be — good.

Player-level predictions are as good if not better than they used to be, but team-level predictions of performance are getting worse. Payne and Arthur aren’t sure why, but they rank a couple of trends in the industry — significant changes in the age structure of the league’s players and, ironically, the increased use of predictive analytics in team management — among the likely culprits.

This story nicely illustrates a fact that breathless discussions of the power of “Big Data” often elide: more and better data don’t automatically lead to more accurate predictions. Observation and prediction are interrelated, but the latter does not move in lock step with the former. At least two things can weaken the link between those two steps in the analytical process.

First, some phenomena are just inherently difficult or impossible to predict with much accuracy. That’s not entirely true of baseball; as Payne and Arthur show, team-level performance predictions have been pretty good in the past. It is true of many other phenomena or systems, however. Take earthquakes; we can now detect and record these events with tremendous precision, but we’re still pretty lousy at anticipating when they’ll occur and how strong they will be. So far, better observation hasn’t led to big gains in prediction.

Second, the systems we’re observing sometimes change, even as we get better at observing them. This is what Payne and Arthur imply is occurring in baseball when they identify trends in the industry as likely explanations for a decline in the predictive power of models derived from historical data. It’s like trying to develop a cure for a disease that’s evolving rapidly as you work on it; the cure you develop in the lab might work great on the last version you captured, but by the time you deploy it, the disease has evolved further, and the treatment doesn’t have much effect.

I wonder if this is also the trajectory social science will follow over the next few decades. Right now, we’re getting hit by the leading edge of what will probably be a large and sustained flood tide of new data on human behavior. That inflow is producing some rather optimistic statements about how predictable human behavior in general, and sometimes politics in particular, will become as we discover deeper patterns in those data.

I don’t share that confidence. A lot of human behavior is predictably routine, and a torrent of new behavioral data will almost certainly make us even better at predicting these actions and events. For better or for worse, though, those routines are not especially interesting or important to most political scientists. Political scientists are more inclined to focus on “high” politics, which remains fairly opaque, or on system-level outcomes like wars and revolutions that emerge from changes in individual-level behavior in non-obvious ways. I suspect we’ll get a little better at predicting these things as we accumulate richer data on various parts of those systems, but I am pretty sure we won’t ever get great at it. The processes are too complex, and the systems themselves are constantly evolving, maybe even at an accelerating rate.

1 Comment

by Jay Ulfelder on August 20, 2015 • Permalink

Posted in Complexity Politics, Forecasting, Measurement

Tagged baseball, Big Data, FiveThirtyEight

Posted by Jay Ulfelder on August 20, 2015

https://dartthrowingchimp.wordpress.com/2015/08/20/big-data-doesnt-automatically-produce-better-predictions/

The Myth of Comprehensive Data

“What about using Twitter sentiment?”

That suggestion came to me from someone at a recent Data Science DC meetup, after I’d given a short talk on assessing risks of mass atrocities for the Early Warning Project, and as the next speaker started his presentation on predicting social unrest. I had devoted the first half of my presentation to a digression of sorts, talking about how the persistent scarcity of relevant public data still makes it impossible to produce global forecasts of rare political crises—things like coups, insurgencies, regime breakdowns, and mass atrocities—that are as sharp and dynamic as we would like.

The meetup wasn’t the first time I’d heard that suggestion, and I think all of the well-intentioned people who have made it to me have believed that data derived from Twitter would escape or overcome those constraints. In fact, the Twitter stream embodies them. Over the past two decades, technological, economic, and political changes have produced an astonishing surge in the amount of information available from and about the world, but that surge has not occurred evenly around the globe.

Think of the availability of data as plant life in a rugged landscape, where dry peaks are places of data scarcity and fertile valleys represent data-rich environments. The technological developments of the past 20 years are like a weather pattern that keeps dumping more and more rain on that topography. That rain falls unevenly across the landscape, however, and it doesn’t have the same effect everywhere it lands. As a result, plants still struggle to grow on many of those rocky peaks, and much of the new growth occurs where water already collected and flora were already flourishing.

The Twitter stream exemplifies this uneven distribution of data in a couple of important ways. Take a look at the map below, a screenshot I took after letting Tweetping run for about 16 hours spanning May 6–7, 2015. The brighter the glow, the more Twitter activity Tweetping saw.

Some of the spatial variation in that map reflects differences in the distribution of human populations, but not all of it. Here’s a map of population density, produced by Daysleeper using data from CEISIN (source). If you compare this one to the map of Twitter usage, you’ll see that they align pretty well in Europe, the Americas, and some parts of Asia. In Africa and other parts of Asia, though, not so much. If it were just a matter of population density, then India and eastern China should burn brightest, but they—and especially China—are relatively dark compared to “the West.” Meanwhile, in Africa, we see pockets of activity, but there are whole swathes of the continent that are populated as or more densely than the brighter parts of South America, but from which we see virtually no Twitter activity.

So why are some pockets of human settlement less visible than others? Two forces stand out: wealth and politics.

First and most obvious, access to Twitter depends on electricity and telecommunications infrastructure and gadgets and literacy and health and time, all of which are much scarcer in poorer parts of the world than they are in richer places. The map below shows lights at night, as seen from space by U.S. satellites 20 years ago and then mapped by NASA (source). These light patterns are sometimes used as a proxy for economic development (e.g., here).

This view of the world helps explain some of the holes in our map of Twitter activity, but not all of it. For example, many of the densely populated parts of Africa don’t light up much at night, just as they don’t on Tweetping, because they lack the relevant infrastructure and power production. Even 20 years ago, though, India and China looked much brighter through this lens than they do on our Twitter usage map.

So what else is going on? The intensity and character of Twitter usage also depends on freedoms of information and speech—the ability and desire to access the platform and to speak openly on it—and this political layer keeps other areas in the dark in that Tweetping map. China, North Korea, Cuba, Ethiopia, Eritrea—if you’re trying to anticipate important political crises, these are all countries you would want to track closely, but Twitter is barely used or unavailable in all of them as a direct or indirect consequence of public policy. And, of course, there are also many places where Twitter is accessible and used but censorship distorts the content of the stream. For example, Saudi Arabia lights up pretty well on the Twitter-usage map, but it’s hard to imagine people speaking freely on it when a tweet can land you in prison.

Clearly, wealth and political constraints still strongly shape the view of the world we can get from new data sources like Twitter. Contrary to the heavily-marketed myth of “comprehensive data,” poverty and repression continue to hide large swathes of the world out of our digital sight, or to distort the glimpses we get of them.

Unfortunate for efforts to forecast rare political crises, those two structural features that so strongly shape the production and quality of data also correlate with the risks we want to anticipate. The map below shows the Early Warning Project‘s most recent statistical assessments of the risk of onsets of state-led mass-killing episodes. Now flash back to the visualization of Twitter usage above, and you’ll see that many of the countries colored most brightly on this map are among the darkest on that one. Even in 2015, the places about which we most need more information to sharpen our forecasts of rare political crises are the ones that are still hardest to see.

Statistically, this is the second-worst of all possible worlds, the worst one being the total absence of information. Data are missing not at random, and the processes producing those gaps are the same ones that put places at greater risk of mass atrocities and other political calamities. This association means that models we estimate with those data will often be misleading. There are ways to mitigate these problems, but they aren’t necessarily simple, cheap, or effective, and that’s before we even start in on the challenges of extracting useful measures from something as heterogeneous and complex as the Twitter stream.

So that’s what I see when I hear people suggest that social media or Google Trends or other forms of “digital exhaust” have mooted the data problems about which I so often complain. Lots of organizations are spending a lot of money trying to overcome these problems, but the political and economic topography producing them does not readily yield. The Internet is part of this complex adaptive system, not a space outside it, and its power to transform that system is neither as strong nor as fast-acting as many of us—especially in the richer and freer parts of the world—presume.

12 Comments

by Jay Ulfelder on May 7, 2015 • Permalink

Posted in Forecasting, Measurement

Tagged Big Data, Twitter

Posted by Jay Ulfelder on May 7, 2015

https://dartthrowingchimp.wordpress.com/2015/05/07/the-myth-of-comprehensive-data/

The Future of Political Science Just Showed Up

I recently wrote about how data sets just starting to come online are going to throw open doors to projects that political scientists have been hoping to do for a while but haven’t had the evidence to handle. Well, one of those shiny new trains just pulled into the station: the Global Dataset of Events, Language, and Tone, a.k.a. GDELT, is now in the public domain.

GDELT is the primarily work of Kalev Leetaru, a a University Fellow at the University of Illinois Graduate School of Library and Information Science, but its intellectual and practical origins—and its journey into the public domain—also owe a lot to the great Phil Schrodt. The data set includes records summarizing more than 200 million events that have occurred around the world from 1979 to the present. Those records are created by software that grabs and parses news reports from a number of international sources, including Agence France Press, the Associated Press, and Xinhua. Each record indicates who did or said what to whom, where, and when.

The “did what” part of each record is based on the CAMEO coding scheme, which sorts actions into a fairly detailed set of categories covering many different forms of verbal and material cooperation and conflict, from public statements of support to attacks with weapons of mass destruction. The “who” and “to whom” parts use carefully constructed dictionaries to identify specific actors and targets by type and proper name. So, for example, “Philippine soldiers” gets identified as Philippines military (PHLMIL), while “Philippine Secretary of Agriculture” gets tagged as Philippine government (PHLGOV). The “where” part uses place names and other clues in the stories to geolocate each event as specifically as possible.

I try to avoid using words like “revolutionary” when talking about the research process, but in this case I think it fits. I suspect this is going to be the data set that launches a thousand dissertations. As Josh Keating noted on his War of Ideas blog at Foreign Policy,

Similar event databases have been built for particular regions, and DARPA has been working along similar lines for the Pentagon with a project known as ICEWS, but for a publicly accessible program…GDELT is unprecedented in it geographic and historic scale.

To Keating’s point about the data set’s scale, I would add two other ways that GDELT is a radical departure from past practice in the discipline. First, it’s going to be updated daily (watch this space). Second, it’s freely available to the public.

Yes, you read that right: a global data set summarizing all sorts of political cooperation and conflict with daily updates is now going to available to anyone with an Internet connection at no charge. As in: FREE. As I said in a tweet-versation about GDELT this afternoon, contractors have been trying for years (and probably succeeding) to sell closed systems like this to the U.S. government for hundreds of thousands or millions of dollars. If I’m not mistaken, that market just crashed, or at the very least shrank by a whole lot.

GDELT isn’t perfect, of course. I’ve already been tinkering with it a bit as part of a project I’m doing for the Holocaust Museum’s Center for the Prevention of Genocide, on monitoring and predicting mass atrocities, and the data on the “Engage in Unconventional Mass Violence” events I’m hoping to use as a marker of atrocities look more reliable in some cases than others. Still, getting a data set of this size and quality in the public domain is a tremendous leap forward for empirical political science, and the fact that it’s open will allow lots of other smart people to find the flaws and work on eliminating or mitigating them.

Last but not least, I think it’s worth noting that GDELT was made possible, in part, through support from the National Science Foundation. It may be free to you, and it’s orders of magnitude cheaper to produce than the artisanal, hand-crafted event data of yesteryear (like, yesterday). But that doesn’t mean it’s been free to develop, produce, or share, and you can thank the NSF for helping various parts of that process happen.

31 Comments

by Jay Ulfelder on April 10, 2013 • Permalink

Posted in Methods

Tagged Big Data, GDELT, Holocaust Museum, Kalev Leetaru, Phil Schrodt

Posted by Jay Ulfelder on April 10, 2013

https://dartthrowingchimp.wordpress.com/2013/04/10/the-future-of-political-science-just-showed-up/

Digging Down to the Micro-Foundations

Yesterday, the blog Political Violence @ a Glance carried a thoughtful post by Thomas Zietzoff about why international relations and conflict researchers need to work harder to get at the micro-foundations of the processes they study. For those of you not immersed in these methodological debates, “micro-foundations” in this context is just a fancy way of referring to individual people instead of the groups and networks in which they’re members, or the towns or countries or regions in which they’re located.

What’s bugging Zeitzoff is the ecological fallacy—that is, the (flawed) assumption that patterns observed across groups necessarily hold for the individuals who belong to those groups. As Zeitzoff notes, many theories of political conflict involve decision processes occurring within individuals—to participate in a protest, to join a rebel group, to vote for A instead of B—but virtually all of the data we use to test those theories describes the groups or environments in which those individuals are embedded.

Take, for example, the idea that poverty increases the risk of civil war. Statistical models of civil-war onset have shown over and over that poorer countries are indeed more susceptible to civil war, but the country-level data used in those models don’t tell us who’s actually doing the fighting. It could be true that poorer individuals are more motivated to rebel than richer ones, but it could also be true that poorer countries are more susceptible to rebellions by collections of individuals whose own economic status has little effect on their decision to participate. Without data on who participates and how poor they are, we can’t really say which is correct.

The broader point is that patterns we observe at these higher levels might may match what’s going on within specific individuals, but they also might not. To make confident inferences about why people act like they do, we really need to try to directly observe (some of) those individuals and the choices they make.

Zeitzoff is absolutely right about the importance of avoiding the ecological fallacy, of course, but I don’t agree with the prescription he writes to remedy this ailment. According to Zeitzoff,

Field experiments offer a promising path forward and need to be incorporated into the repertoire of techniques conflict scholars adopt; a stronger version of this point is that conflict scholars have to do this or else leave unexplored the central arguments that animate the field.

Contra Zeitzoff, I’m skeptical that field experiments will shed much light on many topics of interest to students of international politics, mostly because I don’t think those field experiments will ever happen. Maybe some of the experimental-design pros will set me straight on this, but I don’t see how researchers are going to create and reliably observe experimental and control groups for things like war between states, participation in insurgencies, or protests against authoritarian regimes, given the political sensitivity and ethical dilemmas involved. Many of the actions theorists of international politics care about are dangerous and illegal. Those qualities give participants strong incentives to conceal their actions, and they give states affected by those actions strong incentives to block experiments that could help catalyze unrest or insurgency.

Instead of field experiments, I think this is where Big Data could really help push political science forward. And when I say Big Data here, I don’t just mean larger data sets (although those are great, too). Instead, I’m referring more specifically to what organizations like the U.N.’s Global Pulse have in mind when they use this term: massive collections of digital observations created, sometimes incidentally, as people go about their daily lives.

As Patrick Meier noted in a blog post last year, these high-frequency digital data sets come with significant concerns and constraints of their own, including the need to respect the privacy of the individuals being observed and selection bias in the samples they generate. Still, as long as the data are handled and analyzed with these limitations in mind, they should offer new opportunities to explore the micro-foundations of our theories in ways we’re only just starting to imagine.

In principle, evidence from carefully designed experiments would be even better. In practice, though, I just don’t see many of those experiments happening, and I see no reason to eschew improvement in quixotic pursuit of perfection.

7 Comments

by Jay Ulfelder on April 3, 2013 • Permalink

Posted in Methods

Tagged Big Data, micro-foundations, Thomas Zeitzoff

Posted by Jay Ulfelder on April 3, 2013

https://dartthrowingchimp.wordpress.com/2013/04/03/digging-down-to-the-micro-foundations/

Big Data Won’t Kill the Theory Star

A few years ago, Wired editor Chris Anderson trolled the scientific world with an essay called “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” After talking about the fantastic growth in the scale and specificity of data that was occurring at the time—and that growth has only gotten a lot faster since—Anderson argued that

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

In other words, with data this rich, theory becomes superfluous.

Like many of my colleagues, I think Anderson is wrong about the increasing irrelevance of theory. Mark Graham explains why in a year-old post on the Guardian‘s Datablog:

We may one day get to the point where sufficient quantities of big data can be harvested to answer all of the social questions that most concern us. I doubt it though. There will always be digital divides; always be uneven data shadows; and always be biases in how information and technology are used and produced.

And so we shouldn’t forget the important role of specialists to contextualise and offer insights into what our data do, and maybe more importantly, don’t tell us.

At the same time, I also worry that we’re overreacting to Anderson and his ilk by dismissing Big Data as nothing but marketing hype. From my low perch in one small corner of the social-science world, I get the sense that anyone who sounds excited about Big Data is widely seen as either a fool or a huckster. As Christopher Zorn wrote on Twitter this morning, “‘Big data is dead” is the geek-hipster equivalent of ‘I stopped liking that band before you even heard of them.'”

Of course, I say that as one of those people who’s really excited about the social-scientific potential these data represent. I think a lot of people who dismiss Big Data as marketing hype misunderstand the status quo in social science. If you don’t regularly try to use data to test and develop hypotheses about things like stasis and change in political institutions or the ebb and flow of political violence around the world, you might not realize how scarce and noisy the data we have now really are. On many things our mental models tell us to care about, we simply don’t have reliable measures.

Take, for example, the widely held belief that urban poverty and unemployment drive political unrest in poor countries. Is this true? Well, who knows? For most poor countries, the data we have on income are sparse and often unreliable, and we don’t have any data on unemployment, ever. And that’s at the national level. The micro-level data we’d need to link individuals’ income and employment status to their participation in political mobilization and violence? Apart from a few projects on specific cases (e.g., here and here), fuggeddaboudit.

Lacking the data we need to properly test our models, we fill the space with stories. As Daniel Kahneman describes on p. 201 of Thinking, Fast and Slow,

You cannot help dealing with the limited information you have as if it were all there is to know. You build the best possible story from the information available to you, and if it is a good story, you believe it. Paradoxically, it is easier to construct a coherent story when you know little, when there are fewer pieces to fit into the puzzle. Our comforting conviction that the world makes sense rests on a secure foundation: our almost unlimited ability to ignore our ignorance.

When that’s the state of the art, more data can only make things better. Sure, some researchers will poke around in these data sets until they find “statistically significant” associations and then pretend that’s what they expected to find the whole time. But, as Phil Schrodt points out, plenty of people are already doing that now.

Meanwhile, other researchers with important but unproven ideas about social phenomena will finally get a chance to test and refine those ideas in ways they’d never been able to do before. Barefoot empiricism will play a role, too, but science has always been an iterative process that way, bouncing around between induction and deduction until it hits on something that works. If the switch from data-poor to data-rich social science brings more of that, I feel lucky to be present for its arrival.

16 Comments

by Jay Ulfelder on February 23, 2013 • Permalink

Posted in Misc.

Tagged Big Data, Christopher Zorn, Daniel Kahneman, narrative fallacy, Phil Schrodt, philosophy of science

Posted by Jay Ulfelder on February 23, 2013

https://dartthrowingchimp.wordpress.com/2013/02/23/big-data-wont-kill-the-theory-star/

How Makers of Foreign Policy Use Statistical Forecasts: They Don’t, Really

The current issue of Foreign Policy magazine includes a short piece I wrote on how statistical models can be useful for forecasting coups d’etat. With the March coup in Mali as a hook, the piece aims to show that number-crunching can sometimes do a good job assessing risks of rare events that might otherwise present themselves as strategic surprises.

In fact, statistical forecasting of international politics is a relatively young field, and decision-makers in government and the private sector have traditionally relied on subject-matter experts to prognosticate on events of interest. Unfortunately, expert judgment does not work nearly as well as a forecasting tool as we might hope or expect.

In a comprehensive study of expert political judgment, Philip Tetlock finds that forecasts made by human experts on a wide variety of political phenomena are barely better than random guesses, and they are routinely bested by statistical algorithms that simply extrapolate from recent trends. Some groups of experts perform better than others—the experts’ cognitive style is especially relevant, and feedback and knowledge of base rates can help, too—but even the best-performing sets of experts fail to match the accuracy of those simple statistical algorithms.

The finding that models outperform subjective judgments at forecasting has been confirmed repeatedly by other researchers, including one prominent 2004 study which showed that a simple statistical model could predict the outcomes of U.S. Supreme Court cases much more accurately than a large assemblage of legal experts.

Because statistical forecasts are potentially so useful, you would think that policy makers and the analysts who inform them would routinely use them. That, however, would be a bad bet. I spoke with several former U.S. policy and intelligence officials, and all of them agreed that policymakers make little use of these tools and the “watch lists” they are often used to produce. A few of those former officials noted some variation in the application of these techniques across segments of the government—military leaders seem to be more receptive to statistical forecasting than civilian ones—but, broadly speaking, sentiment runs strongly against applied modeling.

If the evidence in favor of statistical risk assessment is so strong, why is it such a tough sell?

Part of the answer surely lies in a general tendency humans have to discount or ignore evidence that doesn’t match our current beliefs. Psychologists call this tendency confirmation bias, and it affects how we respond when models produce forecasts that contradict our expectations about the future. In theory, this is when models are most useful; in practice, it may also be when they’re hardest to sell.

Jeremy Weinstein, a professor of political science at Stanford University, served as Director for Development and Democracy on the National Security Council staff at the White House from 2009 until 2011. When I asked him why statistical forecasts don’t get used more in foreign-policy decision-making, he replied, “I only recall seeing the use of quantitative assessments in one context. And in that case, I think they were accepted by folks because they generated predictions consistent with people’s priors. I’m skeptical that they would have been valued the same if they had generated surprising predictions. For example, if a quantitative model suggests instability in a country that no one is invested in following or one everyone believes is stable, I think the likely instinct of policymakers is to question the value of the model.”

The pattern of confirmation bias extends to the bigger picture on the relative efficacy of models and experts. When asked about why policymakers don’t pay more attention to quantitative risk assessments, Anne-Marie Slaughter, former director of Policy Planning at State, responded: “You may believe that [statistical forecasts] have a better track record than expert judgment, but that is not a widely shared view. Changing minds has to come first, then changing resources.”

Where Weinstein and Slaughter note doubts about the value of the forecasts, others see deeper obstacles in the organizational culture of the intelligence community. Ken Knight, now Analytic Director at Centra Technology, spent the better part of a 30-year career in government working on risk assessment, including several years in the 2000s as National Intelligence Officer for Warning. According to Knight, “Part of it is the analytic community that I grew up in. There was very little in the way of quantitative analytic techniques that was taught to me as an analyst in the courses I took. There is this bias that says this stuff is too complex to model…People are just really skeptical that this is going to tell them something they don’t already know.”

This organizational bias may simply reflect some deep grooves in human cognition. Psychological research shows that our minds routinely ignore statistical facts about groups or populations while gobbling up or even cranking out causal stories that purport to explain those facts. These different responses appear to be built-in features of the automatic and unconscious thinking that dominates our cognition. Because of them, our minds “can deal with stories in which the elements are causally linked,” Daniel Kahneman writes, but they are “weak in statistical reasoning.”

Of course, cognitive bias and organizational culture aren’t the only reasons statistical risk assessments don’t always get traction in the intelligence-production process. Stephen Krasner, a predecessor of Slaughter’s as director of Policy Planning at State, noted in an email exchange that there’s often a mismatch between the things these models can warn about and the kinds of questions policymakers are often trying to answer. Krasner’s point was echoed in a recent column by CNAS senior fellow Andrew Exum, who notes that “intelligence organizations are normally asked to answer questions regarding both capability and intent.” To that very short list, I would add “probability,” but the important point here is that estimating the likelihood of events of concern is just one part of what these organizations are asked to do, and often not the most prominent one.

Clearly, there are a host of reasons why policy-makers might not see statistical forecasts as a valuable resource. Some are rooted in cognitive bias and organizational culture, while others are related to the nature of the problems they’re trying to solve.

That said, I suspect that modelers also share some of the blame for the chilly reception their forecasts receive. When modelers are building their forecasting tools, I suspect they often imagine their watch lists landing directly on the desks of policymakers with global concerns who are looking to take preventive action or to nudge along events they’d like to see happen. “Tell me the 10 countries where civil war is most likely,” we might imagine the president saying, “so I know where to send my diplomats and position my ships now.”

In reality, the policy process is much more reactive, and by the time something has landed on the desks of the most senior decision-makers, the opportunity for useful strategic warning is often gone. What’s more, in the rare instances where quantitative forecasts do land on policy-makers’ desks, analysts may not be thrilled to see those watch lists cutting to the front of the line and competing directly with them for the scarce attention of their “customers.”

In this environment, modelers could try to make their forecasts more valuable by designing them for, and targeting them at, people earlier in the analytical process—that is, lower in the bureaucracy. Quantitative risk assessments should be more useful to the analysts, desk officers, and deputies who may be able to raise warning flags earlier and who will be called upon when their country of interest pops into the news. Statistical forecasts of relevant events can shape those specialists’ thinking about what the major risks are in their areas of concern, hopefully spurring them to revisit their assumptions in cases where the forecast diverges significantly from their own expectations. Statistical forecasts can also give those specialists some indication on how various risks might increase or decrease as other conditions change. In this model, the point isn’t to replace or overrule the analyst’s judgment, but rather to shape and inform it.

Even without strategic redirection among modelers, though, it’s possible that broader cultural trends will at least erode resistance to statistical risk assessment among senior decision-makers and the analysts who support them. Advances in computing and communications technology are spurring the rise of Big Data and even talk of a new “age of the algorithm.” The discourse often gets a bit heady, but there’s no question that statistical thinking is making new inroads into many fields. In medicine, for example—another area where subjective judgment is prized and decisions can have life-or-death consequences—improvements in data and analysis are combining with easier access to the results to encourage practitioners to lean more heavily on statistical risk assessments in their decisions about diagnosis and treatment. If the hidebound world of medicine can find new value in statistical modeling, who knows, maybe foreign policy won’t be too far behind.

39 Comments

by Jay Ulfelder on June 18, 2012 • Permalink

Posted in Forecasting, Statistics

Tagged Anne-Marie Slaughter, Big Data, Daniel Kahneman, Jeremy Weinstein, Ken Knight, military coup, Philip Tetlock, Steven Krasner

Posted by Jay Ulfelder on June 18, 2012

https://dartthrowingchimp.wordpress.com/2012/06/18/how-makers-of-foreign-policy-use-statistical-forecasts-they-dont-really/

Statistics Is Not Alchemy

Are aid and investment from China driving crackdowns on the press in some parts of Africa?

I don’t know.

That’s unsatisfying and maybe even a little annoying, but I’m writing a post about it anyway because why I don’t know says a lot about how hard it is to do good quantitative social science, even in the age of Big Data. Here’s the story:

A few Mondays ago, the New York Times ran an op-ed entitled “Africa’s Free Press Problem” in which the author, Mohamed Keita of the advocacy group Committee to Protect Journalists, asserted that press freedom is eroding in Africa, and foreign forces are partially at fault. According to Keita, “Independent African journalists covering the continent’s development are now frequently persecuted for critical reporting on the misuse of public finances, corruption and the activities of foreign investors.” He lays part of the blame for this alleged trend at the feet of Western governments more interested in promoting economic development and stability than democracy, but he sees other forces at work, too:

Then there’s the influence of China, which surpassed the West as Africa’s largest trading partner in 2009. Ever since, China has been deepening technical and media ties with African governments to counter the kind of critical press coverage that both parties demonize as neocolonialist.

In January, Beijing issued a white paper calling for accelerated expansion of China’s news media abroad and the deployment of a press corps of 100,000 around the world, particularly in priority regions like Africa. In the last few months alone, China established its first TV news hub in Kenya and a print publication in South Africa. The state-run Xinhua news agency already operates more than 20 bureaus in Africa. More than 200 African government press officers received Chinese training between 2004 and 2011 in order to produce what the Communist Party propaganda chief, Li Changchun, called “truthful” coverage of development fueled by China’s activities.

When I finished Keita’s piece, I was sympathetic to his concerns, but I was skeptical of his claim that the ebb and flow of press freedom in Africa was being shaped so decisively by China’s recent investments on the continent. From my own reading of politics, I see the kinds of constraints on the press that Keita describes in Ethiopia and Rwanda as normal features of authoritarian rule. By my reckoning, both Ethiopia and Rwanda have been repressing independent journalism for quite a while, so I couldn’t see how China’s recent overtures would have much to do with why that repression is happening. Cause has to precede effect and all that.

Being an empiricist and a blogger, I figured I’d pursue my hunch by taking a look at the data and writing a post. In a day or two, I could run a statistical analysis that would check Keita’s implied claim that Chinese engagement was reducing press freedom in Africa. I knew that both Freedom House and Reporters Without Borders produce annual, country-level measures of press freedom covering at least the past decade, so I was confident that I could observe recent trends on that side of the equation. All I needed was comparable data on aid and foreign direct investment from China, and I could run some simple fixed-effects models to see if changes over time in those money flows really were associated with decreases in press freedom, as Keita’s essay seemed to suggest.

And that’s where I hit a wall. First, I Googled “china foreign investment data” and “china foreign aid data” and came up with next to the nothing. The best I could do was an incomplete, project-level data set of Chinese foreign aid projects in Africa from 1990 through 2005. Next, I posted queries on Twitter and the listserv of the Society for Political Methodology. The latter led me to the University of the Pacific’s Daniel O’Neill, who confirmed my growing suspicion that the data I wanted simply don’t exist. We can see annual outflows of FDI from China, but we can’t see where that money’s gone, and bilateral data on development assistance from China are not available. (Even if they were, I’m not sure I would have trusted the numbers, but that’s beside the point for now.)

So, here we are in 2012, and it’s impossible to answer a seemingly simple question because the data we need to answer that question are nowhere to be found.

In fact, there are a lot of really interesting and important social-science questions where this is true. Income inequality is one of them, as I discussed on this blog a few weeks ago. Unemployment is another. If I had a dollar for every time I heard someone suggested adding unemployment to a global statistical model of political instability, I’d be a lot richer. It turns out, though, that many countries don’t report unemployment rates, and many of the ones that do only started to do so recently. A quick look at the World Bank’s World Development Indicators shows the problem clearly; lots of countries have no observations, and those gaps are correlated with other things that contribute to the risks of political instability–poverty most especially, but also authoritarian rule and recent or ongoing civil violence.

The list of known unknowns is a lot longer, but I think that’s enough to make the problem clear. From popular discussions, you’d think we’re living in an era when anything and everything is routinely quantified and the only problem left is finding the signal in all that noise. For some questions in some (rich) countries, that’s a fair description. For many of the big questions in comparative politics and international relations, though, we’re only just starting to exit the Dark Ages, and the past–and often even the present–are essentially lost to statistical analysis.