To Realize the QDDR’s Early-Warning Goal, Invest in Data-Making

The U.S. Department of State dropped its second Quadrennial Diplomacy and Development Review, or QDDR, last week (here). Modeled on the Defense Department’s Quadrennial Defense Review, the QDDR lays out the department’s big-picture concerns and objectives so that—in theory—they can guide planning and shape day-to-day decision-making.

The new QDDR establishes four main goals, one of which is to “strengthen our ability to prevent and respond to internal conflict, atrocities, and fragility.” To help do that, the State Department plans to “increase [its] use of early warning analysis to drive early action on fragility and conflict.” Specifically, State says it will:

  1. Improve our use of tools for analyzing, tracking, and forecasting fragility and conflict, leveraging improvements in analytical capabilities;
  2. Provide more timely and accurate assessments to chiefs of mission and senior decision-makers;
  3. Increase use of early warning data and conflict and fragility assessments in our strategic planning and programming;
  4. Ensure that significant early warning shifts trigger senior-level review of the mission’s strategy and, if necessary, adjustments; and
  5. Train and deploy conflict-specific diplomatic expertise to support countries at risk of conflict or atrocities, including conflict negotiation and mediation expertise for use at posts.

Unsurprisingly, that plan sounds great to me. We can’t now, and never will be able to, predict precisely where and when violent conflict and atrocities will occur, but we can assess risks with enough accuracy and lead time to enable better strategic planning and programming. These forecasts don’t have to be perfect to be earlier, clearer, and more reliable than the traditional practices of deferring to individual country or regional analysts or just reacting to the news.

Of course, quite a bit of well-designed conflict forecasting is already happening, much of it paid for by the U.S. government. To name a few of the relevant efforts: The Political Instability Task Force (PITF) and the Worldwide Integrated Crisis Early Warning System (W-ICEWS) routinely update forecasts of various forms of political crisis for U.S. government customers. IARPA’s Open Source Indicators (OSI) and Aggregative Contingent Estimation (ACE) programs are simultaneously producing forecasts now and discovering ways to make future forecasts even better. Meanwhile, outside the U.S. government, the European Union has recently developed its own Global Conflict Risk Index (GCRI), and the Early Warning Project now assesses risks of mass atrocities in countries worldwide.

That so much thoughtful risk assessment is being done now doesn’t mean it’s a bad idea to start new projects. If there are any iron laws of forecasting hard-to-predict processes like political violence, one of them is that combinations of forecasts from numerous sources should be more accurate than forecasts from a single model or person or framework. Some of the existing projects already do this kind of combining themselves, but combinations of combinations will often be even better.
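
To make that combination idea concrete, here’s a toy sketch in R of its simplest form: an unweighted average of risk forecasts for the same countries from a few different sources. The sources, names, and numbers are invented for illustration; a real combination would weight and calibrate the inputs more carefully.

```r
# Toy example: combine risk forecasts for the same set of countries
# from three hypothetical sources into a single ensemble forecast.
# All names and values are invented for illustration.

forecasts <- data.frame(
  country = c("A", "B", "C", "D"),
  model_1 = c(0.02, 0.40, 0.10, 0.75),   # e.g., a statistical model
  model_2 = c(0.05, 0.30, 0.20, 0.60),   # e.g., a second model or framework
  crowd   = c(0.10, 0.25, 0.15, 0.80)    # e.g., aggregated expert judgment
)

# The simplest combination: an unweighted average across sources.
forecasts$ensemble <- rowMeans(forecasts[, c("model_1", "model_2", "crowd")])

# "Combinations of combinations" just repeat the step, treating each
# ensemble as one input to a higher-level average.
forecasts[order(-forecasts$ensemble), ]
```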

Still, if I had to channel the intention expressed in this part of the QDDR into a single activity, it would not be the construction of new models, at least not initially. Instead, it would be data-making. Social science is not Newtonian physics, but it’s not astrology, either. Smart people have been studying politics for a long time, and collectively they have developed a fair number of useful ideas about what causes or precedes violent conflict. But, if you can’t track the things those theorists tell you to track, then your forecasts are going to suffer. To improve significantly on the predictive models of political violence we have now, I think we need better inputs most of all.

When I say “better” inputs, I have a few things in mind. In some cases, we need to build data sets from scratch. When I was updating my coup forecasts earlier this year, a number of people wondered why I didn’t include measures of civil-military relations, which are obviously relevant to this particular risk. The answer was simple: because global data on that topic don’t exist. If we aren’t measuring it, we can’t use it in our forecasts, and the list of relevant features that fall into this set is surprisingly long.

In other cases, we need to revive dormant data sets. Social scientists often build “boutique” data sets for specific research projects, run the tests they want to run on them, and then move on to the next project. Sometimes, the tests they or others run suggest that some features captured in those data sets would make useful predictors. Those discoveries are great in principle, but if those data sets aren’t being updated, then applied forecasters can’t use that knowledge. To get better forecasts, we need to invest in picking up where those boutique data sets left off so we can incorporate their insights into our applications.

Finally, and in almost all cases, we need to observe things more frequently. Most of the data available now to conflict forecasters are updated only once each year, often on a several-month delay and sometimes as much as two years later (e.g., data describing 2014 become available in 2016). That schedule is fine for basic research, but it is crummy for applied forecasting. If we want to give assessments and warnings that are as current as possible to those “chiefs of mission and senior decision-makers” mentioned in the QDDR, then we need to build models with data that are updated as frequently as possible. Daily or weekly updates would be ideal, but monthly updates would suffice in many cases and would mark a huge improvement over the status quo.

As I said at the start, we’re never going to get models that reliably tell us far in advance exactly where and when violent conflicts and mass atrocities will erupt. I am confident, however, that we can assess these risks even more accurately than we do now, but only if we start making more, and better versions, of the data our theories tell us we need.

I’ll end with a final plea to any public servants who might be reading this: if you do invest in developing better inputs, please make the results freely available to the public. When you share your data, you give the crowd a chance to help you spot and fix your mistakes, to experiment with various techniques, and to think about what else you might consider, all at no additional cost to you. What’s not to like about that?


A Bit More on Country-Month Modeling

My family is riding the flu carousel right now, and my turn came this week. So, in lieu of trying to write from scratch, I wanted to pick up where my last post—on moving from country-year to country-month modeling—left off.

As many of you know, this notion is hardly new. For at least the past decade, many political scientists who use statistical tools to study violent conflict have been advocating and sometimes implementing research designs that shrink their units of observation on various dimensions, including time. The Journal of Conflict Resolution published a special issue on “disaggregating civil war” in 2009. At the time, that publication felt (to me) more like the cresting of a wave of new work than the start of one, and it was motivated, in part, by frustration over all the questions that a preceding wave of country-year civil-war modeling had inevitably left unanswered. Over the past several years, Mike Ward and his WardLab collaborators at Duke have been using ICEWS and other higher-resolution data sets to develop predictive models of various kinds of political instability at the country-month level. Their work has used designs that deal thoughtfully with the many challenges this approach entails, including spatial and temporal interdependence and the rarity of the events of interest. So have others.

Meanwhile, sociologists who study protests and social movements have been pushing in this direction even longer. Scholars trying to use statistical methods to help understand the dynamic interplay between mobilization, activism, repression, and change recognized that those processes can take important turns in weeks, days, or even hours. So, researchers in that field started trying to build event data sets that recorded as exactly as possible when and where various actions occurred, and they often use event history models and other methods that “take time seriously” to analyze the results. (One of them sat on my dissertation committee and had a big influence on my work at the time.)

As far as I can tell, there are two main reasons that all research in these fields hasn’t stampeded in the direction of disaggregation, and one of them is a doozy. The first and lesser one is computing power. It’s no simple thing to estimate models of mutually causal processes occurring across many heterogeneous units observed at high frequency. We still aren’t great at it, but accelerating improvements in computational processing, storage, software—and co-evolving improvements in statistical methods—have made it more tractable than it was even five or 10 years ago.

The second, more important, and more persistent impediment to disaggregated analysis is data, or the lack thereof. Data sets used by statistically minded political scientists come in two basic flavors: global, and case- or region-specific. Almost all of the global data sets of which I’m aware have always used, and continue to use, country-years as their units of observation.

That’s partly a function of the research questions they were built to help answer, but it’s also a function of cost. Data sets were (and mostly still are) encoded by hand by people sifting through or poring over relevant documents. All that labor takes a lot of time and therefore costs a lot of money. One can make (or ask RAs to make) a reasonably reliable summary judgment about something like whether or not a civil war was occurring in a particular country during a particular year much more quickly than one can do that for each month of that year, or each district in that country, or both. This difficulty hasn’t stopped everyone from trying, but the exceptions have been few and often case-specific. In a better world, we could have patched together those case-specific sets to make a larger whole, but they often use idiosyncratic definitions and face different informational constraints, making cross-case comparison difficult.

That’s why I’ve been so excited about the launch of GDELT and Phoenix and now the public release of the ICEWS event data. These are, I think, the leading edge of efforts to solve those data-collection problems in an efficient and durable way. ICEWS data have been available for several years to researchers working on a few contracts, but they haven’t been accessible to most of us until now. At first I thought GDELT had rendered that problem moot, but concerns about its reliability have encouraged me to keep looking. I think Phoenix’s open-source-software approach holds more promise for the long run, but, as its makers describe, it’s still in “beta release” and “under active development.” ICEWS is a more mature project that has tried carefully to solve some of the problems, like event duplication and errors in geolocation, that diminish GDELT’s utility. (Many millions of dollars help.) So, naturally, I and many others have been eager to start exploring it. And now we can. Hooray!

To really open up analysis at this level, though, we’re going to need comparable and publicly (or at least cheaply) available data sets on a lot more of the things our theories tell us to care about. As I said in the last post, we have a few of those now, but not many. Some of the work I’ve done over the past couple of years—this, especially—was meant to help fill those gaps, and I’m hoping that work will continue. But it’s just a drop in a leaky bucket. Here’s hoping for a hard turn of the spigot.

Down the Country-Month Rabbit Hole

Some big things happened in the world this week. Iran and the P5+1 agreed on a framework for a nuclear deal, and the agreement looks good. In a presidential election in Nigeria—the world’s seventh most populous country, and one that few observers would have tagged as a democracy before last weekend—incumbent Goodluck Jonathan lost and then promptly and peacefully conceded defeat. The trickle of countries joining China’s new Asian Infrastructure Investment Bank turned into a torrent.

All of those things happened, but you won’t read more about them here, because I have spent the better part of the past week down a different rabbit hole. Last Friday, after years of almosts and any-time-nows, the event data produced for the Integrated Crisis Early Warning System (ICEWS) finally landed in the public domain, and I have been busy trying to figure out how to put them to use.

ICEWS isn’t the first publicly available trove of political event data, but it compares favorably to the field’s first mover, GDELT, and it currently covers a much longer time span than the other recent entrant, Phoenix.

The public release of ICEWS is exciting because it opens the door wider to dynamic modeling of world politics. Right now, nearly all of the data sets employed in statistical studies of politics around the globe use country-years as their units of observation. That’s not bad if you’re primarily interested in the effects or predictive power of structural features, but it’s pretty awful for explaining and anticipating faster-changing phenomena, like social unrest or violent conflict. GDELT broke the lock on that door, but its high noise-to-signal ratio and the opacity of its coding process have deterred me from investing too much time in developing monitoring or forecasting systems that depend on it.

With ICEWS on the Dataverse, that changes. I think we now have a critical mass of data sets in the public domain that: a) reliably cover important topics for the whole world over many years; b) are routinely updated; and, crucially, c) can be parsed to the month or even the week or day to reward investments in more dynamic modeling. Other suspects fitting this description include:

  • The spell-file version of Polity, which measures national patterns of political authority;
  • Lists of coup attempts maintained by Jonathan Powell and Clayton Thyne (here) and the Center for Systemic Peace (here); and
  • The PITF Worldwide Atrocities Event Dataset, which records information about events involving the deliberate killing of five or more noncombatant civilians (more on it here).

We also have high-quality data sets on national elections (here) and leadership changes (here, described here) that aren’t routinely updated by their sources but would be relatively easy to code by hand for applied forecasting.
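
To show what “parsed to the month” looks like in practice, here’s a rough sketch in R that collapses an event-level table into country-month counts. The data frame and its column names are hypothetical stand-ins rather than the actual ICEWS fields, which would need their own mapping.

```r
# Rough sketch: collapse an event-level data set to country-month counts.
# The toy data frame and its column names are hypothetical stand-ins; the
# real ICEWS files use their own field names and coding scheme.

events <- data.frame(
  country    = c("X", "X", "Y", "Y", "Y"),
  event_date = as.Date(c("2014-01-03", "2014-01-20", "2014-01-07",
                         "2014-02-11", "2014-02-28")),
  event_type = c("protest", "protest", "violence", "protest", "violence")
)

# Derive a year-month key, then count events by country, month, and type.
events$yearmon <- format(events$event_date, "%Y-%m")
counts <- aggregate(list(n_events = rep(1, nrow(events))),
                    by = events[, c("country", "yearmon", "event_type")],
                    FUN = sum)

counts
```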

With ICEWS, there is, of course, a catch. The public version of the project’s event data set will be updated monthly, but on a one-year delay. For example, when the archive was first posted in March, it ran through February 2014. On April 1, the Lockheed team added March 2014. This delay won’t matter much for scholars doing retrospective analyses, but it’s a critical flaw, if not a fatal one, for applied forecasters who can’t afford to pay—what, probably hundreds of thousands of dollars?—for a real-time subscription.

Fortunately, we might have a workaround. Phil Schrodt has played a huge role in the creation of the field of machine-coded political event data, including GDELT and ICEWS, and he is now part of the crew building Phoenix. In a blog post published the day ICEWS dropped, Phil suggested that Phoenix and ICEWS data will probably look enough alike to allow substituting the former for the latter, perhaps with some careful calibration. As Phil says, we won’t know for sure until we have a wider overlap between the two and can see how well this works in practice, but the possibility is promising enough for me to dig in.
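
The basic move behind that kind of calibration is easy to sketch: take country-months observed in both sources, fit a simple mapping from Phoenix counts to the ICEWS scale, and apply it to months where only Phoenix is available. The version below is my own toy illustration of the idea, not a tested calibration, and all of the numbers are made up.

```r
# Hypothetical calibration sketch: map monthly event counts from one
# source (phoenix_n) onto the scale of another (icews_n) using a period
# when both are observed, then predict forward. Toy data only.

overlap <- data.frame(
  phoenix_n = c(12, 30, 8, 45, 22, 17),
  icews_n   = c(20, 55, 15, 80, 40, 33)
)

# A simple linear mapping on logged counts; a real calibration would need
# to be checked country by country and event type by event type.
fit <- lm(log1p(icews_n) ~ log1p(phoenix_n), data = overlap)

# Approximate ICEWS-scale counts for months where only Phoenix exists yet.
new_months <- data.frame(phoenix_n = c(25, 10))
expm1(predict(fit, newdata = new_months))
```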

And what does that mean? Well, a week has now passed since ICEWS hit the Dataverse, and so far I have:

  • Written an R function that creates a table of valid country-months for a user-specified time period, to use as scaffolding in the construction and agglomeration of country-month data sets (a rough sketch of the idea appears after this list);
  • Written scripts that call that function and some others to ingest and then parse or aggregate the other data sets I mentioned to the country-month level;
  • Worked out a strategy, and written the code, to partition the data into training and test sets for a project on predicting violence against civilians; and
  • Spent a lot of time staring at the screen thinking about, and a little time coding, ways to aggregate, reduce, and otherwise pre-process the ICEWS events and Polity data for that work on violence against civilians and beyond.
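
As promised, here’s a rough sketch of the idea behind that scaffolding function from the first bullet. The function name is arbitrary and the country list is a toy placeholder; the real version needs a proper state-system membership list with entry and exit dates.

```r
# Stripped-down sketch of the scaffolding idea: build a table with one
# row per valid country-month for a user-specified window. The country
# list here is a placeholder; a real version would draw on a state-system
# membership list with entry and exit dates.

make_country_months <- function(countries, start, end) {
  months <- seq(as.Date(start), as.Date(end), by = "month")
  scaffold <- expand.grid(country = countries,
                          month   = months,
                          stringsAsFactors = FALSE)
  scaffold[order(scaffold$country, scaffold$month), ]
}

# Example: a four-country panel covering 2013-2014, ready to serve as the
# left-hand table in a series of merges with country-month data sets.
scaffold <- make_country_months(c("AFG", "IRQ", "NGA", "UKR"),
                                start = "2013-01-01", end = "2014-12-01")
head(scaffold)
```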

What I haven’t done yet—T plus seven days and counting—is any modeling. How’s that for push-button, Big Data magic?

Forecasting Round-Up No. 2

N.B. This is the second in an occasional series of posts I’m expecting to do on forecasting miscellany. You can find the first one here.

1. Over at Bad Hessian a few days ago, Trey Causey asked, “Where are the predictions in sociology?” After observing how the accuracy of some well-publicized forecasts of this year’s U.S. elections has produced “growing public recognition that quantitative forecasting models can produce valid results,” Trey wonders:

If the success of these models in forecasting the election results is seen as a victory for social science, why don’t sociologists emphasize the value of prediction and forecasting more? As far as I can tell, political scientists are outpacing sociologists in this area.

I gather that Trey intended his post to stimulate discussion among sociologists about the value of forecasting as an element of theory-building, and I’m all for that. As a political scientist, though, I found myself focusing on the comparison Trey drew between the two disciplines, and that got me thinking again about the state of forecasting in political science. On that topic, I had two brief thoughts.

First, my simple answer to why forecasting is getting more attention from political scientists than it used to is: money! In the past 20 years, arms of the U.S. government dealing with defense and intelligence seem to have taken a keener interest in using tools of social science to try to anticipate various calamities around the world. The research program I used to help manage, the Political Instability Task Force (PITF), got its start in the mid-1990s for that reason, and it’s still alive and kicking. PITF draws from several disciplines, but there’s no question that it’s dominated by political scientists, in large part because the events it tries to forecast—civil wars, mass killings, state collapses, and such—are traditionally the purview of political science.

I don’t have hard data to back this up, but I get the sense that the number and size of government contracts funding similar work has grown substantially since the mid-1990s, especially in the past several years. Things like the Department of Defense’s Minerva Initiative; IARPA’s ACE Program; the ICEWS program that started under DARPA and is now funded by the Office of Naval Research; and Homeland Security’s START consortium come to mind. Like PITF, all of these programs are interdisciplinary by design, but many of the topics they cover have their theoretical centers of gravity in political science.

In other words, through programs like these, the U.S. government is now spending millions of dollars each year to generate forecasts of things political scientists like to think about. Some of that money goes to private-sector contractors, but some of it is also flowing to research centers at universities. I don’t think any political scientists are getting rich off these contracts, but I gather there are bureaucratic and career incentives (as well as intellectual ones) that make the contracts rewarding to pursue. If that’s right, it’s not hard to understand why we’d be seeing more forecasting come out of political science than we used to.

My second reaction to Trey’s question is to point out that there actually isn’t a whole lot of forecasting happening in political science, either. That might seem to contradict the first point, but it really doesn’t. The fact is that forecasting has long been pooh-poohed in the academic social sciences, and even if that’s changing at the margins in some corners of the discipline, it’s still a peripheral endeavor.

The best evidence I have for this assertion is the brief history of the American Political Science Association’s Political Forecasting Group. To my knowledge—which comes from my participation in the group since its establishment—the Political Forecasting Group was only formed several years ago, and its membership is still too small to bump it up to the “organized section” status that groups representing more established subfields enjoy. What’s more, almost all of the panels the group has sponsored so far have focused on forecasts of U.S. elections. That’s partly because those papers are popular draws in election years, but it’s also because the group’s leadership has had a really hard time finding enough scholars doing forecasting on other topics to assemble panels.

If the discipline’s flagship association in one of the countries most culturally disposed to doing this kind of work has trouble cobbling together occasional panels on forecasts of things other than elections, then I think it’s fair to say that forecasting still isn’t a mainstream pursuit in political science, either.

2. Speaking of U.S. election forecasting, Drew Linzer recently blogged a clinic in how statistical forecasts should be evaluated. Via his web site, Votamatic, Drew:

1) began publishing forecasts about the 2012 elections well in advance of Election Day (so there couldn’t be any post hoc hemming and hawing about what his forecasts really were);

2) described in detail how his forecasting model works;

3) laid out a set of criteria he would use to judge those forecasts after the election; and then

4) walked us through his evaluations soon after the results were (mostly) in.

Oh, and in case you’re wondering: Drew’s model performed very well, thank you.
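
For anyone who wants a concrete version of step 4, one common way to score probabilistic forecasts after the fact is the Brier score: the mean squared difference between the forecast probabilities and the observed outcomes, where lower is better. The numbers below are made up to show the computation; they are not Drew’s forecasts or his criteria.

```r
# Brier score: mean squared difference between forecast probabilities and
# observed binary outcomes (1 = event happened, 0 = it didn't). Lower is
# better. The forecasts and outcomes here are made up for illustration.

brier <- function(p, outcome) mean((p - outcome)^2)

forecast_p <- c(0.90, 0.65, 0.20, 0.55, 0.10)  # made-up win probabilities
observed   <- c(1,    1,    0,    0,    0)     # made-up results

brier(forecast_p, observed)
# Compare against an uninformative baseline of 0.5 for every race.
brier(rep(0.5, length(observed)), observed)
```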

3. But you know what worked a little better than Drew’s election-forecasting model, and pretty much everyone else’s, too? An average of the forecasts from several of them. As it happens, this pattern is pretty robust. A well-designed statistical model is great for forecasting, but an average of forecasts from a number of them is usually going to be even better. Just ask the weather guys.

4. Finally, for those of you—like me—who want to keep holding pundits’ feet to the fire long after the election’s over, rejoice that Pundit Tracker is now up and running, and they even have a stream devoted specifically to politics. Among other things, they’ve got John McLaughlin on the record predicting that Hillary Clinton will win the presidency in 2016, and that President Obama will not nominate Susan Rice to be Secretary of State. McLaughlin’s hit rate so far is a rather mediocre 49 percent (18 of 37 graded calls correct), so make of those predictions what you will.

House Votes to Defund Political Science Program: The Irony, It Burns

From the Monkey Cage this morning:

The Flake amendment Henry wrote about appears to have passed the House last night with a 218-208 vote. The amendment prohibits funding for NSF’s political science program, which among others funds many valuable data collection efforts including the National Election Studies. No other program was singled out like this… This is obviously not the last word on this. The provision may be scrapped in the conference committee (Sara?). But it is clear that political science research is in real danger of a very serious setback.

There’s real irony here in a Republican-controlled House of Representatives voting to defund a political-science program at a time when the Department of Defense and “intelligence community” are apparently increasing spending on similar work. With things like the Minerva Initiative, the Political Instability Task Force (on which I worked for 10 years), ICEWS, and IARPA’s Open Source Indicators programs, the parts of the government concerned with protecting national security seem to find growing value in social-science research and are spending accordingly. Meanwhile, the party that claims to be the stalwart defender of national security pulls in the opposite direction, like the opposing head on Dr. Dolittle’s pushmi-pullyu. Nice work, fellas.
