A Bit More on Country-Month Modeling

My family is riding the flu carousel right now, and my turn came this week. So, in lieu of trying to write from scratch, I wanted to pick up where my last post—on moving from country-year to country-month modeling—left off.

As many of you know, this notion is hardly new. For at least the past decade, many political scientists who use statistical tools to study violent conflict have been advocating and sometimes implementing research designs that shrink their units of observation on various dimensions, including time. The Journal of Conflict Resolution published a special issue on “disaggregating civil war” in 2009. At the time, that publication felt (to me) more like the cresting of a wave of new work than the start of one, and it was motivated, in part, by frustration over all the questions that a preceding wave of country-year civil-war modeling had inevitably left unanswered. Over the past several years, Mike Ward and his WardLab collaborators at Duke have been using ICEWS and other higher-resolution data sets to develop predictive models of various kinds of political instability at the country-month level. Their work has used designs that deal thoughtfully with the many challenges this approach entails, including spatial and temporal interdependence and the rarity of the events of interest. So have others.

Meanwhile, sociologists who study protests and social movements have been pushing in this direction even longer. Scholars trying to use statistical methods to help understand the dynamic interplay between mobilization, activism, repression, and change recognized that those processes can take important turns in weeks, days, or even hours. So, researchers in that field started trying to build event data sets that recorded as exactly as possible when and where various actions occurred, and they often use event history models and other methods that “take time seriously” to analyze the results. (One of them sat on my dissertation committee and had a big influence on my work at the time.)

As far as I can tell, there are two main reasons that all research in these fields hasn’t stampeded in the direction of disaggregation, and one of them is a doozy. The first and lesser one is computing power. It’s no simple thing to estimate models of mutually causal processes occurring across many heterogeneous units observed at high frequency. We still aren’t great at it, but accelerating improvements in computational processing, storage, software—and co-evolving improvements in statistical methods—have made it more tractable than it was even five or 10 years ago.

The second, more important, and more persistent impediment to disaggregated analysis is data, or the lack thereof. Data sets used by statistically minded political scientists come in two basic flavors: global and case- or region-specific. Almost all of the global data sets of which I’m aware have always used, and continue to use, country-years as their units of observation.

That’s partly a function of the research questions they were built to help answer, but it’s also a function of cost. Data sets were (and mostly still are) encoded by hand by people sifting through or poring over relevant documents. All that labor takes a lot of time and therefore costs a lot of money. One can make (or ask RAs to make) a reasonably reliable summary judgment about something like whether or not a civil war was occurring in a particular country during a particular year much more quickly than one can do the same for each month of that year, or each district in that country, or both. This difficulty hasn’t stopped everyone from trying, but the exceptions have been few and often case-specific. In a better world, we could have patched together those case-specific sets to make a larger whole, but they often use idiosyncratic definitions and face different informational constraints, making cross-case comparison difficult.

That’s why I’ve been so excited about the launch of GDELT and Phoenix and now the public release of the ICEWS event data. These are, I think, the leading edge of efforts to solve those data-collection problems in an efficient and durable way. ICEWS data have been available for several years to researchers working on a few contracts, but they haven’t been accessible to most of us until now. At first I thought GDELT had rendered that problem moot, but concerns about its reliability have encouraged me to keep looking. I think Phoenix’s open-source-software approach holds more promise for the long run, but, as its makers describe, it’s still in “beta release” and “under active development.” ICEWS is a more mature project that has tried carefully to solve some of the problems, like event duplication and errors in geolocation, that diminish GDELT’s utility. (Many millions of dollars help.) So, naturally, I and many others have been eager to start exploring it. And now we can. Hooray!

To really open up analysis at this level, though, we’re going to need comparable and publicly (or at least cheaply) available data sets on a lot more of the things our theories tell us to care about. As I said in the last post, we have a few of those now, but not many. Some of the work I’ve done over the past couple of years—this, especially—was meant to help fill those gaps, and I’m hoping that work will continue. But it’s just a drop in a leaky bucket. Here’s hoping for a hard turn of the spigot.

Down the Country-Month Rabbit Hole

Some big things happened in the world this week. Iran and the P5+1 agreed on a framework for a nuclear deal, and the agreement looks good. In a presidential election in Nigeria—the world’s seventh-most-populous country, and one that few observers would have tagged as a democracy before last weekend—incumbent Goodluck Jonathan lost and then promptly and peacefully conceded defeat. The trickle of countries joining China’s new Asian Infrastructure Investment Bank turned into a torrent.

All of those things happened, but you won’t read more about them here, because I have spent the better part of the past week down a different rabbit hole. Last Friday, after years of almosts and any-time-nows, the event data produced for the Integrated Conflict Early Warning System (ICEWS) finally landed in the public domain, and I have been busy trying to figure out how to put them to use.

ICEWS isn’t the first publicly available trove of political event data, but it compares favorably to the field’s first mover, GDELT, and it currently covers a much longer time span than the other recent entrant, Phoenix.

The public release of ICEWS is exciting because it opens the door wider to dynamic modeling of world politics. Right now, nearly all of the data sets employed in statistical studies of politics around the globe use country-years as their units of observation. That’s not bad if you’re primarily interested in the effects or predictive power of structural features, but it’s pretty awful for explaining and anticipating faster-changing phenomena, like social unrest or violent conflict. GDELT broke the lock on that door, but its high noise-to-signal ratio and the opacity of its coding process have deterred me from investing too much time in developing monitoring or forecasting systems that depend on it.

With ICEWS on the Dataverse, that changes. I think we now have a critical mass of data sets in the public domain that: a) reliably cover important topics for the whole world over many years; b) are routinely updated; and, crucially, c) can be parsed to the month or even the week or day to reward investments in more dynamic modeling. Other suspects fitting this description include:

  • The spell-file version of Polity, which measures national patterns of political authority;
  • Lists of coup attempts maintained by Jonathan Powell and Clayton Thyne (here) and the Center for Systemic Peace (here); and
  • The PITF Worldwide Atrocities Event Dataset, which records information about events involving the deliberate killing of five or more noncombatant civilians (more on it here).

We also have high-quality data sets on national elections (here) and leadership changes (here, described here) that aren’t routinely updated by their sources but would be relatively easy to code by hand for applied forecasting.

With ICEWS, there is, of course, a catch. The public version of the project’s event data set will be updated monthly, but on a one-year delay. For example, when the archive was first posted in March, it ran through February 2014. On April 1, the Lockheed team added March 2014. This delay won’t matter much for scholars doing retrospective analyses, but it’s a critical flaw, if not a fatal one, for applied forecasters who can’t afford to pay—what, probably hundreds of thousands of dollars?—for a real-time subscription.

Fortunately, we might have a workaround. Phil Schrodt has played a huge role in the creation of the field of machine-coded political event data, including GDELT and ICEWS, and he is now part of the crew building Phoenix. In a blog post published the day ICEWS dropped, Phil suggested that Phoenix and ICEWS data will probably look enough alike to allow substituting the former for the latter, perhaps with some careful calibration. As Phil says, we won’t know for sure until we have a wider overlap between the two and can see how well this works in practice, but the possibility is promising enough for me to dig in.

And what does that mean? Well, a week has now passed since ICEWS hit the Dataverse, and so far I have:

  • Written an R function that creates a table of valid country-months for a user-specified time period, to use as scaffolding in the construction and agglomeration of country-month data sets (a rough sketch follows this list);
  • Written scripts that call that function and some others to ingest and then parse or aggregate the other data sets I mentioned to the country-month level;
  • Worked out a strategy, and written the code, to partition the data into training and test sets for a project on predicting violence against civilians; and
  • Spent a lot of time staring at the screen thinking about, and a little time coding, ways to aggregate, reduce, and otherwise pre-process the ICEWS events and Polity data for that work on violence against civilians and beyond.
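In case it’s useful to anyone building similar scaffolding, here is a rough sketch of what that first function can look like. This is a toy version, not the actual code from my scripts, and its input, a hypothetical country_spells table with one row per country and dates marking when it came into and went out of existence, is an assumption of the sketch:

    # Hypothetical sketch, not the actual function from my scripts. Assumes
    # 'country_spells' has columns code (ISO3 character), start (Date), and
    # end (Date) marking each country's period of existence.
    make_country_months <- function(country_spells, begin, finish) {
      # every month in the requested window
      months <- seq(as.Date(begin), as.Date(finish), by = "month")
      # cross every country with every month...
      grid <- expand.grid(code = country_spells$code, month = months,
                          stringsAsFactors = FALSE)
      grid <- merge(grid, country_spells, by = "code")
      # ...then keep only the months in which that country actually existed
      grid <- grid[grid$month >= grid$start & grid$month <= grid$end,
                   c("code", "month")]
      grid[order(grid$code, grid$month), ]
    }

Country-month versions of other data sets can then be merged onto that scaffold with all.x = TRUE, so that missing country-months stay visible as NAs instead of silently dropping out.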

What I haven’t done yet—T plus seven days and counting—is any modeling. How’s that for push-button, Big Data magic?

Egypt as a Case Study in the Causes of Political Inertia

For LARB, Max Strasser has just reviewed (here) Thanassis Cambanis’ new book on the arc of Egyptian revolution (here). I haven’t read the book, but from Strasser’s review, it sounds like Cambanis’ account makes for a useful case study on the causal mechanisms of political inertia.

Here, for example, is how we are to understand how the military managed to retain and even strengthen its hold on political power in Egypt over the course of the past four years:

After the initial protests forced President Hosni Mubarak from power, a military junta known as the Supreme Council of Armed Forces (SCAF) took control of Egypt.

I think this sentence gets the sequence wrong: the officers who formed the SCAF actually played a direct role in forcing Mubarak’s departure, clearing the way for their junta (source). That is no minor detail when we’re talking about how those officers managed to avert transformational change. Anyway, back to Strasser:

The generals vociferously claimed they were the defenders of the revolution, but they did everything in their power to stymy [sic] radical change. They fast-tracked constitutions and dissolved parliaments, they cut backroom deals and initiated prosecutions. Most of all, they sowed fear and chaos that ultimately served them perfectly.

The “men with guns” sowed that fear through violence—at Maspero, at Port Said, and in many other situations that challenged their claim to power. In this behavior, we see how entrenched hierarchical organizations deploy familiar routines that simultaneously protect and reproduce their established positions. The marginal costs of deploying these routines are relatively low, precisely because they are routinized. In their parts if not in their whole, they have been rehearsed and repeated, and their propriety is etched in the extant culture. Metaphorically speaking, no new software is required; instead, organizational leaders only have to hit ‘run’ on the scripts in place. When circumstances demand innovation, preexisting modules—parts of organizations and behavioral routines—can be reassembled or lightly tweaked and then employed in short order.

And what about the revolutionaries? They possess none of those advantages, and it shows. Back to Strasser:

The revolutionaries — the leftists and liberals who formed the core of the uprising and tried to keep its goals alive amid military massacres and Brotherhood backroom dealing — do not emerge blameless from the tumultuous 2011–2013 period. Cambanis is unabashedly sympathetic to them. (I was, and am, too.) But he can’t help but point out their foibles. The revolutionaries failed to take advantage of electoral politics; they neglected political organizing in the countryside and the small cities in favor of Cairo and Alexandria (and Tahrir Square in particular); they made demands on the government that were at times unreasonable; they squandered opportunities to have their voices heard by those who held power; far too often they fought among themselves. (Something that some — such as the Revolutionary Youth Coalition, of which Moaz, one of Cambanis’s central characters, was a member — came to admit only too late.)

Nothing exemplifies the revolutionaries’ pitfalls and failures as well as the ill-fated Tahrir Square sit-in of July 2011. Amid feelings that the revolution had stalled under military rule, the revolutionary groups repaired to their favorite tactic: a tent camp in the center of Cairo. But unlike the initial uprising demanding Mubarak leave the presidency, this time the goals were diffuse and hazy. Protesters called for prosecution of members of the former regime, including hanging Mubarak, but other arguments were presented poorly. The protesters gathered under the conveniently ambiguous slogan “The Revolution First.” Once they were stuck in the square — in the sweltering weather of Cairo in July — they couldn’t back down. Each group was concerned about looking somehow less revolutionary than the others. The sit-in lacked public support and petered out. The memory of the July sit-in, like so much from that decisive year, will likely wither into oblivion. It was one of many missteps. But by focusing a chapter around it (“Stuck in the Square”), by describing the way the revolutionaries argued among themselves and aimlessly checked social media on their iPhones from the center of Tahrir, Cambanis makes clear what exactly went wrong, giving a microcosmic preview of the ways the revolution would falter. Every political organizing meeting in Cairo that devolved into pointless bickering under a cloud of cigarette smoke feels like a tragic missed connection — what if that one had only worked out?

To gain power, the forces seeking deep change must act collectively and purposefully. Unfortunately for them, the organizations and routines through which they would do those things do not exist, and they are difficult and costly to create. Even when participants agree on the broad objectives, inevitable and frequent disputes over the details—and, crucially, the procedures by which those disputes will be resolved—hamper efforts to convert shared intentions into effective action. Absent prior routines for taxing and policing members, free-rider problems abound. Organizations that have already solved some of these problems—in Egypt in 2011, the Muslim Brotherhood—enjoy significant advantages over their aspiring civic collaborators and rivals, but they rarely match the capacity of their bureaucratized rivals within the state.

So, in most cases most of the time, even when incumbents are unloved and frustrations abound, the revolutionary moment never emerges. And in the rare instances that it does, incumbent power-holders usually manage to repress it or ride it out. These outcomes have less to do with the attraction of the underlying ideas and individuals than with the power of prior organization. Routines are hard to create, and then hard to dislodge once created.

A Note on Trends in Armed Conflict

In a report released earlier this month, the Project for the Study of the 21st Century (PS21) observed that “the body count from the top twenty deadliest wars in 2014 was more than 28% higher than in the previous year.” They counted approximately 163,000 deaths in 2014, up from 127,000 in 2013. The report described that increase as “part of a broader multi-year trend” that began in 2007. The project’s executive director, Peter Apps, also appropriately noted that “assessing casualty figures in conflict is notoriously difficult and many of the figures we are looking at here [are] probably underestimates.”

This is solid work. I do not doubt the existence of the trend it identifies. That said, I would also encourage us to keep that trend in perspective.

The long-run chart I have in mind (source) ends in 2005. The Uppsala Conflict Data Program (UCDP) hasn’t yet updated its widely used data set on battle-related deaths to cover 2014, but from last year’s edition, we can see the tail end of that longer period, as well as the start of the recent upward trend PS21 identifies. In the chart below—R script here—the solid line marks the annual, global sums of their best estimates, and the dotted lines show the sums of the high and low estimates:

Annual, global battle-related deaths, 1989-2013 (Data source: UCDP)
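For anyone who wants to reproduce something like this chart, the gist is just aggregation plus a few line calls. Here’s a sketch; the column names are my assumptions, so check the codebook of whichever version of the UCDP file you download:

    # Sketch of the chart above. Assumes the UCDP battle-related deaths file
    # is loaded as 'ucdp' with columns year, bd_best, bd_low, and bd_high
    # (column names are assumptions; check the codebook).
    ann <- aggregate(cbind(bd_best, bd_low, bd_high) ~ year, data = ucdp,
                     FUN = sum)
    plot(ann$year, ann$bd_best, type = "l", lwd = 2,
         ylim = c(0, max(ann$bd_high)),
         xlab = "Year", ylab = "Battle-related deaths")
    lines(ann$year, ann$bd_low, lty = 3)   # dotted: sum of low estimates
    lines(ann$year, ann$bd_high, lty = 3)  # dotted: sum of high estimates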

If we mentally tack that chart onto the end of the one before it, we can also see that the increase of the past few years has not yet broken the longer spell of relatively low numbers of battle deaths. Not even close. The peak around 2000 in the middle of the nearer chart is a modest bump in the farther one, and the upward trend we’ve seen since 2007 has not yet matched even that local maximum. This chart stops at the end of 2013, but if we used the data assembled by PS21 for the past year to project an increase in 2014, we’d see that we’re still in reasonably familiar territory.

Both of these things can be true. We could be—we are—seeing a short-term increase that does not mark the end of a longer-term ebb. The global economy has grown fantastically since the 1700s, and yet it still suffers serious crises and recessions. The planet has warmed significantly over the past century, but we still see some unusually cool summers and winters.

Lest this sound too sanguine at a time when armed conflict is waxing, let me add two caveats.

First, the picture from the recent past looks decidedly worse if we widen our aperture to include deliberate killings of civilians outside of battle. UCDP keeps a separate data set on that phenomenon—here—which they label “one-sided” violence. If we add the fatalities tallied in that data set to the battle-related ones summarized in the previous plot, here is what we get:

Annual, global battle-related deaths and deaths from one-sided violence, 1989-2013 (Data source: UCDP)
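Mechanically, that addition is just a second aggregation and a sum. A sketch, again with assumed column names:

    # Sketch: add one-sided violence fatalities to the battle-death series
    # from the previous sketch. Assumes 'osv' has columns year and
    # fatality_best (names are assumptions; check the UCDP codebook).
    osv_ann <- aggregate(fatality_best ~ year, data = osv, FUN = sum)
    combined <- merge(ann, osv_ann, by = "year")
    combined$total <- combined$bd_best + combined$fatality_best
    plot(combined$year, combined$total, type = "l", lwd = 2,
         xlab = "Year", ylab = "Battle-related + one-sided deaths")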

Note the difference in the scale of the y-axis; it is an order of magnitude larger than the one in the previous chart. At this scale, the peaks and valleys in battle-related deaths from the past 25 years get smoothed out, and a single peak—the Rwandan genocide—dominates the landscape. That peak is still much lower than the massifs marking the two World Wars in the first chart, but it is huge nonetheless. Hundreds of thousands of people were killed in a matter of months.

Second, the long persistence of this lower rate does not prove that the risk of violent conflict on the scale of the two World Wars has been reduced permanently. As Bear Braumoeller (here) and Nassim Nicholas Taleb (here; I link reluctantly, because I don’t care for the scornful and condescending tone) have both pointed out, a single war between great powers could end or even reverse this trend, and it is too soon to say with any confidence whether or not the risk of that happening is much lower than it used to be. Like many observers of international relations, I think we need to see how the system processes the (relative) rise of China and declines of Russia and the United States before updating our beliefs about the risk of major wars. As someone who grew up during the Cold War and was morbidly fascinated by the possibility of nuclear conflagration, I think we also need to remember how close we came to nuclear war on some occasions during that long spell, and to ponder how absurdly destructive and terrible that would be.

Strictly speaking, I’m not an academic, but I do a pretty good impersonation of one, so I’ll conclude with a footnote to that second caveat: I did not attribute the idea that the risk of major war is a thing of the past to Steven Pinker, as some do, because as Pinker points out in a written response to Taleb (here), he does not make precisely that claim, and his wider point about a long-term decline in human violence does not depend entirely on an ebb in warfare persisting. It’s hard to see how Pinker’s larger argument could survive a major war between nuclear powers, but then if that happened, who would care one way or another if it had?

The Stacked-Label Column Plot

Most of the statistical work I do involves events that occur rarely in places over time. One of the best ways to get or give a feel for the structure of data like that is with a plot that shows variation in counts of those events across sequential, evenly sized slices of time. For me, that usually means a sequence of annual, global counts of those events, like the one below for successful and failed coup attempts over the past several decades (see here for the R script that generated that plot and a few others and here for the data):

Annual, global counts of successful and failed coup attempts per the Cline Center’s SPEED Project, 1946-2005
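The mechanics behind a plot like that are simple. Here is a minimal base-R sketch, using made-up names for a coup-event table with one row per event:

    # Minimal sketch of an annual count plot. Assumes a data frame 'coups'
    # with one row per event and columns year (integer) and successful
    # (logical); the names are assumptions, not the ones in the linked script.
    counts <- table(coups$successful, factor(coups$year, levels = 1946:2005))
    barplot(counts, col = c("gray70", "firebrick"), border = NA,
            legend.text = c("failed", "successful"),
            xlab = "Year", ylab = "Coup attempts")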

One thing I don’t like about those plots, though, is the loss of information that comes from converting events to counts. Sometimes we want to know not just how many events occurred in a particular year but also where they occurred, and we don’t want to have to query the database or look at a separate table to find out.

I try to do both in one go with a type of column chart I’ll call the stacked-label column plot. Instead of building columns from bricks of identical color, I use blocks of text that describe another attribute of each unit—usually country names in my work, but it could be lots of things. In order for those blocks to have comparable visual weight, they need to be equally sized, which usually means using labels of uniform length (e.g., two- or three-letter country codes) and a fixed-width font like Courier New.

I started making these kinds of plots in the 1990s, using Excel spreadsheets or tables in Microsoft Word to plot things like protest events and transitions to and from democracy. A couple decades later, I’m finally trying to figure out how to make them in R. Here is my first reasonably successful attempt, using data I just finished updating on when countries joined the World Trade Organization (WTO) or its predecessor, the General Agreement on Tariffs and Trade (GATT).

Note: Because the WordPress template I use crams blog-post content into a column that’s only half as wide as the screen, you might have trouble reading the text labels in some browsers. If you can’t make out the letters, try clicking on the plot, then increasing the zoom if needed.

Annual, global counts of countries joining the global free-trade regime, 1960-2014

Without bothering to read the labels, you can see the time trend fine. Since 1960, there have been two waves of countries joining the global free-trade regime: one in the early 1960s, and another in the early 1990s. Those two waves correspond to two spates of state creation, so without the labels, many of us might infer that those stacks are composed mostly or entirely of new states joining.

When we scan the labels, though, we discover a different story. As expected, the wave in the early 1960s does include a lot of newly independent African states, but it also includes a couple of Communist countries (Yugoslavia and Poland) and some middle-income cases from other parts of the world (e.g., Argentina and South Korea). Meanwhile, the wave of the early 1990s turns out to include very few post-Communist countries, most of which didn’t join until the end of that decade or early in the next one. Instead, we see a second wave of “developing” countries joining on the eve of the transition from GATT to the WTO, which officially happened on January 1, 1995. I’m sure people who really know the politics of the global free-trade regime, or of specific cases or regions, can spot some other interesting stories in there, too. The point, though, is that we can’t discover those stories if we can’t see the case labels.

Here’s another one that shows which countries had any coup attempts each year between 1960 and 2014, according to Jonathan Powell and Clayton Thyne’s running list. In this case, color tells us the outcomes of those coup attempts: red if any succeeded, dark grey if they all failed.

Countries with any coup attempts per Powell and Thyne, 1960-2014

One story that immediately catches my eye in this plot is Argentina’s (ARG) remarkable propensity for coups in the early 1960s. It shows up in each of the first four columns, although only in 1962 are any of those attempts successful. Again, this is information we lose when we only plot the counts without identifying the cases.

The way I’m doing it now, this kind of chart requires data to be stored in (or converted to) event-file format, not the time-series cross-sectional format that many of us usually use. Instead of one row per unit–time slice, you want one row per event, and each row needs at least two columns: the case label and the time slice in which the event occurred.
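If your data start out as counts, the conversion is a one-liner. A toy example with made-up names:

    # Toy conversion from a count-style time-series cross-sectional table to
    # an event file: repeat each row once per event, then drop the count.
    tscs <- data.frame(country = c("ARG", "TGO"), year = c(1962, 1963),
                       coups = c(2, 1))
    events <- tscs[rep(seq_len(nrow(tscs)), tscs$coups), c("country", "year")]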

If you’re interested in playing around with these types of plots, you can find the R script I used to generate the ones above here. Perhaps some enterprising soul will take it upon him- or herself to write a function that makes it easy to produce this kind of chart across a variety of data structures.

It would be especially nice to have a function that worked properly when the same label appears more than once in a given time slice. Right now, I’m using the function ‘match’ to assign y values that evenly stack the events within each bin. That doesn’t work for the second or third or nth match, though, because the ‘match’ function always returns the position of the first match in the relevant vector. So, for example, if I try to plot all coup attempts each year instead of all countries with any coup attempts each year, the second or later events in the same country get placed in the same position as the first, which ultimately means they show up as blank spaces in the columns. Sadly, I haven’t figured out yet how to identify location in that vector in a more general way to fix this problem.
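One general-purpose alternative to ‘match’ would be to compute each event’s position within its time slice directly, so that repeated labels stack instead of colliding. Here is a sketch of that idea on a toy event file; ‘ave’ with ‘seq_along’ returns each row’s running count within its year:

    # Possible workaround: compute each event's within-bin position, so that
    # repeated labels stack instead of overplotting one another.
    events <- data.frame(year = c(1962, 1962, 1962, 1963),
                         label = c("ARG", "ARG", "ARG", "TGO"))
    events$ypos <- ave(seq_along(events$year), events$year, FUN = seq_along)
    # draw the labels as the columns themselves
    plot(range(events$year) + c(-1, 1), c(0, max(events$ypos) + 1),
         type = "n", xlab = "Year", ylab = "Count", yaxs = "i")
    text(events$year, events$ypos, labels = events$label, family = "mono")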

Watching the States Get Made

The part of the world the US State Department calls “the Near East” is beset right now with a series of interlinked civil wars that threaten to cohere into a wider and even worse regional conflagration. As it happens, this disorder is exposing the constitution and construction of the contemporary international system to a degree we don’t often see.

In fact, it’s all there in a single passage from a Reuters story on Yemen today (here). The passage starts like this:

[Yemeni President] Hadi’s flight to Aden has raised the prospect of armed confrontation between rival governments based in the north and south, creating chaos that could be exploited by the Yemen-based regional wing of al Qaeda.

Fighting is spreading across Yemen, and 137 people were killed on Friday in the bombings of two Shi’ite mosques in Sanaa. The bombings were claimed by Islamic State, an al Qaeda offshoot that controls large swaths of territory in Iraq and Syria and said it was also behind an attack that killed 23 people in Tunisia on Wednesday.

As this opening implies, the crisis in Yemen is not “just” a civil war—as if that weren’t bad enough for the people who live there. In addition, the state has almost completely collapsed.

The contemporary international system is organized around the principle of sovereignty vested in organizations we call states. On paper, every swath of territory is legitimately claimed by one and only one government, and that government enjoys final authority over all political doings in its patch of earth (and airspace and, when relevant, coastal waters).

In much of Yemen right now, though, at least two rival factions lay claim to political authority in the same territory. They are actively fighting over those competing claims, but no faction is strong enough to win the struggle, so, effectively, there is no sovereign. What’s more, at least one of those factions isn’t just trying to seize control of part or all of an existing state. Instead, it is trying to create a new state of sorts that cuts across the borders of several extant ones.

Okay, so now what? The passage continues:

In [a recent] letter to [the UN] Security Council, Hadi called for a [UN Security Council] resolution to “deter the Houthi militias and their allies, to stop their aggression against all governorates, especially the city of Aden, and to support the legitimate authority”.

Here we see that, to help its cause, one faction is appealing for help to the organization that sits atop the existing system—the UN Security Council. Led for now by President Hadi, that faction bases its appeal on its purported legitimacy.

If asked, the leaders of that faction would probably say that their legitimacy flows from their victory in Yemen’s last national elections. Now, those elections weren’t exactly the freest and fairest on record, and the ongoing civil war makes plain that some segments of the population in the territory claimed by Yemen’s national government don’t recognize the election winners as their rightful sovereign. Those elections were also conducted with significant support from the UN and other governments. So, although they are construed as an internal source of legitimation, they could not have occurred without external intervention. Never mind, too, that the faction making this appeal also happens to be the one that has continued to permit the most powerful state in that system, the USA, to conduct drone strikes and other operations on its territory against one of that faction’s chief rivals in Yemen’s civil war.

Never mind all that. Elections are the mechanism that the organization sitting atop this system formally recognizes as the only rightful source of political authority, so the appeal makes sense.

So, how is the UN responding?

U.N. mediator Jamal Benomar is likely to brief the council on Sunday via video link, diplomats said. The Security Council is negotiating a statement on Yemen that could be adopted during the meeting, diplomats said.

“We join all of the other members of the Security Council in underscoring that President Hadi is the legitimate authority in Yemen,” [US State Department spokesman Jeff] Rathke said in a statement released in Washington.

Unsurprisingly, we see that the UN responds positively to an appeal that implicitly and explicitly reinforces the order it was established to protect and deepen. Even less surprising, we see this positive response in a case where the party making the appeal happens to be the faction favored by the states that wield the most power in that system. These statements imply that the positive response reflects the domestic legitimacy of Hadi’s authority. In fact, the whole exchange reveals how sovereignty flows from the system to its parts as much as the other way around.

Finally, the passage concludes:

[Benomar] called on the Houthis and “their allies to stop their violent incitement” but made no mention of Iran, whose backing for the Houthis has raised U.S. concerns.

Hadi held open the door to a negotiated settlement with a call for the Houthis and other groups to attend peace talks in Saudi Arabia.

I think this bit is especially fascinating, because it shows how governments simultaneously play by and against the rules, and how the intergovernmental organizations constituted to codify and enforce those rules try to mitigate the damage by pretending this double-dealing isn’t happening. Iran is the only state specifically described as a transgressor here, but the passage also mentions Saudi Arabia, which has forever meddled in Yemeni affairs, and the US, which has done a lot more meddling in Yemen over the past 10 years or so as part of its so-called Global War on Terror. Talking openly of these double-dealings would underscore how prevalent they are, but these routines contradict the formal rules, so the defenders of the extant order try to minimize those behaviors’ corrosive effects by not speaking of them.

In our daily doings, many of us take for granted the organization of human society into a series of states whose boundaries have already been properly established and whose governments receive their political authority from the tacit or explicit consent of the people they rule. Meanwhile, when events conspire to pull back the curtain a bit, we see a messier scene in which powerful organizations continually engage in rituals and sometimes forceful actions that mostly but not always work to sustain that system; in which authority flows from power as much or more than the reverse; and in which upstarts keep trying (and mostly failing) to get in on the action or overturn the table.

Data Science Takes Work, Too

Yesterday, I got an email from the editor of an online publication inviting me to contribute pieces that would bring statistical analysis to bear on some topics they are hoping to cover. I admire the publication, and the topics interest me.

There was only one problem: the money. The honorarium they could offer for a published piece is less than my hourly consulting rate, and all of the suggested projects—as well as most others I can imagine that would fit this outlet’s mission—would probably take days to do. I would have to find, assemble, and clean the relevant data; explore and then analyze the fruits of that labor; generate and refine visualizations of those results; and, finally, write approximately 1,000 words about it. Extrapolating from past experience, I suspect that if I took on one of these projects, I would be working for less than minimum wage. And, of course, that estimated wage doesn’t account for the opportunity costs of forgoing other work (or leisure) I might have done during that time.

I don’t mean to cast aspersions on this editor. The publication is attached to a non-profit endeavor, so the fact that they were offering any payment at all already puts them well ahead of most peers. I’m also guessing that many of this outlet’s writers have salaried “day” jobs to which their contributions are relevant, so the honorarium is more of a bonus than a wage. And, of course, I spend hours of unpaid time writing posts for this blog, a pattern that some people might reasonably interpret as a signal of how much (or little) I think my time is worth.

Still, I wonder if part of the issue here is that this editor just had no idea how much work those projects would entail. A few days ago, Jeff Leek ran an excellent post on the Simply Statistics blog, about how “data science done well looks easy—and that is a big problem for data scientists.” As Leek points out,

Most well executed and successful data science projects don’t (a) use super complicated tools or (b) fit super complicated statistical models. The characteristics of the most successful data science projects I’ve evaluated or been a part of are: (a) a laser focus on solving the scientific problem, (b) careful and thoughtful consideration of whether the data is the right data and whether there are any lurking confounders or biases and (c) relatively simple statistical models applied and interpreted skeptically.

It turns out doing those three things is actually surprisingly hard and very, very time consuming. It is my experience that data science projects take a solid 2-3 times as long to complete as a project in theoretical statistics. The reason is that inevitably the data are a mess and you have to clean them up, then you find out the data aren’t quite what you wanted to answer the question, so you go find a new data set and clean it up, etc. After a ton of work like that, you have a nice set of data to which you fit simple statistical models and then it looks super easy to someone who either doesn’t know about the data collection and cleaning process or doesn’t care.

All I can say to all of that is: YES. On topics I’ve worked on for years, I realize some economies of scale by knowing where to look for data, knowing what those data look like, and having ready-made scripts that ingest, clean, and combine them. Even on those topics, though, updates sometimes break the scripts, sources come and go, and the choice of model or methods isn’t always obvious. Meanwhile, on new topics, the process invariably takes many hours, and it often ends in failure or frustration because the requisite data don’t exist, or you discover that they can’t be trusted.

The visualization part alone can take a lot of time if you’re finicky about it—and you should be finicky about it, because your charts are what most people are going to see, learn from, and remember. Again, though, I think most people who don’t do this work simply have no idea.

Last year, as part of a paid project, I spent the better part of a day tinkering with an R script to ingest and meld a bunch of time series and then generate a single chart that would compare those time series. When I finally got the chart where I wanted it, I showed the results to someone else working on that project. He liked the chart and immediately proposed some other variations we might try. When I responded by pointing out that each of those variations might take an hour or two to produce, he was surprised and admitted that he thought the chart had come from a canned routine.

We laughed about it at the time, but I think that moment perfectly illustrates the disconnect that Leek describes. What took me hours of iterative code-writing and drew on years of accumulated domain expertise and work experience looked to someone else like nothing more than the result of a few minutes of menu-selecting and button-clicking. When that’s what people think you do, it’s hard to get them to agree to pay you well for what you actually do.

About That Decline in EU Contributions to UN Peacekeeping

A couple of days ago, Ambassador Samantha Power, the US Permanent Representative to the United Nations, gave a speech on peacekeeping in Brussels that, among other things, lamented a decline in the participation of European personnel in UN peacekeeping missions:

Twenty years ago, European countries were leaders in UN peacekeeping. 25,000 troops from European militaries served in UN peacekeeping operations – more than 40 percent of blue helmets at the time. Yet today, with UN troop demands at an all-time high of more than 90,000 troops, fewer than 6,000 European troops are serving in UN peacekeeping missions. That is less than 7 percent of UN troops.

The same day, Mark Leon Goldberg wrote a post for UN Dispatch (here) that echoed Ambassador Power’s remarks and visualized her point with a chart that was promptly tweeted by the US Mission to the UN:

Percentage of western European Troops in UN Peacekeeping missions (source: UN Dispatch)

When I saw that chart, I wondered if it might be a little misleading. As Ambassador Power noted in her remarks, the number of troops deployed as UN peacekeepers has increased significantly in recent years. With so much growth in the size of the pool, changes in the share of that pool contributed by EU members could result from declining contributions, but they could also result from no change, or from slower growth in EU contributions relative to other countries.
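A toy calculation shows why that distinction matters. These numbers are made up for illustration; they are not the IPI figures:

    # Illustrative numbers only, not the IPI data: a flat EU contribution
    # still produces a falling EU share whenever the total pool grows.
    eu    <- c(6000, 6000, 6000)     # hypothetical EU troop totals
    total <- c(15000, 60000, 90000)  # hypothetical global totals
    round(100 * eu / total, 1)       # 40.0 10.0 6.7 -- the share falls anyway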

To see which it was, I used data from the International Peace Institute’s Providing for Peacekeeping Project to plot monthly personnel contributions from late 1991 to early 2014 for EU members and all other countries. Here’s what I got (and here is the R script I used to get there):

Monthly UN PKO personnel totals by country of origin, November 1991-February 2014
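For the curious, the aggregation behind a chart like this is straightforward. Here’s a sketch, with column names that are my assumptions rather than the ones in the IPI files or my script:

    # Sketch of the aggregation behind the chart above. Assumes 'pko' has one
    # row per country-month with columns month (Date), country (ISO3), and
    # troops, and that 'eu_members' is a vector of EU members' ISO3 codes.
    pko$group <- ifelse(pko$country %in% eu_members, "EU members", "All others")
    totals <- aggregate(troops ~ month + group, data = pko, FUN = sum)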

To me, that chart tells a different story than the one Ambassador Power and UN Dispatch describe. Instead of a sharp decline in European contributions over the past 20 years, we see a few-year surge in the early 1990s followed by a fairly constant level of EU member contributions since then. There’s even a mini-surge in 2005–2006 followed by a slow and steady return to the average level after that.

In her remarks, Ambassador Power compared Europe’s participation now to 20 years ago. Twenty years ago—late 1994 and early 1995—just happens to be the absolute peak of EU contributions. That peak was no coincidence: it came during the deployment of a UN PKO in Europe, the United Nations Protection Force (UNPROFOR) in Bosnia and Herzegovina, to which European countries contributed the bulk of the troops. In other words, when UN peacekeeping was focused on Europe, EU members contributed most of the troops. As the UN has expanded its peacekeeping operations around the world (see here for current info), EU member states haven’t really reduced their participation; instead, other countries have greatly increased theirs.

We can and should argue about how much peacekeeping the UN should try to do, and what various countries should contribute to those efforts. After looking at European participation from another angle, though, I’m not sure it’s fair to criticize EU members for “declining” involvement in the task.

Oh, and in case you’re wondering like I was, here’s a comparison of personnel contributions from EU members to ones from the United States over that same period. The US pays the largest share, but on the dimension Ambassador Power and UN Dispatch chose to spotlight—troop contributions—it offers very little.

Monthly UN PKO personnel totals, EU members vs. the United States, November 1991-February 2014

The Political Context of Political Forecasting

In Seeing Like a State, James Scott describes how governments have tried to make their societies more legible in pursuit of their basic organizational mission—”to arrange the population in ways that simplified the classic state functions of taxation, conscription, and prevention of rebellion.”

These state simplifications, the basic givens of modern statecraft, were, I began to realize, rather like abridged maps. They did not successfully represent the actual activity of the society they depicted, nor were they intended to; they represented only that slice of it that interested the official observer. They were, moreover, not just maps. Rather, they were maps that, when allied with state power, would enable much of the reality they depicted to be remade.

Statistical forecasts of political events are a form of legibility, too—an abridged map—with all the potential benefits and issues Scott identifies. Most of the time, the forecasts we generate focus on events or processes of concern to national governments and other already-powerful entities, like multinational firms and capital funds. These organizations are the ones who can afford to invest in such work, who stand to benefit most from it, and who won’t get in trouble for doing so. We talk about events “of interest” or “of concern” but rarely ask ourselves out loud: “Of interest to whom?” Sometimes we literally map our forecasts, but even when we don’t, the point of our work is usually to make the world more legible for organizations that are already wealthy or powerful so that they can better protect and expand their wealth and power.

If we’re doing our work as modelers right, then the algorithms we build to generate these forecasts will summarize our best ideas about things that cause or predict those events. Those ideas do not emerge in a vacuum. Instead, they are part of a larger intellectual and informational ecosystem that is also shaped by those same powerful organizations. Ideology and ideation cannot be fully separated.

In political forecasting, it’s not uncommon to have something that we believe to be usefully predictive but can’t include in our models because we don’t have data that reliably describe it. These gaps are not arbitrary. Sometimes they reflect technical, bureaucratic, or conceptual barriers, but sometimes they don’t. For example, no model of civil conflict can be close to “true” without including information about foreign support for governments and their challengers, but a lot of that support is deliberately hidden. Some of the same organizations that ask us to predict accurately hide from us some of the information we need most to do that.

Some of us try to escape the moral consequences of serving powerful organizations whose actions we don’t always endorse by making our work available to the public. If we share the forecasts with everyone, our (my) thinking goes, then we aren’t serving a particular master. Instead, we are producing a public good, and public goods are inherently good—right?

There are two problems with that logic. First, most of the public doesn’t have the interest or capacity to act on those forecasts, so sharing the forecasts with them will usually have little effect on their behavior. Second, some of the states and organizations that consume our public forecasts will apply them to ends we don’t like. For example, a dictatorial regime might see a forecast that it is susceptible to a new wave of nonviolent protest and respond by repressing harder. So, the practical effects of broadcasting our work will usually be modest, and some of them could even be harmful.

I know all of this, and I continue to do the work I do because it challenges and interests me, it pays well, and, I believe, some of it can help people do good. Still, I think it’s important periodically to remind ourselves—myself—that there is no escape from the moral consequences of this work, only trade-offs.

On Revolution, Theory or Ideology?

Humans understand and explain through stories, and the stories we in the US tell about why people rebel against their governments usually revolve around deprivation and injustice. In the prevailing narratives, rebellion occurs when states either actively make people suffer or passively fail to alleviate their suffering. Rebels in the American colonies made this connection explicit in the Declaration of Independence. This is also how we remember and understand lots of other rebellions we “like” and the figures who led them, from Moses to Robin Hood to Nelson Mandela.

As predictors of revolution, though, deprivation and injustice don’t fare so well. A chart in a recent Bloomberg Business piece on “the 15 most miserable economies in the world” got me thinking about this again. The chart ranks countries on a crude metric that sums a country’s unemployment rate and the annual change in its consumer price index, and it shows the results for 2015.

Of the 15 countries on that list, only two—Ukraine and Colombia—have ongoing civil wars, and it’s pretty hard to construe current unemployment or inflation as relevant causes in either case. Colombia’s civil war has run for decades. Ukraine’s war isn’t so civil (<cough> Russia <cough>), and this year’s spike in unemployment and inflation are probably more consequences than causes of that fighting. Frankly, I’m surprised that Venezuela hasn’t seen a sustained, large-scale challenge to its government since Hugo Chavez’s death, and I wonder if this year will prove different. But, so far, it hasn’t. Ditto for South Africa, where labor actions have at least hinted at the potential for wider rebellion.

That chart, in turn, reminded me of a 2011 New York Times column by Charles Blow called “The Kindling of Change,” on the causes of revolutions in the Middle East and North Africa. Blow wrote, “It is impossible to know exactly which embers spark a revolution, but it’s not so hard to measure the conditions that make a country prime for one.” As evidence, he offered a table comparing countries in the region on several “conditions.”

The table, and the language that precedes it, seems to imply that these factors are ones that obviously “prime” countries for revolution. If that’s true, though, then why didn’t we see revolutions in the past few years in Algeria, Morocco, Sudan, Jordan, and Iran? Morocco and Sudan saw smaller protest waves that failed to produce revolutions, but so did Kuwait and Bahrain. And why did Syria unravel while those others didn’t? It’s true that poorer countries are more susceptible to rebellions than richer ones, but it’s also true that poor countries are historically common and rebellions are not.

All of which makes me wonder how much our theories of rebellion are really theories at all, and not more awkward blends of selective observation and ideology. Maybe we believe that injustice explains rebellion because we want to live in a universe in which justice triumphs and injustice gets punished. When violent or nonviolent rebellions erupt, we often watch and listen to the participants enumerate grievances about poverty and indignity and take those claims as evidence of underlying causes. We do this even though we know that humans are unreliable archivists and interpreters of their own behavior and motivations, and that we could elicit similar tales of poverty and indignity from many, many more people who are not rebelling in those societies and others. If a recent study generalizes, then we in the US and other rich democracies are also consuming news that systematically casts rebels in a more favorable light than governments during episodes of protest and civil conflict abroad.

Meanwhile, when rebel groups don’t fit our profile as agents of justice, we rarely expand our theories of revolution to account for these deviant cases. Instead, we classify the organizations as “terrorists”, “radicals”, or “criminals” and explain their behavior in some other way, usually one that emphasizes flaws in the character or beliefs of the participants or manipulations of them by other nefarious agents. Boko Haram and the Islamic State are rebel groups in any basic sense of that term, but our explanations of their emergence often emphasize indoctrination instead of injustice. Why?

I don’t mean to suggest that misery, dignity, and rebellion are entirely uncoupled. Socioeconomic and emotional misery may and probably do contribute in some ways to the emergence of rebellion, even if they aren’t even close to sufficient causes of it. (For some deeper thinking on the causal significance of social structure, see this recent post by Daniel Little.)

Instead, I think I mean this post to serve as a plea to avoid the simple versions of those stories, at least when we’re trying to function as explainers and not activists or rebels ourselves. In light of what we think we know about confirmation bias and cognitive dissonance, the fact that a particular explanation harmonizes with our values and makes us feel good should not be mistaken for evidence of its truth. If anything, it should motivate us to try harder to break it.
