I’m shutting down the blog. Well, I won’t be publishing new posts, anyway; the archived material will remain online, in case anyone finds it useful.

I started Dart-Throwing Chimp in the spring of 2011, not long after I made the jump to freelancing, with a few goals in mind. I wanted to make myself more visible and appealing to potential clients. I wanted to practice and improve as a writer, a researcher, and a coder. And I wanted to participate in interesting conversations with colleagues and the wider world.

The blog succeeded on all of those counts. In so doing, though, it also became a larger and larger job of its own. Some of the more involved posts, like my annual coup forecasts, required several days of work. Even the shorter ones often took hours to write, and there were stretches when I was writing three or four of those each week.

I don’t get paid for any of that time. For a while, that strategy made sense to me. It doesn’t any more. The personal and professional opportunity costs have come to outweigh the benefits.

I’m shuttering the blog, but I continue to look for new work as a writer and a data scientist, for lack of a better term. If you think I might be useful to you or your organization—as a freelancer, or maybe as something else—please let me know (ulfelder at gmail dot com; CV here).

Meanwhile, thanks for reading, and I hope I’ll see you around.

Measuring Trends in Human Rights Practices

I wrote a thing for Foreign Policy’s Democracy Lab on evidence that widely-used data on human rights practices understate the improvements that have occurred around the world in the past few decades:

“It’s Getting Better All The Time”

The idea for the piece came from reading Chris Fariss’s May 2014 article in American Political Science Review and then digging around in the other work he and others have done on the topic. It’s hard to capture the subtleties of a debate as technical as this one in a short piece for a general audience, so if you’re really interested in the subject, I would encourage you to read further. See especially the other relevant papers on Chris’s Publications page and the 2013 article by Anne Marie Clark and Kathryn Sikkink.

In the piece, I report that “some human rights scholars see Fariss’ statistical adjustments as a step in the right direction.” Among others I asked, Christian Davenport wrote to me that he agrees with Fariss about how human rights reporting has evolved over time, and what that implies for measurement of these trends. And Will Moore described Fariss’s estimates in an email as a “dramatic improvement” over previous measures. As it happens, Will is working with Courtenay Conrad on a data set of allegations of torture incidents around the world from specific watchdog groups (see here). Like Chris, Will presumes that the information we see about human rights violations is incomplete, so he encourages researchers to treat available information as a biased sample and use statistical models to better estimate the underlying conditions of concern.

When I asked David Cingranelli, one of the co-creators of what started out as the Cingranelli and Richards (CIRI) data set, for comment, he had this to say (and more, but I’ll just quote this bit here):

I’m not convinced that either the “human rights information paradox” or the “changing standard of accountability” produce a systematic bias in CIRI data. More importantly, the evidence presented by Clark and Sikkink and the arguments made by Chris Fariss do not convince me that there is a better alternative to the CIRI method of data recording that would be less likely to suffer from biases and imprecision. The CIRI method is not perfect, but it provides an optimal trade-off between data precision and transparency of data collection. Statistically advanced indexes (scores) might improve the precision but would for sure significantly reduce the ability of scholars to understand and replicate the data generation process.  Overall, the empirical research would suffer from such modifications.

I hope this piece draws wider attention to this debate, which interests me in two ways. The first is the substance: How have human rights practices changed over time? I don’t think Fariss’ findings settle that question in some definitive and permanent way, but they did convince me that the trend in the central tendency over the past 30 or 40 years is probably better than the raw data imply.

The second way this debate interests me is as another example of the profound challenges involved in measuring political behavior. As is the case with violent conflict and other forms of contentious politics, almost every actor at every step in the process of observing human rights practices has an agenda—who doesn’t?—and those agendas shape what information bubbles up, how it gets reported, and how it gets summarized as numeric data. The obvious versions of this are the attempts by violators to hide their actions, but activists and advocates also play important roles in selecting and shaping information about human rights practices. And, of course, there are also technical and practical features of local and global political economies that filter and alter the transmission of this information, including but certainly not limited to language and cultural barriers and access to communications technologies.

This blog post is now about half as long as the piece it’s meant to introduce, so I’ll stop here. If you work in this field or otherwise have insight into these issues and want to weigh in, please leave a comment here or at Democracy Lab.

Kill the Spider

When I was 15, I got my first paying job, doing yard work and odd tasks on summer weekdays for a family in Chapel Hill, North Carolina, where we lived at the time. Their house featured a rough stone foundation with lots of nooks and crannies.

One day, while spreading mulch around the edge of the house, I noticed a dense, funnel-shaped spider web emerging from one of those nooks. I picked up a short stick and touched it to the web. Nothing happened. I shook the stick a bit. Still nothing. I shook the stick a bit harder.

Suddenly, a large, oddly-patterned spider stood on the web above the end of the stick. The spider appeared so fast that it might as well have teleported to that spot, only inches from my hand.

I dropped the stick and leaped back. After I caught my breath, I ran to the garage and grabbed a can of insect-killing spray off the shelf. By the time I got back to the web, the spider had disappeared. I shot a long burst of insecticide into the hole, then another, then another.

Sometimes, I think foreign policy gets made the same way.

Halloween, Quantified

Some parents dress up for Halloween. Some throw parties. In our house, we—well, I, really; my wife was bemused, my younger son vaguely interested, and my elder son embarrassed—I collect and chart the data.

First, the flow of trick-or-treaters. The figure below shows counts in 15-minute bins of kids who came to our door for candy. The first arrival, a little girl in a fairy/princess costume, showed up around 5:50 PM, well before sunset. The deluge came an hour later, when a mob from a party next door blended with an uptick in other arrivals. The other peak came almost an hour after that and probably had a much higher median age than the earlier one. The final handful strolled through around 8:40, right when we were shutting down so we could fetch and drop off our own teenage boys from other parts of town.
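For anyone who wants to make a similar chart, here is a minimal Python sketch of the 15-minute binning. It assumes each arrival was logged with a timestamp, which is an assumption for illustration; the function name `bin_arrivals` and the sample times are hypothetical stand-ins, not the counts from that night.

```python
from collections import Counter
from datetime import datetime

def bin_arrivals(times, width_minutes=15):
    """Count arrivals per bin, keyed by the start time of each bin.

    Assumes width_minutes divides 60 evenly (e.g. 5, 10, 15, 30).
    """
    counts = Counter()
    for t in times:
        # Floor the timestamp to the start of its bin.
        start = t.replace(minute=t.minute - t.minute % width_minutes,
                          second=0, microsecond=0)
        counts[start] += 1
    return dict(sorted(counts.items()))

# Hypothetical arrival times, invented for this example.
arrivals = [
    datetime(2015, 10, 31, 17, 50),
    datetime(2015, 10, 31, 18, 52),
    datetime(2015, 10, 31, 18, 55),
    datetime(2015, 10, 31, 20, 40),
]

for start, n in bin_arrivals(arrivals).items():
    print(start.strftime("%H:%M"), n)
```

Keying each bin by its start time keeps the output sorted chronologically, which is the order a plot of the flow would need.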


This year, I also tallied which candy the trick-or-treaters chose. The figure below plots the resulting data. If the line ends early, it means we ran out of that kind of candy. As my wife predicted, the kids’ taste is basically the inverse of ours, which, as one costumed adult chaperoning his child pointed out, is “perfect.”


To collect the data, I sat on my front porch in a beach chair with a notepad, greeted the arriving kids, asked them to pick one, and then jotted tick marks as they left. Colleague Ali Necamp suggested that I put the candies in separate containers to make it easier to track who took what; I did, and she was right. Only a couple of people asked me why the candies were laid out in bins, and I clearly heard one kid approaching the house ask, “Mommy, why is that man sitting on the porch?”

Measurement Is Hard, Especially of Politics, and Everything Is Political

If occasional readers of this blog remember only one thing from their time here, I’d like it to be this: we may be getting better at measuring political things around the world, but huge gaps remain, sometimes on matters that seem basic or easy to see, and we will never close those gaps completely.

Two items this week reminded me of this point. The first came from the World Bank, which blogged that only about half of the countries they studied for a recent paper had “adequate” data on poverty. As a chart from an earlier World Bank blog post showed, the number of countries suffering from “data deprivation” on this topic has declined since the early 1990s, but it’s still quite large. Also notice that the period covered by the 2015 study ends in 2011. So, in addition to “everywhere”, we’ve still got serious problems with the “all the time” part of the Big Data promise, too.


The other thing that reminded me of data gaps was a post on the Lowy Institute’s Interpreter blog about Myanmar’s military, the Tatmadaw. According to Andrew Selth,

Despite its dominance of Burma’s national affairs for decades, the Tatmadaw remains in many respects a closed book. Even the most basic data is beyond the reach of analysts and other observers. For example, the Tatmadaw’s current size is a mystery, although most estimates range between 300,000 and 350,000. Official statistics put Burma’s defence expenditure this year at 3.7 % of GDP, but the actual level is unknown.

This kind of situation may be especially pernicious. It looks like we have data—350,000 troops, 3.7 percent of GDP—but the subject-matter expert knows that those data are not reliable. For those of us trying to do cross-national analysis of things like conflict dynamics or coup risk, the temptation to plow ahead with the numbers we have is strong, but we shouldn’t trust the inferences we draw from them.

The size and capability of a country’s military are obviously political matters. It’s not hard to imagine why governments might want to mislead others about the true values of those statistics.

Measuring poverty might seem less political and thus more amenable to technical fixes or workarounds, but that really isn’t true. At each step in the measurement process, the people being observed or doing the observing may have reasons to obscure or mislead. Survey respondents might not trust their observers; they may fear the personal or social consequences of answering or not answering certain ways, or just not like the intrusion. When the collection is automated, they may develop ways to fool the routines. Local officials who sometimes oversee the collection of those data may be tempted to fudge numbers that affect their prospects for promotion or permanent exile. National governments might seek to mislead other governments as a way to make their countries look stronger or weaker than they really are—stronger to deter domestic and international adversaries or get a leg up in ideological competitions, or weaker to attract aid or other help.

As social scientists, we dream of data sets that reliably track all sorts of human behavior. Our training should also make us sensitive to the many reasons why that dream is impossible and, in many cases, undesirable. Measurement begets knowledge; knowledge begets power; and struggles over power will never end.

The State of War in Syria

I’m just now seeing Bridget Conley’s recent post on the state of war in Syria, which appeared on the World Peace Foundation’s Reinventing Peace blog several days ago. I agree wholly with her diagnosis:

Critics of either U.S. or Russian policy would prefer the rhetorical simplicity of merely pointing out flaws in the other’s position. What is really the problem is that both want war.

Russia now embraces war as a means to ensure that its client, the Assad regime, remains in power. But the U.S. also embraces war as a means to try to achieve regime change and, presumably, other regional and global ends as well. If the Obama administration were primarily concerned with fostering peace or stability and minimizing civilian casualties, it probably should have taken a softer line on President Assad’s status much earlier in this conflict. Instead, it has continued to insist that the conflict cannot end without his departure from power, and it has deepened its support for militias seeking to attain that goal by force.

As Bridget argues, “The most likely outcome of all these pro-war positions is continued conflict.” That’s what the scholarship on foreign intervention in civil wars tells us to expect, and that’s what we’ve seen in Syria for the past few years.

On the alternatives, though, I am less hopeful than Bridget seems to be. Instead of trying to win two consecutive wars—one to topple Assad, and then another to rule post-Assad Syria—Bridget proposes this:

If protection of civilian lives and carving a greater space for democratic practice is the desired outcome, then it’s time to seize the moment and negotiate, playing hardball for a political solution that provides institutional guarantees for democratizing processes.

But here is the conundrum: How can the U.S. “play hardball” in negotiations over Syria’s fate if it does not wield a credible threat to impose some costly punishment on parties that refuse to negotiate, or that negotiate but threaten to renege on any deal reached? And, given the current state of this conflict, how can it credibly threaten to punish defectors from any deal without fighting? What other threats are going to be so costly that the warring parties would prefer a certain outcome in which they mostly lose to the present uncertainty in which they might win and, in some cases, are profiting along the way?

Alternatively, the U.S. could simply pull back from the fight and leave it to the belligerents and their other patrons to sort out. In her post, though, Bridget alludes to one reason the U.S. has not committed to a hands-off approach: the U.S. is not acting alone, and its ostensible allies in this conflict would carry on without its participation. By keeping its hands in the war, the Obama administration apparently sustains its hope of managing that coalition, and of gaining leverage on other issues beyond Syria. As far as I can tell, the administration also seems to accept the claim that any diminution of U.S. involvement in Syria automatically and durably concedes power to its Russian and Iranian rivals.

Another option is escalation—fight harder. As Dan Drezner recently pointed out, though, escalation only makes sense if you believe that fighting harder will push the war onto a preferred path at an acceptable cost. Like Dan, I haven’t yet heard a convincing description of how that would occur. Even if you manage to win the war to topple Assad, you then have to win the post-war fight and contain the regional and global repercussions, and every recent iteration of this approach has ended poorly. With so many players committed to working at cross-purposes, I cannot imagine how this iteration would be different.

What we’re left with is foreign policy as a form of witchcraft. As the warring parties fight, various onlookers mumble incantations, wave herbs, and dole out potions. They have faith in the effectiveness of these traditional practices. When events fail to take the desired turn, evil spirits are to blame, and the answer is more mojo. If events ever do turn favorably, everyone swears it was his last spell that did it.

Personally, I remain unconvinced that a hands-off approach would be worse than the status quo. Instead of investing more in fighting and killing, why not invest in opening our doors wider to refugees from this war and helping them resettle here? I know the answer to that question: because U.S. domestic politics won’t allow it. It’s a fantasy. But then, so is the delusion of control that has us investing in the further destruction of Syria, and only one of those two fantasies involves the U.S. government spending its money and sending its people to kill other people.

Which NFL Teams Are the Biggest Surprises of 2015 So Far?

We’re now 4.0625 weeks into the NFL’s 2015 regular season. (If you don’t know what the NFL is, you should probably stop reading now.) That’s about one-quarter of the whole 256-game shebang, enough to start taking stock of preseason predictions. So I got to wondering: Which teams have been the biggest surprises so far?

To get one answer to this question, I downloaded game results from Pro-Football-Reference.com (here) and compared them to the central tendencies of my preseason predictive simulations (here). The mean error of the predictions for each team so far is plotted below. The error in this case is the difference between the number of points by which the team was expected to win or lose each game and the number of points by which it actually won or lost. For example, my simulations had the Colts, on average, winning this week’s Thursday-night game against the Texans by 4, but they actually won by 7. That’s an error of +3 for the Colts and -3 for Houston. The mean error is the average of those errors across all games played so far. So, a positive mean error (blue dots) means the team is over-performing relative to the preseason predictions, while a negative mean error (red dots) means it’s under-performing.
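The error calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the plot: the function name `mean_errors` and the tuple layout are hypothetical, and the only game included is the Colts-Texans example from the text.

```python
from collections import defaultdict

def mean_errors(games):
    """Return each team's mean prediction error.

    games: iterable of (team, opponent, predicted_margin, actual_margin),
    with both margins stated from `team`'s point of view.
    """
    errors = defaultdict(list)
    for team, opponent, predicted, actual in games:
        err = actual - predicted       # positive means the team over-performed
        errors[team].append(err)
        errors[opponent].append(-err)  # the opponent's error is the mirror image
    return {team: sum(errs) / len(errs) for team, errs in errors.items()}

# The Colts were expected to win by 4 and actually won by 7.
games = [("Colts", "Texans", 4, 7)]
print(mean_errors(games))  # {'Colts': 3.0, 'Texans': -3.0}
```

Recording each game once and mirroring the error for the opponent guarantees that the errors across all teams sum to zero.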


Most of those results won’t surprise regular NFL watchers. The New York Football Jets finished 4–12 last year and ranked near the bottom in my preseason wiki survey, but they’re off to a 3–1 start this year. The Falcons, who went 6–10 in 2014 and garnered a low-middle score in the wiki survey, are undefeated after four weeks. At the other end of the scale, the Dolphins got a high-middle score in the preseason survey, but they have stumbled to a 1–3 start.

It’s also interesting (to me, anyway) to note how the team-specific errors are only loosely correlated with differences between predicted and observed records. For example, the Giants are only 2–2 so far this year, but they show up as one of the biggest over-performers of the first four weeks. That’s partly because both of those two losses were close games that could easily have flipped the other way. The Giants were expected to be on the bad side of mediocre, but they’ve been competitive in every game so far. Ditto for the Ravens, who only show up as mild under-performers but have a 1–3 record (sob). At least three of those four games were expected to be close, and all of them turned on late scores; unfortunately, only one of those four late turns broke in Baltimore’s favor.

This exercise is only interesting if the preseason predictions on which we’re basing the calls about over- or under-performance are sound. So far, they look pretty solid. After four weeks, the root mean squared error for the predicted net scores is 12.8, and the mean squared error is 165. Those look large, but I think they’re in line with other preseason score forecasts. If we convert the predicted net scores to binary predicted outcomes, the model is 40–23 after four weeks, or 41–23 if we include last night’s Colts-Texans game. That’s not exactly clairvoyant, but it beats eight of ESPN’s 13 experts and matches one more, and they make their predictions each week with updated information.
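For reference, the two accuracy summaries above can be sketched as follows. The function name `score_forecasts` and the sample margins are hypothetical, and the treatment of ties (any game where the predicted or actual margin is zero counts as a miss) is an assumption for this example, not a rule stated in the post.

```python
import math

def score_forecasts(predicted, actual):
    """Summarize forecast accuracy for paired predicted and actual net scores.

    Returns (mse, rmse, correct, wrong), where a pick is correct when the
    predicted and actual margins have the same sign.
    """
    errors = [a - p for p, a in zip(predicted, actual)]
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    # Assumption: a zero margin on either side is scored as a miss.
    correct = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    wrong = len(errors) - correct
    return mse, rmse, correct, wrong

# Hypothetical margins for three games, invented for illustration.
predicted = [4, -3, 6]
actual = [7, 3, -1]

mse, rmse, correct, wrong = score_forecasts(predicted, actual)
print(f"MSE={mse:.1f} RMSE={rmse:.1f} record={correct}-{wrong}")
```

Because the RMSE is just the square root of the MSE, the two figures reported above agree with each other: the square root of 165 is about 12.8.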

A Fictional But Telling Take on Policy Relevance

I finally read and really enjoyed Todd Moss’s first novel, The Golden Hour. It’s a thriller starring Judd Ryker, a political scientist who gets pulled into service at the State Department to help apply a theory he developed on how to nip coups and civil wars in the bud. Before he’s offered that government job, Ryker comes to Washington to brief a small group at State on his ideas. At that point, Ryker has written about his theory but not really tested it. Here’s how the briefing ends:

“What is driving the results on coups? How can you explain what’s so special about timing? I understand the idea of a Golden Hour, but why does it exist?”

“We don’t really know. We can theorize that it probably has something to do with the dynamics of consolidating power after seizure. The coup makers must line up the rest of the security forces and maybe buy off parliament and other local political leaders before those loyal to the deposed president are able to react and countermove. It’s a race for influence. But these are just hypotheses.”

“What about external intervention? Does it matter if an external force gets involved diplomatically?” asked one staffer.

“Or militarily?” interjected another.

“We don’t have classifications for intervention, so it’s not in there,” replied Judd. “The numbers can’t tell us. So we don’t know. I guess we would—”

Parker interrupted abruptly. “But in your expert opinion, Ryker, does it matter? Would it make a difference? Does the United States need to find ways to intervene more rapidly in emerging crises in the developing world? Can we prevent more wars and coups by reacting more quickly?”

Judd looked around the room at all the eyes locked on him. My numbers don’t answer that question. Isn’t that what you guys are here for?

But instead he stood up straight, turned to look Landon Parker directly in the eyes, and said simply, “Yes.”

I think that passage says more about the true nature of the “policy relevance” dance than most of the blog posts I’ve read on that subject. It’s fiction, of course, but it’s written by someone who knows well both sides of that exchange, and it rang true to me.

As we learn later in the novel, the people Ryker was briefing already had a plan, and Ryker’s theory of a Golden Hour—a short window when emerging crises might still be averted—aligned nicely with their existing agenda. This is true, in part, because Ryker’s theory supports the view that U.S. policy makers can and should play an active role in defusing those crises. If Ryker’s theory had implied that U.S. involvement would only make things worse, he would never have been invited to give that briefing.

Scholars who spend time talking to policy makers joke about how much those audiences don’t like to hear “I don’t know” as an answer to questions about why something is happening. That’s real, but I think those audiences might get even more frustrated at hearing “There’s nothing you can do about it” or “Your efforts will only make things worse” in response to questions about what they should do. I suspect that many of those people pursued or accepted government jobs to try to effect change in the world—to “make a difference”—and they don’t want to sit idly while their short windows of opportunity pop open and slam shut.

Then there is Ryker’s decision to submit to his audience’s agenda. Ryker doesn’t know the answer to Parker’s question, and he knows he doesn’t know. Yet, in the moment, he chooses to feign confidence and say “yes” anyway.

The novel hints that this performance owes something to Ryker’s desire to please a mentor who has encouraged him to go into public service. That feels plausible to me, but I would also suspect a deeper and more generic motive: a desire to be wanted by powerful people, to “matter.” If my own experience is any guide, I’d say that we are flattered by attention, and we are eager to stand out. Having government officials ask for your advice feeds both of those cravings.

In short, selection effects abound. The subset of scholars who choose to pursue policy relevance is not a random sample of all academics, and the subset of that subset whose work resonates with policy audiences is not a random sample, either. Both partners in this dance have emotional agendas that draw them to each other and then shape their behavior in ways that don’t always align with their ostensible professional ideals: to advance national interests, and to be true to the evidence.

I won’t spoil the novel by telling you how things turn out in Ryker’s case. Instead, I’ll just invite those of you who ever find yourselves on one side or the other of these exchanges—or hope to land there—to consider why you’re doing what you’re doing, and to consider the alternatives before acting.

Military Coup in Burkina Faso

Yesterday, Burkina Faso suffered its second military coup in less than a year. Just a few weeks before scheduled national elections, members of the presidential guard (or RSP, per its French initials) arrested the interim president and prime minister and dissolved the government those men led. According to Reuters:

“The patriotic forces, grouped together in the National Council for Democracy, have decided today to put an end to the deviant transitional regime,” the military official said on RTB state television.

“The transition has progressively distanced itself from the objectives of refounding our democracy,” he said, adding that a revision of the electoral law that blocked supporters of Compaore from running in the planned Oct. 11 had “created divisions and frustrations amongst the people.”

My knowledge of politics in Burkina Faso is shallow, but if I had to guess why this coup happened now, this, also from Reuters, is what I would spotlight:

Burkina Faso’s powerful presidential guard should be dismantled, according to a commission charged with proposing reforms…

In a report submitted to Prime Minister Yacouba Isaac Zida, himself a former commander in the RSP, the national reconciliation and reform commission on Monday described the 1,200 troop strong unit as “an army within an army”.

It called for the regiment to be broken up and its members redeployed within the framework of a broader reform of the military.

In a July post, I spotlighted regional experts’ concerns about another coup by Burkina Faso’s presidential guard, observing how those concerns encapsulated the dilemma that confronts civilian politicians who wish to deepen democracy—or, more cynically, their own power—by strengthening their control over the military. Stronger civilian control means fewer military prerogatives, and as a general rule, political actors prefer not to cede power. I wonder if the RSP saw that reform commission’s report as a harbinger of its fate under the next batch of elected civilian leaders and decided to act now, against the shallow-rooted interim government.

In this year’s statistical assessments of coup risk, Burkina Faso ranked fifth in the world, in no small part because of the coup it suffered last year. As I discussed in a blog post a few years ago, when Mali got hit by its second coup in a 10-month span, coup attempts amplify uncertainty in ways that can keep a country on edge for years. Whether or not the latest coup attempt sticks and without touching the forecasting algorithm, I can tell you that Burkina Faso will land near the top of the global list in next year’s statistical assessments of coup risk, too.

A First-Person Reminder of How Not to Do Statistics and Science

I recently re-learned a lesson in statistical and scientific thinking by getting—or, really, putting—some egg on my face. I think this experience has some didactic value, so I thought I would share it here.

On Monday, the New York Times ran a story claiming that “cities across the nation are seeing a startling rise in murders after years of declines.” The piece included a chart showing year-over-year change in murder counts in 10 cities, many of them substantial, and it discussed various ideas about why homicide rates are spiking now after years of declines.

I read the piece and thought of claims made in the past decade about the relationship between lead (the metal) and crime. I don’t know the science on that topic, but I read about it in 2013 in Mother Jones, where Kevin Drum wrote:

We now have studies at the international level, the national level, the state level, the city level, and even the individual level. Groups of children have been followed from the womb to adulthood, and higher childhood blood lead levels are consistently associated with higher adult arrest rates for violent crimes. All of these studies tell the same story: Gasoline lead is responsible for a good share of the rise and fall of violent crime over the past half century.

When I read the NYT piece, though, I thought: If murder rates are now spiking in the U.S. but ambient lead levels remain historically low, doesn’t that disprove or at least undercut the claim that lead was responsible for the last crime wave? So I tweeted:

Jordan Wilcox pushed back:

Jordan was right, and I had made two basic mistakes en route to my glib but erroneous conclusion.

First and dumbest, I didn’t check the numbers. The Times only reported statistics from a small and non-representative sample of U.S. cities, and it only compared them across two years. In my experience, that’s not uncommon practice in popular-press trend pieces.

As Bruce Frederick argues in a Marshall Project commentary responding to the same NYT piece, however, that’s not a sound way to look for patterns. When Frederick took a deeper look at the latest police data across a more representative set of cases, he found that almost no U.S. cities appear to be experiencing changes in murder rates outside what we would expect from normal variation around historically low means of recent years. He concludes: “Neither the Times analysis nor my own yields compelling evidence that there has been a pervasive increase in homicides that is substantively meaningful.” On the Washington Post’s Wonkblog, Max Ehrenfreund showed the same.

Second, even with the flawed statistics I had, I didn’t think carefully about how they related to the Pb-crime hypothesis. Instead, I thought: “We are experiencing a new crime wave and lead levels are still low; therefore lead does not explain the current wave; therefore lead can’t explain the last wave, either.”

In that simple chain of reasoning, I had failed to consider the possibility that different crime waves could have different causes—or really contributing factors, as no one doing careful work on this topic would offer a monocausal explanation of crime. Just as leaded gasoline came and went, other potentially relevant “treatments” that might affect crime rates could come and go, and those subsequent variations would provide little new information about the effects of lead at an earlier time. Imagine that in the near future smoking is virtually eliminated and yet we still see a new wave of lung-cancer cases; would that new wave disprove the link between smoking and lung cancer? No. It might help produce a sharper estimate of the size of that earlier effect and give us a clearer picture of the causal mechanisms at work, but there’s almost always more than one pathway to the same outcome, and the affirmation of one does not disprove the possibility of another.

After reading more about the crime stats and thinking more about the evidence on lead, I’m back where I started. I believe that rates of homicide and other crimes remain close to historical lows in most U.S. cities, and I believe that lead exposure probably had a significant effect on crime rates in previous decades. That’s not terribly interesting, but it’s truer than the glib and provocative thing I tweeted, and it’s easier to see when I slow down and work more carefully through the basics.

