Election Monitoring : Democratization :: Drug Testing : Sport

AP reports today that the World Anti-Doping Agency (WADA) is conducting an “extraordinary” audit of Jamaica’s drug-testing agency after allegations surfaced that the Jamaican organization had failed to do its job for most of the six months leading to the London Games.

“There was a period of — and forgive me if I don’t have the number of months right — but maybe five to six months during the beginning part of 2012 where there was no effective operation,” WADA Director General David Howman said in an interview. “No testing. There might have been one or two, but there was no testing. So we were worried about it, obviously.”

As a fan of track and field and of cycling, I read AP’s story and got a little sadder. At this point, you can’t see stellar performances from guys like Usain Bolt and not wonder if banned drugs are what gave those superstars their crucial edge, and failures like this one don’t inspire much confidence.

As a professional observer of democratization, though, I read the AP story and was reminded of the challenges of international election monitoring. Both anti-doping and international election observation efforts involve under-resourced and overly-politicized watchdogs deploying occasional and imperfect tests to try to catch determined cheaters whose careers hang in the balance. Because the stakes are so high, the screening systems we devise are tuned to favor the cheaters. We tolerate errors of omission, or false negatives, to avoid accidentally ruining the reputations of people who aren’t doping or rigging elections, but in so doing, we tolerate a higher rate of cheating than I think most of us realize.

In sport, the bias in the system is encapsulated in the defensive deployment of the phrase “never failed a drug test” by athletes who later admit they cheated. The AP story on Jamaica’s breakdown applies that phrase to world’s fastest man Usain Bolt, and “never a failed test” was a favorite weapon of cyclist Lance Armstrong’s right up until he finally confessed to years of doping.

As statistician Kaiser Fung argues, the fact that an athlete has passed lots of drug tests doesn’t tell us a whole lot when the tests are deliberately skewed to minimize the chance of falsely accusing a “clean” athlete. When we set the threshold for a positive test very high, we create a system in which most cheaters will test negative most of the time. Under these conditions, even a large number of passed tests isn’t especially informative, and the circumstantial evidence—the stories from Lance Armstrong’s trainers and teammates, or the peculiar collapse of Jamaican drug-testing during a critical training period ahead of the London games—should be considered as well.
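
To make that point concrete, here is a minimal sketch of how little a string of passed tests can move our beliefs when the test rarely catches cheaters. Everything numeric here is an assumption chosen for illustration (the prior, the sensitivity, the specificity), and the function name is hypothetical; the sketch also assumes test results are independent given an athlete's true status, which real testing regimes may not satisfy.

```python
def prob_doping_given_passes(prior, sensitivity, specificity, n_passed):
    """Posterior probability of doping after n_passed negative tests.

    prior       -- assumed share of athletes who dope
    sensitivity -- assumed chance a single test catches a doper
    specificity -- assumed chance a single test clears a clean athlete
    Assumes results are independent given the athlete's true status.
    """
    p_pass_if_doping = 1.0 - sensitivity
    p_pass_if_clean = specificity
    doping = prior * p_pass_if_doping ** n_passed
    clean = (1.0 - prior) * p_pass_if_clean ** n_passed
    return doping / (doping + clean)


# Illustrative inputs only: a lenient test that catches 5% of dopers per test.
for n in (0, 5, 10, 20):
    posterior = prob_doping_given_passes(0.2, 0.05, 0.999, n)
    print(f"{n:2d} passed tests -> P(doping) = {posterior:.3f}")
```

The lower the assumed sensitivity, the more slowly the posterior falls with each additional passed test, which is why circumstantial evidence can still carry real weight.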

The election-monitoring equivalent of “never failed a drug test” is the phrase “largely free and fair.” International election observation missions typically deploy staffs of a couple dozen people that are assembled in an ad hoc fashion and have to cover a wide range of issues across whole countries. The missions often expand greatly around election day, but most polling sites still go unobserved, and technical prowess is not necessarily the primary consideration in the selection (or self-selection) of those short-term observers. The quality of the resulting arrangements varies widely, but even in the best of cases, these missions leave plenty of room for determined cheaters to fix the process in their favor.

If the main goal of these missions were to cast doubt on suspicious elections, this piecemeal approach would probably work fine. Even these shoestring missions often catch whiffs of foul play and say so in their reports. For better or for worse, though, these missions also serve political and diplomatic functions, and those other concerns often compel them to soft-pedal their criticisms. Observers want to catch cheats, but they also want to avoid becoming the catalysts of a political crisis and don’t want to discourage governments from participating in the international inspections regime. So the system bends to minimize the risk of false accusations, and we end up with a steady stream of “mostly free and fair” topline judgments that agents of electoral fraud and abuse can then repeat like a mantra to defeat or deflate their domestic political opponents.

Unfortunately, there’s no easy fix in either case. As Kaiser Fung also points out, these trade-offs are unavoidable when trying to detect hard-to-observe phenomena. As long as the tests are imperfect, any reduction in the rate of one kind of error will increase the rate of the other kind. You can slide the threshold up and down, but you can’t wish the errors away.
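
The trade-off can be sketched with two overlapping bell curves. The distributions below are assumptions made up for illustration, not real doping or election data: one curve for the test marker among clean athletes and one, shifted upward, among dopers. Sliding the cutoff changes which error we make more often, but never removes both.

```python
from statistics import NormalDist

# Hypothetical marker distributions (assumed values, not real data).
clean = NormalDist(mu=0.0, sigma=1.0)
doping = NormalDist(mu=1.5, sigma=1.0)


def error_rates(threshold):
    """Return (false_positive_rate, false_negative_rate) for a given cutoff."""
    false_positive = 1.0 - clean.cdf(threshold)  # clean athlete gets flagged
    false_negative = doping.cdf(threshold)       # doper passes the test
    return false_positive, false_negative


for threshold in (1.0, 2.0, 3.0):
    fp, fn = error_rates(threshold)
    print(f"threshold={threshold:.1f}  false positives={fp:.4f}  false negatives={fn:.4f}")
```

With these assumed curves, each step up in the threshold shrinks the false-positive rate and grows the false-negative rate. The same logic would apply to an election mission deciding how much evidence it needs before calling a vote fraudulent.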

In sport, we have to decide if we care enough about doping to risk damaging the careers of more “innocent” athletes in pursuit of the (probably many) cheaters who are getting away with it under the current system. In election observation, we have to wonder if the international missions’ declarations of “free and fair” have become so devalued that they don’t serve their intended purpose, and if so, to ask if we’re willing to see more governments disengage from the regime in exchange for a sharper signal. If these choices were easy, we wouldn’t still be talking about them.

Forecasting Round-Up No. 4

Another in an occasional series of posts calling out interesting work on forecasting. See here, here, and here for earlier ones.

1. A gaggle of researchers at Penn State, including Phil Schrodt, have posted a new conference paper (PDF) showing how they are using computer-generated data on political interactions around the world (the oft-mentioned GDELT) to forecast various forms of political crisis with respectable accuracy.

One important finding from their research so far: models that mix dynamic data on political interactions with slow-changing data on relevant structural conditions (political, social, economic) produce more accurate forecasts than models that use only one or the other. That’s not surprising, but it is a useful confirmation nonetheless. Thanks to GDELT’s public release, I predict that we’ll see a lot more social-science modelers doing that kind of mixing in the near future.

2. Kaiser Fung reviews Predictive Analytics, a book by Eric Siegel. I haven’t read it, but Kaiser’s review makes me think it would be a good addition to my short list of recommended readings for forecasters.

3. Finally, the 2013 edition of the Failed States Index (FSI) is now up on Foreign Policy’s web site (here). I call it out here to air a few grievances.

First, it makes me a little crazy that it’s hard to pin down exactly what this index is supposed to do. Is FSI meant to summarize recent conditions or to help forecast new troubles down the road? In their explication of the methodology behind it, the makers of the FSI acknowledge that it’s largely the former but also slide into describing it as an early-warning tool. And what exactly is “state failure,” anyway? They never quite say, which makes it hard to use the index as either a snapshot or a forecast.

Second, as I’ve said before on this blog, I’m also not a big fan of indices that roll up so many different things into a single value on the basis of assumptions alone. Statistical models also combine a lot of information, but they do so with weights that are derived from a systematic exploration of empirical evidence. FSI simply assumes all of its 12 components are equally relevant when there’s ample opportunity to check that assumption against the historical record. Maybe some of the index’s components are more informative than others, so why not use models to try to find out?

Last but not least, on the way FSI is presented, I think the angry reactions it elicits (see comments on previous editions or my Twitter feed whenever FSI is released) are a useful reminder of the risks of presenting rank-ordered lists based on minor variations in imprecise numbers. People spend a lot of time venting about relatively small differences between states (e.g., “Why is Ethiopia two notches higher than Syria?”) when those aren’t very informative, and aren’t really meant to be. I’ve run into the same problem when I’ve posted statistical forecasts of things like coup attempts and nonviolent uprisings, and I’m increasingly convinced that those rank-ordered lists are a distraction. To use the results without fetishizing the numbers, we might do better to focus on the counter-intuitive results (surprises) and on cases whose scores change a lot across iterations.

Some Suggested Readings for Political Forecasters

A few people have recently asked me to recommend readings on political forecasting for people who aren’t already immersed in the subject. Since the question keeps coming up, I thought I’d answer with a blog post. Here, in no particular order, are books (and one article) I’d suggest to anyone interested in the subject.

Thinking, Fast and Slow, by Daniel Kahneman. A really engaging read on how we think, with special attention to cognitive biases and heuristics. I think forecasters should read it in hopes of finding ways to mitigate the effects of these biases on their own work, and of getting better at spotting them in the thinking of others.

Numbers Rule Your World, by Kaiser Fung. Even if you aren’t going to use statistical models to forecast, it helps to think statistically, and Fung’s book is the most engaging treatment of that topic that I’ve read so far.

The Signal and the Noise, by Nate Silver. A guided tour of how forecasters in a variety of fields do their work, with some useful general lessons on the value of updating and being an omnivorous consumer of relevant information.

The Theory that Would Not Die, by Sharon Bertsch McGrayne. A history of Bayesian statistics in the real world, including successful applications to some really hard prediction problems, like the risk of accidents with atomic bombs and nuclear power plants.

The Black Swan, by Nassim Nicholas Taleb. If you can get past the derisive tone (and I’ll admit, I initially found that hard to do), this book does a great job explaining why we should be humble about our ability to anticipate rare events in complex systems, and how forgetting that fact can hurt us badly.

Expert Political Judgment: How Good Is It? How Can We Know?, by Philip Tetlock. The definitive study to date on the limits of expertise in political forecasting and the cognitive styles that help some experts do a bit better than others.

Counterfactual Thought Experiments in World Politics, edited by Philip Tetlock and Aaron Belkin. The introductory chapter is the crucial one. It’s ostensibly about the importance of careful counterfactual reasoning to learning from history, but it applies just as well to thinking about plausible futures, an important skill for forecasting.

The Foundation Trilogy, by Isaac Asimov. A great fictional exploration of the Modernist notion of social control through predictive science. These books were written half a century ago, and it’s been more than 25 years since I read them, but they’re probably more relevant than ever, what with all the talk of Big Data and the Quantified Self and such.

“The Perils of Policy by P-Value: Predicting Civil Conflicts,” by Michael Ward, Brian Greenhill, and Kristin Bakke. This one’s really for practicing social scientists, but still. The point is that the statistical models we typically construct for hypothesis testing often won’t be very useful for forecasting, so proceed with caution when switching between tasks. (The fact that they often aren’t very good for hypothesis testing, either, is another matter. On that and many other things, see Phil Schrodt’s “Seven Deadly Sins of Contemporary Quantitative Political Analysis.”)

I’m sure I’ve missed a lot of good stuff and would love to hear more suggestions from readers.

And just to be absolutely clear: I don’t make any money if you click through to those books or buy them or anything like that. The closest thing I have to a material interest in this list is my ongoing professional collaboration with three of the authors listed here: Phil Tetlock, Phil Schrodt, and Mike Ward.

The Importance of Thinking Statistically

In his enjoyable and accessible book, Numbers Rule Your World, statistician and blogger Kaiser Fung talks a lot about the value of “thinking statistically.” I was reminded of this point twice in the past 24 hours in ways that illustrate some common traps in our causal reasoning and, more generally, the difficulties of designing useful research.

First, I started my Monday morning with a deeply disturbing but also annoying article in the New York Times, about a Tennessee pastor and his wife whose self-published book advocates corporal punishment as a basic part of child-rearing. The article was really a trend story in two parts. To begin, it notes the book’s commercial success, which is linked to a wider resurgence in the use of corporal punishment in America. “More than 670,000 copies of the Pearls’ self-published book are in circulation,” we’re told, “and it is especially popular among Christian home-schoolers.” The real news hook, however, came from the second supposed trend: the deaths of three horribly abused kids in families that had been exposed to the Pearls’ teachings. “Debate over the Pearls’ teachings…gained new intensity after the death of a third child, all allegedly at the hands of parents who kept the Pearls’ book, To Train Up a Child, in their homes.”

The stories of extreme child abuse and neglect are the disturbing part of the article, and they are hard to read. Even so, the “data scientist” in me still managed to get annoyed by the insinuation that the Pearls are partly responsible for the three killings the article describes. The article’s author doesn’t flat-out blame the Pearls for the deaths of those three children, but he certainly entertains the idea.

In my view, this is a classic case of inference by anecdote. We see what looks like a cluster of related events (the three deaths); in looking at those events, we see exposure to a common factor that’s plausibly related to them (the Pearls’ book); and so we infer that the factor caused or at least contributed to the events’ occurrence. The logic is the same as Michele Bachmann’s absurd reasoning about vaccine safety: I met someone who said her daughter got vaccinated and suffered harm soon after; therefore vaccines are harmful, and parents should consider not using them.

Maybe the Pearls’ teachings do increase the risk of child abuse. To see if that’s true, though, we would need a lot more information. What the three deaths give us is a start on the numerator on one side of a comparison of rates of deadly child abuse among parents who have been exposed to the Pearls’ teachings and parents who have not.

Can we fill in any of those other blanks? Well, the advocacy group Childhelp tells us that more than 1,800 children die each year in the United States from child abuse and neglect (5 per day times 365 days), and the CIA Factbook says there are more than 60 million children under 14 in the U.S. That works out to an annual death rate of about 0.003% (1,800 divided by 60 million). Meanwhile, the New York Times story tells us that the Pearls’ book is now in 670,000 households. If we assume that there are an average of two children in each of those households, that works out to 1.34 million kids in “exposed” families. At the national rate, we would expect about 40 deaths a year in a group that size (0.003% of 1.34 million). For the risk to kids in those exposed families to be higher than the risks kids face in the general population, then, we would need to see more than 40 deaths from child abuse and neglect each year in that group of 1.34 million. To be confident that the difference wasn’t occurring by chance, we would need to see many more than 40 deaths from child abuse each year in that group.
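
The back-of-the-envelope arithmetic above can be written out as a short sketch. The input figures are the rough ones quoted in the text; the Poisson model for chance variation, the 5% cutoff, and the function names are assumptions added here for illustration, not part of the original calculation.

```python
import math

# Rough figures quoted in the text.
annual_deaths = 1_800
children_in_us = 60_000_000
households_with_book = 670_000
kids_per_household = 2  # assumed average

base_rate = annual_deaths / children_in_us
exposed_kids = households_with_book * kids_per_household
expected_deaths = base_rate * exposed_kids  # deaths expected at the national rate


def poisson_tail(k, lam):
    """P(X >= k) for a Poisson variable with mean lam."""
    term = math.exp(-lam)  # P(X = 0)
    cumulative = 0.0
    for i in range(k):
        cumulative += term
        term *= lam / (i + 1)  # step from P(X = i) to P(X = i + 1)
    return 1.0 - cumulative


def smallest_significant_count(lam, alpha=0.05):
    """Smallest death count whose chance probability falls below alpha."""
    k = 0
    while poisson_tail(k, lam) >= alpha:
        k += 1
    return k


print(f"base rate: {base_rate:.5%}")
print(f"exposed kids: {exposed_kids:,}")
print(f"expected deaths at national rate: {expected_deaths:.1f}")
print(f"count needed to look unusual: {smallest_significant_count(expected_deaths)}")
```

Under these assumptions, the count needed to rule out chance sits well above the expected value, which is the sense in which "many more than 40" deaths would be needed.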

Given the national rate of deaths from child abuse and neglect, it’s highly unlikely that the three killings discussed in the Times story are the only ones that have occurred in households with the Pearls’ book. Even so, once we widen our view beyond that “cluster” of three deaths and try to engage in a little comparison, it should become clear that we really don’t know whether or not the Pearls’ book is putting kids at increased risk of fatal abuse, and it’s arguably irresponsible to imply that it is on the basis of those three deaths alone.

The second time my statistical alarm went off in the past 24 hours was during a conversation on Twitter about the effectiveness of U.S. government support for pro-democracy movements in countries under authoritarian rule. As I’ve articulated elsewhere on this blog (see here and here, for example), I’m skeptical of the claim that U.S. support is required to help activists catalyze democratization and believe that it can sometimes even hurt that cause. That claim got me in a debate of sorts with Catherine Fitzpatrick, a human-rights activist who strongly believes U.S. support for democracy movements in other countries is morally and practically necessary. To rebut my argument, she challenged me to find “a carefully calibrated US-hands-off [movement] that succeeded in the world against a deadly authoritarian regime.”

She’s right that there aren’t many. The problem with reaching a conclusion from that fact alone, though, is that there aren’t many authoritarian regimes in which the U.S. government hasn’t provided some support for pro-democracy advocates. To infer something about the effects of U.S. democracy-promoting activity, we need to compare cases where the U.S. got involved with ones where it didn’t, and there are very few cases in the latter set. In experimental-design terms, we’ve got a large test group and a tiny control group.

Making the inferential job even tougher, countries aren’t randomly assigned to those two groups. I’m not privy to the conversations where these decisions are made, but I presume the U.S. government prefers to support movements in cases where it believes those efforts will be more effective. If those judgments are better than chance, then there is an element of self-fulfilling prophecy to the observation my Twitter debater made. This is what statisticians call a selection effect. The fact that democratization rarely occurs in the small set of cases where the U.S. does not publicly promote it (North Korea comes to mind) may simply be telling us that the U.S. government is capable of recognizing situations where its efforts are almost certain to be wasted and acts accordingly.
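
A toy simulation can show how a selection effect alone could produce that pattern. This is a sketch under invented assumptions, not a model of real cases: each simulated country gets a random chance of democratizing, the funder sees a noisy version of that chance and backs all but the least promising cases, and the support itself is given no effect at all.

```python
import random


def simulate(n_countries=100_000, seed=0):
    """Return (success rate with support, success rate without support).

    All parameters are hypothetical. Support has NO causal effect here;
    it is only handed out where prospects already look better.
    """
    rng = random.Random(seed)
    supported = [0, 0]    # [cases, successes]
    unsupported = [0, 0]
    for _ in range(n_countries):
        prospects = rng.random()                   # true chance of democratizing
        signal = prospects + rng.gauss(0.0, 0.2)   # funder's noisy assessment
        gets_support = signal > 0.2                # back all but the bleakest cases
        democratizes = rng.random() < prospects    # outcome ignores support
        bucket = supported if gets_support else unsupported
        bucket[0] += 1
        bucket[1] += democratizes
    return supported[1] / supported[0], unsupported[1] / unsupported[0]


with_support, without_support = simulate()
print(f"success rate with support:    {with_support:.2f}")
print(f"success rate without support: {without_support:.2f}")
```

Even with zero true effect, the supported cases succeed more often in this setup, simply because they were chosen for their better prospects. That is why the raw comparison cannot settle the question either way.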

I could go on about the difficulties of designing research on the effects of U.S. democracy-promotion efforts, but I’ll save that for another day. The big idea behind this post is that causal inference depends on careful comparisons. In the case of the New York Times story, we’re lured to infer from three deaths that the Pearls’ teachings put children at risk without considering how those kids might have fared had their parents never seen the Pearls’ book. In the case of my Twitter conversation, I’m told to understand that aggressive U.S. assistance to pro-democracy advocates makes democratization happen without considering how those advocates would have fared without U.S. help. In drawing attention to the need for thinking comparatively, I’m not claiming to have disproved those hypotheses. I’m just saying that we can’t tell without more information and, in so doing, inviting the authors of those hypotheses–and us all–to dig a little deeper before forming strong beliefs.
