If At First You Don’t Succeed

A couple of weeks ago, I blogged about a failed attempt to do some exploratory text-mining on the US National Security Strategy reports (here). That project was supposed to give me a fun way to learn the basics of text mining in R, something I’ve been eager to do of late. In writing the blog post, I had two motives: 1) to help normalize the experience of getting stuck and failing in social science and data science, and 2) to appeal for help from more experienced coders who could help get me unstuck on this particular task.

The post succeeded on both counts. I won’t pepper you with evidence on the commiseration front, but I am excited to share the results of the coding improvements. In addition to learning how to text-mine, I have also been trying to learn how to use RStudio and Shiny to build interactive apps, and this project seemed like a good opportunity to do both. So, I’ve created an app that lets users explore this corpus in three ways:

  • Plot word counts over time to see how the use of certain terms has waxed and waned over the 28 years the reports span.
  • Generate word clouds showing the 50 most common words in each of the 16 reports.
  • Explore associations between terms by picking one and seeing which 10 others are most closely correlated with it across the entire corpus.

For example, here’s a plot of change over time in the relative frequency of the term ‘terror’. Its usage spikes after 9/11 and then falls sharply when Barack Obama replaces George W. Bush as president.

NSS terror time trend

That pattern contrasts sharply with references to climate, which rarely gets mentioned until the Obama presidency, when its usage spikes upward. (Note, though, that the y-axis has been rescaled from the previous chart, so this large increase still has ‘climat’ only appearing about half as often as ‘terror’.)

NSS climat time trend

And here’s a word cloud of the 50 most common terms from the first US National Security Strategy, published in 1987. Surprise! The Soviet Union dominates the monologue.

NSS 1987 word cloud

When I built an initial version of the app a couple of Sundays ago, I promptly launched it on shinyapps.io to try to show it off. Unfortunately, the Shiny server only gives you 25 hours of free usage per billing cycle, and when I tweeted about the app, it got so much attention that those hours disappeared in a little over a day!

I don’t have my own server to host this thing, and I’m not sure when Shiny’s billing cycle refreshes. So, for the moment, I can’t link to a permanently working version of the app. If anyone reading this post is interested in hosting the app on a semi-permanent basis, please drop me a line at ulfelder <at> gmail. Meanwhile, R users can launch the app from their terminals with these two lines of code, assuming the ‘shiny’ package is already installed:

library(shiny)
runGitHub("national-security-strategy", "ulfelder")  # fetches the app from GitHub and runs it locally

You can also find all of the texts and code used in the app and some other stuff (e.g., the nss.explore.R script also implements topic modeling) in that GitHub repository, here.

A Good Dream

The novel Station Eleven—an immediate addition to my short list of favorite books—imagines the world after the global political economy has disintegrated. A flu pandemic has killed almost all humans, and the ones who remain inhabit the kinds of traveling bands or small encampments that are only vaguely familiar to most of us. There is no gasoline, no Internet, no electricity.

“I dreamt last night I saw an airplane,” Dieter whispered. They were lying a few feet apart in the dark of the tent. They had only ever been friends—in a hazy way Kirsten thought of him as family—but her thirty-year-old tent had finally fallen apart a year ago and she hadn’t yet managed to find a new one. For obvious reasons she was no longer sharing a tent with Sayid, so Dieter, who had one of the largest tents in the Symphony, had been hosting her. Kirsten heard soft voices outside, the tuba and the first violin on watch. The restless movements of the horses, penned between the three caravans for safety.

“I haven’t thought of an airplane in so long.”

“That’s because you’re so young.” A slight edge to his voice. “You don’t remember anything.”

“I do remember things. Of course I do. I was eight.”

Dieter had been twenty years old when the world ended. The main difference between Dieter and Kirsten was that Dieter remembered everything. She listened to him breathe.

“I used to watch for it,” he said. “I used to think about the countries on the other side of the ocean, wonder if any of them had somehow been spared. If I ever saw an airplane, that meant that somewhere planes still took off. For a whole decade after the pandemic, I kept looking at the sky.”

“Was it a good dream?”

“In the dream I was so happy,” he whispered. “I looked up and there it was, the plane had finally come. There was still a civilization somewhere. I fell to my knees. I started weeping and laughing, and then I woke up.”

When I left New Orleans by jet yesterday morning, only a couple of weeks after reading that book, flying—with wifi on a tablet!—felt miraculous again. As we lifted away from the airport an hour after sunrise on a clear day, I could see a dozen freighters lined up on the Mississippi, a vast industrial plant of some kind billowing steam on the adjacent shore, a railway spreading like capillaries as it ran out of the plant.

As we inhabit that world, it feels inevitable, but it was not. Our political economy is as natural as a termite mound, but it did not have to arise and cohere, to turn out like this—to turn out at all.

Nor does it have to persist. The first and only other time I visited New Orleans was in 2010, for the same conference in the same part of town—the Warehouse District, next to the river. Back then, a little closer to Katrina, visual reminders of the flood that already happened gave that part of the city an eerie feel. I stayed in a hotel a half-mile south of the conference venue, and the walk to the Hilton led me past whole blocks that were still mostly empty, fresh coats of bright paint covering the facades that water had submerged five years before.

Now, with pictures in the news of tunnels scratched out of huge snow banks in Boston and Manhattan ringed by ice, it’s the future flood that haunts New Orleans in my mind as I walk back from an excursion to the French Quarter to get the best possible version of a drink made from boiled water and beans grown thousands of miles away, scores of Mardi Gras bead strings still hanging from some gutters. Climate change is “weirding” our weather, rendering the models we use to anticipate events like Katrina less and less reliable. A flood will happen again, probably sooner than we expect, and yet here everybody is, returning and rebuilding and cavorting right where all that water will want to go.

Self Points

For the second year in a row, Dart-Throwing Chimp won Best Blog (Individual) at the Online Achievement in International Studies awards, a.k.a. the Duckies (see below). Thank you for continuing to read and, apparently, for voting.

Duckie 2015

Before the awards, Eva Brittin-Snell, a student of IR at the University of Sussex, interviewed a few of last year’s winners, including me, about blogging on international affairs. You can read her post on the SAGE Connection blog, here.

On Evaluating and Presenting Forecasts

On Saturday afternoon, at the International Studies Association’s annual conference in New Orleans, I’m slated to participate in a round-table discussion with Patrick Brandt, Kristian Gleditsch, and Håvard Hegre on “assessing forecasts of (rare) international events.” Our chair, Andy Halterman, gave us three questions he’d like to discuss:

  1. What are the quantitative assessments of forecasting accuracy that people should use when they publish forecasts?
  2. What’s the process that should be used to determine whether a gain in a scoring metric represents an actual improvement in the model?
  3. How should model uncertainty and past model performance be conveyed to government or non-academic users of forecasts?

As I did for a Friday panel on alternative academic careers (here), I thought I’d use the blog to organize my thoughts ahead of the event and share them with folks who are interested but can’t attend the round table. So, here goes:

When assessing predictive power, we use perfection as the default benchmark. We ask, “Was she right?” or “How close to the true value did he get?”

In fields where predictive accuracy is already very good, this approach seems reasonable. When the objects of the forecasts are rare international events, however, I think it is a mistake, or at least misleading. It implies that perfection is attainable, and that distance from perfection is what we care about. In fact, approximating perfection is not a realistic goal in many fields, and what we really care about in those situations is distance from the available alternatives. In other words, I think we should always assess accuracy in comparative terms, not absolute ones. So, the question becomes: “Compared to what?”

I can think of two situations in which we’d want to forecast international events, and the ways we assess and describe the accuracy of the results will differ across the two. First, there is basic research, where the goal is to develop and test theory. This is what most scholars are doing most of the time, and here the benchmark should be other relevant theories. We want to compare predictive power across nested models or models representing competing hypotheses to see which version does a better job anticipating real-world behavior—and, by implication, explaining it.

Then, of course, there is applied research, where the goal is to support or improve some decision process. Policy-making and investing are probably the two most common ones. Here, the benchmark should be the status quo. What we want to know is: “How much does the proposed forecasting process improve on the one(s) used now?” If the status quo is unclear, that already tells you something important about the state of forecasting in that field—namely, that it probably isn’t great. Even in that case, though, I think it’s still important to pick a benchmark that’s more realistic than perfection. Depending on the rarity of the event in question, that will usually mean either random guessing (for frequent events) or base rates (for rare ones).

How we communicate our findings on predictive power will also differ across basic and applied research, or at least I think it should. This has less to do with the goals of the work than with the audiences at which they’re usually aimed. When the audience is other scholars, I think it’s reasonable to expect them to understand the statistics, and so to use those statistics directly. For frequent events, Brier or logarithmic scores are often best, whereas for rare events I find that AUC scores are usually more informative, and I know a lot of people like to use F-1 scores in this context, too.
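
To make those metrics concrete, here is a toy calculation in R. The outcomes and probabilities are invented for illustration, and the last two lines implement the comparative benchmarking argued for above by scoring the base rate as if it were a forecast:

# Toy data: five observed outcomes and the corresponding forecast probabilities
obs <- c(1, 0, 0, 1, 0)
p <- c(0.8, 0.3, 0.1, 0.6, 0.2)

# Brier score: mean squared error of the forecast probabilities (lower is better)
brier <- mean((p - obs)^2)

# Logarithmic score (lower is better)
logscore <- -mean(obs * log(p) + (1 - obs) * log(1 - p))

# AUC, using 'roc.area' from the verification package
library(verification)
auc <- roc.area(obs, p)$A

# Benchmark: score the base rate as if it were a constant forecast, then
# compute a skill score (1 = perfect; 0 = no better than the base rate)
brier.base <- mean((mean(obs) - obs)^2)
skill <- 1 - brier / brier.base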

In applied settings, though, we’re usually doing the work as a service to someone else who probably doesn’t know the mechanics of the relevant statistics and shouldn’t be expected to. In my experience, it’s a bad idea in these settings to try to educate your customer on the spot about things like Brier or AUC scores. They don’t need to know those statistics, and you’re liable to come across as aloof or even condescending if you presume to spend time teaching them. Instead, I’d recommend using the practical problem they’re asking you to help solve to frame your representation of your predictive power. Propose a realistic decision process—or, if you can, take the one they’re already using—and describe the results you’d get if you plugged your forecasts into it.

In applied contexts, people often will also want to know how your process performed on crucial cases and what events would have surprised it, so it’s good to be prepared to talk about those as well. These topics are germane to basic research, too, but crucial cases will be defined differently in the two contexts. For scholars, crucial cases are usually understood as most-likely and least-likely ones in relation to the theory being tested. For policy-makers and other applied audiences, the crucial cases are usually understood as the ones where surprise was or would have been costliest.

So that’s how I think about assessing and describing the accuracy of forecasts of the kinds of (rare) international events a lot of us study. Now, if you’ll indulge me, I’d like to close with a pedantic plea: Can we please reserve the terms “forecast” and “prediction” for statements about things that haven’t happened and not apply them to estimates we generate for cases with known outcomes?

This might seem like a petty concern, but it’s actually tied to the philosophy of knowledge that underpins science, or my understanding of it, anyway. Making predictions about things that haven’t already happened is a crucial part of the scientific method. To learn from prediction, we assume that a model’s forecasting power tells us something about its proximity to the “true” data-generating process. This assumption won’t always be right, but it’s proven pretty useful over the past few centuries, so I’m okay sticking with it for now. For obvious reasons, it’s much easier to make accurate “predictions” about cases with known outcomes than unknown ones, so the scientific value of the two endeavors is very different. In light of that fact, I think we should be as clear and honest with ourselves and our audiences as we can about which one we’re doing, and therefore how much we’re learning.

When we’re doing this stuff in practice, there are three basic modes: 1) in-sample fitting, 2) cross-validation (CV), and 3) forecasting. In-sample fitting is the least informative of the three and, in my opinion, really should only be used in exploratory analysis and should not be reported in finished work. It tells us a lot more about the method than the phenomenon of interest.

CV is usually more informative than in-sample fitting, but not always. Each iteration of CV on the same data set moves you a little closer to in-sample fitting, because you effectively train to the idiosyncrasies of your chosen test set. Using multiple iterations of CV may ameliorate this problem, but it doesn’t always eliminate it. And on topics where the available data have already been worked to death—as they have on many problems of interest to scholars of international affairs—cross-validation really isn’t much more informative than in-sample fitting unless you’ve got a brand-new data series you can throw at the task and are focused on learning about it.

True forecasting—making clear statements about things that haven’t happened yet and then seeing how they turn out—is uniquely informative in this regard, so I think it’s important to reserve that term for the situations where that’s actually what we’re doing. When we describe in-sample and cross-validation estimates as forecasts, we confuse our readers, and we risk confusing ourselves about how much we’re really learning.

Of course, that’s easier for some phenomena than it is for others. If your theory concerns the risk of interstate wars, for example, you’re probably (and thankfully) not going to get a lot of opportunities to test it through prediction. Rather than sweeping those issues under the rug, though, I think we should recognize them for what they are. They are not an excuse to elide the huge differences between prediction and fitting models to history. Instead, they are a big haymaker of a reminder that social science is especially hard—not because humans are uniquely unpredictable, but rather because we only have the one grand and always-running experiment to observe, and we and our work are part of it.

“No One Stayed to Count the Bodies”

If you want to understand and appreciate why, even in the age of the Internet and satellites and near-ubiquitous mobile telephony, it remains impossible to measure even the coarsest manifestations of political violence with any precision, read this blog post by Phil Hazlewood, AFP’s bureau chief in Lagos. (Warning: graphic. H/t to Ryan Cummings on Twitter.)

Hazlewood’s post focuses on killings perpetrated by Boko Haram, but the same issues arise in measuring violence committed by states. Violence sometimes eliminates the very people who might describe the acts involved, and it intentionally scares many others. If you hear or see details of what happened, that’s often because the killers or their rivals for power wanted you to hear or see those details. We cannot sharply distinguish between the communication of those facts and the political intentions expressed in the violence or the reactions to it. The conversation is the message, and the violence is part of the conversation.

When you see or hear things in spite of those efforts to conceal them, you have to wonder how selection effects limit or distort the information that gets through. North Korea’s gulag system apparently holds thousands of people and kills an untold number of them each year. Defectors are the outside world’s main source of information about that system, but those defectors are not a random sample of victims, nor are they mechanical recording devices. Instead, they are human beings who have somehow escaped that country and who are now seeking to draw attention to and destroy that system. I do not doubt the basic truth of the gulags’ existence and the horrible things done there, but as a social scientist, I have to consider how those selection processes and motivations shape what we think we know. In the United States, we lack reliable data on fatal encounters with police. That’s partly because different jurisdictions have different capabilities for recording and reporting these incidents, but it’s also partly because some people in that system do not want us to see what they do.

For a previous post of mine on this topic, see “The Fog of War Is Patchy”.

 

Some Thoughts on “Alternative Academic Careers”

I’m headed to New Orleans next week for the annual convention of the International Studies Association (ISA), and while there I’m scheduled to speak on a panel on “alternative academic careers” (Friday at 1:45 PM). To help organize my views on the subject and to share them with people who are curious but can’t attend the panel, I thought I would turn them into a blog post. So:

Let me start with some caveats. I am a white, male U.S. citizen who grew up in a family that wasn’t poor and who married a woman while in grad school. I would prefer to deal in statistics, but on this topic, I can only really speak from experience, and that experience has been conditioned by those personal characteristics. In other words, I know my view is narrow and biased, but on this topic, that view is all I’ve got, so take it for whatever you think it’s worth.

My own career trajectory has been, I think, unusual. I got my Ph.D. from Stanford in the spring of 1997. At the time, I had no academic job offers; I had a spouse who wanted to go to art school; we had two dogs and couldn’t find a place we could afford that would rent to us in the SF Bay Area (this was the leading edge of the dot-com boom); and we both had family in the DC metro area. So, we packed up and moved in with my mother-in-law in Baltimore, and I started looking for work in and around Washington.

It took me a few months to land a job as an analyst in a little branch of a big forensic accounting firm, basically writing short pieces on political risk for a newsletter that went out to the firm’s corporate clients. We moved to the DC suburbs and I did that for about a year until I got a job with a small government contractor that did research projects for the U.S. “intelligence community” and the Department of Defense. After a couple of years of that and the birth of our first son, I decided I needed a change, so I took a job writing book-length research reports on telecom firms and industry segments for a trade-news outfit. At the time, I was also doing some freelance feature writing on whatever I could successfully pitch, and I thought a full-time writing job would help me move faster in that direction.

After a couple of years of that and no serious traction on the writing front, I got a line through one of my dissertation committee members on a part-time consulting thing with a big government contractor, SAIC. I was offered and took that job, which soon evolved into a full-time, salaried position as research director for the Political Instability Task Force. I did that for 10 years, then left it at the end of 2011 to try freelancing as a social scientist. Most of my work time since then has been devoted to the Early Warning Project and the Good Judgment Project, with dribs and drabs on other things and enough free time to write this blog.

So that’s where I’m coming from. Now, here is what I think I’ve learned from those experiences, and from watching and talking to others with similar training who do related things.

First, freelancing—what I’m doing now, and what usually gets fancified as “independent consulting”—is not a realistic option for most social scientists early in their careers, and probably not for most people, period. I would not have landed either of the large assignments I’ve gotten in the past few years without the professional network and reputation I had accumulated over the previous ten. This blog and my activity on social media have helped me expand that network, but nearly all of the paid jobs I’ve done in the past few years have come through those earlier connections. Best I can tell, there are no realistic short cuts to this kind of role.

Think tanks aren’t really an option for new Ph.D.s, either. Most of the jobs in that world are for recent undergraduates or MAs on the one hand and established scholars and practitioners on the other. There are some opportunities for new Ph.D.s looking to do straight-up analysis at places like RAND and the Congressional Research Service, but the niche is tiny, and those jobs will be very hard to land.

That brings me to the segment I know best, namely, government contracting. Budget cuts mean that jobs in that market are scarcer than they were a few years ago, but there are still lots of them. In that world, though, hiring is often linked to specific roles in contracts that have already been awarded. You’re not paid to pursue your own interests or write policy briefs; you’re usually hired to do tasks X and Y on contract Z, with the expectation that you’ll also be able to apply those skills to other, similar contracts after Z runs out. So, to land these jobs, you need to have abilities that can plug into clearly-defined contract tasks, but that also make you a reasonably good risk overall. Nowadays, programming and statistical (“data science”) skills are in demand, but so are others, including fluency in foreign languages and area expertise. If you want to get a feel for what’s valued in that world, spend some time looking at job listings from the big contracting firms—places like Booz Allen Hamilton, Lockheed Martin, SAIC, and Leidos—and see what they’re asking for.

If you do that, you’ll quickly discover that having a security clearance is a huge plus. If you think you want to do government-contract work, you can give yourself a leg up by finding some way to get that clearance while you’re still in graduate school. If you can’t do that, you might consider looking for work that isn’t your first choice on substance but will lead to a clearance in hopes of making an upward or lateral move later on.

All of that said, it’s important to think carefully about the downsides of a security clearance before you pursue one. A clearance opens some doors, but it closes others, some permanently. The process can take a long time, and it can be uncomfortable. When you get a clearance, you accept some constraints on your speech, and those never entirely fall away, even if you leave that world. The fact that you currently or once held a clearance and worked with agencies that required one will also make a certain impression on some people, and that never fully goes away, either. So weigh those long-term implications before you jump in.

If you do go to work on either side of that fence—for a contractor, or directly for the government—you need to be prepared to work on topics on which you’re not already an expert. It’s often your analytic skills they’re after, not your topical expertise, so you should be prepared to stretch that way. Maybe you’ll occasionally or eventually work back to your original research interests, maybe you’ll discover new ones, or maybe you’ll get bored or frustrated and restart your job search. In all cases, the process will go better if you’re prepared for any of the above.

Last but not least, if you’re not going into academia, I think it can help to know that your first job after grad school does not make or break your career. I don’t have data to support this claim, but my own experience and my observation of others tells me that nonacademic careers are not nearly as path dependent as academic ones. That can be scarier in some ways, but it’s also potentially liberating. Within the nontrivial constraints of the market, you can keep reinventing yourself, and I happen to think that’s great.

A Tale of Normal Failure

When I blog about my own research, I usually describe work I’ve already completed and focus on the results. This post is about a recent effort that ended in frustration, and it focuses on the process. In writing about this aborted project, I have two hopes: 1) to reassure other researchers (and myself) that this kind of failure is normal, and 2) if I’m lucky, to get some help with this task.

This particular ball got rolling a couple of days ago when I read a blog post by Dan Drezner about one aspect of the Obama administration’s new National Security Strategy (NSS) report. A few words in the bits Dan quoted got me thinking about the worldview they represented, and how we might use natural-language processing (NLP) to study that.

At first, I was just going to drop that awkwardly numbered tweetstorm and leave it there. I had some time that afternoon, though, and I’ve been looking for opportunities to learn text mining, so I decided to see what I could do. The NSS reports only became a thing in 1987, so there are still just 16 of them, and they all try to answer the same basic questions: What threats and opportunities does the US face in the world, and what should the government do to meet them? As such, they seemed like the kind of manageable and coherent corpus that would make for a nice training exercise.

I started by checking to see if anyone had already done with earlier reports what I was hoping to do with the latest one. It turned out that someone had, and to good effect.

I promptly emailed the corresponding author to ask if they had replication materials, or even just clean versions of the texts for all previous years. I got an autoreply informing me that the author was on sabbatical and would only intermittently be reading his email. (He replied the next day to say that he would put the question to his co-authors, but that still didn’t solve my problem, and by then I’d moved on anyway.)

Without those materials, I would need to start by getting the documents in the proper format. A little Googling led me to the National Security Strategy Archive, which at the time had PDFs of all but the newest report, and that one was easy enough to find on the White House’s web site. Another search led me to a site that converts PDFs to plain text online for free. I spent the next hour or so running those reports through the converter (and playing a little Crossy Road on my phone while I waited for the jobs to finish). Once I had the reports as .txt files, I figured I could organize my work better and do other researchers a solid by putting them all in a public repository, so I set one up on GitHub (here) and cloned it to my hard drive.

At that point, I was getting excited, thinking: “Hey, this isn’t so hard after all.” In most of the work I do, getting the data is the toughest part, and I already had all the documents I wanted in the format I needed. I was just a few lines of code away from the statistics and plots that would confirm or infirm my conjectures.

From another recent collaboration, I knew that the next step would be to use some software to ingest those .txt files, scrub them a few different ways, and then generate some word counts and maybe do some topic modeling to explore changes over time in the reports’ contents. I’d heard several people say that Python is really good at these tasks, but I’m an R guy, so I followed the lead on the CRAN Task View for natural language processing and installed and loaded the ‘tm’ package for text mining.

And that’s where the wheels started to come off of my rickety little wagon. Using the package developers’ vignette and an article they published in the Journal of Statistical Software, I started tinkering with some code. After a couple of false starts, I found that I could create a corpus and run some common preprocessing tasks on it without too much trouble, but I couldn’t get the analytical functions to run on the results. Instead, I kept getting this error message:

Error: inherits(doc, "TextDocument") is not TRUE
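
An aside for anyone who lands here after hitting the same error: in version 0.6 of the ‘tm’ package and later, base R functions like tolower() have to be wrapped in content_transformer() when they are passed to tm_map(). Otherwise each document silently becomes a plain character vector, and the analytical functions fail with exactly that message. Here is a minimal sketch of a pipeline that avoids the problem, assuming the .txt files sit in a local folder called ‘texts’:

library(tm)

# Build a corpus from the plain-text reports
nss <- VCorpus(DirSource("texts", pattern = "\\.txt$"))

# Wrap base R functions in content_transformer() so the documents keep
# their TextDocument class; tm's own transformations don't need the wrapper
nss <- tm_map(nss, content_transformer(tolower))
nss <- tm_map(nss, removePunctuation)
nss <- tm_map(nss, removeNumbers)
nss <- tm_map(nss, removeWords, stopwords("english"))
nss <- tm_map(nss, stripWhitespace)

# This now runs without the inherits(doc, "TextDocument") error
dtm <- DocumentTermMatrix(nss)
findFreqTerms(dtm, lowfreq = 50)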

By then it was dinner time, so I called it a day and went to listen to my sons holler at each other across the table for a while.

When I picked the task back up the next morning, I inspected a few of the scrubbed documents and saw some strange character strings—things like ir1 instead of in and ’ where an apostrophe should be. That got me wondering if the problem lay in the encoding of those .txt files. Unfortunately, neither the files themselves nor the site that produced them tell me which encoding they use. I ran through a bunch of options, but none of them fixed the problem.

“Okay, no worries,” I thought. “I’ll use gsub() to replace those funky bits in the strings by hand.” The commands ran without a hiccup, but the text didn’t change. Stranger, when I tried to inspect documents in the R terminal, the same command wouldn’t always produce the same result. Sometimes I’d get the head, and sometimes the tail. I tried moving back a step in the process and installed a PDF converter that I could run from R, but R couldn’t find the converter, and my attempts to fix that failed.
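
For concreteness, the kind of repair I was attempting looked something like the sketch below. The file name is hypothetical, and none of these variations actually fixed my files at the time:

# Read with an explicit encoding, then transliterate whatever remains
txt <- readLines("nss-1987.txt", encoding = "UTF-8")
txt <- iconv(txt, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "")

# Hand-patch known conversion artifacts
txt <- gsub("’", "'", txt, fixed = TRUE)
txt <- gsub("ir1", "in", txt, fixed = TRUE)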

At this point, I was about ready to quit, and I tweeted some of that frustration. Igor Brigadir quickly replied to suggest a solution, but it involved another programming language, Python, that I don’t know.

To go that route, I would need to start learning Python. That’s probably a good idea for the long run, but it wasn’t going to happen this week. Then Ken Benoit pointed me toward a new R package he’s developing and even offered to help me.

That sounded promising, so I opened R again and followed the clear instructions on the README at Ken’s repository to install the package. Of course the installation failed, probably because I’m still using R Version 3.1.1 and the package is, I suspect, written for the latest release, 3.1.2.

And that’s where I finally quit—for now. I’d hit a wall, and all my usual strategies for working through or around it had either failed or led to solutions that would require a lot more work. If I were getting paid and on deadline, I’d keep hacking away, but this was supposed to be a “fun” project for my own edification. What seemed at first like a tidy exercise had turned into a tar baby, and I needed to move on.

This cycle of frustration -> problem-solving -> frustration might seem like a distraction from the real business of social science, but in my experience, it is the real business. Unless I’m performing a variation on a familiar task with familiar data, this is normal. It might be boring to read, but then most of the day-to-day work of social science probably is, or at least looks that way to the people who aren’t doing it and therefore can’t see how all those little steps fit into the bigger picture.

So that’s my tale of minor woe. Now, if anyone who actually knows how to do text-mining in R is inspired to help me figure out what I’m doing wrong on that National Security Strategy project, please take a look at that GitHub repo and the script posted there and let me know what you see.

Demography and Democracy Revisited

Last spring on this blog, I used Richard Cincotta’s work on age structure to take another look at the relationship between democracy and “development” (here). In his predictive models of democratization, Rich uses variation in median age as a proxy for a syndrome of socioeconomic changes we sometimes call “modernization” and argues that “a country’s chances for meaningful democracy increase as its population ages.” Rich’s models have produced some unconventional predictions that have turned out well, and if you buy the scientific method, this apparent predictive power implies that the underlying theory holds some water.

Over the weekend, Rich sent me a spreadsheet with his annual estimates of median age for all countries from 1972 to 2015, so I decided to take my own look at the relationship between those estimates and the occurrence of democratic transitions. For the latter, I used a data set I constructed for PITF (here) that covers 1955–2010, giving me a period of observation running from 1972 to 2010. In this initial exploration, I focused specifically on switches from authoritarian rule to democracy, which are observed with a binary variable that covers all country-years where an autocracy was in place on January 1. That variable (rgjtdem) is coded 1 if a democratic regime came into being at some point during that calendar year and 0 otherwise. Between 1972 and 2010, 94 of those switches occurred worldwide. The data set also includes, among other things, a “clock” counting consecutive years of authoritarian rule and an indicator for whether or not the country has ever had a democratic regime before.

To assess the predictive power of median age and compare it to other measures of socioeconomic development, I used the base and caret packages in R to run 10 iterations of five-fold cross-validation on the following series of discrete-time hazard (logistic regression) models (a code sketch of this setup follows the list):

  • Base model. Any prior democracy (0/1), duration of autocracy (logged), and the product of the two.
  • GDP per capita. Base model plus the Maddison Project’s estimates of GDP per capita in 1990 Geary-Khamis dollars (here), logged.
  • Infant mortality. Base model plus the U.S. Census Bureau’s estimates of deaths under age 1 per 1,000 live births (here), logged.
  • Median age. Base model plus Cincotta’s estimates of median age, untransformed.
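
Here is the promised sketch of that validation scheme. It is not the original script, and the data frame and variable names (dat, everdem, durable, medage) are stand-ins for the PITF data described above; rgjtdem is the transition indicator from that data set:

library(caret)        # createFolds()
library(verification) # roc.area()

set.seed(709)  # for reproducibility
auc <- matrix(NA, nrow = 10, ncol = 5)
for (i in 1:10) {
  # Five stratified folds per iteration
  folds <- createFolds(factor(dat$rgjtdem), k = 5)
  for (j in 1:5) {
    train <- dat[-folds[[j]], ]
    test <- dat[folds[[j]], ]
    # Median-age variant: base-model terms plus median age
    mod <- glm(rgjtdem ~ everdem * log(durable) + medage,
               data = train, family = binomial)
    p <- predict(mod, newdata = test, type = "response")
    auc[i, j] <- roc.area(test$rgjtdem, p)$A
  }
}
rowMeans(auc)  # mean AUC by iteration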

The chart below shows density plots and averages of the AUC scores (computed with ‘roc.area’ from the verification package) for each of those models across the 10 iterations of five-fold CV. Contrary to the conventional assumption that GDP per capita is a useful predictor of democratic transitions—How many papers have you read that tossed this measure into the model as a matter of course?—I find that the model with the Maddison Project measure actually makes slightly less accurate predictions than the one with duration and prior democracy alone. More relevant to this post, though, the two demographic measures clearly improve the predictions of democratic transitions relative to the base model, and median age adds a smidgen more predictive signal than infant mortality.

transit.auc.by.fold

Of course, all of these things—national wealth, infant mortality rates, and age structures—have also been changing pretty steadily in a single direction for decades, so it’s hard to untangle the effects of the covariates from other features of the world system that are also trending over time. To try to address that issue and to check for nonlinearity in the relationship, I used Simon Wood’s mgcv package in R to estimate a semiparametric logistic regression model with smoothing splines for year and median age alongside the indicator of prior democracy and regime duration. Plots of the marginal effects of year and median age estimated from that model are shown below. As the left-hand plot shows, the time effect is really a hump in risk that started in the late 1980s and peaked sharply in the early 1990s; it is not the across-the-board post–Cold War increase that we often see covered in models with a dummy variable for years after 1991. More germane to this post, though, we still see a marginal effect from median age, even when accounting for those generic effects of time. Consistent with Cincotta’s argument and other things being equal, countries with higher median age are more likely to transition to democracy than countries with younger populations.

transit.ageraw.effect.spline.with.year
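
For readers who want the specification, here is a sketch of that model in mgcv, using the same stand-in names as the sketch above:

library(mgcv)

# Semiparametric logistic regression: smoothing splines for year and
# median age alongside prior democracy and (logged) regime duration
mod.gam <- gam(rgjtdem ~ everdem + log(durable) + s(year) + s(medage),
               data = dat, family = binomial)
plot(mod.gam, pages = 1)  # marginal smooths for year and median age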

I read these results as a partial affirmation of modernization theory—not the whole teleological and normative package, but the narrower empirical conjecture about a bundle of socioeconomic transformations that often co-occur and are associated with a higher likelihood of attempting and sustaining democratic government. Statistical studies of this idea (including my own) have produced varied results, but the analysis I’m describing here suggests that some of the null results may stem from the authors’ choice of measures. GDP per capita is actually a poor proxy for modernization; there are a number of ways countries can get richer, and not all of them foster (or are fostered by) the socioeconomic transformations that form the kernel of modernization theory (cf. Equatorial Guinea). By contrast, demographic measures like infant mortality rates and median age are more tightly coupled to those broader changes about which Seymour Martin Lipset originally wrote. And, according to my analysis, those demographic measures are also associated with a country’s propensity for democratic transition.

Shifting to the applied forecasting side, I think these results confirm that median age is a useful addition to models of regime transitions, and it seems to capture more information about those propensities than GDP (by a lot) and infant mortality (by a little). Like all slow-changing structural indicators, though, median age is a blunt instrument. Annual forecasts based on it alone would be pretty clunky, and longer-term forecasts would do well to consider other domestic and international forces that also shape (and are shaped by) these changes.

PS. If you aren’t already familiar with modernization theory and want more background, this ungated piece by Sheri Berman for Foreign Affairs is pretty good: “What to Read on Modernization Theory.”

PPS. The code I used for this analysis is now on GitHub, here. It includes a link to the folder on my Google Drive with all of the required data sets.

A Postscript on Measuring Change Over Time in Freedom in the World

After publishing yesterday’s post on Freedom House’s latest Freedom in the World report (here), I thought some more about better ways to measure what I think Freedom House implies it’s measuring with its annual counts of country-level gains and declines. The problem with those counts is that they don’t account for the magnitude of the changes they represent. That’s like keeping track of how a poker player is doing by counting bets won and bets lost without regard to their value. If we want to assess the current state of the system and compare it to earlier states, the size of those gains and declines matters, too.

With that in mind, my first idea was to sum the raw annual changes in countries’ “freedom” scores by year, where the freedom score is just the sum of those 7-point political rights and civil liberties indices. Let’s imagine a year in which three countries saw a 1-point decline in their freedom scores; one country saw a 1-point gain; and one country saw a 3-point gain. Using Freedom House’s measure, that would look like a bad year, with declines outnumbering gains 3 to 2. Using the sum of the raw changes, however, it would look like a good year, with a net change in freedom scores of +1.

Okay, so here’s a plot of those sums of raw annual changes in freedom scores since 1982, when Freedom House rejiggered the timing of its survey.[1] I’ve marked the nine-year period that Freedom House calls out in its report as an unbroken run of bad news, with declines outnumbering gains every year since 2006. As the plot shows, when we account for the magnitude of those gains and losses, things don’t look so grim. In most of those nine years, losses did outweigh gains, but the net loss was rarely large, and two of the nine years actually saw net gains by this measure.

Annual global sums of raw yearly changes in Freedom House freedom scores (inverted), 1983-2014

After I’d generated that plot, though, I worried that the sum of those raw annual changes still ignored another important dimension: population size. As I understand it, the big question Freedom House is trying to address with its annual report is: “How free is the world?” If we want to answer that question from a classical liberal perspective—and that’s where I think Freedom House is coming from—then individual people, not states, need to be our unit of observation.

Imagine a world with five countries where half the global population lives in one country and the other half is evenly divided between the other four. Now let’s imagine that the one really big country is maximally unfree while the other four countries are maximally free. If we compare scores (or changes in them) by country, things look great; 80 percent of the world is super-free! Meanwhile, though, half the world’s population lives under total dictatorship. An international relations theorist might care more about the distribution of states, but a liberal should care more about the distribution of people.

To take a look at things from this perspective, I decided to generate a scalar measure of freedom in the world system that sums country scores weighted by their share of the global population.[2] To make the result easier to interpret, I started by rescaling the country-level “freedom scores” from their original range of 14 (least free) to 2 (most free) onto a 0–10 scale, with 10 indicating most free. A world in which all countries are fully free (according to Freedom House) would score a perfect 10 on this scale, and changes in large countries will move the index more than changes in small ones.
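
In code, with pr and cl as stand-ins for the two 7-point indices, one linear map that does this (and that matches the country examples cited below) is:

# Raw sum runs from 2 (most free) to 14 (least free);
# map it onto 0-10 with 10 = most free
fh$freedom.0to10 <- (14 - (fh$pr + fh$cl)) * 10/12

A 7/7 country scores 0, a 6/6 scores 1.67, and a 2/3 scores 7.5, as in the discussion of China and India below.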

Okay, so here’s a plot of the results for the entire run of Freedom House’s data set, 1972–2014. (Again, 1981 is missing because that’s when Freedom House paused to align their reports with the calendar year.)  Things look pretty different than they do when we count gains and declines or even sum raw changes by country, don’t they?

A population-weighted annual scalar measure of freedom in the world, 1972-2014

The first thing that jumped out at me was those sharp declines in the mid-1970s and again in the late 1980s and early 1990s. At first I thought I must have messed up the math, because everyone knows things got a lot better when Communism crumbled in Eastern Europe and the Soviet Union, right? It turns out, though, that those swings are driven by changes in China and India, which together account for approximately one-third of the global population. In 1989, after Tiananmen Square, China’s score dropped from a 6/6 (or 1.67 on my 10-point scalar version) to 7/7 (or 0). At the time, China contained nearly one-quarter of the world’s population, so that slump more than offset the (often-modest) gains made in the countries touched by the so-called fourth wave of democratic transitions. In 1998, China inched back up to 7/6 (0.83), and the global measure moved with it. Meanwhile, India dropped from 2/3 (7.5) to 3/4 (5.8) in 1991, and then again from 3/4 to 4/4 (5.0) in 1993, but it bumped back up to 2/4 (6.67) in 1996 and then 2/3 (7.5) in 1998. The global gains and losses produced by the shifts in those two countries don’t fully align with the conventional narrative about trends in democratization in the past few decades, but I think they do provide a more accurate measure of overall freedom in the world if we care about people instead of states, as liberalism encourages us to do.

Of course, the other thing that caught my eye in that second chart was the more-or-less flat line for the past decade. When we consider the distribution of the world’s population across all those countries where Freedom House tallies gains and declines, it’s hard to find evidence of the extended democratic recession they and others describe. In fact, the only notable downturn in that whole run comes in 2014, when the global score dropped from 5.2 to 5.1. To my mind, that recent downturn marks a worrying development, but it’s harder to notice it when we’ve been hearing cries of “Wolf!” for the eight years before.

NOTES

[1] For the #Rstats crowd: I used the slide function in the package DataCombine to get one-year lags of those indices by country; then I created a new variable representing the difference between the annual score for the current and previous year; then I used ddply from the plyr package to create a data frame with the annual global sums of those differences. Script on GitHub here.
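
In code form, that pipeline might look like the sketch below; the data frame and column names are stand-ins, and ‘freedom’ here is the raw 2–14 sum of the two indices:

library(DataCombine) # slide()
library(plyr)        # ddply()

# One-year lag of the freedom score, by country
fh <- slide(fh, Var = "freedom", TimeVar = "year", GroupVar = "country",
            NewVar = "freedom.lag", slideBy = -1)

# Year-on-year change, then the global sum of those changes by year
fh$change <- fh$freedom - fh$freedom.lag
net <- ddply(fh, .(year), summarise, net.change = sum(change, na.rm = TRUE))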

[2] Here, I used the WDI package to get country-year data on population size; used ddply to calculate world population by year; merged those global sums back into the country-year data; used those sums as the denominator in a new variable indicating a country’s share of the global population; and then used ddply again to get a table with the sum of the products of those population weights and the freedom scores. Again, script on GitHub here (same one as before).
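
And a sketch of that second pipeline, under the same naming assumptions (freedom.0to10 is the rescaled score from above):

library(WDI)
library(plyr)

# Country-year population; in practice you would drop the regional
# aggregates that WDI returns before summing to a world total
pop <- WDI(country = "all", indicator = "SP.POP.TOTL", start = 1972, end = 2014)
world <- ddply(pop, .(year), summarise, world.pop = sum(SP.POP.TOTL, na.rm = TRUE))
pop <- merge(pop, world, by = "year")
pop$weight <- pop$SP.POP.TOTL / pop$world.pop

# Merge the weights into the Freedom House table, then sum the weighted scores
fh <- merge(fh, pop[, c("iso2c", "year", "weight")], by = c("iso2c", "year"))
index <- ddply(fh, .(year), summarise, score = sum(weight * freedom.0to10, na.rm = TRUE))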

No, Democracy Has Not Been Discarded

Freedom House released the latest iteration of its annual Freedom in the World report yesterday (PDF) with the not-so-subtle subtitle “Discarding Democracy: Return to the Iron Fist.” The report starts like this (emphasis in original):

In a year marked by an explosion of terrorist violence, autocrats’ use of more brutal tactics, and Russia’s invasion and annexation of a neighboring country’s territory, the state of freedom in 2014 worsened significantly in nearly every part of the world.

For the ninth consecutive year, Freedom in the World, Freedom House’s annual report on the condition of global political rights and civil liberties, showed an overall decline. Indeed, acceptance of democracy as the world’s dominant form of government—and of an international system built on democratic ideals—is under greater threat than at any point in the last 25 years.

Even after such a long period of mounting pressure on democracy, developments in 2014 were exceptionally grim. The report’s findings show that nearly twice as many countries suffered declines as registered gains, 61 to 33, with the number of gains hitting its lowest point since the nine-year erosion began.

Once again, I’m going to respond to the report’s release by arguing that things aren’t nearly as bad as Freedom House describes them, even according to their own data. As I see it, the nine-year trend of “mounting pressure on democracy” isn’t really much of a thing, and it certainly hasn’t brought the return of the “iron fist.”

Freedom House measures freedom on two dimensions: political rights and civil liberties. Both are measured on a seven-point scale, with lower numbers indicating more freedom. (You can read more about the methodology here.) I like to use heat maps to visualize change over time in the global distribution of countries on these scales. In these heat maps, the quantities being visualized are the proportion of countries worldwide landing in each cell of the 7 x 7 grid defined by juxtaposing those two scales. The darker the color, the higher the proportion.
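
For anyone who wants to draw this kind of chart, here is a sketch in ggplot2, assuming a data frame fh with the two 7-point indices in columns pr and cl (names are stand-ins):

library(ggplot2)

# Share of countries in each cell of the 7 x 7 grid for a given year
yr <- subset(fh, year == 1972)
grid <- as.data.frame(prop.table(table(pr = factor(yr$pr, levels = 1:7),
                                       cl = factor(yr$cl, levels = 1:7))))
ggplot(grid, aes(x = cl, y = pr, fill = Freq)) +
  geom_tile() +
  scale_y_discrete(limits = rev(levels(grid$pr))) +
  labs(x = "Civil liberties", y = "Political rights", fill = "Share of countries")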

Let’s start with one for 1972, the first year Freedom House observes and just before the start of the so-called third wave of democratic transitions, to establish a baseline. At this point, the landscape has two distinct peaks, in the most– and least-free corners, and authoritarian regimes actually outnumber democracies.

fhm.1972

Okay, now let’s jump ahead to 1986, the eve of what some observers describe as the fourth wave of democratic transitions that swept the Warsaw Pact countries and sub-Saharan Africa. At this point, the world doesn’t look much different. The “least free” peak (lower left) has spread a bit, but there are still an awful lot of countries on that side of the midline, and the “most free” peak (upper right) is basically unchanged.

fhm.1986

A decade later, though, things look pretty different. By 1995, the “most free” peak has gotten taller and broader, the “least free” peak has eroded further, and there’s now a third peak of sorts in the middle, centered on 4/4.

fhm.1995

Jump ahead another 10 years, to 2005, and the landscape has tilted decisively toward democracy. The “least free” peak is no more, the bump in the middle has shifted up and right, and the “most free” peak now dominates the landscape.

fhm.2005

That last image comes not long before the run of nine years of consecutive declines in freedom described in the 2015 report. From the report’s subtitle and narrative, you might think the landscape had clearly shifted—maybe not all the way back to the one we saw in the 1970s or even the 1980s, but perhaps to something more like the mid-1990s, when we still had a clear authoritarian peak and a lot of countries seemed to be sliding between the two poles. Well, here’s the actual image:

fhm.2014

That landscape looks a lot more like 2005 than 1995, and it looks nothing like the 1970s or 1980s. The liberal democratic peak still dominates, and the authoritarian peak is still gone. The clumps at 3/3 and 6/5 seem to have slumped a little in the wrong direction, but there are no significant new accretions anywhere. In a better world, we would have seen continued migration toward the “most free” corner over the past decade, but the absence of further improvement is hardly the kind of rollback that phrases like “discarding democracy” and “return to the iron fist” seem to imply.

Freedom House’s topline message is also belied by the trend over time in its count of electoral democracies—that is, countries that hold mostly free and fair elections for the offices that actually make policy. By Freedom House’s own count, the number of electoral democracies around the world actually increased by three in 2014 to an all-time high of 125, or more than two-thirds of all countries. Here’s the online version of their chart of that trend (from this page):

fh.electoraldemocracies.2015

Again, I’m having a hard time seeing democracy being “discarded” in that plot.

So how can both of these things be true? How can the number of electoral democracies grow over a period when annual declines in freedom scores outnumber annual gains?

The answer is that those declines are often occurring in countries that are already governed by authoritarian regimes, and they are often small in size. Meanwhile, some countries are still making jumps from autocracy to democracy that are usually larger in scale than the incremental declines and thus mostly offset the losses in the global tally. So, while those declines are surely bad for the citizens suffering through them, they rarely move countries from one side of the ledger to the other, and they have only a modest effect on the overall level of “freedom” in the system.

This year’s update on the Middle East shows what I mean. In its report, Freedom House identifies only one country in that region that made significant gains in freedom in 2014—Tunisia—against seven that saw declines: Bahrain, Egypt, Iraq, Lebanon, Libya, Syria, and Yemen. All seven of those decliners were already on the authoritarian side of the ledger going into 2014, however, and only four of the declines were large enough to move a country’s rating on one or both of the relevant indices. Meanwhile, Tunisia jumped up two points on political rights in one year, and since 2011 its combined score (political rights + civil liberties) has improved by eight points, from 12 to 4. We see similar patterns in the declines in Eurasia, where nearly all countries already clustered around the “least free” pole, and sub-Saharan Africa, where only one country moved down into Freedom House’s Not Free category (Uganda) and two returned to the set of electoral democracies after holding reasonably fair and competitive elections (Guinea-Bissau and Madagascar).

In short, I continue to believe that Freedom House’s presentation of trends over time in political rights and civil liberties is much gloomier than the world its own data portray. Part of me feels like a jerk for saying so, because I recognize that Freedom House’s messaging is meant to be advocacy, not science, and I support the goals that advocacy is meant to achieve. As a social scientist, though, I also think it’s important that our analyses and decisions be informed by as accurate a sketch of the world as we can draw, so I will keep mumbling into this particular gale.

PS. If you want Freedom House’s data in .csv format, I’ve posted a version of them—including the 2014 updates, which I entered by hand this morning—on my Google Drive, here.

PPS. If you’re curious where I think these trends might be headed in the next 10 years, see this recent post.

PPPS. The day after I ran this post, I published another in which I tried to think of better ways to measure what Freedom House purports to describe in its annual reports. You can read it here.
