The Political Context of Political Forecasting

In Seeing Like a State, James Scott describes how governments have tried to make their societies more legible in pursuit of their basic organizational mission—“to arrange the population in ways that simplified the classic state functions of taxation, conscription, and prevention of rebellion.”

These state simplifications, the basic givens of modern statecraft, were, I began to realize, rather like abridged maps. They did not successfully represent the actual activity of the society they depicted, nor were they intended to; they represented only that slice of it that interested the official observer. They were, moreover, not just maps. Rather, they were maps that, when allied with state power, would enable much of the reality they depicted to be remade.

Statistical forecasts of political events are a form of legibility, too—an abridged map—with all the potential benefits and issues Scott identifies. Most of the time, the forecasts we generate focus on events or processes of concern to national governments and other already-powerful entities, like multinational firms and capital funds. These organizations are the ones who can afford to invest in such work, who stand to benefit most from it, and who won’t get in trouble for doing so. We talk about events “of interest” or “of concern” but rarely ask ourselves out loud: “Of interest to whom?” Sometimes we literally map our forecasts, but even when we don’t, the point of our work is usually to make the world more legible for organizations that are already wealthy or powerful so that they can better protect and expand their wealth and power.

If we’re doing our work as modelers right, then the algorithms we build to generate these forecasts will summarize our best ideas about things that cause or predict those events. Those ideas do not emerge in a vacuum. Instead, they are part of a larger intellectual and informational ecosystem that is also shaped by those same powerful organizations. Ideology and ideation cannot be fully separated.

In political forecasting, it’s not uncommon to have something that we believe to be usefully predictive but can’t include in our models because we don’t have data that reliably describe it. These gaps are not arbitrary. Sometimes they reflect technical, bureaucratic, or conceptual barriers, but sometimes they don’t. For example, no model of civil conflict can be close to “true” without including information about foreign support for governments and their challengers, but a lot of that support is deliberately hidden. Some of the same organizations that ask us to predict accurately hide from us some of the information we need most to do that.

Some of us try to escape the moral consequences of serving powerful organizations whose actions we don’t always endorse by making our work available to the public. If we share the forecasts with everyone, our (my) thinking goes, then we aren’t serving a particular master. Instead, we are producing a public good, and public goods are inherently good—right?

There are two problems with that logic. First, most of the public doesn’t have the interest or capacity to act on those forecasts, so sharing the forecasts with them will usually have little effect on their behavior. Second, some of the states and organizations that consume our public forecasts will apply them to ends we don’t like. For example, a dictatorial regime might see a forecast that it is susceptible to a new wave of nonviolent protest and respond by repressing harder. So, the practical effects of broadcasting our work will usually be modest, and some of them could even be harmful.

I know all of this, and I continue to do the work I do because it challenges and interests me, it pays well, and, I believe, some of it can help people do good. Still, I think it’s important periodically to remind ourselves—myself—that there is no escape from the moral consequences of this work, only trade-offs.

On Revolution: Theory or Ideology?

Humans understand and explain through stories, and the stories we in the US tell about why people rebel against their governments usually revolve around deprivation and injustice. In the prevailing narratives, rebellion occurs when states either actively make people suffer or passively fail to alleviate their suffering. Rebels in the American colonies made this connection explicit in the Declaration of Independence. This is also how we remember and understand lots of other rebellions we “like” and the figures who led them, from Moses to Robin Hood to Nelson Mandela.

As predictors of revolution, though, deprivation and injustice don’t fare so well. A chart in a recent Bloomberg Business piece on “the 15 most miserable economies in the world” got me thinking about this again. The chart shows the countries that score highest on a crude metric that sums a country’s unemployment rate and annual change in its consumer price index. Here are the results for 2015:

Of the 15 countries on that list, only two—Ukraine and Colombia—have ongoing civil wars, and it’s pretty hard to construe current unemployment or inflation as relevant causes in either case. Colombia’s civil war has run for decades. Ukraine’s war isn’t so civil (<cough> Russia <cough>), and this year’s spikes in unemployment and inflation are probably more consequences of that fighting than causes of it. Frankly, I’m surprised that Venezuela hasn’t seen a sustained, large-scale challenge to its government since Hugo Chavez’s death, and I wonder if this year will prove different. But, so far, it hasn’t. Ditto for South Africa, where labor actions have at least hinted at the potential for wider rebellion.

That chart, in turn, reminded me of a 2011 New York Times column by Charles Blow called “The Kindling of Change,” on the causes of revolutions in the Middle East and North Africa. Blow wrote, “It is impossible to know exactly which embers spark a revolution, but it’s not so hard to measure the conditions that make a country prime for one.” As evidence, he offered the following table comparing countries in the region on several “conditions”:

The chart and the language that precedes it seem to imply that these factors are ones that obviously “prime” countries for revolution. If that’s true, though, then why didn’t we see revolutions in the past few years in Algeria, Morocco, Sudan, Jordan, and Iran? Morocco and Sudan saw smaller protest waves that failed to produce revolutions, but so did Kuwait and Bahrain. And why did Syria unravel while those others didn’t? It’s true that poorer countries are more susceptible to rebellions than richer ones, but it’s also true that poor countries are historically common and rebellions are not.

All of which makes me wonder whether our theories of rebellion are really theories at all, rather than awkward blends of selective observation and ideology. Maybe we believe that injustice explains rebellion because we want to live in a universe in which justice triumphs and injustice gets punished. When violent or nonviolent rebellions erupt, we often watch and listen to the participants enumerate grievances about poverty and indignity and take those claims as evidence of underlying causes. We do this even though we know that humans are unreliable archivists and interpreters of their own behavior and motivations, and that we could elicit similar tales of poverty and indignity from many, many more people who are not rebelling in those societies and others. If a recent study generalizes, then we in the US and other rich democracies are also consuming news that systematically casts rebels in a more favorable light than governments during episodes of protest and civil conflict abroad.

Meanwhile, when rebel groups don’t fit our profile as agents of justice, we rarely expand our theories of revolution to account for these deviant cases. Instead, we classify the organizations as “terrorists”, “radicals”, or “criminals” and explain their behavior in some other way, usually one that emphasizes flaws in the character or beliefs of the participants or manipulations of them by other nefarious agents. Boko Haram and the Islamic State are rebel groups in any basic sense of that term, but our explanations of their emergence often emphasize indoctrination instead of injustice. Why?

I don’t mean to suggest that misery, dignity, and rebellion are entirely uncoupled. Socioeconomic and emotional misery may contribute, and probably do contribute, in some ways to the emergence of rebellion, even if they aren’t close to sufficient causes of it. (For some deeper thinking on the causal significance of social structure, see this recent post by Daniel Little.)

Instead, I think I mean this post to serve as a plea to avoid the simple versions of those stories, at least when we’re trying to function as explainers and not activists or rebels ourselves. In light of what we think we know about confirmation bias and cognitive dissonance, the fact that a particular explanation harmonizes with our values and makes us feel good should not be mistaken for evidence of its truth. If anything, it should motivate us to try harder to break it.

If At First You Don’t Succeed

A couple of weeks ago, I blogged about a failed attempt to do some exploratory text-mining on the US National Security Strategy reports (here). That project was supposed to give me a fun way to learn the basics of text mining in R, something I’ve been eager to do of late. In writing the blog post, I had two motives: 1) to help normalize the experience of getting stuck and failing in social science and data science, and 2) to appeal for help from more experienced coders who could help get me unstuck on this particular task.

The post succeeded on both counts. I won’t pepper you with evidence on the commiseration front, but I am excited to share the results of the coding improvements. In addition to learning how to text-mine, I have also been trying to learn how to use RStudio and Shiny to build interactive apps, and this project seemed like a good opportunity to do both. So, I’ve created an app that lets users explore this corpus in three ways (a bare-bones sketch of how such an app is wired together follows the list):

  • Plot word counts over time to see how the use of certain terms has waxed and waned over the 28 years the reports span.
  • Generate word clouds showing the 50 most common words in each of the 16 reports.
  • Explore associations between terms by picking one and seeing which 10 others are most closely correlated with it in the entire corpus.
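
Here is that bare-bones sketch of how an app like this can be wired together. It is not the app itself (the real code is in the GitHub repository linked below), and the nss_freq data frame is a toy stand-in for the term-frequency table built from the reports:

library(shiny)
library(ggplot2)

# toy stand-in for the term-frequency table built from the 16 reports
nss_freq <- data.frame(
  year = rep(c(1987, 2002, 2015), times = 2),
  term = rep(c("terror", "climat"), each = 3),
  freq = c(0.001, 0.008, 0.004, 0.0001, 0.0002, 0.002)
)

ui <- fluidPage(
  titlePanel("Exploring the National Security Strategy reports"),
  sidebarLayout(
    sidebarPanel(textInput("term", "Stemmed term:", value = "terror")),
    mainPanel(plotOutput("trend"))
  )
)

server <- function(input, output) {
  output$trend <- renderPlot({
    sub <- nss_freq[nss_freq$term == input$term, ]
    ggplot(sub, aes(x = year, y = freq)) +
      geom_line() +
      labs(x = "report year", y = "share of words in report", title = input$term)
  })
}

shinyApp(ui = ui, server = server)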

For example, here’s a plot of change over time in the relative frequency of the term ‘terror’. Its usage spikes after 9/11 and then falls sharply when Barack Obama replaces George W. Bush as president.

[Figure: relative frequency of ‘terror’ in the NSS reports over time]

That pattern contrasts sharply with references to climate, a topic that rarely gets mentioned until the Obama presidency, when its usage spikes upward. (Note, though, that the y-axis has been rescaled from the previous chart, so even after this large increase, ‘climat’ appears only about half as often as ‘terror’.)

[Figure: relative frequency of ‘climat’ in the NSS reports over time]

And here’s a word cloud of the 50 most common terms from the first US National Security Strategy, published in 1987. Surprise! The Soviet Union dominates the monologue.

[Figure: word cloud of the 50 most common terms in the 1987 NSS]

When I built an initial version of the app a couple of Sundays ago, I promptly launched it on shinyapps.io to try to show it off. Unfortunately, the Shiny server only gives you 25 hours of free usage per billing cycle, and when I tweeted about the app, it got so much attention that those hours disappeared in a little over a day!

I don’t have my own server to host this thing, and I’m not sure when Shiny’s billing cycle refreshes. So, for the moment, I can’t link to a permanently working version of the app. If anyone reading this post is interested in hosting the app on a semi-permanent basis, please drop me a line at ulfelder <at> gmail. Meanwhile, R users can launch the app from their terminals with these two lines of code, assuming the ‘shiny’ package is already installed:

library(shiny)
runGitHub("national-security-strategy", "ulfelder")

You can also find all of the texts and code used in the app and some other stuff (e.g., the nss.explore.R script also implements topic modeling) in that GitHub repository, here.

A Good Dream

The novel Station Eleven—an immediate addition to my short list of favorite books—imagines the world after the global political economy has disintegrated. A flu pandemic has killed almost all humans, and the ones who remain inhabit the kinds of traveling bands or small encampments that are only vaguely familiar to most of us. There is no gasoline, no Internet, no electricity.

“I dreamt last night I saw an airplane,” Dieter whispered. They were lying a few feet apart in the dark of the tent. They had only ever been friends—in a hazy way Kirsten thought of him as family—but her thirty-year-old tent had finally fallen apart a year ago and she hadn’t yet managed to find a new one. For obvious reasons she was no longer sharing a tent with Sayid, so Dieter, who had one of the largest tents in the Symphony, had been hosting her. Kirsten heard soft voices outside, the tuba and the first violin on watch. The restless movements of the horses, penned between the three caravans for safety.

“I haven’t thought of an airplane in so long.”

“That’s because you’re so young.” A slight edge to his voice. “You don’t remember anything.”

“I do remember things. Of course I do. I was eight.”

Dieter had been twenty years old when the world ended. The main difference between Dieter and Kirsten was that Dieter remembered everything. She listened to him breathe.

“I used to watch for it,” he said. “I used to think about the countries on the other side of the ocean, wonder if any of them had somehow been spared. If I ever saw an airplane, that meant that somewhere planes still took off. For a whole decade after the pandemic, I kept looking at the sky.”

“Was it a good dream?”

“In the dream I was so happy,” he whispered. “I looked up and there it was, the plane had finally come. There was still a civilization somewhere. I fell to my knees. I started weeping and laughing, and then I woke up.”

Leaving New Orleans by jet yesterday morning, only a couple of weeks after reading that book, I found flying—with wifi on a tablet!—miraculous again. As we lifted away from the airport an hour after sunrise on a clear day, I could see a dozen freighters lined up on the Mississippi, a vast industrial plant of some kind billowing steam on the adjacent shore, a railway spreading like capillaries as it ran out of the plant.

As we inhabit that world, it feels inevitable, but it was not. Our political economy is as natural as a termite mound, but it did not have to arise and cohere, to turn out like this—to turn out at all.

Nor does it have to persist. The first and only other time I visited New Orleans was in 2010, for the same conference in the same part of town—the Warehouse District, next to the river. Back then, a little closer to Katrina, visual reminders of the flood that already happened gave that part of the city an eerie feel. I stayed in a hotel a half-mile south of the conference venue, and the walk to the Hilton led me past whole blocks that were still mostly empty, fresh coats of bright paint covering the facades that water had submerged five years before.

Now, with pictures in the news of tunnels scratched out of huge snow banks in Boston and Manhattan ringed by ice, it’s the future flood that haunts New Orleans in my mind as I walk back from an excursion to the French Quarter to get the best possible version of a drink made from boiled water and beans grown thousands of miles away, scores of Mardi Gras bead strings still hanging from some gutters. Climate change is “weirding” our weather, rendering the models we use to anticipate events like Katrina less and less reliable. A flood will happen again, probably sooner than we expect, and yet here everybody is, returning and rebuilding and cavorting right where all that water will want to go.

Self Points

For the second year in a row, Dart-Throwing Chimp won Best Blog (Individual) at the Online Achievement in International Studies awards, a.k.a. the Duckies (see below). Thank you for continuing to read and, apparently, for voting.

[Image: 2015 Duckie award for Best Blog (Individual)]

Before the awards, Eva Brittin-Snell, a student of IR at the University of Sussex, interviewed a few of last year’s winners, including me, about blogging on international affairs. You can read her post on the SAGE Connection blog, here.

On Evaluating and Presenting Forecasts

On Saturday afternoon, at the International Studies Association’s annual conference in New Orleans, I’m slated to participate in a round-table discussion with Patrick Brandt, Kristian Gleditsch, and Håvard Hegre on “assessing forecasts of (rare) international events.” Our chair, Andy Halterman, gave us three questions he’d like to discuss:

  1. What are the quantitative assessments of forecasting accuracy that people should use when they publish forecasts?
  2. What’s the process that should be used to determine whether a gain in a scoring metric represents an actual improvement in the model?
  3. How should model uncertainty and past model performance be conveyed to government or non-academic users of forecasts?

As I did for a Friday panel on alternative academic careers (here), I thought I’d use the blog to organize my thoughts ahead of the event and share them with folks who are interested but can’t attend the round table. So, here goes:

When assessing predictive power, we use perfection as the default benchmark. We ask, “Was she right?” or “How close to the true value did he get?”

In fields where predictive accuracy is already very good, this approach seems reasonable. When the objects of the forecasts are rare international events, however, I think this is a mistake, or at least misleading. It implies that perfection is attainable, and that distance from perfection is what we care about. In fact, approximating perfection is not a realistic goal in many fields, and what we really care about in those situations is distance from the available alternatives. In other words, I think we should always assess accuracy in comparative terms, not absolute ones. So, the question becomes: “Compared to what?”

I can think of two situations in which we’d want to forecast international events, and the ways we assess and describe the accuracy of the results will differ across the two. First, there is basic research, where the goal is to develop and test theory. This is what most scholars are doing most of the time, and here the benchmark should be other relevant theories. We want to compare predictive power across nested models or models representing competing hypotheses to see which version does a better job anticipating real-world behavior—and, by implication, explaining it.

Then, of course, there is applied research, where the goal is to support or improve some decision process. Policy-making and investing are probably the two most common ones. Here, the benchmark should be the status quo. What we want to know is: “How much does the proposed forecasting process improve on the one(s) used now?” If the status quo is unclear, that already tells you something important about the state of forecasting in that field—namely, that it probably isn’t great. Even in that case, though, I think it’s still important to pick a benchmark that’s more realistic than perfection. Depending on the rarity of the event in question, that will usually mean either random guessing (for frequent events) or base rates (for rare ones).

How we communicate our findings on predictive power will also differ across basic and applied research, or at least I think it should. This has less to do with the goals of the work than it does with the audiences at which they’re usually aimed. When the audience is other scholars, I think it’s reasonable to expect them to understand the statistics and, so, to use those. For frequent events, Brier or logarithmic scores are often best, whereas for rare events I find that AUC scores are usually more informative, and I know a lot of people like to use F-1 scores in this context, too.
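
To make the comparative logic concrete, here is a minimal sketch in R with simulated data standing in for real forecasts. The point is the gap between the forecasts and the benchmark, not the raw scores; the verification package used below is one of several that will compute an AUC score:

# simulate a rare event, a forecast with some signal, and the base-rate benchmark
set.seed(29)
risk <- plogis(rnorm(1000, qlogis(0.05), 1))          # underlying risk, roughly a 5% base rate
obs  <- rbinom(1000, 1, risk)                         # observed outcomes
fcst <- plogis(qlogis(risk) + rnorm(1000, 0, 0.75))   # noisy forecasts of that risk
base <- rep(mean(obs), 1000)                          # the do-nothing benchmark

brier <- function(p, y) mean((p - y)^2)
brier(fcst, obs)   # Brier score of the forecasts (lower is better)
brier(base, obs)   # Brier score of the base rate; distance from this, not from 0, is what matters

library(verification)
roc.area(obs, fcst)$A   # AUC, where 0.5 marks the random-guessing benchmark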

In applied settings, though, we’re usually doing the work as a service to someone else who probably doesn’t know the mechanics of the relevant statistics and shouldn’t be expected to. In my experience, it’s a bad idea in these settings to try to educate your customer on the spot about things like Brier or AUC scores. They don’t need to know those statistics, and you’re liable to come across as aloof or even condescending if you presume to spend time teaching them. Instead, I’d recommend using the practical problem they’re asking you to help solve to frame your representation of your predictive power. Propose a realistic decision process—or, if you can, take the one they’re already using—and describe the results you’d get if you plugged your forecasts into it.

In applied contexts, people often will also want to know how your process performed on crucial cases and what events would have surprised it, so it’s good to be prepared to talk about those as well. These topics are germane to basic research, too, but crucial cases will be defined differently in the two contexts. For scholars, crucial cases are usually understood as most-likely and least-likely ones in relation to the theory being tested. For policy-makers and other applied audiences, the crucial cases are usually understood as the ones where surprise was or would have been costliest.

So that’s how I think about assessing and describing the accuracy of forecasts of the kinds of (rare) international events a lot of us study. Now, if you’ll indulge me, I’d like to close with a pedantic plea: Can we please reserve the terms “forecast” and “prediction” for statements about things that haven’t happened and not apply them to estimates we generate for cases with known outcomes?

This might seem like a petty concern, but it’s actually tied to the philosophy of knowledge that underpins science, or my understanding of it, anyway. Making predictions about things that haven’t already happened is a crucial part of the scientific method. To learn from prediction, we assume that a model’s forecasting power tells us something about its proximity to the “true” data-generating process. This assumption won’t always be right, but it’s proven pretty useful over the past few centuries, so I’m okay sticking with it for now. For obvious reasons, it’s much easier to make accurate “predictions” about cases with known outcomes than unknown ones, so the scientific value of the two endeavors is very different. In light of that fact, I think we should be as clear and honest with ourselves and our audiences as we can about which one we’re doing, and therefore how much we’re learning.

When we’re doing this stuff in practice, there are three basic modes: 1) in-sample fitting, 2) cross-validation (CV), and 3) forecasting. In-sample fitting is the least informative of the three and, in my opinion, really should only be used in exploratory analysis and should not be reported in finished work. It tells us a lot more about the method than the phenomenon of interest.

CV is usually more informative than in-sample fitting, but not always. Each iteration of CV on the same data set moves you a little closer to in-sample fitting, because you effectively train to the idiosyncrasies of your chosen test set. Using multiple iterations of CV may ameliorate this problem, but it doesn’t always eliminate it. And on topics where the available data have already been worked to death—as they have on many problems of interest to scholars of international affairs—cross-validation really isn’t much more informative than in-sample fitting unless you’ve got a brand-new data series you can throw at the task and are focused on learning about it.
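
For readers who want the mechanics, here is a sketch of repeated k-fold CV using the caret package on made-up data; nothing here is specific to any model discussed in this post:

library(caret)   # the ROC summary below also needs the pROC package installed

# made-up data: one noisy predictor of a binary outcome
set.seed(7)
dat <- data.frame(x = rnorm(300))
dat$y <- factor(ifelse(runif(300) < plogis(dat$x), "yes", "no"))

# 10 repeats of 5-fold cross-validation, scored on AUC ("ROC" in caret's summary)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
fit <- train(y ~ x, data = dat, method = "glm", metric = "ROC", trControl = ctrl)
fit$results   # average ROC, sensitivity, and specificity across the 50 folds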

True forecasting—making clear statements about things that haven’t happened yet and then seeing how they turn out—is uniquely informative in this regard, so I think it’s important to reserve that term for the situations where that’s actually what we’re doing. When we describe in-sample and cross-validation estimates as forecasts, we confuse our readers, and we risk confusing ourselves about how much we’re really learning.

Of course, that’s easier for some phenomena than it is for others. If your theory concerns the risk of interstate wars, for example, you’re probably (and thankfully) not going to get a lot of opportunities to test it through prediction. Rather than sweeping those issues under the rug, though, I think we should recognize them for what they are. They are not an excuse to elide the huge differences between prediction and fitting models to history. Instead, they are a big haymaker of a reminder that social science is especially hard—not because humans are uniquely unpredictable, but rather because we only have the one grand and always-running experiment to observe, and we and our work are part of it.

“No One Stayed to Count the Bodies”

If you want to understand and appreciate why, even in the age of the Internet and satellites and near-ubiquitous mobile telephony, it remains impossible to measure even the coarsest manifestations of political violence with any precision, read this blog post by Phil Hazlewood, AFP’s bureau chief in Lagos. (Warning: graphic. H/t to Ryan Cummings on Twitter.)

Hazlewood’s post focuses on killings perpetrated by Boko Haram, but the same issues arise in measuring violence committed by states. Violence sometimes eliminates the very people who might describe the acts involved, and it intentionally scares many others. If you hear or see details of what happened, that’s often because the killers or their rivals for power wanted you to hear or see those details. We cannot sharply distinguish between the communication of those facts and the political intentions expressed in the violence or the reactions to it. The conversation is the message, and the violence is part of the conversation.

When you see or hear things in spite of those efforts to conceal them, you have to wonder how selection effects limit or distort the information that gets through. North Korea’s gulag system apparently holds thousands of prisoners and kills untold numbers of them each year. Defectors are the outside world’s main source of information about that system, but those defectors are not a random sample of victims, nor are they mechanical recording devices. Instead, they are human beings who have somehow escaped that country and who are now seeking to draw attention to and destroy that system. I do not doubt the basic truth of the gulags’ existence and the horrible things done there, but as a social scientist, I have to consider how those selection processes and motivations shape what we think we know. In the United States, we lack reliable data on fatal encounters with police. That’s partly because different jurisdictions have different capabilities for recording and reporting these incidents, but it’s also partly because some people in that system do not want us to see what they do.

For a previous post of mine on this topic, see “The Fog of War Is Patchy”.

 

Some Thoughts on “Alternative Academic Careers”

I’m headed to New Orleans next week for the annual convention of the International Studies Association (ISA), and while there I’m scheduled to speak on a panel on “alternative academic careers” (Friday at 1:45 PM). To help organize my views on the subject and to share them with people who are curious but can’t attend the panel, I thought I would turn them into a blog post. So:

Let me start with some caveats. I am a white, male U.S. citizen who grew up in a family that wasn’t poor and who married a woman while in grad school. I would prefer to deal in statistics, but on this topic, I can only really speak from experience, and that experience has been conditioned by those personal characteristics. In other words, I know my view is narrow and biased, but on this topic, that view is all I’ve got, so take it for whatever you think it’s worth.

My own career trajectory has been, I think, unusual. I got my Ph.D. from Stanford in the spring of 1997. At the time, I had no academic job offers; I had a spouse who wanted to go to art school; we had two dogs and couldn’t find a place we could afford that would rent to us in the SF Bay Area (this was the leading edge of the dot-com boom); and we both had family in the DC metro area. So, we packed up and moved in with my mother-in-law in Baltimore, and I started looking for work in and around Washington.

It took me a few months to land a job as an analyst in a little branch of a big forensic accounting firm, basically writing short pieces on political risk for a newsletter that went out to the firm’s corporate clients. We moved to the DC suburbs and I did that for about a year until I got a job with a small government contractor that did research projects for the U.S. “intelligence community” and the Department of Defense. After a couple of years of that and the birth of our first son, I decided I needed a change, so I took a job writing book-length research reports on telecom firms and industry segments for a trade-news outfit. At the time, I was also doing some freelance feature writing on whatever I could successfully pitch, and I thought a full-time writing job would help me move faster in that direction.

After a couple of years of that and no serious traction on the writing front, I got a line through one of my dissertation committee members on a part-time consulting thing with a big government contractor, SAIC. I was offered and took that job, which soon evolved into a full-time, salaried position as research director for the Political Instability Task Force. I did that for 10 years, then left it at the end of 2011 to try freelancing as a social scientist. Most of my work time since then has been devoted to the Early Warning Project and the Good Judgment Project, with dribs and drabs on other things and enough free time to write this blog.

So that’s where I’m coming from. Now, here is what I think I’ve learned from those experiences, and from watching and talking to others with similar training who do related things.

First, freelancing—what I’m doing now, and what usually gets fancified as “independent consulting”—is not a realistic option for most social scientists early in their careers, and probably not for most people, period. I would not have landed either of the large assignments I’ve gotten in the past few years without the professional network and reputation I had accumulated over the previous ten. This blog and my activity on social media have helped me expand that network, but nearly all of the paid jobs I’ve done in the past few years have come through those earlier connections. Best I can tell, there are no realistic short cuts to this kind of role.

Think tanks aren’t really an option for new Ph.D.s, either. Most of the jobs in that world are for recent undergraduates or MAs on the one hand and established scholars and practitioners on the other. There are some opportunities for new Ph.D.s looking to do straight-up analysis at places like RAND and the Congressional Research Service, but the niche is tiny, and those jobs will be very hard to land.

That brings me to the segment I know best, namely, government contracting. Budget cuts mean that jobs in that market are scarcer than they were a few years ago, but there are still lots of them. In that world, though, hiring is often linked to specific roles in contracts that have already been awarded. You’re not paid to pursue your own interests or write policy briefs; you’re usually hired to do tasks X and Y on contract Z, with the expectation that you’ll also be able to apply those skills to other, similar contracts after Z runs out. So, to land these jobs, you need to have abilities that can plug into clearly-defined contract tasks, but that also make you a reasonably good risk overall. Nowadays, programming and statistical (“data science”) skills are in demand, but so are others, including fluency in foreign languages and area expertise. If you want to get a feel for what’s valued in that world, spend some time looking at job listings from the big contracting firms—places like Booz Allen Hamilton, Lockheed Martin, SAIC, and Leidos—and see what they’re asking for.

If you do that, you’ll quickly discover that having a security clearance is a huge plus. If you think you want to do government-contract work, you can give yourself a leg up by finding some way to get that clearance while you’re still in graduate school. If you can’t do that, you might consider looking for work that isn’t your first choice on substance but will lead to a clearance in hopes of making an upward or lateral move later on.

All of that said, it’s important to think carefully about the down sides of a security clearance before you pursue one. A clearance opens some doors, but it closes others, some permanently. The process can take a long time, and it can be uncomfortable. When you get a clearance, you accept some constraints on your speech, and those never entirely fall away, even if you leave that world. The fact that you currently or once held a clearance and worked with agencies that required one will also make a certain impression on some people, and that never fully goes away, either. So it’s worth thinking carefully about those long-term implications before you jump in.

If you do go to work on either side of that fence—for a contractor, or directly for the government—you need to be prepared to work on topics on which you’re not already an expert. It’s often your analytic skills they’re after, not your topical expertise, so you should be prepared to stretch that way. Maybe you’ll occasionally or eventually work back to your original research interests, maybe you’ll discover new ones, or maybe you’ll get bored or frustrated and restart your job search. In all cases, the process will go better if you’re prepared for any of the above.

Last but not least, if you’re not going into academia, I think it can help to know that your first job after grad school does not make or break your career. I don’t have data to support this claim, but my own experience and my observation of others tells me that nonacademic careers are not nearly as path dependent as academic ones. That can be scarier in some ways, but it’s also potentially liberating. Within the nontrivial constraints of the market, you can keep reinventing yourself, and I happen to think that’s great.

A Tale of Normal Failure

When I blog about my own research, I usually describe work I’ve already completed and focus on the results. This post is about a recent effort that ended in frustration, and it focuses on the process. In writing about this aborted project, I have two hopes: 1) to reassure other researchers (and myself) that this kind of failure is normal, and 2) if I’m lucky, to get some help with this task.

This particular ball got rolling a couple of days ago when I read a blog post by Dan Drezner about one aspect of the Obama administration’s new National Security Strategy (NSS) report. A few words in the bits Dan quoted got me thinking about the worldview they represented, and how we might use natural-language processing (NLP) to study that:

At first, I was just going to drop that awkwardly numbered tweetstorm and leave it there. I had some time that afternoon, though, and I’ve been looking for opportunities to learn text mining, so I decided to see what I could do. The NSS reports only became a thing in 1987, so there are still just 16 of them, and they all try to answer the same basic questions: What threats and opportunities does the US face in the world, and what should the government do to meet them? As such, they seemed like the kind of manageable and coherent corpus that would make for a nice training exercise.

I started by checking to see if anyone had already done with earlier reports what I was hoping to do with the latest one. It turned out that someone had, and to good effect:

I promptly emailed the corresponding author to ask if they had replication materials, or even just clean versions of the texts for all previous years. I got an autoreply informing me that the author was on sabbatical and would only intermittently be reading his email. (He replied the next day to say that he would put the question to his co-authors, but that still didn’t solve my problem, and by then I’d moved on anyway.)

Without those materials, I would need to start by getting the documents in the proper format. A little Googling led me to the National Security Strategy Archive, which at the time had PDFs of all but the newest report, and that one was easy enough to find on the White House’s web site. Another search led me to a site that converts PDFs to plain text online for free. I spent the next hour or so running those reports through the converter (and playing a little Crossy Road on my phone while I waited for the jobs to finish). Once I had the reports as .txt files, I figured I could organize my work better and do other researchers a solid by putting them all in a public repository, so I set one up on GitHub (here) and cloned it to my hard drive.

At that point, I was getting excited, thinking: “Hey, this isn’t so hard after all.” In most of the work I do, getting the data is the toughest part, and I already had all the documents I wanted in the format I needed. I was just a few lines of code away from the statistics and plots that would confirm or infirm my conjectures.

From another recent collaboration, I knew that the next step would be to use some software to ingest those .txt files, scrub them a few different ways, and then generate some word counts and maybe do some topic modeling to explore changes over time in the reports’ contents. I’d heard several people say that Python is really good at these tasks, but I’m an R guy, so I followed the lead on the CRAN Task View for natural language processing and installed and loaded the ‘tm’ package for text mining.

And that’s where the wheels started to come off of my rickety little wagon. Using the package developers’ vignette and an article they published in the Journal of Statistical Software, I started tinkering with some code. After a couple of false starts, I found that I could create a corpus and run some common preprocessing tasks on it without too much trouble, but I couldn’t get the analytical functions to run on the results. Instead, I kept getting this error message:

Error: inherits(doc, "TextDocument") is not TRUE
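
An aside for readers who hit the same wall: I can’t be sure this is what went wrong in my case, but the most common cause of that error is passing base functions like tolower() straight to tm_map(), which reduces the documents to bare character vectors that the analytical functions then refuse to accept. Wrapping those functions in content_transformer() usually clears it. A sketch, assuming the converted .txt files sit in a local folder called “texts”:

library(tm)

# assumes the plain-text reports live in a folder called "texts"
docs <- VCorpus(DirSource("texts", pattern = "\\.txt$"))

docs <- tm_map(docs, content_transformer(tolower))   # not tm_map(docs, tolower)
# the same wrapper is needed for gsub() and any other base function
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)

dtm <- DocumentTermMatrix(docs)   # should now run without tripping the inherits() check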

By then it was dinner time, so I called it a day and went to listen to my sons holler at each other across the table for a while.

When I picked the task back up the next morning, I inspected a few of the scrubbed documents and saw some strange character strings—things like ir1 instead of in and ’ where an apostrophe should be. That got me wondering if the problem lay in the encoding of those .txt files. Unfortunately, neither the files themselves nor the site that produced them tells me which encoding they use. I ran through a bunch of options, but none of them fixed the problem.

“Okay, no worries,” I thought. “I’ll use gsub() to replace those funky bits in the strings by hand.” The commands ran without a hiccup, but the text didn’t change. Stranger, when I tried to inspect documents in the R terminal, the same command wouldn’t always produce the same result. Sometimes I’d get the head, and sometimes the tail. I tried moving back a step in the process and installed a PDF converter that I could run from R, but R couldn’t find the converter, and my attempts to fix that failed.
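
Again with the caveat that I’m guessing at the cause: curly-quote garbage like that usually means the converter wrote UTF-8 that later got read under a different encoding, and it’s often easier to handle at read time than to patch with gsub() afterward. A sketch, with a hypothetical file name:

# hypothetical file name; the converted reports are in the GitHub repo
con <- file("nss-2015.txt", encoding = "UTF-8")   # declare how the bytes were written
raw <- readLines(con, warn = FALSE)
close(con)

grep("ir1", raw, value = TRUE)[1:3]   # eyeball a few lines carrying the damaged strings

# or convert an already-read vector, dropping anything that can't be mapped
clean <- iconv(raw, from = "UTF-8", to = "ASCII//TRANSLIT", sub = " ")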

At this point, I was about ready to quit, and I tweeted some of that frustration. Igor Brigadir quickly replied to suggest a solution, but it involved another programming language, Python, that I don’t know:

To go that route, I would need to start learning Python. That’s probably a good idea for the long run, but it wasn’t going to happen this week. Then Ken Benoit pointed me toward a new R package he’s developing and even offered to help me:

That sounded promising, so I opened R again and followed the clear instructions on the README at Ken’s repository to install the package. Of course the installation failed, probably because I’m still using R Version 3.1.1 and the package is, I suspect, written for the latest release, 3.1.2.

And that’s where I finally quit—for now. I’d hit a wall, and all my usual strategies for working through or around it had either failed or led to solutions that would require a lot more work. If I were getting paid and on deadline, I’d keep hacking away, but this was supposed to be a “fun” project for my own edification. What seemed at first like a tidy exercise had turned into a tar baby, and I needed to move on.

This cycle of frustration -> problem-solving -> frustration might seem like a distraction from the real business of social science, but in my experience, it is the real business. Unless I’m performing a variation on a familiar task with familiar data, this is normal. It might be boring to read, but then most of the day-to-day work of social science probably is, or at least looks that way to the people who aren’t doing it and therefore can’t see how all those little steps fit into the bigger picture.

So that’s my tale of minor woe. Now, if anyone who actually knows how to do text-mining in R is inspired to help me figure out what I’m doing wrong on that National Security Strategy project, please take a look at that GitHub repo and the script posted there and let me know what you see.

Demography and Democracy Revisited

Last spring on this blog, I used Richard Cincotta’s work on age structure to take another look at the relationship between democracy and “development” (here). In his predictive models of democratization, Rich uses variation in median age as a proxy for a syndrome of socioeconomic changes we sometimes call “modernization” and argues that “a country’s chances for meaningful democracy increase as its population ages.” Rich’s models have produced some unconventional predictions that have turned out well, and if you buy the scientific method, this apparent predictive power implies that the underlying theory holds some water.

Over the weekend, Rich sent me a spreadsheet with his annual estimates of median age for all countries from 1972 to 2015, so I decided to take my own look at the relationship between those estimates and the occurrence of democratic transitions. For the latter, I used a data set I constructed for PITF (here) that covers 1955–2010, giving me a period of observation running from 1972 to 2010. In this initial exploration, I focused specifically on switches from authoritarian rule to democracy, which are observed with a binary variable that covers all country-years where an autocracy was in place on January 1. That variable (rgjtdem) is coded 1 if a democratic regime came into being at some point during that calendar year and 0 otherwise. Between 1972 and 2010, 94 of those switches occurred worldwide. The data set also includes, among other things, a “clock” counting consecutive years of authoritarian rule and an indicator for whether or not the country has ever had a democratic regime before.

To assess the predictive power of median age and compare it to other measures of socioeconomic development, I used the base and caret packages in R to run 10 iterations of five-fold cross-validation on the following series of discrete-time hazard (logistic regression) models (each sketched as an R formula after the list):

  • Base model. Any prior democracy (0/1), duration of autocracy (logged), and the product of the two.
  • GDP per capita. Base model plus the Maddison Project’s estimates of GDP per capita in 1990 Geary-Khamis dollars (here), logged.
  • Infant mortality. Base model plus the U.S. Census Bureau’s estimates of deaths under age 1 per 1,000 live births (here), logged.
  • Median age. Base model plus Cincotta’s estimates of median age, untransformed.
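
In code, those specifications amount to something like the following, where every variable name except rgjtdem is a placeholder rather than the name used in the actual data set:

# discrete-time hazard specifications as R formulas (placeholder names for illustration)
f.base <- rgjtdem ~ anydem * log(duryrs)
f.gdp  <- update(f.base, . ~ . + log(gdppc))
f.imr  <- update(f.base, . ~ . + log(infmort))
f.age  <- update(f.base, . ~ . + medage)

# each is then fit as a logistic regression on the country-year data, e.g.:
# glm(f.age, family = binomial, data = dat)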

The chart below shows density plots and averages of the AUC scores (computed with ‘roc.area’ from the verification package) for each of those models across the 10 iterations of five-fold CV. Contrary to the conventional assumption that GDP per capita is a useful predictor of democratic transitions—How many papers have you read that tossed this measure into the model as a matter of course?—I find that the model with the Maddison Project measure actually makes slightly less accurate predictions than the one with duration and prior democracy alone. More relevant to this post, though, the two demographic measures clearly improve the predictions of democratic transitions relative to the base model, and median age adds a smidgen more predictive signal than infant mortality.

[Figure: density plots of AUC scores by model across the cross-validation folds]

Of course, all of these things—national wealth, infant mortality rates, and age structures—have also been changing pretty steadily in a single direction for decades, so it’s hard to untangle the effects of the covariates from other features of the world system that are also trending over time. To try to address that issue and to check for nonlinearity in the relationship, I used Simon Wood’s mgcv package in R to estimate a semiparametric logistic regression model with smoothing splines for year and median age alongside the indicator of prior democracy and regime duration. Plots of the marginal effects of year and median age estimated from that model are shown below. As the left-hand plot shows, the time effect is really a hump in risk that started in the late 1980s and peaked sharply in the early 1990s; it is not the across-the-board post–Cold War increase that we often see captured in models with a dummy variable for years after 1991. More germane to this post, though, we still see a marginal effect from median age, even when accounting for those generic effects of time. Consistent with Cincotta’s argument, and other things being equal, countries with higher median ages are more likely to transition to democracy than countries with younger populations.

[Figure: estimated marginal effects of year and median age from the semiparametric model]
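
For readers who want to see the shape of that model, here is a rough sketch of it in mgcv, with simulated data and placeholder names for everything except rgjtdem; the actual script is in the GitHub repository linked at the end of this post:

library(mgcv)

# toy stand-in for the country-year data set
set.seed(42)
n <- 2000
dat <- data.frame(year = sample(1972:2010, n, replace = TRUE),
                  medage = runif(n, 15, 40),
                  anydem = rbinom(n, 1, 0.3),
                  duryrs = sample(1:50, n, replace = TRUE))
dat$rgjtdem <- rbinom(n, 1, plogis(-4 + 0.05 * dat$medage))

# smoothing splines for year and median age, plus prior democracy and autocracy duration
gam.fit <- gam(rgjtdem ~ anydem + log(duryrs) + s(year) + s(medage),
               family = binomial, data = dat)

plot(gam.fit, pages = 1)   # marginal effects of the smoothed terms, as in the figure above
summary(gam.fit)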

I read these results as a partial affirmation of modernization theory—not the whole teleological and normative package, but the narrower empirical conjecture about a bundle of socioeconomic transformations that often co-occur and are associated with a higher likelihood of attempting and sustaining democratic government. Statistical studies of this idea (including my own) have produced varied results, but the analysis I’m describing here suggests that some of the null results may stem from the authors’ choice of measures. GDP per capita is actually a poor proxy for modernization; there are a number of ways countries can get richer, and not all of them foster (or are fostered by) the socioeconomic transformations that form the kernel of modernization theory (cf. Equatorial Guinea). By contrast, demographic measures like infant mortality rates and median age are more tightly coupled to those broader changes about which Seymour Martin Lipset originally wrote. And, according to my analysis, those demographic measures are also associated with a country’s propensity for democratic transition.

Shifting to the applied forecasting side, I think these results confirm that median age is a useful addition to models of regime transitions, and it seems to capture more information about those propensities than GDP (by a lot) and infant mortality (by a little). Like all slow-changing structural indicators, though, median age is a blunt instrument. Annual forecasts based on it alone would be pretty clunky, and longer-term forecasts would do well to consider other domestic and international forces that also shape (and are shaped by) these changes.

PS. If you aren’t already familiar with modernization theory and want more background, this ungated piece by Sheri Berman for Foreign Affairs is pretty good: “What to Read on Modernization Theory.”

PPS. The code I used for this analysis is now on GitHub, here. It includes a link to the folder on my Google Drive with all of the required data sets.
