No, Pope Francis, this is not World War Three

In the homily to a mass given this morning in Italy, at a monument to 100,000 soldiers killed in World War I, Pope Francis said:

War is madness… Even today, after the second failure of another world war, perhaps one can speak of a third war, one fought piecemeal, with crimes, massacres, destruction.

There are a lot of awful things happening around the world, and I appreciate the pope’s advocacy for peace, but this comparison goes too far. Take a look at this chart of battle deaths from armed conflict around the world from 1900 to 2005, from a study by the Peace Research Institute Oslo (PRIO):

The chart doesn’t include the past decade, but we don’t need all the numbers in one place to see what a stretch this comparison is. Take Syria’s civil war, which has probably killed more than 150,000 (source) and perhaps 300,000 or more people over the past three years, for an annual death rate of 50,000–100,000. That is a horrifying toll, but it is vastly lower than the annual rates in the several millions that occurred during the World Wars. Put another way, World War II was like 40 to 80 Syrian civil wars at once.

The many other wars of the present do not substantially close this gap. The civil war in Ukraine has killed approximately 3,000 so far (source). More than 2,000 people have died in the fighting associated with Israel’s Operation Protective Edge in Gaza this year (source). The resurgent civil war in Iraq dwarfs them both but still remains well below the intensity of the (interconnected) war next door (source). There are more than 20 other armed conflicts ongoing around the world, but most of them are much less lethal than the ones in Syria and Iraq, and their cumulative toll does not even begin to approach the ones that occurred in the World Wars (source).

I sympathize with the Pope’s intentions, but I don’t think that hyperbole is the best way to realize them. Of course, Pope Francis is not alone; we’ve been hearing a lot of this lately. I wonder if violence on the scale of the World Wars now lies so far outside of our lived experience that we simply cannot fathom it. Beyond some level of disorder, things simply become terrible, and all terrible things are alike. I also worry that the fear this apparent availability cascade is producing will drive governments to react in ways that only make things worse.

The era of democratization is not over

In the latest issue of the Journal of Democracy (PDF), Marc Plattner makes the provocative claim that “the era of democratic transitions is over, and should now become the province of the historians.” By that, he seems to mean that we should not expect new waves of democratization similar in form and scale to the ones that have occurred before. I think Plattner is wrong, in part because he has defined “wave” too broadly. If we tighten up that concept a bit, I think we can see at least a few possibilities for new waves in the not-too-distant future, and thus an extension of the now long-running era of democratization.

In his essay, Plattner implicitly adopts the definition of waves of democratization described by Samuel Huntington on p. 15 of his influential 1991 book:

A wave of democratization is a group of transitions from nondemocratic to democratic regimes that occur within a specified period of time and that significantly outnumber transitions in the opposite direction during that period of time.

Much of what’s been written and said about waves of democratization since that book was published accepts those terms and the three waves Huntington identifies when he applies them to the historical evidence: one in Europe from the 1820s to the 1920s; another and wider one in Europe, Latin America, and Asia from the 1940s to the early 1960s; and a third and so-far final one that began in Portugal in 1974, has been global in scope, and now appears to have stalled or ended.

I find Huntington’s definition and resulting periodization wanting because they focus on the what and don’t pay enough attention to the why. A large number of transitions might occur around the same time because they share common underlying causes; because they cause and reinforce each other; or as a matter of chance, when independent events just happen to cluster. The third possibility is not scientifically interesting (cf. the Texas sharpshooter fallacy). More relevant here, though, I think the first two become banal if we let the time lag or chain of causality stretch too far. We inhabit a global system; at some level, everything causes, and is caused by, everything else. For the wave idea to be scientifically useful, we have to restrict its use to clusters of transitions that share common, temporally proximate causes and/or directly cause and reinforce each other.

By that definition, I think we can make out at least five and maybe more such waves since the early 1800s, not the three or maybe four we usually hear about.

First, as Plattner (p. 9) points out, what Huntington describes as the “first, long” wave really includes two distinct clusters: 1) the “dozen or so European and European-settler countries that already had succeeded in establishing a fair degree of freedom and rule of law, and then moved into the democratic column by gradually extending the suffrage”; and 2) “countries that became democratic after World War I, many of them new nations born from the midst of the European empires defeated and destroyed during the war.”

The second (or now third?) wave grew out of World War II. Even though this wave was relatively short, it also included a few distinct sub-clusters: countries defeated in that war, countries born of decolonization, and a number of Latin American cases. This wave is more coherent, in that all of these sub-clusters were at least partially nudged along by the war’s dynamics and outcomes. It wouldn’t be unreasonable to split the so-called second wave into two clusters (war losers and newly independent states) and a clump of coincidences (Latin America), but there are enough direct linkages across those sets to see meaning in a larger wave, too.

As for the so-called third wave, I’m with Mike McFaul (here) and others who see at least two separate clusters in there. The wave of democratization that swept southern Europe and Latin America in the 1970s and early 1980s is temporally and causally distinct from the spate of transitions associated with the USSR’s reform and disintegration, so it makes no sense to talk of a single coherent wave spanning the past 40 years. Less clear is where to put the many democratic transitions—some successful, many others aborted or short lived—that occurred in Africa as Communist rule collapsed. Based partly on Robert Bates’ analysis (here), I am comfortable grouping them with the post-Communist cases. Trends in the global economy and the disappearance of the USSR as a patron state directly affected many of these countries, and political and social linkages within and across these regional sets also helped to make democratization contagious once it started.

So, based on that definition and its application, I think it’s fair to say that we have seen at least five waves of democratization in the past two centuries, and perhaps as many as six or seven.

Given that definition, I think it’s also easier to see possibilities for new waves, or “clusters” if we want to make clearer the distinction from conventional usage. Of course, the probability of any new waves is partially diminished by the success of the earlier ones. Nearly two-thirds of the world’s countries now have regimes that most observers would call democratic, so the pool of potential democratizers has shrunk substantially. As Plattner puts it (p. 14), “The ‘low-hanging fruit’ has been picked.” Still, if we look for groups of authoritarian regimes that share enough political, economic, social, and cultural connections to allow common causes and contagion to kick in, then I think we can find some sets in which this dynamic could clearly happen again. I see three in particular.

The first and most obvious is in the Middle East and North Africa, the region that has proved most resistant to democratization to date. In fact, I think we already saw—or, arguably, are still seeing—the next wave of democratization in the form of the Arab Spring and its aftermath. So far, that cluster of popular uprisings and state collapses has only produced one persistently democratic state (Tunisia), but it has also produced a democratic interlude in Egypt; a series of competitively elected (albeit ineffective) governments in Libya; a nonviolent transfer of power between elected governments in Iraq; ongoing (albeit not particularly liberal) revolutions in Syria and Yemen; and sustained, liberal challenges to authoritarian rule in Bahrain, Kuwait, and, perhaps, Saudi Arabia. In other words, a lot of countries are involved, and it ain’t over yet. Most of the Soviet successor states never really made it all the way to democracy, but we still think of them as an important cluster of attempts at democratization. I think the Arab Spring fits the same mold.

Beyond that, though, I also see the possibility of a wave of regime breakdowns and attempts at democracy in Asia brought on by economic or political instability in China. Many of the autocracies that remain in that region—and there are many—depend directly or indirectly on Chinese patronage and trade, so any significant disruption in China’s political economy would send shock waves through their systems as well. I happen to think that systemic instability will probably hit China in the next few years (see here, here, and here), but the timing is less relevant here than the possibility of this turbulence, and thus of the wider wave of democratization it could help to produce.

Last and probably least in its scope and impact, I think we can also imagine a similar cluster occurring in Eurasia in response to instability in Russia. The number of countries enmeshed in this network is smaller, but the average strength of their ties is probably similar.

I won’t hazard guesses now about the timing and outcome of the latter two possibilities beyond what I’ve already written about China’s increasing fragility. As the Arab Spring has shown, even when we can spot the stresses, it’s very hard to anticipate when they’ll overwhelm the sources of negative feedback and what form the new equilibrium will take. What I hope I have already done, though, is to demonstrate that, contra Plattner, there’s plenty of room left in the system for fresh waves of democratization. In fact, I think we even have a pretty good sense of where and how those waves are most likely to come.

2014 NFL Football Season Predictions

Professional (American) football season starts tonight when the Green Bay Packers visit last year’s champs, the Seattle Seahawks, for a Thursday-night opener thing that still seems weird to me. (SUNDAY, people. Pro football is played on Sunday.) So, who’s likely to win?

With the final preseason scores from our pairwise wiki survey in hand, we can generate a prediction for that game, along with all 255 other regular-season contests on the 2014 schedule. As I described in a recent post, this wiki survey offers a novel way to crowdsource the problem of estimating team strength before the season starts. We can use last year’s preseason survey data and game results to estimate a simple statistical model that accounts for two teams’ strength differential and home-field advantage. Then, we can apply that model to this year’s survey results to get game-level forecasts.

In the last post, I used the initial model estimates to generate predicted net scores (home minus visitor) and confidence intervals. This time, I thought I’d push it a little further and use predictive simulations. Following Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models (2009), I generated 1,000 simulated net scores for each game and then summarized the distributions of those scores to get my statistics of interest.
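
Here is a minimal sketch of that simulation step, not my actual script. It assumes a fitted linear model 'mod' (net score regressed on the survey-rating differential, as in the earlier post) and a data frame 'games2014' with one row per game and a column 'rating.diff' (home rating minus visitor rating); those object and column names are placeholders.

    library(arm)  # sim() implements the predictive simulation approach from Gelman and Hill

    n.sims <- 1000
    sims <- sim(mod, n.sims)                      # simulated draws of the coefficients and sigma
    X <- cbind(1, games2014$rating.diff)          # design matrix: intercept plus rating differential
    mu <- sims@coef %*% t(X)                      # n.sims x n.games matrix of expected net scores
    noise <- matrix(rnorm(n.sims * nrow(games2014), mean = 0, sd = sims@sigma),
                    nrow = n.sims)                # residual draws; each row reuses that simulation's sigma
    net.sims <- mu + noise                        # simulated net scores (home minus visitor)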

The means of those simulated net scores for each game represent point estimates of the outcome, and the variance of those distributions gives us another way to compute confidence intervals. Those means and confidence intervals closely approximate the ones we’d get from a one-shot application of the predictive model to the 2014 survey results, however, so there’s no real new information there.

What we can do with those distributions that is new is compute win probabilities. The share of simulated net scores above 0 gives us an estimate of the probability of a home-team win, and 1 minus that estimate gives us the probability of a visiting-team win.
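
In code, that summary step is short; continuing the placeholder objects from the sketch above:

    p.home    <- colMeans(net.sims > 0)   # share of simulations in which the home team wins
    p.visitor <- 1 - p.home               # probability of a visiting-team win
    mean.net  <- colMeans(net.sims)       # point estimate of each game's net score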

A couple of pictures make this idea clearer. First, here’s a histogram of the simulated net scores for tonight’s Packers-Seahawks game. The Packers fared pretty well in the preseason wiki survey, ranking 5th overall with a score of 77.5 out of 100. The defending-champion Seahawks got the highest score in the survey, however—a whopping 92.6—and they have home-field advantage, which is worth about 3.1 points on average, according to my model. In my predictive simulations, 673 of the 1,000 games had a net score above 0, suggesting a win probability of 67%, or 2:1 odds, in favor of the Seahawks. The mean predicted net score is 5.8, which is pretty darn close to the current spread of -5.5.

[Histogram of simulated net scores: Green Bay Packers at Seattle Seahawks]

Things look a little tighter for the Bengals-Ravens contest, which I’ll be attending with my younger son on Sunday in our once-annual pilgrimage to M&T Bank Stadium. The Ravens wound up 10th in the wiki survey with a score of 60.5, but the Bengals are just a few rungs down the ladder, in 13th, with a score of 54.7. Add in home-field advantage, though, and the simulations give the Ravens a win probability of 62%, or about 3:2 odds. Here, the mean net score is 3.6, noticeably higher than the current spread of -1.5 but on the same side of the win/loss line. (N.B. Because the two teams’ survey scores are so close, the tables turn when Cincinnati hosts in Week 8, and the predicted probability of a home win is 57%.)

[Histogram of simulated net scores: Cincinnati Bengals at Baltimore Ravens]

Once we’ve got those win probabilities ginned up, we can use them to move from game-level to season-level forecasts. It’s tempting to think of the wiki survey results as season-level forecasts already, but what they don’t do is account for variation in strength of schedule. Other things being equal, a strong team with a really tough schedule might not be expected to do much better than a mediocre team with a relatively easy schedule. The model-based simulations refract those survey results through the 2014 schedule to give us a clearer picture of what we can expect to happen on the field this year.

The table below (made with the handy ‘textplot’ command in R’s gplots package) turns the predictive simulations into season-level forecasts for all 32 teams.* I calculated two versions of a season summary and juxtaposed them to the wiki survey scores and resulting rankings. Here’s what’s in the table:

  • WikiRank shows each team’s ranking in the final preseason wiki survey results.
  • WikiScore shows the score on which that ranking is based.
  • WinCount counts the number of games in which each team has a win probability above 0.5. This process gives us a familiar number, the first half of a predicted W-L record, but it also throws out a lot of information by treating forecasts close to 0.5 the same as ones where we’re more confident in our prediction of the winner.
  • WinSum is the sum of each team’s win probabilities across the 16 games. This expected number of wins is a better estimate of each team’s anticipated results than WinCount, but it’s also a less familiar one, so I thought I would show both. (A sketch of both calculations appears just after this list.)
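
Here is a rough sketch of how those two summaries can be rolled up from the game-level simulations, continuing the placeholder objects from the sketches above and assuming 'games2014' also carries 'home' and 'visitor' team columns:

    # Expected wins: sum each team's win probabilities over its home and away games
    home.exp <- tapply(p.home, games2014$home, sum)
    away.exp <- tapply(p.visitor, games2014$visitor, sum)
    WinSum <- round(home.exp + away.exp[names(home.exp)], 1)

    # WinCount: count the games in which each team's win probability exceeds 0.5
    home.cnt <- tapply(p.home > 0.5, games2014$home, sum)
    away.cnt <- tapply(p.visitor > 0.5, games2014$visitor, sum)
    WinCount <- home.cnt + away.cnt[names(home.cnt)]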

Teams appear in the table in descending order of WinSum, which I consider the single best estimate in this table of a team’s 2014 performance. It’s interesting (to me, anyway) to see how the rank order changes from the survey to the win totals because of differences in strength of schedule. So, for example, the Patriots ranked 4th in the wiki survey, but they get the second-highest expected number of wins this year (9.8), just behind the Seahawks (9.9). Meanwhile, the Steelers ranked 16th in the wiki survey, but they come in 11th in expected number of wins with 8.4. That’s a smidgen better than the Cincinnati Bengals (8.3) and not much worse than the Baltimore Ravens (9.0), suggesting an even tighter battle for the AFC North division title than the wiki survey results alone would imply.

2014 NFL Season-Level Forecasts from 1,000 Predictive Simulations Using Preseason Wiki Survey Results and Home-Field Advantage

There are a lot of other interesting quantities we could extract from the results of the game-level simulations, but that’s all I’ve got time to do now. If you want to poke around in the original data and simulation results, you can find them all in a .csv on my Google Drive (here). I’ve also posted a version of the R script I used to generate the game-level and season-level forecasts on Github (here).

At this point, I don’t have plans to try to update the forecasts during the season, but I will be seeing how the preseason predictions fare and occasionally reporting the results here. Meanwhile, if you have suggestions on other ways to use these data or to improve these forecasts, please leave a comment here on the blog.

* The version of this table I initially posted had an error in the WikiRank column where 18 was skipped and the rankings ran to 33. This version corrects that error. Thanks to commenter C.P. Liberatore for pointing it out.

What are all these violent images doing to us?

Early this morning, I got up, made some coffee, sat down at my desk, and opened Twitter to read the news and pass some time before I had to leave for a conference. One of the first things I saw in my timeline was a still from a video of what was described in the tweet as an ISIS fighter executing a group of Syrian soldiers. The soldiers lay on their stomachs in the dirt, mostly undressed, hands on their heads. They were arranged in a tightly packed row, arms and legs sometimes overlapping. The apparent killer stood midway down the row, his gun pointed down, smoke coming from its barrel.

That experience led me to this pair of tweets:

tweet 1

tweet 2

If you don’t use Twitter, you probably don’t know that, starting in 2013, Twitter tweaked its software so that photos and other images embedded in tweets would automatically appear in users’ timelines. Before that change, you had to click on a link to open an embedded image. Now, if you follow someone who appends an image to his or her tweet, you instantly see the image when the tweet appears in your timeline. The system also includes a filter of sorts that’s supposed to inform you before showing media that may be sensitive, but it doesn’t seem to be very reliable at screening for violence, and it can be turned off.

As I said this morning, I think the automatic display of embedded images is great for sharing certain kinds of information, like data visualizations. Now, tweets can become charticles.

I am increasingly convinced, though, that this feature becomes deeply problematic when people choose to share disturbing images. After I tweeted my complaint, Werner de Pooter pointed out a recent study on the effects of frequent exposure to graphic depictions of violence on the psychological health of journalists. The study’s authors found that daily exposure to violent images was associated with higher scores on several indices of psychological distress and depression. The authors conclude:

Given that good journalism depends on healthy journalists, news organisations will need to look anew at what can be done to offset the risks inherent in viewing User Generated Content material [which includes graphic violence]. Our findings, in need of replication, suggest that reducing the frequency of exposure may be one way to go.

I mostly use Twitter to discover stories and ideas I don’t see in regular news outlets, to connect with colleagues, and to promote my own work. Because I study political violence and atrocities, a fair share of my feed deals with potentially disturbing material. Where that material used to arrive only as text, it increasingly includes photos and video clips of violent or brutal acts as well. I am starting to wonder how routine exposure to those images may be affecting my mental health. The study de Pooter pointed out has only strengthened that concern.

I also wonder if the emotional power of those images is distorting our collective sense of the state of the world. Psychologists talk about the availability heuristic, a cognitive shortcut in which the ease of recalling examples of certain things drives our expectations about the likelihood or risk of those things. As Daniel Kahneman describes on p. 138 of Thinking, Fast and Slow,

Unusual events (such as botulism) attract disproportionate attention and are consequently perceived as less unusual than they really are. The world in our heads is not a precise replica of reality; our expectations about the frequency of events are distorted by the prevalence and emotional intensity of the messages to which we are exposed.

When those images of brutal violence pop into our view, they grab our attention, pack a lot of emotional intensity, and are often hard to shake. The availability heuristic implies that frequent exposure to those images leads us to overestimate the threat or risk of things associated with them.

This process could even be playing some marginal role in a recent uptick in stories about how the world is coming undone. According to Twitter, its platform now has more than 270 million monthly active users. Many journalists and researchers covering world affairs probably fall in that 270 million. I suspect that those journalists and researchers spend more time watching their timelines than the average user, and they are probably more likely to turn off that “sensitive content” warning, too.

Meanwhile, smartphones and easier Internet access make it increasingly likely that acts of violence will be recorded and then shared through those media, and Twitter’s default settings now make it more likely that we see them when they are. Presumably, some of the organizations perpetrating this violence—and, sometimes, ones trying to mobilize action to stop it—are aware of the effects these images can have and deliberately push them to us to try to elicit that response.

As a result, many writers and analysts are now seeing much more of this material than they used to, even just a year or two ago. Whatever the actual state of the world, this sudden increase in exposure to disturbing material could be convincing many of us that the world is scarier and therefore more dangerous than ever before.

This process could have larger consequences. For example, lately I’ve had trouble getting thoughts of James Foley’s killing out of my mind, even though I never watched the video of it. What about the journalists and policymakers and others who did see those images? How did that exposure affect them, and how much is that emotional response shaping the public conversation about the threat the Islamic State poses and how our governments should respond to it?

I’m not sure what to do about this problem. As an individual, I can choose to unfollow people who share these images or spend less time on Twitter, but both of those actions carry some professional costs as well. The thought of avoiding these images also makes me feel guilty, as if I am failing the people whose suffering they depict and the ones who could be next. By hiding from those images, do I become complicit in the wider violence and injustice they represent?

As an organization, Twitter could decide to revert to the old no-show default, but that almost certainly won’t happen. I suspect this isn’t an issue for the vast majority of users, and it’s hard to imagine any social-media platform retreating from visual content as sites like Instagram and Snapchat grow quickly. Twitter could also try to remove embedded images that contain potentially disturbing material. As a fan of unfettered speech, though, I don’t find that approach appealing, either, and the unreliability of the current warning system suggests it probably wouldn’t work so well anyway.

In light of all that uncertainty, I’ll conclude with an observation instead of a solution: this is one hell of a huge psychological experiment we’re running right now, and its consequences for our own mental health and how we perceive the world around us may be more substantial than we realize.

Deriving a Fuzzy-Set Measure of Democracy from Several Dichotomous Data Sets

In a recent post, I described an ongoing project in which Shahryar Minhas, Mike Ward, and I are using text mining and machine learning to produce fuzzy-set measures of various political regime types for all countries of the world. As part of the NSF-funded MADCOW project,* our ultimate goal is to devise a process that routinely updates those data in near-real time at low cost. We’re not there yet, but our preliminary results are promising, and we plan to keep tinkering.

One of the crucial choices we had to make in our initial analysis was how to measure each regime type for the machine-learning phase of the process. This choice is important because our models are only going to be as good as the data from which they’re derived. If the targets in that machine-learning process don’t reliably represent the concepts we have in mind, then the resulting models will be looking for the wrong things.

For our first cut, we decided to use dichotomous measures of several regime types, and to base those dichotomous measures on stringent criteria. So, for example, we identified as democracies only those cases with a score of 10, the maximum, on Polity’s scalar measure of democracy. For military rule, we only coded as 1 those cases where two major data sets agreed that a regime was authoritarian and only military-led, with no hybrids or modifiers. Even though the targets of our machine-learning process were crisply bivalent, we could get fuzzy-set measures from our classifiers by looking at the probabilities of class membership they produce.

In future iterations, though, I’m hoping we’ll get a chance to experiment with targets that are themselves fuzzy or that just take advantage of a larger information set. Bayesian measurement error models offer a great way to generate those targets.

Imagine that you have a set of cases that may or may not belong in some category of interest—say, democracy. Now imagine that you’ve got a set of experts who vote yes (1) or no (0) on the status of each of those cases and don’t always agree. We can get a simple estimate of the probability that a given case is a democracy by averaging the experts’ votes, and that’s not necessarily a bad idea. If, however, we suspect that some experts are more error prone than others, and that the nature of those errors follows certain patterns, then we can do better with a model that gleans those patterns from the data and adjusts the averaging accordingly. That’s exactly what a Bayesian measurement error model does. Instead of an unweighted average of the experts’ votes, we get an inverse-error-rate-weighted average, which should be more reliable than the unweighted version if the assumption about predictable patterns in those errors is largely correct.
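
To make that logic concrete before getting into our implementation, here is a minimal sketch of one way such a model can be fit, written for JAGS and run from R via the rjags package. It is an illustration of the general idea, not the model Sean actually estimated; 'y' stands for an I x J matrix of zeroes and ones with one column per source, and all of the names are placeholders. The posterior mean of each z[i] is the estimated probability that case i belongs in the democracy set.

    library(rjags)

    model.string <- "
    model {
      prev ~ dbeta(1, 1)                    # base rate of democracy
      for (j in 1:J) {
        sens[j] ~ dbeta(2, 1)               # P(source j votes 1 | democracy)
        spec[j] ~ dbeta(2, 1)               # P(source j votes 0 | non-democracy)
      }
      for (i in 1:I) {
        z[i] ~ dbern(prev)                  # latent democracy status of case i
        for (j in 1:J) {
          p[i, j] <- z[i] * sens[j] + (1 - z[i]) * (1 - spec[j])
          y[i, j] ~ dbern(p[i, j])          # observed vote from source j
        }
      }
    }"

    jags <- jags.model(textConnection(model.string),
                       data = list(y = y, I = nrow(y), J = ncol(y)))
    post <- coda.samples(jags, variable.names = "z", n.iter = 5000)
    p.democracy <- colMeans(as.matrix(post))  # error-rate-weighted probability of democracy for each case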

I’m not trained in Bayesian data analysis and don’t know my way around the software used to estimate these models, so I sought and received generous help on this task from Sean J. Taylor. I compiled yes/no measures of democracy from five country-year data sets that ostensibly use similar definitions and coding criteria:

  • Cheibub, Gandhi, and Vreeland’s Democracy and Dictatorship (DD) data set, 1946–2008 (here);
  • Boix, Miller, and Rosato’s dichotomous coding of democracy, 1800–2007 (here);
  • A binary indicator of democracy derived from Polity IV using the Political Instability Task Force’s coding rules, 1800–2013;
  • The lists of electoral democracies in Freedom House’s annual Freedom in the World reports, 1989–2013; and
  • My own Democracy/Autocracy data set, 1955–2010 (here).

Sean took those five columns of zeroes and ones and used them to estimate a model with no prior assumptions about the five sources’ relative reliability. James Melton, Stephen Meserve, and Daniel Pemstein use the same technique to produce the terrific Unified Democracy Scores. What we’re doing is a little different, though. Where their approach treats democracy as a scalar concept and estimates a composite index from several measures, we’re accepting the binary conceptualization underlying our five sources and estimating the probability that a country qualifies as a democracy. In fuzzy-set terms, this probability represents a case’s degree of membership in the democracy set, not how democratic it is.

The distinction between a country’s degree of membership in that set and its degree of democracy is subtle but potentially meaningful, and the former will sometimes be a better fit for an analytic task than the latter. For example, if you’re looking to distinguish categorically between democracies and autocracies in order to estimate the difference in some other quantity across the two sets, it makes more sense to base that split on a probabilistic measure of set membership than an arbitrarily chosen cut point on a scalar measure of democracy-ness. You would still need to choose a threshold, but “greater than 0.5” has a natural interpretation (“probably a democracy”) that suits the task in a way that an arbitrary cut point on an index doesn’t. And, of course, you could still perform a sensitivity analysis by moving the cut point around and seeing how much that choice affects your results.

So that’s the theory, anyway. What about the implementation?

I’m excited to report that the estimates from our initial measurement model of democracy look great to me. As someone who has spent a lot of hours wringing my hands over the need to make binary calls on many ambiguous regimes (Russia in the late 1990s? Venezuela under Hugo Chavez? Bangladesh between coups?), I think these estimates are accurately distinguishing the hazy cases from the rest and even doing a good job estimating the extent of that uncertainty.

As a first check, let’s take a look at the distribution of the estimated probabilities. The histogram below shows the estimates for the period 1989–2007, the only years for which we have inputs from all five of the source data sets. Voilà, the distribution has the expected shape. Most countries most of the time are readily identified as democracies or non-democracies, but the membership status of a sizable subset of country-years is more uncertain.

Estimated Probabilities of Democracy for All Countries Worldwide, 1989-2007

Of course, we can and should also look at the estimates for specific cases. I know a little more about countries that emerged from the collapse of the Soviet Union than I do about the rest of the world, so I like to start there when eyeballing regime data. The chart below compares scores for several of those countries that have exhibited more variation over the past 20+ years. Most of the rest of the post-Soviet states are slammed up against 1 (Estonia, Latvia, and Lithuania) or 0 (e.g., Uzbekistan, Turkmenistan, Tajikistan), so I left them off the chart. I also limited the range of years to the ones for which data are available from all five sources. By drawing strength from other years and countries, the model can produce estimates for cases with fewer or even no inputs. Still, the estimates will be less reliable for those cases, so I thought I would focus for now on the estimates based on a common set of “votes.”

Estimated Probability of Democracy for Selected Soviet Successor States, 1991-2007

Those estimates look about right to me. For example, Georgia’s status is ambiguous and trending less likely until the Rose Revolution of 2003, after which point it’s probably but not certainly a democracy, and the trend bends down again soon thereafter. Meanwhile, Russia is fairly confidently identified as a democracy after the constitutional crisis of 1993, but its status becomes uncertain around the passage of power from Yeltsin to Putin and then solidifies as most likely authoritarian by the mid-2000s. Finally, Armenia was one of the cases I found most difficult to code when building the Democracy/Autocracy data set for the Political Instability Task Force, so I’m gratified to see its probability of democracy oscillating around 0.5 throughout.

One nice feature of a Bayesian measurement error model is that, in addition to estimating the scores, we can also estimate confidence intervals to help quantify our uncertainty about those scores. The plot below shows Armenia’s trend line with the upper and lower bounds of a 90-percent confidence interval. Here, it’s even easier to see just how unclear this country’s democracy status has been since it regained independence. From 1991 until at least 2007, its 90-percent confidence interval straddled the toss-up line. How’s that for uncertain?

Armenia’s Estimated Probability of Democracy with 90% Confidence Interval

Sean and I are still talking about ways to tweak this process, but I think the data it’s producing are already useful and interesting. I’m considering using these estimates in a predictive model of coup attempts and seeing if and how the results differ from ones based on the Polity index and the Unified Democracy Scores. Meanwhile, the rest of the MADCOW crew and I are now talking about applying the same process to dichotomous indicators of military rule, one-party rule, personal rule, and monarchy and then experimenting with machine-learning processes that use the results as their targets. There are lots of moving parts in our regime data-making process, and this one isn’t necessarily the highest priority, but it would be great to get to follow this path and see where it leads.

* NSF Award 1259190, Collaborative Research: Automated Real-time Production of Political Indicators

Mining Texts to Generate Fuzzy Measures of Political Regime Type at Low Cost

Political scientists use the term “regime type” to refer to the formal and informal structure of a country’s government. Of course, “government” entails a lot of things, so discussions of regime type focus more specifically on how rulers are selected and how their authority is organized and exercised. The chief distinction in contemporary work on regime type is between democracies and non-democracies, but there’s some really good work on variations of non-democracy as well (see here and here, for example).

Unfortunately, measuring regime type is hard, and conventional measures of regime type suffer from one or both of two crucial drawbacks.

First, many of the data sets we have now represent regime types or their components with bivalent categorical measures that sweep meaningful uncertainty under the rug. Specific countries at specific times are identified as fitting into one and only one category, even when researchers knowledgeable about those cases might be unsure or disagree about where they belong. For example, all of the data sets that distinguish categorically between democracies and non-democracies—like this one, this one, and this one—agree that Norway is the former and Saudi Arabia the latter, but they sometimes diverge on the classification of countries like Russia, Venezuela, and Pakistan, and rightly so.

Importantly, the degree of our uncertainty about where a case belongs may itself be correlated with many of the things that researchers use data on regime type to study. As a result, findings and forecasts derived from those data are likely to be sensitive to those bivalent calls in ways that are hard to understand when that uncertainty is ignored. In principle, it should be possible to make that uncertainty explicit by reporting the probability that a case belongs in a specific set instead of making a crisp yes/no decision, but that’s not what most of the data sets we have now do.

Second, virtually all of the existing measures are expensive to produce. These data sets are coded either by hand or through expert surveys, and routinely covering the world this way takes a lot of time and resources. (I say this from knowledge of the budgets for the production of some of these data sets, and from personal experience.) Partly because these data are so costly to make, many of these measures aren’t regularly updated. And, if the data aren’t regularly updated, we can’t use them to generate the real-time forecasts that offer the toughest test of our theories and are of practical value to some audiences.

As part of the NSF-funded MADCOW project*, Michael D. (Mike) Ward, Philip Schrodt, and I are exploring ways to use text mining and machine learning to generate measures of regime type that are fuzzier in a good way from a process that is mostly automated. These measures would explicitly represent uncertainty about where specific cases belong by reporting the probability that a certain case fits a certain regime type instead of forcing an either/or decision. Because the process of generating these measures would be mostly automated, they would be much cheaper to produce than the hand-coded or survey-based data sets we use now, and they could be updated in near-real time as relevant texts become available.

At this week’s annual meeting of the American Political Science Association, I’ll be presenting a paper—co-authored with Mike and Shahryar Minhas of Duke University’s WardLab—that describes preliminary results from this endeavor. Shahryar, Mike, and I started by selecting a corpus of familiar and well-structured texts describing politics and human-rights practices each year in all countries worldwide: the U.S. State Department’s Country Reports on Human Rights Practices, and Freedom House’s Freedom in the World. After pre-processing those texts in a few conventional ways, we dumped the two reports for each country-year into a single bag of words and used text mining to extract features from those bags in the form of vectorized tokens that may be grossly described as word counts. (See this recent post for some things I learned from that process.) Next, we used those vectorized tokens as inputs to a series of binary classification models representing a few different ideal-typical regime types as observed in a few widely used, human-coded data sets. Finally, we applied those classification models to a test set of country-years held out at the start to assess the models’ ability to classify regime types in cases they had not previously “seen.” The picture below illustrates the process and shows how we hope eventually to develop models that can be applied to recent documents to generate new regime data in near-real time.
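
For readers who want a more concrete sense of what that pipeline involves, here is a compressed sketch in R using the tm and e1071 packages. It is illustrative only, not our actual code; the directory path, the 'labels' vector, and the 'train'/'test' indices are stand-ins.

    library(tm)      # corpus handling and document-term matrices
    library(e1071)   # support vector machines

    corpus <- VCorpus(DirSource("reports/"))               # one plain-text file per country-year (hypothetical path)
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    dtm <- DocumentTermMatrix(corpus)                      # vectorized tokens, roughly word counts
    dtm <- removeSparseTerms(dtm, 0.99)                    # drop very rare tokens
    X <- as.matrix(dtm)

    # 'labels' is a 0/1 vector (e.g., democracy per the stringent coding rule);
    # 'train' and 'test' are row indices for the training and held-out sets
    fit <- svm(x = X[train, ], y = factor(labels[train]), probability = TRUE)
    pred <- predict(fit, X[test, ], probability = TRUE)
    p.hat <- attr(pred, "probabilities")[, "1"]            # estimated probability of class "1"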

Overview of MADCOW Regime Classification Process

Our initial results demonstrate that this strategy can work. Our classifiers perform well out of sample, achieving high or very high precision and recall scores in cross-validation on all four of the regime types we have tried to measure so far: democracy, monarchy, military rule, and one-party rule. The separation plots below are based on out-of-sample results from support vector machines trained on data from the 1990s and most of the 2000s and then applied to new data from the most recent few years available. When a classifier works perfectly, all of the red bars in the separation plot will appear to the right of all of the pink bars, and the black line denoting the probability of a “yes” case will jump from 0 to 1 at the point of separation. These classifiers aren’t perfect, but they seem to be working very well.

 

[Separation plots from the preliminary SVM classifiers: democracy, military rule, monarchy, and one-party rule]

Of course, what most of us want to do when we find a new data set is to see how it characterizes cases we know. We can do that here with heat maps of the confidence scores from the support vector machines. The maps below show the values from the most recent year available for two of the four regime types: 2012 for democracy and 2010 for military rule. These SVM confidence scores indicate the distance and direction of each case from the hyperplane used to classify the set of observations into 0s and 1s. The probabilities used in the separation plots are derived from them, but we choose to map the raw confidence scores because they exhibit more variance than the probabilities and are therefore easier to visualize in this form.
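
For anyone working with e1071, both quantities can be pulled from the same predict() call; continuing the hypothetical objects from the sketch earlier in this post:

    pred <- predict(fit, X[test, ], decision.values = TRUE, probability = TRUE)
    conf <- attr(pred, "decision.values")       # signed distance from the separating hyperplane
    prob <- attr(pred, "probabilities")[, "1"]  # fitted class probabilities derived from those distances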

[World maps of SVM confidence scores: democracy (2012) and military rule (2010)]

 

On the whole, cases fall out as we would expect them to. The democracy classifier confidently identifies Western Europe, Canada, Australia, and New Zealand as democracies; shows interesting variations in Eastern Europe and Latin America; and confidently identifies nearly all of the rest of the world as non-democracies (democracy was defined for this task as a Polity score of 10). Meanwhile, the military rule classifier sees Myanmar, Pakistan, and (more surprisingly) Algeria as likely examples in 2010, and is less certain about the absence of military rule in several West African and Middle Eastern countries than in the rest of the world.

These preliminary results demonstrate that it is possible to generate probabilistic measures of regime type from publicly available texts at relatively low cost. That does not mean we’re fully satisfied with the output and ready to move to routine data production, however. For now, we’re looking at a couple of ways to improve the process.

First, the texts included in the relatively small corpus we have assembled so far only cover a narrow set of human-rights practices and political procedures. In future iterations, we plan to expand the corpus to include annual or occasional reports that discuss a broader range of features in each country’s national politics. Eventually, we hope to add news stories to the mix. If we can develop models that perform well on an amalgamation of occasional reports and news stories, we will be able to implement this process in near-real time, constantly updating probabilistic measures of regime type for all countries of the world at very low cost.

Second, the stringent criteria we used to observe each regime type in constructing the binary indicators on which the classifiers are trained also appear to be shaping the results in undesirable ways. We started this project with a belief that membership in these regime categories is inherently fuzzy, and we are trying to build a process that uses text mining to estimate degrees of membership in those fuzzy sets. If set membership is inherently ambiguous in a fair number of cases, then our approximation of a membership function should be bimodal, but not too neatly so. Most cases most of the time can be placed confidently at one end of the range of degrees of membership or the other, but there is considerable uncertainty at any moment in time about a non-trivial number of cases, and our estimates should reflect that fact.

If that’s right, then our initial estimates are probably too tidy, and we suspect that the stringent operationalization of each regime type in the training data is partly to blame. In future iterations, we plan to experiment with less stringent criteria—for example, by identifying a case as military rule if any of our sources tags it as such. With help from Sean J. Taylor, we’re also looking at ways we might use Bayesian measurement error models to derive fuzzy measures of regime type from multiple categorical data sets, and then use that fuzzy measure as the target in our machine-learning process.
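
For instance, a less stringent target for military rule might be built with something as simple as the following sketch, where 'votes' stands for a hypothetical matrix of 0/1 codings of military rule from our various sources:

    # Code a country-year as military rule if any source tags it as such
    military.any <- as.integer(rowSums(votes, na.rm = TRUE) > 0)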

So, stay tuned for more, and if you’ll be at APSA this week, please come to our Friday-morning panel and let us know what you think.

* NSF Award 1259190, Collaborative Research: Automated Real-time Production of Political Indicators

The Worst World EVER…in the Past 5 or 10 Years

A couple of months ago, the head of the UN’s refugee agency announced that, in 2013, “the number of people displaced by violent conflict hit the highest level since World War II,” and he noted that the number was still growing in 2014.

A few days ago, under the headline “Countries in Crisis at Record High,” Foreign Policy‘s The Cable reported that the UN’s Inter-Agency Standing Committee for the first time ever had identified four situations worldwide—Syria, Iraq, South Sudan, and Central African Republic—as level 3 humanitarian emergencies, its highest (worst) designation.

Today, the Guardian reported that “last year was the most dangerous on record for humanitarian workers, with 155 killed, 171 seriously wounded and 134 kidnapped as they attempted to help others in some of the world’s most dangerous places.”

If you read those stories, you might infer that the world has become more insecure than ever, or at least the most insecure it’s been since the last world war. That would be reasonable, but probably also wrong. These press accounts of record-breaking trends often omit or underplay a crucial detail: the data series on which these claims rely don’t extend very far into the past.

In fact, we don’t know how the current number of displaced persons compares to all years since World War II, because the UN only has data on that since 1989. In absolute terms, the number of refugees worldwide is now the largest it’s been since record-keeping began 25 years ago. Measured as a share of global population, however, the number of displaced persons in 2013 had not yet matched the peak of the early 1990s (see the Addendum here).

The Cable accurately states that having four situations designated as level-3 humanitarian disasters by the UN is “unprecedented,” but we only learn late in the story that the system which makes these designations has only existed for a few years. In other words, unprecedented…since 2011.

Finally, while the Guardian correctly reports that 2013 was the most dangerous year on record for aid workers, it fails to note that those records only reach back to the late 1990s.

I don’t mean to make light of worrisome trends in the international system or any of the terrible conflicts driving them. From the measures I track—see here and here, for example, and here for an earlier post on causes—I’d say that global levels of instability and violent conflict are high and waxing, but they have not yet exceeded the peaks we saw in the early 1990s and probably the 1960s. Meanwhile, the share of states worldwide that are electoral democracies remains historically high, and the share of the world’s population living in poverty has declined dramatically in the past few decades. The financial crisis of 2008 set off a severe and persistent global recession, but that collapse could have been much worse, and institutions of global governance deserve some credit for helping to stave off an even deeper failure.

How can all of these things be true at the same time? It’s a bit like climate change. Just as one or even a few unusually cool years wouldn’t reverse or disprove the clear long-term trend toward a hotter planet, an extended phase of elevated disorder and violence doesn’t instantly undo the long-term trends toward a more peaceful and prosperous human society. We are currently witnessing (or suffering) a local upswing in disorder that includes numerous horrific crises, but in global historical terms, the world has not fallen apart.

Of course, if it’s a mistake to infer global collapse from these local trends, it’s also a mistake to infer that global collapse is impossible from the fact that it hasn’t occurred already. The war that is already consuming Syria and Iraq is responsible for a substantial share of the recent increase in refugee flows and casualties, and it could spread further and burn hotter for some time to come. Probably more worrisome to watchers of long-term trends in international relations, the crisis in Ukraine and recent spate of confrontations between China and its neighbors remind us that war between major powers could happen again, and this time those powers would both or all have nuclear weapons. Last but not least, climate change seems to be accelerating with consequences unknown.

Those are all important sources of elevated uncertainty, but uncertainty and breakdown are not the same thing. Although those press stories describing unprecedented crises are all covering important situations and trends, I think their historical perspective is too shallow. I’m forty-four years old. The global system is less orderly than it’s been in a while, but it’s still not worse than it’s ever been in my lifetime, and it’s still nowhere near as bad as it was when my parents were born. I won’t stop worrying or working on ways to try to make things a tiny bit better, but I will keep that frame of reference in mind.

Notes From a First Foray into Text Mining

Guess what? Text mining isn’t push-button, data-making magic, either. As Phil Schrodt likes to say, there is no Data Fairy.

data fairy meme

I’m quickly learning this point from my first real foray into text mining. Under a grant from the National Science Foundation, I’m working with Phil Schrodt and Mike Ward to use these techniques to develop new measures of several things, including national political regime type.

I wish I could say that I’m doing the programming for this task, but I’m not there yet. For the regime-data project, the heavy lifting is being done by Shahryar Minhas, a sharp and able Ph.D. student in political science at Duke University, where Mike leads the WardLab. Shahryar and I are scheduled to present preliminary results from this project at the upcoming Annual Meeting of the American Political Science Association in Washington, DC (see here for details).

When we started work on the project, I imagined a relatively simple and mostly automatic process running from location and ingestion of the relevant texts to data extraction, model training, and, finally, data production. Now that we’re actually doing it, though, I’m finding that, as always, the devil is in the details. Here are just a few of the difficulties and decision points we’ve had to confront so far.

First, the structure of the documents available online often makes it difficult to scrape and organize them. We initially hoped to include annual reports on politics and human-rights practices from four or five different organizations, but some of the ones we wanted weren’t posted online in a format we could readily scrape. At least one was scrapable but not organized by country, so we couldn’t properly group the text for analysis. In the end, we wound up with just two sets of documents in our initial corpus: the U.S. State Department’s Country Reports on Human Rights Practices, and Freedom House’s annual Freedom in the World documents.

Differences in naming conventions almost tripped us up, too. For our first pass at the problem, we are trying to create country-year data, so we want to treat all of the documents describing a particular country in a particular year as a single bag of words. As it happens, the State Department labels its human rights reports for the year on which they report, whereas Freedom House labels its Freedom in the World report for the year in which it’s released. So, for example, both organizations have already issued their reports on conditions in 2013, but Freedom House dates that report to 2014 while State dates its version to 2013. Fortunately, we knew this and made a simple adjustment before blending the texts. If we hadn’t known about this difference in naming conventions, however, we would have ended up combining reports for different years from the two sources and made a mess of the analysis.

Once ingested, those documents include some text that isn’t relevant to our task, or that is relevant but the meaning of which is tacit. Common stop words like “the”, “a”, and “an” are obvious and easy to remove. More challenging are the names of people, places, and organizations. For our regime-data task, we’re interested in the abstract roles behind some of those proper names—president, prime minister, ruling party, opposition party, and so on—rather than the names themselves, but text mining can’t automatically derive the one for the other.

For our initial analysis, we decided to omit all proper names and acronyms to focus the classification models on the most general language. In future iterations, though, it would be neat if we could borrow dictionaries developed for related tasks and use them to replace those proper names with more general markers. For example, in a report or story on Russia, Vladimir Putin might get translated into <head of government>, the FSB into <police>, and Chechen Republic of Ichkeria into <rebel group>. This approach would preserve the valuable tacit information in those names while making it explicit and uniform for the pattern-recognition stage.
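
The substitution step itself would be easy; building a good dictionary is the hard part. As a toy illustration in R (the dictionary entries and the 'text' vector of documents here are made up for the example):

    # Map proper names to generic role markers before tokenizing
    role.dict <- c("Vladimir Putin"               = "<head of government>",
                   "FSB"                          = "<police>",
                   "Chechen Republic of Ichkeria" = "<rebel group>")
    for (name in names(role.dict)) {
      text <- gsub(name, role.dict[[name]], text, fixed = TRUE)
    }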

That’s not all, but it’s enough to make the point. These things are always harder than they look, and text mining is no exception. In any case, we’ve now run this gantlet once and made our way to an encouraging set of initial results. I’ll post something about those results closer to the conference when the paper describing them is ready for public consumption. In the meantime, though, I wanted to share a few of the things I’ve already learned about these techniques with others who might be thinking about applying them, or who already do and can commiserate.

Turning Crowdsourced Preseason NFL Strength Ratings into Game-Level Forecasts

For the past week, nearly all of my mental energy has gone into the Early Warning Project and a paper for the upcoming APSA Annual Meeting here in Washington, DC. Over the weekend, though, I found some time for a toy project on forecasting pro-football games. Here are the results.

The starting point for this toy project is a pairwise wiki survey that turns a crowd’s beliefs about relative team strength into scalar ratings. Regular readers will recall that I first experimented with one of these before the 2013-2014 NFL season, and the predictive power wasn’t terrible, especially considering that the number of participants was small and the ratings were completed before the season started.

This year, to try to boost participation and attract a more knowledgeable crowd of respondents, I paired with Trey Causey to announce the survey on his pro-football analytics blog, The Spread. The response has been solid so far. Since the survey went up, the crowd—that’s you!—has cast nearly 3,400 votes in more than 100 unique user sessions (see the Data Visualizations section here).

The survey will stay open throughout the season, but that doesn’t mean it’s too early to start seeing what it’s telling us. One thing I’ve already noticed is that the crowd does seem to be updating in response to preseason action. For example, before the first round of games, I noticed that the Baltimore Ravens, my family’s favorites, were running mid-pack with a rating of about 50. After they trounced the defending NFC champion 49ers in their preseason opener, however, the Ravens jumped to the upper third with a rating of 59. (You can always see up-to-the-moment survey results here, and you can cast your own votes here.)

The wiki survey is a neat way to measure team strength. On their own, though, those ratings don’t tell us what we really want to know, which is how each game is likely to turn out, or how well our team might be expected to do this season. The relationship between relative strength and game outcomes should be pretty strong, but we might want to consider other factors, too, like home-field advantage. To turn a strength rating into a season-level forecast for a single team, we need to consider the specifics of its schedule. In game play, it’s relative strength that matters, and some teams will have tougher schedules than others.

A statistical model is the best way I can think to turn ratings into game forecasts. To get a model to apply to this season’s ratings, I estimated a simple linear one from last year’s preseason ratings and the results of all 256 regular-season games (found online in .csv format here). The model estimates net score (home minus visitor) from just one feature, the difference between the two teams’ preseason ratings (again, home minus visitor). Because the net scores are all ordered the same way and the model also includes an intercept, though, it implicitly accounts for home-field advantage as well.

The scatterplot below shows the raw data on those two dimensions from the 2013 season. The model estimated from these data has an intercept of 3.1 and a slope of 0.1 for the rating differential. In other words, the model identifies a net home-field advantage of about 3 points, consistent with the conventional wisdom, and it suggests that every point of advantage on the wiki-survey ratings translates into a net swing of one-tenth of a point on the field. I also tried a generalized additive model with smoothing splines to see if the association between the rating differential and the net game score was nonlinear, but as the scatterplot suggests, it doesn't seem to be.
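
That nonlinearity check only takes a line or two. Here is one way to do it with the mgcv package, continuing from the hypothetical data frame above (the post doesn't say which package was used). If the estimated smooth is effectively a straight line, meaning effective degrees of freedom near 1, the linear specification is adequate.

    # Check for a nonlinear relationship with a smoothing spline
    library(mgcv)
    gam_fit <- gam(net_score ~ s(rating_diff), data = games2013)
    summary(gam_fit)   # an edf near 1 for s(rating_diff) suggests a roughly linear fit
    plot(gam_fit)      # visualize the estimated smooth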

2013 NFL Games Arranged by Net Game Score and Preseason Wiki Survey Rating Differentials

In sample, the linear model’s accuracy was good, not great. If we convert the net scores the model postdicts to binary outcomes and compare those postdictions to actual outcomes, we see that the model correctly classifies 60 percent of the games. That’s in sample, but it’s also based on nothing more than home-field advantage and a single preseason rating for each team from a survey with a small number of respondents. So, all things considered, it looks like a potentially useful starting point.
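
That in-sample check is just a comparison of signs, continuing the sketch above:

    # Convert postdicted and actual net scores to home-win indicators and compare
    postdicted_net <- predict(fit)   # fitted net scores for the 2013 games
    mean((postdicted_net > 0) == (games2013$net_score > 0))   # about 0.60 in the run described here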

Whatever its limitations, that model gives us the tool we need to convert 2014 wiki survey results into game-level predictions. To do that, we also need a complete 2014 schedule. I couldn’t find one in .csv format, but I found something close (here) that I saved as text, manually cleaned in a minute or so (deleted extra header rows, fixed remaining header), and then loaded and merged with a .csv of the latest survey scores downloaded from the manager’s view of the survey page on All Our Ideas.
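
The merging step might look something like the following. The file names, team labels, and column names here are all hypothetical; the only substantive point is that each game needs the current survey score for both teams, ordered home minus visitor.

    # Hypothetical file and column names for the cleaned 2014 schedule and survey scores
    sched   <- read.delim("nfl_2014_schedule.txt", stringsAsFactors = FALSE)
    ratings <- read.csv("allourideas_scores_2014-08-10.csv", stringsAsFactors = FALSE)

    # Attach each team's current rating, once for the home team and once for the visitor
    sched <- merge(sched, setNames(ratings[, c("team", "score")], c("home", "home_rating")),
                   by = "home")
    sched <- merge(sched, setNames(ratings[, c("team", "score")], c("visitor", "visitor_rating")),
                   by = "visitor")
    sched$rating_diff <- sched$home_rating - sched$visitor_rating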

I’m not going to post forecasts for all 256 games—at least not now, with three more preseason games to learn from and, hopefully, lots of votes yet to be cast. To give you a feel for how the model is working, though, I’ll show a couple of cuts on those very preliminary results.

The first is a set of forecasts for all Week 1 games. The labels show Visitor-Home, and the predicted net score is still home minus visitor. So, a predicted net score greater than 0 means the home team (second in the paired label) is expected to win, while a predicted net score below 0 means the visitor (first in the paired label) is expected to win. The lines around the point predictions represent 90-percent confidence intervals, giving us a partial sense of the uncertainty around these estimates.
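
Generating those point predictions and intervals is a one-liner with the fitted model. A sketch, assuming the merged schedule has a (hypothetical) week column:

    # Week 1 point predictions with 90-percent confidence intervals from the 2013 model
    wk1   <- subset(sched, week == 1)
    preds <- predict(fit, newdata = wk1, interval = "confidence", level = 0.90)
    cbind(wk1[, c("visitor", "home")], round(preds, 1))

Note that these are confidence intervals around the expected net score; prediction intervals (interval = "prediction") would be considerably wider, which is one reason the bands shown here understate the uncertainty about any single game.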

Week 1 Game Forecasts from Preseason Wiki Survey Results on 10 August 2014

Of course, as a fan of a particular team, I'm most interested in what the model says about how my guys are going to do this season. The next plot shows predictions for all 16 of Baltimore's games. Unfortunately, the plotting command orders the data by label, and my R skills and available time aren't sufficient to reorder them by week, but the information is all there. In this plot, the dots for the point predictions are colored red if they predict a Baltimore win and black for an expected loss. The good news for Ravens fans is that this plot suggests an 11-5 season, likely good enough for a playoff berth. The bad news is that an 8-8 season also lies within the 90-percent confidence intervals, so the playoffs don't look like a lock.
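
Tallying a team's expected record from the game-level predictions is straightforward. Here is a rough sketch, with the team label and week column again hypothetical; sorting by week before plotting is also one way to get the games back into chronological order.

    # Predicted net scores for all 2014 games, then Baltimore's expected record
    sched$pred_net <- predict(fit, newdata = sched)
    bal <- subset(sched, home == "Baltimore Ravens" | visitor == "Baltimore Ravens")
    bal <- bal[order(bal$week), ]   # reorder by week so plots follow the schedule
    sum(ifelse(bal$home == "Baltimore Ravens",
               bal$pred_net > 0, bal$pred_net < 0))   # the run described here gave 11 wins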

2014 Game-Level Forecasts for the Baltimore Ravens from 10 August 2014 Wiki Survey Scores

So that’s where the toy project stands now. My intuition tells me that the predicted net scores aren’t as well calibrated as I’d like, and the estimated confidence intervals surely understate the true uncertainty around each game (“On any given Sunday…”). Still, I think this exercise demonstrates the potential of this forecasting process. If I were a betting man, I wouldn’t lay money on these estimates. As an applied forecaster, though, I can imagine using these predictions as priors in a more elaborate process that incorporates additional and, ideally, more dynamic information about each team and game situation over the course of the season. Maybe my doppelganger can take that up while I get back to my day job…

Postscript. After I published this post, Jeff Fogle suggested via Twitter that I compare the Week 1 forecasts to the current betting lines for those games. The plot below shows the median point spread from an NFL odds-aggregating site as blue dots on top of the statistical forecasts already shown above. As you can see, the statistical forecasts are tracking the betting lines pretty closely. There’s only one game—Carolina at Tampa Bay—where the predictions from the two series fall on different sides of the win/loss line, and it’s a game the statistical model essentially sees as a toss-up. It’s also reassuring that there isn’t a consistent direction to the differences, so the statistical process doesn’t seem to be biased in some fundamental way.

Week 1 Game-Level Forecasts Compared to Median Point Spread from Betting Sites on 11 August 2014

Forecasting Round-Up No. 7

1. I got excited when I heard on Twitter yesterday about a machine-learning process that turns out to be very good at predicting U.S. Supreme Court decisions (blog post here, paper here). I got even more excited when I saw that the guys who built that process have also been running a play-money prediction market on the same problem for the past several years, and that the most accurate forecasters in that market have done even better than that model (here). It sounds like they are now thinking about more rigorous ways to compare and cross-pollinate the two. That’s part of what we’re trying to do with the Early Warning Project, so I hope that they do and we can learn from their findings.

2. A paper in the current issue of the Journal of Personality and Social Psychology (here, but paywalled; hat-tip to James Igoe Walsh) adds to the growing pile of evidence on the forecasting power of crowds, with an interesting additional finding on the willingness of others to trust and use those forecasts (a toy sketch of the select-crowd strategy follows the quoted abstract):

We introduce the select-crowd strategy, which ranks judges based on a cue to ability (e.g., the accuracy of several recent judgments) and averages the opinions of the top judges, such as the top 5. Through both simulation and an analysis of 90 archival data sets, we show that select crowds of 5 knowledgeable judges yield very accurate judgments across a wide range of possible settings—the strategy is both accurate and robust. Following this, we examine how people prefer to use information from a crowd. Previous research suggests that people are distrustful of crowds and of mechanical processes such as averaging. We show in 3 experiments that, as expected, people are drawn to experts and dislike crowd averages—but, critically, they view the select-crowd strategy favorably and are willing to use it. The select-crowd strategy is thus accurate, robust, and appealing as a mechanism for helping individuals tap collective wisdom.
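
For what it's worth, the select-crowd procedure is simple enough to sketch. Here is a toy version in R of the strategy as I read the abstract: score each judge on recent accuracy, then average the forecasts of the five most accurate. All of the data below are simulated purely for illustration.

    # Toy simulation of the select-crowd strategy: rank judges on recent
    # accuracy, then average the top five on a new question
    set.seed(1)
    n_judges <- 25
    skill <- runif(n_judges, 0.5, 3)          # judge-specific noise levels
    recent_truth <- rnorm(10)                 # 10 past questions
    recent_judgments <- sapply(skill, function(s) recent_truth + rnorm(10, sd = s))
    recent_error <- colMeans(abs(recent_judgments - recent_truth))

    new_truth <- rnorm(1)                     # the new question
    new_judgments <- new_truth + rnorm(n_judges, sd = skill)

    top5 <- order(recent_error)[1:5]          # five most accurate recent judges
    c(select_crowd = mean(new_judgments[top5]),
      whole_crowd  = mean(new_judgments),
      truth        = new_truth)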

3. Adam Elkus recently spotlighted two interesting papers involving agent-based modeling (ABM) and forecasting.

  • The first (here) “presents a set of guidelines, imported from the field of forecasting, that can help social simulation and, more specifically, agent-based modelling practitioners to improve the predictive performance and the robustness of their models.”
  • The second (here), from 2009 but new to me, describes an experiment in deriving an agent-based model of political conflict from event data. The results were pretty good; a model built from event data and then tweaked by a subject-matter expert was as accurate as one built entirely by hand, and the hybrid model took much less time to construct.

4. Nautilus ran a great piece on Lewis Fry Richardson, a pioneer in weather forecasting who also applied his considerable intellect to predicting violent conflict. As the story notes,

At the turn of the last century, the notion that the laws of physics could be used to predict weather was a tantalizing new idea. The general idea—model the current state of the weather, then apply the laws of physics to calculate its future state—had been described by the pioneering Norwegian meteorologist Vilhelm Bjerknes. In principle, Bjerknes held, good data could be plugged into equations that described changes in air pressure, temperature, density, humidity, and wind velocity. In practice, however, the turbulence of the atmosphere made the relationships among these variables so shifty and complicated that the relevant equations could not be solved. The mathematics required to produce even an initial description of the atmosphere over a region (what Bjerknes called the “diagnostic” step) were massively difficult.

Richardson helped solve that problem in weather forecasting by breaking the task into many more manageable parts—atmospheric cells, in this case—and thinking carefully about how those parts fit together. I wonder if we will see similar advances in forecasts of social behavior in the next 100 years. I doubt it, but the trajectory of weather prediction over the past century should remind us to remain open to the possibility.

5. Last, a bit of fun: Please help Trey Causey and me forecast the relative strength of this year’s NFL teams by voting in this pairwise wiki survey! I did this exercise last year, and the results weren’t bad, even though the crowd was pretty small and probably not especially expert. Let’s see what happens if more people participate, shall we?
