Yes, By Golly, I Am Ready for Some Football

The NFL’s 2015 season sort of got underway last night with the Hall of Fame Game. Real preseason play doesn’t start until this weekend, and the first kickoff of the regular season is still a month away.

No matter, though — I’m taking the Hall of Fame Game as my cue to launch this year’s football forecasting effort. As it has for the past two years (see here and here), the process starts with me asking you to help assess the strength of this year’s teams by voting in a pairwise wiki survey:

In the 2015 NFL season, which team will be better?

That survey produces scores on a scale of 0–100. Those scores will become the crucial inputs into simulations based on a simple statistical model estimated from the past two years’ worth of survey data and game results. Using an R function I wrote, I’ve determined that I should be able to improve the accuracy of my forecasts a bit this year by basing them on a mixed-effects model with random intercepts to account for variation in home-team advantages across the league. Having another season’s worth of predicted and actual outcomes should help, too; with two years on the books, my model-training sample has doubled.

An improvement in accuracy would be great, but I’m also excited about using RStudio’s Shiny to build a web page that will let you explore the forecasts at a few levels: by game, by team, and by week. Here’s a screenshot of the game-level tab from a working version using the 2014 data. It plots the distribution of the net scores (home – visitor) from the 1,000 simulations, and it reports win probabilities for both teams and a line (the median of the simulated scores).
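If you're curious about the arithmetic behind that tab, here is a minimal base-R sketch of how a set of simulated net scores can be reduced to win probabilities and a line. The helper name `summarize_game` is hypothetical. The normal draws stand in for the model's actual simulations, and their mean of 3 and SD of 13 are illustrative values, not estimates from the model.

```r
# Hypothetical helper: reduce simulated net scores (home minus visitor)
# for one game to win probabilities and a point-spread "line".
summarize_game <- function(net_scores) {
  list(
    p_home    = mean(net_scores > 0),  # share of simulations the home team wins
    p_visitor = mean(net_scores < 0),  # share the visitor wins (ties count for neither)
    line      = median(net_scores)     # median simulated margin
  )
}

# Stand-in simulations (assumption): 1,000 normal draws around a
# 3-point home margin with a 13-point SD, purely for illustration.
set.seed(2015)
sims <- rnorm(1000, mean = 3, sd = 13)
game_summary <- summarize_game(sims)
```

Calling `hist(sims)` gives a rough version of the plotted distribution.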

Screenshot: game-level tab of the NFL forecasts app (2014 data)

The “By team” tab lets you pick a team to see a plot of the forecasts for all 16 of their games, along with their predicted wins (count of games with win probabilities over 0.5) and expected wins (sum of win probabilities for all games) for the year. The “By week” tab (shown below) lets you pick a week to see the forecasts for all the games happening in that slice of the season. Before launching the app, I plan to add annotations to the plot reporting the lines those forecasts imply (e.g., Texans by 7).
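The two season totals on the "By team" tab are simple functions of a team's game-level win probabilities. Here is a sketch with a hypothetical helper name; the 16 probabilities in the usage line are synthetic.

```r
# Hypothetical helper: season totals from a team's game-level win probabilities.
season_wins <- function(win_probs) {
  c(
    predicted = sum(win_probs > 0.5),  # games in which the team is favored
    expected  = sum(win_probs)         # sum of win probabilities
  )
}

# Synthetic win probabilities for a 16-game season, for illustration only.
probs <- seq(0.2, 0.8, length.out = 16)
team_totals <- season_wins(probs)
```

The two numbers can diverge. A team that is a slight favorite in every game has 16 predicted wins but only a little more than 8 expected wins.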

Screenshot: week-level tab of the NFL forecasts app

Of course, the quality of the forecasts displayed in that app will depend heavily on participation in the wiki survey. Without a diverse and well-informed set of voters, it will be hard to do much better than guessing that each team will do as well this year as it did last year. So, please vote here; please share this post or the survey link with friends and family who know something about pro football; and please check back in a few weeks for the results.

Be Vewy, Vewy Quiet

This blog has gone relatively quiet of late, and it will probably stay that way for a while. That’s partly a function of my personal life, but it also reflects a conscious decision to spend more time improving my abilities as a programmer.

I want to get better at scraping, making, munging, summarizing, visualizing, and analyzing data. So, instead of contemplating world affairs, I’ve been starting to learn Python; using questions on Stack Overflow as practice problems for R; writing scripts that force me to expand my programming skills; and building Shiny apps that put those skills to work. Here’s a screenshot of one app I’ve made—yes, it actually works—that interactively visualizes ACLED’s latest data on violence against civilians in Africa, based partly on this script for scraping ACLED’s website:

Screenshot: ACLED visualizer app

When I started on this kick, I didn’t plan to stop writing blog posts about international affairs. As I’ve gotten into it, though, I’ve found that my curiosity about current events has ebbed, and the pilot light for my writing brain has gone out. Normally, writing ideas flare up throughout the day, but especially in the early morning. Lately, I wake up thinking about the coding problems I’m stuck on.

I think it’s a matter of attention, not interest. Programming depends on the tiniest details. All those details quickly clog the brain’s RAM, leaving no room for the unconscious associations that form the kernels of new prose. That clogging happens even faster when other parts of your life are busy, stressful, or off kilter, as they are for many of us, and as they are for me right now.

That’s what I think, anyway. Whatever the cause, though, I know that I’m rarely feeling the impulse to write, and I know that shift has sharply slowed the pace of publishing here. I’m leaving the channel open and hope I can find the mental and temporal space to keep using it, but who knows what tomorrow may bring?

ACLED in R

The Armed Conflict Location & Event Data Project, a.k.a. ACLED, produces up-to-date event data on certain kinds of political conflict in Africa and, as of 2015, parts of Asia. In this post, I’m not going to dwell on the project’s sources and methods, which you can read about on ACLED’s About page, in the 2010 journal article that introduced the project, or in the project’s user’s guides. Nor am I going to dwell on the necessity of using all political event data sets, including ACLED, with care—understanding the sources of bias in how they observe events and error in how they code them and interpreting (or, in extreme cases, ignoring) the resulting statistics accordingly.

Instead, my only aim here is to share an R script I’ve written that largely automates the process of downloading and merging ACLED’s historical and current Africa data and then creates a new data frame with counts of events by type at the country-month level. If you use ACLED in R, this script might save you some time and some space on your hard drive.

You can find the R script on GitHub, here.

The chief problem with this script is that the URLs and file names of ACLED’s historical and current data sets change with every update, so the code will need to be modified each time that happens. If the names were modular and the changes to them predictable, it would be easy to rewrite the code to keep up with those changes automatically. Unfortunately, they aren’t, so the best I can do for now is to give step-by-step instructions in comments embedded in the script on how to update the four relevant fields by hand. As long as the basic structure of the .csv files posted by ACLED doesn’t change, though, the rest should keep working.

[UPDATE: I revised the script so it will scrape the link addresses from the ACLED website and parse the file names from them. The new version worked after ACLED updated its real-time file earlier today, when the old version would have broken. Unless ACLED changes its file-naming conventions or the structure of its website, the new version should work for the rest of 2015. In case it does fail, instructions on how to hard-code a workaround are included as comments at the bottom of the script.]
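The scraping step described in that update can be approximated in base R. The approach is to find every link address ending in .csv and then strip the path to get the file name. This is a sketch, not the script's actual code. The HTML snippet, URLs, and file names below are invented placeholders, and a real HTML parser (for example, the 'rvest' or 'XML' packages) would be more robust than a regular expression.

```r
# Invented page fragment; ACLED's real page layout and file names differ.
html <- '<a href="http://example.org/data/historical-1997-2014.csv">Historical</a>
<a href="http://example.org/data/realtime-2015.csv">Realtime</a>
<a href="/about">About</a>'

# Pull every href whose target ends in .csv, then keep the bare file name.
extract_csv_files <- function(html) {
  hrefs <- regmatches(html, gregexpr('href="[^"]+\\.csv"', html))[[1]]
  urls  <- sub('"$', "", sub('^href="', "", hrefs))
  data.frame(url = urls, file = basename(urls), stringsAsFactors = FALSE)
}

csv_links <- extract_csv_files(html)
```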

It should also be easy to adapt the part of the script that generates country-month event counts to slice the data even more finely, or to count by something other than event type. To do that, you would just need to add variables to the group_by() part of the block of code that produces the object ACLED.cm. For example, if you wanted to get counts of events by type at the level of the state or province, you would revise that line to read group_by(gwno, admin1, year, month, event_type). Or, if you wanted country-month counts of events by the type(s) of actor involved, you could use group_by(gwno, year, month, interaction) and then see this user’s guide to decipher those codes. You get the drift.
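The script uses group_by() for this step. To show the same logic without that dependency, here is a base-R equivalent built on aggregate(). The toy data frame mimics the column names mentioned above (gwno, year, month, event_type), but its values are made up, and the helper name is hypothetical.

```r
# Toy event-level data with ACLED-style column names (values are made up).
events <- data.frame(
  gwno       = c(490, 490, 490, 625, 625),
  year       = c(2015, 2015, 2015, 2015, 2015),
  month      = c(1, 1, 2, 1, 1),
  event_type = c("Battle", "Battle", "Riots/Protests", "Battle", "Riots/Protests"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows per combination of the `by` columns,
# a base-R analogue of grouping on those columns and counting.
count_events <- function(df, by) {
  df$n <- 1L
  aggregate(df["n"], by = df[by], FUN = sum)
}

acled_cm <- count_events(events, c("gwno", "year", "month", "event_type"))
```

Passing `c("gwno", "admin1", "year", "month", "event_type")` as `by` would give province-level counts, assuming the data include an admin1 column.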

The script also shows a couple of examples of how to use ‘ggplot2’ to generate time-series plots of those monthly counts. Here’s one I made of monthly counts of battle events by country for the entire period covered by ACLED as of this writing: January 1997–June 2015. A production-ready version of this plot would require some more tinkering with the size of the country names and the labeling of the x-axis, but this kind of small-multiples chart offers a nice way to explore the data before analysis.

Monthly counts of battle events, January 1997-June 2015

If you use the script and find flaws in it or have ideas on how to make it work better or do more, please email me at ulfelder <at> gmail <dot> com.

The Stacked-Label Column Plot

Most of the statistical work I do involves events that occur rarely in places over time. One of the best ways to get or give a feel for the structure of data like that is with a plot that shows variation in counts of those events across sequential, evenly-sized slices of time. For me, that usually means a sequence of annual, global counts of those events, like the one below for successful and failed coup attempts over the past several decades (see here for the R script that generated that plot and a few others and here for the data):

Annual, global counts of successful and failed coup attempts per the Cline Center’s SPEED Project, 1946-2005

One thing I don’t like about those plots, though, is the loss of information that comes from converting events to counts. Sometimes we want to know not just how many events occurred in a particular year but also where they occurred, and we don’t want to have to query the database or look at a separate table to find out.

I try to do both in one go with a type of column chart I’ll call the stacked-label column plot. Instead of building columns from bricks of identical color, I use blocks of text that describe another attribute of each unit—usually country names in my work, but it could be lots of things. In order for those blocks to have comparable visual weight, they need to be equally sized, which usually means using labels of uniform length (e.g., two- or three-letter country codes) and a fixed-width font like Courier New.

I started making these kinds of plots in the 1990s, using Excel spreadsheets or tables in Microsoft Word to plot things like protest events and transitions to and from democracy. A couple of decades later, I’m finally trying to figure out how to make them in R. Here is my first reasonably successful attempt, using data I just finished updating on when countries joined the World Trade Organization (WTO) or its predecessor, the General Agreement on Tariffs and Trade (GATT).

Note: Because the WordPress template I use crams blog-post content into a column that’s only half as wide as the screen, you might have trouble reading the text labels in some browsers. If you can’t make out the letters, try clicking on the plot, then increasing the zoom if needed.

Annual, global counts of countries joining the global free-trade regime, 1960-2014

Without bothering to read the labels, you can see the time trend fine. Since 1960, there have been two waves of countries joining the global free-trade regime: one in the early 1960s, and another in the early 1990s. Those two waves correspond to two spates of state creation, so without the labels, many of us might infer that those stacks are composed mostly or entirely of new states joining.

When we scan the labels, though, we discover a different story. As expected, the wave in the early 1960s does include a lot of newly independent African states, but it also includes a couple of Communist countries (Yugoslavia and Poland) and some middle-income cases from other parts of the world (e.g., Argentina and South Korea). Meanwhile, the wave of the early 1990s turns out to include very few post-Communist countries, most of which didn’t join until the end of that decade or early in the next one. Instead, we see a second wave of “developing” countries joining on the eve of the transition from GATT to the WTO, which officially happened on January 1, 1995. I’m sure people who really know the politics of the global free-trade regime, or of specific cases or regions, can spot some other interesting stories in there, too. The point, though, is that we can’t discover those stories if we can’t see the case labels.

Here’s another one that shows which countries had any coup attempts each year between 1960 and 2014, according to Jonathan Powell and Clayton Thyne’s running list. In this case, color tells us the outcomes of those coup attempts: red if any succeeded, dark grey if they all failed.

Countries with any coup attempts per Powell and Thyne, 1960-2014

One story that immediately catches my eye in this plot is Argentina’s (ARG) remarkable propensity for coups in the early 1960s. It shows up in each of the first four columns, although only in 1962 are any of those attempts successful. Again, this is information we lose when we only plot the counts without identifying the cases.

The way I’m doing it now, this kind of chart requires data to be stored in (or converted to) event-file format, not the time-series cross-sectional format that many of us usually use. Instead of one row per unit–time slice, you want one row for each event. Each row should have at least two columns: one with the case label and one with the time slice in which the event occurred.
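If your data start out in time-series cross-sectional format with a count column, one quick way to reach event-file format is to repeat each row once per event. Here is a minimal base-R sketch. The column names, labels, and counts are hypothetical.

```r
# Toy unit-year panel (made-up labels and counts): one row per unit-year.
panel <- data.frame(
  label    = c("AAA", "AAA", "BBB", "CCC"),
  year     = c(1962, 1963, 1962, 1963),
  n_events = c(2, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat each row once per event, so unit-years
# with zero events drop out and the count column is no longer needed.
to_event_file <- function(panel, count_col = "n_events") {
  idx <- rep(seq_len(nrow(panel)), times = panel[[count_col]])
  out <- panel[idx, setdiff(names(panel), count_col), drop = FALSE]
  rownames(out) <- NULL
  out
}

events <- to_event_file(panel)
```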

If you’re interested in playing around with these types of plots, you can find the R script I used to generate the ones above here. Perhaps some enterprising soul will take it upon him- or herself to write a function that makes it easy to produce this kind of chart across a variety of data structures.

It would be especially nice to have a function that worked properly when the same label appears more than once in a given time slice. Right now, I’m using the function ‘match’ to assign y values that evenly stack the events within each bin. That doesn’t work for the second or third or nth match, though, because the ‘match’ function always returns the position of the first match in the relevant vector. So, for example, if I try to plot all coup attempts each year instead of all countries with any coup attempts each year, the second or later events in the same country get placed in the same position as the first, which ultimately means they show up as blank spaces in the columns. Sadly, I haven’t figured out yet how to identify location in that vector in a more general way to fix this problem.
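One possible way around the match() limitation is to number events within each time slice directly instead of looking up labels. This is offered as a sketch, not as a tested fix for the original script. Base R's ave() applies a function to each group of a vector, so applying seq_along() by year gives every event its own stack position, even when a label repeats. The data below are made up.

```r
# Toy event file (made-up labels): one row per event. "AAA" appears twice
# in 1962, which is the case where match() returns the same position.
ev <- data.frame(
  label = c("AAA", "AAA", "BBB", "CCC", "AAA"),
  year  = c(1962, 1962, 1962, 1963, 1963),
  stringsAsFactors = FALSE
)

# Sort so the stacking order within each year is deterministic.
ev <- ev[order(ev$year, ev$label), ]

# Stack position within each year: 1, 2, 3, ... in row order.
# Duplicate labels get distinct positions rather than the first match.
ev$y <- ave(seq_len(nrow(ev)), ev$year, FUN = seq_along)
```

The labels can then be drawn at (year, y) in a fixed-width font, for example with base graphics' `text(ev$year, ev$y, ev$label, family = "mono")` on an empty plot.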

About That Decline in EU Contributions to UN Peacekeeping

A couple of days ago, Ambassador Samantha Power, the US Permanent Representative to the United Nations, gave a speech on peacekeeping in Brussels that, among other things, lamented a decline in the participation of European personnel in UN peacekeeping missions:

Twenty years ago, European countries were leaders in UN peacekeeping. 25,000 troops from European militaries served in UN peacekeeping operations – more than 40 percent of blue helmets at the time. Yet today, with UN troop demands at an all-time high of more than 90,000 troops, fewer than 6,000 European troops are serving in UN peacekeeping missions. That is less than 7 percent of UN troops.

The same day, Mark Leon Goldberg wrote a post for UN Dispatch (here) that echoed Ambassador Power’s remarks and visualized her point with a chart that was promptly tweeted by the US Mission to the UN:

Percentage of western European Troops in UN Peacekeeping missions (source: UN Dispatch)

When I saw that chart, I wondered if it might be a little misleading. As Ambassador Power noted in her remarks, the number of troops deployed as UN peacekeepers has increased significantly in recent years. With so much growth in the size of the pool, a shrinking EU share of that pool could result from declining EU contributions, but it could also result from flat EU contributions, or from EU contributions growing more slowly than other countries’.
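A toy calculation makes the point. The figures below are hypothetical and are not drawn from the IPI data. EU contributions hold steady while everyone else's grow, and the EU share still falls sharply.

```r
# Hypothetical figures: EU personnel held constant, all others growing.
eu_share <- function(eu, others) eu / (eu + others)

eu     <- c(5000, 5000, 5000)
others <- c(15000, 45000, 85000)
shares <- round(100 * eu_share(eu, others), 1)  # EU share, in percent
```

With these numbers the share falls from 25 percent to about 5.6 percent without a single EU troop being withdrawn.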

To see which it was, I used data from the International Peace Institute’s Providing for Peacekeeping Project to plot monthly personnel contributions from late 1991 to early 2014 for EU members and all other countries. Here’s what I got (and here is the R script I used to get there):

Monthly UN PKO personnel totals by country of origin, November 1991-February 2014

To me, that chart tells a different story than the one Ambassador Power and UN Dispatch describe. Instead of a sharp decline in European contributions over the past 20 years, we see a few-year surge in the early 1990s followed by a fairly constant level of EU member contributions since then. There’s even a mini-surge in 2005–2006 followed by a slow and steady return to the average level after that.

In her remarks, Ambassador Power compared Europe’s participation now to 20 years ago. Twenty years ago—late 1994 and early 1995—just happens to be the absolute peak of EU contributions. Not coincidentally, that peak coincided with the deployment of a UN PKO in Europe, the United Nations Protection Force (UNPROFOR) in Bosnia and Herzegovina, to which European countries contributed the bulk of the troops. In other words, when UN peacekeeping was focused on Europe, EU members contributed most of the troops. As the UN has expanded its peacekeeping operations around the world (see here for current info), EU member states haven’t really reduced their participation; instead, other countries have greatly increased theirs.

We can and should argue about how much peacekeeping the UN should try to do, and what various countries should contribute to those efforts. After looking at European participation from another angle, though, I’m not sure it’s fair to criticize EU members for “declining” involvement in the task.

Oh, and in case you’re wondering like I was, here’s a comparison of personnel contributions from EU members to ones from the United States over that same period. The US pays the largest share of the UN peacekeeping budget, but on the dimension Ambassador Power and UN Dispatch chose to spotlight (troop contributions), it offers very little.

Monthly UN PKO personnel totals, EU members vs. United States, November 1991-February 2014

If At First You Don’t Succeed

A couple of weeks ago, I blogged about a failed attempt to do some exploratory text-mining on the US National Security Strategy reports (here). That project was supposed to give me a fun way to learn the basics of text mining in R, something I’ve been eager to do of late. In writing the blog post, I had two motives: 1) to help normalize the experience of getting stuck and failing in social science and data science, and 2) to appeal for help from more experienced coders who could help get me unstuck on this particular task.

The post succeeded on both counts. I won’t pepper you with evidence on the commiseration front, but I am excited to share the results of the coding improvements. In addition to learning how to text-mine, I have also been trying to learn how to use RStudio and Shiny to build interactive apps, and this project seemed like a good one to do both. So, I’ve created an app that lets users explore this corpus in three ways:

  • Plot word counts over time to see how the use of certain terms has waxed and waned over the 28 years the reports span.
  • Generate word clouds showing the 50 most common words in each of the 16 reports.
  • Explore associations between terms by picking one and seeing which 10 others are most closely correlated with it in the entire corpus.
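The word-count trends rest on a simple quantity: the share of a document's words that match a term. Here is a base-R sketch of that calculation using prefix matching, so that "terror" also catches "terrorism" and "terrorists." It is not the app's code, which builds on the 'tm' package and stemming. The three short texts are invented stand-ins for the reports, and the helper name is hypothetical.

```r
# Invented stand-ins for three reports.
docs <- c(
  "1987" = "Soviet power and Soviet forces shape the threat",
  "2002" = "Terrorists and terrorism; we will defeat terror networks",
  "2010" = "Climate change and terrorism both demand attention"
)

# Hypothetical helper: share of tokens in each document that start with `stem`.
stem_share <- function(docs, stem) {
  tokens <- strsplit(tolower(docs), "[^a-z]+")  # split on anything but letters
  sapply(tokens, function(tk) {
    tk <- tk[nzchar(tk)]                        # drop empty tokens
    mean(startsWith(tk, stem))
  })
}

terror_share <- stem_share(docs, "terror")
climat_share <- stem_share(docs, "climat")
```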

For example, here’s a plot of change over time in the relative frequency of the term ‘terror’. Its usage spikes after 9/11 and then falls sharply when Barack Obama replaces George W. Bush as president.

NSS terror time trend

That pattern contrasts sharply with references to climate, which rarely appear until the Obama presidency, when their usage spikes upward. (Note, though, that the y-axis has been rescaled from the previous chart, so even after this large increase, ‘climat’ appears only about half as often as ‘terror’.)

NSS climat time trend

And here’s a word cloud of the 50 most common terms from the first US National Security Strategy, published in 1987. Surprise! The Soviet Union dominates the monologue.

NSS 1987 word cloud

When I built an initial version of the app a couple of Sundays ago, I promptly launched it on shinyapps.io to try to show it off. Unfortunately, the Shiny server only gives you 25 hours of free usage per billing cycle, and when I tweeted about the app, it got so much attention that those hours disappeared in a little over a day!

I don’t have my own server to host this thing, and I’m not sure when Shiny’s billing cycle refreshes. So, for the moment, I can’t link to a permanently working version of the app. If anyone reading this post is interested in hosting the app on a semi-permanent basis, please drop me a line at ulfelder <at> gmail. Meanwhile, R users can launch the app from their terminals with these two lines of code, assuming the ‘shiny’ package is already installed:

library(shiny)  # assumes the 'shiny' package is installed
runGitHub("national-security-strategy", "ulfelder")  # fetch the app from GitHub and run it locally

You can also find all of the texts and code used in the app and some other stuff (e.g., the nss.explore.R script also implements topic modeling) in that GitHub repository, here.

A Tale of Normal Failure

When I blog about my own research, I usually describe work I’ve already completed and focus on the results. This post is about a recent effort that ended in frustration, and it focuses on the process. In writing about this aborted project, I have two hopes: 1) to reassure other researchers (and myself) that this kind of failure is normal, and 2) if I’m lucky, to get some help with this task.

This particular ball got rolling a couple of days ago when I read a blog post by Dan Drezner about one aspect of the Obama administration’s new National Security Strategy (NSS) report. A few words in the bits Dan quoted got me thinking about the worldview they represented, and how we might use natural-language processing (NLP) to study that:

At first, I was just going to drop that awkwardly numbered tweetstorm and leave it there. I had some time that afternoon, though, and I’ve been looking for opportunities to learn text mining, so I decided to see what I could do. The NSS reports only became a thing in 1987, so there are still just 16 of them, and they all try to answer the same basic questions: What threats and opportunities does the US face in the world, and what should the government do to meet them? As such, they seemed like the kind of manageable and coherent corpus that would make for a nice training exercise.

I started by checking to see if anyone had already done with earlier reports what I was hoping to do with the latest one. It turned out that someone had, and to good effect:

I promptly emailed the corresponding author to ask if they had replication materials, or even just clean versions of the texts for all previous years. I got an autoreply informing me that the author was on sabbatical and would only intermittently be reading his email. (He replied the next day to say that he would put the question to his co-authors, but that still didn’t solve my problem, and by then I’d moved on anyway.)

Without those materials, I would need to start by getting the documents in the proper format. A little Googling led me to the National Security Strategy Archive, which at the time had PDFs of all but the newest report, and that one was easy enough to find on the White House’s web site. Another search led me to a site that converts PDFs to plain text online for free. I spent the next hour or so running those reports through the converter (and playing a little Crossy Road on my phone while I waited for the jobs to finish). Once I had the reports as .txt files, I figured I could organize my work better and do other researchers a solid by putting them all in a public repository, so I set one up on GitHub (here) and cloned it to my hard drive.

At that point, I was getting excited, thinking: “Hey, this isn’t so hard after all.” In most of the work I do, getting the data is the toughest part, and I already had all the documents I wanted in the format I needed. I was just a few lines of code away from the statistics and plots that would confirm or infirm my conjectures.

From another recent collaboration, I knew that the next step would be to use some software to ingest those .txt files, scrub them a few different ways, and then generate some word counts and maybe do some topic modeling to explore changes over time in the reports’ contents. I’d heard several people say that Python is really good at these tasks, but I’m an R guy, so I followed the lead on the CRAN Task View for natural language processing and installed and loaded the ‘tm’ package for text mining.

And that’s where the wheels started to come off of my rickety little wagon. Using the package developers’ vignette and an article they published in the Journal of Statistical Software, I started tinkering with some code. After a couple of false starts, I found that I could create a corpus and run some common preprocessing tasks on it without too much trouble, but I couldn’t get the analytical functions to run on the results. Instead, I kept getting this error message:

Error: inherits(doc, "TextDocument") is not TRUE

By then it was dinner time, so I called it a day and went to listen to my sons holler at each other across the table for a while.

When I picked the task back up the next morning, I inspected a few of the scrubbed documents and saw some strange character strings—things like ir1 instead of in and ’ where an apostrophe should be. That got me wondering if the problem lay in the encoding of those .txt files. Unfortunately, neither the files themselves nor the site that produced them tells me which encoding they use. I ran through a bunch of options, but none of them fixed the problem.

“Okay, no worries,” I thought. “I’ll use gsub() to replace those funky bits in the strings by hand.” The commands ran without a hiccup, but the text didn’t change. Stranger, when I tried to inspect documents in the R terminal, the same command wouldn’t always produce the same result. Sometimes I’d get the head, and sometimes the tail. I tried moving back a step in the process and installed a PDF converter that I could run from R, but R couldn’t find the converter, and my attempts to fix that failed.
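For anyone who hits the same wall, that error is commonly reported with 'tm' version 0.6 and later. In those versions, tm_map() expects plain functions such as tolower() or gsub() to be wrapped in content_transformer(), as in `tm_map(corpus, content_transformer(tolower))`, so the documents keep their class. That may or may not be what went wrong here. Separately, garbled strings can be scrubbed in base R before a corpus is built. The sketch below assumes the artifacts look like the ones described above: a curly apostrophe, possibly in its mis-decoded form "â€™", and "ir1" in place of "in." The helper name is hypothetical.

```r
# Hypothetical helper: scrub two kinds of conversion artifacts from text.
fix_text <- function(x) {
  x <- enc2utf8(x)
  # "\u00e2\u20ac\u2122" ("â€™") is what a UTF-8 right single quote (U+2019)
  # becomes when its bytes are mis-decoded as Windows-1252.
  x <- gsub("\u00e2\u20ac\u2122", "'", x, fixed = TRUE)
  x <- gsub("\u2019", "'", x, fixed = TRUE)     # curly apostrophe -> straight
  x <- gsub("\\bir1\\b", "in", x, perl = TRUE)  # whole-word "ir1" -> "in"
  x
}

cleaned <- fix_text(c("don\u00e2\u20ac\u2122t stop ir1 time", "the nation\u2019s"))
```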

At this point, I was about ready to quit, and I tweeted some of that frustration. Igor Brigadir quickly replied to suggest a solution, but it involved another programming language, Python, that I don’t know:

To go that route, I would need to start learning Python. That’s probably a good idea for the long run, but it wasn’t going to happen this week. Then Ken Benoit pointed me toward a new R package he’s developing and even offered to help me:

That sounded promising, so I opened R again and followed the clear instructions on the README at Ken’s repository to install the package. Of course the installation failed, probably because I’m still using R Version 3.1.1 and the package is, I suspect, written for the latest release, 3.1.2.

And that’s where I finally quit—for now. I’d hit a wall, and all my usual strategies for working through or around it had either failed or led to solutions that would require a lot more work. If I were getting paid and on deadline, I’d keep hacking away, but this was supposed to be a “fun” project for my own edification. What seemed at first like a tidy exercise had turned into a tar baby, and I needed to move on.

This cycle of frustration → problem-solving → frustration might seem like a distraction from the real business of social science, but in my experience, it is the real business. Unless I’m performing a variation on a familiar task with familiar data, this is normal. It might be boring to read, but then most of the day-to-day work of social science probably is, or at least looks that way to the people who aren’t doing it and therefore can’t see how all those little steps fit into the bigger picture.

So that’s my tale of minor woe. Now, if anyone who actually knows how to do text-mining in R is inspired to help me figure out what I’m doing wrong on that National Security Strategy project, please take a look at that GitHub repo and the script posted there and let me know what you see.

A Postscript on Measuring Change Over Time in Freedom in the World

After publishing yesterday’s post on Freedom House’s latest Freedom in the World report (here), I thought some more about better ways to measure what I think Freedom House implies it’s measuring with its annual counts of country-level gains and declines. The problem with those counts is that they don’t account for the magnitude of the changes they represent. That’s like keeping track of how a poker player is doing by counting bets won and bets lost without regard to their value. If we want to assess the current state of the system and compare it to earlier states, the size of those gains and declines matters, too.

With that in mind, my first idea was to sum the raw annual changes in countries’ “freedom” scores by year, where the freedom score is just the sum of those 7-point political rights and civil liberties indices. Let’s imagine a year in which three countries saw a 1-point decline in their freedom scores; one country saw a 1-point gain; and one country saw a 3-point gain. Using Freedom House’s measure, that would look like a bad year, with declines outnumbering gains 3 to 2. Using the sum of the raw changes, however, it would look like a good year, with a net change in freedom scores of +1.
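Here is the hypothetical year above, worked through in R. Changes are coded so that positive means more free, and the helper name is hypothetical.

```r
# Hypothetical helper: Freedom House-style tallies vs. the sum of raw changes.
tally_changes <- function(changes) {
  c(
    gains    = sum(changes > 0),  # count of countries that gained
    declines = sum(changes < 0),  # count of countries that declined
    net      = sum(changes)       # net change, accounting for magnitude
  )
}

# The example year: three 1-point declines, one 1-point gain, one 3-point gain.
example_year <- tally_changes(c(-1, -1, -1, 1, 3))
```

Declines outnumber gains 3 to 2, yet the net change is +1.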

Okay, so here’s a plot of those sums of raw annual changes in freedom scores since 1982, when Freedom House rejiggered the timing of its survey.[1] I’ve marked the nine-year period that Freedom House calls out in its report as an unbroken run of bad news, with declines outnumbering gains every year since 2006. As the plot shows, when we account for the magnitude of those gains and losses, things don’t look so grim. In most of those nine years, losses did outweigh gains, but the net loss was rarely large, and two of the nine years actually saw net gains by this measure.

Annual global sums of raw yearly changes in Freedom House freedom scores (inverted), 1983-2014

After I’d generated that plot, though, I worried that the sum of those raw annual changes still ignored another important dimension: population size. As I understand it, the big question Freedom House is trying to address with its annual report is: “How free is the world?” If we want to answer that question from a classical liberal perspective—and that’s where I think Freedom House is coming from—then individual people, not states, need to be our unit of observation.

Imagine a world with five countries where half the global population lives in one country and the other half is evenly divided between the other four. Now let’s imagine that the one really big country is maximally unfree while the other four countries are maximally free. If we compare scores (or changes in them) by country, things look great; 80 percent of the world is super-free! Meanwhile, though, half the world’s population lives under total dictatorship. An international relations theorist might care more about the distribution of states, but a liberal should care more about the distribution of people.

To take a look at things from this perspective, I decided to generate a scalar measure of freedom in the world system that sums country scores weighted by their share of the global population.[2] To make the result easier to interpret, I started by rescaling the country-level “freedom scores” from their original range of 14–2 (least to most free) to 0–10, with 10 indicating most free. A world in which all countries are fully free (according to Freedom House) would score a perfect 10 on this scale, and changes in large countries will move the index more than changes in small ones.
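Here is a sketch of that calculation, written to match the conversions reported below (a 6/6 maps to 1.67 and a 2/3 to 7.5). It rescales the combined score linearly so that 2 maps to 10 and 14 maps to 0, then takes a population-weighted average. It is a reconstruction for illustration, not the script behind the chart. The function names are hypothetical, and the example uses the five-country world described above with made-up population figures.

```r
# Rescale political rights + civil liberties (2 = most free, 14 = least free)
# to 0-10 with 10 = most free.
rescale_fh <- function(pr, cl) 10 * (14 - (pr + cl)) / 12

# Population-weighted world score: rescaled country scores weighted by
# each country's share of the total population.
world_score <- function(pr, cl, pop) {
  sum(rescale_fh(pr, cl) * pop / sum(pop))
}

# Five-country world: one maximally unfree country holding half the
# population, four maximally free countries splitting the other half.
pr  <- c(7, 1, 1, 1, 1)
cl  <- c(7, 1, 1, 1, 1)
pop <- c(4, 1, 1, 1, 1)
mean(rescale_fh(pr, cl))  # unweighted country average: 8
world_score(pr, cl, pop)  # population-weighted: 5
```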

Okay, so here’s a plot of the results for the entire run of Freedom House’s data set, 1972–2014. (Again, 1981 is missing because that’s when Freedom House paused to align their reports with the calendar year.)  Things look pretty different than they do when we count gains and declines or even sum raw changes by country, don’t they?

A population-weighted annual scalar measure of freedom in the world, 1972-2014

The first things that jumped out at me were those sharp declines in the mid-1970s and again in the late 1980s and early 1990s. At first I thought I must have messed up the math, because everyone knows things got a lot better when Communism crumbled in Eastern Europe and the Soviet Union, right? It turns out, though, that those swings are driven by changes in China and India, which together account for approximately one-third of the global population. In 1989, after Tiananmen Square, China’s score dropped from a 6/6 (or 1.67 on my 0-10 scalar version) to 7/7 (or 0). At the time, China contained nearly one-quarter of the world’s population, so that slump more than offsets the (often-modest) gains made in the countries touched by the so-called fourth wave of democratic transitions. In 1998, China inched back up to 7/6 (0.83), and the global measure moved with it. Meanwhile, India dropped from 2/3 (7.5) to 3/4 (5.8) in 1991, and then again from 3/4 to 4/4 (5.0) in 1993, but it bumped back up to 2/4 (6.67) in 1996 and then 2/3 (7.5) in 1998. The global gains and losses produced by the shifts in those two countries don’t fully align with the conventional narrative about trends in democratization in the past few decades, but I think they do provide a more accurate measure of overall freedom in the world if we care about people instead of states, as liberalism encourages us to do.
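Because the index is a population-weighted sum, one country’s contribution to a year-over-year change is its population share times its change in score, holding shares fixed. The sketch below applies that to China’s 1989 drop. The 0.22 share is an assumption, a round stand-in for “nearly one-quarter,” and the small country is hypothetical.

```r
# Change in a population-weighted index from one country's score change,
# holding every country's population share fixed.
index_shift <- function(share, before, after) share * (after - before)

# China, 1989: from 6/6 (10 * 2 / 12 on the 0-10 scale) to 7/7 (0).
# ASSUMPTION: population share of 0.22, standing in for "nearly one-quarter".
china_1989 <- index_shift(0.22, 10 * 2 / 12, 0)

# HYPOTHETICAL small country: 0.5% of world population, gaining 5 points.
small_gain <- index_shift(0.005, 2.5, 7.5)

round(c(china = china_1989, small = small_gain), 3)
```

Under these assumptions China’s drop subtracts about 0.37 points from the global index, and offsetting it would take roughly 15 small-country gains of that size.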

Of course, the other thing that caught my eye in that second chart was the more-or-less flat line for the past decade. When we consider the distribution of the world’s population across all those countries where Freedom House tallies gains and declines, it’s hard to find evidence of the extended democratic recession they and others describe. In fact, the only notable downturn in that decade-long stretch comes in 2014, when the global score dropped from 5.2 to 5.1. To my mind, that recent downturn marks a worrying development, but it’s harder to notice it when we’ve been hearing cries of “Wolf!” for the eight years before.

NOTES

[1] For the #Rstats crowd: I used the slide function in the package DataCombine to get one-year lags of those indices by country; then I created a new variable representing the difference between the annual score for the current and previous year; then I used ddply from the plyr package to create a data frame with the annual global sums of those differences. Script on GitHub here.
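For anyone who’d rather not load DataCombine or plyr, here’s a base-R sketch of the same three steps on a made-up three-country panel. It swaps in ave() for slide() and aggregate() for ddply(). It assumes scores are already oriented so that higher means freer and that no country has missing years.

```r
# HYPOTHETICAL panel: one row per country-year, scores on a 0-10 scale
# where higher means freer (so no extra inversion is needed).
fh <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(2012:2014, times = 3),
  score   = c(5, 6, 6,   8, 6, 7,   2, 2, 3)
)

# Sort so lags run in time order within each country.
fh <- fh[order(fh$country, fh$year), ]

# One-year lag within country (base-R stand-in for DataCombine::slide).
# Assumes no gaps in years; a missing year would need explicit handling.
fh$score_lag <- ave(fh$score, fh$country,
                    FUN = function(x) c(NA, head(x, -1)))

# Year-over-year change; NA in each country's first year.
fh$change <- fh$score - fh$score_lag

# Global sum of changes by year (base-R stand-in for plyr::ddply).
# aggregate()'s formula method drops the NA rows by default.
annual <- aggregate(change ~ year, data = fh, FUN = sum)
annual
```

Here the global sums come to -1 for 2013 and +2 for 2014; the first year drops out because it has no lag.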

[2] Here, I used the WDI package to get country-year data on population size; used ddply to calculate world population by year; merged those global sums back into the country-year data; used those sums as the denominator in a new variable indicating a country’s share of the global population; and then used ddply again to get a table with the sum of the products of those population weights and the freedom scores. Again, script on GitHub here (same one as before).
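A base-R sketch of those steps on a made-up panel follows. aggregate() and merge() stand in for ddply(), and the population figures are invented rather than pulled from WDI.

```r
# HYPOTHETICAL panel: 0-10 scores and populations (millions) for three
# countries; in the post, population came from the WDI package instead.
cy <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  year    = rep(2013:2014, times = 3),
  score   = c(2, 1,   9, 9,   6, 7),
  pop     = c(500, 510,   300, 300,   200, 190)
)

# World population by year, merged back onto the country-year rows.
world <- aggregate(pop ~ year, data = cy, FUN = sum)
names(world)[names(world) == "pop"] <- "world_pop"
cy <- merge(cy, world, by = "year")

# Each country's share of that year's world population.
cy$share <- cy$pop / cy$world_pop

# Index by year: sum over countries of population share times score.
cy$weighted <- cy$share * cy$score
index <- aggregate(weighted ~ year, data = cy, FUN = sum)
index
```

Both toy years total 1,000 million people, so the shares are easy to check by hand: 0.5, 0.3 and 0.2 in 2013 give an index of 4.9.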

Estimating NFL Team-Specific Home-Field Advantage

This morning, I tinkered a bit with my pro-football preseason team strength survey data from 2013 and 2014 to see what other simple things I might do to improve the accuracy of forecasts derived from future versions of them.

My first idea is to go beyond a generic estimate of home-field advantage—about 3 points, according to my and everyone else’s estimates—with team-specific versions of that quantity. The intuition is that some venues confer a bigger advantage than others. For example, I would guess that Denver enjoys a bigger home-field edge than most teams because their stadium is at high altitude. The Broncos live there, so they’re used to it, but visiting teams have to adapt, and that process supposedly takes about a day for every 1,000 feet over 3,000. Some venues are louder than others, and that noise is often dialed up when visiting teams would prefer some quiet. And so on.

To explore this idea, I’m using a simple hierarchical linear model to estimate team-specific intercepts after taking preseason estimates of relative team strength into account. The line of R code used to estimate the model requires the lme4 package and looks like this:

library(lme4)  # provides lmer()
mod1 <- lmer(score.raw ~ wiki.diff + (1 | home_team), data = results)  # random intercept for each home team

Where

score.raw = home_score - visitor_score
wiki.diff = home_wiki - visitor_wiki

Those wiki vectors are the team strength scores estimated from preseason pairwise wiki surveys. The ‘results’ data frame includes scores for all regular and postseason games from those two seasons, courtesy of devstopfix’s NFL results repository on GitHub (here). Because the net game and strength scores are both ordered home to visitor, we can read the random intercepts for each home team as estimates of team-specific home advantage (more precisely, as each team’s deviation from the league-wide advantage captured by the fixed intercept). There are probably other sources of team-specific bias in my data, so those estimates are going to be pretty noisy, but I think it’s a reasonable starting point.
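Written out, the model that lmer call specifies looks like this. Here h(i) is the home team in game i; the notation is mine, not taken from the original script.

```latex
\text{score.raw}_i = \beta_0 + u_{h(i)} + \beta_1\,\text{wiki.diff}_i + \varepsilon_i,
\qquad u_h \sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

β0 is the generic home-field advantage shared by every team, u_h is home team h’s deviation from it (the random intercept), and β1 converts a gap in preseason wiki scores into points. Team h’s total home edge is therefore β0 + u_h.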

My initial results are shown in the plot below, which I get with these two lines of code, the second of which requires the lattice package:

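library(lattice)  # attaches the dotplot() generic used to plot ranef() output
ha1 <- ranef(mod1, condVar = TRUE)  # team intercepts plus their conditional variances
dotplot(ha1)  # caterpillar plot of each team's intercept with an interval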
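The values ranef() returns are conditional modes, which are pulled toward zero, and pulled harder for teams whose data are sparse or noisy. The base-R sketch below illustrates that partial pooling with the textbook normal-normal shrinkage formula and made-up variance components. It is a simplified stand-in, not the calculation lmer performs; lmer estimates the variances from the data and handles the fixed effects jointly.

```r
set.seed(42)

# HYPOTHETICAL variance components (points squared), for illustration only.
sigma2 <- 13^2   # game-to-game noise in net scores
tau2   <- 1.5^2  # spread of true team-specific home edges

n_teams <- 32
n_home  <- sample(16:18, n_teams, replace = TRUE)  # about two seasons of home games

# True deviations from the league-wide edge, and each team's raw mean
# residual over its home games (truth plus averaged game noise).
u_true   <- rnorm(n_teams, mean = 0, sd = sqrt(tau2))
raw_mean <- u_true + rnorm(n_teams, mean = 0, sd = sqrt(sigma2 / n_home))

# Shrinkage factor: the share of a raw mean treated as signal.
shrink <- tau2 / (tau2 + sigma2 / n_home)
u_hat  <- shrink * raw_mean

round(range(shrink), 2)
# Mean squared error against the simulated truth (lower is better in expectation for u_hat)
c(raw = mean((raw_mean - u_true)^2), shrunk = mean((u_hat - u_true)^2))
```

With game noise this large relative to the spread of true edges, the shrinkage factor stays under 0.2 for every team, so each raw team average is cut to less than a fifth of its size.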

Bear in mind that the generic (fixed) intercept is 2.7, so the estimated home-field advantage for each team is what’s shown in the plot plus that number. For example, these estimates imply that my Ravens enjoy a net advantage of about 3 points when they play in Baltimore, while their division-rival Bengals are closer to 6.
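The arithmetic in that paragraph is just addition: a team’s total home edge is the fixed intercept plus its random intercept. In lme4, coef(mod1) returns those per-team sums directly. The deviations below are approximate values backed out of the rounded totals quoted above, not numbers read from the fitted model.

```r
# Generic (fixed) home-field intercept reported for the model
fixed_intercept <- 2.7

# APPROXIMATE team deviations backed out of the rounded totals in the
# text (about 3 points for Baltimore, about 6 for Cincinnati); these are
# illustrative values, not estimates read from the fitted model.
deviation <- c(BAL = 0.3, CIN = 3.3)

total_home_edge <- fixed_intercept + deviation
total_home_edge

# A net home-field disadvantage means the total is below zero,
# i.e., the team's deviation is below -2.7.
total_home_edge < 0
```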

Team-specific home-advantage estimates

In light of DeflateGate, I guess I shouldn’t be surprised to see the Pats at the top of the chart, almost a whole point higher than the second-highest team. Maybe their insanely low home fumble rate has something to do with it.* I’m also heartened to see relatively high estimates for Denver, given the intuition that started this exercise, and for Seattle, which is often said to enjoy an unusually large home-field edge. At the same time, I honestly don’t know what to make of the exceptionally low estimates for DC and Jacksonville, who appear from these estimates to suffer a net home-field disadvantage. That strikes me as odd and undercuts my confidence in the results.

In any case, that’s how far my tinkering took me today. If I get really motivated, I might try re-estimating the model with just the 2013 data and then running the 2014 preseason survey scores through that model to generate “forecasts” that I can compare to the ones I got from the simple linear model with just the generic intercept (here). The point of the exercise was to try to get more accurate forecasts from simple models, and the only way to test that is to do it. I’m also trying to decide whether to cross these team-specific effects with season-specific effects, to control for year-to-year differences in the biases of the wiki survey results when estimating the team-specific intercepts. But I’m not there yet.
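The train-on-2013, test-on-2014 check described above could be set up roughly like this. It’s a sketch on simulated data, not real results, and every parameter value is hypothetical. To keep it dependency-free, it uses fixed home-team dummies in lm() as a stand-in for lmer’s random intercepts; dummies don’t shrink, so this version will tend to overfit the team effects.

```r
set.seed(2015)

# HYPOTHETICAL simulation: 32 teams with small team-specific home edges
# on top of a league-wide edge; all parameter values are made up.
teams <- sprintf("T%02d", 1:32)
edge  <- setNames(rnorm(32, mean = 0, sd = 1.5), teams)

sim_season <- function() {
  home      <- rep(teams, each = 8)        # 8 home games per team
  wiki_diff <- rnorm(length(home), 0, 20)  # home minus visitor survey score
  data.frame(
    home_team = factor(home, levels = teams),
    wiki.diff = wiki_diff,
    score.raw = 2.7 + unname(edge[home]) + 0.15 * wiki_diff +
      rnorm(length(home), 0, 13)
  )
}
train   <- sim_season()  # stands in for the 2013 season
holdout <- sim_season()  # stands in for the 2014 season

# Generic intercept vs. one intercept per home team (fixed dummies,
# a swapped-in stand-in for lmer's random intercepts).
m_generic <- lm(score.raw ~ wiki.diff, data = train)
m_team    <- lm(score.raw ~ wiki.diff + home_team, data = train)

# Mean absolute error of predicted net scores in the held-out season
mae <- function(model) mean(abs(holdout$score.raw - predict(model, newdata = holdout)))
c(generic = mae(m_generic), team = mae(m_team))
```

With the real 2013 and 2014 data frames, lmer() could replace the dummy model; lme4’s predict() method for fitted models accepts newdata and uses the estimated team intercepts.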

* After I published this post, Michael Lopez helpfully pointed me toward a better take on the Patriots’ fumble rate (here), and Mo Patel observed that teams manage their own footballs on the road, too, so that particular tweak—if it really happened—wouldn’t have a home-field-specific effect.

A Few Rules of Thumb for Data Munging in Political Science

1. However hard you think it will be to assemble a data set for a particular analysis, it will be exponentially harder, with the size of the exponent determined by the scope and scale of the required data.

  • Corollary: If the data you need would cover the world (or just poor countries), they probably don’t exist.
  • Corollary: If the data you need would extend very far back in time, they probably don’t exist.
  • Corollary: If the data you need are politically sensitive, they probably don’t exist. If they do exist, you probably can’t get them. If you can get them, you probably shouldn’t trust them.

2. However reliable you think your data are, they probably aren’t.

  • Corollary: A couple of digits after the decimal point is plenty. With data this noisy, what do those thousandths really mean, anyway?

3. Just because a data transformation works doesn’t mean it’s doing what you meant it to do.

4. The only really reliable way to make sure that your analysis is replicable is to have someone previously unfamiliar with the work try to replicate it. Unfortunately, a person’s incentive to replicate someone else’s work rises with his or her level of prior involvement in the project. Ergo, this will rarely happen until after you have posted your results.

5. If your replication materials will include random parts (e.g., sampling) and you’re using R, don’t forget to set the seed for random number generation at the start. (Alas, I am living this mistake today.)
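Rules 3 and 5 are easy to illustrate in a few lines of base R, using made-up data. The first part shows a merge that runs without complaint while silently dropping one country and duplicating another; the second shows a seed making random draws repeatable. The data and the seed value are arbitrary.

```r
# Rule 3: a merge can "work" and still change the data.
# HYPOTHETICAL data: the lookup table lists country B twice and omits C.
scores <- data.frame(country = c("A", "B", "C"), score = c(3, 7, 5))
lookup <- data.frame(country = c("A", "B", "B"), region = c("X", "Y", "Y"))

merged <- merge(scores, lookup, by = "country")  # inner join by default
nrow(merged)        # still 3 rows, but C is gone and B appears twice
sum(merged$score)   # 17 instead of 15

# A cheap guard: check key uniqueness as well as row counts after joins.
key_ok <- !anyDuplicated(lookup$country) && nrow(merged) == nrow(scores)
key_ok              # FALSE here; in a real script, stop() on this

# Rule 5: set the seed before any random draws, and the draws repeat.
set.seed(20150101)  # arbitrary seed value
draw1 <- sample(1:100, 5)
set.seed(20150101)
draw2 <- sample(1:100, 5)
identical(draw1, draw2)  # TRUE
```

Note that a matching row count alone would not have caught the bad merge here. Also, R 3.6.0 changed the default algorithm behind sample(), so the same seed can give different draws on older versions; recording the R version alongside the seed helps.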

Please use the Comments to suggest additions, corrections, or modifications.

  • Follow Dart-Throwing Chimp on WordPress.com