Most of the statistical work I do involves events that occur rarely in places over time. One of the best ways to get or give a feel for the structure of data like that is with a plot that shows variation in counts of those events across sequential, evenly-sized slices of time. For me, that usually means a sequence of annual, global counts of those events, like the one below for successful and failed coup attempts over the past several decades (see here for the R script that generated that plot and a few others and here for the data):
One thing I don’t like about those plots, though, is the loss of information that comes from converting events to counts. Sometimes we want to know not just how many events occurred in a particular year but also where they occurred, and we don’t want to have to query the database or look at a separate table to find out.
I try to do both in one go with a type of column chart I’ll call the stacked-label column plot. Instead of building columns from bricks of identical color, I use blocks of text that describe another attribute of each unit (usually country names in my work, but it could be lots of things). In order for those blocks to have comparable visual weight, they need to be equally sized, which usually means using labels of uniform length (e.g., two- or three-letter country codes) and a fixed-width font like Courier New.
I started making these kinds of plots in the 1990s, using Excel spreadsheets or tables in Microsoft Word to plot things like protest events and transitions to and from democracy. A couple of decades later, I’m finally trying to figure out how to make them in R. Here is my first reasonably successful attempt, using data I just finished updating on when countries joined the World Trade Organization (WTO) or its predecessor, the General Agreement on Tariffs and Trade (GATT).
Note: Because the WordPress template I use crams blog-post content into a column that’s only half as wide as the screen, you might have trouble reading the text labels in some browsers. If you can’t make out the letters, try clicking on the plot, then increasing the zoom if needed.
Without bothering to read the labels, you can see the time trend fine. Since 1960, there have been two waves of countries joining the global free-trade regime: one in the early 1960s, and another in the early 1990s. Those two waves correspond to two spates of state creation, so without the labels, many of us might infer that those stacks are composed mostly or entirely of new states joining.
When we scan the labels, though, we discover a different story. As expected, the wave in the early 1960s does include a lot of newly independent African states, but it also includes a couple of Communist-ruled countries (Yugoslavia and Poland) and some middle-income cases from other parts of the world (e.g., Argentina and South Korea). Meanwhile, the wave of the early 1990s turns out to include very few post-Communist countries, most of which didn’t join until the end of that decade or early in the next one. Instead, we see a second wave of “developing” countries joining on the eve of the transition from GATT to the WTO, which officially happened on January 1, 1995. I’m sure people who really know the politics of the global free-trade regime, or of specific cases or regions, can spot some other interesting stories in there, too. The point, though, is that we can’t discover those stories if we can’t see the case labels.
Here’s another one that shows which countries had any coup attempts each year between 1960 and 2014, according to Jonathan Powell and Clayton Thyne’s running list. In this case, color tells us the outcomes of those coup attempts: red if any succeeded, dark grey if they all failed.
One story that immediately catches my eye in this plot is Argentina’s (ARG) remarkable propensity for coups in the early 1960s. It shows up in each of the first four columns, although only in 1962 are any of those attempts successful. Again, this is information we lose when we only plot the counts without identifying the cases.
The way I’m doing it now, this kind of chart requires data to be stored in (or converted to) event-file format, not the time-series cross-sectional format that many of us usually use. Instead of one row per unit–time slice, you want one row for each event. Each row should have at least two columns: one with the case label and one with the time slice in which the event occurred.
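To make the difference between the two formats concrete, here is a minimal sketch of converting a small country-year panel into an event file. The data frame `panel`, its column names, and all of its values are made up for illustration; this is not the script linked in this post, just one way the conversion might be done in base R.

```r
# Hypothetical country-year panel: one row per unit-time slice,
# with a count of events in each slice (all values are invented).
panel <- data.frame(
  country = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  year    = c(2001, 2002, 2001, 2002, 2001, 2002),
  events  = c(0, 2, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Repeat each row index once per event. Slices with zero events
# drop out, and what remains is one row per event.
rows <- rep(seq_len(nrow(panel)), times = panel$events)

# Keep only the two columns the plot needs: case label and time slice.
event_file <- panel[rows, c("country", "year")]
rownames(event_file) <- NULL

print(event_file)
```

Note that this keeps both of the hypothetical AAA events in 2002 as separate rows, which is exactly the situation that causes the stacking trouble described further down.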
If you’re interested in playing around with these types of plots, you can find the R script I used to generate the ones above here. Perhaps some enterprising soul will take it upon him- or herself to write a function that makes it easy to produce this kind of chart across a variety of data structures.
It would be especially nice to have a function that worked properly when the same label appears more than once in a given time slice. Right now, I’m using the function ‘match’ to assign y values that evenly stack the events within each bin. That doesn’t work for the second or third or nth match, though, because the ‘match’ function always returns the position of the first match in the relevant vector. So, for example, if I try to plot all coup attempts each year instead of all countries with any coup attempts each year, the second or later events in the same country get placed in the same position as the first, which ultimately means they show up as blank spaces in the columns. Sadly, I haven’t figured out yet how to identify location in that vector in a more general way to fix this problem.
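One possible way around this, offered as a sketch rather than a tested fix for the script linked in this post, is to stop looking labels up at all and simply number the rows within each time slice. In base R, `ave()` with `seq_along` does that. The data frame `events` and its columns `label` and `year` are hypothetical stand-ins for an event file.

```r
# Hypothetical event file in which "AAA" appears twice in 2001.
events <- data.frame(
  label = c("AAA", "BBB", "AAA", "CCC", "AAA", "BBB"),
  year  = c(2001, 2001, 2001, 2002, 2002, 2002),
  stringsAsFactors = FALSE
)

# The problem: match() returns the position of the FIRST match,
# so the repeated label gets the same position as its first occurrence.
in_2001 <- events$label[events$year == 2001]
collide <- match(in_2001, in_2001)
print(collide)  # 1 2 1

# A possible alternative: number the rows within each year, so every
# event gets its own slot regardless of whether its label repeats.
events$y <- ave(seq_along(events$year), events$year, FUN = seq_along)

print(events)
```

With `y` computed this way, the stacking position depends only on row order within each year, so sorting the event file first (say, by label within year) controls the order of the labels in each column. The resulting `year` and `y` values could then be passed to something like `text()` with a monospaced font family to draw the labels.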