You know the telephone game, where a bunch of people sit in a circle or around a table and pass a whispered sentence from person to person until it comes back to the one who started it and they say the version they heard out loud and you all crack up at how garbled it got?
Well, I wonder if John Beieler is cracking up or crying right now, because the same thing is happening with a visualization he created using data from the recently released Global Dataset on Events, Language, and Tone, a.k.a. GDELT.
Back at the end of July, John posted a terrific animated set of maps of protest activity worldwide since 1979. In a previous post on a single slice of the data used in that animation, John was careful to attach a number of caveats to the work: the maps only include events covered in the sources GDELT scours, GDELT sometimes codes events that didn’t happen, GDELT sometimes struggles to put events in their proper geographic location, event labels in the CAMEO event classification scheme GDELT uses doesn’t always mean what you think they mean, counts of events don’t tell you anything about the size or duration of the events being counted, etc., etc. In the blogged cover letter for the animated series, John added one more very important caveat about the apparent increase in the incidence of protest activity over time:
When dealing with the time-series of data, however, one additional, and very important, point also applies. The number of events recorded in GDELT grows exponentially over time, as noted in the paper introducing the dataset. This means that over time there appears to be a steady increase in events, but this should not be mistaken as a rise in the actual amount of behavior
X (protest behavior in this case). Instead, due to changes in reporting and the digital recording of news stories, it is simply the case that there are more events of every type over time. In some preliminary work that is not yet publicly released, protest behavior seems to remain relatively constant over time as a percentage of the total number of events. This means that while there was an explosion of protest activity in the Middle East, and elsewhere, during the past few years, identifying visible patterns is a tricky endeavor due to the nature of the underlying data.
John’s post deservedly caught the eye of J. Dana Stuster, an assistant editor at Foreign Policy, who wrote a bit about it last week. Stuster’s piece was careful to repeat many of John’s caveats, but the headline—“Mapped: Every Protest on the Planet since 1979″—got sloppy, essentially shedding several of the most important qualifiers. As John had taken pains to note, what we see in the maps is not all that there is, and some of what’s shown in the maps didn’t really happen.
Well, you can probably where this is going. Not long after that Foreign Policy piece appeared, I saw this tweet from Chelsea Clinton:
In fewer than 140 characters, Clinton impressively managed to put back the caveat Foreign Policy had dropped in its headline about press coverage vs. reality, but the message had already been garbled, and now it was going viral. Fast forward to this past weekend, when the phrase “Watch a Jaw-dropping Visualization of Every Protest since 1979” made repeated appearances in my Twitter timeline. This next iteration came from Ultraculture blogger Jason Louv, and it included this bit:
Also fruitful: Comparing this data with media coverage and treatment of protest. Why is it easy to think of the 1960s and 70s as a time of dissent and our time as a more ordered, controlled and conformist period when the data so clearly shows that there is no comparison in how much protest there is now compared to then? Media distortion much?
So now we get a version that ignores both the caveat about GDELT’s coverage not being exhaustive or perfect and the related one about the apparent increase in protest volume over time being at least in part an artifact of “changes in reporting and the digital recording of news stories.” What started out as a simple proof-of-concept exercise—“The areas that are ‘bright’ are those that would generally be expected to be so,” John wrote in his initial post—had been twisted into a definitive visual record of protest activity around the world in the past 35 years.
As someone who thinks that GDELT is an analytical gusher and believes that it’s useful and important to make work like this accessible to broader audiences, I don’t know what to learn from this example. John was as careful as could be, but the work still mutated as it spread. How do you prevent this from happening, or at least mitigate the damage when it does?
If anyone’s got some ideas, I’d love to hear them.