The Myth of Comprehensive Data

“What about using Twitter sentiment?”

That suggestion came to me from someone at a recent Data Science DC meetup, after I’d given a short talk on assessing risks of mass atrocities for the Early Warning Project, and as the next speaker started his presentation on predicting social unrest. I had devoted the first half of my presentation to a digression of sorts, talking about how the persistent scarcity of relevant public data still makes it impossible to produce global forecasts of rare political crises—things like coups, insurgencies, regime breakdowns, and mass atrocities—that are as sharp and dynamic as we would like.

The meetup wasn’t the first time I’d heard that suggestion, and I think all of the well-intentioned people who have made it to me have believed that data derived from Twitter would escape or overcome those constraints. In fact, the Twitter stream embodies them. Over the past two decades, technological, economic, and political changes have produced an astonishing surge in the amount of information available from and about the world, but that surge has not occurred evenly around the globe.

Think of the availability of data as plant life in a rugged landscape, where dry peaks are places of data scarcity and fertile valleys represent data-rich environments. The technological developments of the past 20 years are like a weather pattern that keeps dumping more and more rain on that topography. That rain falls unevenly across the landscape, however, and it doesn’t have the same effect everywhere it lands. As a result, plants still struggle to grow on many of those rocky peaks, and much of the new growth occurs where water already collected and flora were already flourishing.

The Twitter stream exemplifies this uneven distribution of data in a couple of important ways. Take a look at the map below, a screenshot I took after letting Tweetping run for about 16 hours spanning May 6–7, 2015. The brighter the glow, the more Twitter activity Tweetping saw.

Tweetping map of Twitter activity, 15:30 May 6 to 08:05 May 7, 2015

Some of the spatial variation in that map reflects differences in the distribution of human populations, but not all of it. Here’s a map of population density, produced by Daysleeper using data from CIESIN (source). If you compare this one to the map of Twitter usage, you’ll see that they align pretty well in Europe, the Americas, and some parts of Asia. In Africa and other parts of Asia, though, not so much. If it were just a matter of population density, then India and eastern China would burn brightest, but they, and especially China, are relatively dark compared to “the West.” Meanwhile, in Africa, we see pockets of activity, but there are whole swathes of the continent that are as densely populated as the brighter parts of South America, or more so, yet from which we see virtually no Twitter activity.

world population density map

So why are some pockets of human settlement less visible than others? Two forces stand out: wealth and politics.

First and most obvious, access to Twitter depends on electricity and telecommunications infrastructure and gadgets and literacy and health and time, all of which are much scarcer in poorer parts of the world than they are in richer places. The map below shows lights at night, as seen from space by U.S. satellites 20 years ago and then mapped by NASA (source). These light patterns are sometimes used as a proxy for economic development (e.g., here).


This view of the world helps explain some of the holes in our map of Twitter activity, but not all of them. For example, many of the densely populated parts of Africa don’t light up much at night, just as they don’t on Tweetping, because they lack the relevant infrastructure and power production. Even 20 years ago, though, India and China looked much brighter through this lens than they do on our Twitter usage map.

So what else is going on? The intensity and character of Twitter usage also depends on freedoms of information and speech—the ability and desire to access the platform and to speak openly on it—and this political layer keeps other areas in the dark in that Tweetping map. China, North Korea, Cuba, Ethiopia, Eritrea—if you’re trying to anticipate important political crises, these are all countries you would want to track closely, but Twitter is barely used or unavailable in all of them as a direct or indirect consequence of public policy. And, of course, there are also many places where Twitter is accessible and used but censorship distorts the content of the stream. For example, Saudi Arabia lights up pretty well on the Twitter-usage map, but it’s hard to imagine people speaking freely on it when a tweet can land you in prison.

Clearly, wealth and political constraints still strongly shape the view of the world we can get from new data sources like Twitter. Contrary to the heavily marketed myth of “comprehensive data,” poverty and repression continue to hide large swathes of the world from our digital sight, or to distort the glimpses we get of them.

Unfortunately for efforts to forecast rare political crises, those two structural features that so strongly shape the production and quality of data also correlate with the risks we want to anticipate. The map below shows the Early Warning Project’s most recent statistical assessments of the risk of onsets of state-led mass-killing episodes. Now flash back to the visualization of Twitter usage above, and you’ll see that many of the countries colored most brightly on this map are among the darkest on that one. Even in 2015, the places about which we most need more information to sharpen our forecasts of rare political crises are the ones that are still hardest to see.

Statistically, this is the second-worst of all possible worlds, the worst one being the total absence of information. Data are missing not at random, and the processes producing those gaps are the same ones that put places at greater risk of mass atrocities and other political calamities. This association means that models we estimate with those data will often be misleading. There are ways to mitigate these problems, but they aren’t necessarily simple, cheap, or effective, and that’s before we even start in on the challenges of extracting useful measures from something as heterogeneous and complex as the Twitter stream.
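To make the “missing not at random” point concrete, here is a toy simulation in Python. Everything in it is hypothetical: the uniform risk scores, the assumption that a case’s chance of being observed is one minus its risk, and the function name `simulate_mnar`. It is not the Early Warning Project’s model, just a sketch of how an average computed only from the visible cases understates risk when visibility and risk are linked.

```python
import random


def simulate_mnar(n: int = 10_000, seed: int = 42) -> tuple[float, float]:
    """Compare the true mean risk with the mean risk among observed cases.

    Hypothetical setup: each unit (think country-year) has a latent risk drawn
    uniformly from [0, 1], and the probability that it shows up in our data is
    1 - risk, so the riskiest places are the least visible (missing not at random).
    """
    rng = random.Random(seed)
    risks = [rng.random() for _ in range(n)]
    # Keep a unit only if it "shows up" in the data; visibility falls as risk rises.
    observed = [r for r in risks if rng.random() < 1.0 - r]
    true_mean = sum(risks) / len(risks)
    observed_mean = sum(observed) / len(observed)
    return true_mean, observed_mean


if __name__ == "__main__":
    true_mean, observed_mean = simulate_mnar()
    print(f"true mean risk:     {true_mean:.3f}")  # close to 1/2 in expectation
    print(f"observed mean risk: {observed_mean:.3f}")  # close to 1/3 in expectation
```

Because the riskiest cases are the least likely to be observed, the mean among observed cases lands near 1/3 rather than the true 1/2. Raising `n` doesn’t close that gap; the bias comes from which cases get observed, not from how many.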

So that’s what I see when I hear people suggest that social media or Google Trends or other forms of “digital exhaust” have mooted the data problems about which I so often complain. Lots of organizations are spending a lot of money trying to overcome these problems, but the political and economic topography producing them does not readily yield. The Internet is part of this complex adaptive system, not a space outside it, and its power to transform that system is neither as strong nor as fast-acting as many of us—especially in the richer and freer parts of the world—presume.

What are all these violent images doing to us?

Early this morning, I got up, made some coffee, sat down at my desk, and opened Twitter to read the news and pass some time before I had to leave for a conference. One of the first things I saw in my timeline was a still from a video of what was described in the tweet as an ISIS fighter executing a group of Syrian soldiers. The soldiers lay on their stomachs in the dirt, mostly undressed, hands on their heads. They were arranged in a tightly packed row, arms and legs sometimes overlapping. The apparent killer stood midway down the row, his gun pointed down, smoke coming from its barrel.

That experience led me to this pair of tweets:

tweet 1

tweet 2

If you don’t use Twitter, you probably don’t know that, starting in 2013, Twitter tweaked its software so that photos and other images embedded in tweets would automatically appear in users’ timelines. Before that change, you had to click on a link to open an embedded image. Now, if you follow someone who appends an image to his or her tweet, you instantly see the image when the tweet appears in your timeline. The system also includes a filter of sorts that’s supposed to inform you before showing media that may be sensitive, but it doesn’t seem to be very reliable at screening for violence, and it can be turned off.

As I said this morning, I think the automatic display of embedded images is great for sharing certain kinds of information, like data visualizations. Now, tweets can become charticles.

I am increasingly convinced, though, that this feature becomes deeply problematic when people choose to share disturbing images. After I tweeted my complaint, Werner de Pooter pointed out a recent study on the effects of frequent exposure to graphic depictions of violence on the psychological health of journalists. The study’s authors found that daily exposure to violent images was associated with higher scores on several indices of psychological distress and depression. The authors conclude:

Given that good journalism depends on healthy journalists, news organisations will need to look anew at what can be done to offset the risks inherent in viewing User Generated Content material [which includes graphic violence]. Our findings, in need of replication, suggest that reducing the frequency of exposure may be one way to go.

I mostly use Twitter to discover stories and ideas I don’t see in regular news outlets, to connect with colleagues, and to promote my own work. Because I study political violence and atrocities, a fair share of my feed deals with potentially disturbing material. Where that material used to arrive only as text, it increasingly includes photos and video clips of violent or brutal acts as well. I am starting to wonder how routine exposure to those images may be affecting my mental health. The study de Pooter pointed out has only strengthened that concern.

I also wonder if the emotional power of those images is distorting our collective sense of the state of the world. Psychologists talk about the availability heuristic, a cognitive shortcut in which the ease of recalling examples of certain things drives our expectations about the likelihood or risk of those things. As Daniel Kahneman describes on p. 138 of Thinking, Fast and Slow,

Unusual events (such as botulism) attract disproportionate attention and are consequently perceived as less unusual than they really are. The world in our heads is not a precise replica of reality; our expectations about the frequency of events are distorted by the prevalence and emotional intensity of the messages to which we are exposed.

When those images of brutal violence pop into our view, they grab our attention, pack a lot of emotional intensity, and are often hard to shake. The availability heuristic implies that frequent exposure to those images leads us to overestimate the threat or risk of things associated with them.

This process could even be playing some marginal role in a recent uptick in stories about how the world is coming undone. According to Twitter, its platform now has more than 270 million monthly active users. Many journalists and researchers covering world affairs are probably among those 270 million. I suspect that those journalists and researchers spend more time watching their timelines than the average user, and they are probably more likely to turn off that “sensitive content” warning, too.

Meanwhile, smartphones and easier Internet access make it increasingly likely that acts of violence will be recorded and then shared through those media, and Twitter’s default settings now make it more likely that we see them when they are. Presumably, some of the organizations perpetrating this violence—and, sometimes, ones trying to mobilize action to stop it—are aware of the effects these images can have and deliberately push them to us to try to elicit that response.

As a result, many writers and analysts are now seeing much more of this material than they used to, even just a year or two ago. Whatever the actual state of the world, this sudden increase in exposure to disturbing material could be convincing many of us that the world is scarier and therefore more dangerous than ever before.

This process could have larger consequences. For example, lately I’ve had trouble getting thoughts of James Foley’s killing out of my mind, even though I never watched the video of it. What about the journalists and policymakers and others who did see those images? How did that exposure affect them, and how much is that emotional response shaping the public conversation about the threat the Islamic State poses and how our governments should respond to it?

I’m not sure what to do about this problem. As an individual, I can choose to unfollow people who share these images or spend less time on Twitter, but both of those actions carry some professional costs as well. The thought of avoiding these images also makes me feel guilty, as if I am failing the people whose suffering they depict and the ones who could be next. By hiding from those images, do I become complicit in the wider violence and injustice they represent?

As an organization, Twitter could decide to revert to the old no-show default, but that almost certainly won’t happen. I suspect this isn’t an issue for the vast majority of users, and it’s hard to imagine any social-media platform retreating from visual content as sites like Instagram and Snapchat grow quickly. Twitter could also try to remove embedded images that contain potentially disturbing material. As a fan of unfettered speech, though, I don’t find that approach appealing, either, and the unreliability of the current warning system suggests it probably wouldn’t work so well anyway.

In light of all that uncertainty, I’ll conclude with an observation instead of a solution: this is one hell of a huge psychological experiment we’re running right now, and its consequences for our own mental health and how we perceive the world around us may be more substantial than we realize.

A Few Suggestions for Social Scientists New to Twitter

Earlier today, one scholar whose work I greatly admire asked another scholar whose work I greatly admire for advice on how to get started on Twitter. I liked Dan’s response, but I thought I’d take Christian’s query as an open invitation to share a few suggestions of my own. So:

Replace the egg with a picture of you. Seriously, don’t even start following people until you’ve done this. It’s not vain; it’s just letting people know that there’s (probably) a real human on the other end, and letting us know something about how you plan to present yourself in this context. Some people can get away with using cartoons or pictures of their pets or kids, but most of us can’t. So, unless you’re trying to make a very specific statement by doing something different, you probably shouldn’t try.

Decide why you’re using Twitter. If your main goal is to use Twitter as a news feed or to follow other people’s work, then it’s a really easy tool to use. Just poke around until you find people and organizations that routinely cover the issues that interest you, and follow them. If, however, your goal is to develop a professional audience, then you need to put more thought into what you tweet and retweet, and the rest of my suggestions might be useful.

Pick your niche(s). There are a lot of social scientists on Twitter, and many of them are picky about whom they follow. To make it worth people’s while to add you to their feed, pick one or a few of your research interests and focus almost all of your tweets and retweets on them. For example, I’ve tried to limit my tweets to the topics I blog about: democratization, coups, state collapse, forecasting, and a bit of international relations. When I was new to Twitter, I focused especially on democratization and forecasting because those weren’t topics other people were tweeting much about at the time. I think that differentiation made it easier for people to attach an identity to my avatar, and to understand what they would get by following me that they weren’t already getting from the 500 other accounts in their feeds.

Keep the tweet volume low, at least at the start. For a long time, I tried to limit myself to two or three tweets per Twitter session, usually once or twice per day. That made me think carefully about what I tweeted, (hopefully) keeping the quality higher and preventing me from swamping people’s feeds, a big turnoff for many.

Don’t just share the news; augment it. If you’re tweeting a news story or journal article or something, use a short quote or comment that crystallizes the story or tells us something about why you think it’s worth reading. In other words, try to add value. I usually lead with the title, then insert the link, then hang the quote or comment at the end, like this:

But, of course, there are lots of ways to do this. You can also drop the title entirely, like this recent one from Joshua Kucera that got me laughing:

Keep it professional. If you’re thinking of Twitter as an extension of your work, don’t tweet about personal stuff. This is especially important when you’re new to the medium. The occasional reference to your life outside the office can help people feel more connected to you, but please err on the side of reticence. I have chosen not to follow, or have unfollowed, many people because the interesting stuff in their feed was overwhelmed by the personal and trivial (and sometimes just downright gross). At some point, all that jetsam gets in the way of the information I’m actually looking for, so I choose to cut it off.

Related to the previous suggestion, be polite. In theory, this should go without saying, but, hey, this is the Internet. If you’re using Twitter for professional purposes, I think it makes sense to use the same language and demeanor you’d use in the office or at a professional conference. That can include humor and the occasional personal tidbit you’d share in a hallway conversation, but probably not the bar talk, and definitely not the post-conference conversations with your confidantes. It most definitely does not include nastiness or pettiness.

Be generous. Don’t retweet something under your own handle just to troll for RTs. If you want to share something someone else already shared, just pass along his or her tweet. The exception to this rule is when you’re going to add your own comment. Then just be sure to acknowledge the source with a via or h/t (hat tip). If a bunch of people have already shared something and you’re not sure whom to credit, the answer is: don’t share it again.

If you modify someone’s tweet at all before passing it along, use MT. This is a Twitter pet peeve of mine. RT (retweet) should only be used when what follows is a verbatim replication of the original. If you change anything—abbreviate, drop a comma, whatever—use MT (modified tweet) instead.

Finally, know that it’s addictive. I don’t mean fun-and-time-consuming addictive; I mean addictive addictive, like nicotine and booze. Before you dive in, it’s worth considering how that addiction might negatively affect your life and how you plan to deal with it. Just because lots of people do it doesn’t mean it’s good for you. The time you spend on Twitter is time you could have spent doing something else. If that something else is more important and you’re prone to addiction, be careful.

Democratization Resources on Twitter

After lamenting the scarcity of democratization resources on Twitter in my last post, I’m realizing I might do some good by calling out some of the excellent people and organizations who are already there. What follows is woefully incomplete and US-centric, I’m sure, but it’s what I’ve got right now. If you think someone or something is missing, please let me know in the Comments or via Twitter, and I’ll take a look. Users are listed in alphabetical order by Twitter handle. (If you’re new to Twitter and looking for these users, bear in mind that capitalization doesn’t matter, but punctuation does. You can also find all of these accounts under a list I’ve created called democratization-resources.)

aceproject_org. From the ACE Electoral Knowledge Network, a useful source for information about technical aspects of upcoming elections.

africanelection. A terrific feed of election-related news stories selected by the African Elections project, mostly from local media.

davidjandura. A graduate student at Georgetown University who keeps a sharp eye on elections and party systems in the Middle East at his blog, Ahwa Talk.

Dem_Journal. A feed curated by Demokratizatsiya, a journal focused on political transformation in the Soviet successor states.

demdigest. Tweets announcing new entries to the always-interesting Democracy Digest blog, which is produced by the National Endowment for Democracy.

DemocracyTweetz. An activist feed from the World Movement for Democracy, a global democracy-promotion network backed by the National Endowment for Democracy.

electionguide. A resource for tracking election dates, brought to you by the International Foundation for Electoral Systems (IFES), one of the premier international bodies for election management support.

EricaChenoweth. Erica is an assistant professor at Wesleyan University who does terrific research on civil resistance movements, which often play a role in democratic transitions. She also blogs at Rational Insurgent.

FreedomHouseDC. A feed of stories related to civil and political rights, from Freedom House, of course.

IFES1987. A newsier feed from the aforementioned IFES, with election-related stories from around the world.

IRIglobal. The International Republican Institute’s Twitter face, a nice feed of news on human rights, transitions, and consolidation.

kenroth. Kenneth Roth, executive director of Human Rights Watch, tweeting his organization’s points of concern.

marquezxavier. A political science lecturer at Victoria University of Wellington who blogs deep thoughts at Abandoned Footnotes.

NDI. Democracy-related news stories from all over the world, selected by staff at the US’s National Democratic Institute.

OpenSociety. Occasional and wide-ranging feed on human rights, from the Open Society Foundations founded by George Soros.

StanfordCDDRL. Occasional items from Stanford’s Center for Democracy, Development, and Rule of Law, a kind of interdisciplinary think tank within the university.

votesafe. A bursty but rich feed of election- and democracy-related news stories from Megan Reif, a graduate student at the University of Michigan.

Why Aren’t There More Democratization Scholars on Twitter?

I’ve been active on Twitter for a few months now, and I am still struggling to connect there with other scholars who study authoritarian politics, democratization, and democratic breakdown. There are plenty of people who self-identify as students or experts on national security, counter-terrorism, counterinsurgency, economic development, aid, and international relations theory. There is also a nice collection of non-profit organizations and individual activists who are engaged in democracy promotion, an endeavor that’s related to, but clearly distinct from, scholarship on how, when, and why processes of regime change occur.

So where is everybody? As a member of the American Political Science Association’s Comparative Democratization section, I know those scholars are out there. They just don’t seem to be tweeting. That’s a shame, because Twitter is a great way to share and vet ideas with, and learn from, scads of people you’ll never encounter in your corporeal life. In the past few months, Twitter has helped me, among other things: follow the twists and turns of Egypt’s transition plans; learn about shady license sales linked to upcoming elections in the Democratic Republic of Congo; gauge the resilience of pro-democracy protests in Morocco; and learn more about the roots of social unrest in China. It has also given me a nice venue to share and discuss my own ideas about things like the effectiveness of U.S. democracy promotion projects, the prospects for new democracies in the Arab world, and the nature of the democratization process.

Maybe I’m just looking under the wrong rocks, and there’s a host of democratization scholars on Twitter or in the wider world of long-form bloggers with whom I’ve simply failed to connect. If I’m not missing something, then maybe I ought to be selfishly glad; after all, scarcity helps drive interest toward those of us who are active in this medium.

Really, though, I can’t help but think there’s a big, fat missed opportunity here, both for the scholars who aren’t participating in the conversation and for the other communities of interest who might want to converse with them. This gap is especially glaring amid the flurry of regime collapses, revolutions, and, hopefully, democratic transitions we’re seeing in 2011. At times like this, the academic publishing cycle seems painfully slow, rendering work that attempts to respond to these kinds of developments inaccessible right when it’s most relevant.

So, students and scholars of democratization, consider this an open invitation: come join the conversation! And if you do, please give me a shout at @jay_ulfelder.
