Farewell

I’m shutting down the blog. Well, I won’t be publishing new posts, anyway; the archived material will remain online, in case anyone finds it useful.

I started Dart-Throwing Chimp in the spring of 2011, not long after I made the jump to freelancing, with a few goals in mind. I wanted to make myself more visible and appealing to potential clients. I wanted to practice and improve as a writer, a researcher, and a coder. And I wanted to participate in interesting conversations with colleagues and the wider world.

The blog succeeded on all of those counts. In so doing, though, it also became a larger and larger job of its own. Some of the more involved posts, like my annual coup forecasts, required several days of work. Even the shorter ones often took hours to write, and there were stretches when I was writing three or four of those each week.

I don’t get paid for any of that time. For a while, that strategy made sense to me. It doesn’t anymore. The personal and professional opportunity costs have come to outweigh the benefits.

I’m shuttering the blog, but I continue to look for new work as a writer and a data scientist, for lack of a better term. If you think I might be useful to you or your organization—as a freelancer, or maybe as something else—please let me know (ulfelder at gmail dot com; CV here).

Meanwhile, thanks for reading, and I hope I’ll see you around.

Be Vewy, Vewy Quiet

This blog has gone relatively quiet of late, and it will probably stay that way for a while. That’s partly a function of my personal life, but it also reflects a conscious decision to spend more time improving my abilities as a programmer.

I want to get better at scraping, making, munging, summarizing, visualizing, and analyzing data. So, instead of contemplating world affairs, I’ve been starting to learn Python; using questions on Stack Overflow as practice problems for R; writing scripts that force me to expand my programming skills; and building Shiny apps that put those skills to work. Here’s a screenshot of one app I’ve made—yes, it actually works—that interactively visualizes ACLED’s latest data on violence against civilians in Africa, based partly on this script for scraping ACLED’s website:

[Screenshot: interactive ACLED visualizer app, July 2015]
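As a taste of what that practice looks like, here is a minimal sketch in Python of the split-apply-combine munging described above, run on a tiny invented table of event records. The column names are illustrative stand-ins, not ACLED’s actual schema:

```python
# Summarize a tiny, made-up table of event records by country and year,
# the kind of munging exercise described above.
import pandas as pd

events = pd.DataFrame({
    "event_date": ["2015-01-10", "2015-02-03", "2015-02-20", "2014-12-01"],
    "country": ["Nigeria", "Nigeria", "Somalia", "Somalia"],
    "fatalities": [12, 3, 7, 0],
})
events["event_date"] = pd.to_datetime(events["event_date"])
events["year"] = events["event_date"].dt.year

# Count events and sum fatalities per country-year.
summary = (events.groupby(["country", "year"], as_index=False)
           .agg(events=("event_date", "count"),
                deaths=("fatalities", "sum")))
print(summary)
```

The same pattern scales from this toy table to a full country-month panel; only the grouping keys and the source file change.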

When I started on this kick, I didn’t plan to stop writing blog posts about international affairs. As I’ve gotten into it, though, I’ve found that my curiosity about current events has ebbed, and the pilot light for my writing brain has gone out. Normally, writing ideas flare up throughout the day, but especially in the early morning. Lately, I wake up thinking about the coding problems I’m stuck on.

I think it’s a matter of attention, not interest. Programming depends on the tiniest details. All those details quickly clog the brain’s RAM, leaving no room for the unconscious associations that form the kernels of new prose. That clogging happens even faster when other parts of your life are busy, stressful, or off kilter, as they are for many of us, and as they are for me right now.

That’s what I think, anyway. Whatever the cause, though, I know that I’m rarely feeling the impulse to write, and I know that shift has sharply slowed the pace of publishing here. I’m leaving the channel open and hope I can find the mental and temporal space to keep using it, but who knows what tomorrow may bring?

Data Science Takes Work, Too

Yesterday, I got an email from the editor of an online publication inviting me to contribute pieces that would bring statistical analysis to bear on some topics they are hoping to cover. I admire the publication, and the topics interest me.

There was only one problem: the money. The honorarium they could offer for a published piece is less than my hourly consulting rate, and all of the suggested projects—as well as most others I can imagine that would fit into this outlet’s mission—would probably take days to do. I would have to find, assemble, and clean the relevant data; explore and then analyze the fruits of that labor; generate and refine visualizations of those results; and, finally, write approximately 1,000 words about it. Extrapolating from past experience, I suspect that if I took on one of these projects, I would be working for less than minimum wage. And, of course, that estimated wage doesn’t account for the opportunity costs of foregoing other work (or leisure) I might have done during that time.

I don’t mean to cast aspersions on this editor. The publication is attached to a non-profit endeavor, so the fact that they were offering any payment at all already puts them well ahead of most peers. I’m also guessing that many of this outlet’s writers have salaried “day” jobs to which their contributions are relevant, so the honorarium is more of a bonus than a wage. And, of course, I spend hours of unpaid time writing posts for this blog, a pattern that some people might reasonably interpret as a signal of how much (or little) I think my time is worth.

Still, I wonder if part of the issue here is that this editor just had no idea how much work those projects would entail. A few days ago, Jeff Leek ran an excellent post on the Simply Statistics blog, about how “data science done well looks easy—and that is a big problem for data scientists.” As Leek points out,

Most well executed and successful data science projects don’t (a) use super complicated tools or (b) fit super complicated statistical models. The characteristics of the most successful data science projects I’ve evaluated or been a part of are: (a) a laser focus on solving the scientific problem, (b) careful and thoughtful consideration of whether the data is the right data and whether there are any lurking confounders or biases and (c) relatively simple statistical models applied and interpreted skeptically.

It turns out doing those three things is actually surprisingly hard and very, very time consuming. It is my experience that data science projects take a solid 2-3 times as long to complete as a project in theoretical statistics. The reason is that inevitably the data are a mess and you have to clean them up, then you find out the data aren’t quite what you wanted to answer the question, so you go find a new data set and clean it up, etc. After a ton of work like that, you have a nice set of data to which you fit simple statistical models and then it looks super easy to someone who either doesn’t know about the data collection and cleaning process or doesn’t care.

All I can say to all of that is: YES. On topics I’ve worked on for years, I realize some economies of scale by knowing where to look for data, knowing what those data look like, and having ready-made scripts that ingest, clean, and combine them. Even on those topics, though, updates sometimes break the scripts, sources come and go, and the choice of model or methods isn’t always obvious. Meanwhile, on new topics, the process invariably takes many hours, and it often ends in failure or frustration because the requisite data don’t exist, or you discover that they can’t be trusted.
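One modest defense against that kind of breakage is to check a freshly downloaded file against the schema the rest of the pipeline expects, so that an upstream change fails loudly instead of silently corrupting results. A sketch of that idea, with invented column names standing in for whatever a real source provides:

```python
# Validate the header of an ingested CSV before anything downstream
# touches it. The expected columns here are illustrative, not any
# real source's actual schema.
import csv
import io

EXPECTED_COLUMNS = {"event_date", "country", "fatalities"}

def validate_header(csv_text):
    """Return the header row if it contains every expected column, else raise."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    missing = EXPECTED_COLUMNS - set(header)
    if missing:
        raise ValueError(f"source schema changed; missing columns: {sorted(missing)}")
    return header

# A file matching expectations passes quietly...
good = "event_date,country,fatalities\n2015-01-10,Nigeria,12\n"
validate_header(good)

# ...while a silently renamed column would raise before polluting any analysis.
bad = "date,country,deaths\n2015-01-10,Nigeria,12\n"
```

Checks like this don’t prevent a source from changing, but they turn a mysterious downstream error into an immediate, legible one.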

The visualization part alone can take a lot of time if you’re finicky about it—and you should be finicky about it, because your charts are what most people are going to see, learn from, and remember. Again, though, I think most people who don’t do this work simply have no idea.

Last year, as part of a paid project, I spent the better part of a day tinkering with an R script to ingest and meld a bunch of time series and then generate a single chart that would compare those time series. When I finally got the chart where I wanted it, I showed the results to someone else working on that project. He liked the chart and immediately proposed some other variations we might try. When I responded by pointing out that each of those variations might take an hour or two to produce, he was surprised and admitted that he thought the chart had come from a canned routine.

We laughed about it at the time, but I think that moment perfectly illustrates the disconnect that Leek describes. What took me hours of iterative code-writing and drew on years of accumulated domain expertise and work experience looked to someone else like nothing more than the result of a few minutes of menu-selecting and button-clicking. When that’s what people think you do, it’s hard to get them to agree to pay you well for what you actually do.
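For what it’s worth, even the “meld” step alone shows why a chart like that can’t come from a canned routine: each series arrives with its own date coverage, and they have to be aligned and reshaped before anything can be drawn. A sketch of that step in Python (the original script was in R), using invented series:

```python
# Align two time series with different date coverage into one long table,
# the shape most plotting libraries want for a multi-series comparison.
# The series names and values are invented for illustration.
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
              name="series_a")
b = pd.Series([5.0, 4.0],
              index=pd.to_datetime(["2014-02-01", "2014-03-01"]),
              name="series_b")

# Outer-join on date so neither series loses observations, then reshape
# to long format: one row per (date, series, value).
wide = pd.concat([a, b], axis=1)
wide.index.name = "date"
long = (wide.reset_index()
        .melt(id_vars="date", var_name="series", value_name="value")
        .dropna())
# `long` is now ready to feed a plotting call that draws one line per series.
```

Multiply this by a dozen series with mismatched frequencies and missing stretches, and the hours add up quickly.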

Some Thoughts on “Alternative Academic Careers”

I’m headed to New Orleans next week for the annual convention of the International Studies Association (ISA), and while there I’m scheduled to speak on a panel on “alternative academic careers” (Friday at 1:45 PM). To help organize my views on the subject and to share them with people who are curious but can’t attend the panel, I thought I would turn them into a blog post. So:

Let me start with some caveats. I am a white, male U.S. citizen who grew up in a family that wasn’t poor and who married a woman while in grad school. I would prefer to deal in statistics, but on this topic, I can only really speak from experience, and that experience has been conditioned by those personal characteristics. In other words, I know my view is narrow and biased, but on this topic, that view is all I’ve got, so take it for whatever you think it’s worth.

My own career trajectory has been, I think, unusual. I got my Ph.D. from Stanford in the spring of 1997. At the time, I had no academic job offers; I had a spouse who wanted to go to art school; we had two dogs and couldn’t find a place we could afford that would rent to us in the SF Bay Area (this was the leading edge of the dot-com boom); and we both had family in the DC metro area. So, we packed up and moved in with my mother-in-law in Baltimore, and I started looking for work in and around Washington.

It took me a few months to land a job as an analyst in a little branch of a big forensic accounting firm, basically writing short pieces on political risk for a newsletter that went out to the firm’s corporate clients. We moved to the DC suburbs and I did that for about a year until I got a job with a small government contractor that did research projects for the U.S. “intelligence community” and the Department of Defense. After a couple of years of that and the birth of our first son, I decided I needed a change, so I took a job writing book-length research reports on telecom firms and industry segments for a trade-news outfit. At the time, I was also doing some freelance feature writing on whatever I could successfully pitch, and I thought a full-time writing job would help me move faster in that direction.

After a couple of years of that and no serious traction on the writing front, I got a line through one of my dissertation committee members on a part-time consulting thing with a big government contractor, SAIC. I was offered and took that job, which soon evolved into a full-time, salaried position as research director for the Political Instability Task Force. I did that for 10 years, then left it at the end of 2011 to try freelancing as a social scientist. Most of my work time since then has been devoted to the Early Warning Project and the Good Judgment Project, with dribs and drabs on other things and enough free time to write this blog.

So that’s where I’m coming from. Now, here is what I think I’ve learned from those experiences, and from watching and talking to others with similar training who do related things.

First, freelancing—what I’m doing now, and what usually gets fancified as “independent consulting”—is not a realistic option for most social scientists early in their careers, and probably not for most people, period. I would not have landed either of the large assignments I’ve gotten in the past few years without the professional network and reputation I had accumulated over the previous ten. This blog and my activity on social media have helped me expand that network, but nearly all of the paid jobs I’ve done in the past few years have come through those earlier connections. Best I can tell, there are no realistic shortcuts to this kind of role.

Think tanks aren’t really an option for new Ph.D.s, either. Most of the jobs in that world are for recent undergraduates or MAs on the one hand and established scholars and practitioners on the other. There are some opportunities for new Ph.D.s looking to do straight-up analysis at places like RAND and the Congressional Research Service, but the niche is tiny, and those jobs will be very hard to land.

That brings me to the segment I know best, namely, government contracting. Budget cuts mean that jobs in that market are scarcer than they were a few years ago, but there are still lots of them. In that world, though, hiring is often linked to specific roles in contracts that have already been awarded. You’re not paid to pursue your own interests or write policy briefs; you’re usually hired to do tasks X and Y on contract Z, with the expectation that you’ll also be able to apply those skills to other, similar contracts after Z runs out. So, to land these jobs, you need to have abilities that can plug into clearly-defined contract tasks, but that also make you a reasonably good risk overall. Nowadays, programming and statistical (“data science”) skills are in demand, but so are others, including fluency in foreign languages and area expertise. If you want to get a feel for what’s valued in that world, spend some time looking at job listings from the big contracting firms—places like Booz Allen Hamilton, Lockheed Martin, SAIC, and Leidos—and see what they’re asking for.

If you do that, you’ll quickly discover that having a security clearance is a huge plus. If you think you want to do government-contract work, you can give yourself a leg up by finding some way to get that clearance while you’re still in graduate school. If you can’t do that, you might consider looking for work that isn’t your first choice on substance but will lead to a clearance in hopes of making an upward or lateral move later on.

All of that said, it’s important to think carefully about the down sides of a security clearance before you pursue one. A clearance opens some doors, but it closes others, some permanently. The process can take a long time, and it can be uncomfortable. When you get a clearance, you accept some constraints on your speech, and those never entirely fall away, even if you leave that world. The fact that you currently or once held a clearance and worked with agencies that required one will also make a certain impression on some people, and that never fully goes away, either. So it’s worth thinking carefully about those long-term implications before you jump in.

If you do go to work on either side of that fence—for a contractor, or directly for the government—you need to be prepared to work on topics on which you’re not already an expert. It’s often your analytic skills they’re after, not your topical expertise, so you should be prepared to stretch that way. Maybe you’ll occasionally or eventually work back to your original research interests, maybe you’ll discover new ones, or maybe you’ll get bored or frustrated and restart your job search. In all cases, the process will go better if you’re prepared for any of the above.

Last but not least, if you’re not going into academia, I think it can help to know that your first job after grad school does not make or break your career. I don’t have data to support this claim, but my own experience and my observation of others tells me that nonacademic careers are not nearly as path dependent as academic ones. That can be scarier in some ways, but it’s also potentially liberating. Within the nontrivial constraints of the market, you can keep reinventing yourself, and I happen to think that’s great.

The Siren Song of Certainty

Statistician William Briggs ran a great post today under the headline “Uncertainty Is an Impossible Sell” (h/t Danilo Freire on Facebook). Read the whole thing, but here’s the money quote:

If you want to set up business as a data scientist (the newfangled term by which statisticians are beginning to call themselves), the lesson is this: promise the moon and charge like you’re actually going there. Failure is rarely punished and never remembered.

Sad but true. Sad because the siren song of certainty tempts us into wasteful spending and poorly informed decision-making and makes it tougher for honest brokers to compete in the marketplace of paid work and ideas.

Here’s Daniel Kahneman on the latter point in Thinking, Fast and Slow (pp. 262–263):

Optimism is highly valued, socially and in the market; people and firms reward the providers of dangerously misleading information more than they reward truth tellers…

Experts who acknowledge the full extent of their ignorance may expect to be replaced by more confident competitors, who are better able to gain the trust of clients. An unbiased appreciation of uncertainty is a cornerstone of rationality—but it is not what people and organizations want. Extreme uncertainty is paralyzing under dangerous circumstances, and the admission that one is merely guessing is especially unacceptable when the stakes are high. Acting on pretended knowledge is often the preferred solution.

Remember, the con in con artist is short for confidence. The more excited or flattered or assured a forecaster or other expert makes you feel, the more skeptical you should probably be.

What are all these violent images doing to us?

Early this morning, I got up, made some coffee, sat down at my desk, and opened Twitter to read the news and pass some time before I had to leave for a conference. One of the first things I saw in my timeline was a still from a video of what was described in the tweet as an ISIS fighter executing a group of Syrian soldiers. The soldiers lay on their stomachs in the dirt, mostly undressed, hands on their heads. They were arranged in a tightly packed row, arms and legs sometimes overlapping. The apparent killer stood midway down the row, his gun pointed down, smoke coming from its barrel.

That experience led me to this pair of tweets:

[Tweet 1]

[Tweet 2]

If you don’t use Twitter, you probably don’t know that, starting in 2013, Twitter tweaked its software so that photos and other images embedded in tweets would automatically appear in users’ timelines. Before that change, you had to click on a link to open an embedded image. Now, if you follow someone who appends an image to his or her tweet, you instantly see the image when the tweet appears in your timeline. The system also includes a filter of sorts that’s supposed to inform you before showing media that may be sensitive, but it doesn’t seem to be very reliable at screening for violence, and it can be turned off.

As I said this morning, I think the automatic display of embedded images is great for sharing certain kinds of information, like data visualizations. Now, tweets can become charticles.

I am increasingly convinced, though, that this feature becomes deeply problematic when people choose to share disturbing images. After I tweeted my complaint, Werner de Pooter pointed out a recent study on the effects of frequent exposure to graphic depictions of violence on the psychological health of journalists. The study’s authors found that daily exposure to violent images was associated with higher scores on several indices of psychological distress and depression. The authors conclude:

Given that good journalism depends on healthy journalists, news organisations will need to look anew at what can be done to offset the risks inherent in viewing User Generated Content material [which includes graphic violence]. Our findings, in need of replication, suggest that reducing the frequency of exposure may be one way to go.

I mostly use Twitter to discover stories and ideas I don’t see in regular news outlets, to connect with colleagues, and to promote my own work. Because I study political violence and atrocities, a fair share of my feed deals with potentially disturbing material. Where that material used to arrive only as text, it increasingly includes photos and video clips of violent or brutal acts as well. I am starting to wonder how routine exposure to those images may be affecting my mental health. The study de Pooter pointed out has only strengthened that concern.

I also wonder if the emotional power of those images is distorting our collective sense of the state of the world. Psychologists talk about the availability heuristic, a cognitive shortcut in which the ease of recalling examples of certain things drives our expectations about the likelihood or risk of those things. As Daniel Kahneman describes on p. 138 of Thinking, Fast and Slow,

Unusual events (such as botulism) attract disproportionate attention and are consequently perceived as less unusual than they really are. The world in our heads is not a precise replica of reality; our expectations about the frequency of events are distorted by the prevalence and emotional intensity of the messages to which we are exposed.

When those images of brutal violence pop into our view, they grab our attention, pack a lot of emotional intensity, and are often hard to shake. The availability heuristic implies that frequent exposure to those images leads us to overestimate the threat or risk of things associated with them.

This process could even be playing some marginal role in a recent uptick in stories about how the world is coming undone. According to Twitter, its platform now has more than 270 million monthly active users. Many journalists and researchers covering world affairs probably fall in that 270 million. I suspect that those journalists and researchers spend more time watching their timelines than the average user, and they are probably more likely to turn off that “sensitive content” warning, too.

Meanwhile, smartphones and easier Internet access make it increasingly likely that acts of violence will be recorded and then shared through those media, and Twitter’s default settings now make it more likely that we see them when they are. Presumably, some of the organizations perpetrating this violence—and, sometimes, ones trying to mobilize action to stop it—are aware of the effects these images can have and deliberately push them to us to try to elicit that response.

As a result, many writers and analysts are now seeing much more of this material than they used to, even just a year or two ago. Whatever the actual state of the world, this sudden increase in exposure to disturbing material could be convincing many of us that the world is scarier and therefore more dangerous than ever before.

This process could have larger consequences. For example, lately I’ve had trouble getting thoughts of James Foley’s killing out of my mind, even though I never watched the video of it. What about the journalists and policymakers and others who did see those images? How did that exposure affect them, and how much is that emotional response shaping the public conversation about the threat the Islamic State poses and how our governments should respond to it?

I’m not sure what to do about this problem. As an individual, I can choose to unfollow people who share these images or spend less time on Twitter, but both of those actions carry some professional costs as well. The thought of avoiding these images also makes me feel guilty, as if I am failing the people whose suffering they depict and the ones who could be next. By hiding from those images, do I become complicit in the wider violence and injustice they represent?

As an organization, Twitter could decide to revert to the old no-show default, but that almost certainly won’t happen. I suspect this isn’t an issue for the vast majority of users, and it’s hard to imagine any social-media platform retreating from visual content as sites like Instagram and Snapchat grow quickly. Twitter could also try to remove embedded images that contain potentially disturbing material. As a fan of unfettered speech, though, I don’t find that approach appealing, either, and the unreliability of the current warning system suggests it probably wouldn’t work so well anyway.

In light of all that uncertainty, I’ll conclude with an observation instead of a solution: this is one hell of a huge psychological experiment we’re running right now, and its consequences for our own mental health and how we perceive the world around us may be more substantial than we realize.

Retooling

Over the next year, I plan to learn how to write code to do text mining.

I’m saying this out loud for two reasons. The first is self-centered; I see a public statement about my plans as a commitment device. By saying publicly that I plan to do this thing, I invest some of my credibility in following through, and my credibility is personally and professionally valuable to me.

I’m also saying this out loud, though, because I believe that the thinking behind this decision might interest other people working in my field. There are plenty of things I don’t know how to do that would be useful in my work on understanding and forecasting various forms of political instability. Three others that spring to mind are Bayesian data analysis, network theory, and agent-based modeling.

I’m choosing to focus on text mining instead of something else because I think that the single most significant obstacle to better empirical analysis in the social sciences is the scarcity of data, and I think that text mining is the most promising way out of this desert.

The volume of written and recorded text we produce on topics of interest to social scientists is incomprehensibly vast. Advances in computing technology and the growth of the World Wide Web have finally made it possible to access and analyze those texts—contemporary and historical—on a large scale with efficiency. This situation is still new, however, so most of this potential remains unrealized. There is a lot of unexplored territory on the other side of this frontier, and that territory is still growing faster than our ability to map it.

Lots of other people in political science and sociology are already doing text mining, and many of them are probably doing it better than I ever will.  One option would be to wait for their data sets to arrive and then work with them.

My own restlessness discourages me from following that strategy, but there’s also a principled reason not just to take what’s given: we do better analysis when we deeply understand where our data come from. The data sets you know the best are the ones you make. The data sets you know second-best are the ones someone else made with a process or instruments you’ve also used and understand. Either way, it behooves me to learn what these instruments are and how to apply them.

Instead of learning text mining, I could invest my time in learning other modeling and machine-learning techniques to analyze available data. My modeling repertoire is pretty narrow, and the array of options is only growing, so there’s plenty of room for improvement on that front, too.

In my experience, though, more complex models rarely add much to the inferential or predictive power we get from applying relatively simple models to the right data. This may not be true in every field, but it tends to be true in work on political stability and change, where the phenomena are so complex and still so poorly understood. On these topics, the best we can usually do is to find gross patterns that recur among data representing theoretically coherent processes or concepts.

Relatively simple models usually suffice to discover those gross patterns. What’s harder to come by are the requisite data. I think text mining is the most promising way to make them, so I am now going to learn how to do it.
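As a marker of where that learning will start, here is roughly the smallest possible text-mining exercise, a term-frequency count over a two-document toy corpus, using only Python’s standard library:

```python
# Count term frequencies across a toy corpus: the "hello world" of text
# mining. Real projects would add smarter tokenization, stopword lists,
# stemming, and vastly larger corpora, but the core operation looks like this.
import re
from collections import Counter

corpus = [
    "The protest turned violent after police arrived.",
    "Police dispersed the protest without violence.",
]

def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

counts = Counter()
for doc in corpus:
    counts.update(tokenize(doc))

print(counts.most_common(3))
```

From counts like these, it is a short conceptual step to document-term matrices, and from there to the measures of theoretically interesting concepts that motivate the whole endeavor.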
