Data Science Takes Work, Too

Yesterday, I got an email from the editor of an online publication inviting me to contribute pieces that would bring statistical analysis to bear on some topics they are hoping to cover. I admire the publication, and the topics interest me.

There was only one problem: the money. The honorarium they could offer for a published piece is less than my hourly consulting rate, and all of the suggested projects (as well as most others I can imagine that would fit into this outlet’s mission) would probably take days to do. I would have to find, assemble, and clean the relevant data; explore and then analyze the fruits of that labor; generate and refine visualizations of those results; and, finally, write approximately 1,000 words about it. Extrapolating from past experience, I suspect that if I took on one of these projects, I would be working for less than minimum wage. And, of course, that estimated wage doesn’t account for the opportunity costs of forgoing other work (or leisure) I might have done during that time.

I don’t mean to cast aspersions on this editor. The publication is attached to a non-profit endeavor, so the fact that they were offering any payment at all already puts them well ahead of most peers. I’m also guessing that many of this outlet’s writers have salaried “day” jobs to which their contributions are relevant, so the honorarium is more of a bonus than a wage. And, of course, I spend hours of unpaid time writing posts for this blog, a pattern that some people might reasonably interpret as a signal of how much (or little) I think my time is worth.

Still, I wonder if part of the issue here is that this editor just had no idea how much work those projects would entail. A few days ago, Jeff Leek ran an excellent post on the Simply Statistics blog, about how “data science done well looks easy—and that is a big problem for data scientists.” As Leek points out,

Most well executed and successful data science projects don’t (a) use super complicated tools or (b) fit super complicated statistical models. The characteristics of the most successful data science projects I’ve evaluated or been a part of are: (a) a laser focus on solving the scientific problem, (b) careful and thoughtful consideration of whether the data is the right data and whether there are any lurking confounders or biases and (c) relatively simple statistical models applied and interpreted skeptically.

It turns out doing those three things is actually surprisingly hard and very, very time consuming. It is my experience that data science projects take a solid 2-3 times as long to complete as a project in theoretical statistics. The reason is that inevitably the data are a mess and you have to clean them up, then you find out the data aren’t quite what you wanted to answer the question, so you go find a new data set and clean it up, etc. After a ton of work like that, you have a nice set of data to which you fit simple statistical models and then it looks super easy to someone who either doesn’t know about the data collection and cleaning process or doesn’t care.

All I can say to all of that is: YES. On topics I’ve worked on for years, I realize some economies of scale by knowing where to look for data, knowing what those data look like, and having ready-made scripts that ingest, clean, and combine them. Even on those topics, though, updates sometimes break the scripts, sources come and go, and the choice of model or methods isn’t always obvious. Meanwhile, on new topics, the process invariably takes many hours, and it often ends in failure or frustration because the requisite data don’t exist, or you discover that they can’t be trusted.

The visualization part alone can take a lot of time if you’re finicky about it—and you should be finicky about it, because your charts are what most people are going to see, learn from, and remember. Again, though, I think most people who don’t do this work simply have no idea.

Last year, as part of a paid project, I spent the better part of a day tinkering with an R script to ingest and meld a bunch of time series and then generate a single chart that would compare those time series. When I finally got the chart where I wanted it, I showed the results to someone else working on that project. He liked the chart and immediately proposed some other variations we might try. When I responded by pointing out that each of those variations might take an hour or two to produce, he was surprised and admitted that he thought the chart had come from a canned routine.

We laughed about it at the time, but I think that moment perfectly illustrates the disconnect that Leek describes. What took me hours of iterative code-writing and drew on years of accumulated domain expertise and work experience looked to someone else like nothing more than the result of a few minutes of menu-selecting and button-clicking. When that’s what people think you do, it’s hard to get them to agree to pay you well for what you actually do.


6 Comments

  1. Brian

     /  March 19, 2015

    Jay. I agree. What’s your hourly marginal cost? I’m sure you could work in parallel? Thus your MC=MR and you’d be ahead. Add in the intangible factors of supporting this magazine and enhancing your brand.

    All consultants leverage their data. So there is an increasing return on data science for the next project.

    Your costs are a reflection of learning, training and gaining expertise. Doing your analysis in R when maybe it could be done cheaper in Excel .xlsx appears to be a cost improvement. My reading of your blogs is that you learned R programming as a challenge, learning experience, accomplishment to present at a conference and mastering a leading edge technique.

    Is the marginal revenue on this blog > MR of magazine article?

    Love your blog.

  2. Jay, you make an important point that goes beyond this particular situation.

    A great many people don’t understand what goes into actually doing things that they are opining on. In my area of interest, this would include those who believe that terrorists can easily build or obtain a nuclear weapon and those who believe that Moscow is capable of militarily capturing and holding all of Ukraine.

    Doing things in the real world requires certain baseline capabilities and then applying them, which takes time. If those capabilities or the time isn’t there, the event based on them isn’t going to happen. But if one ignores that data have to be collected, sorted, and put through an analysis process, or that certain military capabilities will have to come from somewhere and be applied against a resistant population, or doesn’t realize what actions in the real world require, well, it’s easy enough to write them.

  3. Michael

     /  March 19, 2015

    Thanks for this and your ongoing blog. I agree with Cheryl that the problem you cite applies to work beyond the particular type you describe. It seems to be a general tendency to substantially underestimate the time, skill and costs required to do any kind of work, simply because one is not familiar with that work. Once a work product is obtained, there is rarely full appreciation of what effort may have gone into it. Probably applies even to yard work. With regard to analytical/research work in particular, the procurers also frequently presume that researchers have a regular income such as from a university, so they don’t need to be paid much, or anything, for a piece.
    What Is To Be Done? Free-Lance Contractors of the World Unite!

  4. I must admit that initially it does sound like it’s easy. But I’m no fool, nor am I a stats wiz. After reading the amount of work that goes into what you do, I thought for a minute. The conclusion I arrived at is that I think the assumption of relative ease comes in many disciplines across the board, especially at the hand of those who have no idea of what it takes to complete the process. But I appreciated this…

