In Praise of Slow Number-Crunching

At 11:30 AM on a recent morning, I started my computer chewing on a statistical model that included a half-dozen measures observed annually in countries worldwide over a few decades. With the computational horsepower and statistical software available nowadays, the results of an estimation like this one usually pop up in a fraction of a second. In this case, though, 3:15 PM rolled around, and the process was still going. As I waited for the user interface to show me that each small step in the estimation process had been completed, I was gaining some appreciation for what people mean when they talk about watching paint dry.

This particular estimation was so much slower than the usual regression analysis because I was using a multilevel (a.k.a. hierarchical or mixed-effects) model, which in this case included not only country- and region-specific intercepts but also country-specific slopes for one of the covariates. Those country- and region-specific parts should give me more reliable estimates of my quantities of interest (the marginal effects of oil income on the likelihood that a country will experience a transition to democracy), but they also increase the computational burden by several orders of magnitude.

(Technical aside: I mostly use R for statistical analysis, but on the advice of some knowledgeable colleagues, I am using the ‘gllamm’ module in Stata 9.2 for multilevel modeling.)
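For readers who want a concrete picture of that specification, here is a minimal sketch of a roughly comparable mixed-effects logit in R with lme4. It is only an illustrative stand-in for the gllamm estimation described above, and the data frame and variable names (panel, transition, oil_income, country, region) are hypothetical placeholders.

```r
# Minimal sketch, not the actual gllamm estimation: a mixed-effects logit
# with country- and region-specific intercepts and country-specific slopes
# for oil income. All names below are placeholders.
library(lme4)

# 'panel' is assumed to hold one row per country-year, with a binary
# indicator for a transition to democracy, oil income, and identifiers
# for country and region; other covariates would enter the fixed part.
fit <- glmer(
  transition ~ oil_income +
    (1 + oil_income | country) +  # country-specific intercepts and slopes
    (1 | region),                 # region-specific intercepts
  data   = panel,
  family = binomial(link = "logit")
)

summary(fit)
```

Much of the slowness in models like this comes from having to integrate the likelihood over the random effects, so each additional random slope or grouping level multiplies the work the optimizer has to do.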

We think of delays as inefficiencies, but delays can be useful, too. One of the downsides to the incredible advances in the technology we use for statistical modeling is that we don’t need to think so hard about what we’re doing before we start doing it. When you can estimate dozens of models in just a few minutes, there’s no practical incentive to worry about specifying the model properly in the first place. Instead, there’s a strong temptation to run estimations until you see something you “like,” which, let’s be honest, usually means something that either confirms your prior beliefs or looks more publishable. Poking around the model space until you hit on results you like is not a sound research design, but I’m betting it happens a lot more often than quantitative social scientists would care to admit.

When the estimations don’t go so fast, we have to use our computing time more wisely. For me, that means thinking more carefully from the start about which type of model to use, which covariates to include, and whether and how to transform any of the measures before the analysis. These are all things we are (or should be) taught to do in our methods training, but I know from personal experience that curiosity, impatience, and the peculiar incentives of academic publishing tempt us to pay less attention to these simple principles than we should.

Warp-speed number-crunching is not alchemy, magically transforming our estimates into science. When the underlying research process is not scientific, the results won’t be, either. Sometimes, a little inefficiency can help save us from ourselves.
