I got an email this morning from a colleague asking for the replication files for a paper I published in 2005. Sheepishly, I had to admit that I didn’t have them.
Data-sharing and replication weren’t the professional norm in political science 10 years ago. Best I can recall, it never even occurred to me to put the files where future me could easily find them. I did the research, submitted the paper, and moved on to the next project. During peer review, no one asked to see the data and .do files I used, and the email I got today was, I think, the first time anyone had asked for them.
I’ve probably changed PCs three or four times in the intervening decade and haven’t kept all of the retired machines. I spent some time this afternoon looking on a DVD with files from one of those out-to-pasture PCs, but to no avail. Now, I’m staring at a frozen blue Microsoft ScanDisk screen on a laptop running Windows 98 and realizing that this path is probably a dead end, too. Those were my only options.
There’s a simple lesson here: if you’re going to do something you want to construe as science, you need to store your data—quantitative, qualitative, audio, imagery, whatever—where you can easily find and share it in perpetuity.
That’s a helluva lot easier now than it was 10 years ago, thanks to things like GitHub, Google Drive, Dataverse, and various other backup and cloud-storage services. It still doesn’t happen by itself, though. You still have to choose to do it. Today, I’m relearning why that’s important—for science, of course, but also for my professional reputation.