1. However hard you think it will be to assemble a data set for a particular analysis, it will be exponentially harder, with the size of the exponent determined by the scope and scale of the required data.
   - Corollary: If the data you need would cover the world (or just poor countries), they probably don’t exist.
   - Corollary: If the data you need would extend very far back in time, they probably don’t exist.
   - Corollary: If the data you need are politically sensitive, they probably don’t exist. If they do exist, you probably can’t get them. If you can get them, you probably shouldn’t trust them.
2. However reliable you think your data are, they probably aren’t.
   - Corollary: A couple of digits after the decimal point is plenty. With data this noisy, what do those thousandths really mean, anyway?
3. Just because a data transformation works doesn’t mean it’s doing what you meant it to do.
4. The only really reliable way to make sure that your analysis is replicable is to have someone previously unfamiliar with the work try to replicate it. Unfortunately, a person’s incentive to replicate someone else’s work is positively correlated with his or her level of prior involvement in the project, so the people best placed to test replicability have the least reason to do it. Ergo, this will rarely happen until after you have posted your results.
5. If your replication materials will include random parts (e.g., sampling) and you’re using R, don’t forget to set the seed for random number generation (with `set.seed()`) at the start of your script. (Alas, I am living this mistake today.)
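One common way rule 3 bites is a merge that "works" but silently duplicates rows. The sketch below is a hypothetical illustration in plain Python, not drawn from any real data set: the country codes, values, and the `left_join` helper are invented for the example. It joins a small country-year table to a region lookup that accidentally lists one country twice, then compares row counts before and after.

```python
# Hypothetical country-year panel: (country code, year, value).
panel = [
    ("AFG", 2000, 1.2),
    ("AFG", 2001, 1.4),
    ("BRA", 2000, 3.1),
]

# Hypothetical lookup table mapping country codes to regions.
# "AFG" appears twice by mistake, e.g. from a sloppy copy-paste.
regions = [
    ("AFG", "South Asia"),
    ("AFG", "Central Asia"),
    ("BRA", "Latin America"),
]


def left_join(rows, lookup):
    """Attach every matching region to each row; unmatched rows get None."""
    out = []
    for code, year, value in rows:
        matches = [region for key, region in lookup if key == code]
        if not matches:
            out.append((code, year, value, None))
        for region in matches:
            out.append((code, year, value, region))
    return out


merged = left_join(panel, regions)

# The join runs without error, but the duplicate key quietly added rows.
print(len(panel), len(merged))  # 3 5

# A cheap check that the transformation did what was meant:
# a many-to-one merge should never change the number of rows.
if len(merged) != len(panel):
    print("Row count changed after merge: check the lookup table for duplicate keys.")
```

Nothing here raises an error, which is the point: the only signal that something went wrong is the row-count comparison you remember to make.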
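Rule 5 is about R, where the fix is a call to `set.seed()` before any random draws. For readers working in Python, a rough analogue using the standard library's `random` module is sketched below; the seed value and the list of country codes are arbitrary choices for illustration.

```python
import random

# Set the seed once, at the very start of the script, before any random draws.
# (Python analogue of R's set.seed(); the value 20130101 is arbitrary.)
SEED = 20130101
random.seed(SEED)

countries = ["AFG", "BRA", "CHN", "DEU", "EGY", "FRA", "GHA", "IND"]
sample = random.sample(countries, 3)
print(sample)

# Re-seeding with the same value reproduces exactly the same draws,
# which is what lets someone else replicate the sampling step.
random.seed(SEED)
assert random.sample(countries, 3) == sample
```

Any fixed integer works as the seed; what matters is that it is set before the first draw and recorded in the replication materials.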
Please use the Comments to suggest additions, corrections, or modifications.