Last week, I aired my skepticism on the power of (non-parametric) pattern-recognition techniques to predict many political events, and Phil Schrodt responded with a more optimistic take (see here and here). Since then, Phil and I have carried the conversation a bit further via email. As a coda to this discussion, I thought I would post the relevant pieces of that correspondence.
I wrote to Phil:
At the end of your post, you address my point about rare events by saying that sample size doesn't matter so much for training PR tools, as long as there are recognizable clusters of features in the examples that are available.
Isn’t that a big “if,” though? For most big political events, my prior belief would be that there would be recognizable clusters of features for the event itself, but many possible clusters or sequences of events or other features preceding those. The analogy that comes to mind is the difference between a bobsled track and a sledding hill. Where political phenomena work like a bobsled track, following regular patterns to the common destination, we should be able to learn their patterns from a small number of examples and extrapolate successfully to future instances. My guess, though, is that they’re often more like a sledding hill, where the approaches to the bottom are more varied and conditional (on things like the starting point), even though the destination is ultimately the same. In situations where that’s right, we would need a much larger set of examples to identify any regularities and extrapolate successfully from them.
I realize this is, for the most part, an empirical question. Still, I would wager that my original point about the importance of sample size will apply to at least a fair number of the things we might want to forecast in political science.
Phil replied:

I sort of agree with you here and sort of not.
On the “bobsled track vs. sledding hill” analogy — which is nice — that is the point (and I realized at the time I needed to elaborate on it) I meant when I used the phrase “clusters and the variance around them.” The question is not just what something looks like and how many points are close to it, but *how* close they are (plus, of course, the issue of the false positives: how many other points are in the vicinity that shouldn’t be?). So normal elections in established democracies probably all look pretty much the same (in some coding scheme); the collapse of coalition governments in established parliamentary democracies probably has more variance than that, but is still reasonably regular (opposition parties signal their intentions before they actually break, etc.). Something like genocide is, as you note, probably on the far end of unpredictable, which is to say that there is a great deal of variance and a large number of false positives. So in those instances, one needs to invest a whole lot of effort to try to find those few things that are in common.
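To make the variance/false-positive point concrete, here is a toy simulation (entirely my own illustration; the “signature” point, spreads, and threshold are invented, not anything from our coding schemes). A tight cluster of events around a common signature is easy to flag with few errors; a diffuse cluster both misses true events and sweeps in more background noise:

```python
import math
import random

random.seed(0)

def simulate(spread, n=200, threshold=0.5):
    """Draw `n` true events scattered around a common 'signature' in feature
    space, plus `n` unrelated background cases, then flag anything within
    `threshold` of the signature. Returns (hit rate, false-positive rate)."""
    signature = (0.7, 0.7)
    events = [(random.gauss(0.7, spread), random.gauss(0.7, spread))
              for _ in range(n)]
    background = [(random.random(), random.random()) for _ in range(n)]
    flag = lambda p: math.dist(p, signature) < threshold
    return (sum(map(flag, events)) / n, sum(map(flag, background)) / n)

# "Bobsled track": tight variance around the signature -> nearly every
# true event is recognized.
print(simulate(spread=0.05))

# "Sledding hill": diffuse variance -> many true events fall outside the
# recognizable region, while the background false-positive rate stays put.
print(simulate(spread=0.40))
```

The false-positive rate depends only on how much of the feature space the detection region covers, which is why widening the threshold to catch a diffuse cluster is costly: the hits and the false alarms rise together.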
Where I’d still make the case for the clustering/PR approach is that I think it usually can work on *less* data than a corresponding frequentist method, particularly one that depends on large-sample properties (e.g., approximations to the Normal distribution via the Central Limit Theorem). And, I suppose, PR methods are probably more robust against violations of assumptions (or, more generally, they tend to be non-parametric) than are the statistical methods. But they will do better with more information than with less.
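The small-sample point can be sketched with the simplest pattern-recognition tool there is, a nearest-neighbor classifier. The feature vectors and labels below are invented for illustration (say, opposition-signaling strength and coalition strain scored on [0, 1]); the point is only that 1-NN makes a call from four examples with no distributional assumptions at all:

```python
import math

# Hypothetical training examples: (features, label),
# label 1 = "government collapse", 0 = "no collapse".
training = [
    ((0.9, 0.8), 1),  # strong signaling, high strain -> collapsed
    ((0.8, 0.7), 1),
    ((0.2, 0.1), 0),  # little signaling, low strain -> survived
    ((0.1, 0.3), 0),
]

def nearest_neighbor(x, examples):
    """Classify x by the label of its closest training example (1-NN).
    Non-parametric: no likelihood, no large-sample approximation."""
    return min(examples, key=lambda ex: math.dist(x, ex[0]))[1]

print(nearest_neighbor((0.85, 0.75), training))  # near the collapse cluster -> 1
print(nearest_neighbor((0.15, 0.20), training))  # near the no-collapse cluster -> 0
```

With four observations, a parametric model leaning on asymptotics has little to work with, while the distance rule is usable immediately — though, as the exchange above notes, its verdicts are only as good as the tightness of the clusters it is matching against.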
And that’s where we’ll leave it for now.