Maybe Pattern Recognition Will Work Better Than I Thought

The eminent Phil Schrodt, master of methods and pater familias of political-science programming, read yesterday’s bit on applied pattern recognition and had some things to say about it. Phil and I know each other from my years working with the Political Instability Task Force, and I know he knows a lot more about this topic than I do, so I asked him to make a guest post out of it. Here’s what Phil wrote:

Three responses to Jay’s recent post on pattern recognition (PR).

First, as I argued rather extensively a couple decades back in an unpublishable book, PR is at the core of much of political analysis; in fact probably most of the political analysis that we call “qualitative” or “case study.” Statistical studies are a tiny and somewhat awkward subset of the means by which we could formally assess political behavior for most categorical political events (as opposed to natural quantitative measures such as public opinion polls), though I will be the first to acknowledge that given the tremendous amount of work over the past decade or so, statistical methods are “unreasonably effective” at this task.

But the bottom line is: if not PR, what else is going on? Not only is the human brain extraordinarily effective at PR in general; there is increasing evidence that it is hardwired for various forms of socially relevant PR.

Take, for example, the fact that you can probably recognize any of the songs played at your high school senior prom after the first couple of bars of music. (Presupposing, of course, that you went to a high school senior prom; like most of the socially dysfunctional geeks reading this blog, I didn’t…I digress…) This ability is closely related to the fact that you could recognize the voice of your (counter-factual) date at that prom within seconds if he/she phoned you after twenty years. Those features, in turn, are related to how we evolved (yes, evolved…sorry, Republican presidential primary contenders except Huntsman) as social primates. There is also evidence — which I could dig up except that I’m writing for a blog — that episodic (“story telling”) PR is another one of those highly evolved features, quite possibly due to the need for social primates to figure out who they should and should not behave altruistically towards — play the C/D option frequently enough in the iterated Prisoner’s Dilemma, and you die. Or at least don’t reproduce. (I hate to think where proms fit in this framework…though this might provide some insight…)

So for me the issue is figuring out how a computer can simulate that PR. This is a difficult problem which, I would suggest, we’ve spent almost no time at all on — the number of PR articles in the whole of the published political science literature probably numbers two or three dozen at best, plus probably a comparable number by computer scientists who happen to choose politics as a domain for demonstrating machine learning algorithms (see here, for example). The vast bulk of the systematic, data-based work in political science has either been the aforementioned statistical studies, or else (following far behind in numbers, though still much larger than the PR work) approaches that are loosely — at times really, really loosely — based on game theory (see here), another highly simplified approximation to human reasoning that sometimes kinda sorta maybe works in some situations if we don’t look too closely (but I digress…)

Now, on the issue of data: yes, right now we don’t have a whole lot of machine-readable data, but that is a temporary situation and will only improve. The existing event data sets are quite limited, and consequently rare events are an issue. This, however, is changing rapidly. For example, if the new DARPA ICEWS global data set being produced by Lockheed is made available (likely a DARPA decision), we will have another data set containing on the order of 10 million events (the Lockheed ICEWS data covering only Asia had around 4 million events, so extrapolating, the global set might have around 25 million, just as a guess), and with far greater detail on substate actors (an ICEWS specialty) compared to the VRA/King-Lowe “10-million events” set. Meanwhile Peter Nardulli’s SPEED project is using a variety of text-recognition and natural language processing methods to generate a dense global data set on political events going back to 1945.

But more generally, the availability and reliability of this data — machine coded from machine-readable sources — is only going to improve. At present, the low-hanging fruit — news reports we can (relatively) easily download — goes back only to around 1990 (conveniently, about the time of the end of the Cold War). But that period of data is increasing at the exact rate of one year per year (and, to a much more limited extent, some efforts such as Nardulli’s are also pushing it backwards in time), and the number of sources available online is now in the thousands (compared to the single-source Reuters- and AFP-based data sets of the VRA and KEDS projects), with sources compiled by aggregators such as Google News and the European Media Monitor. Machine coding methods are also improving, through the development of better ontologies, hugely more extensive dictionaries with tens of thousands of actors (ICEWS again), the development of a wide variety of tools from computational linguistics such as parsers, automated translation, and named-entity-recognition software, and the integration of these into third-generation automated coders such as Lockheed’s JABARI-NLP (link). In all likelihood, machine coding is already more accurate than actual human coding, though it is not at the level of the claimed human accuracy, which tends to be about twice the actual figure (see this fascinating study by Mikhaylov, Laver, and Benoit on attempting to replicate the coding of the Comparative Manifestos Project, a task very similar to event data coding), and it is only going to get better.
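The dictionary-based approach behind automated event coders can be sketched in a few lines: match actor and verb phrases from hand-built dictionaries against a news lead and emit a (source, event-code, target) triple. To be clear, everything here — the actors, the verb phrases, the CAMEO-style codes — is invented for illustration; real coders like JABARI-NLP rely on full parsers and dictionaries with tens of thousands of entries.

```python
# Toy sketch of dictionary-based event coding. The dictionaries and
# CAMEO-style numeric codes below are illustrative only.

ACTORS = {"syrian army": "SYRMIL", "protesters": "SYROPP", "government": "SYRGOV"}
VERBS = {"fired on": "190", "met with": "040", "denounced": "110"}

def code_event(sentence):
    """Return (source, code, target) if an actor-verb-actor pattern matches."""
    s = sentence.lower()
    for verb, code in VERBS.items():
        if verb in s:
            before, after = s.split(verb, 1)
            # first actor found before the verb is the source; first after, the target
            source = next((c for a, c in ACTORS.items() if a in before), None)
            target = next((c for a, c in ACTORS.items() if a in after), None)
            if source and target:
                return (source, code, target)
    return None

print(code_event("The Syrian army fired on protesters in Homs."))
# → ('SYRMIL', '190', 'SYROPP')
```

The fragility of this sketch — word order, missing dictionary entries, no parsing — is exactly why the parsers and huge dictionaries mentioned above matter.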

Finally, the issue of rare events (and human PR more generally) is one of identifying archetypes — a radical new idea Max Weber suggested a mere century ago — then selectively getting the data that instantiates the archetype and determining how much variance there is around it. Suppose tomorrow (it would be nice…) we saw the following three things happen in Syria:

1. Government media go off the air.

2. Tanks surround major government buildings in Damascus and other major cities.

3. Vague reports emerge about an “Emergency Committee for National Salvation and Unity” consisting largely of colonels.

That’s probably sufficient information for us to conclude a “coup” has occurred in Syria. This is based on a combination of how a “coup” has been defined and the various instantiations of “coups” that we have seen in the historical record. In human PR, that historical record goes back considerably further than what we have in the event data record, and also “selects on the dependent variable” — if we want examples of coups, we look at coup-prone countries (e.g. Latin America prior to the 1990s, Africa in the 1970s and 1980s, Thailand, Turkey), not at the historical record generally. Most of the evidence from the cognitive sciences indicates that the brain does this automatically — memories, and particularly episodic memories, are stored in networks of similar events. “If it fires together, it wires together” (link). Approximating this in a computational environment is a difficult task — the structures and intrinsic capabilities of carbon-based and silicon-based memories differ dramatically — but there is plenty of work in this area, and it is certainly becoming easier and more efficient with high-memory, high-speed parallel computer clusters.

Archetypes, in turn, generally depend on clustering, not the sample-size-dependent features of traditional statistics. This is why problems such as IBM Watson/Jeopardy, the Netflix film-preference algorithm, and the WalMart shopping basket co-purchase (e.g. beer and diapers) algorithms work. If there are a couple other people out there who really like vampire films, John Ford Westerns, and classical Looney Tunes animation, Netflix just needs to find that cluster; it doesn’t need to be a big cluster (though that one probably is big…). I can pull out a dozen historical analogues of the political contagion process of the Arab Spring and this gives me plenty to work with from the perspective of figuring out what is typical or not about the current situation; I don’t need a sample of 1,000.
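The analogue-finding idea — a dozen historical cases rather than a sample of 1,000 — can be sketched as a nearest-neighbor ranking: score each past case on a few features and order them by similarity to the current one, with no large-N model fit at all. The cases, features, and numeric values below are made up purely to show the mechanics.

```python
# Sketch of ranking historical analogues by cosine similarity.
# Cases, features, and values are invented for illustration.

import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# features: (urbanization, youth_bulge, regime_repression, media_diffusion)
CASES = {
    "Europe 1848":         (0.3, 0.6, 0.8, 0.4),
    "Eastern Europe 1989": (0.6, 0.4, 0.9, 0.7),
    "Color revolutions":   (0.5, 0.5, 0.6, 0.8),
}

current = (0.6, 0.8, 0.9, 0.9)  # stylized "Arab Spring" feature vector

# rank all cases, most similar first — no sample-size requirement anywhere
analogues = sorted(CASES, key=lambda c: cosine(CASES[c], current), reverse=True)
print(analogues)
```

With three cases this is trivial, but nothing changes with a dozen: the method scales with the number of analogues you can articulate, not with the n that a regression would demand.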

In short, we’re unquestionably not there yet on computational pattern recognition of political activities, and arguably we’ve hardly even started. But both the data side and the computational side have changed exponentially in the past decade — and these trends continue — so this would seem to be a promising avenue of research.

PS. Phil and I wrap up the discussion (for now) with one more exchange, here.
