My younger son is a huge fan of the Baltimore Ravens, and his enthusiasm over the past several years has converted me, so we had a lot of fun (and gut-busting anxiety) watching the Super Bowl on Sunday.
As a dad and fan, my favorite part of the night was the Baltimore win. As a forecaster, though, my favorite discovery of the night was a web site called Advanced NFL Stats, one of a budding set of quant projects applied to the game of football. Among other things, Advanced NFL Stats produces charts of the probability that either team will win every pro game in progress, including the Super Bowl. These charts are apparently based on a massive compilation of stats from games past, and they are updated in real time. As we watched the game, I could periodically refresh the page on my mobile phone and give us a fairly reliable, up-to-the-minute forecast of the game’s outcome. Since the Super Bowl confetti has settled, I’ve spent some time poking through archived charts of the Ravens’ playoff run, and that exercise got me thinking about two lessons for forecasters.
1. Improbable doesn’t mean impossible.
To get to the Super Bowl, the Ravens had to beat the Denver Broncos in the divisional round of the playoffs. Trailing by seven with 3:12 left in that game, the Ravens turned the ball over to Denver on downs at the Broncos’ 31-yard line. To win from there, the Ravens would need a turnover or quick stop; then a touchdown; then either a successful two-point conversion or a first score in overtime.
As the chart below shows, the odds of all of those things coming together were awfully slim. At that point—just before “Regulation” on the chart’s bottom axis—Advanced NFL Stats’ live win-probability graph gave the Ravens roughly a 1% chance of winning. Put another way, if the game could be run 100 times from that position, we would only expect to see Baltimore win once.
Well, guess what happened? The one-in-a-hundred event, that’s what. Baltimore got the quick stop they needed, Denver punted, Joe Flacco launched a 70-yard bomb down the right sideline to Jacoby Jones for a touchdown, the Ravens pushed the game into overtime, and two minutes into the second extra period at Mile High Stadium, Justin Tucker booted a 47-yard field goal to carry Baltimore back to the AFC Championship.
For Ravens fans, that outcome was a %@$# miracle. For forecasters, it was a great reminder that even highly unlikely events happen sometimes. When Nate Silver’s model indicated on the eve of the 2012 election that President Obama had a 91% chance of winning, it wasn’t saying that Obama was going to win. It was saying he was probably going to win, and the Ravens-Broncos game reminds us that there’s an important difference. Similarly, when a statistical model of rare events like coups or mass killings identifies certain countries as more susceptible than others, it isn’t suggesting that those highest-risk cases are definitely going to suffer those calamities. When dealing with events as rare as those, even the most vulnerable cases will escape most years without a crisis.
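For readers who like to see the arithmetic, here is a minimal Python sketch of the "run the game 100 times" framing. It assumes each replay is an independent draw with the same 1% win probability, which is a simplification of mine and not a description of how Advanced NFL Stats builds its numbers. The function names and the random seed are hypothetical.

```python
import random

def expected_wins(p_win, n_replays):
    """Average number of wins we would expect across n_replays."""
    return p_win * n_replays

def prob_at_least_one_win(p_win, n_replays):
    """Chance of seeing one or more wins across n_replays independent replays."""
    return 1 - (1 - p_win) ** n_replays

def simulate_wins(p_win, n_replays, rng):
    """Count wins in one simulated batch of replays."""
    return sum(rng.random() < p_win for _ in range(n_replays))

p = 0.01   # roughly the win probability cited for Baltimore
n = 100    # hypothetical number of replays

print(expected_wins(p, n))
print(round(prob_at_least_one_win(p, n), 3))

# Ten simulated batches of 100 replays each (seed chosen arbitrarily).
rng = random.Random(2013)
print([simulate_wins(p, n, rng) for _ in range(10)])
```

Under those assumptions, the expected number of wins in 100 replays is one, and the chance of seeing at least one win is about 63%. The simulated batches will typically show a mix of zeros, ones, and the occasional two.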
The larger point here is one that’s been made many times but still deserves repeating: no single probabilistic forecast is plainly right or wrong. A sound forecasting process will reliably distinguish the more likely from the less likely, but it won’t attempt to tell us exactly what’s going to happen in every case. Instead, the more accurate the forecasts, the more closely the frequency of real-world outcomes or events will track the predicted probabilities assigned to them. If a meteorologist’s model is really good, we should end up getting wet roughly half the time she tells us there’s a 50% chance of rain. And almost every time the live win-probability graph gives a football team a 99% chance of winning, they will go on to win that game. As my son will happily point out, though, not every time.
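That idea of forecasts tracking observed frequencies is usually called calibration, and it can be checked with a simple table. The sketch below groups forecasts into equal-width probability bins and compares the average forecast in each bin with how often the event actually happened. The function name `calibration_table` and the toy data are my own inventions for illustration; they are not real forecasts or real game results.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin.

    forecasts: probabilities between 0 and 1
    outcomes:  1 if the event happened, 0 if it did not
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be the same length")

    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 goes in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed_freq = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_forecast, observed_freq, len(pairs)))
    return rows

# Made-up data: four 50% forecasts that came true twice, and one hundred
# 99% forecasts that came true ninety-nine times.
toy_forecasts = [0.5] * 4 + [0.99] * 100
toy_outcomes = [1, 0, 1, 0] + [1] * 99 + [0]

for mean_forecast, observed_freq, count in calibration_table(toy_forecasts, toy_outcomes):
    print(f"forecast {mean_forecast:.2f}  observed {observed_freq:.2f}  n={count}")
```

In this toy set the forecasts are perfectly calibrated even though one of the 99% calls missed. With real data, the bins with few forecasts will be noisy, so the comparison only becomes informative as cases accumulate.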
2. The “obvious” indicators aren’t always the most powerful predictors.
Take a look at the Advanced NFL Stats chart below, from Sunday’s Super Bowl. See that sharp dip on the right, close to the end? Something really interesting happened there: late in the game, Baltimore led on score (34-29) but trailed San Francisco in its estimated probability of winning (about 45%).
How could that be? Consideration of the likely outcomes of the next two possessions makes it clearer. At the time, San Francisco had a first-and-goal situation from Baltimore’s seven-yard line. Teams with four shots at the end zone from seven yards out usually score touchdowns, and teams that get the ball deep in their own territory with a two- or three-point deficit and less than two minutes to play usually lose. In that moment, the live forecast confirmed the dread that Ravens fans were feeling in our guts: even though San Francisco was still trailing, the game had probably slipped away from Baltimore.
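The two-possession logic can be written out as a weighted average over the two ways the goal-line series could end. In the sketch below, every input probability is a placeholder I made up to show the mechanics; none of them comes from Advanced NFL Stats, and the real model surely accounts for much more (field goals, clock, timeouts, and so on).

```python
def leader_win_probability(p_opponent_td, p_win_if_stop, p_win_if_td):
    """Combine two scenarios for the team currently ahead on the scoreboard.

    p_opponent_td: chance the trailing team scores a touchdown on this series
    p_win_if_stop: leader's chance of winning if it prevents the touchdown
    p_win_if_td:   leader's chance of winning if it gives up the touchdown
    """
    p_stop = 1 - p_opponent_td
    return p_stop * p_win_if_stop + p_opponent_td * p_win_if_td

# Hypothetical inputs, chosen only for illustration.
p_td = 0.60            # assumed chance of a touchdown from first-and-goal
p_win_if_stop = 0.95   # assumed: the leader almost always closes out after a stop
p_win_if_td = 0.15     # assumed: a late comeback drive rarely succeeds

print(round(leader_win_probability(p_td, p_win_if_stop, p_win_if_td), 2))
```

With inputs in that neighborhood, the team that is ahead on the scoreboard comes out below 50%, which is the same qualitative pattern the live chart showed. The point of the exercise is the structure of the calculation, not the specific numbers.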
I think there’s a useful lesson for forecasters in that peculiar situation: the most direct indicators don’t tell the whole story. In football, the team with a late-game lead is usually going to win, but Advanced NFL Stats’ data set and algorithm have uncovered at least one situation where that’s not the case.
This lesson also applies to efforts to forecast political processes, like violent conflict and regime collapse. With the former, we tend to think of low-level violence as the best predictor of future civil wars, but that’s not always true. It’s surely a valuable piece of information, but there are other sources of positive and negative feedback that might rein in incipient violence in some cases and produce sudden eruptions in others. Ditto for dramatic changes in political regimes. Eritrea, for example, recently had some sort of mutiny and North Korea did not, but that doesn’t necessarily mean the former is closer to breaking down than the latter. There may be features of the Eritrean regime that will allow it to weather those challenges and aspects of the North Korean regime that predispose it to more abrupt collapse.
In short, we shouldn’t ignore the seemingly obvious signals, but we should be careful to put them in their proper context, and the results will sometimes be counter-intuitive.