Two Forecasting Lessons from a Crazy Football Season

My younger son is a huge fan of the Baltimore Ravens, and his enthusiasm over the past several years has converted me, so we had a lot of fun (and gut-busting anxiety) watching the Super Bowl on Sunday.

As a dad and fan, my favorite part of the night was the Baltimore win. As a forecaster, though, my favorite discovery of the night was a web site called Advanced NFL Stats, one of a budding set of quant projects applied to the game of football. Among other things, Advanced NFL Stats produces charts of each team’s probability of winning for every pro game in progress, including the Super Bowl. These charts are apparently based on a massive compilation of stats from games past, and they are updated in real time. As we watched the game, I could periodically refresh the page on my mobile phone and give us a fairly reliable, up-to-the-minute forecast of the game’s outcome. Since the Super Bowl confetti has settled, I’ve spent some time poking through archived charts of the Ravens’ playoff run, and that exercise got me thinking about two lessons for forecasters.

1. Improbable doesn’t mean impossible.

To get to the Super Bowl, the Ravens had to beat the Denver Broncos in the divisional round of the playoffs. Trailing by seven with 3:12 left in that game, the Ravens turned the ball over to Denver on downs at the Broncos’ 31-yard line. To win from there, the Ravens would need a turnover or quick stop; then a touchdown; then either a successful two-point conversion or a first score in overtime.

As the chart below shows, the odds of all of those things coming together were awfully slim. At that point—just before “Regulation” on the chart’s bottom axis—Advanced NFL Stats’ live win-probability graph gave the Ravens roughly a 1% chance of winning. Put another way, if the game could be run 100 times from that position, we would only expect to see Baltimore win once.


Well, guess what happened? The one-in-a-hundred event, that’s what. Baltimore got the quick stop they needed, Denver punted, Joe Flacco launched a 70-yard bomb down the right sideline to Jacoby Jones for a touchdown, the Ravens pushed the game into overtime, and two minutes into the second extra period at Mile High Stadium, Justin Tucker booted a 47-yard field goal to carry Baltimore back to the AFC Championship.

For Ravens fans, that outcome was a %@$# miracle. For forecasters, it was a great reminder that even highly unlikely events happen sometimes. When Nate Silver’s model indicates on the eve of the 2012 election that President Obama has a 91% chance of winning, it isn’t saying that Obama is going to win. It’s saying he’s probably going to win, and the Ravens-Broncos game reminds us that there’s an important difference. Similarly, when a statistical model of rare events like coups or mass killings identifies certain countries as more susceptible than others, it isn’t suggesting that the highest-risk cases are definitely going to suffer those calamities. When dealing with events as rare as those, even the most vulnerable cases will escape most years without a crisis.
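To put a rough number on "unlikely events happen sometimes," here is a minimal Python sketch. It assumes every trial is independent and has the same probability, which is a simplification for real games or elections. The function names are mine, invented for illustration, and have nothing to do with how Advanced NFL Stats computes anything.

```python
def expected_count(p: float, n: int) -> float:
    """Expected number of times an event with per-trial probability p occurs in n trials."""
    return p * n


def prob_at_least_once(p: float, n: int) -> float:
    """Chance the event occurs at least once in n independent trials (independence is an assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** n


# The 1% figure is the win probability quoted for the Ravens late in the Denver game.
p_upset = 0.01
for n_trials in (1, 10, 100, 500):
    print(
        f"{n_trials:>4} trials: expect {expected_count(p_upset, n_trials):.2f} upsets, "
        f"P(at least one) = {prob_at_least_once(p_upset, n_trials):.3f}"
    )
```

Under those assumptions, 100 replays of a 1% situation produce one upset on average, and the chance of seeing at least one is roughly 63%. Watch enough long-shot situations and a "miracle" becomes the expected result rather than a surprise.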

The larger point here is one that’s been made many times but still deserves repeating: no single probabilistic forecast is plainly right or wrong. A sound forecasting process will reliably distinguish the more likely from the less likely, but it won’t attempt to tell us exactly what’s going to happen in every case. Instead, the more accurate the forecasts, the more closely the frequency of real-world outcomes or events will track the predicted probabilities assigned to them. If a meteorologist’s model is really good, we should end up getting wet roughly half of the times she tells us there’s a 50% chance of rain. And almost every time the live win-probability graph gives a football team a 99% chance of winning, they will go on to win that game, but, as my son will happily point out, not every time.
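That idea, that predicted probabilities should match observed frequencies, is usually checked with a calibration table. Below is a minimal Python sketch of one, run on synthetic data that I generate so the forecasts are well calibrated by construction. Nothing here comes from real football or weather forecasts, and `calibration_table` is a hypothetical helper rather than a function from any library.

```python
import random
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Bin forecasts by predicted probability and compare each bin's mean
    forecast with the share of events that actually happened in that bin."""
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the top bin
        bins[idx].append((p, happened))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed_freq = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_forecast, observed_freq, len(pairs)))
    return rows


# Synthetic, perfectly calibrated forecaster: each event occurs with exactly
# the probability that was forecast for it. Purely illustrative data.
rng = random.Random(0)
forecasts = [rng.random() for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in forecasts]

for mean_forecast, observed_freq, n in calibration_table(forecasts, outcomes):
    print(f"forecast ~{mean_forecast:.2f}   observed {observed_freq:.2f}   (n={n})")
```

For a good forecaster the two columns stay close in every bin. A single surprising game, like Ravens-Broncos, barely moves its bin, which is why one outcome can't tell us whether a probabilistic forecast was sound.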

2. The “obvious” indicators aren’t always the most powerful predictors.

Take a look at the Advanced NFL Stats chart below, from Sunday’s Super Bowl. See that sharp dip on the right, close to the end? Something really interesting happened there: late in the game, Baltimore led on score (34-29) but trailed San Francisco in its estimated probability of winning (about 45%).


How could that be? Consideration of the likely outcomes of the next two possessions makes it clearer. At the time, San Francisco had a first-and-goal situation from Baltimore’s seven yard line. Teams with four shots at the end zone from seven yards out usually score touchdowns, and teams that get the ball deep in their own territory with a two- or three-point deficit and less than two minutes to play usually lose. In that moment, the live forecast confirmed the dread that Ravens fans were feeling in our guts: even though San Francisco was still trailing, the game had probably slipped away from Baltimore.
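The reasoning in that paragraph is the law of total probability: Baltimore’s chance of winning is a weighted average of its chances on each branch of San Francisco’s possession. Here is a small Python sketch of that arithmetic. All three input numbers are placeholders I made up so the result lands near the 45% on the chart. They are not Advanced NFL Stats’ estimates, and I don’t know how the site actually builds its numbers.

```python
def win_probability(p_td: float, p_win_if_td: float, p_win_if_stop: float) -> float:
    """Leading team's win probability, split on whether the trailing team
    scores a touchdown on the current possession."""
    return p_td * p_win_if_td + (1.0 - p_td) * p_win_if_stop


# Hypothetical inputs, chosen only for illustration:
p_sf_touchdown = 0.60   # assumed chance of a touchdown with first-and-goal at the 7
p_bal_win_if_td = 0.15  # assumed chance Baltimore comes back with under two minutes left
p_bal_win_if_stop = 0.90  # assumed chance Baltimore holds on after a goal-line stand

p_bal = win_probability(p_sf_touchdown, p_bal_win_if_td, p_bal_win_if_stop)
print(f"Baltimore win probability: {p_bal:.2f}")
```

With those placeholder inputs the formula gives 0.45. The score alone says Baltimore is ahead, but once the likely touchdown is weighted in, the leading team becomes the underdog.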

I think there’s a useful lesson for forecasters in that peculiar situation: the most direct indicators don’t tell the whole story. In football, the team with a late-game lead is usually going to win, but Advanced NFL Stats’ data set and algorithm have uncovered at least one situation where that’s not the case.

This lesson also applies to efforts to forecast political processes, like violent conflict and regime collapse. With the former, we tend to think of low-level violence as the best predictor of future civil wars, but that’s not always true. It’s surely a valuable piece of information, but there are other sources of positive and negative feedback that might rein in incipient violence in some cases and produce sudden eruptions in others. Ditto for dramatic changes in political regimes. Eritrea, for example, recently had some sort of mutiny and North Korea did not, but that doesn’t necessarily mean the former is closer to breaking down than the latter. There may be features of the Eritrean regime that will allow it to weather those challenges and aspects of the North Korean regime that predispose it to more abrupt collapse.

In short, we shouldn’t ignore the seemingly obvious signals, but we should be careful to put them in their proper context, and the results will sometimes be counter-intuitive.

Oh, and…THIS:



  1. Grant

     /  February 6, 2013

    I didn’t understand a single word you said about football, but it was interesting anyway. But it seems to me that there’s a difference between Obama losing the 2012 election and the Ravens winning this game. In something as sparsely populated as a game isn’t there more chance of individuals defying the odds* (and possibly outdoing what they are normally capable of precisely because they know how difficult it is) and outperforming what you would expect than in largely populated events like games where the majority** could be expected to shut down the minority?

    *Assuming that the odds were properly calculated that is.
    **In that case everyone who was voting, not simply registered voters.

    • Hmm, I’m not so sure. If my assumptions about Advanced NFL Stats are correct, the in-game forecasts are based on a fairly large sample of games past, so they’re not so sparse after all. Thirty-two teams play 16 weeks plus playoffs, so almost 270 games each season, iterated over many seasons… Most specific game situations will be fairly rare, but that’s still a pretty good-sized sample on which to base your simulations. Meanwhile, presidential elections only happen every four years, and the factors shaping their outcomes have arguably evolved over the past century, so comparisons of 1912 to 2012 are dubious. Clearly there’s uncertainty around the estimates from both kinds of models, and probably a lot of variation in the scope of that uncertainty across situations within each kind of model, but I actually suspect the bands around football forecasts would generally be tighter than the ones around the presidential election forecasts. But that’s just a guess.

      • Grant

         /  February 7, 2013

        Alright, that was my fault. I really dropped the ball on terminology that time. By ‘populated’ I didn’t mean the statistical sense of having a great deal of data, I meant it in the regular sense of ‘lots of people existing’ with my suggestion being that an event that has fewer people might show greater defiance of odds (such as a football game) than an event that has many millions of people (such as a presidential election). The confusion is entirely my fault.

    • Grant

       /  February 6, 2013

      Sorry, I meant largely populated events like elections, not games.
