Professional (American) football season starts tonight when the Green Bay Packers visit last year’s champs, the Seattle Seahawks, for a Thursday-night opener thing that still seems weird to me. (SUNDAY, people. Pro football is played on Sunday.) So, who’s likely to win?
With the final preseason scores from our pairwise wiki survey in hand, we can generate a prediction for that game, along with all 255 other regular-season contests on the 2014 schedule. As I described in a recent post, this wiki survey offers a novel way to crowdsource the problem of estimating team strength before the season starts. We can use last year’s preseason survey data and game results to estimate a simple statistical model that accounts for two teams’ strength differential and home-field advantage. Then, we can apply that model to this year’s survey results to get game-level forecasts.
In the last post, I used the initial model estimates to generate predicted net scores (home minus visitor) and confidence intervals. This time, I thought I’d push it a little further and use predictive simulations. Following Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models (2009), I generated 1,000 simulated net scores for each game and then summarized the distributions of those scores to get my statistics of interest.
The means of those simulated net scores for each game represent point estimates of the outcome, and the variance of those distributions gives us another way to compute confidence intervals. Those means and confidence intervals closely approximate the ones we’d get from a one-shot application of the predictive model to the 2014 survey results, however, so there’s no real new information there.
What we can do with those distributions that is new is compute win probabilities. The share of simulated net scores above 0 gives us an estimate of the probability of a home-team win, and 1 minus that estimate gives us the probability of a visiting-team win.
A couple of pictures make this idea clearer. First, here’s a histogram of the simulated net scores for tonight’s Packers-Seahawks game. The Packers fared pretty well in the preseason wiki survey, ranking 5th overall with a score of 77.5 out of 100. The defending-champion Seahawks got the highest score in the survey, however—a whopping 92.6—and they have home-field advantage, which is worth about 3.1 points on average, according to my model. In my predictive simulations, 673 of the 1,000 games had a net score above 0, suggesting a win probability of 67%, or 2:1 odds, in favor of the Seahawks. The mean predicted net score is 5.8, which is pretty darn close to the current spread of -5.5.
Things look a little tighter for the Bengals-Ravens contest, which I’ll be attending with my younger son on Sunday in our once-annual pilgrimage to M&T Bank Stadium. The Ravens wound up 10th in the wiki survey with a score of 60.5, but the Bengals are just a few rungs down the ladder, in 13th, with a score of 54.7. Add in home-field advantage, though, and the simulations give the Ravens a win probability of 62%, or about 3:2 odds. Here, the mean net score is 3.6, noticeably higher than the current spread of -1.5 but on the same side of the win/loss line. (N.B. Because the two teams’ survey scores are so close, the tables turn when Cincinnati hosts in Week 8, and the predicted probability of a home win is 57%.)
Once we’ve got those win probabilities ginned up, we can use them to move from game-level to season-level forecasts. It’s tempting to think of the wiki survey results as season-level forecasts already, but what they don’t do is account for variation in strength of schedule. Other things being equal, a strong team with a really tough schedule might not be expected to do much better than a mediocre team with a relatively easy schedule. The model-based simulations refract those survey results through the 2014 schedule to give us a clearer picture of what we can expect to happen on the field this year.
The table below (made with the handy ‘textplot’ command in R’s gplots package) turns the predictive simulations into season-level forecasts for all 32 teams.* I calculated two versions of a season summary and juxtaposed them to the wiki survey scores and resulting rankings. Here’s what’s in the table:
- WikiRank shows each team’s ranking in the final preseason wiki survey results.
- WikiScore shows the score on which that ranking is based.
- WinCount counts the number of games in which each team has a win probability above 0.5. This process gives us a familiar number, the first half of a predicted W-L record, but it also throws out a lot of information by treating forecasts close to 0.5 the same as ones where we’re more confident in our prediction of the winner.
- WinSum, is the sum of each team’s win probabilities across the 16 games. This expected number of wins is a better estimate of each team’s anticipated results than WinCount, but it’s also a less familiar one, so I thought I would show both.
Teams appear in the table in descending order of WinSum, which I consider the single-best estimate in this table of a team’s 2014 performance. It’s interesting (to me, anyway) to see how the rank order changes from the survey to the win totals because of differences in strength of schedule. So, for example, the Patriots ranked 4th in the wiki survey, but they get the second-highest expected number of wins this year (9.8), just behind the Seahawks (9.9). Meanwhile, the Steelers scored 16th in the wiki survey, but they rank 11th in expected number of wins with an 8.4. That’s a smidgen better than the Cincinnati Bengals (8.3) and not much worse than the Baltimore Ravens (9.0), suggesting an even tighter battle for the AFC North division title than the wiki survey results alone.
There are a lot of other interesting quantities we could extract from the results of the game-level simulations, but that’s all I’ve got time to do now. If you want to poke around in the original data and simulation results, you can find them all in a .csv on my Google Drive (here). I’ve also posted a version of the R script I used to generate the game-level and season-level forecasts on Github (here).
At this point, I don’t have plans to try to update the forecasts during the season, but I will be seeing how the preseason predictions fare and occasionally reporting the results here. Meanwhile, if you have suggestions on other ways to use these data or to improve these forecasts, please leave a comment here on the blog.
* The version of this table I initially posted had an error in the WikiRank column where 18 was skipped and the rankings ran to 33. This version corrects that error. Thanks to commenter C.P. Liberatore for pointing it out.