Professional (American) football season starts tonight when the Green Bay Packers visit last year’s champs, the Seattle Seahawks, for a Thursday-night opener thing that still seems weird to me. (SUNDAY, people. Pro football is played on Sunday.) So, who’s likely to win?

With the final preseason scores from our pairwise wiki survey in hand, we can generate a prediction for that game, along with all 255 other regular-season contests on the 2014 schedule. As I described in a recent post, this wiki survey offers a novel way to crowdsource the problem of estimating team strength before the season starts. We can use last year’s preseason survey data and game results to estimate a simple statistical model that accounts for two teams’ strength differential and home-field advantage. Then, we can apply that model to this year’s survey results to get game-level forecasts.
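Here is a minimal sketch in Python (the real analysis was done in R) of what a model like this could look like. It assumes an ordinary least-squares regression of net score (home minus visitor) on the difference in survey scores, with the intercept playing the role of home-field advantage. The function names are hypothetical, and the five data points are invented purely for illustration; they are not last year’s survey scores or game results.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_net_score(intercept, slope, home_score, visitor_score):
    """Predicted home-minus-visitor margin from two survey scores."""
    return intercept + slope * (home_score - visitor_score)


# Invented toy data (NOT real survey scores or results):
# x = home survey score minus visitor survey score, y = net game score.
score_diff = [-20.0, -10.0, 0.0, 10.0, 20.0]
net_score = [-3.0, 1.0, 2.0, 6.0, 9.0]

# The intercept is the expected margin when the teams are evenly matched,
# which is one way to read "home-field advantage" in points.
intercept, slope = fit_line(score_diff, net_score)
toy_prediction = predict_net_score(intercept, slope, 60.0, 50.0)
```

In this setup, fitting on last year’s data and predicting with this year’s survey scores are two separate steps, which is the split described above.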

In the last post, I used the initial model estimates to generate predicted net scores (home minus visitor) and confidence intervals. This time, I thought I’d push it a little further and use predictive simulations. Following Gelman and Hill’s *Data Analysis Using Regression and Multilevel/Hierarchical Models* (2007), I generated 1,000 simulated net scores for each game and then summarized the distributions of those scores to get my statistics of interest.
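A stripped-down sketch of that simulation step in Python might look like the following. It assumes each simulated net score is a draw from a normal distribution centered on the model’s predicted margin, and it ignores uncertainty in the model’s coefficients, which a fuller Gelman-and-Hill-style simulation would also propagate. The function name and the residual standard deviation of 13 points are hypothetical; the predicted margin of 5.8 matches the Packers-Seahawks figure reported later in this post.

```python
import random
import statistics


def simulate_net_scores(predicted_mean, residual_sd, n_sims=1000, seed=0):
    """Draw n_sims net scores (home minus visitor) around the predicted margin."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.gauss(predicted_mean, residual_sd) for _ in range(n_sims)]


# residual_sd=13.0 is an assumed value, not one taken from the fitted model.
sims = simulate_net_scores(predicted_mean=5.8, residual_sd=13.0)

# Summaries of the simulated distribution.
sim_mean = statistics.mean(sims)
sim_sd = statistics.stdev(sims)
```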

The means of those simulated net scores for each game represent point estimates of the outcome, and the variance of those distributions gives us another way to compute confidence intervals. Those means and confidence intervals closely approximate the ones we’d get from a one-shot application of the predictive model to the 2014 survey results, however, so there’s no real new information there.

What we can do with those distributions that *is* new is compute win probabilities. The share of simulated net scores above 0 gives us an estimate of the probability of a home-team win, and 1 minus that estimate gives us the probability of a visiting-team win.
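In code, that calculation is just a share. The Python sketch below uses a hypothetical function name and five made-up simulated scores to keep the arithmetic easy to check; a simulated score of exactly 0 is not counted as a home win here, which is an assumption about how ties are treated.

```python
def home_win_probability(simulated_net_scores):
    """Share of simulated net scores (home minus visitor) strictly above 0."""
    wins = sum(1 for score in simulated_net_scores if score > 0)
    return wins / len(simulated_net_scores)


# Five made-up simulated net scores; three of them are above 0.
toy_sims = [7.0, -3.0, 10.0, 0.5, -1.0]

p_home = home_win_probability(toy_sims)
p_visitor = 1 - p_home
```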

A couple of pictures make this idea clearer. First, here’s a histogram of the simulated net scores for tonight’s Packers-Seahawks game. The Packers fared pretty well in the preseason wiki survey, ranking 5th overall with a score of 77.5 out of 100. The defending-champion Seahawks got the highest score in the survey, however—a whopping 92.6—and they have home-field advantage, which is worth about 3.1 points on average, according to my model. In my predictive simulations, 673 of the 1,000 games had a net score above 0, suggesting a win probability of 67%, or 2:1 odds, in favor of the Seahawks. The mean predicted net score is 5.8, which is pretty darn close to the current spread of -5.5.

Things look a little tighter for the Bengals-Ravens contest, which I’ll be attending with my younger son on Sunday in our once-annual pilgrimage to M&T Bank Stadium. The Ravens wound up 10th in the wiki survey with a score of 60.5, but the Bengals are just a few rungs down the ladder, in 13th, with a score of 54.7. Add in home-field advantage, though, and the simulations give the Ravens a win probability of 62%, or about 3:2 odds. Here, the mean net score is 3.6, noticeably higher than the current spread of -1.5 but on the same side of the win/loss line. (N.B. Because the two teams’ survey scores are so close, the tables turn when Cincinnati hosts in Week 8, and the predicted probability of a home win is 57%.)
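For anyone who wants to check the odds arithmetic in those two examples, a probability p converts to odds in favor as p / (1 - p). A small Python sketch, with a hypothetical function name:

```python
def odds_in_favor(probability):
    """Convert a win probability into odds in favor, expressed as x-to-1."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1 - probability)


# Seahawks: 673 of 1,000 simulated net scores above 0.
seahawks_odds = odds_in_favor(673 / 1000)  # a bit over 2, so about 2:1

# Ravens: win probability of 62%.
ravens_odds = odds_in_favor(0.62)  # a bit over 1.6, so roughly 3:2
```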

Once we’ve got those win probabilities ginned up, we can use them to move from game-level to season-level forecasts. It’s tempting to think of the wiki survey results as season-level forecasts already, but what they don’t do is account for variation in strength of schedule. Other things being equal, a strong team with a really tough schedule might not be expected to do much better than a mediocre team with a relatively easy schedule. The model-based simulations refract those survey results through the 2014 schedule to give us a clearer picture of what we can expect to happen on the field this year.

The table below (made with the handy ‘textplot’ command in R’s gplots package) turns the predictive simulations into season-level forecasts for all 32 teams.* I calculated two versions of a season summary and juxtaposed them with the wiki survey scores and resulting rankings. Here’s what’s in the table:

- **WikiRank** shows each team’s ranking in the final preseason wiki survey results.
- **WikiScore** shows the score on which that ranking is based.
- **WinCount** counts the number of games in which each team has a win probability above 0.5. This process gives us a familiar number, the first half of a predicted W-L record, but it also throws out a lot of information by treating forecasts close to 0.5 the same as ones where we’re more confident in our prediction of the winner.
- **WinSum** is the sum of each team’s win probabilities across the 16 games. This expected number of wins is a better estimate of each team’s anticipated results than WinCount, but it’s also a less familiar one, so I thought I would show both.
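The two season summaries can be sketched in a few lines of Python. This is an illustration rather than the R script itself: the function name and team abbreviations are hypothetical, and the input is just the three home-win probabilities quoted earlier in this post, not a full 256-game schedule. A visiting team’s win probability is taken as 1 minus the home team’s.

```python
from collections import defaultdict


def season_summary(games):
    """games: iterable of (home_team, visiting_team, p_home_win) tuples.

    Returns (win_count, win_sum) dictionaries keyed by team.
    """
    win_count = defaultdict(int)
    win_sum = defaultdict(float)
    for home, visitor, p_home in games:
        for team, p in ((home, p_home), (visitor, 1.0 - p_home)):
            win_sum[team] += p               # WinSum: expected number of wins
            win_count[team] += int(p > 0.5)  # WinCount: games where team is favored
    return dict(win_count), dict(win_sum)


# Only the three games discussed in this post, for illustration.
games = [
    ("SEA", "GB", 0.673),
    ("BAL", "CIN", 0.62),
    ("CIN", "BAL", 0.57),
]

win_count, win_sum = season_summary(games)
```

In this tiny example, Baltimore and Cincinnati each get a WinCount of 1, but their WinSums differ slightly, which is the kind of information WinCount throws away.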

Teams appear in the table in descending order of WinSum, which I consider the single best estimate in this table of a team’s 2014 performance. It’s interesting (to me, anyway) to see how the rank order changes from the survey to the win totals because of differences in strength of schedule. So, for example, the Patriots ranked 4th in the wiki survey, but they get the second-highest expected number of wins this year (9.8), just behind the Seahawks (9.9). Meanwhile, the Steelers scored 16th in the wiki survey, but they rank 11th in expected number of wins with an 8.4. That’s a smidgen better than the Cincinnati Bengals (8.3) and not much worse than the Baltimore Ravens (9.0), suggesting an even tighter battle for the AFC North division title than the wiki survey results alone would imply.

There are a lot of other interesting quantities we could extract from the results of the game-level simulations, but that’s all I’ve got time to do now. If you want to poke around in the original data and simulation results, you can find them all in a .csv on my Google Drive (here). I’ve also posted a version of the R script I used to generate the game-level and season-level forecasts on Github (here).

At this point, I don’t have plans to try to update the forecasts during the season, but I will be seeing how the preseason predictions fare and occasionally reporting the results here. Meanwhile, if you have suggestions on other ways to use these data or to improve these forecasts, please leave a comment here on the blog.

* The version of this table I initially posted had an error in the WikiRank column where 18 was skipped and the rankings ran to 33. This version corrects that error. Thanks to commenter C.P. Liberatore for pointing it out.

## athenarcarson9

/ September 4, 2014

I wonder what a team mascot for [redacted] would look like. A big black blob, perhaps?

## Texas Cowman

/ September 4, 2014

Wow, I remember trying to predict game scores in the 1980s from the year before. All of our data came in cheap pulp paper magazines. It amazed me how many Super Bowl winners didn’t even win their division in the next year.

Dallas, one of the wealthiest teams, is rated 26th and G. Bay, a fan-owned team, is no. 5.

## Welton

/ September 5, 2014

Neat, Jay! You could also scrape team rankings from experts, average those ranks, and then combine those rankings with the ones from the PWC survey. Perhaps weighted .6 for experts and .4 for survey. Do you have a sense of how your survey compares to, say, ESPN preseason ranks?

## dartthrowingchimp

/ September 5, 2014

Thanks, Welton.

If my objective were to produce the single-best set of forecasts I could and I had unlimited time, then the strategy you suggest sounds like a smart one to me. The latter’s not happening for obvious reasons, but I also should have been clearer about my (limited) goals.

For this toy project, I’m less interested in developing the most accurate forecasts possible–something a lot of people with a lot of financial backing are already doing–than I am in seeing how much predictive power we can squeeze out of scores from that wiki survey taken before the season even starts. Those results will tell us something interesting about the usefulness of that method, and crowdsourcing more generally, for prediction. If the method works reasonably well for this task–and, from last year’s NFL survey, it looks like it might–then it could be applied to lots of other problems, some of them less “fun.”

## CP Liberatore

/ September 5, 2014

Under WikiRank, why is there no 18th ranked team and how can the Titans be ranked 33rd out of 32 teams?

## dartthrowingchimp

/ September 5, 2014

Whoops, thanks for pointing that out. It looks like I appended the rankings to the ordered survey results before deleting the row for a user-added “idea.” Someone tried to add “San Francisco” to the list of teams, apparently not realizing how the pairwise part of the survey worked. When I download the score data from All Our Ideas, it includes rows for these user-added ideas, even if I didn’t approve their addition to the list on which people actually vote. So it doesn’t affect the voting and scoring, but it shows up in the data with a score of 50.00 (so 18th here), and that bumped the rest of the list off by one. I will fix it soon and post a corrected version.

## dartthrowingchimp

/ September 5, 2014

Now corrected. Thanks again.

## cpaulliberatore

/ September 5, 2014

Well done. Very interesting. Thanks!
