Yes, By Golly, I Am Ready for Some Football

The NFL’s 2015 season sort of got underway last night with the Hall of Fame Game. Real preseason play doesn’t start until this weekend, and the first kickoff of the regular season is still a month away.

No matter, though — I’m taking the Hall of Fame Game as my cue to launch this year’s football forecasting effort. As it has for the past two years (see here and here), the process starts with me asking you to help assess the strength of this year’s teams by voting in a pairwise wiki survey:

In the 2015 NFL season, which team will be better?

That survey produces scores on a scale of 0–100. Those scores will become the crucial inputs into simulations based on a simple statistical model estimated from the past two years’ worth of survey data and game results. Using an R function I wrote, I’ve determined that I should be able to improve the accuracy of my forecasts a bit this year by basing them on a mixed-effects model with random intercepts to account for variation in home-team advantages across the league. Having another season’s worth of predicted and actual outcomes should help, too; with two years on the books, my model-training sample has doubled.
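For the statistically curious, here’s a minimal sketch of how a model like that could be fit and simulated in R with the lme4 package. This is my illustration, not the actual code; the data frame and column names (games, net.score, rating.diff, home.team) are made-up placeholders:

    # Sketch of a mixed-effects model with random intercepts by home team.
    # net.score is home score minus visitor score; rating.diff is the
    # difference in the teams' wiki-survey scores. All names are placeholders.
    library(lme4)

    mod <- lmer(net.score ~ rating.diff + (1 | home.team), data = games)

    # Simulate 1,000 net scores for a hypothetical upcoming game to get the
    # home team's win probability and the implied line.
    new.game <- data.frame(rating.diff = 10, home.team = "Texans")
    sims <- simulate(mod, nsim = 1000, newdata = new.game,
                     allow.new.levels = TRUE)
    mean(sims > 0)        # home team's win probability
    median(unlist(sims))  # median simulated net score, i.e., the line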

An improvement in accuracy would be great, but I’m also excited about using RStudio’s Shiny to build a web page that will let you explore the forecasts at a few levels: by game, by team, and by week. Here’s a screenshot of the game-level tab from a working version using the 2014 data. It plots the distribution of the net scores (home – visitor) from the 1,000 simulations, and it reports win probabilities for both teams and a line (the median of the simulated net scores).

[Screenshot: the game-level tab of the forecast app, from a working version built on 2014 data]

The “By team” tab lets you pick a team to see a plot of the forecasts for all 16 of their games, along with their predicted wins (count of games with win probabilities over 0.5) and expected wins (sum of win probabilities for all games) for the year. The “By week” tab (shown below) lets you pick a week to see the forecasts for all the games happening in that slice of the season. Before the season starts, I plan to add annotations to the plot reporting the lines those forecasts imply (e.g., Texans by 7).

[Screenshot: the “By week” tab of the forecast app]
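For readers curious about the plumbing, here’s a bare-bones sketch of how an app with those three tabs hangs together in Shiny. The real app has more moving parts; the object names here are placeholders, and the plotting code is omitted:

    # Skeleton of a Shiny app with the three tabs described above.
    # Object names are placeholders; plotting code is omitted.
    library(shiny)

    ui <- fluidPage(
      titlePanel("2015 NFL Forecasts"),
      tabsetPanel(
        tabPanel("By game", plotOutput("game.plot")),
        tabPanel("By team", plotOutput("team.plot")),
        tabPanel("By week", plotOutput("week.plot"))
      )
    )

    server <- function(input, output) {
      # Each output slot would render the relevant slice of the
      # simulation results here.
    }

    shinyApp(ui = ui, server = server)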

Of course, the quality of the forecasts displayed in that app will depend heavily on participation in the wiki survey. Without a diverse and well-informed set of voters, it will be hard to do much better than guessing that each team will do as well this year as it did last year. So, please vote here; please share this post or the survey link with friends and family who know something about pro football; and please check back in a few weeks for the results.

What Should the U.S. Do Now in Syria?

You tell me.

To help you do that, I’ve created a pairwise wiki survey on All Our Ideas. Click HERE to participate. You can vote on the options I listed or add your own.

Results are updated in real time. Just click on the View Results tab to see what the crowd is saying so far.

Before you add an idea, make sure it isn’t already covered in the existing set by clicking on the View Results tab and then the View All button at the bottom of the list.

Using Wiki Surveys to Forecast Rare Events

Pro football starts back up for real in just a few weeks. Who’s going to win the next Super Bowl?

This is a hard forecasting problem. The real action hasn’t even started yet, so most of the information we have about how a team will play this season is just extrapolated from other recent seasons, and even without big personnel changes, team performance can vary quite a bit from year to year. Also, luck plays a big role in pro football; one bad injury to a key player or one unlucky break in a playoff game can visibly bend (or end) the arc of a team’s season. I suspect that most fans could do a pretty good job sorting teams now by their expected strength, but correctly guessing exactly who will win the championship is a much tougher nut to crack.

Of course, that doesn’t mean people aren’t trying. PredictWise borrows from online betting site Betfair to give us one well-informed set of forecasts about that question. Here’s a screenshot from this morning (August 11, 2013) of PredictWise’s ten most likely winners of Super Bowl XLVIII:

[Screenshot: PredictWise’s ten most likely winners of Super Bowl XLVIII, August 11, 2013]

I’m not trying to make the leap into sports bookmaking, but I am interested in hard forecasting questions, and I’m currently using this Super Bowl-champs question as a toy problem to explore how we might apply a relatively new crowdsourced survey technology to forecasting rare events. So far, I think the results look promising.

The technology is a pairwise wiki survey. It’s being developed by a Princeton-based research project called All Our Ideas, and according to its creators, here’s how it works:

[A pairwise wiki survey] consists of a single question with many possible answer items. Respondents can participate in a pairwise wiki survey in two ways: first, they can make pairwise comparisons between items (i.e., respondents vote between item A and item B); and second, they can add new items that are then presented to future respondents.

The resulting pairwise votes are converted into aggregate ratings using a Bayesian hierarchical model that estimates collective preferences most consistent with the observed data.

Pairwise wiki surveys weren’t designed specifically for forecasting, but I think we can readily adapt them to versions of that task that involve comparisons of risk. Instead of asking which item in a pair respondents prefer, we can ask them which item in a pair is more likely. The resulting scores won’t quite be the forecasts we’re looking for, but they’ll contain a lot of useful information about rank ordering and relative likelihoods.

My Super Bowl survey has only been running for a few days now, but I’ve already got more than 1,300 votes, and the results it’s produced so far look pretty credible to me. Here’s a screenshot of the top 10 from my wiki survey on the next NFL champs, as of 8 AM on August 11, 2013. As you can see, the top 10 is nearly identical to the top 10 at Betfair—they’ve got the Bengals where my survey has the NY Giants—and even within the top 10, the teams sort pretty similarly.

[Screenshot: the top 10 from my wiki survey on the next NFL champs, as of 8 AM on August 11, 2013]

The 0-100 scores in that chart aren’t estimates of the probability that a team will win the Super Bowl. Because only one team can win, those estimates would have to sum to 100 across all 32 teams in the league. Instead, the scores shown here are the model-estimated chances that each team will win if pitted against another team chosen at random. As such, they’re better thought of as estimates of relative strength with an as-yet unspecified relationship to the probability of winning the championship.

This scalar measure of relative strength will often be interesting on its own, but for forecasting applications, we’d usually prefer to have these values expressed as probabilities. Following PredictWise, we can get there by summing all the scores and treating that sum as the denominator in a likelihood ratio that behaves like a true probability. For example, when that screenshot of my wiki survey was taken, the scores across all 32 NFL teams summed to 1,607, so the estimated probability of the Atlanta Falcons winning Super Bowl XLVIII was 5.4% (87/1,607), while the chances that my younger son’s Ravens will repeat were pegged at about 4.9% (79/1,607).
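In R, that conversion is a couple of lines. The two scores below are the ones quoted above; the other 30 teams are omitted for brevity:

    # Convert wiki-survey scores to championship probabilities by dividing
    # each score by the sum across all 32 teams (1,607 at the time).
    scores <- c(falcons = 87, ravens = 79)  # other 30 teams omitted
    total <- 1607
    round(scores / total, 3)
    #> falcons  ravens
    #>   0.054   0.049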

For problems with a unique outcome, this conversion is easy to do, because the contours of the event space are known in advance. As the immortals in Highlander would have it, “There can be only one.”

Things get tougher if we want to apply this technique to problems where there isn’t a fixed number of events—say, trying to anticipate where coups or insurgencies or mass atrocities are likely to happen in the coming year. One way to extend the method to these kinds of problems would be to use historical data to identify the base rate of relevant events and then use that base rate as a multiplier in the transformation math, as follows:

predicted probability = base rate * [ score / (sum of scores) ]

When a rare-events model is well calibrated, the sum of the predicted probabilities it produces should roughly equal the number of events that actually occur. The approach I’m proposing just works that logic in reverse, starting with the (reasonable) assumption that the base rate is a good forecast of the number of events that will occur and then inflating or deflating the estimated probabilities accordingly.

For example, say I’m interested in forecasting onsets of state-sponsored mass killing around the world, and I know the annual base rate of these events over the past few decades has only been about 1.2. I could use a pairwise wiki survey to ask respondents “Which country is more likely to experience an onset of state-sponsored mass killing in 2014?” and get scores like the ones in the All Our Ideas chart above. To convert the score for Country X to a predicted probability, I could sum the resulting scores for all countries, divide Country X’s score by that sum, and then multiply the result by 1.2.
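Here’s that arithmetic in R. The 1.2 base rate comes from the example above; the country names and scores are invented for illustration:

    # Base-rate-adjusted conversion of survey scores to predicted
    # probabilities. Scores are invented; the 1.2 base rate is from above.
    base.rate <- 1.2
    scores <- c(country.x = 85, country.y = 60, country.z = 40)  # etc.
    pred.probs <- base.rate * scores / sum(scores)
    sum(pred.probs)  # by construction, equals the base rate

One nice property of this construction: the predicted probabilities sum to the base rate, which is exactly what we’d expect from a well-calibrated rare-events forecast.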

This process might seem a bit ad hoc, but I think it’s one reasonable solution to a tough problem. In fact, this is basically the same thing that happens in a logistic regression model, which statisticians (and wannabes like me) often use to forecast discrete events. In the equation we get from a logistic regression model, the intercept captures information about the base rate and uses that as a starting point for all of the responses, which are initially expressed as log odds. The vector of predictors and the associated weights just slides the log odds up or down from that benchmark, and a final operation converts those log odds to a familiar 0-1 probability.
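To make the analogy concrete, here’s a toy version in R with made-up numbers; qlogis and plogis are base R’s logit and inverse-logit functions:

    # Toy illustration of the logistic-regression analogy. The intercept
    # anchors the base rate on the log-odds scale; the predictor term
    # slides the log odds up or down; plogis converts back to probability.
    intercept <- qlogis(0.01)           # a 1% base rate as log odds
    log.odds  <- intercept + 1.5 * 0.8  # made-up weight * predictor value
    plogis(log.odds)                    # back to a 0-1 probability, ~0.032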

On the football problem, I would expect Betfair to be more accurate than my wiki survey, because Betfair’s odds are based on the behavior of people who are putting money on the line. For rare events in international politics, though, there is no Betfair equivalent. In situations where statistical modeling is inefficient or impossible—or we just want to know what some pool of respondents believe the risks are—I think this wiki-survey approach could be a useful tool.
