Estimating NFL Team-Specific Home-Field Advantage

This morning, I tinkered a bit with my pro-football preseason team strength survey data from 2013 and 2014 to see what other simple things I might do to improve the accuracy of forecasts derived from future versions of them.

My first idea is to go beyond a generic estimate of home-field advantage—about 3 points, according to my and everyone else’s estimates—with team-specific versions of that quantity. The intuition is that some venues confer a bigger advantage than others. For example, I would guess that Denver enjoys a bigger home-field edge than most teams because their stadium is at high altitude. The Broncos live there, so they’re used to it, but visiting teams have to adapt, and that process supposedly takes about a day for every 1,000 feet over 3,000. Some venues are louder than others, and that noise is often dialed up when visiting teams would prefer some quiet. And so on.

To explore this idea, I’m using a simple hierarchical linear model to estimate team-specific intercepts after taking preseason estimates of relative team strength into account. The line of R code used to estimate the model requires the lme4 package and looks like this:

mod1 <- lmer(score.raw ~ wiki.diff + (1 | home_team), results)

Where

score.raw = home_score - visitor_score
wiki.diff = home_wiki - visitor_wiki

Those wiki vectors are the team strength scores estimated from preseason pairwise wiki surveys. The ‘results’ data frame includes scores for all regular and postseason games from those two years so far, courtesy of devstopfix’s NFL results repository on GitHub (here). Because the net game and strength scores are both ordered home to visitor, we can read those random intercepts for each home team as estimates of team-specific home advantage. There are probably other sources of team-specific bias in my data, so those estimates are going to be pretty noisy, because I think it’s a reasonable starting point.

My initial results are shown in the plot below, which I get with these two lines of code, the second of which requires the lattice package:

ha1 <- ranef(mod1, condVar=TRUE)
dotplot(ha1)

Bear in mind that the generic (fixed) intercept is 2.7, so the estimated home-field advantage for each team is what’s shown in the plot plus that number. For example, these estimates imply that my Ravens enjoy a net advantage of about 3 points when they play in Baltimore, while their division-rival Bengals are closer to 6.

home.advantage.estimates

In light of DeflateGate, I guess I shouldn’t be surprised to see the Pats at the top of the chart, almost a whole point higher than the second-highest team. Maybe their insanely home low fumble rate has something to do with it.* I’m also heartened to see relatively high estimates for Denver, given the intuition that started this exercise, and Seattle, which I’ve heard said enjoys an unusually large home-field edge. At the same time, I honestly don’t know what to make of the exceptionally low estimates for DC and Jacksonville, who appear from these estimates to suffer a net home-field disadvantage. That strikes me as odd and undercuts my confidence in the results.

In any case, that’s how far my tinkering took me today. If I get really bored motivated, I might try re-estimating the model with just the 2013 data and then running the 2014 preseason survey scores through that model to generate “forecasts” that I can compare to the ones I got from the simple linear model with just the generic intercept (here). The point of the exercise was to try to get more accurate forecasts from simple models, and the only way to test that is to do it. I’m also trying to decide if I need to cross these team-specific effects with season-specific effects to try to control for differences across years in the biases in the wiki survey results when estimating these team-specific intercepts. But I’m not there yet.

* After I published this post, Michael Lopez helpfully pointed me toward a better take on the Patriots’ fumble rate (here), and Mo Patel observed that teams manage their own footballs on the road, too, so that particular tweak—if it really happened—wouldn’t have a home-field-specific effect.

Advertisements

Turning Crowdsourced Preseason NFL Strength Ratings into Game-Level Forecasts

For the past week, nearly all of my mental energy has gone into the Early Warning Project and a paper for the upcoming APSA Annual Meeting here in Washington, DC. Over the weekend, though, I found some time for a toy project on forecasting pro-football games. Here are the results.

The starting point for this toy project is a pairwise wiki survey that turns a crowd’s beliefs about relative team strength into scalar ratings. Regular readers will recall that I first experimented with one of these before the 2013-2014 NFL season, and the predictive power wasn’t terrible, especially considering that the number of participants was small and the ratings were completed before the season started.

This year, to try to boost participation and attract a more knowledgeable crowd of respondents, I paired with Trey Causey to announce the survey on his pro-football analytics blog, The Spread. The response has been solid so far. Since the survey went up, the crowd—that’s you!—has cast nearly 3,400 votes in more than 100 unique user sessions (see the Data Visualizations section here).

The survey will stay open throughout the season, but that doesn’t mean it’s too early to start seeing what it’s telling us. One thing I’ve already noticed is that the crowd does seem to be updating in response to preseason action. For example, before the first round of games, I noticed that the Baltimore Ravens, my family’s favorites, were running mid-pack with a rating of about 50. After they trounced the defending NFC champion 49ers in their preseason opener, however, the Ravens jumped to the upper third with a rating of 59. (You can always see up-to-the-moment survey results here, and you can cast your own votes here.)

The wiki survey is a neat way to measure team strength. On their own, though, those ratings don’t tell us what we really want to know, which is how each game is likely to turn out, or how well our team might be expected to do this season. The relationship between relative strength and game outcomes should be pretty strong, but we might want to consider other factors, too, like home-field advantage. To turn a strength rating into a season-level forecast for a single team, we need to consider the specifics of its schedule. In game play, it’s relative strength that matters, and some teams will have tougher schedules than others.

A statistical model is the best way I can think to turn ratings into game forecasts. To get a model to apply to this season’s ratings, I estimated a simple linear one from last year’s preseason ratings and the results of all 256 regular-season games (found online in .csv format here). The model estimates net score (home minus visitor) from just one feature, the difference between the two teams’ preseason ratings (again, home minus visitor). Because the net scores are all ordered the same way and the model also includes an intercept, though, it implicitly accounts for home-field advantage as well.

The scatterplot below shows the raw data on those two dimensions from the 2013 season. The model estimated from these data has an intercept of 3.1 and a slope of 0.1 for the score differential. In other words, the model identifies a net home-field advantage of 3 points—consistent with the conventional wisdom—and it suggests that every point of advantage on the wiki-survey ratings translates into a net swing of one-tenth of a point on the field. I also tried a generalized additive model with smoothing splines to see if the association between the survey-score differential and net game score was nonlinear, but as the scatterplot suggests, it doesn’t seem to be.

2013 NFL Games Arranged by Net Game Score and Preseason Wiki Survey Rating Differentials

2013 NFL Games Arranged by Net Game Score and Preseason Wiki Survey Rating Differentials

In sample, the linear model’s accuracy was good, not great. If we convert the net scores the model postdicts to binary outcomes and compare those postdictions to actual outcomes, we see that the model correctly classifies 60 percent of the games. That’s in sample, but it’s also based on nothing more than home-field advantage and a single preseason rating for each team from a survey with a small number of respondents. So, all things considered, it looks like a potentially useful starting point.

Whatever its limitations, that model gives us the tool we need to convert 2014 wiki survey results into game-level predictions. To do that, we also need a complete 2014 schedule. I couldn’t find one in .csv format, but I found something close (here) that I saved as text, manually cleaned in a minute or so (deleted extra header rows, fixed remaining header), and then loaded and merged with a .csv of the latest survey scores downloaded from the manager’s view of the survey page on All Our Ideas.

I’m not going to post forecasts for all 256 games—at least not now, with three more preseason games to learn from and, hopefully, lots of votes yet to be cast. To give you a feel for how the model is working, though, I’ll show a couple of cuts on those very preliminary results.

The first is a set of forecasts for all Week 1 games. The labels show Visitor-Home, and the net score is ordered the same way. So, a predicted net score greater than 0 means the home team (second in the paired label) is expected to win, while a predicted net score below 0 means the visitor (first in the paired label) is expected to win. The lines around the point predictions represent 90-percent confidence intervals, giving us a partial sense of the uncertainty around these estimates.

Week 1 Game Forecasts from Preseason Wiki Survey Results on 10 August 2014

Week 1 Game Forecasts from Preseason Wiki Survey Results on 10 August 2014

Of course, as a fan of particular team, I’m most interested in what the model says about how my guys are going to do this season. The next plot shows predictions for all 16 of Baltimore’s games. Unfortunately, the plotting command orders the data by label, and my R skills and available time aren’t sufficient to reorder them by week, but the information is all there. In this plot, the dots for the point predictions are colored red if they predict a Baltimore win and black for an expected loss. The good news for Ravens fans is that this plot suggests an 11-5 season, good enough for a playoff berth. The bad news is that an 8-8 season also lies within the 90-percent confidence intervals, so the playoffs don’t look like a lock.

2014 Game-Level Forecasts for the Baltimore Ravens from 10 August 2014 Wiki Survey Scores

2014 Game-Level Forecasts for the Baltimore Ravens from 10 August 2014 Wiki Survey Scores

So that’s where the toy project stands now. My intuition tells me that the predicted net scores aren’t as well calibrated as I’d like, and the estimated confidence intervals surely understate the true uncertainty around each game (“On any given Sunday…”). Still, I think this exercise demonstrates the potential of this forecasting process. If I were a betting man, I wouldn’t lay money on these estimates. As an applied forecaster, though, I can imagine using these predictions as priors in a more elaborate process that incorporates additional and, ideally, more dynamic information about each team and game situation over the course of the season. Maybe my doppelganger can take that up while I get back to my day job…

Postscript. After I published this post, Jeff Fogle suggested via Twitter that I compare the Week 1 forecasts to the current betting lines for those games. The plot below shows the median point spread from an NFL odds-aggregating site as blue dots on top of the statistical forecasts already shown above. As you can see, the statistical forecasts are tracking the betting lines pretty closely. There’s only one game—Carolina at Tampa Bay—where the predictions from the two series fall on different sides of the win/loss line, and it’s a game the statistical model essentially sees as a toss-up. It’s also reassuring that there isn’t a consistent direction to the differences, so the statistical process doesn’t seem to be biased in some fundamental way.

Week 1 Game-Level Forecasts Compared to Median Point Spread from Betting Sites on 11 August 2014

Week 1 Game-Level Forecasts Compared to Median Point Spread from Betting Sites on 11 August 2014

Two Forecasting Lessons from a Crazy Football Season

My younger son is a huge fan of the Baltimore Ravens, and his enthusiasm over the past several years has converted me, so we had a lot of fun (and gut-busting anxiety) watching the Super Bowl on Sunday.

As a dad and fan, my favorite part of the night was the Baltimore win. As a forecaster, though, my favorite discovery of the night was a web site called Advanced NFL Stats, one of a budding set of quant projects applied to the game of football. Among other things, Advanced NFL Stats produces charts of the probability that either team will win every pro game in progress, including the Super Bowl. These charts are apparently based on a massive compilation of stats from games past, and they are updated in real time. As we watched the game, I could periodically refresh the page on my mobile phone and give us a fairly reliable, up-to-the-minute forecast of the game’s outcome. Since the Super Bowl confetti has settled, I’ve spent some time poking through archived charts of the Ravens’ playoff run, and that exercise got me thinking about two lessons for forecasters.

1. Improbable doesn’t mean impossible.

To get to the Super Bowl, the Ravens had to beat the Denver Broncos in the divisional round of the playoffs. Trailing by seven with 3:12 left in that game, the Ravens turned the ball over to Denver on downs at the Broncos’ 31-yard line. To win from there, the Ravens would need a turnover or quick stop; then a touchdown; then either a successful two-point conversion or a first score in overtime.

As the chart below shows, the odds of all of those things coming together were awfully slim. At that point—just before “Regulation” on the chart’s bottom axis—Advanced NFL Stats’ live win-probability graph gave the Ravens roughly a 1% chance of winning. Put another way, if the game could be run 100 times from that position, we would only expect to see Baltimore win once.

ravens_broncos_realtime_forecast

Well, guess what happened? The one-in-a-hundred event, that’s what. Baltimore got the quick stop they needed, Denver punted, Joe Flacco launched a 70-yard bomb down the right sideline to Jacoby Jones for a touchdown, the Ravens pushed the game into overtime, and two minutes into the second extra period at Mile High Stadium, Justin Tucker booted a 47-yard field goal to carry Baltimore back to the AFC Championship.

For Ravens’ fans, that outcome was a %@$# miracle. For forecasters, it was a great reminder that even highly unlikely events happen sometimes. When Nate Silver’s model indicates on the eve of the 2012 election that President Obama has a 91% chance of winning, it isn’t saying that Obama is going to win. It’s saying he’s probably going to win, and the Ravens-Broncos game reminds us that there’s an important difference. Conversely, when a statistical model of rare events like coups or mass killings identifies certain countries as more susceptible than others, it isn’t necessarily suggesting that those highest-risk cases are definitely going to suffer those calamities. When dealing with events as rare as those, even the most vulnerable cases will escape most years without a crisis.

The larger point here is one that’s been made many times but still deserves repeating: no single probabilistic forecast is plainly right and wrong. A sound forecasting process will reliably distinguish the more likely from the less likely, but it won’t attempt to tell us exactly what’s going to happen in every case. Instead, the more accurate the forecasts, the more closely the frequency of real-world outcomes or events will track the predicted probabilities assigned to them. If a meteorologist’s model is really good, we should end up getting wet roughly half of the times she tells us there’s a 50% chance of rain. And almost every time the live win-probability graph gives a football team a 99% chance of winning, they will go on to win that game—but, as my son will happily point out, not every time.

2. The “obvious” indicators aren’t always the most powerful predictors.

Take a look at the Advanced NFL Stats chart below, from Sunday’s Super Bowl. See that sharp dip on the right, close to the end? Something really interesting happened there: late in the game, Baltimore led on score (34-29) but trailed San Francisco in its estimated probability of winning (about 45%).

ravens_49ers_realtime_forecast

How could that be? Consideration of the likely outcomes of the next two possessions makes it clearer. At the time, San Francisco had a first-and-goal situation from Baltimore’s seven yard line. Teams with four shots at the end zone from seven yards out usually score touchdowns, and teams that get the ball deep in their own territory with a two- or three-point deficit and less than two minutes to play usually lose. In that moment, the live forecast confirmed the dread that Ravens fans were feeling in our guts: even though San Francisco was still trailing, the game had probably slipped away from Baltimore.

I think there’s a useful lesson for forecasters in that peculiar situation: the most direct indicators don’t tell the whole story. In football, the team with a late-game lead is usually going to win, but Advanced NFL Stats’ data set and algorithm have uncovered at least one situation where that’s not the case.

This lesson also applies to efforts to forecasts political processes, like violent conflict and regime collapse. With the former, we tend to think of low-level violence as the best predictor of future civil wars, but that’s not always true. It’s surely a valuable piece of information, but there are other sources of positive and negative feedback that might rein in incipient violence in some cases and produce sudden eruptions in others. Ditto for dramatic changes in political regimes. Eritrea, for example, recently had some sort of mutiny and North Korea did not, but that doesn’t necessarily mean the former is closer to breaking down than the latter. There may be features of the Eritrean regime that will allow it to weather those challenges and aspects of the North Korean regime that predispose it to more abrupt collapse.

In short, we shouldn’t ignore the seemingly obvious signals, but we should be careful to put them in their proper context, and the results will sometimes be counter-intuitive.

Oh, and…THIS:

flacco_confetti

  • Follow me on Twitter

  • Follow Dart-Throwing Chimp on WordPress.com
  • Enter your email address to follow this blog and receive notifications of new posts by email.

    Join 13,640 other followers

  • Archives

  • Advertisements
%d bloggers like this: