This morning, I tinkered a bit with my pro-football preseason team strength survey data from 2013 and 2014 to see what other simple things I might do to improve the accuracy of forecasts derived from future versions of those surveys.
My first idea is to go beyond a generic estimate of home-field advantage—about 3 points, according to my and everyone else’s estimates—with team-specific versions of that quantity. The intuition is that some venues confer a bigger advantage than others. For example, I would guess that Denver enjoys a bigger home-field edge than most teams because their stadium is at high altitude. The Broncos live there, so they’re used to it, but visiting teams have to adapt, and that process supposedly takes about a day for every 1,000 feet over 3,000. Some venues are louder than others, and that noise is often dialed up when visiting teams would prefer some quiet. And so on.
To explore this idea, I’m using a simple hierarchical linear model to estimate team-specific intercepts after taking preseason estimates of relative team strength into account. The line of R code used to estimate the model requires the lme4 package and looks like this:
mod1 <- lmer(score.raw ~ wiki.diff + (1 | home_team), results)
where
score.raw = home_score - visitor_score
wiki.diff = home_wiki - visitor_wiki
Those wiki vectors are the team strength scores estimated from preseason pairwise wiki surveys. The ‘results’ data frame includes scores for all regular and postseason games from those two years so far, courtesy of devstopfix’s NFL results repository on GitHub (here). Because the net game and strength scores are both ordered home to visitor, we can read those random intercepts for each home team as estimates of team-specific home advantage. There are probably other sources of team-specific bias in my data, so those estimates are going to be pretty noisy, but I think it’s a reasonable starting point.
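For anyone who wants to see the moving parts without my data, here is a minimal sketch of how a data frame like ‘results’ could be assembled and fed to the model. Everything in it is made up for illustration: the team names, the strength scores, the schedule, and the final scores are all hypothetical, and because the scores are pure noise the fitted numbers mean nothing (lme4 may even report a singular fit). Only the column names and the model formula come from the post.

```r
library(lme4)
set.seed(1)

# Hypothetical inputs: made-up teams and made-up preseason strength scores.
teams <- paste0("T", 1:10)
wiki  <- data.frame(team = teams, score = runif(10, 20, 80))

# Made-up schedule: every team hosts every other team once.
results <- expand.grid(home_team = teams, visitor_team = teams,
                       stringsAsFactors = FALSE)
results <- results[results$home_team != results$visitor_team, ]

# Attach each side's preseason strength score by team name.
results$home_wiki    <- wiki$score[match(results$home_team, wiki$team)]
results$visitor_wiki <- wiki$score[match(results$visitor_team, wiki$team)]

# Made-up final scores, just so the example runs end to end.
results$home_score    <- rpois(nrow(results), 24)
results$visitor_score <- rpois(nrow(results), 21)

# The two derived columns used in the model formula, both ordered home to visitor.
results$score.raw <- results$home_score - results$visitor_score
results$wiki.diff <- results$home_wiki - results$visitor_wiki

# Random intercept for each home team, as in the post.
mod1 <- lmer(score.raw ~ wiki.diff + (1 | home_team), data = results)
summary(mod1)
```

The match() lookups are just one way to join the survey scores to the game table; merge() would do the same job.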
My initial results are shown in the plot below, which I get with these two lines of code, the second of which requires the lattice package:
ha1 <- ranef(mod1, condVar=TRUE)
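Only the first of those two lines appears above. With lme4 and lattice, the usual way to plot random effects with their conditional variances is to call dotplot() on the object that ranef() returns, so the plotting step was presumably something along those lines. That is an assumption, not a recovered line. The sketch below shows the idiom on the sleepstudy data that ships with lme4, which stands in for the football data here.

```r
library(lme4)
library(lattice)

# sleepstudy is a built-in lme4 data set, used only as a stand-in.
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# condVar = TRUE keeps the conditional variances needed for interval bars.
re <- ranef(fm, condVar = TRUE)

# dotplot() on a ranef object returns a list of lattice plots,
# one per grouping factor; printing an element draws it.
p <- dotplot(re)
print(p[[1]])
```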
Bear in mind that the generic (fixed) intercept is 2.7, so the estimated home-field advantage for each team is what’s shown in the plot plus that number. For example, these estimates imply that my Ravens enjoy a net advantage of about 3 points when they play in Baltimore, while their division-rival Bengals are closer to 6.
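To make that addition concrete: the plot shows each team's offset from the generic intercept, and the team's estimated home edge is the sum of the two. A minimal sketch of that arithmetic follows, again using lme4's built-in sleepstudy data as a stand-in for the football data, so the object and column names are not the ones from my model.

```r
library(lme4)

# Stand-in model with one random intercept per group.
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

fixed.int <- fixef(fm)[["(Intercept)"]]           # the generic intercept
offsets   <- ranef(fm)$Subject[["(Intercept)"]]   # the values a dotplot would show
group.int <- fixed.int + offsets                  # group-specific intercepts

# coef() reports the same sums directly, so the two should agree.
all.equal(group.int, coef(fm)$Subject[["(Intercept)"]])
```

In the football model, the equivalent of coef(fm)$Subject would be coef(mod1)$home_team, whose intercept column holds the team-specific home advantages without any mental addition.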
In light of DeflateGate, I guess I shouldn’t be surprised to see the Pats at the top of the chart, almost a whole point higher than the second-highest team. Maybe their insanely low home fumble rate has something to do with it.* I’m also heartened to see relatively high estimates for Denver, given the intuition that started this exercise, and Seattle, which I’ve heard enjoys an unusually large home-field edge. At the same time, I honestly don’t know what to make of the exceptionally low estimates for DC and Jacksonville, who appear from these estimates to suffer a net home-field disadvantage. That strikes me as odd and undercuts my confidence in the results.
In any case, that’s how far my tinkering took me today. If I get really bored (er, motivated), I might try re-estimating the model with just the 2013 data and then running the 2014 preseason survey scores through that model to generate “forecasts” that I can compare to the ones I got from the simple linear model with just the generic intercept (here). The point of the exercise was to try to get more accurate forecasts from simple models, and the only way to test that is to do it. I’m also trying to decide if I need to cross these team-specific effects with season-specific effects to try to control for differences across years in the biases in the wiki survey results when estimating these team-specific intercepts. But I’m not there yet.
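That out-of-sample check could be sketched roughly as follows. This is not something I have run on the real data. The two "seasons" below are simulated with hypothetical teams, made-up strength scores, and made-up home edges, and mean absolute error is just one reasonable choice of accuracy measure. The only pieces taken from the post are the column names and the two model formulas.

```r
library(lme4)
set.seed(42)

# Hypothetical teams with made-up "true" home edges that persist across seasons.
teams   <- paste0("T", 1:12)
true.ha <- setNames(rnorm(12, mean = 3, sd = 2), teams)

# Simulate one season: every team hosts every other team once.
sim.season <- function(season) {
  wiki <- setNames(rnorm(12, mean = 50, sd = 10), teams)  # fake preseason scores
  g <- expand.grid(home_team = teams, visitor_team = teams,
                   stringsAsFactors = FALSE)
  g <- g[g$home_team != g$visitor_team, ]
  g$season    <- season
  g$wiki.diff <- unname(wiki[g$home_team] - wiki[g$visitor_team])
  g$score.raw <- 0.4 * g$wiki.diff + unname(true.ha[g$home_team]) +
    rnorm(nrow(g), sd = 10)
  g
}

train   <- sim.season(2013)
holdout <- sim.season(2014)

# Fit both models on the first season only.
mod.lm  <- lm(score.raw ~ wiki.diff, data = train)
mod.hlm <- lmer(score.raw ~ wiki.diff + (1 | home_team), data = train)

# Forecast the second season from its preseason scores.
pred.lm  <- predict(mod.lm, newdata = holdout)
pred.hlm <- predict(mod.hlm, newdata = holdout, allow.new.levels = TRUE)

# Compare accuracy with mean absolute error (smaller is better).
mae <- function(pred, obs) mean(abs(pred - obs))
c(generic       = mae(pred.lm,  holdout$score.raw),
  team.specific = mae(pred.hlm, holdout$score.raw))
```

As for crossing team and season effects, one possible specification is a term like (1 | home_team:season) alongside or in place of (1 | home_team), though with only two seasons of data I would not expect that to be estimated very precisely.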
* After I published this post, Michael Lopez helpfully pointed me toward a better take on the Patriots’ fumble rate (here), and Mo Patel observed that teams manage their own footballs on the road, too, so that particular tweak—if it really happened—wouldn’t have a home-field-specific effect.