Interactive 2015 NFL Forecasts

As promised in my last post, I’ve now built and deployed a web app that lets you poke through my preseason forecasts for the 2015 NFL regular season:

2015 NFL Forecasts

I learned several new tricks in the course of generating these forecasts and building this app, so the exercise served its didactic purpose. (You can find the code for the app here, on GitHub.) I also got lucky with the release of a new R package that solved a crucial problem I was having when I started to work on this project a couple of weeks ago. Open source software can be a wonderful thing.

The forecasts posted right now are based on results of the pairwise wiki survey through the morning of Monday, August 17. At that point, the survey had already logged upwards of 12,000 votes, triple the number cast in last year’s edition. This time around, I posted a link to the survey on the r/nfl subreddit, and that post produced a brief torrent of activity from what I hope was a relatively well-informed crowd.

The regular season doesn’t start until September, and I will update these forecasts at least once more before then. With so many votes already cast, though, the results will only change significantly if a) a large number of new votes are cast and b) those new votes differ substantially from the ones already cast, and those two conditions are unlikely to coincide.

One thing these forecasts help to illustrate is how noisy a game professional football is. By noisy, I mean hard to predict with precision. Even in games where one team is much stronger than the other, we still see tremendous variance in the simulated net scores and the associated outcomes. Heavy underdogs will win big every once in a while, and games we’d consider close when watching can produce a wide range of net scores.

Take, for example, the week 1 match-up between the Bears and Packers. Even though Chicago’s the home team, the simulation results (below) favor Green Bay by more than eight points. At the same time, those simulations also include a smattering of outcomes in which the Bears win by multiple touchdowns, and the peak of the distribution of simulations is pretty broad and flat. Some of that variance results from the many imperfections of the model and survey scores, but a lot of it is baked into the game, and plots of the predictive simulations nicely illustrate that noisiness.
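
For a feel for the arithmetic behind that noisiness, here’s a minimal sketch in R (not the app’s actual code), assuming the model favors Green Bay by 8.5 points and that game-level noise is roughly normal with a standard deviation of 13 points; both numbers are illustrative:

set.seed(42)
sims <- rnorm(10000, mean = -8.5, sd = 13)  # net score, Bears minus Packers; negative favors Green Bay
mean(sims > 0)                 # the underdog Bears still win about a quarter of the simulations
quantile(sims, c(0.05, 0.95))  # a 90-percent interval spanning more than 40 points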

Simulated Net Scores for the Week 1 Packers–Bears Game

The big thing that’s still missing from these forecasts is updating during the season. The statistical model that generates the predictive simulations takes just two inputs for each game — the difference between the two teams’ strength scores and the name of the home team — and, barring catastrophe, only one of those inputs can change as the season passes. I could leave the wiki survey running throughout the season, but the model that turns survey votes into scores doesn’t differentiate between recent and older votes, so updating the forecasts with the latest survey scores is unlikely to move the needle by much.*

I’m now hoping to use this problem as an entry point to learning about Bayesian updating and how to program it in R. Instead of updating the actual survey scores, we could treat the preseason scores as priors and then use observed game scores or outcomes to sequentially update estimates of them. I haven’t figured out how to implement this idea yet, but I’m working on it and will report back if I do.
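
To make that idea concrete, here’s a minimal sketch of a conjugate normal-normal update in R. Everything in it (the prior variance, the observation noise, the idea of mapping a game result onto the survey’s 0–100 scale) is an assumption for illustration, not a worked-out method:

# Treat the preseason survey score as a normal prior on team strength and
# each game as a noisy observation of strength on that same scale.
update_strength <- function(prior_mean, prior_var, obs, obs_var) {
  post_var  <- 1 / (1 / prior_var + 1 / obs_var)
  post_mean <- post_var * (prior_mean / prior_var + obs / obs_var)
  c(mean = post_mean, var = post_var)
}
update_strength(59, 25, obs = 70, obs_var = 100)  # e.g., prior score 59, one game implying strength 70

Applied sequentially, each week’s posterior becomes the next week’s prior, so recent games keep nudging the estimates while the preseason survey anchors them.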

* The pairwise wiki survey runs on open source software, and I can imagine modifying the instrument to give more weight to recent votes than older ones. Right now, I don’t have the programming skills to make those modifications, but I’m still hoping to find someone who might want to work with me, or just take it upon himself or herself, to do this.

2015 Tour de France Predictions

I like to ride bikes, I like to watch the pros race their bikes, and I make forecasts for a living, so I thought it would be fun to try to predict the outcome of this year’s Tour de France, which starts this Saturday and ends on July 26. I’m also interested in continuing to explore the predictive power of pairwise wiki surveys, a crowdsourcing tool that I’ve previously used to try to forecast mass-killing onsets, coup attempts, and pro football games, and that ESPN recently used to rank NBA draft prospects.

So, a couple of weeks ago, I used All Our Ideas to create a survey that asks, “Which rider is more likely to win the 2015 Tour de France?” I seeded the survey with the names of 11 riders—the 10 seen by bookmakers at Paddy Power as the most likely winners, plus Peter Sagan because he’s fun to watch—posted a link to the survey on Tumblr, and trolled for respondents on Twitter and Facebook. The survey got off to a slow start, but then someone posted a link to it in the r/cycling subreddit, and the votes came pouring in. As of this afternoon, the survey had garnered more than 4,000 votes in 181 unique user sessions that came from five continents (see the map below). The crowd also added a handful of other riders to the set under consideration, bringing the list up to 16.

Map of Voting Sessions in the 2015 Tour de France Wiki Survey

So how does that self-selected crowd handicap the race? The dot plot below shows the riders in descending order by their survey scores, which range from 0 to 100 and indicate the probability that the rider in question would beat a randomly chosen other rider for a randomly chosen respondent. In contrast to Paddy Power, which currently shows Chris Froome as the clear favorite and gives Nairo Quintana a slight edge over Alberto Contador, this survey sees Contador as the most likely winner (survey score of 90), followed closely by Froome (87) and, a little further back, by Quintana (80). Both sources put Vincenzo Nibali as fourth likeliest (73), with Tejay van Garderen (65) and Thibaut Pinot (51) in the next two spots, although Paddy Power has them in the opposite order. Below that, the distances between riders’ chances get smaller, but the wiki survey’s results still approximate the handicapping of the real-money markets pretty well.

2015 Tour de France Wiki Survey Scores by Rider

There are at least a couple of ways to try to squeeze some meaning out of those scores. One is to read the chart as a predicted finishing order for the 16 riders listed. That’s useful for something like a bike race, where we—well, some of us, anyway—care not only about who wins but also about where the other riders finish.

We can also try to convert those scores to predicted probabilities of winning. The chart below shows what happens when we do that by dividing each rider’s score by the sum of all scores and then multiplying the result by 100. The probabilities this produces are all pretty low and more tightly bunched than seems reasonable, but I’m not sure how else to do this conversion. I tried squaring and cubing the scores; the results came closer to what the betting-market odds suggest are the “right” values, but I couldn’t think of a principled reason to do that, so I’m not showing those here. If you know a better way to get from those model scores to well-calibrated win probabilities, please let me know in the comments.
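
For the record, the conversion itself is one line of R. A minimal sketch, abbreviated to four riders (with all 16 in the denominator, the probabilities come out lower still):

scores <- c(Contador = 90, Froome = 87, Quintana = 80, Nibali = 73)  # survey scores from the chart
win_prob <- 100 * scores / sum(scores)  # divide each score by the sum, multiply by 100
round(win_prob, 1)  # tightly bunched, as noted; scores^2 or scores^3 in place of scores spreads them out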

Predicted Win Probabilities from the 2015 Tour de France Wiki Survey Scores

So that’s what the survey says. After the Tour concludes in a few weeks, I’ll report back on how the survey’s predictions fared. Meanwhile, here’s wishing the athletes a crash-, injury-, and drug-free Tour. Judging by the other big races I’ve seen so far this year, it should be a great one to watch.

A Good Dream

The novel Station Eleven—an immediate addition to my short list of favorite books—imagines the world after the global political economy has disintegrated. A flu pandemic has killed almost all humans, and the ones who remain inhabit the kinds of traveling bands or small encampments that are only vaguely familiar to most of us. There is no gasoline, no Internet, no electricity.

“I dreamt last night I saw an airplane,” Dieter whispered. They were lying a few feet apart in the dark of the tent. They had only ever been friends—in a hazy way Kirsten thought of him as family—but her thirty-year-old tent had finally fallen apart a year ago and she hadn’t yet managed to find a new one. For obvious reasons she was no longer sharing a tent with Sayid, so Dieter, who had one of the largest tents in the Symphony, had been hosting her. Kirsten heard soft voices outside, the tuba and the first violin on watch. The restless movements of the horses, penned between the three caravans for safety.

“I haven’t thought of an airplane in so long.”

“That’s because you’re so young.” A slight edge to his voice. “You don’t remember anything.”

“I do remember things. Of course I do. I was eight.”

Dieter had been twenty years old when the world ended. The main difference between Dieter and Kirsten was that Dieter remembered everything. She listened to him breathe.

“I used to watch for it,” he said. “I used to think about the countries on the other side of the ocean, wonder if any of them had somehow been spared. If I ever saw an airplane, that meant that somewhere planes still took off. For a whole decade after the pandemic, I kept looking at the sky.”

“Was it a good dream?”

“In the dream I was so happy,” he whispered. “I looked up and there it was, the plane had finally come. There was still a civilization somewhere. I fell to my knees. I started weeping and laughing, and then I woke up.”

Leaving New Orleans by jet yesterday morning, only a couple of weeks after reading that book, I found that flying—with wifi on a tablet!—felt miraculous again. As we lifted away from the airport an hour after sunrise on a clear day, I could see a dozen freighters lined up on the Mississippi, a vast industrial plant of some kind billowing steam on the adjacent shore, and a railway spreading like capillaries as it ran out of the plant.

As we inhabit that world, it feels inevitable, but it was not. Our political economy is as natural as a termite mound, but it did not have to arise and cohere, to turn out like this—to turn out at all.

Nor does it have to persist. The first and only other time I visited New Orleans was in 2010, for the same conference in the same part of town—the Warehouse District, next to the river. Back then, a little closer to Katrina, visual reminders of the flood that had already happened gave that part of the city an eerie feel. I stayed in a hotel a half-mile south of the conference venue, and the walk to the Hilton led me past whole blocks that were still mostly empty, fresh coats of bright paint covering the facades that water had submerged five years before.

Now, with pictures in the news of tunnels scratched out of huge snow banks in Boston and Manhattan ringed by ice, it’s the future flood that haunts New Orleans in my mind as I walk back from an excursion to the French Quarter to get the best possible version of a drink made from boiled water and beans grown thousands of miles away, scores of Mardi Gras bead strings still hanging from some gutters. Climate change is “weirding” our weather, rendering the models we use to anticipate events like Katrina less and less reliable. A flood will happen again, probably sooner than we expect, and yet here everybody is, returning and rebuilding and cavorting right where all that water will want to go.

Self Points

For the second year in a row, Dart-Throwing Chimp won Best Blog (Individual) at the Online Achievement in International Studies awards, a.k.a. the Duckies (see below). Thank you for continuing to read and, apparently, for voting.

Duckie 2015

Before the awards, Eva Brittin-Snell, a student of IR at the University of Sussex, interviewed a few of last year’s winners, including me, about blogging on international affairs. You can read her post on the SAGE Connection blog, here.

Estimating NFL Team-Specific Home-Field Advantage

This morning, I tinkered a bit with my pro-football preseason team strength survey data from 2013 and 2014 to see what other simple things I might do to improve the accuracy of forecasts derived from future versions of them.

My first idea is to go beyond a generic estimate of home-field advantage—about 3 points, according to my and everyone else’s estimates—with team-specific versions of that quantity. The intuition is that some venues confer a bigger advantage than others. For example, I would guess that Denver enjoys a bigger home-field edge than most teams because their stadium is at high altitude. The Broncos live there, so they’re used to it, but visiting teams have to adapt, and that process supposedly takes about a day for every 1,000 feet over 3,000. Some venues are louder than others, and that noise is often dialed up when visiting teams would prefer some quiet. And so on.

To explore this idea, I’m using a simple hierarchical linear model to estimate team-specific intercepts after taking preseason estimates of relative team strength into account. The R code used to estimate the model requires the lme4 package and looks like this:

library(lme4)  # lmer() fits the mixed-effects model
mod1 <- lmer(score.raw ~ wiki.diff + (1 | home_team), data = results)

Where

score.raw = home_score - visitor_score
wiki.diff = home_wiki - visitor_wiki

Those wiki vectors are the team strength scores estimated from the preseason pairwise wiki surveys. The ‘results’ data frame includes scores for all regular-season and postseason games from those two years so far, courtesy of devstopfix’s NFL results repository on GitHub (here). Because the net game and strength scores are both ordered home minus visitor, we can read the random intercepts for each home team as estimates of team-specific home advantage. There are probably other sources of team-specific bias in my data, so those estimates are going to be pretty noisy, but I think they’re a reasonable starting point.

My initial results are shown in the plot below, which I get with the following lines of code (the dotplot() method for ranef objects comes via the lattice package):

library(lattice)  # provides the dotplot() method used below
ha1 <- ranef(mod1, condVar = TRUE)  # conditional modes of the team-specific random intercepts
dotplot(ha1)

Bear in mind that the generic (fixed) intercept is 2.7, so the estimated home-field advantage for each team is what’s shown in the plot plus that number. For example, these estimates imply that my Ravens enjoy a net advantage of about 3 points when they play in Baltimore, while their division-rival Bengals are closer to 6.
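
That addition is a one-liner in R; a minimal sketch, continuing from the ha1 object above:

hfa <- fixef(mod1)["(Intercept)"] + ha1$home_team[["(Intercept)"]]  # fixed + random intercepts
names(hfa) <- rownames(ha1$home_team)
sort(hfa, decreasing = TRUE)  # e.g., Baltimore lands near 3, Cincinnati closer to 6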

Estimated Team-Specific Home-Field Advantages (Random Intercepts)

In light of DeflateGate, I guess I shouldn’t be surprised to see the Pats at the top of the chart, almost a whole point higher than the second-highest team. Maybe their insanely low home fumble rate has something to do with it.* I’m also heartened to see relatively high estimates for Denver, given the intuition that started this exercise, and for Seattle, which is said to enjoy an unusually large home-field edge. At the same time, I honestly don’t know what to make of the exceptionally low estimates for DC and Jacksonville, which appear from these estimates to suffer a net home-field disadvantage. That strikes me as odd and undercuts my confidence in the results.

In any case, that’s how far my tinkering took me today. If I get really motivated, I might try re-estimating the model with just the 2013 data and then running the 2014 preseason survey scores through that model to generate “forecasts” that I can compare to the ones I got from the simple linear model with just the generic intercept (here). The point of the exercise is to get more accurate forecasts from simple models, and the only way to test that is to do it. I’m also trying to decide whether I need to cross these team-specific effects with season-specific effects, to control for year-to-year differences in the biases of the wiki survey results when estimating the team-specific intercepts. But I’m not there yet.

* After I published this post, Michael Lopez helpfully pointed me toward a better take on the Patriots’ fumble rate (here), and Mo Patel observed that teams manage their own footballs on the road, too, so that particular tweak—if it really happened—wouldn’t have a home-field-specific effect.

Post Mortem on 2014 Preseason NFL Forecasts

Let’s end the year with a whimper, shall we?

Back in September (here), I used a wiki survey to generate a preseason measure of pro-football team strength and then ran that measure through a statistical model and some simulations to gin up forecasts for all 256 games of the 2014 regular season. That season ended on Sunday, so now we can see how those forecasts turned out.

The short answer: not awful, but not so great, either.

To assess the data and model’s predictive power, I’m going to focus on predicted win totals. Based on my game-level forecasts, how many contests was each team expected to win? Those totals nicely summarize the game-level predictions, and they are the focus of StatsbyLopez’s excellent post-season predictive review, here, against which I can compare my results.

StatsbyLopez used two statistics to assess predictive accuracy: mean absolute error (MAE) and mean squared error (MSE). The first is the average of the distance between each team’s projected and observed win totals. The second is the average of the square of those distances. MAE is a little easier to interpret—on average, how far off was each team’s projected win total?—while MSE punishes larger errors more than the first, which is nice if you care about how noisy your predictions are. StatsbyLopez used those stats to compare five sets of statistical predictions to the preseason betting line (Vegas) and a couple of simple benchmarks: last year’s win totals and a naive forecast of eight wins for everyone, which is what you’d expect to get if you just flipped a coin to pick winners.
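
Both stats are trivial to compute; a minimal sketch, assuming projected and observed are numeric vectors of win totals with one entry per team:

mae <- mean(abs(projected - observed))  # average miss, in wins
mse <- mean((projected - observed)^2)   # squaring punishes big misses more heavily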

Lopez’s post includes some nice column charts comparing those stats across sources, but it doesn’t include the stats themselves, so I’m going to have to eyeball his numbers and do the comparison in prose.

I summarized my forecasts two ways: 1) counts of the games each team had a better-than-even chance of winning, and 2) sums of each team’s predicted probabilities of winning.

  • The MAE for my whole-game counts was 2.48—only a little bit better than the ultra-naive eight-wins-for-everyone prediction and worse than everything else, including just using last year’s win totals. The MSE for those counts was 8.89, still worse than everything except the simple eights. For comparison, the MAE and MSE for the Vegas predictions were roughly 2.0 and 6.0, respectively.
  • The MAE for my sums was 2.31—about as good as the worst of the five “statsheads” Lopez considered, but still a shade worse than just carrying forward the 2013 win totals. The MSE for those summed win probabilities, however, was 7.05. That’s better than one of the sources Lopez considered and pretty close to two others, and it handily beats the two naive benchmarks.
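
For concreteness, here’s a minimal sketch of those two summaries, assuming a data frame preds with one row per game and hypothetical columns home and visitor (team names as character strings) plus p_home_win, the predicted probability of a home win:

# 1) Whole-game counts: call each game for whichever side has p > .5, then tally
preds$pick <- ifelse(preds$p_home_win > 0.5, preds$home, preds$visitor)
counts <- table(preds$pick)
# 2) Summed probabilities: a team's expected wins = the sum of its win probabilities
home_exp <- tapply(preds$p_home_win, preds$home, sum)
away_exp <- tapply(1 - preds$p_home_win, preds$visitor, sum)
sums <- home_exp + away_exp[names(home_exp)]  # assumes every team appears on both sides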

To get a better sense of how large the errors in my forecasts were and how they were distributed, I also plotted the predicted and observed win totals by team. In the charts below, the black dots are the predictions, and the red dots are the observed results. The first plot uses the whole-game counts; the second the summed win probabilities. Teams are ordered from left to right according to their rank in the preseason wiki survey.

Predicted (black) and observed (red) 2014 regular-season win totals by team, using whole-game counts

Predicted (black) and observed (red) 2014 regular-season win totals by team, using summed win probabilities

Substantively, those charts spotlight some things most football fans could already tell you: Dallas and Arizona were the biggest positive surprises of the 2014 regular season, while San Francisco, New Orleans, and Chicago were probably the biggest disappointments. Detroit and Buffalo also exceeded expectations, although only one of them made it to the postseason, while Tampa Bay, Tennessee, the NY Giants, and the Washington football team under-performed.

Statistically, it’s interesting but not surprising that the summed win probabilities do markedly better than the whole-game counts. Pro football is a noisy game, and we throw out a lot of information about the uncertainty of each contest’s outcome when we convert those probabilities into binary win/lose calls. In essence, those binary calls are inherently overconfident, so the win counts they produce are, predictably, much noisier than the ones we get by summing the underlying probabilities.
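
A stylized example makes the point. Give a team ten games that are each a 60-percent proposition, and the two summaries diverge sharply:

p <- rep(0.6, 10)  # ten games, each a 60-percent chance of winning
sum(p > 0.5)       # binary calls predict 10 wins
sum(p)             # summed probabilities predict 6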

In spite of its modest performance in 2014, I plan to repeat this exercise next year. The linear regression model I use to convert the survey results into game-level forecasts has home-field advantage and the survey scores as its only inputs. The 2014 version of that model was estimated from just a single prior season’s data, so doubling the size of the historical sample to 512 games will probably help a little. Like all survey results, my team-strength score depends on the pool of respondents, and I keep hoping to get a bigger and better-informed crowd to participate in that part of the exercise. And, most important, it’s fun!

The Political Power of Inertia

Political scientists devote a lot of energy to theorizing about dramatic changes—things like revolutions, coups, popular uprisings, transitions to democracy, and the outbreak of wars within and between states. These changes are fascinating and consequential, but they are also extremely rare. In politics, as in physics, inertia is a powerful force. Our imagination is drawn to change, but if we want to understand the world as it is, then we have to explain the prevalence of continuity as well.

Examples of inertia in politics are easy to find. War is justifiably a central concern for political science, but for many decades now, almost none of the thousands of potential wars within and between states have actually happened. Once a war does start, though, it often persists for years in spite of the tremendous costs involved. The international financial system suffers frequent and sometimes severe shocks and has no sovereign to defend it, and yet the basic structure of that system has persisted for decades. Whole journals are devoted to popular uprisings and other social movements, but they very rarely happen, and when they do, they often fail to produce lasting institutional change. For an array of important phenomena in the social sciences, by far the best predictor of the status of the system at time (t + 1) is the status of the system at time (t).

One field in which inertia gets its due is organization theory. A central theme in that neck of the intellectual woods is the failure of firms and agencies to adapt to changes in their environment and the search for patterns that might explain those failures. Some theories of institutional design at the level of whole political systems also emphasize stasis over change. Institutions are sometimes said to be “sticky,” meaning that they often persist in spite of evident flaws and available alternatives. As Paul Pierson observes, “Once established, patterns of political mobilization, the institutional ‘rules of the game,’ and even citizens’ basic ways of thinking about the political world will often generate self-reinforcing dynamics.”

In international relations and comparative politics, we see lots of situations in which actions that might improve the lot of one or more parties are not taken. These are situations in which inertia is evident, even though it appears to be counterproductive. We often explain failures to act in these situations as the result of collective action problems. As Mancur Olson famously observed, people, organizations, and other agents have diverse interests; action to try to produce change is costly; and the benefits of those costly actions are often diffuse. Under these circumstances, a tally of expected costs and benefits will often discourage agents from taking action, tempting them instead to forego those costs and free ride on the contributions of others.

Collective action problems are real and influential. Still, I wonder if our theories put too much emphasis on those system-level sources of inertia and too little on causes at the level of the individual. We like to think of ourselves as free and unpredictable, but humans really are creatures of habit. For example, a study published in 2010 in Science (here) used data sampled from millions of mobile-phone users to show that there is “a potential 93% average predictability” in where users go and when, “an exceptionally high value rooted in the inherent regularity of human behavior.” The authors conclude that,

Despite our deep-rooted desire for change and spontaneity, our daily mobility is, in fact, characterized by a deep-rooted regularity.

A related study (here) used mobility and survey data from Kenya and found essentially the same thing. Its authors reported that “mobility estimates are surprisingly robust to the substantial biases in phone ownership across different geographical and socioeconomic groups.” Apparently, this regularity is not unique to rich countries.

The microfoundations of our devotion to routine may be evident in neurobiology. Behavioral routines are physically expressed and reinforced in the development of neural pathways related to specific memories and actions, and in the thickening of the myelin sheaths that facilitate conduction along those pathways. The result is a virtuous or vicious circle, depending on the behavior and context. Athletes and musicians take advantage of this process through practice, but practice is mostly repetition, and repetition is a form of routine. Repetition begets habituation begets repetition.

This innate attachment to routine may contribute to political inertia. Norms and institutions are often regarded as clever solutions to collective action problems that would otherwise thwart our interests and aspirations. At least in part, those norms and institutions may also be social manifestations of an inborn and profound preference for routine and regularity.

In our theoretical imaginations, we privilege change over stasis. As alternative futures, however, the two are functionally equivalent, and stasis is vastly more common than change. In principle, our theories should cover both alternatives. In practice, that is very hard to do, and many of us choose to emphasize the dramatic over the routine. I wonder if we have chosen wrong.

For now, I’ll give the last word on this topic to Frank Rich. He wrote a nice essay for the October 20, 2014, issue of New York Magazine about an exercise in which he read his way back through the daily news from 1964 to compare it to the supposedly momentous changes afoot in 2014. His conclusion:

Even as we recognize that the calendar makes for a crude and arbitrary marker, we like to think that history visibly marches on, on a schedule we can codify.

The more I dove back into the weeds of 1964, the more I realized that this is both wishful thinking and an optical illusion. I came away with a new appreciation of how selective our collective memory is, and of just how glacially history moves.

Turning Crowdsourced Preseason NFL Strength Ratings into Game-Level Forecasts

For the past week, nearly all of my mental energy has gone into the Early Warning Project and a paper for the upcoming APSA Annual Meeting here in Washington, DC. Over the weekend, though, I found some time for a toy project on forecasting pro-football games. Here are the results.

The starting point for this toy project is a pairwise wiki survey that turns a crowd’s beliefs about relative team strength into scalar ratings. Regular readers will recall that I first experimented with one of these before the 2013-2014 NFL season, and the predictive power wasn’t terrible, especially considering that the number of participants was small and the ratings were completed before the season started.

This year, to try to boost participation and attract a more knowledgeable crowd of respondents, I paired with Trey Causey to announce the survey on his pro-football analytics blog, The Spread. The response has been solid so far. Since the survey went up, the crowd—that’s you!—has cast nearly 3,400 votes in more than 100 unique user sessions (see the Data Visualizations section here).

The survey will stay open throughout the season, but it’s not too early to start seeing what it’s telling us. One thing I’ve already noticed is that the crowd does seem to be updating in response to preseason action. For example, before the first round of games, I noticed that the Baltimore Ravens, my family’s favorites, were running mid-pack with a rating of about 50. After they trounced the 49ers in their preseason opener, however, the Ravens jumped to the upper third with a rating of 59. (You can always see up-to-the-moment survey results here, and you can cast your own votes here.)

The wiki survey is a neat way to measure team strength. On their own, though, those ratings don’t tell us what we really want to know, which is how each game is likely to turn out, or how well our team might be expected to do this season. The relationship between relative strength and game outcomes should be pretty strong, but we might want to consider other factors, too, like home-field advantage. To turn a strength rating into a season-level forecast for a single team, we need to consider the specifics of its schedule. In game play, it’s relative strength that matters, and some teams will have tougher schedules than others.

A statistical model is the best way I can think to turn ratings into game forecasts. To get a model to apply to this season’s ratings, I estimated a simple linear one from last year’s preseason ratings and the results of all 256 regular-season games (found online in .csv format here). The model estimates net score (home minus visitor) from just one feature, the difference between the two teams’ preseason ratings (again, home minus visitor). Because the net scores are all ordered the same way and the model also includes an intercept, though, it implicitly accounts for home-field advantage as well.

The scatterplot below shows the raw data on those two dimensions from the 2013 season. The model estimated from these data has an intercept of 3.1 and a slope of 0.1 for the score differential. In other words, the model identifies a net home-field advantage of 3 points—consistent with the conventional wisdom—and it suggests that every point of advantage on the wiki-survey ratings translates into a net swing of one-tenth of a point on the field. I also tried a generalized additive model with smoothing splines to see if the association between the survey-score differential and net game score was nonlinear, but as the scatterplot suggests, it doesn’t seem to be.
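
Here’s a minimal sketch of that estimation, with hypothetical names standing in for my actual data frame and columns:

mod <- lm(net.score ~ rating.diff, data = games2013)
coef(mod)  # intercept ~3.1 (home-field advantage), slope ~0.1 per rating point
library(mgcv)  # one way to fit a generalized additive model with smoothing splines
mod.gam <- gam(net.score ~ s(rating.diff), data = games2013)
plot(mod.gam)  # the estimated smooth is effectively a straight line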

2013 NFL Games Arranged by Net Game Score and Preseason Wiki Survey Rating Differentials

In sample, the linear model’s accuracy was good, not great. If we convert the net scores the model postdicts to binary outcomes and compare those postdictions to actual outcomes, we see that the model correctly classifies 60 percent of the games. That’s in sample, but it’s also based on nothing more than home-field advantage and a single preseason rating for each team from a survey with a small number of respondents. So, all things considered, it looks like a potentially useful starting point.
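
That in-sample check is simple, too; a sketch, continuing from the hypothetical mod above:

postdicted <- predict(mod)  # in-sample postdictions of net score
mean((postdicted > 0) == (games2013$net.score > 0))  # share classified correctly, ~0.60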

Whatever its limitations, that model gives us the tool we need to convert 2014 wiki survey results into game-level predictions. To do that, we also need a complete 2014 schedule. I couldn’t find one in .csv format, but I found something close (here) that I saved as text, manually cleaned in a minute or so (deleted extra header rows, fixed remaining header), and then loaded and merged with a .csv of the latest survey scores downloaded from the manager’s view of the survey page on All Our Ideas.

I’m not going to post forecasts for all 256 games—at least not now, with three more preseason games to learn from and, hopefully, lots of votes yet to be cast. To give you a feel for how the model is working, though, I’ll show a couple of cuts on those very preliminary results.

The first is a set of forecasts for all Week 1 games. The labels show Visitor-Home, and the net score is ordered the same way. So, a predicted net score greater than 0 means the home team (second in the paired label) is expected to win, while a predicted net score below 0 means the visitor (first in the paired label) is expected to win. The lines around the point predictions represent 90-percent confidence intervals, giving us a partial sense of the uncertainty around these estimates.
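
Producing those point predictions and intervals takes one call to predict(); a sketch under the same hypothetical names (interval = "prediction" folds game-level noise into the bands, which is what game forecasts need; "confidence" would give the much narrower band around the regression line itself):

newgames <- data.frame(rating.diff = week1$home_rating - week1$visitor_rating)
fc <- predict(mod, newdata = newgames, interval = "prediction", level = 0.90)
head(fc)  # columns: fit (predicted net score), lwr, upr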

Week 1 Game Forecasts from Preseason Wiki Survey Results on 10 August 2014

Of course, as a fan of a particular team, I’m most interested in what the model says about how my guys are going to do this season. The next plot shows predictions for all 16 of Baltimore’s games. Unfortunately, the plotting command orders the data by label, and my R skills and available time aren’t sufficient to reorder them by week, but the information is all there. In this plot, the dots for the point predictions are colored red if they predict a Baltimore win and black for an expected loss. The good news for Ravens fans is that this plot suggests an 11-5 season, good enough for a playoff berth. The bad news is that an 8-8 season also lies within the 90-percent confidence intervals, so the playoffs don’t look like a lock.
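
For anyone who wants to attempt that reordering, one generic fix is to make the game labels a factor whose levels follow the schedule; a sketch with hypothetical names:

ravens$label <- factor(ravens$label, levels = ravens$label[order(ravens$week)])
# plotting commands that sort by factor level will now run week 1 through week 17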

2014 Game-Level Forecasts for the Baltimore Ravens from 10 August 2014 Wiki Survey Scores

So that’s where the toy project stands now. My intuition tells me that the predicted net scores aren’t as well calibrated as I’d like, and the estimated confidence intervals surely understate the true uncertainty around each game (“On any given Sunday…”). Still, I think this exercise demonstrates the potential of this forecasting process. If I were a betting man, I wouldn’t lay money on these estimates. As an applied forecaster, though, I can imagine using these predictions as priors in a more elaborate process that incorporates additional and, ideally, more dynamic information about each team and game situation over the course of the season. Maybe my doppelganger can take that up while I get back to my day job…

Postscript. After I published this post, Jeff Fogle suggested via Twitter that I compare the Week 1 forecasts to the current betting lines for those games. The plot below shows the median point spread from an NFL odds-aggregating site as blue dots on top of the statistical forecasts already shown above. As you can see, the statistical forecasts are tracking the betting lines pretty closely. There’s only one game—Carolina at Tampa Bay—where the predictions from the two series fall on different sides of the win/loss line, and it’s a game the statistical model essentially sees as a toss-up. It’s also reassuring that there isn’t a consistent direction to the differences, so the statistical process doesn’t seem to be biased in some fundamental way.

Week 1 Game-Level Forecasts Compared to Median Point Spread from Betting Sites on 11 August 2014

We Are All Victorians

“We have no idea, now, of who or what the inhabitants of our future might be. In that sense, we have no future. Not in the sense that our grandparents had a future, or thought they did. Fully imagined cultural futures were the luxury of another day, one in which ‘now’ was of some greater duration. For us, of course, things can change so abruptly, so violently, so profoundly, that futures like our grandparents’ have insufficient ‘now’ to stand on. We have no future because our present is too volatile… We have only risk management. The spinning of the given moment’s scenarios. Pattern recognition.”

That’s the fictional Hubertus Bigend sounding off in Chapter Six of William Gibson’s fantastic 2003 novel. Gibson is best known as an author of science fiction set in the not-too-distant future. As that passage suggests, though, he is not uniquely interested in looking forward. In Gibson’s renderings, future and past might exist in some natural sense, but our ideas of them can only exist in the present, which is inherently and perpetually liminal.

In Chapter Six, the conversation continues:

“Do we have a past, then?” Stonestreet asks.

“History is a best-guess narrative about what happened and when,” Bigend says, his eyes narrowing. “Who did what to whom. With what. Who won. Who lost. Who mutated. Who became extinct.”

“The future is there,” Cayce hears herself say, “looking back at us. Trying to make sense of the fiction we will have become. And from where they are, the past behind us will look nothing at all like the past we imagine behind us now.”

“You sound oracular.” White teeth.

“I only know that the one constant in history is change: The past changes. Our version of the past will interest the future to about the extent we’re interested in whatever past the Victorians believed in. It simply won’t seem very relevant.”

I read that passage and I picture a timeline flipped vertical and frayed at both ends. Instead of a flow of time from left to right, we have only the floating point of the present, with ideas about the future and past radiating outwards and nothing to which we can moor any of it.

In a recent interview with David Wallace-Wells for Paris Review, Gibson revisits this theme when asked about science fiction as futurism.

Of course, all fiction is speculative, and all history, too—endlessly subject to revision. Particularly given all of the emerging technology today, in a hundred years the long span of human history will look fabulously different from the version we have now. If things go on the way they’re going, and technology keeps emerging, we’ll eventually have a near-total sorting of humanity’s attic.

In my lifetime I’ve been able to watch completely different narratives of history emerge. The history now of what World War II was about and how it actually took place is radically different from the history I was taught in elementary school. If you read the Victorians writing about themselves, they’re describing something that never existed. The Victorians didn’t think of themselves as sexually repressed, and they didn’t think of themselves as racist. They didn’t think of themselves as colonialists. They thought of themselves as the crown of creation.

Of course, we might be Victorians, too.

Of course we are. How could we not be?

That idea generally fascinates me, but it also specifically interests me as a social scientist. As discussed in a recent post, causal inference in the social sciences depends on counterfactual reasoning—that is, imagining versions of the past and future that we did not see.

Gibson’s rendering of time reminds us that this is even harder than we like to pretend. It’s not just that we can’t see the alternative histories we would need to compare to our lived history in order to establish causality with any confidence. We can’t even see that lived history clearly. The history we think we see is a pattern that is inexorably constructed from materials available in the present. Our constant disdain for most past versions of those renderings should give us additional pause when attempting to draw inferences from current ones.

The Ethics of Political Science in Practice

As citizens and as engaged intellectuals, we all have the right—indeed, an obligation—to make moral judgments and act based on those convictions. As political scientists, however, we have a unique set of potential contributions and constraints. Political scientists do not typically have anything of distinctive value to add to a chorus of moral condemnation or declarations of normative solidarity. What we do have, hopefully, is the methodological training, empirical knowledge, and comparative insight to offer informed assessments about alternative courses of action on contentious issues. Our primary ethical commitment as political scientists, therefore, must be to get the theory and the empirical evidence right, and to clearly communicate those findings to relevant audiences—however unpalatable or inconclusive they might be.

That’s a manifesto of sorts, nested in a great post by Marc Lynch at the Monkey Cage. Marc’s post focuses on analysis of the Middle East, but everything he writes generalizes to the whole discipline.

I’ve written a couple of posts on this theme, too:

  • “This Is Not a Drill,” on the challenges of doing what Marc proposes in the midst of fast-moving and politically charged events with weighty consequences; and
  • “Advocascience,” on the ways that researchers’ political and moral commitments shape our analyses, sometimes but not always intentionally.

Putting all of those pieces together, I’d say that I wholeheartedly agree with Marc in principle, but I also believe this is extremely difficult to do in practice. We can—and, I think, should—aspire to this posture, but we can never quite achieve it.

That applies to forecasting, too, by the way. Coincidentally, I saw this great bit this morning in the Letter from the Editors for a new special issue of The Appendix, on “futures of the past”:

Prediction is a political act. Imagined futures can be powerful tools for social change, but they can also reproduce the injustices of the present.

Concern about this possibility played a role in my decision to leave my old job, helping to produce forecasts of political instability around the world for private consumption by the U.S. government. It is also part of what attracts me to my current work on a public early-warning system for mass atrocities. By making the same forecasts available to all comers, I hope that we can mitigate that downside risk in an area where the immorality of the acts being considered is unambiguous.

As a social scientist, though, I also understand that we’ll never know for sure what good or ill effects our individual and collective efforts had. We won’t know because we can’t observe the “control” worlds we would need to confidently establish cause and effect, and we won’t know because the world we seek to understand keeps changing, sometimes even in response to our own actions. This is the paradox at the core of applied, empirical social science, and it is inescapable.
