2015 Tour de France Predictions

I like to ride bikes, I like to watch the pros race their bikes, and I make forecasts for a living, so I thought it would be fun to try to predict the outcome of this year’s Tour de France, which starts this Saturday and ends on July 26. I’m also interested in continuing to explore the predictive power of pairwise wiki surveys, a crowdsourcing tool that I’ve previously used to try to forecast mass-killing onsets, coup attempts, and pro football games, and that ESPN recently used to rank NBA draft prospects.

So, a couple of weeks ago, I used All Our Ideas to create a survey that asks, “Which rider is more likely to win the 2015 Tour de France?” I seeded the survey with the names of 11 riders—the 10 seen by bookmakers at Paddy Power as the most likely winners, plus Peter Sagan because he’s fun to watch—posted a link to the survey on Tumblr, and trolled for respondents on Twitter and Facebook. The survey got off to a slow start, but then someone posted a link to it in the r/cycling subreddit, and the votes came pouring in. As of this afternoon, the survey had garnered more than 4,000 votes in 181 unique user sessions that came from five continents (see the map below). The crowd also added a handful of other riders to the set under consideration, bringing the list up to 16.


So how does that self-selected crowd handicap the race? The dot plot below shows the riders in descending order by their survey scores, which range from 0 to 100 and indicate the probability that that rider would beat a randomly chosen other rider for a randomly chosen respondent. In contrast to Paddy Power, which currently shows Chris Froome as the clear favorite and gives Nairo Quintana a slight edge over Alberto Contador, this survey sees Contador as the most likely winner (survey score of 90), followed closely by Froome (87) and a little further by Quintana (80). Both sources put Vincenzo Nibali as fourth likeliest (73) and Tejay van Garderen (65) and Thibaut Pinot (51) in the next two spots, although Paddy Power has them in the opposite order. Below that, the distances between riders’ chances get smaller, but the wiki survey’s results still approximate the handicapping of the real-money markets pretty well.


There are at least a couple of ways to try to squeeze some meaning out of those scores. One is to read the chart as a predicted finishing order for the 16 riders listed. That’s useful for something like a bike race, where we—well, some of us, anyway—care not only about who wins, but also about where the other riders finish.

We can also try to convert those scores to predicted probabilities of winning. The chart below shows what happens when we do that by dividing each rider’s score by the sum of all scores and then multiplying the result by 100. The probabilities this produces are all pretty low and more tightly bunched than seems reasonable, but I’m not sure how else to do this conversion. I tried squaring and cubing the scores; the results came closer to what the betting-market odds suggest are the “right” values, but I couldn’t think of a principled reason to do that, so I’m not showing those here. If you know a better way to get from those model scores to well-calibrated win probabilities, please let me know in the comments.
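For concreteness, here is a minimal Python sketch of that conversion, using just the six scores quoted above (the other ten riders are omitted, so the resulting numbers won’t match the full chart):

```python
# Convert wiki-survey scores to rough win probabilities by dividing
# each score by the sum of all scores and scaling to 100.
scores = {
    "Contador": 90, "Froome": 87, "Quintana": 80,
    "Nibali": 73, "van Garderen": 65, "Pinot": 51,
}

total = sum(scores.values())
win_prob = {rider: 100 * score / total for rider, score in scores.items()}

for rider, p in sorted(win_prob.items(), key=lambda kv: -kv[1]):
    print(f"{rider}: {p:.1f}%")
```

Raising each score to a power greater than 1 before normalizing would spread the probabilities out, which is essentially what the squaring and cubing experiments did.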


So that’s what the survey says. After the Tour concludes in a few weeks, I’ll report back on how the survey’s predictions fared. Meanwhile, here’s wishing the athletes a crash-, injury-, and drug-free Tour. Judging by the other big races I’ve seen so far this year, it should be a great one to watch.

Two Guys on Bikes Talk International Development

A couple of weeks ago, on the tail end of a lunchtime group bike ride, I complained to the one guy still headed my way—let’s call him Bob, because I didn’t ask him if I could share our conversation—about the lousy state of the roads in our area. We were heading south on Beach Drive through the northern neck of Rock Creek Park in Washington, DC, crunching through fallen leaves and dodging or bouncing off cracks and holes in the asphalt. Like a coarse file, I said. Before winter even starts, he said.

That got us to wondering why the roads weren’t in better shape, and that got us to lamenting the failure of local, state, and national governments to spend more on infrastructure in the past few years, when borrowing was cheap and the economy was dragging. Bob applauded the Obama administration’s first stimulus package but complained that it mostly just dumped money into the economy, a lot of which ended up “going to China.” That remark about China segued into a short but thoughtful complaint about the federal government’s focus on free trade.

I said I didn’t have a problem with freer trade and was actually glad to see living standards improve so much in some of the poorest parts of the world in the last couple of decades. I know there are some losers, I said, but I’m okay with the American middle class getting a little worse off if it means billions of really poor people in other countries are now much better off. After all, they’re all people, right?

I would call Bob a strong liberal, so his response came as a surprise. “I am not okay with that,” he told me. He didn’t say anything about Americans as such, or clang any other patriotic bells. Instead, he said that people he knows personally were having trouble feeding their kids or getting divorced or otherwise struggling in the past several years. I said something like, “Right, but people who were dying before age five are now doing a little better.” “Nope, still not okay,” he responded.

What started out as a boilerplate cyclists’ lament on road conditions had turned into a debate of sorts on the ethics of international development. As often happens, we’d found our way to a version of the Trolley Problem. Growth is coming down the track, but it will be distributed unevenly, and some people might even get run over. If you could guide that trolley’s path, how should you choose? Proximity? Familiarity? Nationality? At random? Should we worry most about maximizing overall welfare, or should the people close to us count more? On what grounds?

I won’t try to resolve that debate here, and you already know what I think from the anecdote. Instead, I wanted to share the story because it reminded me of something important in the politics of global development. Equality sounds good in the abstract, but we do not sit comfortably with it in practice. Most of us care more about some people than others, and those feelings shape our politics. We can—and, I think, should—aspire to global fairness, but we can also expect to keep tripping over our own feelings when we walk in that direction.

Wisdom of Crowds FTW

I’m a cyclist who rides indoors a fair amount, especially in cold or wet weather. A couple of months ago, I bought an indoor cycle with a flywheel and a power meter. For the past several years, I’d been using the kind of trainer you attach to the back wheel of your bike for basement rides. Now, though, my younger son races, so I wanted something we could both use without too much fuss, and his coach wants to see power data from his home workouts.

To train properly with a power meter, I need to benchmark my current fitness. The conventional benchmark is Functional Threshold Power (FTP), which you can estimate from your average power output over a 20-minute test. To get the best estimate, you need to go as hard as you can for the full 20 minutes. To do that, you need to pace yourself. Go out too hard and you’ll blow up partway through. Go out too easy and you’ll probably end up lowballing yourself.

Once you have an estimate of your FTP, that pacing is easy to do: just ride at the wattage you expect to average. But what do you do when you’re taking the test for the first time?

I decided to solve that problem by appealing to the wisdom of the crowd. When I ride outdoors, I often ride with the same group, and many of those guys train with power meters. That means they know me and they know power data. Basically, I had my own little panel of experts.

Early this week, I emailed that group, told them how much I weigh (about 155 lbs), and asked them to send me estimates of the wattage they thought I could hold for 20 minutes. Weight matters because power covaries with it. What the other guys observe is my speed, which is a function of power relative to weight. So, to estimate power based on observed speed, they need to know my weight, too.

I got five responses that ranged from 300 to 350. Based on findings from the Good Judgment Project, I decided to use the median of those five guesses—314—as my best estimate.
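That aggregation is simple to reproduce. The guesses below are hypothetical (only the range, 300 to 350, and the median, 314, appear above), but they illustrate why a median is attractive: it resists being dragged around by one extreme guess in a way the mean does not.

```python
from statistics import mean, median

# Hypothetical guesses, consistent with the reported range (300-350)
# and median (314); the actual five values weren't published.
guesses = [300, 310, 314, 330, 350]

print(median(guesses))  # the crowd estimate used as the target
print(mean(guesses))    # pulled upward by the highest guess
```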

I did the test on Tuesday. After 15 minutes of easy spinning, I did 3 x 30 sec at about 300W with 30 sec easy in between, then another 2 min easy, then 3 min steady above 300W, then 7 min easy, and then I hit it. Following emailed advice from Dave Guttenplan, who sometimes rides with our group, I started out a little below my target, then ramped up my effort after about 5 min. At the halfway point, I peeked at my interval data and saw that I was averaging 310W. With 5 min to go, I tried to up the pace a bit more. With 1 min to go, I tried to dial up again and found I couldn’t go much harder. No finish-line sprint for me. When the 20-minute mark finally arrived, I hit the “interval” button, dialed the resistance down, and spent the next minute or so trying not to barf—a good sign that I’d given it just about all I had.

And guess what the final average was: 314!

Now, you might be thinking I tried to hit that number because it makes for a good story. Of course I was using the number as a guideline, but I’m as competitive as the next guy, so I was actually pretty motivated to outperform the group’s expectations. Over the last few minutes of the test, I was getting a bit cross-eyed, too, and I don’t remember checking the output very often.

This result is also partly coincidence. Even the best power meters have a margin of error of about 2 percent, and that’s assuming they’re properly calibrated. So the best I can say is that my average output from that test was probably around 314W, give or take several watts.

Still, as an applied stats guy who regularly works with “wisdom of crowds” systems, I thought this was a great illustration of those methods’ utility. In this case, the remarkable accuracy of the crowd-based estimate surely had a lot to do with the crowd’s expertise. I only got five guesses, but they came from people who know a lot about me as a rider and whose experience training with power and looking at other riders’ numbers has given them a strong feel for the distribution of these stats. If I’d asked a much bigger crowd who didn’t know me or the data, I suspect the estimate would have missed badly (like this one). Instead, I got just what I needed.

“They Said It Was Going to Rain”

Most Saturdays and some Sundays, I hook up with a bike ride that winds out of DC’s Rock Creek Park into semi-rural Maryland and back again over the course of a few hours. I depend on this ride for hard training and a shot of competition, but I’m a wet-weather wimp and will usually stay home and use the trainer in my basement if it’s raining or probably going to rain. So, one of the first things I do when I get up most weekend mornings is check the hourly forecasts at weather.com and Weather Underground. If there’s much risk of rain, I’ll open the radar map again close to my 9:45 departure and run the animated forecast for the next few hours. If that animation shows yellow or orange blobs swarming my regular route when I’m going to be on it, I almost always stay in.

One recent Sunday, the forecast had me hemming and hawing for a bit before I decided to go. The hourly breakout at weather.com pegged the chance of rain at 70 percent for the first couple of hours I’d be out, but it wasn’t raining at 9:30 and the radar map didn’t look bad, either. Updating completed, out I went.

The weather often dominates conversations at the start and finish of the ride, and on that Sunday two themes rang through the chatter I overheard: we’d gotten really lucky, and weather forecasters are idiots. “They said it was going to rain,” the Greek chorus kept repeating.

[Photo: a wet Paris-Roubaix]

But, of course, that’s not what “they” said. In point of fact, meteorologists had pegged the odds of rain at about 2:1. According to those forecasts, it was probably going to rain, but the chances that it would stay dry weren’t so bad, either. I wouldn’t bet my mortgage on a probability of 0.3, but I’m okay with occasionally risking a soggy ride on one.

As a weather-wimpy cyclist, I was happy to catch the lucky break that Sunday. As a guy who sometimes forecasts for a living, I was intrigued by the consistent way in which so many people had distorted that probability. In our heads, the quantified uncertainty we saw in the paper or on the web was transformed into a categorical prediction of rain. Where a modeler would want to judge such forecasts in aggregate—asking, “For all of the hours I said there was a 70-percent chance of rain, how often did it actually rain?”—the intended audience was content to judge a single forecast in isolation and declare, “Wrong!”

That we’re not so great at processing probabilities won’t surprise anyone familiar with psychological research from the past few decades on that subject. Exactly what form that bias takes under what conditions, though, still seems to be something of a mystery. In a New York Times blog post about forecasts of the U.S. presidential election, statistician Andrew Gelman wrote:

What if the weatherman told you there was a 30 percent chance of rain—would you be shocked if it rained that day? No.

Apparently, Gelman hasn’t met the crew from my weekend ride. Gelman goes on to connect his assertion to work by Amos Tversky and Daniel Kahneman on prospect theory, which is based, in part, on the expectation that people systematically overestimate the risk of low-probability events and underestimate the risk of high-probability ones. That expectation, in turn, is based on empirical research that has been replicated elsewhere, as the following chart shows:

[Figure: empirical estimates of the probability weighting function]

What’s puzzling to me here is that my fellow riders seemed to be distorting things in the opposite direction. Instead of taking a probability of 0.7 and thinking of it as a toss-up as Gelman and that chart predict they would, they had converted it into a sure thing. That’s still bias, of course—just not the kind I would have expected.
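Tversky and Kahneman’s one-parameter weighting function makes the predicted distortion easy to see. The sketch below uses gamma = 0.61, their estimate for gains; under it, a stated probability of 0.7 registers as something close to a coin flip, the opposite of my ride-mates’ sure-thing reading:

```python
def weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function:
    w(p) = p^g / (p^g + (1 - p)^g)^(1/g).
    It overweights small probabilities and underweights large ones."""
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)

print(round(weight(0.7), 3))   # well below 0.7: a near toss-up
print(round(weight(0.05), 3))  # well above 0.05: small chances loom large
```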

If there’s a moral to this story, it’s that we still have a lot of work left to do in understanding how we cogitate on uncertainty and what that implies about how we should produce and present probabilistic forecasts. In many domains, we’re getting better and better at the forecasting part, but even very accurate forecasts are only as useful as we make them or let them be. To get from the one to the other, we still need to learn a lot more about how we process and act on that information—not just individually, but also organizationally and socially.

Dr. Bayes, or How I Learned to Stop Worrying and Love Updating

Over the past week, I spent a chunk of every morning cycling through the desert around Tucson, Arizona. On Friday, while riding toward my in-laws’ place in the mountains west of town, I heard the roar of a jet overhead. My younger son’s really into flying right now, and Tucson’s home to a bunch of fighter jets, so I reflexively glanced up toward the noise, hoping to spot something that would interest him.

On that first glance, all I saw was an empty patch of deep blue sky. Without effort, my brain immediately retrieved a lesson from middle-school physics, reminding me that the relative speeds of light and sound meant any fast-moving plane would appear ahead of its roar. But which way? Before I glanced up again, I drew on prior knowledge of local patterns to guess that it would almost certainly be to my left, traveling east, and not so far ahead of the sound because it would be flying low as it approached either the airport or the Air Force Base.  Moments after my initial glance, I looked up a second time and immediately spotted the plane where I’d now expected to find it. When I did, I wasn’t surprised to see that it was a commercial jet, not a military one, because most of the air traffic in the area is civilian.

This is Bayesian thinking, and it turns out that we do it all the time.  The essence of Bayesian inference is updating. We humans intuitively form and hold beliefs (estimates) about all kinds of things. Those beliefs are often erroneous, but it turns out that we can make them better by revising (updating) them whenever we encounter new information that pertains to them. Updating is really just a form of learning, but Bayes’ theorem gives us a way to structure that learning that turns out to be very powerful. As cognitive scientists Tom Griffiths and Joshua Tenenbaum summarize in a nice 2006 paper [PDF] called “Statistics and the Bayesian Mind,”

The mathematics of Bayesian belief is set out in the box. The degree to which one should believe in a particular hypothesis h after seeing data d is determined by two factors: the degree to which one believed in it before seeing d, as reflected by the prior probability P(h), and how well it predicts the data d, as reflected in the likelihood, P(d|h).

This might sound like a lot of work or just too arcane to bother, but Griffiths and Tenenbaum argue that we often think that way intuitively. Their paper gives several examples, including predictions about the next result in a series of coin flips and the common tendency to infer causality from clusters that actually arise at random.

The same process appears in my airplane-spotting story. My initial glance is akin to the base rates that are often used as the starting point for Bayesian inference: to see something you hear, look where the sound is coming from. When that prediction failed, I went through three rounds of updating before I looked up again—one based on general knowledge about the relative speeds of light and sound, and then a second (direction) and third (commercial vs. military) based on prior observations of local air traffic. My final “prediction” turned out to be right because those local patterns are strong, but even with all that objective information, there was still some uncertainty. Who knows, there could have been an emergency, or a rogue pilot, or an alien invasion…
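The updating in that story can be caricatured in a few lines of code. The hypotheses and all the numbers below are invented for illustration; only the prior-times-likelihood structure comes from Bayes’ theorem:

```python
def bayes_update(prior, likelihood):
    """Posterior P(h|d) is proportional to P(h) * P(d|h),
    normalized so the posterior sums to one over all hypotheses."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Invented numbers: the base rate of local air traffic, and how well
# each hypothesis predicts hearing a loud, low roar overhead.
prior = {"commercial": 0.8, "military": 0.2}
likelihood = {"commercial": 0.5, "military": 0.9}

posterior = bayes_update(prior, likelihood)
```

The roar shifts some belief toward the military hypothesis (its posterior is roughly 0.31, up from a prior of 0.2), but the commercial hypothesis still dominates, which is why spotting a passenger jet was no surprise.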

I’m writing about this because I think it’s interesting, but I also have ulterior motives. A big part of my professional life involves using statistical models to forecast rare political events, and I am deeply frustrated by frequent encounters with people who dismiss statistical forecasts out of hand (see here and here for previous posts on the subject). It’s probably unrealistic of me to think so, but I am hopeful that recognition of the intuitive nature and power of Bayesian updating might make it easier for skeptics to make use of my statistical forecasts and others like them.

I’m a firm believer in the forecasting power of statistical models, so I usually treat a statistical forecast as my initial belief (or prior, in Bayesian jargon) and then only revise that forecast as new information arrives. That strategy is based on another prior, namely, the body of evidence amassed by Phil Tetlock and others that the predictive judgments of individual experts often aren’t very reliable, and that statistical models usually produce more accurate forecasts.

From personal experience I gather that most people, including many analysts and policymakers, don’t share that belief about the power of statistical models for forecasting. Even so, I would like to think those skeptics might still see how Bayes’ rule would allow them to make judicious use of statistical forecasts, even if they trust their own or other experts’ judgments more. After all, to ignore a statistical forecast is equivalent to holding the extreme view that that statistical forecast holds absolutely no useful information. In The Theory that Would Not Die, an entertaining lay history of Bayes’ rule, Sharon Bertsch McGrayne quotes Larry Stone, a statistician who used the theorem to help find a nuclear submarine that went missing in 1968, as saying that, “Discarding one of the pieces of information is in effect making the subjective judgment that its weight is zero and the other weight is one.”

So instead of rejecting the statistical forecast out of hand, why not update in response to it? When the statistical forecast closely accords with your prior belief, it will strengthen your confidence in that judgment, and rightly so. When the statistical forecast diverges from your prior belief, Bayes’ theorem offers a structured but simple way to arrive at a new estimate. Experience shows that this deliberate updating will produce more accurate forecasts than the willful myopia involved in ignoring the new information the statistical model has provided. And, as a kind of bonus, the deliberation involved in estimating the conditional probabilities Bayes’ theorem requires may help to clarify your thinking about the underlying processes involved and the sensitivity of your forecasts to certain assumptions.
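One simple way to make Stone’s point about weights explicit is a linear opinion pool (just one standard aggregation rule, not something prescribed here): average the two probabilities, with ignoring the model as the special case w_model = 0.

```python
def pool(p_judgment, p_model, w_model=0.5):
    """Linear opinion pool of two probability forecasts.
    w_model = 0 discards the model entirely; w_model = 1
    discards the expert judgment entirely."""
    return w_model * p_model + (1 - w_model) * p_judgment

# A skeptic who trusts her own judgment more can still give the
# statistical forecast some nonzero weight.
combined = pool(p_judgment=0.10, p_model=0.40, w_model=0.3)
```

Bayes’ theorem gives a more principled combination when you can specify likelihoods, but even this crude average avoids the weight-zero extreme Stone warns against.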

PS. For some nice worked examples of Bayesian updating, see Appendix B of The Theory that Would Not Die or Chapter 8 of Nate Silver’s book, The Signal and the Noise. And thanks to Paul Meinshausen for pointing out the paper by Griffiths and Tenenbaum, and to Jay Yonamine for recommending The Theory That Would Not Die.

What Cycling Has Taught Me about Rule of Law (and Other Social-Science Geekery)

I make my living as a political scientist, but my big hobby is cycling. I’ve been riding a bike for fun and for exercise off and on for more than two decades now, ever since I was a teenager and first got hooked on mountain biking in the woods behind my parents’ house in Carrboro, NC. Nowadays, I usually spend 10-15 hours on the bike each week, most of it on the roads in and around Washington, DC. I haven’t raced in a few years—a handful of crashes convinced me to get out of the formal side of the sport—but I still do a lot of fast group rides, some of which draw scores of riders when the weather’s nice.

When I’m on my bike, I try to push my work life to the back of my mind and just concentrate on the ride. That, and the exercise, are usually the point of the thing. Still, I’ve spent my entire adulthood training to think like a social scientist, and road cycling is an inherently social endeavor, so the mental firewall sometimes crumbles. When that happens, one of a few recurring themes usually comes to mind, depending on the situation.

1. Rule of law doesn’t mean that laws rule. Social scientists who study political and economic development talk a lot about the importance of “rule of law,” which boils down to the idea that sensible laws are predictably and fairly enforced. According to many development theorists, rich democracies like the United States have prospered in large part because they transitioned early from arbitrary and capricious regulations to rule of law. Poorer countries will only see their economic growth rates take off and politics stabilize, the thinking goes, when they manage to make the same shift.

Even in the United States, though, rule of law can be startlingly incomplete. When you’re perched on a 17-lb. carbon-fiber sculpture, trying to share the road with streams of 3,500-lb. hurtling steel boxes, you’re constantly reminded that the formal rules only tell a small part of the story. Most drivers exceed the formal speed limit most of the time. Almost no one comes to a complete stop at stop signs, even though that’s what the rules of the road tell you to do. (Cyclists are probably worse about this one than drivers, by the way, often blowing through stop signs where cars are already waiting.) A lot of people in Maryland, where I live, talk on hand-held mobile phones while they drive, despite the fact that the state passed a law last year banning that behavior. Some drivers expect cyclists to clear the road for them, even though the law in most (all?) states instructs cyclists to take the lane when the rider judges that it’s not safe to squeeze onto the shoulder. Significantly, this gap between formal and informal rules doesn’t just happen when the police aren’t around. On numerous occasions, I’ve had police officers tell me to do something not prescribed by law (e.g., avoid this road, stay on that bike path), apparently because they thought it was expedient.

If you tried to survive in this environment by counting on people to follow the formal rules, you’d be toast. Some of this is just ignorance of the law, but some of it–like speeding–is the result of informal practices that dominate the formal rules. Some of those informal practices might be more efficient than their formal counterparts, but surely some are not. So, even in places where “rule of law” supposedly prevails, many of our daily practices are still built around shared expectations based on unwritten and sometimes inefficient rules, and these unwritten rules can be very hard to dislodge when they are widely followed.

These observations have strongly influenced how I think about prescriptions for better governance in “developing” countries that are based on changes to formal rules. Some political scientists and economists place great faith in the idea that desirable social outcomes can be brought about by crafting rules that will give people incentives to behave in the ways we’d like. On paper, that idea makes some sense. In practice, however, this yawning gap between formal and informal institutions on the roads reminds me that real life is a lot more complicated.

2. Some people act as if (your) life is really cheap. There are a lot of bad or distracted drivers out there who unintentionally put themselves and cyclists at risk; whenever I encounter them, I might shake my head, but I’m not all that surprised. What surprises me are the extraordinarily dangerous things some drivers will do to send cyclists a message when they don’t like how those riders are behaving on the road. As far as I can tell, these people just don’t think my life is worth very much, or they just don’t think about it at all.

A couple of years ago, I was on a big group ride in a semi-rural part of Montgomery County on a Saturday morning when a driver apparently got frustrated with waiting behind us for a safe place to pass. On a fast downhill, where our group had stretched into single file and was travelling at or above the 30 mph speed limit, this guy decided to try to pass, then abruptly pulled his car back to the right, splitting the line of cyclists right in front of me. A couple of seconds later, he hit his brakes hard, even though the riders in front of him were still flying down the hill at the same speed. I swerved just enough to avoid ramming straight into his back bumper, clipping the back-left corner of his car instead. My chin hit the trunk, then I flipped through the air and landed on my bum in the opposite lane. Lucky for me, no cars were coming the other way. Without a word and with barely a pause, the driver sped off to his house, which turned out to be just a half-mile down the same road. He pulled into his garage and stayed inside, even when the police came.

That’s just one of many close calls I’ve had on the road with drivers who seemed to be using their machines to tell me how they felt about my presence or behavior on the road. Sometimes it’s just a yell as they pass, but at least a few times a month it’s more: a swerve that squeezes me to the edge of, or even off, the road; a tailgater who could kill me with just a touch of the gas; a guy a couple of weeks ago who sped by, pulled over, jumped out of his car, and screamed at me to come fight him, apparently because I’d delayed him at the last traffic light. (I’ve been part of that particular scenario a few times now.)

It’s hard to imagine that these drivers would engage in these behaviors if they could think through all the potential costs of their actions. For starters, I’d like to think my life is worth something to them, if only in the abstract sense that most of us see human life as a thing worth protecting. Even in totally selfish terms, though, an incident in which I’m badly hurt or killed would be a huge inconvenience for the driver, too. The police, the insurance, the possibility of courts and even jail time—all of that’s going to be a much bigger hassle than the few extra seconds they might wait for a safe opportunity to pass me.

I sometimes think of these angry drivers when I’m reading theories of civil war and other forms of political violence. In the past couple of decades, a lot of the thinking about why civil wars happen where and when they do has centered on the assumption that violence is an instrument which organizations use to advance their political interests, and that individuals who choose to participate in that violence do so after weighing its expected risks and benefits. I still think both of those assumptions can be useful ones for purposes of theorizing about violence, but my experiences on the road have also taught me that those assumptions have stark limits. Sometimes, people threaten or use violence in ways that just don’t seem to take much account of the consequences, and trying to understand that behavior as the product of cost-benefit analysis can take us pretty far away from reality.

3. We all belong to tribes. Cyclists often ride in packs, and the conversation in those packs often turns to drivers. In those conversations, “we” (riders) are typically described as good people doing good things for bodies and our planet, and “they” (drivers) are often described as careless or even bad people who are thinking only of themselves and denying us our rightful place on the road. We share stories of injustices suffered by ourselves or other riders on Facebook and Twitter, and when one of us is threatened by a driver, others often rally around to protect him or her, even if it’s someone we hardly know. That time I clipped the back of a fast-braking car on my Saturday-morning ride, a dozen guys I’ve never seen off a bike stopped their day to make sure I was okay, then waited for almost a half-hour to talk to the police in hopes of punishing the driver.

Based on my limited knowledge of anthropology, I gather this is standard in-group/out-group behavior. We see ourselves as part of a social collective with a distinct identity and way of life; we identify external threats to that way of life; and we go out of our way to protect members of our collective from those threats, even in situations where it isn’t self-evidently “rational” to do so. This is exactly the kind of us vs. them behavior that political scientists and sociologists often describe when discussing “ethnic” or “tribal” groups, usually in pejorative terms. People in rich countries are supposed to have traded in these traditional identities for more “modern” ones, and that break with tradition is supposed to give them the freedom to make decisions based on efficiency instead of obligation.

In short, cyclists may not be an ethnic group, but they sometimes act like one. That reinforces my belief that the constructivists are right about the origins and behavior of human communities. Supposedly “modern” humans are just the same old people plunked down in different contexts, “ancient hatreds” can get pretty intense pretty fast, and modernity—whatever that is—is not a cure for these quirks of our nature.
