I’m a cyclist who rides indoors a fair amount, especially in cold or wet weather. A couple of months ago, I bought an indoor cycle with a flywheel and a power meter. For the past several years, I’d been using the kind of trainer you attach to the back wheel of your bike for basement rides. Now, though, my younger son races, so I wanted something we could both use without too much fuss, and his coach wants to see power data from his home workouts.

To train properly with a power meter, I need to benchmark my current fitness. The conventional benchmark is Functional Threshold Power (FTP), which you can estimate from your average power output over a 20-minute test. To get the best estimate, you need to go as hard as you can for the full 20 minutes. To do that, you need to pace yourself. Go out too hard and you’ll blow up partway through. Go out too easy and you’ll probably end up lowballing yourself.

Once you have an estimate of your FTP, that pacing is easy to do: just ride at the wattage you expect to average. But what do you do when you’re taking the test for the first time?

I decided to solve that problem by appealing to the wisdom of the crowd. When I ride outdoors, I often ride with the same group, and many of those guys train with power meters. That means they know me and they know power data. Basically, I had my own little panel of experts.

Early this week, I emailed that group, told them how much I weigh (about 155 lbs), and asked them to send me estimates of the wattage they thought I could hold for 20 minutes. Weight matters because power covaries with it. What the other guys observe is my speed, which is a function of power relative to weight. So, to estimate power based on observed speed, they need to know my weight, too.

I got five responses that ranged from 300 to 350. Based on findings from the Good Judgment Project, I decided to use the median of those five guesses—314—as my best estimate.

I did the test on Tuesday. After 15 minutes of easy spinning, I did 3 x 30 sec at about 300W with 30 sec easy in between, then another 2 min easy, then 3 min steady above 300W, then 7 min easy, and then I hit it. Following emailed advice from Dave Guttenplan, who sometimes rides with our group, I started out a little below my target, then ramped up my effort after about 5 min. At the halfway point, I peeked at my interval data and saw that I was averaging 310W. With 5 min to go, I tried to up the pace a bit more. With 1 min to go, I tried to dial up again and found I couldn’t go much harder. No finish-line sprint for me. When the 20-minute mark finally arrived, I hit the “interval” button, dialed the resistance down, and spent the next minute or so trying not to barf—a good sign that I’d given it just about all I had.

And guess what the final average was: 314!

Now, you might be thinking I tried to hit that number because it makes for a good story. Of course I was using the number as a guideline, but I’m as competitive as the next guy, so I was actually pretty motivated to outperform the group’s expectations. Over the last few minutes of the test, I was getting a bit cross-eyed, too, and I don’t remember checking the output very often.

This result is also partly coincidence. Even the best power meters have a margin of error of about 2 percent, and that’s assuming they’re properly calibrated. So the best I can say is that my average output from that test was probably around 314W, give or take several watts.

Still, as an applied stats guy who regularly works with “wisdom of crowds” systems, I thought this was a great illustration of those methods’ utility. In this case, the remarkable accuracy of the crowd-based estimate surely had a lot to do with the crowd’s expertise. I only got five guesses, but they came from people who know a lot about me as a rider and whose experience training with power and looking at other riders’ numbers has given them a strong feel for the distribution of these stats. If I’d asked a much bigger crowd who didn’t know me or the data, I suspect the estimate would have missed badly (like this one). Instead, I got just what I needed.

## Tim v.G.

/ November 22, 2014The “good story” turns on your having used the median of the five guesses, but my reading of the Good Judgement paper you refer to suggests that for the best estimate you should have used a transformed weighted average – though taking the median is said to be a “close second”. Incidentally, most discussions of the famous Galton ox-weight exercise seem to claim he took the average of all guesses, but his 1907 Nature paper indicates he took the median: “According to the democratic principle of “one vote one value,” the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or too high by a majority of the voters…”

## dartthrowingchimp

/ November 22, 2014The transformed weighted average applies to probabilistic forecasts where you already have track records for the forecasters, neither of which was the case here. I was referring to that paper to justify the use of median rather than mean. (N.B. We’re a geeky lot, aren’t we?)

## .

/ January 8, 2015I think I would have preferred to see what the final results were without peeking at the data during the test at all. But, pretty cool.

## dartthrowingchimp

/ January 9, 2015As an applied stats guy, I agree. I want the purest comparison possible. As a cyclist, though, I know that wouldn’t really work. You need to target and smooth your effort to get the best result possible, and you need the real-time feedback to do that.