The Birth of Crowdsourcing?

From p. 106 of the first paperback edition of The Professor and the Madman, a slightly overwrought but enjoyable history of the origins of the Oxford English Dictionary, found on the shelf of a vacation rental:

The new venture that [Richard Chenevix] Trench seemed now to be proposing would demonstrate not merely the meaning but the history of meaning, the life story of each word. And that would mean the reading of everything and the quoting of everything that showed anything of the history of the words that were to be cited. The task would be gigantic, monumental, and—according to the conventional thinking of the times—impossible.

Except that here Trench presented an idea, an idea that—to those ranks of conservative and frock-coated men who sat silently in the [London Library] on that dank and foggy evening [in 1857]—was potentially dangerous and revolutionary. But it was the idea that in the end made the whole venture possible.

The undertaking of the scheme, he said, was beyond the ability of any one man. To peruse all of English literature—and to comb the London and New York newspapers and the most literate of the magazines and journals—must be instead “the combined action of many.” It would be necessary to recruit a team—moreover, a huge one—probably comprising hundreds and hundreds of unpaid amateurs, all of them working as volunteers.

The audience murmured with surprise. Such an idea, obvious though it may sound today, had never been put forward before. But then, some members said as the meeting was breaking up, it did have some real merit.

And here’s what that crowdsourcing process ended up looking like in practice:

[Frederick] Furnivall then issued a circular calling for volunteer readers. They could select from which period of history they would like to read books—from 1250 to 1526, the year of the New English Testament; from then to 1674, the year when Milton died; or from 1674 to what was then the present day. Each period, it was felt, represented the existence of different trends in the development of the language.

The volunteers’ duties were simple enough, if onerous. They would write to the society offering their services in reading certain books; they would be asked to read and make word-lists of all that they read, and would then be asked to look, super-specifically, for certain words that currently interested the dictionary team. Each volunteer would take a slip of paper, write at its top left-hand side the target word, and below, also on the left, the date of the details that followed: These were, in order, the title of the book or paper, its volume and page number, and then, below that, the full sentence that illustrated the use of the target word. It was a technique that has been undertaken by lexicographers to the present day.

Herbert Coleridge became the first editor of what was to be called A New English Dictionary on Historical Principles. He undertook as his first task what may seem prosaic in the extreme: the design of a small stack of oak-board pigeonholes, nine holes wide and six high, which could accommodate the anticipated sixty to one hundred thousand slips of paper that would come in from the volunteers. He estimated that the first volume of the dictionary would be available to the world within two years. “And were it not for the dilatoriness of many contributors,” he wrote, clearly in a tetchy mood, “I should not hesitate to name an earlier period.”

Everything about these forecasts was magnificently wrong. In the end more than six million slips of paper came in from the volunteers; and Coleridge’s dreamy estimate that it might take two years to have the first salable section of the dictionary off the presses—for it was to be sold in parts, to help keep revenues coming in—was wrong by a factor of ten. It was this kind of woefully naive underestimate—of work, of time, of money—that at first so hindered the dictionary’s advance. No one had a clue what they were up against: They were marching blindfolded through molasses.

So, for all its innovation in recruiting volunteers, this undertaking also produced a textbook example of the planning fallacy. I wonder how quickly and cheaply the task could have been completed with Mechanical Turk, or with some brush-clearing assistance from text mining.
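
To put rough numbers on that planning fallacy, here is a back-of-the-envelope sketch using only the figures quoted above. The variable names are mine, and the 20-year figure is an inference from "wrong by a factor of ten," not a date the passage states.

```python
# Figures taken from the quoted passage; names are illustrative only.
forecast_slips_low = 60_000      # low end of the anticipated slips
forecast_slips_high = 100_000    # high end of the anticipated slips
actual_slips = 6_000_000         # "more than six million", so a lower bound

forecast_years = 2               # Coleridge's estimate for the first section
time_error_factor = 10           # "wrong by a factor of ten"

# How far off was the slip forecast, at best and at worst?
slip_factor_best = actual_slips / forecast_slips_high
slip_factor_worst = actual_slips / forecast_slips_low

# Implied wait for the first salable section (an inference, not a quoted date).
implied_years = forecast_years * time_error_factor

print(f"Slips underestimated by at least {slip_factor_best:.0f}x to {slip_factor_worst:.0f}x")
print(f"First section: about {implied_years} years instead of {forecast_years}")
```

By these numbers the volume of incoming material was missed by a much wider margin than the schedule was, which suggests the pigeonholes were the more naive of the two forecasts.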

