The Ghosts of Wu Chunming’s Past, Present, and Future

On a blogged recommendation from Chris Blattman, I’m now reading Factory Girls. Written by Leslie T. Chang and published in 2008, it’s a non-fiction book about the young migrant women whose labor has stoked the furnaces of China’s economic growth over the past 30 years.

One of the book’s implicit “findings” is that this migration, and the larger socioeconomic transformation of which it is a part, is a difficult but ultimately rewarding process for many. Chang writes (p. 13, emphasis in the original):

Migration is emptying villages of young people. Across the Chinese countryside, those plowing and harvesting in the fields are elderly men and women, charged with running the farm and caring for the younger children who are still in school. Money sent home by migrants is already the biggest source of wealth accumulation in rural China. Yet earning money isn’t the only reason people migrate. In surveys, migrants rank ‘seeing the world,’ ‘developing myself,’ and ‘learning new skills’ as important as increasing their incomes. In many cases, it is not crippling poverty that drives migrants out from home, but idleness. Plots of land are small and easily farmed by parents; nearby towns offer few job opportunities. There was nothing to do at home, so I went out.

That idea fits my priors, and I think there is plenty of system-level evidence to support it. Economic development carries many individual and collective costs, but the available alternatives are generally worse.

Still, as I read, I can’t help but wonder how much the impressions I take away from the book are shaped by selection bias. Like most non-fiction books written for a wide audience, Factory Girls blends reporting on specific cases—here, the experiences of certain women who have made the jump from small towns to big cities in search of paid work—with macro-level data on the systemic trends in which those cases are situated. The cases are carefully and artfully reported, and it’s clear that Chang worked on and cared deeply about this project for many years.

No matter how hard the author tried, though, there’s a hitch in her research design that’s virtually impossible to overcome. Chang can only tell the stories of migrants who were willing to share them with her, and those sources are not a random sample of all migrants. Even worse for attempts to generalize from them, there may be a correlation between the ability and desire to tell your story to a foreign reporter and the traits that make some migrants more successful than others. We don’t hear from young women who are too ashamed or humble or uninterested to tell their stories to a stranger who wants to share them with the world. We certainly can’t hear from women who have died or been successfully hidden from the reporter’s view for one reason or another. If the few sources who open up to Chang aren’t representative of the pool of young women whose lives she aims to portray, then their stories won’t be, either.

An anecdote from Wu Chunming, one of the two young women on whom the book focuses, stuck in my mind as a metaphor for the selection effects that might skew our view of the transformation Chang means to describe. On pp. 46-47, Chang writes:

Guangdong in 1993 was even more chaotic than it is today. Migrants from the countryside flooded the streets looking for work, sleeping in bus stations and under bridges. The only way to find a job was to knock on factory doors, and Chunming and her friends were turned away from many doors before they were hired at the Guotong toy factory. Ordinary workers there made one hundred yuan a month, or about twelve dollars; to stave off hunger, they bought giant bags of instant noodles and added salt and boiling water. ‘We thought if we ever made two hundred yuan a month,’ Chunming said later, ‘we would be perfectly happy.’

After four months, Chunming jumped to another factory, but left soon after a fellow worker said her cousin knew of better jobs in Shenzhen. Chunming and a few friends traveled there, spent the night under a highway overpass, and met the girl’s cousin the next morning. He brought them to a hair salon and took them upstairs, where a heavily made-up young woman sat on a massage bed waiting for customers. Chunming was terrified at the sight. ‘I was raised very traditionally,’ she said. ‘I thought everyone in that place was bad and wanted me to be a prostitute. I thought that once I went in there, I would turn bad too.’

The girls were told that they should stay and take showers in a communal stall, but Chunming refused. She walked back down the stairs, looked out the front door, and ran, abandoning her friends and the suitcase that contained her money, a government-issued identity card, and a photograph of her mother…

‘Did you ever find out what happened to the friends you left behind in the hair salon?’ I asked.

‘No,’ she said. ‘I don’t know if it was a truly bad place or just a place where you could work as a massage girl if you wanted. But it was frightening that they would not let us leave.’

In that example, we hear Wu’s side of the story and about the success that followed. What we don’t hear are the stories of the other young women who didn’t run away that day. Maybe the courage, or just impulsiveness, that Chunming showed in that moment helped her become more successful afterwards and also made her more likely to encounter and open up to a reporter.

Chang implicitly flags this issue for us at the end of that excerpt, and she explicitly addresses it in a “conversation” with the author that follows the text in my paperback edition. Still, Chang can’t tell us the versions of the story that she doesn’t hear. In social-scientific jargon, those other young women left behind at the hair salon are the unobserved counterfactuals to the optimistic narrative we get from Chunming. A more literary soul might describe those other girls as the ghosts of Wu Chunming’s past, present, and future. Unlike Dickens’ phantoms, though, these other lives actually happened, and yet we still can’t see them.
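
To make that statistical point concrete, here is a toy simulation in R (my own illustration with made-up parameters, not anything from the book) of how this kind of selection can color what we observe. If willingness to share your story rises with success, the stories we actually hear will be rosier than the full population’s.

```r
# Toy simulation of selection bias: made-up parameters, for illustration only.
set.seed(42)

n <- 100000
success <- rnorm(n)  # latent "success" of each migrant; population mean is 0

# Assume the chance of telling your story to a reporter rises with success
p_talk <- plogis(-1 + 1.5 * success)
talks  <- rbinom(n, 1, p_talk) == 1

mean(success)         # true population average: ~0
mean(success[talks])  # average among those we hear from: biased upward
```

The gap between those two averages is produced entirely by who talks, not by what happened, which is exactly the worry about generalizing from the stories Chang could collect.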

In a recent blog post, sociologist Zeynep Tufekci wrote about the relationship between a project’s research design and the inferences we can draw from it:

Research methods, a topic that is seemingly so dry, are the heart and soul of knowledge. Most data supports more than one theory. This does NOT mean all data supports all theories: rather, multiple explanations can fit one set of findings. Choosing the right underlying theory, an iterative process that always builds upon itself, requires thinking hard on how data selection impacts findings, and how presentation of findings lends itself to multiple theories, and how theories fit with existing worldviews, and how better research design can help us distinguish between competing explanations.

A good research project consciously grapples with these.

Like the video Tufekci critiques in her essay, Chang’s book is a research project. Factory Girls is a terrific piece of work and writing, but those of us who read it with an eye toward understanding the wider processes its stories are meant to represent should do so with caution, especially if it confirms our prior beliefs. I hope that economic development is mostly improving the lives of young women and men in China, and there is ample macro-level evidence that it is. The stories Chang relates seem to confirm that view, but a little thinking about selection effects suggests that we should expect them to do that. To really test those beliefs, we would need to trace the life courses of a wider sample of young women. As often happens in social science, though, the cases most important to testing our mental models are also the hardest to see.

Forecasting Round-up No. 8

1. The latest Chronicle of Higher Education includes a piece on forecasting international affairs (here) by Beth McMurtrie, who asserts that

Forecasting is undergoing a revolution, driven by digitized data, government money, new ways to analyze information, and discoveries about how to get the best forecasts out of people.

The article covers terrain that is familiar to anyone working in this field, but I think it gives a solid overview of the current landscape. (Disclosure: I’m quoted in the piece, and it describes several research projects for which I have done or now do paid work.)

2. Yesterday, I discovered a new R package that looks to be very useful for evaluating and comparing forecasts. It’s called ‘scoring’, and it does just that, providing functions to implement an array of proper scoring rules for probabilistic predictions of binary and categorical outcomes. The rules themselves are nicely discussed in a 2013 publication co-authored by the package’s creator, Ed Merkle, and Mark Steyvers. Those rules and a number of others are also discussed in a paper by Patrick Brandt, John Freeman, and Phil Schrodt that appeared in the International Journal of Forecasting last year (earlier ungated version here).

I found the package because I was trying to break the habit of always using the area under the ROC curve, or AUC score, to evaluate and compare the accuracy of forecasts from statistical models of rare events. AUC is quite useful as far as it goes, but it doesn’t address all aspects of forecast accuracy we might care about. Mathematically, the AUC score represents the probability that a prediction selected at random from the set of cases that had an event of interest (e.g., a coup attempt or civil-war onset) will be larger than a prediction selected at random from the set of cases that didn’t. In other words, AUC deals strictly in relative ranking and tells us nothing about calibration.
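
That pairwise definition is easy to verify in code. Here is a quick sketch in R with simulated outcomes and forecasts (made-up data, purely for illustration); the brute-force comparison of every event/non-event pair reproduces what packages like pROC report as the AUC.

```r
# Sketch: AUC as the probability that a randomly drawn event case
# gets a higher forecast than a randomly drawn non-event case.
set.seed(1)
y <- rbinom(500, 1, 0.1)         # simulated binary outcomes
p <- plogis(rnorm(500) + 2 * y)  # forecasts loosely correlated with y

events    <- p[y == 1]
nonevents <- p[y == 0]

# Compare every event/non-event pair, counting ties as one-half
auc <- mean(outer(events, nonevents, ">") +
              0.5 * outer(events, nonevents, "=="))
auc
# pROC::auc(y, p) should return the same value
```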

This came up in my work this week when I tried to compare out-of-sample estimates from three machine-learning algorithms—kernel-based regularized least squares (KRLS), Random Forests (RF), and support vector machines (SVM)—trained on and then applied to the same variables and data. In five-fold cross-validation, the three algorithms produced similar AUC scores, but histograms of the out-of-sample estimates showed much less variance for KRLS than RF and SVM. The mean out-of-sample “forecast” from all three was about 0.009, the base rate for the event, but the maximum for KRLS was only about 0.01, compared with maxes in the 0.4s and 0.7s for the others. It turned out that KRLS was doing about as well at rank ordering the cases as RF and SVM, but it was much more conservative in estimating the likelihood of an event. To consider that difference in my comparisons, I needed to apply scoring rules that were sensitive to forecast calibration and my particular concern with avoiding false negatives, and Merkle’s ‘scoring’ package gave me the functions I needed to do that. (More on the results some other time.)
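
To illustrate the distinction with simulated data (not my actual model output), here is a sketch of the kind of comparison I mean. Two sets of forecasts with identical rank ordering, and therefore identical AUC, can earn very different Brier scores when one of them is poorly calibrated. The brierscore() call uses the ‘scoring’ package’s formula interface; the manual equivalent appears in the last comment.

```r
# Sketch with simulated data: same discrimination, different calibration.
library(scoring)

set.seed(2)
n  <- 10000
y  <- rbinom(n, 1, 0.009)           # rare event at roughly a 0.9% base rate
p1 <- plogis(rnorm(n) - 5 + 3 * y)  # conservative, roughly calibrated forecasts
p2 <- p1^0.3                        # monotone transform: same ranks and same
                                    # AUC, but wildly inflated probabilities

df <- data.frame(y = y, p1 = p1, p2 = p2)
mean(brierscore(y ~ p1, data = df))  # small mean Brier score: well calibrated
mean(brierscore(y ~ p2, data = df))  # much larger: poor calibration penalized
# equivalently: mean((p1 - y)^2) and mean((p2 - y)^2)
```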

3. Last week, Andreas Beger wrote a great post for the WardLab blog, Predictive Heuristics, cogently explaining why event data is so important to improving forecasts of political crises:

To predict something that changes…you need predictors that change.

That sounds obvious, and in one sense it is. As Beger describes, though, most of the models political scientists have built so far have used slow-changing country-year data to try to anticipate not just where but also when crises like coup attempts or civil-war onsets will occur. Some of those models are very good at the “where” part, but, unsurprisingly, none of them does so hot on the “when” part. Beger explains why that’s true and how new data on political events can help us fix that.

4. Finally, Chris Blattman, Rob Blair, and Alexandra Hartman have posted a new working paper on predicting violence at the local level in “fragile” states. As they describe in their abstract,

We use forecasting models and new data from 242 Liberian communities to show that it is possible to predict outbreaks of local violence with high sensitivity and moderate accuracy, even with limited data. We train our models to predict communal and criminal violence in 2010 using risk factors measured in 2008. We compare predictions to actual violence in 2012 and find that up to 88% of all violence is correctly predicted. True positives come at the cost of many false positives, giving overall accuracy between 33% and 50%.

The patterns Blattman and Blair describe in that last sentence are related to what Beger was talking about with cross-national forecasting. Blattman, Blair, and Hartman’s models run on survey data and some other structural measures describing conditions in a sample of Liberian localities. Their predictive algorithms were derived from a single time step: inputs from 2008 and observations of violence from 2010. When those algorithms are applied to data from 2010 to predict violence in 2012, they do okay—not great, but “[similar] to some of the earliest prediction efforts at the cross-national level.” As the authors say, to do much better at this task, we’re going to need more and more dynamic data covering a wider range of cases.
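
To see how high sensitivity and middling overall accuracy can coexist, here is a bit of confusion-matrix arithmetic with hypothetical counts (my own illustration; these are not the paper’s actual cell values):

```r
# Hypothetical confusion matrix (made-up counts, not from the paper):
# a model that flags violence aggressively catches most true cases
# but racks up false positives that drag down overall accuracy.
tp <- 88   # violent communities correctly flagged
fn <- 12   # violent communities missed
fp <- 120  # peaceful communities incorrectly flagged
tn <- 22   # peaceful communities correctly cleared

sensitivity <- tp / (tp + fn)                   # 0.88
accuracy    <- (tp + tn) / (tp + fn + fp + tn)  # about 0.45

sensitivity
accuracy
```

With numbers in that neighborhood, both headline results hold at once: nearly nine in ten violent communities flagged, but an overall hit rate under 50 percent.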

Whatever the results, I think it’s great that the authors are trying to forecast at all. Even better, they make explicit the connections they see between theory building, data collection, data exploration, and prediction. On that subject, the authors get the last word:

However important deductive hypothesis testing remains, there is much to gain from inductive, data-driven approaches as well. Conflict is a complex phenomenon with many potential risk factors, and it is rarely possible to adjudicate between them on ex ante theoretical grounds. As datasets on local violence proliferate, it may be more fruitful to (on occasion) let the data decide. Agnosticism may help focus attention on the dependent variable and illuminate substantively and statistically significant relationships that the analyst would not have otherwise detected. This does not mean running “kitchen sink” regressions, but rather seeking models that produce consistent, interpretable results in high dimensions and (at the same time) improve predictive power. Unexpected correlations, if robust, provide puzzles and stylized facts for future theories to explain, and thus generate important new avenues of research. Forecasting can be an important tool in inductive theory-building in an area as poorly understood as local violence.

Finally, testing the predictive power of exogenous, statistically significant causes of violence can tell us much about their substantive significance—a quantity too often ignored in the comparative politics and international relations literature. A causal model that cannot generate predictions with some reasonable degree of accuracy is not in fact a causal model at all.
