Weighing the Evidence

I have no important disagreements with James Heckman’s description of the significance of early childhood experiences or the radical differences in those experiences among children of different socioeconomic classes. I am no less eager than he to find solutions. But we differ in our confidence about the state of knowledge regarding early childhood intervention.

The most famous evidence on behalf of early childhood intervention comes from the programs that Heckman describes, Perry Preschool and the Abecedarian Project. The samples were small. Perry Preschool had just 58 children in the treatment group and 65 in the control group, while Abecedarian had 57 children in the treatment group and 54 in the control group. In both cases the people who ran the program were also deeply involved in collecting and coding the evaluation data, and they were passionate advocates of early childhood intervention. These shortcomings do not automatically disqualify the results, but think of it this way: if the case against the efficacy of early childhood interventions rested on two studies with small samples conducted by people who were openly hostile to such interventions, no one would pay any attention to them.

So we have positive, statistically significant results, but from studies with problematic aspects. What next? The obvious answer: let’s replicate those programs with a larger sample and better segregation of program and evaluation. And that’s what has happened. The replication was called the Infant Health and Development Program (IHDP). It had a randomly assigned treatment group of 377 and a control group of 608, all of them low-birth-weight babies. For each infant the intervention began upon discharge from the neonatal nursery and continued until the child reached 36 months of age. The program had three components: frequent home visits by a trained counselor, attendance at a child development center five days a week for at least four hours beginning at twelve months, and parent group meetings after the children reached twelve months. The intervention was designed on the Abecedarian model and in many ways was more intensive.

The first follow-ups at 24 and 36 months were highly positive. By the time the participants were age five, however, most of those results had disappeared. In the follow-up at age eighteen, the treatment and control groups showed no significant differences on any of the indicators, which covered intellectual ability, academic achievement, behavioral problems, and physical health.
Did IHDP achieve anything at all? Its advocates argue that it modestly helped the heavier babies in the sample (those weighing more than 2,000 grams), but without offering a theoretical basis for thinking this represents anything more than after-the-fact slicing and dicing of the data. Before the fact, the program design had focused on the lighter babies (less than 2,000 grams), making them two-thirds of the sample, because they were more likely to have developmental problems and therefore presumably were more likely to show results than heavier babies. The aggregated treatment sample produced exactly the results that one would expect from a well-executed evaluation of a program that had no long-term effects. In thinking about the prospects for large-scale early childhood interventions, why should we privilege the results of Perry Preschool and Abecedarian over the results from the much larger, more intensive, and more rigorously evaluated IHDP?

The literature on early intervention is far more extensive than Perry Preschool, Abecedarian, and IHDP. Hundreds of titles have been written on Head Start alone. The studies that have interpretable data lead me to pose a parallel question: Why should we privilege the small number of studies that report positive results over the much greater number that do not?

Toward the end of his career, sociologist Peter Rossi, a dedicated progressive and the nation’s leading expert on social program evaluation from the 1960s through the 1980s, summarized his encyclopedic knowledge of the evaluation literature with his “metallic laws.” Rossi’s iron law was that “the expected value of any net impact assessment of any large scale social program is zero.” His stainless steel law was that “the better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.” To me, the experience of early childhood intervention programs follows the familiar, discouraging pattern that led him to formulate his laws: small-scale experimental efforts staffed by highly motivated people show effects. When they are subjected to well-designed large-scale replications, those promising signs attenuate and often evaporate altogether.