Research Report No. 14, 2005

The efficacy of early childhood interventions

by Sarah Wise, Lisa da Silva, Elizabeth Webster and Ann Sanson

5. Adequacy of evaluation design

Ideally, evaluations of interventions should be systematic and comprehensive, use rigorous scientific controls such as randomised trials, and have sufficient statistical power to detect meaningful program effects (Sanders 2003). Some existing reviews of program evaluations have developed standards, grades or levels of evidence for early childhood interventions based on specified criteria. These categories are used to report the rigour of an evaluation's design (for example, Mrazek and Brown 2002).

Evidence rating system

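The evidence rating system adopted in this report aims to provide information on a number of fundamental research design elements. The ten elements included in this review are: an appropriate control or comparison group; pre-intervention measures; intermediate follow-up; long-term follow-up; a representative sample; adequate statistical power; reliable measures; an appropriate choice of outcome measures; an appropriate analytic approach; and acceptable attrition.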

The presence or absence of each design element is recorded in Tables 1-5 below. Full details of the intervention evaluations and outcomes are provided in Appendix 2.
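The rating scheme amounts to a presence/absence tally over ten design elements, with the count summarised as an integrity label. The following Python sketch is a hypothetical illustration, not the authors' procedure: the element names paraphrase those discussed in this section, and the cut-offs in `integrity_label` are inferred from the counts and labels reported for individual evaluations. The report states no explicit cut-offs, and counts of eight and three are never reported, so the labels assigned to them are assumptions.

```python
# Hypothetical sketch of a presence/absence tally over ten design elements.
# Element names paraphrase this section; rating cut-offs are ASSUMED, inferred
# from the counts and labels reported for individual evaluations.

DESIGN_ELEMENTS = (
    "control or comparison group",
    "pre-intervention measures",
    "intermediate follow-up",
    "long-term follow-up",
    "representative sample",
    "adequate statistical power",
    "reliable measures",
    "appropriate outcome measures",
    "appropriate analytic approach",
    "acceptable attrition",
)


def count_elements(present):
    """Count how many of the ten design elements an evaluation contains."""
    unknown = set(present) - set(DESIGN_ELEMENTS)
    if unknown:
        raise ValueError(f"unrecognised design elements: {sorted(unknown)}")
    return len(set(present))  # duplicates are counted once


def integrity_label(count):
    """Map an element count to an integrity label (assumed cut-offs)."""
    if not 0 <= count <= len(DESIGN_ELEMENTS):
        raise ValueError("count must be between 0 and 10")
    if count == 10:
        return "excellent"   # e.g. Elmira PEIP
    if count >= 8:
        return "very good"   # nine reported; eight is an assumption
    if count == 7:
        return "good"        # e.g. TPDP, Triple P
    if count >= 5:
        return "moderate"    # five or six, e.g. Cuyahoga, Head Start
    if count >= 3:
        return "poor"        # four reported (PAT); three is an assumption
    return "very poor"       # one or two, e.g. SHELLS, Saginaw


# Counts reported in this section for three evaluations.
for name, n in [("Perry", 9), ("Saginaw", 2), ("PAT", 4)]:
    print(f"{name}: {integrity_label(n)}")
```

The loop prints `very good`, `very poor` and `poor` for Perry, Saginaw and PAT respectively, matching the ratings given for those evaluations in this section.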

Adequacy of cluster 1 evaluations

All evaluations in cluster 1 included a representative sample of participants. Most used reliable measures, made appropriate choices about measures and used appropriate analytic approaches. Four of the six interventions (Perry, CPC, High/Scope and PIDI) included an appropriate control or comparison group and four (Perry, Head Start, High/Scope, PIDI) collected pre-intervention data. Half of the interventions had follow-up data (Perry, CPC, High/Scope).

The evaluation integrity of three interventions in cluster 1 (Perry, CPC, High/Scope) was very good, with each containing nine of the ten research design elements. The evaluation integrity of one intervention (Saginaw) was very poor, with only two of the design elements present, while that of the remaining two interventions (Head Start, PIDI) was moderate (six design elements). These details are illustrated in Table 1.

Table 1: Cluster 1 evaluations

Adequacy of cluster 2 evaluations

All of the evaluations in cluster 2 except SHELLS contained an appropriate control or comparison group, and all included pre-intervention measures. SHELLS and Baby HUGS did not collect follow-up data, while the remaining evaluations included at least intermediate follow-up data. Half of the evaluations did not have adequate statistical power, and half did not use reliable measures.

The evaluation integrity of one intervention (Elmira PEIP) was excellent, with all ten design elements present. One intervention (SHELLS) had very poor evaluation integrity (one design element present), while the evaluation integrity of the remaining six interventions was moderate to good. These details are illustrated in Table 2.

Table 2: Cluster 2 evaluations

Adequacy of cluster 3 evaluations

All of the evaluations of interventions in cluster 3 included an appropriate control or comparison group, a representative sample, adequate statistical power and reliable measures, and all chose appropriate outcome measures.

Table 3 shows that the evaluation integrity of two of the interventions was very good, with both evaluations containing nine of the ten design elements (New Hope, FTP). The evaluation integrity of the remaining intervention (TPDP) was good, containing seven design elements.

Table 3: Cluster 3 evaluations

Adequacy of cluster 4 evaluations

Most of the evaluations in cluster 4 included a representative sample and chose appropriate outcome measures, while two-thirds included an appropriate control or comparison group and two-thirds used reliable measures. Most of the other design elements were present in roughly half of the evaluations. Attrition was acceptable in only four of the evaluations (Abecedarian, IHDP, Incredible Years, ECEAP) and was not applicable to half of the interventions because they had no longitudinal follow-up.

The evaluation integrity of three interventions (Abecedarian, IHDP, Incredible Years) was very good, with each evaluation containing nine of the ten design elements. Two interventions (Sure Start and NEWPIN) had very poor evaluation integrity, each containing only one design element; however, more comprehensive evaluations of Sure Start are pending. The evaluation integrity of the remaining seven interventions was moderate to good (five to seven design elements). These details are illustrated in Table 4.

Table 4: Cluster 4 evaluations

Adequacy of cluster 5 evaluations

All three evaluations in cluster 5 contained an intermediate follow-up and a representative sample; however, none contained a long-term follow-up. In addition, attrition was high in all but one evaluation (Cuyahoga), and only Triple P included an appropriate control group and used an appropriate analytic approach.

As shown in Table 5, the evaluation integrity of Triple P was good (seven design elements); the evaluation integrity of PAT was poor (four design elements); and the evaluation integrity of Cuyahoga was moderate (five design elements).

Table 5: Cluster 5 evaluations

Relative adequacy of evaluations across clusters

It is difficult to make any firm distinctions between clusters, given the great variability in evaluation integrity within clusters. With the exception of cluster 5, each cluster contained evaluations with very good integrity, while all clusters except cluster 3 contained evaluations with very poor to poor integrity.
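One way to see this variability is to set the lowest and highest design-element counts reported for each cluster side by side. The following Python sketch is illustrative only: it encodes just the extreme counts stated in the cluster summaries above (individual counts for the middle-ranked evaluations in clusters 2 and 4 are not given), and the function names are hypothetical.

```python
from itertools import combinations

# Lowest and highest design-element counts (out of ten) reported in this
# section for each cluster. The intermediate evaluations in clusters 2 and 4
# are described only as "moderate to good", so just the extremes are encoded.
REPORTED_RANGES = {
    1: (2, 9),   # Saginaw .. Perry, CPC, High/Scope
    2: (1, 10),  # SHELLS .. Elmira PEIP
    3: (7, 9),   # TPDP .. New Hope, FTP
    4: (1, 9),   # Sure Start, NEWPIN .. Abecedarian, IHDP, Incredible Years
    5: (4, 7),   # PAT .. Triple P
}


def within_cluster_spread(ranges):
    """Difference between the highest and lowest count in each cluster."""
    return {cluster: high - low for cluster, (low, high) in ranges.items()}


def overlapping_clusters(ranges):
    """Return the pairs of clusters whose count ranges overlap."""
    return [
        (a, b)
        for a, b in combinations(sorted(ranges), 2)
        # Two closed intervals overlap if each starts before the other ends.
        if ranges[a][0] <= ranges[b][1] and ranges[b][0] <= ranges[a][1]
    ]


total_pairs = len(list(combinations(REPORTED_RANGES, 2)))
print(within_cluster_spread(REPORTED_RANGES))
print(len(overlapping_clusters(REPORTED_RANGES)), "of", total_pairs, "cluster pairs overlap")
```

With the reported figures, all ten cluster pairs overlap, and the within-cluster spread ranges from two elements (cluster 3) to nine (cluster 2).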

One design element that warrants further discussion is the use of reliable measures. Regardless of cluster, most of the evaluations included some objective measures as well as parental reports. Although parent-reported measures have merit and are usually the most expedient means of data collection, they are subjective by nature. Objective measures are therefore needed to corroborate parental reports.
