On September 23, 1999, the Mars Climate Orbiter spacecraft approached Mars, where it was expected to fire its rockets, enter the Red Planet's orbit, and send data to scientists back on Earth. But something went wrong. Despite careful planning and countless checks and double-checks from teams of engineers, the orbiter approached the planet far lower than intended, dipped too deep into the Martian atmosphere, and was lost forever.
On review, NASA found that one mistake, one tiny overlooked detail, caused the loss of the $125 million spacecraft: One team of engineers supplied thruster data in English units (pound-seconds), whereas the navigation software expected metric units (newton-seconds). That rocket scientists can overlook something so obvious serves as a stark reminder that in many endeavors, the devil truly is in the details.
Disappointing Findings from Gold-Standard Studies
A study of 2,140 6th graders using Thinking Reader, a software program that poses computer-adaptive questions about young adult novels, found no effects on reading comprehension (Drummond et al., 2011).
A study involving 2,446 4th graders found no higher mathematics achievement for students working with the popular Odyssey Math software, which is used by 3 million students across the United States (Wijekumar, Hitchcock, Turner, Lei, & Peck, 2009).
A study of nearly 10,000 4th and 5th graders found that students whose teachers were trained in the Classroom Assessment for Student Learning program demonstrated no higher levels of achievement than control-group students (Randel et al., 2011).
A group of more than 600 5th graders who were taught for one year with Collaborative Strategic Reading, a scaffolded approach to reading instruction, had no higher levels of achievement than control-group students (Hitchcock, Dimino, Kurki, Wilkins, & Gersten, 2010).
An examination of Project CRISS (Creating Reading Independence through Student-owned Strategies), which encourages consistent reading strategies in core subject areas, found no significant effects for the nearly 2,500 students whose teachers adopted the approach (Kushman, Hanita, & Raphael, 2011).
Poorly Designed—or Poorly Implemented?
Sixty-nine percent of students used the Thinking Reader software less than the program's developer specified; use dropped off so much that by the end of the school year, only 8.9 percent of students had finished the third and final novel (Drummond et al., 2011).
Students used the Odyssey Math software, on average, only 38 minutes a week, far below the 60 minutes that program developers required; of the 60 classrooms studied, students in only one classroom actually used the software for the time required (Wijekumar et al., 2009).
On average, teachers spent 31 hours being trained in the Classroom Assessment for Student Learning approach (60 hours were required); no differences were observed in their classroom practices compared with teachers not using the program (Randel et al., 2011).
Classroom observations revealed that only 21.6 percent of teachers used all five Collaborative Strategic Reading strategies included in the approach, as required by program developers (Hitchcock et al., 2010).
This inconsistent adoption makes it difficult, if not impossible, to determine whether the lackluster effects reflect flaws in the programs themselves or in the way they were implemented, especially since none of the studies parsed their data to determine whether classrooms with better implementation achieved better results. We're left asking why, in study after study, something got lost in translation from program design to implementation.
The problem doesn't seem to be that teachers received no professional development: In most cases, teachers attended 2–3 days of whole-group training followed by 2–3 more days of classroom coaching and observation, arguably a generous level of support. The fine print of some studies, however, points to another crucial variable: leadership support.

In the Project CRISS study, for example, researchers discovered that of the 23 schools in the experimental group, only three held 20 or more teacher study group meetings during the school year (as recommended); nine held five or fewer. These differences, coupled with nearly one-third of teachers reporting that principals never conducted the required classroom walkthroughs to monitor program adoption (Kushman, Hanita, & Raphael, 2011), suggest that principals varied widely in their attention to and support for program implementation.
Targeting Coaching to Ensure Implementation
It's worth noting that one recent study did find significant results, and the program under scrutiny took a slightly different approach to coaching support. The study of Kindergarten PAVEd for Success (K-PAVE), an approach that aims to improve early reading through explicit vocabulary instruction and other strategies, found positive effects equivalent to one additional month of learning per school year (Goodson, Wolf, Bell, Turner, & Finney, 2010).
As in the other studies, teachers received training in whole groups as well as classroom observation and coaching, but with an important twist: Coaches didn't give equal levels of support to all teachers. Rather, they provided more intense support to teachers who seemed to be struggling with implementation. Although researchers still found some inconsistencies in program implementation, teacher coaches actively worked to reduce those variations (Goodson et al., 2010).
It would be a leap, of course, to conclude that this single variation alone accounts for the better results of the K-PAVE program. Nonetheless, these findings suggest that when adopting new programs, principals should be aware of the wide variation that will inevitably exist in implementation. The best strategy may well be to provide coaching and support where it is most needed to ensure that all teachers implement the program effectively.