Education is a complex endeavor, and education research needs to take that complexity into account.
One of the most difficult tasks school leaders face is deciding which resources and programs should go into the classroom. A publisher may claim that its instructional program raises student scores by a grade level with only six weeks of use—but has the publisher documented this claim through research results that have been repeated in classroom after classroom? How can education administrators know which claims to believe?
In the United States, the federal government has attempted to address this issue—and to inject more rigor into the development and evaluation of instructional programs—by inserting into the No Child Left Behind Act the requirement that federally funded instructional programs be supported by “scientifically based” research. To enhance access to such research, the government also established the What Works Clearinghouse (www.whatworks.ed.gov), whose preferred research model is the medical model. Unfortunately, this model, although powerful in certain circumstances, presents serious challenges in education settings.
The Medical Model
The most popular experimental model used in medicine and agriculture for the last 100 years is the randomized assignment design. Although there are many different ways to carry out a randomized assignment study, one of the cleanest is the medical model, in which health care professionals randomly assign individuals with a particular medical condition to one of two groups and then treat one group with an experimental drug and the other group with an inert placebo. The individuals administering the treatment, as well as those receiving it, are blind, meaning that they don't know who receives the experimental drug and who receives the placebo. Following treatment, the researchers measure the outcome and report any side effects. They can then clearly attribute any differences between the two groups to the effects of the experimental drug.
The medical research model is powerful because it allows researchers to make strong statements about the impact that specific variables have on the outcomes they observe. Undoubtedly, this model has moved medicine, agriculture, and other fields forward.
The Medical Model in Education Research
When the medical model is applied in education settings, several problems arise.
Self-selection of participants. In all but a few federally funded studies, schools or school districts can choose not to participate. This self-selection can bias the outcome and limit our ability to generalize on the basis of the findings.
Treatment-teacher interactions. Unlike an experimental medical treatment, the “treatment” in an education study varies greatly from one classroom to the next. Teachers differ in how they implement a particular instructional program and in how they use that program with different students. Such variations reduce our ability to draw conclusions about the program that apply to other teachers and classrooms.
Multiple reasons for improvement. All program changes in schools take place within the context of other unrelated trends and events. In any experiment, other factors may have caused the results we attribute to the program that we are evaluating.
Characteristics of test scores. Some test score data are inappropriate for use with many statistical analyses. For instance, percentile ranks are not on an interval scale. As a result, they cannot be fairly used to calculate average scores or analyze variance. At the same time, most test scores are not equally precise for all students. This means that a statistical analysis of the test scores of all students may not yield valid information about specific groups of students, such as low achievers.
Nonblind studies. In medical trials, the health professionals administering a drug can remain unaware of whether a particular patient is getting the drug of interest or the placebo. This cannot be the case in an education study; a teacher will always know what textbook she is using in her classroom, for example. As a result, the teacher's impressions of the program under study may sway the results in her classroom.
Control groups. In a medical setting, individuals are randomly assigned to either the control group or the treatment group, and any doctor may have patients in both groups. In education, a particular teacher rarely has the ability to deliver one program to some students and another program to other students in the classroom. Therefore, random assignment in an education study is normally done at the classroom level; the control group is a group of classes that do not participate in the new program. The students in the control group don't receive a placebo treatment; they just get the same program their teachers were using prior to the experiment.
Hawthorne effect. One of the early findings of industrial psychology, dating from Elton Mayo's studies of production at Western Electric's Hawthorne plant in the 1920s and 1930s, is that changing a condition in a production process and observing the outcome normally results in greater output (Draper, 2005). In classrooms where we implement a new instructional approach, this Hawthorne effect is likely to cause positive results, even if the long-term effect of the new approach is minimal or even harmful.
To put all these concerns into context, consider the following typical scenario. A publisher wants to use the medical research model to collect evidence about its new reading program's effectiveness for struggling students in grades 3–5. The publisher contacts a few school districts and asks them to be involved in an experimental study. Some districts decline to take part. Splitting classrooms in the participating districts randomly into control and treatment groups, the researchers give half of the teachers the materials for the new program and lead them through a two-hour training activity; the other teachers receive nothing new. After six months, the publisher collects test score data from the statewide assessment. An independent group's statistical analysis finds that the mean percentile for students in the treatment group is higher than that for the control group. These results, the publisher concludes, indicate that the new reading program is effective.
Several problems undermine this conclusion.
Districts voluntarily chose to participate, and therefore this sample may not accurately represent all districts that might adopt the program.
Because the publisher didn't evaluate how the program was implemented, we might expect differences in the ways teachers used the program.
Because the statewide assessment isn't designed to be particularly sensitive to struggling students, scores on this test may not clearly reflect the program's effect on the students of interest. Moreover, assessment results reported in percentiles are inappropriate for analysis by many statistical approaches (including t-tests); this renders the test score differences difficult to interpret.
During the course of the study, factors such as social events, a virus outbreak, or teacher absenteeism may have influenced the performance of students in the experimental and control classrooms. In addition, because schools are social settings, teachers in control classes may have implemented parts of the program that they learned about from their colleagues in the experimental group. We can't tell for sure what caused the difference in scores between the experimental and control groups—the new reading program, or any of these external factors.
Because the control group students received nothing new, but the treatment group students did, the Hawthorne effect may account for part of the difference in student performance.
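The concern about percentile ranks raised earlier can be made concrete with a small numerical sketch. It assumes, purely for illustration, that scale scores are normally distributed with a mean of 200 and a standard deviation of 10; the two classes and all of their scores are hypothetical and do not come from any real assessment. Under those assumptions, equal percentile gains correspond to unequal score gains, and two classes with the same mean score can have different mean percentiles.

```python
from statistics import NormalDist, mean

# Assumption: scale scores follow a normal distribution with mean 200 and
# standard deviation 10. Both values are hypothetical.
SCORE_DIST = NormalDist(mu=200, sigma=10)


def percentile_rank(score):
    """Percentage of the assumed population scoring at or below `score`."""
    return 100 * SCORE_DIST.cdf(score)


def score_at_percentile(pct):
    """Scale score at a given percentile rank (0 < pct < 100)."""
    return SCORE_DIST.inv_cdf(pct / 100)


# 1. The same 10-point percentile gain reflects different score gains,
#    depending on where in the distribution the student starts.
gain_middle = score_at_percentile(60) - score_at_percentile(50)
gain_tail = score_at_percentile(15) - score_at_percentile(5)

# 2. Two hypothetical classes with the same mean scale score.
class_a = [200, 200, 200, 200]
class_b = [185, 195, 195, 225]
mean_pct_a = mean(percentile_rank(s) for s in class_a)
mean_pct_b = mean(percentile_rank(s) for s in class_b)

print(f"Score gain, 50th to 60th percentile: {gain_middle:.2f}")
print(f"Score gain, 5th to 15th percentile:  {gain_tail:.2f}")
print(f"Mean score A = {mean(class_a)}, mean percentile A = {mean_pct_a:.1f}")
print(f"Mean score B = {mean(class_b)}, mean percentile B = {mean_pct_b:.1f}")
```

In this sketch the low-scoring student must gain more than twice as many scale-score points as the mid-range student to move up 10 percentile points, and the two classes differ in mean percentile even though their mean scores are identical. A comparison of mean percentiles would therefore report a difference between the classes that the underlying scores do not show.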
A Better Research Paradigm
This example demonstrates why the medical model may not be the best choice for most education research. A stronger model would measure students at several points across time, match individual students in the treatment group to students in the control group, take into account the effectiveness of a program's implementation, and use achievement measures developed specifically for the students in the study.
In research, as in life, we need to value diversity. No one research model holds the magic formula to understanding education. Education research is difficult. But we can improve our odds of understanding learning by considering the information from a wide variety of nonmedical approaches in addition to the information we gather from medical-design studies.
Ethnographic designs involve on-site observations to create portraits of how classroom processes work, as well as how processes work differently in different settings. These designs are useful in identifying characteristics that make it easier or harder to implement a program, and they also provide direct evidence for a program's usefulness with particular groups of students.
Time series studies using structural equation models enable researchers to determine how different classroom factors contribute to student growth across time. These models can capture a wide variety of characteristics in classrooms that might interact to enhance student growth. No program works in a vacuum when it is implemented in a school.
Quasi-experimental designs look at cross-sectional data concerning student performance and growth before and after a particular program is put in place. This kind of design is particularly useful when there is a change to all of a district's schools at the same time—for example, a new textbook adoption, a new bell schedule, or an extended school year.
When these designs are used well, they enable researchers to make causal inferences from information collected without the medical model. They also allow us to say more than whether a particular program “works” or not. Whether a program works is almost always less interesting and useful than why and in what conditions a program might help a particular group of students.
Another promising approach is to use a growth study with a virtual control group. A number of researchers are using the Northwest Evaluation Association's Growth Research Database to conduct studies that apply this approach. The virtual control group is a sample of students selected from a variety of schools and measured with the same instruments used with the treatment group. This approach reduces costs associated with validity studies and enables researchers to carry out studies in a wider variety of settings, obtaining more valid generalizations from the findings.
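One way to picture a virtual control group is as a matching exercise. The sketch below is an illustration only and is not the procedure used with the Growth Research Database: the student records, the matching rule (same grade, closest fall score within 3 points), and the function name `build_virtual_control` are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical records: (student_id, grade, fall_score, spring_score).
treatment = [
    ("t1", 3, 188, 199),
    ("t2", 4, 201, 209),
    ("t3", 5, 195, 207),
]
comparison_pool = [
    ("c1", 3, 187, 195),
    ("c2", 3, 200, 206),
    ("c3", 4, 202, 208),
    ("c4", 4, 190, 199),
    ("c5", 5, 196, 204),
    ("c6", 5, 210, 215),
]


def build_virtual_control(treated, pool, max_gap=3):
    """Pair each treated student with the unused pool student in the same
    grade whose fall score is closest (assumed rule, within `max_gap`)."""
    available = list(pool)
    pairs = []
    for student in treated:
        _, grade, fall, _ = student
        candidates = [
            c for c in available
            if c[1] == grade and abs(c[2] - fall) <= max_gap
        ]
        if not candidates:
            continue  # leave the student unmatched rather than force a poor match
        best = min(candidates, key=lambda c: abs(c[2] - fall))
        available.remove(best)  # each pool student is used at most once
        pairs.append((student, best))
    return pairs


def growth(record):
    """Fall-to-spring score growth for one student record."""
    return record[3] - record[2]


pairs = build_virtual_control(treatment, comparison_pool)
growth_diffs = [growth(t) - growth(c) for t, c in pairs]
mean_growth_diff = mean(growth_diffs)

for (t, c), diff in zip(pairs, growth_diffs):
    print(f"{t[0]} matched to {c[0]}: growth difference = {diff}")
print(f"Mean growth difference: {mean_growth_diff}")
```

Comparing growth rather than final scores, and comparing each student with a peer who started at a similar level, is what lets this kind of design address some of the concerns raised about the medical model. A real study would also need to account for implementation quality and for differences between the schools the matched students attend.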
It will be wonderful when researchers and educators know which instructional programs are most effective in a particular education situation. But strict adherence to one research paradigm that insists on the medical model as the only valid source of information will not get us there. If we encourage researchers to use a broad range of appropriate research models to provide useful information to teachers and administrators, with high research quality and without overly restrictive regulation by government agencies, we should be able to help kids learn more. Isn't that why we are in education in the first place?
References
• Berliner, D. C. (2002). Educational research: The hardest science of all. Educational Researcher, 31(8). Available: www.aera.net/publications/?id=438
• Draper, S. W. (2005). The Hawthorne effect and other expectancy effects. Glasgow, Scotland: Department of Psychology, University of Glasgow. Available: www.psy.gla.ac.uk/~steve/hawth.html
End Notes
1 The Northwest Evaluation Association's Growth Research Database contains longitudinal student achievement data collected from more than 1,600 school districts and nearly 10,000 schools in more than 45 states over 10 academic years. The student achievement data collected in the Growth Research Database is a product of the NWEA Achievement Levels Tests and Measures of Academic Progress, which most participating districts use to assess students twice annually. The Growth Research Database allows longitudinal analysis of highly accurate growth information for large and age-diverse student populations. More information is available at www.nwea.org/research/grd.asp.