Most people know the basic story of handwashing in medicine: Infections in medical facilities were a seemingly intractable problem until, starting in the 19th century, iconoclastic doctors used data to show that washing hands (and instruments) reduced infections and saved lives. Yet whether from stubbornness, politics among practitioners, or genuine disbelief that washing made a difference, there was plenty of resistance before basic infection-prevention measures took hold. Today, however, such routine precautions are commonplace. When you visit a doctor's office or emergency room, you'll see sanitizer by the door (even more so now, in the midst of the coronavirus crisis).
In education, though, we still don't "wash our hands." Instead, depending on personal or institutional preferences, we too often ignore, belittle, or weaponize scientific findings relevant to education. The education sector does not do enough to support a culture or politics that prizes empiricism and learning—including learning about which education practices the empirical evidence shows to be most effective.
The "Threshold of Certainty" and Other Dilemmas
There are different ways of understanding the world. Some people understand things through religion, history, tradition, culture, or ideology. For others, it's science, sometimes filtered through these other touchstones. Philosophers sometimes refer to science as "epistemically special," meaning that, compared to other methods of learning about the world, science has been especially successful and revered. Policymakers, advocates, and salespeople are eager to use the "according to science … " argument, because Americans believe in "science" (Kennedy, 2016).
A key problem for the policymaker or practitioner who hopes to use findings of science honestly is identifying a threshold of certainty for acting on those findings in some way. How persuasive must scientific evidence be before we feel compelled to act? There is always some uncertainty in scientific knowledge; every theory is understood to be provisional, to be the best explanation we have now. Yet teachers and policymakers cannot wait for overwhelming evidence and seemingly flawless theory; sometimes they must act—even when the evidence is far from ideal. For example, researchers know less about helping children develop reading fluency than they do about other topics in reading, such as decoding. But practitioners will encounter children who have trouble with fluency, and they must help those children as best they can.
When should education leaders or teachers act based on the best science available, and when should they conclude that the best available evidence still isn't good enough and should have no influence? And what then?
These are difficult questions for many fields, but education is arguably doing a far worse job than most of engaging thoughtfully with conflicting and incomplete evidence. The education sector often struggles even when a fairly robust body of evidence exists on a particular intervention, such as reading instruction. Sometimes educators rush enthusiastically to apply an approach that hasn't amassed enough evidence of effectiveness ("overpromising"), and sometimes they "underdeliver" by failing to adopt methods clearly backed by scientific research.
Overpromising: Exuberance for Evidence-Poor Practices
Sometimes the problem is straightforward: educators, researchers, or policymakers base decisions on poor science. Learning styles provides a prominent example. The scientific claim made in such theories is that different people have different preferred ways of learning—for example, via images or via words (Cuevas & Dawson, 2018), or via different modalities, such as vision, hearing, or kinesthesia (movement) (Fleming & Mills, 1992)—and that learners do better when given chances to learn through their preferred style. For decades, psychologists and other researchers have pointed out that the research base for these theories is absent or thin (Pashler et al., 2009; Willingham, 2018). Yet products claiming to help educators cater to learning styles abound, and textbooks intended for future teachers still present learning styles theories as valid. The theories are a fixture at many education conferences and schools of education.
In other instances, the basic science is solid, but the field of education races toward unproven applications of that science. Educators' recent fascination with grit provides an example (Duckworth et al., 2007). It would arguably be prudent to wait until scientists agree that Angela Duckworth's particular description of grit—exhibiting passion and perseverance for long-term goals—genuinely illuminates an individual's motivation before applying strategies based on that concept in classrooms.
Even if you think the scientific evidence on grit is firmly in hand, it seems obvious that a person probably can't (and shouldn't) choose what another person is passionate about. The student, not the educator, should choose his or her own goal, so the popular idea of making students gritty about performing schoolwork is likely flawed.
Overpromising based on incomplete scientific evidence also creeps into policymaking. Teacher evaluation provides a compelling example. Faced with abundant research evidence that teacher evaluations were overwhelmingly perfunctory (Weisberg et al., 2009), policymakers—prodded by the Obama Administration's Race to the Top grant program—acted by implementing evaluation systems for all teachers in just a few years. Moving a field of several million practitioners from a situation where hardly anyone was evaluated meaningfully to one where everyone would be—every year—was a Mt. Everest of change management, yet implementation proceeded heedlessly. Flawed models were adopted, policies weren't well thought through, perverse consequences abounded, and teachers grew frustrated. Instances where the policy appeared to work—for example, in Washington, D.C. (Dee & Wyckoff, 2015)—were overshadowed by problems elsewhere. An important idea, evaluating teachers more effectively, and some helpful science about the usefulness and limitations of value-added analysis were largely discredited.
Underdelivering: Ignoring Science that Might Help
There are also times when solid science is underutilized in education. Researchers know, for instance, that reading comprehension depends heavily on how much the reader knows about the subject matter of the text; knowledge of whales helps a lot when reading an article about whales (Adams, 2009). Yet the obvious implication—that if we hope to improve reading comprehension, we need to plan and sequence the curriculum to ensure a broad base of knowledge—has mostly been ignored or, perhaps worse, politicized.
Phonics instruction—systematically teaching beginning readers the relationship between sounds and letters or letter groups—is another example. The method was originally based on intuition and real-world observations, but scientific evaluations of phonics show that it's more effective than other common methods of reading instruction, a conclusion endorsed by a great majority of researchers (International Literacy Association, 2019). Yet many early elementary teachers don't use the method because they were never taught it, were taught to disbelieve it, or were trained so poorly that they implement it haphazardly.
Causes of Misjudgment in the Use of Science
So why do educators and education policymakers so often overestimate or underestimate the value of scientific research? We believe it's because at one or more points in the process of using science to improve education, something breaks down. The process for adapting scientific evidence for use in education is generally a cycle of three steps. It begins with identifying a finding from research in basic science—for instance, something about how children understand numbers, or how individuals relate to one another in groups. The second step is exploiting that knowledge to make classrooms or schools more effective. The third is evaluating that application to be sure it helps as intended. The process is cyclical because we should always be updating our knowledge of basic science and trying to find still-more-effective educational practices.
At step 1, overpromising may be due to straightforward inaccuracies in conclusions drawn through basic science, as in the case of learning styles, where the evidence from real science is thin. Underdelivering may happen here when practitioners don't know about high-quality scientific evidence related to a practice that could help learning—as has been the case, for example, with methods of committing information to memory. It may also happen when educators seek to apply findings from flawed research—research that mistakes correlation for causation, for example.
At step 2, overpromising may happen because people assume that applying basic scientific principles to interventions will be simple, which it rarely is. Alternatively, advocates for a certain approach, or the media, may hype a finding out of proportion to its place in a broader body of evidence. Underdelivering at step 2 might look like a lack of classroom-ready applications of a scientifically validated principle. For example, a school leader who sees the value of a carefully sequenced, knowledge-rich curriculum has few options to choose from.
At step 3, measuring progress, problems arise when educators don't have reliable, valid measures of the outcomes a strategy or program is meant to improve. This is a problem, for instance, in evaluating programs meant to boost creativity or critical thinking: such programs overpromise because there is, in fact, no way of knowing whether they are effective.
A final factor limiting progress in applying research is forgetting the cyclical nature of science (and its application). Education as a field is extraordinarily bad at remembering what it has tried before. We keep having many of the same arguments that raged a century ago—over, for example, the benefits of teaching reading via phonics versus other methods, and the relative merits of discovery learning versus direct instruction (Mayer, 2004; Rousseau, 1762/1909). Forgetting our past can lead to either underdelivering or overpromising, depending on whether we've forgotten evidence of effectiveness or of uselessness.
Of Values, Science, and Education
Applied sciences like education inevitably entail values. That's because the goal is not to describe the world as it is (as in natural sciences), but to change the world, to make it more like it ought to be. Basic science can tell you that some children are good at math whereas others are good with words—but it won't tell you whether you should ask children to work harder on their weaknesses or further develop their strengths. Discussions that are ostensibly about the scientific merit of findings can easily turn into cloaked discussions of values—a source of much confusion and mischief in education.
Consider Teach for America, in which corps members (mostly recent college graduates) undergo an accelerated training program and then teach in understaffed schools, where they receive ongoing support. The program has attracted a firestorm of criticism, but numerous studies indicate that, on average, its teachers perform as well or even modestly better than other teachers in the communities where they work (Backes & Hansen, 2018; Clark et al., 2013; Kane, Rockoff, & Staiger, 2008).
From a values standpoint, someone could certainly have concerns about Teach for America. For example, one might see value in teachers coming from children's home communities or believe the accelerated training that Teach for America initially offers undermines efforts to professionalize teaching. Well-intentioned people can disagree on these and other issues. But rather than discuss them—and attendant values—openly, people with reservations about Teach for America often promote low-quality research and anecdotes that indicate the program doesn't work.
Charter schools offer another example. A large body of evidence indicates that, on average, charter schools in suburban and rural communities are a mixed bag in terms of results on standardized tests, whereas charters in urban communities tend to outperform non-charter public schools with similar resources and students—sometimes dramatically (Center for Research on Education Outcomes, 2015). Again, reasonable people can disagree on a host of values issues around charter schools—for example, concerning the role of choice in education, or the collateral effects charters have on school systems and communities. But again, instead of openly discussing values and their role in judging the various options (alongside the data), people conflate values with empiricism.
Even on less acrimonious issues, values can confuse debate. Discussions of education technology, for example, are often clouded by concerns about the role of technology in modern culture.
Making It Better
We suggest two approaches that might help educators, education policymakers, and researchers find a way out of this mess that we have created.
First, those in the education field must become more respectful of evidence, starting with practitioners. Teachers need not be statisticians or researchers, but they should at least be familiar with basic research principles and methods, including common threats to validity. District leaders, too, would benefit from more experience and training in this arena, though we recognize that their time for this work is limited.
In other fields, professional organizations provide leadership to ensure that reliable, timely research summaries are available to practitioners, as the American Medical Association does for physicians. The American Federation of Teachers and the National Education Association could provide a valuable service to members by offering systematic, scientifically literate reviews of prominent research findings, and of findings that should be prominent.
Second, we must become more respectful of and transparent about values. Schools are political creations, so values are embedded in their DNA. When we disrespect people for speaking about values, we goad them into wielding science and data as weapons to defend their positions. Values cannot be a trump card and must be weighed against scientific evidence—but we do ourselves a disservice when we wish values away rather than clarify them. We hope that teacher-preparation programs will do more to ensure that educators develop a clear understanding of the distinction between scientific evidence and values, and of how each informs decisions about education.
Recently, commenting on the idea of a "science of reading," education advocate and historian Diane Ravitch tweeted, "Teaching reading involves art and craft and experience, not 'science.'" In fact, it involves all four. That we continue to pit them against each other shows how far we have to go.