The terms "research-based" and "evidence-based" are widely used to promote one educational program or another, but we too rarely ask, "What do you mean by 'research'?" When someone claims to have looked at the research, they might mean anything from a comprehensive literature review to a quick scan of a few Facebook posts. Too often, those casual sources brim with supreme confidence, while real researchers are careful to acknowledge the limitations of their work.
Careful consumers will distinguish between spurious claims and well-founded research. Here I want to highlight five levels of claims used to justify that a practice is sound. All five are presented as "research-based," but only two truly draw on genuine research.
Level 1: "I Believe It!"
The first level of claim points to personal beliefs. We're all entitled to our own beliefs, but we are not entitled to our own facts. In my work in schools around the world, I hear strongly held beliefs about the value of corporal punishment, the efficacy of using grades as punishment, and the claim that teacher collaboration is unnecessary because teacher discretion is inviolable.
However strongly held these beliefs may be, they are not based on research. Each of these claims, and many others grounded in personal belief, is not only unsupported by evidence but contradicted by the latest and best research. Corporal punishment is counterproductive, leading to worse behavior by students who have been subjected to it (Global Initiative to End All Corporal Punishment of Children, 2016). If grading as punishment were truly effective, after more than a century of punishing students with Fs and zeroes, wouldn't we expect to see all work submitted on time and of perfect quality? And while teacher discretion remains an important part of some decisions, the positive effects of collaborative teams in schools are clear (DuFour & Reeves, 2016).
Level 2: Personal Experience and Seeming Success
"It works for me!" is a common refrain to support ineffective leadership and pedagogical practices. For instance, administrators may insist on taking the time to announce basic information orally during staff meetings (rather than just providing the info in written form), despite the demonstrable failure of this tactic to inform teachers fully or influence their work. And teachers may cling to the delivery-of-content model of teaching (both in-person and virtually), believing delivery is the same as learning, although interactive lessons with frequent checks for understanding are more effective. In virtual lessons I observed this year, I saw many examples of highly engaged students frequently interacting with the teacher. But I also saw 1st graders sit still while a teacher talked for 30 minutes. Any parent of a seven-year-old knows that this practice is, to use a technical term, crazy.
When someone says "It works for me," we need to probe precisely what that claim means. In the foregoing examples, it means "It's comfortable for me" or "I feel competent and in control when I do this." It certainly doesn't mean that either the adults in the meeting or the students in the virtual class are benefiting.
Level 3: Collective Experience
"It's not just me, it's the whole 3rd-grade team!" "The math department is in complete agreement on this." Imagine being a teacher in a school where such claims are common, and the impact this uniformity would have on your ability to engage in innovative practice. Group agreement that a strategy works isn't research showing that strategy works. Some of the greatest gains I've seen in student achievement happened not when a department was in agreement on practices to use, but when a few brave teachers broke out of the mold and tried something new.
Level 4: Systematic Comparison of "Before and After"
Many teachers are reluctant to engage in action research—research that involves the researcher as a participant—because they fear the results will be only anecdotal and won't apply more generally. That concern may be valid if the experience of a particular classroom is unique; a strategy that boosts learning for specific types of students might not work for other groups. But action research can indicate whether a strategy is effective if teachers use a "science fair" approach (Reeves, 2008). In this approach, many teachers share their experiences trying out the same practice using three-part displays. The first part shows the challenge—perhaps sluggish academic achievement, low attendance, or little parent engagement. The middle part shows the professional practice tried—a strategy to engage students, an alternative grading policy, or a new means of reaching parents. The third part describes the results after teachers tried the practice. When a single teacher experiences gains in student achievement as the result of an improved professional practice, it is an anecdote. But when a room full of teachers from different grades, subjects, and schools all try the same practice and achieve similar results, it becomes a body of evidence.
In fact, this approach has the qualities of a perfect experiment, in which the performance of the same students is compared before and after a specific practice is implemented. Each student has the same teacher and general background (nutrition, parental support, etc.) before and after; the only change is the intervention, and not just one but several teachers have made the same before-and-after comparison. This is far more credible than the common practice of comparing two groups of students when the difference between the groups isn't just the presence or absence of a program or practice, but also other differences—in teachers, families, and school environments.
Moreover, the science fair approach shows teachers and leaders what professional practices are effective in their own schools, with their existing schedules, funding, and other factors. If the objective is to change professional practices, this approach of "inside-out" change is more effective than "top-down" changes, which frequently fail.
Level 5: Preponderance of the Evidence
Every research method has strengths and weaknesses. When someone asks me for the best study supporting a particular educational practice, I reply that this is a fool's errand. There is no "best study," but educators can consider the cumulative effect of different studies, using different methods, from different parts of the educational universe, that all come to similar conclusions.
Consider research on the impact of teacher efficacy. My quantitative studies of more than 2,000 schools identified efficacy as one of the key variables in improving student achievement (Reeves, 2011). Qualitative research and deep case studies have revealed similar findings (Hargreaves & Fullan, 2012). A synthesis of many studies, or meta-analysis (Marzano, 1998), came to the same conclusion, as did a 2018 synthesis of meta-analyses by Donohoo and colleagues. So educators who consider the evidence from these varied sources can feel confident that teacher efficacy is indeed key.
It's easy to find a single article that supports a particular practice, which is why people claim that you can usually find an education research study to support any position, even contradictory ones. It's much more difficult, but far more credible, to find many different studies from different sources and different locations, all of which come to very similar conclusions.
Which Two Qualify?
Therefore, when evaluating research claims, we should probe whether the claim is based on personal belief, personal experience, collective experience, systematic comparisons, or the preponderance of the evidence. Only the last two of these five claims qualify as research.
So why do personal beliefs and experiences dominate educators' thinking? We have the greatest familiarity with our own experience; many of us likely believe the grading and homework practices we experienced as students were effective because we went on to become college-educated teachers. Isn't it obvious that our experiences worked? While that may seem obvious, let's ask, "What percentage of our students today will become educators?" If the answer is less than 100 percent, we should reconsider the assumption that, with current practices, all students will experience the results that we did. The acid test of any research is whether we can find, and are open to, data that contradict our expectations.
We should also try to discern whether the researcher knew the answer to the research question before beginning the study. If an investigator already knows the answer, it's not research—it's more like the entertaining 3rd-grade science project of mixing baking soda and vinegar to create a "volcanic explosion." Real research doesn't begin with "I am going to prove …" but with the question, "I wonder what will happen if …."
As educators evaluate and conduct research, we need fewer claims of certainty—and more genuine wonder.