December 20, 2024 | Vol. 17, No. 1

Why Testing Shouldn’t Be the First Response to Last Year’s Learning Gaps

Tests conducted early in the school year will tell us little about what students have and have not learned. The latest research about memory offers an explanation.

Assessment | Instructional Strategies
Over the past year, a number of studies have attempted to quantify exactly how students’ learning has been disrupted by the coronavirus pandemic. Students have missed out on learning opportunities—what might be called “unfinished learning” (Thomas B. Fordham Institute, 2021). They may also seem to have forgotten much of what they used to know—what is sometimes called “learning loss” (Kuhfeld & Tarasawa, 2020).
Though estimating the pandemic’s effects on student learning may be important for policymakers, it is far more important for educators to think forward. We must focus on what can be done to ensure that students make as much progress as possible as they return to school, while acknowledging that anything we do must take into account the constantly shifting context of novel variants of the virus, the availability of vaccines for younger students, and the adoption of mask mandates. One proposal that has gained traction over recent months is the idea that schools need to formally test their students upon their return to face-to-face schooling. Proponents argue that such testing would show teachers where their students are, so that they can pitch their teaching appropriately.
There are two problems with this argument. The first is that test results provide little guidance for teachers about what to teach. The second is that tests conducted early in the school year tell us little about what students have and have not learned.

Why Test Results Can’t Guide Teaching

Achievement tests can do a reasonable job of telling us where a student is along a continuum of knowledge, but these results tell us little about what to do next. Knowing that students in a particular grade or year group are “three months behind where they should be” doesn’t provide us with any information about how to close the gap between where they are and where they need to be.
Realizing this, some test publishers have tried to increase the usefulness of their tests by offering diagnostic score reports. For example, if a 60-question science test includes 20 items on biology, 20 on chemistry, and 20 on physics, it seems plausible that reporting a separate score for each subject would offer useful information about how to allocate teaching time.
If a student gave correct answers to only 10 of the 20 biology questions but to nearly all of the chemistry and physics questions—provided the questions were equally difficult—it might seem that the student needs to spend more time reviewing biology. However, this might not be the case. Because there are only 20 questions on each subject, the biology score is much less reliable than the score for the test as a whole.
Unless students score very differently on different aspects of what is being tested, diagnostic scores tell us little about an individual student’s strengths and weaknesses, even though such tests might tell us what materials the class as a whole has more or less learned (Sinharay, Puhan, & Haberman, 2010).
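To make the reliability point concrete, here is a minimal simulation sketch (our own illustration, not anything a test publisher supplies): a student with the same true ability in every subject will show far more test-to-test swing on a 20-item subscore than on the 60-item total.

```python
import random

# Toy illustration: a student whose true ability is 75 percent in every
# subject takes a 60-item test (20 biology, 20 chemistry, 20 physics).
# The 20-item biology subscore varies far more from run to run than the
# 60-item total, which is why subscores make less reliable guides.
random.seed(1)
TRUE_ABILITY = 0.75

def simulate_scores(n_trials=10000):
    biology_pcts, total_pcts = [], []
    for _ in range(n_trials):
        answers = [random.random() < TRUE_ABILITY for _ in range(60)]
        biology_pcts.append(sum(answers[:20]) / 20)  # biology subscore
        total_pcts.append(sum(answers) / 60)         # whole-test score
    return biology_pcts, total_pcts

def spread(scores):
    # Standard deviation of the simulated scores.
    mean = sum(scores) / len(scores)
    return (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5

biology, total = simulate_scores()
print(f"biology subscore std dev: {spread(biology):.3f}")  # about 0.097
print(f"whole-test score std dev: {spread(total):.3f}")    # about 0.056
```

With a true ability of 75 percent, the subscore’s standard deviation is roughly 10 percentage points, nearly twice that of the whole test, simply because it rests on one-third of the items.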

Why Early Tests Won’t Tell Us Much

To understand why early tests may not tell us much about what students have learned, it is important to understand the latest research about how human memory works.
Most people tend to think that memory works in the way that Edward Thorndike suggested over 100 years ago in his laws of use and disuse (Thorndike, 1914). Put simply, if you use certain knowledge or facts, you remember them, and if you don’t, you forget them. 


But what does “forget them” mean?  If you learned French in school, for example, it is almost certain that you were taught the French word for “ear.” If you cannot recall the word now, it is tempting to conclude that the memory for that word has faded away like footprints in the sand.  That, though, is not how human memory works.  Information, once learned, remains in our memories, but can (and often does) become non-recallable because of disuse and other factors. 
This distinction between how easy something is for students to retrieve at any given moment and how well something has been learned is at the heart of Bjork and Bjork’s (1992) “new theory of disuse” framework, which assumes that any item in memory has two important characteristics: storage strength and retrieval strength. Storage strength describes how well something has been learned at some point, including things like how well that item is linked to other items in one’s memory. One important feature of this model is that, unless there is damage to the brain, storage strength cannot decline; it can only stay the same or increase.
On the other hand, how easy something is to retrieve at any given time—retrieval strength—can go up or down, depending on how well the student learned it in the first place, how recently it was retrieved from memory, and what cues for the memory are present in the current environment. Understanding the distinction between retrieval strength and storage strength is at the heart of effective teaching, and it also provides a powerful explanation of why lessons that students seemed to understand at the time are often poorly remembered later.
Every teacher has had the experience of teaching an apparently successful lesson—as judged by what students can recall and what questions they can answer at the end of the lesson—only to find a week or two later that their students remember very little. What happened is that retrieval strength was high right after the lesson, but that does not guarantee that storage strength (the long-term change in the students’ capabilities) was high as well.
So, if we test students as soon as they return to school, we might find that students cannot recall things that they knew 18 months earlier, and it would be tempting to assume that this is an example of “lost learning.” However, if they knew it well 18 months ago, storage strength for that material is high; it’s retrieval strength we need to work on. In other words, what students can do on their first days back in school is likely to be a poor guide to what they have actually learned in the storage-strength sense.
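One way to keep the two quantities apart is a toy sketch (our own simplification, not Bjork and Bjork’s quantitative model) in which storage strength can never decline, disuse erodes only retrieval strength, and review restores access quickly when the material was once well learned:

```python
from dataclasses import dataclass

# Toy illustration of the new-theory-of-disuse distinction (our own
# simplification, not Bjork and Bjork's actual quantitative model):
# storage strength never declines; retrieval strength decays with disuse
# and recovers quickly on review when storage strength is high.
@dataclass
class MemoryItem:
    storage: float    # how well learned, 0..1; can only grow
    retrieval: float  # how accessible right now, 0..1; can rise or fall

    def disuse(self, months: int) -> None:
        # Disuse erodes accessibility but leaves the learning intact.
        self.retrieval *= 0.8 ** months

    def restudy(self) -> None:
        # Review raises both; the gain in retrieval is largest when
        # retrieval is low relative to storage (the "refresher" effect).
        self.retrieval = min(1.0, self.retrieval + 0.5 * (self.storage - self.retrieval))
        self.storage = min(1.0, self.storage + 0.1 * (1 - self.retrieval))

french_ear = MemoryItem(storage=0.9, retrieval=0.9)  # once well learned
french_ear.disuse(months=18)
print(f"after 18 months: retrieval={french_ear.retrieval:.2f}, "
      f"storage={french_ear.storage:.2f}")  # retrieval near 0, storage 0.9
french_ear.restudy()
print(f"after review:    retrieval={french_ear.retrieval:.2f}, "
      f"storage={french_ear.storage:.2f}")  # retrieval rebounds quickly
```

The particular numbers are arbitrary; what matters is the shape. Accessibility collapses with disuse while the underlying learning stays put, which is why a test given on the first day back mostly measures retrieval strength.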

The Power of Review

Testing our students as soon as they return to the classroom is likely to be an unpleasant experience for students and provide little meaningful information for teachers. Instead, we propose that prior to assessing the levels of pandemic-related learning loss, teachers should first carry out a refresher review of previously learned material. Doing so will increase both retrieval strength and storage strength, and, at the same time, give students confidence that what they used to know has not been forgotten, but is still there, waiting to be reactivated.
Some practical strategies make this teaching more effective. First, reviewing or restudying material when retrieval strength is low has a greater impact on storage strength than the same amount of time spent restudying material when retrieval strength is high. An hour spent restudying material after a break is likely to have more impact on long-term learning than the same time spent before the break, when the material was more familiar.
If students do have difficulty retrieving material that they used to know well (low retrieval strength, high storage strength), then restudying the material will increase both retrieval strength and storage strength.
Second, while restudying material increases both storage and retrieval strength, successfully retrieving the same material from memory has an even bigger impact. Retrieving knowledge from memory modifies our memories to encode the correct information and decreases the retrieval strength of competing or incorrect information.

If You Do Give a Test…

Applying research findings to one’s teaching is often far from straightforward. On the one hand, a body of research has shown that giving students retrieval practice in the form of a low-stakes test can boost confidence and decrease anxiety, making students less nervous when it comes to formal testing and assessment (Agarwal et al., 2014). It is also the case that even a failed test can not only increase the power of subsequent learning of the tested information, but also increase new learning (Chan, Meissner, & Davis, 2018).
But if a test is difficult enough that a student is unable to answer the questions, then there is no immediate benefit—it is successful retrieval that the student needs.
This is the delicate balance that teachers must master: how to incorporate the very significant benefits of testing, including what Kapur (2008) has referred to as “productive failure,” while, at the same time, encouraging students to ask questions and make mistakes.
One way of lowering the stakes for practice testing is to use “zero-stakes” tests, in which students complete a test on their own, score their own work, and do not have to report how they did. In addition to providing retrieval practice, self-testing has an additional benefit known as the “hypercorrection effect.” When students discover that an answer they thought was correct is, in fact, incorrect, they are more likely to remember the correct answer than if they had just guessed. More important, the more confident they were that their answer was correct, the greater the benefit of the correction (Butterfield & Metcalfe, 2001).
Research on the power of retrieval practice also sheds light on how to maximize the impact of well-known practices such as “think-pair-share.” Many teachers use this technique by posing a question and then allowing students to immediately turn and talk to a partner. This approach provides little in the way of retrieval practice. Instead, the teacher might ask students to do a “brain dump”: writing down what they can recall about a specific topic from memory, without any support or assistance. Then, the students talk to their partners to compare notes. When using “think-pair-share,” it is important that we don’t skimp on the think.

Start from Where Your Students Are

There's an old joke about a driver who is lost and asks a local for directions, to which the local replies, "Well, if I were you, I wouldn't start from here." The local’s comment is not particularly helpful; the driver has no alternative but to start from where he or she is. Teachers can fall into a similar trap with their students. Telling a student “You should be able to do this” is not helpful.
David Ausubel (1968) reminds us that “the most important single factor influencing learning is what the learner already knows” (p. vi). In other words, we need to start from where the learner is, not where we would like the learner to be.
As teachers return to the classroom, they are likely to face a constantly changing landscape. There cannot be any hard-and-fast rules about what to do. But by bearing in mind the key distinction between how well something has been learned (storage strength) and how easy it is to recall at any moment (retrieval strength), teachers will be better able to make decisions about which activities are most likely to help their students learn effectively, no matter what challenges they face.
References

Agarwal, P. K., D’Antonio, L., Roediger III, H. L., McDermott, K. B., & McDaniel, M. A. (2014). Classroom-based programs of retrieval practice reduce middle school and high school students’ test anxiety. Journal of Applied Research in Memory and Cognition, 3(3), 131-139.

Ausubel, D. P. (1968). Educational psychology: A cognitive view. New York, NY: Holt, Rinehart & Winston.

Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35-67). Hillsdale, NJ: Lawrence Erlbaum Associates.

Butterfield, B., & Metcalfe, J. (2001). Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6), 1491–1494.

Chan, J. C. K., Meissner, C. A., & Davis, S. D. (2018). Retrieval potentiates new learning: A theoretical and meta-analytic review. Psychological Bulletin, 144, 1111-1146.

Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379-425.

Kuhfeld, M., & Tarasawa, B. (2020). The COVID-19 slide: What summer learning loss can tell us about the potential impact of school closures on student academic achievement. Portland, OR: Northwest Evaluation Association.

Sinharay, S., Puhan, G., & Haberman, S. J. (2010). Reporting diagnostic scores in educational testing: Temptations, pitfalls, and some solutions. Multivariate Behavioral Research, 45(3), 553-573. doi:10.1080/00273171.2010.483382 

Thomas B. Fordham Institute. (2021). The acceleration imperative: A plan to address elementary students’ unfinished learning in the wake of Covid-19. Washington, DC: Thomas B. Fordham Institute.

Thorndike, E. L. (1914). The psychology of learning. New York, NY: Teachers College.

Robert A. Bjork is a distinguished research professor in the Department of Psychology at the University of California, Los Angeles. His research focuses on human learning and memory and on the implications of the science of learning for instruction and training.






