November 1, 2009
Vol. 67
No. 3

The Next Generation of Testing

Simulations. Situated exercises. Tracking students' thought processes as they solve a problem. Welcome to the world of 21st-century assessment.

Since the IBM Type 805 Test Scoring Machine first hit the market in 1938, fill-in-the-bubble score sheets and scanners have remained the dominant technologies used in local, state, and national assessments. Test booklets and bubble sheets—along with a well-developed set of psychometric principles—are now so deeply embedded in the culture that for many Americans, they are synonymous with assessment.
But underlying these pre–World War II technologies are approaches to testing from the same era, which rely heavily on multiple-choice questions and measure only a portion of the skills and knowledge outlined in state education standards. These approaches do not align well with what we know about how students learn. Nor do they tell us much about how to help students do better. As a result, at a time when we're testing students more than ever—and using those results to make crucial judgments about the performance of schools, teachers, and students—our testing methods don't serve our education system nearly as well as they should.
An alternative—performance-based testing—reached its zenith in the late 1980s and early 1990s. States began to experiment with using projects, portfolios, exhibitions, and other activities to measure content mastery. But the states' performance assessments were costly and technically inadequate for use in school accountability systems. Significant problems were also reported concerning the reliability of such programs' assessment scores (Koretz, McCaffrey, Klein, Bell, & Stecher, 1992). As a result, states began to move away from performance-based assessment systems back to less-expensive multiple-choice assessments.
But now the convergence of powerful computer technologies and important developments in cognitive science holds out the prospect of a new generation of student testing—one that could significantly improve teaching and learning. These technologies, which feature the efficiency and consistency of machine-read scoring along with cognitively challenging, open-ended performance tasks, can help us build assessments that move beyond bubble-filling and, at the same time, offer rigorous and reliable evidence of student learning.

What Technology Can Tell Us

States have slowly begun to adapt new technologies, such as the Internet, to student testing. Just over one-half of U.S. states, for instance, use computers to deliver a portion of the annual state testing programs mandated by No Child Left Behind (Bausell, 2008). However, for the most part these states' investments in technology have not led to fundamental changes in approaches to testing; they've simply made old approaches more efficient. Even the most technologically advanced states have done little except replace the conventional paper-based, multiple-choice, fill-in-the-bubble tests with computerized versions of the same.
Technology, however, has the potential to do more than just make our current approach to testing more efficient. A growing number of testing and learning experts argue that technology can dramatically improve assessment, as well as teaching and learning. Several new research projects demonstrate how information technology can deepen and broaden assessment practices in elementary and secondary education by assessing more comprehensively and by assessing new skills and concepts. Such advances can strengthen state assessments, national standardized tests like the National Assessment of Educational Progress (NAEP), and classroom-based tests meant to help teachers improve their instruction.
These new technology-enabled assessments can help educators understand more than just whether a student answered a test question correctly or incorrectly. Using multiple forms of media that enable both visual and graphical representations, these assessments present complex, multistep problems for students to solve, and they collect detailed information about an individual student's approach to problem solving. This information can show how students arrive at their answers, what those pathways reveal about students' grasp of underlying concepts, and how teachers can alter their instruction to help move students forward. Most important, the new research projects have produced assessments that reflect what cognitive research tells us about how people learn, providing an opportunity to greatly strengthen the quality of instruction.

Promising Models

A number of promising research projects are beginning to explore the potential of technology to transform testing in fundamental ways. One of the largest efforts to pilot new forms of technology-based assessment is the Problem Solving in Technology-Rich Environments (TRE) project. It was launched in spring 2003, when a nationally representative sample of 2,000 students participated in a study to explore how information technology could be incorporated into NAEP. The goal was to create scenarios that would simulate real-world problem solving.
The TRE scenarios test scientific inquiry skills such as the ability to find information about a given topic, judge what information is relevant, plan and conduct experiments, monitor one's efforts, organize and interpret results, and communicate a coherent interpretation. In one component, 8th graders used a simulated helium balloon to solve problems of increasing complexity about relationships among buoyancy, mass, and volume. For example, the students were asked to determine the relationship between payload mass and balloon altitude. To solve the problem, students gathered evidence by running simulated experiments using a variety of different payload masses. Once they had enough evidence, they submitted their conclusions using both open-ended and multiple-choice responses.
The TRE approach demonstrates several unique capabilities of technology-enabled assessments. They can offer more complex, multistep problems for students to solve. In addition, multiple forms of media, such as the animated helium balloon and instrument panels in the TRE simulation, can present information in more useful and compelling ways than text alone.
Finally, technology-enabled assessments can present tasks based on complex data sets in ways that even elementary school students can use. In the TRE problem-solving exercise, for example, students see both visual and graphical representations showing what happens to the balloon during each experiment. Figure 1 shows a graph generated from the results of an 8th grader's simulated experiments.
Figure 1. A Technology-Rich Environments (TRE) Exercise
Source: Bennett, R. E., Persky, H., Weiss, A. R., & Jenkins, F. (2007). Problem-solving in technology-rich environments: A report from the NAEP Technology-Based Assessment Project (NCES 2007-466). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tracking the Learning Process

The problems in Technology-Rich Environments can be dynamic, presenting new information and challenges on the basis of a student's actions. This enables students to take different approaches and even test multiple solutions. Moreover, databases can record descriptive data about the strategies students used and the actions they took.
In the simulation exercise, for instance, every student action—which experiments students ran, which buttons they pushed, and what values they chose and in what order—is logged into a database. The database also records when the student took each of these actions.
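The kind of action log described here can be sketched in a few lines. This is only an illustration, not the TRE project's actual system: the table layout, the function names (`open_log`, `log_action`, `actions_for`), the action labels, and the sample values are all assumptions made for the example.

```python
import sqlite3

def open_log():
    """Create an in-memory database with one table of student actions (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE actions ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # preserves the order of actions
        " student_id TEXT NOT NULL,"
        " action TEXT NOT NULL,"                   # e.g. a button press or a value choice
        " value REAL,"                             # the value chosen, if any
        " at_seconds REAL NOT NULL)"               # when the action happened
    )
    return conn

def log_action(conn, student_id, action, value, at_seconds):
    """Record a single student action and the time it occurred."""
    conn.execute(
        "INSERT INTO actions (student_id, action, value, at_seconds)"
        " VALUES (?, ?, ?, ?)",
        (student_id, action, value, at_seconds),
    )

def actions_for(conn, student_id):
    """Return one student's actions in the order they were taken."""
    rows = conn.execute(
        "SELECT action, value, at_seconds FROM actions"
        " WHERE student_id = ? ORDER BY seq",
        (student_id,),
    )
    return rows.fetchall()

# Hypothetical session: a student picks two payload masses and runs two experiments.
conn = open_log()
log_action(conn, "s1", "choose_payload_mass", 10.0, 12.5)
log_action(conn, "s1", "run_experiment", None, 14.0)
log_action(conn, "s1", "choose_payload_mass", 50.0, 31.2)
log_action(conn, "s1", "run_experiment", None, 33.0)
history = actions_for(conn, "s1")
```

Because each row keeps both the order and the time of an action, a log like this could later be replayed to reconstruct how a student worked through the task.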
The quality of students' experimental design choices is evaluated using a set of rules and then scored using statistical frameworks. These rules and scoring models are linked across multiple skills, enabling instructors to evaluate students on the basis of multiple points of evidence. And because each of the component skills can be traced back to observable student actions, instructors can gather detailed evidence to determine why a student responded the way he or she did, helping to identify gaps in skill level, conceptual misunderstandings, or other information that could inform instruction.
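To make the idea of rule-based evaluation concrete, here is a minimal sketch. The three rules, their thresholds, and the function name `evaluate_design` are hypothetical and are not taken from the TRE project. The simple fraction-of-rules-met score is a stand-in for the statistical scoring the project used, which this article does not specify.

```python
def evaluate_design(masses):
    """Apply hypothetical rules to the payload masses a student chose to test.

    Returns the evidence for each rule and a simple score between 0 and 1.
    """
    distinct = sorted(set(masses))
    evidence = {
        # Did the student try at least three different payload masses?
        "ran_enough_experiments": len(distinct) >= 3,
        # Did the chosen masses span a wide range (assumed threshold: 50 units)?
        "covered_wide_range": bool(distinct) and (distinct[-1] - distinct[0]) >= 50,
        # Did the student avoid repeating a mass already tested?
        "avoided_repeats": len(distinct) == len(masses),
    }
    # Stand-in scoring: the fraction of rules satisfied.
    score = sum(evidence.values()) / len(evidence)
    return evidence, score

# Hypothetical student who tested three closely spaced masses.
evidence, score = evaluate_design([10, 20, 30])
```

Keeping the per-rule evidence alongside the score is the point of the design: a teacher could see not only that a student's design was weak, but which specific choice (here, too narrow a range of masses) made it weak.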
Instead of just one data point—a right or wrong answer—technology-enabled assessments can produce hundreds of data points about student actions and responses. One of the major research challenges at this time is developing and validating statistical algorithms to analyze and distill these data into usable information.

Real-Life Applications

Simulated exercises are useful for assessing students' knowledge of interactions among multiple variables in a complex system, such as an ecosystem. But because these models assess both process and content, they require assessments that are closely linked with classroom instruction.
This presents a problem for the broad use of these models. The TRE project, for example, restricted its assessment to scientific problem solving with technology, rather than science content, because NAEP cannot assume that students in the United States' 14,000 school districts have all covered the same science content.
In contrast, the Calipers project, funded by the National Science Foundation, seeks to develop high-quality, affordable performance assessments that can be used both for large-scale testing and in classrooms to inform instruction. Focused on physical science standards related to forces and motion, along with life sciences standards related to populations and ecosystems, Calipers engages students in such problem-solving tasks as determining the proper angle and speed to rescue an injured skier on an icy mountain (see fig. 2). Like the TRE project, Calipers captures descriptive data on the approach that a student took to solve the problem (choice of experimental values, choice of formulas), along with multiple-choice and open-ended responses. These descriptive data, along with student reflection and self-assessment activities, can give both students and teachers information to guide learning and instruction.
Figure 2. CALIPERS Problem: Rescuing Injured Skiers
Source: From Calipers: Simulation-based assessments, by SRI International, 2006–2007, Menlo Park, CA: Author. Copyright © 2006 by SRI International. Reprinted with permission. Available: http://calipers.sri.com/assessments.html.
Fully immersive simulations, such as those found in medical education and military training, point to further applications of technology. iStan, a lifelike, sensor-filled mannequin that can talk, sweat, bleed, vomit, and have a heart attack, is used for medical training to simulate patient interactions and responses. The U.S. Army has "instrumentalized" many of its war games and other performance exercises, using video cameras and sensors to gather multiple sources of data about what is happening and when. These extensive data can illustrate multiple interactions among team members and lead to productive conversations about what happened, why, and how to improve. These types of assessments and simulated experiences are becoming more prevalent in higher education and the workplace.
This focus on situated assessment—assessing behavior in realistic situations—is increasingly important when people need to be able to communicate, collaborate, synthesize, and respond in flexible ways to new and challenging environments. However, assessing the ability to approach new situations flexibly is challenging in the current paper-and-pencil testing environment.
John Bransford, a professor at the University of Washington and a leading expert in cognition and learning technology, is designing assessments that enable students to demonstrate not only what they can recall, but also how they can use their expertise. Technology-enhanced environments and virtual worlds, such as those found in medical training, are necessary for students to practice and gain feedback in real-life situated environments.

Putting It All Together

But technology alone cannot transform assessment. We first need to overcome logistical and funding challenges that often impede efforts to maintain, administer, and update schools' technological infrastructure. Also, new assessment models must not erode efforts to promote high expectations for all students, nor should they disadvantage low-income schools and students with limited access to technology.
Successful changes to assessment will also require equally challenging revisions to standards, curriculum, instruction, and teacher training. Without deliberate attention to these areas from policymakers and educators, there is no guarantee that technology will fundamentally change core practices and methods in education, a field that is notoriously impervious to change. According to education historian Larry Cuban (1996), just adding technology and hoping for education transformation, without considering the content and practice of instruction, will do no more than automate existing processes.
Although simulations and other technological advances offer many capabilities and opportunities, these tools are only as good as the cognitive models on which they are based. According to University of Maryland education researcher Robert Mislevy (Mislevy, Steinberg, Almond, Haertel, & Penuel, 2000), we can't use the data these tools generate to inform assessment and instruction unless we have a greater understanding of how students learn within a domain. Scientist and researcher Randy Bennett (Bennett & Gitomer, in press) noted, "In principle, having a modern cognitive-scientific basis should help us build better assessments in the same way as having an understanding of physics helps engineers build better bridges."
In fact, technology-enabled assessments expose the flaws in our current development of education standards. Most standards are written as though we've asked teachers to ensure that their students can drive to a specific destination—let's say, Albuquerque, New Mexico. Our current assessments can tell us whether a student has arrived, but they don't tell us whether the students who haven't arrived are on their way, made a wrong turn, or have a flat tire.
Technology-enabled assessments could, in principle, operate like a global positioning system (GPS), capable of frequently monitoring and assessing progress along the way. But a GPS is useless without the software that relates physical location back to a detailed map complete with roads, possible detours, and routes to Albuquerque. Similarly, to be transformative and to enhance teaching and learning, technology-enabled assessments need to depend on a detailed understanding of how learning progresses in math, science, and other disciplines. So far, our technological capabilities surpass our knowledge in these areas.

So What Will It Take?

With technology changing at a rapid pace, we have many of the tools to create vastly improved assessment systems and practices. Given the political climate and opportunities for change—the availability of stimulus funds (including Secretary of Education Arne Duncan's commitment of $350 million in stimulus funding to support assessment work), the movement toward common standards, and the upcoming reauthorization of No Child Left Behind—we can bypass the debate between the two flawed options of either maintaining the status quo or returning to performance-based assessment systems of the 1990s.
Yet, because changes in assessment affect our entire education system and infrastructure, from state agencies to test makers to federal officials to classroom teachers, we won't see the real benefits from technology-enabled assessments (improved teaching and learning) without careful attention from policymakers and deliberate strategies to create change. Here are some steps we should take.
First, we need to develop common standards. If we want to use technology to assess more deeply and at a higher level of cognitive challenge, we'll need more of the extended performance-like tasks described here (which are currently in use in the 2009 NAEP science test and the international PISA tests). We'll also require fewer standards because students need to spend more time on these tasks than on multiple-choice items, and development can be expensive. Finally, clearer standards are crucial. If we are unable to clearly define the standards within the curriculum, then we end up with generic tests and weaker instruments.
Common standards could lead to a race to develop a national common assessment. But a national test would almost surely be based on the current deeply embedded assessment tools and practices, forgoing the opportunity for significant changes in the next generation of testing. A better option is to develop a five-to-seven-year plan to support the research and development of the next generation of assessments, with investments all along the pipeline, from the crazy new idea to modest, low-risk improvements.
Given the stakes attached to testing, the less proven the idea, the more we need to try it in a low-risk environment. And although states have been the primary drivers of testing policies, this innovation plan would enable districts, consortiums, and school networks, especially those that aspire to use richer and newer forms of assessment, to take the lead. We should offer waivers that allow schools and educators participating in these initiatives to use new summative and formative assessment practices. We also need a plan to evaluate and scale these assessments up along the way, starting with small pilots in a few schools, with incentives to build demand so that successful ideas reach more students in more districts and become worthy alternatives to current high-stakes testing.
New technologies offer us the opportunity to plot a course that maintains accountability goals but encourages significant innovation and prioritizes the use of technology-enabled assessments—not just for automation, but for substantive improvements in student achievement.

Bausell, C. V. (2008, March 27). Tracking U.S. trends. Education Week.

Bennett, R. E., & Gitomer, D. H. (in press). Transforming K–12 assessment. In C. Wyatt-Smith & J. Cumming (Eds.), Assessment issues of the 21st century. New York: Springer.

Cuban, L. (1996, October 9). Techno-reformers and classroom teachers. Education Week.

Koretz, D., McCaffrey, D. F., Klein, S. P., Bell, R. M., & Stecher, B. M. (1992). The reliability of scores from the 1992 Vermont Portfolio Assessment Program. Santa Monica, CA: RAND Corporation.

Mislevy, R. J., Steinberg, L. S., Almond, R. G., Haertel, G. D., & Penuel, W. R. (2000). Leverage points for improving educational assessment (CSE Technical Report). Los Angeles: University of California Graduate School of Education and Information Studies.

From our issue
Multiple Measures