What if a student asked for a good grade merely for handing the paper in? What if student divers and gymnasts were able to judge and score their own performances in meets, and did so based on effort and intent? Naive ideas, of course—yet this is just what happens in schools every day when faculty submit new curricular frameworks or design new assessments.
Most faculty products are assessed, if at all, merely on whether we worked hard: Did we hand in a lengthy report, based on lots of discussion? Did we provide students with a test that we happen to like? Only rarely do we demand formal self- or peer-assessment of our design work, against standards and criteria. This not only leads to less rigorous reports and designs but also seems a bit hypocritical: We ask students to do this all the time. We need to better practice what we preach.
But how do we ensure that ongoing design and reform work is more rigorous and credible? At the Center on Learning, Assessment, and School Structure (CLASS) in Princeton, New Jersey, we use design standards and a workable peer review process for critiquing and improving all proposed new curricular frameworks, tests, and performance assessments. At the heart of the work is making adult work standards-based, not process-based or merely guided by good intentions. Using such standards can go a long way toward helping parents, students, and the community have faith in locally designed systems.
Standards-Based vs. Process-Based Reform Work
Many new curriculum frameworks and assessment systems produce a significant (and often understandable) backlash. A major reason is that the work is typically produced without reference to specific standards for the proposals and final product.
Think of a typical districtwide curriculum reform project. Twelve teachers and supervisors hold meetings all school year to develop a new mathematics curriculum. Their work culminates in a report produced over a three-week period in the summer, at district behest and with district financial support, resulting in a new local mathematics curriculum framework. They follow a time-tested process of scanning national reports, searching for consensus about themes and topics and logical progressions, and summarizing their findings and recommendations. But against what standards is their product (as opposed to their process) to be judged? The usual answer is: no legitimate standards at all, other than the implicit one that when the authors deem their work finished, the report is complete.
By contrast, what if all report-writers had to answer these questions: Is the report useful to readers? Does it engage and inform the readers? Does it anticipate the reactions of its critics? Does it meet professional standards of curriculum design or measurement? Does it meet the purposes laid out in a charge to the committee? Most important: Did the writers regularly self-assess and revise their work in progress against such criteria and standards? Did they regularly seek feedback en route from the faculty affected?—the same writing-process questions we properly put to students. Their report would have far greater impact if they addressed such questions. Without such self-assessment and self-adjustment along the way, the work is predictably ineffective in getting other faculty to change practice or in helping skeptical parents understand the need to do so.
Similarly with new assessments. Almost every teacher designs tests under the most naive premise: "If I designed it and gave it, it must be valid and reliable." Yet we know from research, our own observations, and the process of peer review that few teacher-designed tests and assessments meet the most basic standards for technical credibility, intellectual defensibility, coherence with system goals, and fairness to students.
When we practice what we preach about self-assessment and adjustment against standards, we can ensure more rigorous and effective local teacher products, greater collegiality, and better student performance.
In standards-based reform projects, in short, we must seek a disinterested review of products against standards all along the way—not just follow a process in the hope that our work turns out well. The challenge for school reformers is to ensure that their work has impact, like any other performance. Desired effects must be designed in; they must inform all our work from the beginning. As with student performances, then, we will meet standards only by "backwards design"—making self-assessment and peer review against performance standards central to the process of writing and revision—before it is too late.
Rather than teaching a lockstep process of design, we at CLASS teach faculties to see that design is always iterative: We constantly rethink our designs, using feedback based on clear design standards. Without powerful criteria and a review process that obliges us to critique all work against those criteria, we will likely never revisit our original designs; we are often satisfied with (and misled by) our effort and good intentions.
Assessment Design Standards
Standards-based reform work begins with clear standards for eventual products. At CLASS, we instruct faculties involved in performance-based assessment reform in the use of a design template, a design process, and a self-assessment and peer review process based on standards for the ultimate product. In addition, we work with leaders to make such standards-based design work more routine in and central to local faculty life (linked to job descriptions, department meetings, and performance appraisal systems, as well as individual and team design work). The template is also the database structure for assessment tasks and rubrics on our World Wide Web site, http://www.classnj.org. The design standards ask questions such as the following:
Does it measure what it says it measures? Is this a valid assessment of the intended achievement?
Are the scoring criteria and rubrics clear, descriptive, and explicitly related to district goals and standards?
Is the scoring system based on genuine standards and criteria, derived from analysis of credible models?
Does the task require a sophisticated understanding of required content?
Does the task require a high degree of intellectual skill and performance quality?
Does the task simulate or replicate authentic, messy, real–world challenges, contexts, and constraints faced by adult professionals, consumers, or citizens?
Does the scoring system enable a reliable yet adequately fine discrimination of degrees of work quality?
Is the task worthy of the time and energy required to complete it?
Is the task challenging—an appropriate stretch for students?
Naturally, in parallel to what we ask of students, there are rubrics for self- and peer-assessment against these questions.
Anticipating Key Design Difficulties
One of the toughest design difficulties to anticipate is assessing for genuine understanding rather than mere recall. The sentence stems in Figure 1 help designers frame tasks that reveal whether students truly understand.
Figure 1. What Does Understanding Mean?
Complete the following sentence to help construct an authentic, credible performance assessment in any subject matter:
The students really understand (the idea, issue, theory, event being assessed) only when they can...
provide credible theories, models, or interpretations to explain...
avoid such common misunderstandings as...
make such fine, subtle distinctions as...
effectively interpret such ambiguous situations or language as...
explain the value or importance of...
critique...
see the plausibility of the “odd” view that...
empathize with...
critically question the commonly held view that...
invent...
recognize the prejudice within that...
question such strong personal but unexamined beliefs as...
accurately self-assess...
Peer Review
Besides improving the process of developing performance assessments, peer review can yield a profound result: the beginning of a truly professional relationship with colleagues. In CLASS projects, teachers have termed peer review one of the most satisfying (if initially scary) experiences in their careers. As a 32-year veteran teacher put it, "This is the kind of conversation I entered the profession to have, yet never had. I'm rejuvenated. I'm optimistic."
Peer reviewers serve as consultants to the designer, not glib judges. The process itself is evaluated against a basic criterion in support of that goal: The designer must feel that the design was understood and improved by the process, and the reviewers must feel that the process was insightful and team-building. As the following guidelines reveal, the reviewers give specific, focused, and useful feedback.

Stage 1: Peers review the task without the designer present. The designer states issues he or she wishes highlighted, self-assesses (optional), and then leaves. The peers read the materials, referring to the assessment design criteria. Working individually, the peers summarize the work's strengths and weaknesses and then report to the group. The group fills out a sheet summarizing the key feedback and guidance, thus rehearsing the oral report to follow. Reviewers rate the task against the task rubric, if appropriate.

Stage 2: Peers discuss the review with the designer. Appointing a timekeeper/facilitator is crucial. The facilitator's job is to gently but firmly ensure that the designer listens (instead of defending). First, the designer clarifies technical or logistical issues (without elaboration)—the design must stand by itself as much as possible. Second, the peers give oral feedback and guidance. Third, the group and the designer discuss the feedback; the designer takes notes and asks questions. Finally, the group decides what issues should be presented to the faculty as a whole—lessons learned and problems evoked.

Criteria for peer review:

1. The core of the discussion involves considering: To what extent is the "targeted achievement" well assessed? To what extent do the task and rubric meet the design criteria? What would make the assessment more valid, reliable, authentic, engaging, rigorous, fair, and feasible?

2. The reviewers should be friendly, honest consultants. The designer's intent should be treated as a given (unless the unit's goal and means are unclear or lack rigor). The aim is to improve the designer's idea, not to replace it with the reviewers' aesthetic judgments, intellectual priorities, or pet designs.

3. The designer asks for focused feedback in relation to specific design criteria, goals, or problems.

4. The designer's job in the second session is primarily to listen, not to explain, defend, or justify design decisions.

5. The reviewers' job is first to give useful feedback (did the effect match the intent?), and only then useful guidance.
Note that we distinguish here between feedback and guidance. The best feedback is highly specific, describing how the performance did or did not meet the standards. Recall how often a music teacher or tennis coach provides a steady flow of such feedback (Wiggins 1993). Feedback is not praise and blame or mere encouragement. Try becoming better at any performance if all you hear is "Nice effort!" or "You can do better" or "We didn't like it." Whatever the role or value of praise and dislike, they are not feedback: The information provided does not help you improve. In both feedback and guidance, what matters is judging the design against criteria related to sound assessment. Peer reviewers are free to offer concrete guidance—suggestions on how the design might be improved—assuming the designer grasps and accepts the feedback.
Assessment System Criteria
Beyond reviewing specific performance tasks and rubrics, we need to evaluate entire assessment systems. For such systemic assessments, a more complex set of criteria includes credibility, technical soundness, usefulness, honesty, intellectual rigor, and fairness (Wiggins 1996).
Again, a key to credibility is disinterested judging—using known and intellectually defensible tasks and criteria—whether we are talking about student or faculty work. A psychometrician may well find a local assessment system not up to a rigid technical standard; but such a system can still be credible and effective within the real-world constraints of school time, talent, and budgets.
Credibility is a concern of the whole school community. We need other feedback—not just from peer reviewers, teacher-designers, or psychometricians, but from parents, school boards, college admissions officers, and legislators. Alas, what one group finds credible, another often doesn't. Clients for our information have differing needs and interests in the data; if we fail to consider these clients, our local assessment systems may be inadequate and provincial. But if we improperly mimic large-scale audit-testing methods in an effort to meet psychometric standards for local assessment design, we often develop assessment systems that are neither authentic nor effective as feedback.
Peer review should always consider the possible customers for the assessment information, to determine whether both the task and the reporting of results are apt and adequate (Wiggins 1996). The primary customer is always the student.
Principles Underlying the Standards and Criteria
When proposing standards and criteria for performance assessments, we need to remember—and clearly state—the underlying values of our proposals. Assessment is not merely a blind set of techniques, after all, but a means to some valued end. Effective and appropriate school assessment is based on five principles:
1. Reform focuses on the purpose, not merely the techniques, of assessment. Too many reform projects tamper with the technology of assessment without reconnecting with the purposes of assessment. Assessment must recapture essential educational aims: to help the student learn and to help the teacher instruct. All other needs, such as accountability testing and program evaluation, come second. Merely shifting from multiple-choice questions to performance testing changes nothing if we still rely on the rituals of year-end, secure, one-shot testing.
2. Students and teachers are entitled to a more instructional and user-friendly assessment system than current systems and psychometric criteria now provide. A deliberately instructional assessment makes sure that tests enlighten students about real-world intellectual tasks, criteria, context, and standards; and such an assessment is built to ensure user-friendly, powerful feedback. Conventional tests often prevent students from fully understanding and meeting their intellectual obligations. And teachers are entitled to an accountability system that facilitates better teaching.
3. Assessment is central, not peripheral, to instruction. We must design curriculums backwards from complex and exemplary challenges. A performance-based system integrates curriculum and assessment design, thereby making the sequence of work more coherent and meaningful from the learner's point of view.
4. Authentic tasks must anchor the assessment process, so that typical test questions play a properly subordinate role. Students must see what adults really do with their knowledge; and all students must learn what athletes already know—that performance is more than just the drill work that develops discrete knowledge and skill. Genuine tasks pose challenges that require good judgment, adaptiveness, and habits of mind—such as craftsmanship and tolerance for ambiguity—never tested by simplistic test items.
5. In assessment, local is better. Site-level assessments must be of higher intellectual quality—more tightly linked to instruction—than superficial standardized tests can ever be. No externally run assessment can build the kind of local capacity for and interest in high-quality assessment that lies at the heart of all genuine local improvement. But local assessment must be credible—and that means inviting disinterested assessment by people other than the student's teachers and including oversight of the entire assessment design and implementation system (for case studies in assessment reform, see CLASS 1996).
By keeping these principles in mind, we can continually improve our reform work. Process-driven improvement efforts can become rigid and uncreative; we end up merely following the letter of the law. The real power of standards-based reform is that we are free to innovate and depart from the process if we see a better way to meet the standards and honor our principles. Thus our reform efforts, not just our designs, demand constant self-assessment and self-adjustment, based on comparing emerging work against our principles.
Professionalism depends on standards-based work and peer review. Schools have long-standing habits of leaving teachers alone to design assessments, but we believe such practices are counterproductive to both local credibility and professional development. Every school and district ought to require peer review of major assessments, based on sound and agreed-upon standards and criteria of design and use.
References
Center on Learning, Assessment, and School Structure (CLASS). (1996). Measuring What Matters: The Case for Assessment Reform (video). Princeton, N.J.: CLASS.
Fairtest: National Center for Fair and Open Testing. (1995). Principles and Indicators for Student Assessment Systems. Cambridge, Mass.: Fairtest.
Gardner, H. (1992). The Unschooled Mind. New York: Basic Books.
Perkins, D. (1992). Smart Schools: Better Thinking and Learning in Every Child. New York: Free Press.
Wiggins, G. (1993). Assessing Student Performance: Exploring the Purpose and Limits of Testing. San Francisco: Jossey-Bass.
Wiggins, G. (1996). "Honesty and Fairness: Toward Better Grading and Reporting." In Communicating Student Learning (1996 ASCD Yearbook), edited by T. Guskey. Alexandria, Va.: ASCD.
End Notes
1. For student performance tasks, too, rubric and task writers should emphasize impact-related criteria so that students know the purpose of the task. Thus, instead of just scoring for organization, clarity, and accuracy in essay writing, we should include criteria related to how persuasive and engaging the piece is.
4. Some may wonder about the utility or ethics of discussing the work in the designer's absence. We have found that this first stage gives the peers freedom to express vague concerns and complete criticisms. When the designer is always present, we find that the session bogs down as the designer justifies and explains all decisions.
5. Video and print material on the peer review process is available from CLASS.
6. Fairtest (1995) has developed standards and indicators for assessment processes and systems. Contact Fairtest at National Center for Fair & Open Testing, 342 Broadway, Cambridge, MA 02139. Phone: (617) 864-4810; fax: (617) 497-2224; e-mail: FairTest@aol.com.