A classroom-based assessment project yields insights about the potential of performance assessment to redirect instruction.
Two years ago, when a group of researchers at the University of Colorado at Boulder began working closely with 3rd grade teachers in a classroom performance assessment project, our intention was not to introduce an already-developed curriculum and assessment package. Rather, we wanted to help teachers develop (or select) performance assessments in reading and mathematics congruent with their own instructional goals.
Our research team agreed with the Resnicks (1992) and Wiggins (1989) that introducing performance assessments aimed at thinking and problem-solving goals could be an important inducement for instructional improvement. We disagreed, however, that high-stakes consequences should be used to leverage change. Even authentic measures are corruptible and, when practiced for, can distort curriculum and undermine professional autonomy. In our study, we were interested in a bottom-up approach where teachers tried out performance measures in the context of classroom instruction.
The Assessment Project
Our study was conducted in a school district that serves a lower- and middle-class population on the outskirts of Denver. We worked with teams of 3rd grade teachers in three schools. After a preliminary workshop attended by representatives of 10 schools, school teams submitted proposals indicating their desire to participate and were accepted for the project.
To establish project goals, we met with the 14 teachers in the spring and at the start of the 1992–93 school year. Once the year began, we got together every week for after-school workshops, alternating between reading and mathematics so that subject-matter specialists could rotate among schools.
In reading, teachers selected meaning-making and fluency as instructional goals to assess. They used “running records” to assess fluency, especially for below-grade-level readers. This meant listening to a child read individually and recording any misread words, with attention to the types of errors made. To assess comprehension, teachers asked students to write summaries. Because most students could not write adequate summaries at first, workshop discussions in the fall emphasized effective teaching strategies for written summaries as well as scoring issues. Later on, in spring 1993, teachers worked to extend ideas about meaning-making and written summaries to expository texts, which had not previously been a part of the 3rd grade reading curriculum.
In mathematics, teachers identified place value, addition, subtraction, and multiplication as key instructional goals. Throughout the year, they made extensive requests for materials and ideas for teaching in ways suggested by the new mathematics curriculum. They also wanted to include new topics such as geometry and probability. In their classrooms, teachers used open-ended and nonroutine problems (like those in fig. 1), along with hands-on materials for modeling problems. Some teachers had not previously worked with manipulatives or place-value “mats” (laminated boards with columns for 100s, 10s, and 1s) and used them for the first time. Our discussions at the biweekly meetings focused on using good problems interchangeably for instruction and assessment, making and keeping track of observations, analyzing student work, and developing rubrics for scoring problem solving and explanations.
Figure 1. Sample Open-Ended Math Problems Used for Instruction and Assessment in Project Classrooms
Find the missing digits. [problem diagram not shown]
What if your class were playing a game and your teacher gave you these numbers: 4, 6, 7, 2. How would you put the numbers in the boxes to make the largest possible answer?
You have 24 square tiles. How many different rectangles can you make, using all of the tiles for each one? Draw each rectangle on graph paper, label each side, and write the multiplication fact it shows.
What would you tell someone about multiplying by 1?
Gina says that when she can't remember a multiplication fact like 9 x 3, she turns it around to 3 x 9, and often she remembers that one. Can she do that? What would you tell Gina?
Joe had 5 sheets of paper. Each sheet had 4 large circles on it. In each circle, there were 2 stars. How many stars were there in all? Draw a picture to show how you got your answer.
I have 17 wheels, and each one is on a bike or a trike. How many bikes and trikes do I have? Is there just one answer? Explain how you did the problem.
The research part of the project included baseline and follow-up measures of student learning in both participating and matched control schools, as well as interviews with parents and students. Our research team gathered data about changes in instruction and assessment practices through teacher interviews, transcripts from the workshops, samples of student papers, scoring rubrics, and the like.
Before turning to project difficulties and accomplishments, a word of caution is needed. While generalizations describing trends are the most useful to policymakers and teachers in other schools and districts, generalizations about either successes or difficulties do not necessarily represent the experiences of individual teachers. In fact, the 14 project teachers represented a tremendous range of teaching styles and abilities. A few were already teaching in ways that coincided with the new curriculum frameworks; some pursued traditional content but were excellent classroom managers and challenged us intellectually about why we thought explanations and open-ended problems were a good thing. A few teachers participated in the project to the minimum extent possible and therefore were affected little by either the struggles or the successes.
The Struggles
Despite the successes I note below, the assessment project was not all smooth sailing. I report these difficulties because they bear directly on practical suggestions for staff development and on the national policy discussion about school delivery and opportunity-to-learn standards.
One of the first things we discovered was that not all 14 teachers in the project were true volunteers. Some had been implicitly volunteered by their principals or gently coerced by colleagues. In addition, we were unprepared for the large conceptual differences between us and many of the teachers. Because teams had volunteered after hearing our project rationale, and because the district had newly developed curriculum frameworks consistent with emerging national standards in literacy and mathematics, we assumed that teachers' views about instruction would be similar to those reflected in the curriculum. In fact, even some teachers who were energetic project participants were happy with the use of basal readers and chapter tests in the math book, and were not necessarily familiar with curricular shifts implied by the new district framework in mathematics.
For their part, teachers were equally disconcerted by project demands they hadn't bargained for. Yes, everyone knew about the weekly workshops. But trying out assessments in classrooms had enormous implications. Early in the project, teachers complained constantly about time. They didn't have enough time to do extra planning, to meet as a team to talk about scoring, or to do the scoring (of reading summaries and math explanations). Most of all, there wasn't enough time in the instructional day to teach reading both the old way and the new way and to add new math problems to their current routines. For some, a commitment to whole-class instruction made it impossible to manage time for individual assessments of below-grade level readers.
We resolved problems about time in different ways. Instead of expecting that teachers would try out both reading and math activities in their classrooms every week, we slowed the pace by moving to a schedule of alternating weeks. We also modeled the use of running records during regular reading groups and discussed management strategies to make time for individual assessments of students most in need. We arranged university course credit, which had not originally been planned, and teachers used their usual team strategies to share the load. For example, in one school, each member of the team developed a center for a unit on probability. The district agreed to help by providing a half-day release time per month.
By January, some of the tensions about time had dissipated, perhaps because teachers were more comfortable incorporating new activities in place of the old. For other teachers, competition between instructional time and time spent on assessment increased as they struggled to keep written records of observations of students (Borko et al., in preparation). Other time pressures also increased as teachers caught on to the ideas of the project and needed more time to think and plan. For example, in February, one team used an extra district inservice day to review the new math frameworks and decide “what the kids needed to know and how we were going to teach it.” In the words of one teacher, “That's when we really sat down and started on this process” (not in September when the project nominally began).
Dissonance between researchers' and teachers' views about subject matter instruction was sometimes acknowledged and joked about in workshops at two of the schools, but for the most part researchers avoided confrontations about differences in beliefs and did not propose radical changes in instruction. We suggested reading and mathematics activities that departed from a strictly skills-based approach, and teachers adopted or adapted them as they saw fit (Flexer et al. 1994). We remained adamant about refusing to include timed tests on math facts as part of project portfolios, but given implied school policies and pressure from teachers in other grades, timed tests continued to coexist with project activities.
In other areas where we did not confront differences, project-derived activities sometimes took on a character we would not have endorsed. For example, for some teachers, assessment of written summaries moved away from measuring how well a student understood the gist of a story and became more and more focused on features of the written product (Borko et al. 1994). When interviewed, some children said that a good summary meant that handwriting was neat and all the words were spelled right (Davinroy et al. 1994). In math, teachers sometimes made problems easier or taught specific strategies for getting the answer that reduced the conceptual challenge of the problem.
The Successes
The majority of teachers in our project were effective teachers. When faced with new content, they did what good teachers do: invent ways to help their students master the new material.
For example, when first asked to write summaries of what they had read, the 3rd graders tended to produce long lists of events, instead of focusing on the main points of the story or problem resolution. In our workshop conversations, teachers concluded that helping students get better at summaries was worthwhile because it would help them pay attention to what was important in a story and because writing well-organized summaries was a good writing task as well.
Teachers involved their students in the development of scoring criteria and in grading one another's summaries. When children were at first undiscriminating in their scoring (everyone got a 3 on the 4-point scale), one teacher prompted a more thoughtful class discussion about “what makes a good summary” by writing some bad summaries for the kids to score and then comparing these to summaries on book jackets and in advertisements. Teachers had their entire classes read the same story and then, as a group, develop a summary. The class-authored summaries elicited debate, suggestion-by-suggestion, as to the proper order of points and whether or not specific details needed to be included. Eventually students got much better at writing summaries as a result of their teachers' effective use of modeling and class discussions about using scoring rubrics.
Teachers were also willing to try out a wide array of hands-on, problem-based activities in mathematics. One team invented its own activities for geometry and probability, and it seemed to us that teachers were more inventive in devising challenging problems for kids in these new content areas. One school had used Marilyn Burns's (1991) five-week multiplication program the year before, another school team decided to try it, and a third school used it in conjunction with help from a district math specialist. As one teacher explained, having these materials was especially useful because Burns had already thought through the process and organized feasible instructional activities and accompanying assessments. This left teachers free to focus on implementation and their students' responses.
By the end of the year, most of the teachers were using math activities more closely aligned with the NCTM Standards (1989) to replace and supplement more traditional practices of text-based work. They had also extended the range of mathematical challenges for 3rd graders and raised their own expectations (Flexer et al. 1994). In end-of-year interviews, teachers told us that students now had a clearer understanding of why teachers graded papers the way they did (grading was less mysterious); and in both reading and math, teachers felt they had greater knowledge about what their students could do.
For many teachers, these successes continued into the next year. In two schools, by mutual agreement, we continued to meet with teachers on a less demanding schedule. In the spring of the second year, teachers' comments at an inservice given by one team for other teachers in their school showed a thorough understanding of project issues and ownership of original project aims:

Teacher 1: The CU [University of Colorado at Boulder] people felt that if you taught the strategy, that was just like teaching an algorithm. So we wanted the kids to pull their own resources from their head. Each day we gave them problems that would be different and use a different strategy. We didn't want them to think that yesterday we used a table and then look at a problem today and automatically make a table.... During my debriefing time on the following day, I wrote on poster paper three different ways that three different children solved the problem to show them that there is more than one way to solve the problem.

Teacher 2: Confession time here. I remember five years ago, three years ago, we lived with that textbook. Well, we haven't used our math book too much this year. Maybe a little bit as a resource, but it's not like, page 36 today, 38 tomorrow, and the next day is more on 40.... So what I'm going to show you are just some examples that two years ago I would have said, “No way. Third graders cannot handle this.” But it's amazing, they can when they're exposed to it.
Teachers eventually developed greater sophistication about scoring criteria. They also revisited problematic assessment issues, such as the difficulties some children had with writing summaries and explanations in math. (Indeed, the problem of writing and language skills having an undue influence in more authentic assessments is an issue nationwide.) Initially, some teachers allowed specific students to show their thinking in alternative ways—for example, through oral retellings or by explaining a picture in math. At the same time, however, scoring rules for some teachers included spelling and handwriting and other features of the written product instead of focusing on comprehension or mathematical thinking.
After a year, teachers were much more aware that scoring rules should depend on what you were scoring for (the intended construct, in measurement terms). By working with expository text, teachers were able to refocus attention on the purpose of summarizing (Davinroy and Hiebert 1994). They also became clearer about the multiple dimensions buried in their scoring rubrics—a tension that some resolved by giving two scores; for example, one for reading comprehension and one for writing, or one for the explanation and one for getting the right answer. Rubrics in math also shifted to focus more on the reasonableness of the answer and explanation rather than the accuracy of the calculation.
Last but not least, project successes also included appreciable gains in student learning in mathematics (Shepard et al. 1994). Independent outcome measures showed no change in reading, possibly because both participating and control schools were further along in implementing new language arts curriculums at the start of the project than they were in math. The specific gains in mathematics could be tied directly to project activities. Especially apparent were students' abilities to recognize and extend patterns and to write mathematical explanations.
Figure 2 shows a student's answer to one segment of a problem from a Maryland math assessment. Step 4 in the problem involves computing how many pitchers are needed for 46 cups of lemonade, given the information that a one-quart pitcher holds 4 cups and that 2 one-quart pitchers hold 8 cups. At the top end of the distribution, in the participating schools, many more students could give a complete answer after the project than could do so in the baseline year. More important, demonstrable gains occurred at the low end of the distribution as well.
Figure 2. Sample Student Response to One Step from a Maryland Math Assessment Task
In the baseline year, most low-scoring students could not fill out the table in Step 4 and could not write an explanation. After the project year, low-performing students in the participating schools most frequently gave wrong answers of either 15 or 60; nonetheless, they completed the table and gave explanations with real mathematical content (for which they received partial credit). For example:

“I counted by fours which is 60, then I went in the ones which is 15.”

“On the cups as you go along you count four more each time.”
Implications for Staff Development
What implications for staff development can be drawn from our classroom-based assessment project? Current calls for assessment-driven reform acknowledge the need for staff development but tend to underestimate the extent and depth of what is needed. While teaching toward open-ended tasks might be an immediate improvement over worksheets designed to mimic standardized tests, our experience shows that well-intentioned efforts to help kids improve at assessment tasks can be misdirected if teachers do not understand the philosophical and conceptual bases of the intended curricular goals: “Why should our kids have to write explanations, if we already know whether they know it or not?”
Losing track of the purpose of written summaries as an assessment of meaning-making, and oversimplifying mathematical problem solving are examples of how the original purpose of assessment tasks can be distorted. Through ongoing project discussions, teachers eventually recognized that these issues were problematic and addressed them with deepening understanding. It is hard to imagine how teachers could have gained such detailed project insights in a one-time inservice session or on their own, if external assessments were the only mechanism for instructional reform.
Instead, our experience suggests that teachers need:

appropriate materials to try out and adapt;

time to reflect and to develop new instructional approaches; and

ongoing support from experts to learn (and challenge) the conceptual bases behind intended reforms.
The need for materials poses several interesting dilemmas. Professional, autonomous teachers do not need canned curriculum packages or scripted lessons. If we want teachers to try significantly different content and modes of instruction, however, the teachers in our project would be the first to say that they had neither the time nor the know-how (initially) to invent their own materials. Even having an abundant supply of materials in the curriculum library did not help, because teachers had no time to review them, nor did they know which were good and which were not. What worked best in our project was for us to supply good examples in response to teacher-identified topics. Teachers then extended the examples and invented entire instructional units.
Just as constructivist pedagogy would allow students the opportunity to develop their own understandings, teachers need the opportunity to try new instructional strategies, observe what works and what doesn't, and then talk with colleagues about both logistics and underlying rationale. The teachers in our study learned the most by trying new, challenging content with their students and by being surprised by what their students could do.
If teachers are unfamiliar with new curriculum expectations, they cannot be expected to appreciate from first-time tellings why making connections and communication are important mathematical goals, or what the proper place of skill instruction should be in developing literacy. These are the kinds of questions that are best returned to and debated with specialists after teachers have had some first-hand experience with new content in their own classrooms. It is perhaps not surprising that as a university researcher, I make a place for talking with “experts.” Experts, however, include curriculum specialists and lead teachers in school districts who have a thorough understanding of the conceptual basis behind content standards and curriculum frameworks.
The ongoing need to talk to experts is illustrated by the following example. After attending a district inservice on assessment at the start of the second school year, one team of teachers asked why we had never told them about scoring for more than one dimension. The answer was that we had “told them” (one school was already using double scoring), but the expert advice had not made sense until teachers had gained relevant experience with that issue in their own classrooms.
The successes of our assessment project support the claims of assessment reform advocates, albeit on a much more modest and tentative scale. Performance assessments have great potential for redirecting instruction toward more challenging and appropriate learning goals. Open-ended assessment tasks not only prompted teachers to teach differently, but criteria were made explicit, and children learned more. The concomitant struggles, however, give the lie to the presumption that new assessments will automatically improve instruction. If teachers are being asked to make fundamental changes in what they teach and how they teach it, then they need sustained support to try out new practices, learn the new theory, and make it their own.
References
Borko, H., K. H. Davinroy, M. D. Flory, and E. H. Hiebert. (1994). “Teachers' Knowledge and Beliefs about Summary as a Component of Reading.” In Beliefs about Texts and Instruction with Text, edited by R. Garner and P. A. Alexander, pp. 155–182. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Borko, H., V. Mayfield, S. F. Marion, R. Flexer, and K. Cumbo. (In preparation). “Teachers' Developing Ideas and Practices about Mathematics Performance Assessment: Successes, Stumbling Blocks, and Implications for Professional Development.”
Burns, M. (1991). Math by All Means: Multiplication Grade 3. Sausalito, Calif.: The Math Solution Publications.
Davinroy, K. H., C. L. Bliem, and V. Mayfield. (April 1994). “'How Does My Teacher Know What I Know?': Third-Graders' Perceptions of Math, Reading, and Assessment.” Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
Davinroy, K. H., and E. H. Hiebert. (1994). “An Examination of Teachers' Thinking about Assessment of Expository Text.” In NRC Yearbook, edited by D. J. Leu and C. K. Kinzer. Chicago: National Reading Conference.
Flexer, R. J., K. Cumbo, H. Borko, V. Mayfield, and S. F. Marion. (April 1994). “How 'Messing About' with Performance Assessment in Mathematics Affects What Happens in Classrooms.” Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
National Council of Teachers of Mathematics. (1989). Curriculum and Evaluation Standards for School Mathematics. Reston, Va.: NCTM.
Resnick, L. B., and D. P. Resnick. (1992). “Assessing the Thinking Curriculum: New Tools for Educational Reform.” In Changing Assessments: Alternative Views of Aptitude, Achievement, and Instruction, edited by B. R. Gifford and M. C. O'Connor. Boston: Kluwer Academic Publishers.
Shepard, L. A., R. J. Flexer, E. H. Hiebert, S. F. Marion, V. Mayfield, and T. J. Weston. (April 1994). “Effects of Introducing Classroom Performance Assessments on Student Learning.” Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
Wiggins, G. (1989). “Teaching to the (Authentic) Test.” Educational Leadership 46, 7: 41–47.
End Notes
1 University of Colorado faculty researchers included Roberta Flexer, a specialist in mathematics education; Elfrieda Hiebert, a specialist in reading; Hilda Borko, whose specialty is teacher change; and Lorrie Shepard, an assessment specialist.
2 To ensure that teachers were free from the worry of preparing students for the Comprehensive Test of Basic Skills (CTBS), we obtained a two-year waiver from standardized testing from the state. Obtaining the waiver required a host of approvals from district officials, the teachers union, and each school's parent accountability committee.