U.S. education has a long history of testing. We can probably agree, at the risk of oversimplification, that what gets tested in our schools is what gets taught. Large-scale standardized tests, such as the SAT, have primarily emphasized reading and math. Looming before us now, however, is the addition of writing as something to be tested—as something that counts.
The cover of a recent issue of Time proclaimed that the SAT's new writing section—which requires students to write brief essays—will profoundly affect what happens in our classrooms concerning writing instruction and assessment. This public discussion of changes in the SAT comes on the heels of two major reports: the National Center for Education Statistics' 2002 report on National Assessment of Educational Progress (NAEP) data concerning student writing achievement; and the College Board's report, produced through its National Commission on Writing in America's Schools and Colleges, entitled The Neglected “R” (2003).
We should be highly critical of what shapes curriculum, instruction, and assessment in our classrooms. Public schools are frantically treading water in the rising tide of federal and state mandates driven by standards and high-stakes testing; teachers and students alike are drowning in numbers. Most of that legislation ensures that a large percentage of schools appear to be failures and a large number of teachers appear to be underqualified—not in an effort to improve public education, but to discredit it in the public eye.
The impending changes to the SAT and proclamations regarding the 2002 NAEP writing test results reveal another significant fact: that the decisions made in the design, implementation, and scoring of standardized commercial tests indirectly dictate curriculum, instruction, and assessment in our schools. Yes, NAEP writing data from 2002 and the next manifestation of the SAT in 2005 do matter in our quest to reform classroom writing instruction—but not in ways that politicians, “educrats,” and the media discuss.
How We Assess = How We Teach
All high-stakes standardized testing is limited in what it offers teachers and students. Moreover, it has a consistently detrimental impact on teaching and learning in the classroom (Abrams & Madaus, 2003). Standardized commercial tests provide narrow data on student ability and learning and distorted data on school and statewide education quality. Such tests mandate teacher practices at the expense of best practice supported by decades of research (Hillocks, 2003; Mabry, 1999).
During a discussion of this phenomenon, a teacher in a graduate course admitted that because of standardized writing assessment she felt obligated to offer her students an essay plan that prescribed their writing in several ways: Students should write a three-sentence introduction. They should restate the writing prompt in the first or second sentence; the third sentence should be a traditional thesis that establishes three points in the order in which they will appear in the body of the essay. Three body paragraphs follow, which adhere to a formula as well: These paragraphs should begin with a topic sentence, followed by four or five sentences of discussion. One of these sentences should be a compound sentence; the final sentence in the paragraph should connect to the original thesis. The conclusion of the essay should restate the thesis and the writing prompt.
As Hillocks (2003) discovered, state assessment of writing has revitalized the traditional five-paragraph essay at the expense of authentic expression. Those who legislate the running of schools prescribe what students should learn, quantifying all learning through the most narrow and fragmented—although statistically manageable—means of assessment available.
Writing—like art, band, chorus, or athletics—serves as a vivid example of the inadequacy of standardized, multiple-choice, high-stakes testing in measuring whole activities. Popham (2001), joined by a number of other researchers (Abrams & Madaus, 2003; Coles, 2000; Kohn, 2000), explains that standardized testing—best typified by the traditional SAT—has inherent weaknesses in terms of gathering and interpreting data that reflect both an individual student's knowledge and a specific group of students' achievement. Popham argues that standardized testing fails when it becomes the curriculum—resulting in teaching to the test—and when test scores become the sole criterion for awarding students credit or promotions and for judging the quality of schools or statewide school systems.
Hillocks (2003), Coles (2000), Freedman (1995), and Mabry (1999) offer a solid core of evidence that high-stakes standardized testing is harmful to reading and writing instruction and to student achievement. Traditional multiple-choice exams reduce reading and writing to fragmented and inauthentic worksheet activities, with students reading merely to choose A, B, C, D, or E. Students do not actually compose on these kinds of “writing” exams but rather scan other people's sentences for errors in grammar, mechanics, and usage. When reading and writing instruction and classroom assessment begin to mirror such high-stakes tests, they reduce the amount of time students actually read and compose in any holistic or authentic way.
Even when our state-designed or commercial assessments move to what appear to be more open-ended forms of testing, in which students do compose original essays (as in many state assessments and in the new SAT), these tests produce the same negative impact on instruction, assessment, and learning as the more frequently maligned selected-response tests do (Freedman, 1995; Hillocks, 2003; Mabry, 1999). The rubrics and sample essays provided by the test designers become the curriculum. Students are trained to write as the SAT mandates.
We can anticipate that student writing will be prompt-driven and that computer scoring—which will be implemented with the SAT—will have the same problems associated with the spelling and grammar functions of most word-processing programs. Computers can quickly identify many surface features of writing, such as passive voice and sentence length (one popular program flags all sentences longer than 60 words). These programs also flag as incorrect such sentence beginnings as “And,” “But,” or “Hopefully.” Such assumptions about quality writing will influence writing instruction by implying that these superficial criteria are actually valid ways to assess writing; yet we find passive voice, sentences in excess of 60 words, and sentences that begin with “And” in the best of published writing.
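To make concrete how superficial such checks are, consider a minimal sketch, in Python, of a surface-feature flagger of the kind described above. This is hypothetical, not the SAT's actual scoring engine or any named product; only the 60-word threshold and the flagged sentence openers come from this discussion, and the passive-voice heuristic is an illustrative stand-in.

```python
import re

# A minimal, hypothetical sketch of the surface-feature checks described
# above; it is not the SAT's scoring engine or any real product. The
# 60-word threshold and the flagged openers come from the article; the
# passive-voice heuristic is illustrative only.

FLAGGED_OPENERS = ("And", "But", "Hopefully")

def naive_flags(text):
    """Flag sentences on surface features alone, ignoring meaning."""
    # Crude split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flags = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) > 60:
            flags.append(("over 60 words", sentence))
        if words and words[0] in FLAGGED_OPENERS:
            flags.append(("flagged opener", sentence))
        # Simplistic passive-voice guess: a "to be" verb followed by a
        # word ending in -ed or -en. It both over- and under-flags.
        if re.search(r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", sentence):
            flags.append(("possible passive voice", sentence))
    return flags

# Trips two flags despite being perfectly good prose: the program
# never weighs what the sentence actually says.
print(naive_flags("But the rules were broken by everyone."))
```

The point of the sketch is that every check runs without any reference to meaning: a sentence from the best published writing trips the same flags as a weak one.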
What computer grading cannot assess is the appropriateness of writers' choices and the quality of writers' ideas. Only authentic practices that lead to expert teachers assessing student work can help students become more effective writers. And that will not happen as a result of standardized testing.
Red Flags Ahead
The new writing section in the SAT may seem wonderful at first blush to those who teach students to write and who advocate authentic assessment. But a closer look shows that the SAT will also include an isolated grammar section, thus creating a perceived need for isolated grammar instruction and implying that isolated grammar instruction is writing instruction. Both misconceptions have been refuted for decades by experts (Weaver, 1996; Williams, 1990).
The NAEP writing data have led to public hand-wringing about the writing abilities of students and the quality of writing instruction in our English and language arts classrooms, but few people—if any—have raised two important issues about the data and the ensuing debate. First, does the NAEP test itself accurately reflect student writing ability or simply the mandates of large-scale testing? Second, should we test student writing in one-shot, standardized formats to begin with? The answer to both questions is no. Hillocks (2003) and Mabry (1999) show that when assessment rubrics and sample essays become templates for students to follow, we lose any chance of achieving authentic or valuable writing instruction. When a test wields power like this, teachers abdicate their expertise as writing instructors.
The 2002 NAEP writing test does offer educators some important red flags, however, that could lead us out of the black hole of measurement and standardized testing. Those red flags are large and waving frantically. First, there is danger in data overload: Can all those numbers and disaggregations possibly do teachers any good? Can holistic acts be reduced to numbers in effective and meaningful ways? Can we teach any given student more effectively if we know that 4th grade Asian/Pacific Islander students scored, on average, 167 on the 2002 NAEP writing test? Our test mania has made testing an act of sorting instead of a means to more effective teaching and learning.
For example, saying that “Jessica scored 162 on the 2002 NAEP writing exam, which is 2 points below the average for 8th grade girls across the United States” contributes little to improving student learning. Effective writing assessment requires an effective evaluator who is experienced with the growth of the student being assessed and who can describe in a detailed and specific way the student's writing: “Jessica's personal narrative includes a number of specific verbs—such as ‘peppered,’ ‘tossed,’ and ‘mumbled’—that give her most recent piece a stronger voice than her earlier works.” This kind of assessment does contribute to improving student learning.
Another valuable lesson to garner from the NAEP writing test concerns flawed writing prompts. For example, the grade 4 writing prompt asks students to look at two surrealistic drawings and write a story. Not only does it assume certain cultural and social experiences on the part of the student, but it also asks the student to write fiction. The prompt virtually guarantees that the writing sample is more a reflection of the student's cognitive development and culture than of his or her writing ability. For these reasons, grade 4 data do not reflect student writing ability alone.
A Call to Arms
Call for legislation at the federal and state levels that ensures that no standardized test stands as the sole factor considered in making an education decision concerning a student, a school, or a school system. Standardized test scores should play a role—not the role—in policy and decision making. We must demand that all final decisions about achievement be left in the hands of educators. People, not numbers, must oversee the fate of students—especially in such holistic acts as writing.
Help create and implement authentic assessment strategies for writing that contribute to the call for high standards and accountability (Brozo & Hargis, 2003; Popham, 2003). This means rejecting computer-only grading of student writing. A computer can never replace the human element in writing assessment.
Clearly define what matters in writing in terms of learning and student performances of learning. Best practice in writing instruction should address the so-called isolated skills of writing (such as grammar and mechanics), but educators must deliver this instruction within the context of each student's own writing.
Build a broad community of expert writing instructors and ensure effective teacher training in writing instruction. Teachers who are writers themselves should be the ones training other teachers to teach writing. Students should write daily by choice and with purpose.
A call to reform classroom writing instruction must come from the inside, from practicing writing instructors who are experts in their field. Reform should not be driven by data from standardized tests or by a desire to use computers to make our jobs easier. Learning to write is not a formula that can be imprinted on each student but rather an act of discovery that classroom writing instruction must support.