Quality Over Counting: Mindsets for Grading Reform

Change needs to be strategic, purposeful, and focused on making incremental shifts.

Reforming grading and reporting practices is challenging. I would know, because when I started teaching high school history decades ago, I was the zero guy. I was the late penalty, score-all-the-homework, no-second-chances guy. Thirteen years later, as I came to understand the principles of sound assessment, my mindset changed completely, and I've never looked back.

The work wasn't easy. The biggest hurdle I saw in schools that sought to create a new grading and reporting culture was getting a critical mass of colleagues to even be open to the conversation. Another hurdle is finding the most effective and efficient starting point. The standards movement of the 1990s should have ushered in a new approach to grading and reporting. The onset of curricular standards brought criterion-referencing to the forefront and, for the first time in many schools, made intended learning crystal clear. The groundwork had been laid for what should have been a fairly seamless transition to a new grading paradigm, but the movement was more of a slow roll than a seismic shift in how student learning was reported.

Grading reform doesn't happen overnight; we aren't simply going to snap out of habitual practices. Change needs to be strategic, purposeful, and focused on making incremental shifts toward reaching the ideal. Short-term wins can add up to a seismic shift in grading and reporting.

The onset of the Covid-19 pandemic last spring forced schools to pivot to a virtual learning model, which immediately exposed the vulnerabilities of traditional grading and reporting models. The good news is that sound assessment and grading practices are universally transferable to a remote, hybrid, or face-to-face learning environment.

The following six mindsets offer points of entry into the conversation about reshaping our grading and reporting routines, so grades become a clear reflection of student proficiency against curricular standards and not just an accumulation of points. Each teacher, school, or district has to take inventory of current norms and establish momentum behind a conversation that can easily get sidetracked by hidden agendas and sponsorship of the status quo.

1. Quality Over Counting

The current sets of standards in most schools have never been more sophisticated. Whether directly via the mandated standards (e.g., Common Core State Standards) or indirectly via education and advocacy groups (e.g., P21), there has been a collective call for students to think and learn at increasingly complex levels. As such, the demonstrations of that thinking and learning should also be more sophisticated. Grading practices that rely on counting the number of correct responses are, at best, incomplete; at worst, they're obsolete.

Take a look at the following standards samples from the Common Core and the Next Generation Science Standards:

Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. (CCSS.MATH.CONTENT.7.SP.A.2)
Construct an explanation based on evidence for how geoscience processes have changed Earth's surface at varying time and spatial scales. (MS-ESS2-2 Earth's Systems)

The verbs in the above standards are not binary choices; rather, they are conducive to examining the quality of student learning in ways that increase in sophistication.

Fewer, more distinguishable levels are far more reliable than a 0-100 scale (Guskey & Brookhart, 2019). Describing performances along a scale of novice, approaching, competent, and sophisticated allows teachers to capture a continual uptick in quality. A 0-100 scale makes it impossible to tell from the score alone whether incorrect responses were the result of a misunderstanding, a simple mistake, or an absence of response altogether.

2. Criteria Over Comparisons

Emphasizing quality leads to the establishment of clear success criteria. Traditional grading often emphasizes a process where students are judged in comparison to their peers. When gradations of quality are articulated as clear success criteria (usually via a rubric), teachers are well-positioned to judge each student's performance on its own merits.

Success criteria need to be clear, transparent, and substantive (Brookhart, 2007); they should provide students with a clear vision of what success against the standards looks like along those few gradations. Clear criteria also provide a clearer pathway to success via formative assessment opportunities and more accurate self- and peer assessment opportunities (Brown & Harris, 2013).

3. Standards Over Tasks

One way educators fracture the relationship between teaching and reporting is how we organize evidence of learning. If we teach to standards, why continue to organize gradebooks by task types (tests, quizzes, assignments, projects, labs)? Standards-based report cards need standards-based grades, which come from standards-based evidence. This evidence emerges from standards-based assessments that are aligned to standards-based instruction. To organize evidence by tasks is to ultimately splinter the evidence of learning. Standards have never been organized by task types, so our gradebooks shouldn't be either.

Standards are organized by strands, categories, or domains. English Language Arts typically has Reading, Writing, Speaking and Listening, and Language Development; Math has Geometry, Ratios and Proportions, Expressions and Equations, and Statistics and Probability. Even the National Core Arts Standards are organized into four categories: Creating, Performing/Presenting/Producing, Responding, and Connecting.
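For teachers who keep records electronically, the regrouping can be pictured with a short Python sketch. It is only an illustration: the task names, the levels, and the 1-4 numbering of the novice-to-sophisticated scale are hypothetical, and `group_by_strand` is a made-up helper, not a feature of any gradebook product.

```python
from collections import defaultdict

# Hypothetical gradebook entries: (task, strand, level on an assumed 1-4 scale).
# Note that a single task (the unit test) can supply evidence for two strands.
entries = [
    ("Quiz 1", "Reading", 2),
    ("Unit test, section A", "Reading", 3),
    ("Unit test, section B", "Writing", 3),
    ("Essay", "Writing", 4),
    ("Class discussion", "Speaking and Listening", 3),
]


def group_by_strand(rows):
    """Collect the levels for each strand, in the order they were recorded."""
    grouped = defaultdict(list)
    for _task, strand, level in rows:
        grouped[strand].append(level)
    return dict(grouped)


by_strand = group_by_strand(entries)
for strand, levels in by_strand.items():
    print(strand, levels)
```

The task label stays in each record but no longer drives the organization; what a teacher reads off the gradebook is the run of evidence for each strand.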

Taking advantage of the way standards are organized is both simple and complex. The organization is ready-made, but the reorganization of evidence and the gradebook is more difficult. The following science and engineering practices, as an example, could be synthesized into the following four reporting categories:

Science and Engineering Practices	Categories
Asking questions; planning and carrying out investigations	Planning Investigations
Analyzing and interpreting data; using mathematics and computational thinking	Critical Thinking through Computation
Engaging in argument from evidence; obtaining, evaluating, and communicating information	Engaging in Argument
Constructing explanations; developing and using models	Scientific Explanations through Models

This organization is predicated on the teacher deciding that the science and engineering practices would be the evidentiary focal point. The point is to ensure that the organization of evidence enhances the seamlessness between what a teacher knows to be true of the learners and what they ultimately end up reporting.

4. Accuracy Over Leverage

Using grades to coerce behavioral compliance with promises (extra credit) or threats (late penalties) is a hallmark of traditional grading. The inclusion of non-achievement factors would bring the validity of what teachers report into question (Brookhart, 2013). Including how students learn (responsibility, work ethic, effort, attitude) with what they've learned results in grades that are, at best, opaque; at worst, they are meaningless.

One of the biggest misunderstandings of standards-based grading is that the non-achievement factors don't matter; they do. Achievement grades are the reason students will ultimately gain entry into college; their habits of learning are the reason they will graduate from college. It is not okay for students to turn work in late. But it's equally not okay to distort achievement levels as a result of lateness. Given current remote or hybrid learning models, the observation of these non-achievement factors has become increasingly complex; having any of them contribute at all to a student's achievement grade would be inequitable and even unethical.

Eliminating the traditional punitive practices can't happen in a vacuum. School and district leaders have to think about redefining student accountability by creating replacement routines where students are held accountable in a timely manner without distorting achievement levels. The transition to more effective practices must be paired with the creation of new systems (predictable routines) so that teachers making this shift feel supported in knowing what process to follow should students not follow through on their responsibilities.

5. Learning Over Time

So many of our grading practices have sent the subversive message of speed; success in the traditional mean averaging system is contingent upon early success. We must emphasize whether a student learns over when they learn. There are specific time constructs to the school year and grading periods, but we also know that some students need longer to learn. Any assessment and grading system that disadvantages students who didn't learn fast enough is unfair and inequitable.

If some students need longer to learn and the material is important (which it should be), then reassessment is crucial. Reassessment allows teachers to make the subtle shift to grading the end, allowing grades to represent students' current level of understanding, regardless of how low or slow the start.

Emphasizing standards over tasks connects nicely here, as teachers, when focused on standards, can see where they are assessing the same standards multiple times on different tasks (e.g., a quiz and one section of a unit test). It may be necessary to duplicate some assessments where this natural reoccurrence is not apparent (it's not an either-or proposition), but teachers can start by recognizing where it already happens.

This can lead teachers to utilize the most or more recent evidence to determine student proficiency. Once a student knows a binary standard (right/wrong, can/can't, yes/no), it is irrelevant that they used to not know it. Other times teachers may use the more recent evidence when learning standards are more complex and one demonstration would be an insufficient sample of evidence.
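A small worked example makes the arithmetic concrete. The Python sketch below is illustrative only: the four scores are invented, the 1-4 numbering of the novice-to-sophisticated levels is an assumption, and treating "more recent" as the average of the last two demonstrations is one possible reading rather than a prescribed rule.

```python
from statistics import mean

# Hypothetical evidence for one standard, in the order it was collected,
# on an assumed 4-level scale:
# 1 = novice, 2 = approaching, 3 = competent, 4 = sophisticated.
scores = [1, 2, 3, 4]


def mean_grade(evidence):
    """Traditional mean: every score counts equally, so a slow start drags the grade down."""
    return mean(evidence)


def most_recent_grade(evidence):
    """Report only the latest demonstration of the standard."""
    return evidence[-1]


def more_recent_grade(evidence, window=2):
    """Average the last `window` demonstrations (an assumed reading of 'more recent')."""
    return mean(evidence[-window:])


print(mean_grade(scores))         # 2.5
print(most_recent_grade(scores))  # 4
print(more_recent_grade(scores))  # 3.5
```

The same student is reported as midway between approaching and competent under the mean, but as sophisticated, or nearly so, when the grade reflects where the learning ended up.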

6. Practice Over Points

The relationship between practice and games or rehearsal and performance is self-evident. This same relationship can exist between the formative and summative purposes of assessment, which can be mutually supportive rather than conflicting (Black, 2013). The increased awareness and use of formative assessment as an essential instructional strategy has surfaced an ever-existing dichotomy between the formative (initiating learning and providing feedback and next steps) and summative (verifying learning) purposes. The challenge is that it is difficult to do both simultaneously as grades, scores, or levels can often interfere with a student's willingness to keep learning (Wiliam, 2011).

Knowing that some students need longer to learn suggests using early evidence of learning as practice, where the focus is on feedback and what's next. Grading early on is problematic for two reasons: 1) it disadvantages those who need longer to learn, and 2) the early evidence may not reach the full cognitive complexity of the standards, which would misrepresent proficiency.

The irony of shifting our grading mindsets is that ideally, teachers will do less grading. As teachers emphasize practice, they can begin to empower a culture of learning that finally sheds the practice of harvesting points. After all, no kindergarten student ever asks their teacher, "Are you grading this?" Ideally, this is a question no student at any grade level would ask.

Working Inside-Out

Reforming our grading practices is an inside-out process that begins with rethinking what grades are and what they are supposed to communicate. Begin where there are already hints of an agreeable mindset, or where there is consensus on one pressing aspect of reform. Thoughts are a precursor to change; go there in mind and you will go there in action.

References

Black, P. (2013). Formative and summative aspects of assessment: Theoretical and research foundations in the context of pedagogy. In J. H. McMillan (Ed.), SAGE handbook of research on classroom assessment (pp. 167–178). Thousand Oaks, CA: SAGE.

Brookhart, S. M. (2007). Expanding views about formative classroom assessment: A review of the literature. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice. New York: Teachers College Press.

Brookhart, S. M. (2013). Grading. In J. H. McMillan (Ed.), SAGE handbook of research on classroom assessment (pp. 257–271). Thousand Oaks, CA: SAGE.

Brown, G. T. L., & Harris, L. R. (2013). Student self-assessment. In J. H. McMillan (Ed.), SAGE handbook of research on classroom assessment (pp. 367–393). Thousand Oaks, CA: SAGE.

Guskey, T. R., & Brookhart, S. M. (2019). Reliability in grading and grading scales. In T. R. Guskey & S. M. Brookhart (Eds.), What we know about grading (pp. 13–31). Alexandria, VA: ASCD.

NGSS Lead States. (2013). Next Generation Science Standards: For states, by states. Washington, DC: The National Academies Press.

Wiliam, D. (2011). Embedded formative assessment. Bloomington, IN: Solution Tree Press.

Tom Schimmer is an education author, keynote speaker, and consultant from Vancouver, B.C. He has worked with schools and districts around the world in the areas of assessment, grading & reporting, leadership, and RTI. He is the author of six books, including Grading from the Inside Out (Solution Tree, 2016).