November 1, 2012
Vol. 70
No. 3

Art and Science of Teaching / Reducing Error in Teacher Observation Scores

Given current trends in teacher evaluation, one of teachers' main concerns relates to the accuracy of their scores. They have the right to be concerned, given the low reliabilities commonly reported in studies of various observation systems (Bill and Melinda Gates Foundation, 2011). Error is inherent in any type of observation system. Indeed, error is inherent in any type of measurement.
One type of error found in teacher observation scores is measurement error. This occurs when the person observing and scoring a teacher doesn't adequately understand or use the observation system. We can correct this type of error through rigorous observer training.
Another type of error is sampling error. This occurs when the rater observes a class that doesn't represent a teacher's usual behavior. For example, a teacher might typically ask a great many questions of all students but not on the day he or she is observed. Sampling error is more difficult to address than measurement error.

What to Do About Sampling Error

The obvious way to eradicate sampling error is to observe teachers every day they teach, which, of course, is impossible. The current convention is to do unannounced, random observations. Some districts and schools now require supervisors to do about five observations of each teacher. But because day-to-day lessons require different instructional strategies, far more than five observations are required to obtain an accurate representation of a teacher's pedagogical skill.
In the teacher evaluation model based on The Art and Science of Teaching (2007), I've identified three types of lessons: (1) those in which a teacher introduces new content, (2) those in which students practice and deepen their understanding of previously introduced content, and (3) those that require students to apply what they've learned. Each involves different instructional strategies.
This fact alone might add sampling error to an observation. If an observer is required to look for a long list of instructional strategies during every observation, but some strategies typically occur only in a specific type of lesson, he or she would have to note the absence of various strategies during the observed lesson even when those strategies wouldn't have been suitable.
Videos of classroom teachers have shown that teachers use lessons that introduce new content 60 percent of the time, lessons that help students practice and deepen their understanding 35 percent of the time, and lessons that ask students to apply what they've learned 5 percent of the time. If an observer made five random observations of a teacher's classes, the probability of seeing at least one lesson of each type would be only about 18 percent (see End Note 1). In other words, chances are good that teacher scores based on five random observations would contain a great deal of sampling error.
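For readers who want to check the arithmetic, the short Python sketch below reproduces the 18 percent figure by summing multinomial probabilities over every way that a set of observations can include at least one lesson of each type. It is an illustration, not the original calculation: the names LESSON_SHARES and prob_every_type_observed are hypothetical, and the sketch assumes that each observation is an independent random draw and that the 60/35/5 split holds for the teacher being observed.

```python
from itertools import product
from math import factorial

# Lesson-type shares reported in the text (assumed fixed for every teacher).
LESSON_SHARES = {"introduce": 0.60, "practice_deepen": 0.35, "apply": 0.05}


def prob_every_type_observed(shares, n_obs):
    """Probability that n_obs independent random observations include
    at least one lesson of every type (sum of multinomial terms)."""
    probs = list(shares)
    total = 0.0
    # Enumerate how many times each lesson type is seen (at least once each).
    for counts in product(range(1, n_obs + 1), repeat=len(probs)):
        if sum(counts) != n_obs:
            continue
        ways = factorial(n_obs)  # multinomial coefficient, built step by step
        term = 1.0
        for count, p in zip(counts, probs):
            ways //= factorial(count)
            term *= p ** count
        total += ways * term
    return total


if __name__ == "__main__":
    p5 = prob_every_type_observed(LESSON_SHARES.values(), 5)
    print(f"Five observations: {p5:.4f}")
```

Under these assumptions the result for five observations is about 0.18. Calling the same function with a larger number of observations shows how slowly the probability rises; at 20 observations it is still only roughly 64 percent, largely because application lessons are so rare.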

Five Steps That Help

At some point, K–12 evaluators might be able to conduct sufficient teacher observations to reduce sampling error. In the interim, I recommend five steps.

Use Teacher Self-Evaluation

Although having teachers rate themselves introduces the possibility of teachers scoring themselves too high, it can provide a useful reference point. In fact, in two of three possible outcomes, teacher self-evaluations help decrease the error in the observer's rating.
For example, if the teacher's self-rating is the same as the observer's, that's a good indication that the observer rating is accurate. If the teacher's self-rating is lower than the observer's, it's possible that the teacher has underrated his or her skill level, but it's more likely that the observer's rating is inflated; teachers will likely be more aware of their tendencies over the years than will observers. Finally, if the teacher's self-rating is higher than the observer's, the teacher may have an inflated view of his or her pedagogical skills, or the observer's score may be low as a result of sampling error or measurement error. In this case, the remaining strategies can provide additional information.

Use Announced Observations for Different Lesson Types

It's wise to schedule three announced observations during which the observed teacher demonstrates one of the three types of lessons. This procedure ensures that observers will see examples of instructional strategies specific to the different lesson types.
Of course, this might introduce another type of error—the teacher attempting to impress the observer by using strategies during announced observations that he or she typically doesn't use. If the rating scale describes specific levels of development for each instructional strategy (Marzano, 2012), the teacher will probably score low in terms of his or her skill in these rarely used strategies, thus defeating his or her purpose of using those strategies.

Use Brief Walk-Throughs as Unannounced Observations

Many schools routinely use brief, unannounced walk-throughs during which observers visit teachers' classrooms for 3 to 5 minutes. Observers can use these visits to collect information that resolves uncertainties in teacher scores. For example, if a teacher's self-rating is higher than an observer's rating, ratings from walk-throughs might reconcile the difference.

Record Teachers' Classes on Video

Random recordings of teachers' classes are both easy and inexpensive to make with modern digital video cameras. Raters can score the recordings independently or in teams, and teachers can take part in scoring their own recordings.

Let Teachers Challenge Scores

Teachers should be allowed to challenge their final summative scores on specific elements by providing evidence—such as classroom videos, student artifacts, or student responses to survey questions—that shows they have effectively used those elements in the classroom. This gives teachers a say in the scores they receive.

A Useful Tool

Teacher observation is a useful and valid part of teacher evaluation. By incorporating some of the strategies I suggest, schools can reduce sampling error without requiring a great deal of additional resources.

References

Bill and Melinda Gates Foundation. (2011). Learning about teaching: Initial findings from the Measures of Effective Teaching project. Bellevue, WA: Author. Retrieved from www.gatesfoundation.org/college-ready-education/Documents/preliminary-findings-research-paper.pdf

Marzano, R. J. (2007). The art and science of teaching: A comprehensive framework for effective instruction. Alexandria, VA: ASCD.

Marzano, R. J. (2012). Evaluations that help teachers improve. Educational Leadership, 70(3), 14–19.

End Notes

1 I derived this probability by computing the probability of each possible way that five observations would include at least one instance of each lesson type using the multinomial distribution and then summing these probabilities.

Robert Marzano is the CEO of Marzano Research Laboratory in Centennial, CO, which provides research-based, partner-centered support for educators and education agencies—with the goal of helping teachers improve educational practice.

As strategic advisor, Robert brings over 50 years of experience in action-based education research, professional development, and curriculum design to Marzano Research. He has expertise in standards-based assessment, cognition, school leadership, and competency-based education, among a host of areas.

He is the author of 30 books, 150 articles and chapters in books, and 100 sets of curriculum materials for teachers and students in grades K–12.

From our issue
Teacher Evaluation: What's Fair? What's Effective?