It all started with a whiteboard filled with options. For almost 10 years, we had recorded and watched lessons from hundreds of diverse classrooms across the United States. What we saw in those lessons intrigued us. They didn't resemble the "teacher tells, students practice" traditionalism prevalent in the 1970s and 1980s, but neither did we see many lessons that engaged students at high levels of thinking and reasoning. We also saw marked variability along other dimensions of instruction, such as teachers' use of student ideas, teachers' and students' use of academic language, and the focus on mathematical meaning and practices.
Over time, we worked to develop an observational tool, the Mathematical Quality of Instruction (MQI), that would capture these and other dimensions of mathematics classrooms; then we trained several dozen teachers and former teachers to score those videotaped lessons using the observational tool. These raters' testimonials—unsolicited by us—that using the MQI and watching video had resulted in important changes in their own teaching led us to think about using the instrument in a similar way with classroom teachers. The overly full whiteboard was the result of brainstorming how such a learning opportunity would best work.
From the start of the brainstorming meeting, it was clear that we had more questions than answers. How might watching and rating lessons on video prompt teacher learning? Should teachers watch and score their own teaching, or should they use stock videos from our library? Should we give teachers the master scores that expert raters gave the clips, or should we let groups come to their own understanding? To what extent could teachers take on the learning process themselves?
In many ways, these inquiries revolve around an age-old question: When is a facilitator or trainer useful, and when is teacher-led professional learning more effective? Many of us felt strongly that instructional practice could be improved if teachers engaged in serious, evaluative conversations about teaching—but that attaching a facilitator to each group might work against this goal. We also knew we wanted teachers to begin with the list of important practices suggested by the observation instrument, deeply internalize that list, and then develop and use their own ideas about how to apply those practices in their instruction. But how could we do that?
Despite optimism regarding teacher professional learning communities and collaborative professional development, empirical evidence has been mixed. Previous studies of video clubs suggested that simply bringing teachers together to watch a video would not necessarily result in deep discussions (van Es & Sherin, 2008). Likewise, having teachers show their own lessons to others sometimes resulted in superficial conversations (MacDonald, 2011). Our previous experience suggested that even with an organized method for analyzing stock or personal footage, misinterpretations were bound to occur, especially when participants held strong views of teaching or lacked adequate mathematical knowledge.
Which Works Best?
We suspected that we were not alone in having a whiteboard full of questions. Teachers, school leaders, and district curriculum coordinators and coaches likely go through the same process, trying to envision the most effective way to roll out new lessons, standards, curriculum materials, and instructional techniques. Scholars have written about working with teachers and videos, and a few even contrasted situations in which teachers used their own videos or watched stock video (Seidel, Stürmer, Blomberg, Kobarg, & Schwindt, 2011; Zhang, Lundeberg, Koehler, & Eberhardt, 2011). Others have been interested in the question of whether more organically led groups could be as effective as heavily facilitated groups (Calandra, Gurvitch, & Lund, 2008; Saxe & Gearhart, 2001; van Es & Sherin, 2006). We decided to try out each option with a different group of teachers, in a kind of mini-experiment.
Piloting, 1.0
Before bringing the professional development to those different groups, we conducted a first-round pilot with a group of 12 very tolerant teachers and math coaches. Doing so gave us an opportunity to develop materials and see teachers' reactions to those materials before beginning our mini-experiment.
It is safe to say that the initial pilot was fairly disastrous.
We started with the premise that after only a brief introduction to our rating instrument, teachers could discuss and assign their own scores for videos they watched. We hoped that a community would develop at the pilot site that would enable teachers to collaboratively build their own understanding of the rating instrument and how to apply it to specific videos. But the teachers wanted a much firmer knowledge base about what they were looking for before scoring on their own.
Having facilitators who held the "right" answers—in the form of master scores for each clip—was also a problem. We had hoped teachers would invest in long conversations about what they saw, but telling teachers the "right" scores seemed to impede some participants' contributions to the conversations. Other participants, however, noted that they wanted those right answers while they learned to use the rating instrument.
Another problem with having the "right" answers is that neither we nor the participating teachers were always convinced that the master scores were correct. The explanations that expert raters gave often sounded technical and arcane, designed for research but not for teacher use. We needed to rescore the videos with teachers in mind.
This pilot brought home a number of other lessons: The monthly sessions we had planned were spaced too far apart for teachers to really internalize the rating instrument and cohere as a group. Poor ordering of the instrument's dimensions led to teacher misunderstandings of key content. Teachers also craved connections to their own practice.
Piloting, 2.0
With these lessons in mind, we went into the second pilot with a redesigned plan, including a simplified instrument, more extensive initial training on what each item on the instrument meant, more videos, and more frequent teacher meetings. Our first pilot had suggested that only when teachers "owned" the scoring conversations and connected the conversations to their own practice would we see high-quality conversation and improvements in teachers' analytic skills. However, we didn't know whether teachers would hold productive discussions without an external facilitator. We also wondered whether having teachers record and score their own and their colleagues' instruction would help them connect the professional development to practice.
Thus, we designed this new pilot to test these issues. We first trained all teachers in the basics of the instrument over a three-day summer session. Then, teachers split into four groups, with each group representing one combination of the conditions we wanted to compare: teacher-created video vs. stock video, and facilitator-led discussion vs. teacher-led discussion. Over the course of the study, we plan to repeat the pilot with three successive cohorts, for a total of 12 groups, three in each condition. Halfway through the pilot, with six groups completed, we are beginning to learn some early lessons.
All groups had fruitful conversations about mathematics teaching. For example, in a teacher-led group, a video clip in which students solved a word problem ("John spends 3/4 of his allowance on what he likes and saves 1/5 of his allowance; how much does he have left over for school lunch?") led to an intense debate about whether the teacher had woven student ideas into the discussion of the problem. In the video clip, the teacher notes that students have many different answers to the problem (19/20, 11/20, 4/20, and 1/20); summons one student to the board to show his solution; and then asks that student to talk about his work:
Student: I got the 3/4ths, and I subtracted it by 1/5th because 3/4ths is what he spends and 1/5th is what he saves. And then I got this, and I subtracted it and I got 11/20ths.
Teacher: So 11—what would half be in 20ths?
Student: 10.
Teacher: 10.
Student: It's almost—
Teacher: Now hold on a second; just think about this for a minute. So, he's telling me that he's going to spend close to half of his money on school lunches each week. If you're telling me, 11/20ths that's half; but he's already spending 3/4ths on other things—on whatever he wants—so if he's already spending 3/4ths on other things, would he have a half left to spend on—
Student: No.
Teacher: So logically that's not going to fit. The math is correct, but is that the correct way to work the math for this problem? Maybe not.
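For readers following the arithmetic, here is a brief worked computation of the problem as we read it (a sketch, assuming that "spends 3/4" and "saves 1/5" both refer to the same whole allowance):

\[
1 - \frac{3}{4} - \frac{1}{5} = \frac{20}{20} - \frac{15}{20} - \frac{4}{20} = \frac{1}{20},
\]

so 1/20 of the allowance remains for school lunch. The student at the board instead computed \(\frac{3}{4} - \frac{1}{5} = \frac{11}{20}\), which, as the teacher points out, cannot be what remains once 3/4 of the allowance has already been spent.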
Responding to the clip, one participant initially argued that the teacher had woven students' ideas into instruction, but another disagreed:
I guess it depends on what one defines as mathematical ideas…. the students weren't really given the chance to express their ideas. They were able to say what they felt the answer was, and one got up and did a procedure mathematically, but they were not given the opportunity to really talk about what their ideas were and … why they believed they were correct. And so, to me, that's why it wasn't a [high-scoring clip]. While she was hearing their answers, she wasn't hearing their ideas.
These distinctions—between students simply expressing answers and explaining why those answers were correct and between teachers hearing answers and hearing ideas—are crucial. Other group participants agreed that the clip did not warrant a high score for using student ideas in instruction.
This discussion was not unique; similar ones took place in all groups. These teachers were developing a common language for talking about instruction (for instance, defining what "student ideas" means) and norming their evaluations of instruction. In doing so, teachers produced ratings similar to those of the researchers who had scored the clips. But did any of our four versions of the professional development prove more effective than the others? We organize our observations by research question.
Who Should Facilitate?
Conversations in the teacher-led groups tended to be just as rich as, and at times richer than, those in the groups with an outside facilitator. Teacher-led groups' scores for specific clips were often identical to or very close to the expert scores, and facilitators noted that in cases where scores diverged, teachers' reasoning was often as persuasive as the experts'. Without a facilitator directing the conversation, teachers also had the freedom to focus on what they found interesting and important in the clip.
At times, however, teachers reported wanting more guidance about the instrument itself; in facilitator-led groups, that guidance was available. In some teacher-led groups, discussions stopped when teachers agreed about how to score particular clips. And even when disagreements occurred in teacher-led groups, discussions tended to be shorter, with teachers more likely to agree to disagree and move on.
Overall, our experience suggests that teachers and schools in the early stages of scoring videos with an observational rubric would do well to hire or train a facilitator with strong knowledge of the instrument to guide sessions. Later sessions, however, can easily be facilitated by teachers themselves, which may even enhance teachers' engagement.
Where Should Video Come From?
In the groups that videotaped their own teaching, the first four of 10 after-school sessions used stock video to create group norms and deepen teachers' understanding of the rating instrument. In the subsequent six sessions, teacher volunteers took turns capturing their own instruction, choosing a short clip, and then submitting it for scoring. Groups' success at doing this varied tremendously. In some cases, teachers were reluctant to share videos of their teaching; in other cases, teachers eagerly took on this new work. Responses tended to vary by site and facilitator, with a group largely composed of teachers from a single school ultimately achieving the most frequent participation in the process.
Teachers in these groups reported that they learned more about teaching practice by watching their peers than by watching the teachers in stock clips. In groups that used stock video only, some teachers reported wanting more connection to local practice and seemed bored by the middle of the 10 sessions. Teachers in groups using their own video noted overlaps between what they saw their colleagues teaching and what they themselves would soon be teaching, and they gleaned ideas and practices from one another. Others noted that their peers' instruction was often stronger than what they had seen in stock video clips. For instance, one teacher remarked that he felt his teaching of division had improved because he had observed his peers teaching this content so well.
Thus, when teachers used their own video, they found the experience to be useful. Yet in comparing sessions that used stock and teacher-captured video, facilitators noticed other patterns. In one teacher-led group, teachers consistently scored one another's clips higher than project staff thought was warranted. Facilitators from other groups noted that even when scores were not inflated, discussion of colleagues' instruction was less detailed and critical than was typical when the teachers were discussing stock video.
Not all groups using their own video experienced these phenomena; in the group composed largely of teachers from one school, the facilitator felt teachers were on target in their scores, even when the instruction did not warrant the highest scores on the instrument. Yet our experiences suggest that those wishing to do similar work should anticipate that the "culture of nice" within schools may pose an obstacle to teachers' honest discussion of their own videos. Giving teachers more time to develop trust may help.
The Best of All Worlds
We don't know whether these trends generalize beyond these teachers, but our experience strongly suggests combining all conditions: Start with strong facilitation, eventually turning leadership of groups over to teachers; and start with stock video, perhaps asking all teachers to contribute after trust has been established and norms for watching one another's videos are in place.
Our study suggests that teachers appreciate the opportunity to discuss mathematics instruction with colleagues, and nearly all report that it has improved their teaching. Giving teachers—not just administrators—meaningful access to classroom observation instruments, video exemplars, and opportunities to rate and discuss these exemplars can be a valuable addition to a district's instructional improvement efforts.
Authors' note: The research reported here was supported by the Institute for Education Sciences, U.S. Department of Education, through Grant R305C090023 to President and Fellows of Harvard College, as well as by the National Science Foundation, through Grant DRL-1221693. The opinions expressed are those of the authors and do not represent views of the IES, the U.S. Department of Education, or NSF.