Studies of the impact of class size on student achievement may be more plentiful than for any other issue in education. Although one might expect this huge research effort to yield clear answers about the effects of class size, sharp disagreements about these studies' findings have persisted.
Advocacy groups take opposite stances. The American Federation of Teachers, for example, asserts that “taken together, these studies . . . provide compelling evidence that reducing class size, particularly for younger children, will have a positive effect on student achievement” (Murphy & Rosenberg, 1998, p. 3). The Heritage Foundation, by contrast, claims that “there's no evidence that smaller class sizes alone lead to higher student achievement” (Rees & Johnson, 2000).
Reviewers of class size studies also disagree. One study contends that “large reductions in school class size promise learning benefits of a magnitude commonly believed not within the power of educators to achieve” (Glass, Cahen, Smith, & Filby, 1982, p. 50), whereas another claims that “the . . . evidence does not offer much reason to expect a systematic effect from overall class size reduction policies” (Hanushek, 1999, p. 158).
That the American Federation of Teachers and the Heritage Foundation sponsor conflicting judgments is easy to understand. But why have reviewers come to such divergent views about the research on class size, and what does the evidence really say?
Early Small Field Experiments
To answer these questions, we must look at several research traditions, beginning with early experiments on class size. Experiments have always been a popular research technique because investigators can assign their subjects randomly to different conditions and then compare the results of those conditions—and this human intervention can appear to provide information about causes and effects. Experiments on class size, however, are nearly always done in field settings—schools—where uncontrolled events can undermine the research and affect results.
Small experimental studies on the effects of class size began to appear in the 1920s, and scores of them emerged subsequently. In the 1960s, informal reviews of these efforts generally concluded that differences in class size generated little to no effect. By the late 1970s, however, a more sophisticated research method, meta-analysis, had been invented, which facilitated the statistical assembly of results from small-but-similar studies to estimate effects for the studies' populations. Reviewers quickly applied meta-analysis to results from early experiments in class size (Glass & Smith, 1979; Educational Research Service, 1980; Glass et al., 1982; Hedges & Stock, 1983) and eventually emerged with a consensus that short-term exposure to small classes generates—usually minor—gains in student achievement and that those gains are greater in the early grades, in classrooms with fewer than 20 students, and for students from groups that are traditionally disadvantaged in education.
Most of these early class size experiments, however, had involved small samples, short-term exposures to small classes, only one measure of student success, and a single education context (such as one school or school district). Poor designs had also made results of some studies questionable. Researchers needed to use different strategies to ascertain the effects of long-term exposure to small classes and to assess whether the advantages of early exposure to small classes would generalize to other successes and be sustainable.
Surveys
Survey research has provided evidence on the effects of class size by analyzing naturally occurring differences in schools and classrooms and by asking whether these differences are associated with student outcomes.
Well-designed surveys can offer evidence about the impact of variables that experiments cannot manipulate—such as gender, minority status, and childhood poverty—but survey research cannot easily establish relationships between causes and effects. For example, if a survey examines a sample of schools where average class size varies and discovers that those schools with smaller classes also have higher levels of student achievement, has the survey ascertained that class size generated achievement? Hardly. Those schools with smaller classes might also have had more qualified teachers, better equipment, more up-to-date curriculums, newer school buildings, more students from affluent homes, or a more supportive community environment—factors that may also have helped generate higher levels of achievement. To use survey data to make the case for a causal relation between class size and student outcomes, then, researchers must use statistical processes that control for the competing effects of other variables.
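The confounding problem described above can be sketched with a few lines of Python. The eight school records below are invented for illustration and are built so that the whole raw gap between small-class and large-class schools comes from family affluence. Comparing schools within each affluence group stands in for the statistical controls the text mentions (real surveys would more likely use regression), and the sketch makes no claim about actual schools.

```python
from statistics import mean

# Hypothetical records: (has_small_classes, affluent_community, mean_score).
# All values are invented; small classes are deliberately concentrated
# in the affluent schools to create a confound.
schools = [
    (True, True, 80), (True, True, 80), (True, True, 80), (False, True, 80),
    (True, False, 60), (False, False, 60), (False, False, 60), (False, False, 60),
]

def gap(records):
    """Mean score of small-class schools minus mean score of large-class schools."""
    small = [score for is_small, _, score in records if is_small]
    large = [score for is_small, _, score in records if not is_small]
    return mean(small) - mean(large)

# Naive comparison that ignores affluence.
naive_gap = gap(schools)

# Controlled comparison: look at the gap separately within each affluence group.
gap_affluent = gap([r for r in schools if r[1]])
gap_not_affluent = gap([r for r in schools if not r[1]])

print("naive gap:", naive_gap)
print("gap among affluent schools:", gap_affluent)
print("gap among other schools:", gap_not_affluent)
```

In this constructed data the naive comparison shows a gap while the within-group comparisons show none, which is the pattern a survey analyst has to rule out before reading an association as an effect of class size.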
Serious surveys of education achievement in the United States began in the 1960s with the famous Coleman report (Coleman et al., 1966). Written by authors with impressive reputations and released with great fanfare, this massive, federally funded study involved a national sample and took on many issues then facing education. Today, most people remember the report for its startling claim that student achievement is almost totally influenced by the students' families and peers and not by the characteristics of their schools. This claim was widely accepted—indeed, was greeted with dismay by educators and endorsed with enthusiasm by fiscal conservatives—despite flaws in the report's methods that were noted by thoughtful critics.
Since then, researchers have conducted surveys to establish whether differences in school funding or in the reforms that funds can buy—such as small class sizes—are associated with desired education outcomes. Most of these surveys, usually designed by economists, have involved questionable design features and small samples that did not represent the wide range of U.S. schools, classrooms, or students.
In the 1980s, economist Eric Hanushek began to review these flawed studies and to discuss their supposed implications. Hanushek, committed to the notion that public schools are ineffective and should be replaced by a marketplace of competing private schools, concluded that differences in public school funding are not associated with education outcomes (see Hanushek, 1986, and various publications since).
Other analysts have challenged Hanushek's methods and conclusions on several grounds. Larry Hedges and Rob Greenwald, for example, have pointed out that Hanushek merely counts the number of effects that he believes are statistically significant, but because most of the studies that he reviewed had small samples, he has, of course, found few statistically significant effects. When researchers combine those effects in meta-analyses, however, they find that differences in school funding and the benefits that funds can buy—such as small classes—do, indeed, have an impact (see Hedges, Laine, & Greenwald, 1994, and other publications since).
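The difference between counting significant results and pooling them can be illustrated with a short Python sketch. The eight effect estimates and standard errors are hypothetical, and fixed-effect inverse-variance pooling is used here as one common way to combine studies; it is an assumption for illustration, not necessarily the procedure used in the reviews cited above.

```python
import math

# Hypothetical (effect estimate, standard error) pairs for eight small studies.
# These numbers are invented for illustration, not taken from any review.
studies = [
    (0.20, 0.15), (0.10, 0.14), (0.30, 0.16), (0.15, 0.13),
    (0.25, 0.15), (0.05, 0.14), (0.22, 0.16), (0.18, 0.13),
]

Z_CRIT = 1.96  # two-sided 5 percent significance threshold

def vote_count(studies):
    """Count the studies whose own estimate is statistically significant."""
    return sum(1 for est, se in studies if abs(est / se) > Z_CRIT)

def pooled_fixed_effect(studies):
    """Return the inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)

significant = vote_count(studies)
pooled, pooled_se = pooled_fixed_effect(studies)

print("studies significant on their own:", significant)
print("pooled effect:", round(pooled, 3), "z =", round(pooled / pooled_se, 2))
```

With these invented inputs no single study clears the significance threshold, yet the pooled estimate does, because combining studies shrinks the standard error. That is the sense in which a tally of significant findings can understate an effect that small studies consistently point toward.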
Other commentators have noted that Hanushek's reviews include many studies that used inappropriate samples or did not employ controls for other school characteristics whose effects might be confused with those of class size. In addition, most of the studies did not examine class size directly but looked instead at student-teacher ratio—that is, the number of students divided by the number of “teachers” reported for a school or school district. Such an approach ignores the actual allocation of students and teachers to classrooms and includes as “teachers” such persons as administrators, nurses, counselors, coaches, specialty teachers, and other professionals who rarely appear in classrooms. Such a ratio does not tell us the number of students actually taught by teachers in classrooms.
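A small worked example shows how far a student-teacher ratio can drift from actual class size. The school below is hypothetical: the enrollment, the number of classroom teachers, and the number of other professionals reported as “teachers” are all assumed values chosen only to make the arithmetic visible.

```python
def student_teacher_ratio(students, reported_teachers):
    """Students divided by everyone the school reports as a 'teacher'."""
    return students / reported_teachers

def average_class_size(students, classrooms):
    """Students divided by the number of classes actually taught."""
    return students / classrooms

# Hypothetical school (all figures are assumptions for illustration).
students = 600
classroom_teachers = 24   # each one teaches a single class
other_professionals = 12  # counselors, specialists, and others counted as "teachers"

ratio = student_teacher_ratio(students, classroom_teachers + other_professionals)
class_size = average_class_size(students, classroom_teachers)

print("student-teacher ratio:", round(ratio, 1))
print("average class size:", class_size)
```

In this sketch the reported ratio is under 17 students per “teacher,” while the classes themselves hold 25 students each, so a study relying on the ratio would treat this school as having far smaller classes than it does.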
Hanushek has not responded well to such criticisms; rather, he has found reasons to quarrel with the details and to continue publishing reviews claiming that small classes have few to no effects. These efforts have allied Hanushek with political conservatives who have extolled his conclusions, complimented his efforts, and asked him to testify in various forums where class size issues are debated. Because of these responses and activities, it is no longer possible to give credence to Hanushek's judgments about class size.
Fortunately, a few well-designed, large-scale surveys have investigated class size directly (see, for example, Elliott, 1998; Ferguson, 1991; Ferguson & Ladd, 1996; Wenglinsky, 1997). These studies concluded that long-term exposure to small classes in the early grades can be associated with student achievement; that the extra gains that such exposure generates may be substantial; and that such gains may not appear with exposure to small classes in the upper grades or at the secondary school levels.
Trial Programs and Large Field Experiments
Other types of small class research have addressed some of the shortcomings of early experiments and surveys. In the 1980s, state legislatures in the United States began political debates about the effects of small class size, and some states began trial programs or large-scale field experiments.
Indiana's Project Prime Time
In 1981, the Indiana legislature allocated $300,000 for a two-year study on the effects of reducing class size for the early grades in 24 randomly selected public schools. But initial results were so impressive that the state allocated funds to reduce class sizes in the 1st grade for all Indiana schools in 1984–85 and for K–3 by 1987–88, with an average of 18 students for each teacher.
Because of the statewide design of the initiative, it was impossible to compare results for small classes with a comparable group of larger classes. Some schools in the state had small classes before Project Prime Time began, however, so researchers compared samples of 2nd grade achievement records from six school districts that had reduced class size with three that had not. They found substantially larger gains in reading and mathematics achievement for students in small classes (McGivern, Gilman, & Tillitski, 1989).
These results seemed promising, but critics soon pounced on the design of the Project Prime Time study, decrying the fact that students had not been assigned to experimental and control groups on a random basis; pointing out that other changes in state school policy had also been adopted during the project; and suggesting that the state's teachers were motivated to make certain that small classes achieved better results because they knew how the trial program's results were supposed to come out. Indiana students probably did benefit from the project, but a persuasive case for small classes had not yet been made. A better experiment was needed.
Tennessee's Project STAR
Such an experiment shortly appeared in Tennessee's Project STAR (Student/Teacher Achievement Ratio), arguably the largest and best-designed field experiment ever undertaken in education (Finn & Achilles, 1990; Finn, Gerber, Achilles, & Boyd-Zaharias, 2001; Folger, 1989; Grissmer, 1999; Krueger, 1999, 2000; Krueger & Whitmore, 2001; Mosteller, 1995; Nye, Hedges, & Konstantopoulos, 1999).
In the mid-1980s, the Tennessee legislature funded a four-year study to compare the achievement of early-grade students assigned randomly to one of three conditions: standard classes (with one certificated teacher and more than 20 students); supplemented classes (with one teacher and a full-time, noncertificated teacher's aide); and small classes (with one teacher and about 15 students). The study began with students entering kindergarten in 1985 and called for each student to attend the same type of class for four years. To control variables, the study asked each participating school to sponsor all three types of classes and to assign students and teachers randomly to each type. Participating teachers received no prior training for the type of class they were to teach.
The project invited all the state's primary schools to be in the study, but each participating school had to agree to remain in the program for four years; to have the classrooms needed for the project; and to have at least 57 kindergarten students so that all three types of classes could be set up. Participating schools received no additional support other than funds to hire additional teachers and aides. These constraints meant that troubled schools, schools that disapproved of the study, and schools that were too small, crowded, or underfunded would not participate in the STAR program, so the sample for the first year involved “only” 79 schools, 328 classrooms, and about 6,300 students. Those schools came from all corners of the state, however, and represented urban, inner-city, suburban, and rural school districts. The sample population included majority students, a sizable number of African American students, and students receiving free school lunches.
At the beginning of each year of the study, the sample population changed somewhat. Some participating students had moved away, been required to repeat kindergarten, or left the study because of poor health. Other families moved into the districts served by STAR schools, however, and their children filled the vacant seats. Also, because attending kindergarten was not then mandatory in Tennessee, some new students entered the STAR program in the 1st grade.
In addition, some parents tried to move their children from one type of STAR class to another, but administrators allowed only a few students to move from a standard class to a supplemented class or vice versa. By the end of the study, then, some students had been exposed to a STAR class for four years, but others had spent a shorter time in such classes. These shifts might have biased STAR results, but Alan Krueger's careful analysis (1999) concluded that such bias was minimal.
Near the end of each year, STAR students took the Stanford Achievement Test battery and received separate scores for reading, word-study skills, and mathematics. Results from these tests were similar for students who were in the standard and supplemented classes, indicating that the presence of untrained aides in supplemented classes did not contribute to improving student achievement. Results for small classes were sharply different, however, with long-term exposure to small classes generating substantially higher levels of achievement and with gains becoming greater the longer that students were in small classes.
Figure 1 displays these two effects in reading achievement for average students. STAR investigators found that the students in small classes were 0.5 months ahead of the other students by the end of kindergarten, 1.9 months ahead at the end of 1st grade, 5.6 months ahead in 2nd grade, and 7.1 months ahead by the end of 3rd grade. The achievement advantages were smaller, although still impressive, for students who were only exposed to one, two, or three years of small classes. STAR investigators found similar (although not identical) results for word-study skills and mathematics.
Figure 1. Average Months of Grade-Equivalent Advantage in Reading Achievement Scores for Students in Small Classes
Small-class advantages appeared for all types of students participating in the study. The gains were similar for boys and girls, but they were greater for impoverished students, African American students, and students from inner-city schools—groups that are traditionally disadvantaged in education.
These initial STAR findings were impressive, but would students who had been exposed to small classes in the early grades retain their extra gains when they entered standard size classes in 4th grade? To answer this question, the Tennessee legislature authorized a second study to examine STAR student outcomes during subsequent years of schooling.
At the end of each year, until they were in the 12th grade in 1997–1998, these students took the Comprehensive Tests of Basic Skills and received scores in reading, mathematics, science, and social science. The results showed that average students who had attended small classes were months ahead of those from standard classes for each topic assessed at each grade level. Figure 2 displays results from some of these tests, showing, for example, that when typical students who had attended small classes in the early grades reached grade 8, they were 4.1 months ahead in reading, 3.4 months ahead in mathematics, 4.3 months ahead in science, and 4.8 months ahead in social science.
Figure 2. Average Months of Grade-Equivalent Advantage in Achievement Scores for Students Who Experienced One or More Years of Small Classes
Students who had attended small classes also enjoyed other advantages in the upper grades. They earned better grades on average, and fewer dropped out or had to repeat a year. And when they reached high school, more small class students opted to learn foreign languages, study advanced-level courses, and take the ACT and SAT college entrance examinations. More graduated from high school and were in the top 25 percent of their classes. Moreover, initial published results suggest that these upper-grade effects were again larger for students who are traditionally disadvantaged in education.
Figure 3 illustrates the percentages of students who opted to take the ACT or SAT exams as high school seniors. Roughly 44 percent of those from small classes took one or both of these tests, whereas only 40 percent of those from standard classes did so. The difference, however, was far greater for African American students. Instruction in small classes during the early grades had eliminated more than half of the traditional disadvantages that African American students have displayed in participation rates in the ACT and SAT testing programs.
Figure 3. Percentage of Students Who Took the ACT or SAT College Entrance Exam by Early-Grade Class Type
Taken together, findings from the STAR project have been impressive, but they are not necessarily definitive. The STAR student sample did not quite match the U.S. population, for example, because very few Hispanic, Native American, and immigrant (non-English-speaking) families were living in Tennessee in the mid-1980s. Also, news about the greater achievement gains of small classes leaked out early during the STAR project, and one wonders how this may have affected participating teachers and why parents whose children were in other types of classes did not then demand that their children be reassigned to small classes. Finally, the STAR schools had volunteered to participate, suggesting that the teachers and principals in those schools may have had strong interests in trying innovative ideas. Questions such as these should not cause us to reject the findings from the STAR project, but we should keep in mind that this was a single study and that, as always, other evidence is needed to increase certainty about class size effects.
Wisconsin's SAGE Program
Findings from Project STAR have prompted class size reduction efforts in other states. One type of effort focuses on increasing the number of small, early-grade classes in schools in disadvantaged neighborhoods. STAR investigators supervised such a program in Tennessee in 1989, reducing K–3 class sizes in 17 school districts where the average family income was low. The results of this and similar projects in North Carolina, Michigan, Nevada, and New York have confirmed that students from small classes generate higher achievement scores when compared with their previous performance and with those of students in other schools. Most of these projects, however, have been small in scope.
A much larger project focused on the needs of disadvantaged students is Wisconsin's Student Achievement Guarantee in Education (SAGE) Program (Molnar et al., 1999, 2000; Zahorik, 1999). Led by Alex Molnar, this program began as a five-year pilot project for K–3 classes in school districts where at least 50 percent of students were living below the poverty level. The program invited all schools in these districts to apply for the program, but it was able to fund only a few of these schools, and no additional schools were to be added during the pilot project. Schools received an additional $2,000 for each low-income student enrolled in SAGE classrooms. All school districts that applied were allowed to enter the program, and 30 schools in 21 districts began the program at the K–1 grade levels in 1996, with 2nd grade added in 1997 and 3rd grade in 1998.
The SAGE program's major intervention was to reduce the average K–3 class size to 15 students for each teacher. To assess outcomes of the program, researchers compared results from small class SAGE schools with results from standard class size schools in the same districts having similar K–3 enrollments, racial compositions, average family incomes, and prior records of achievement in reading. Findings so far have indicated larger gains for students from small classes—in achievement scores for language arts, reading, and mathematics—that are roughly comparable to those from Project STAR. In addition, as with Project STAR, African American students have made relatively larger gains.
Like Project STAR, the SAGE program studied schools that had volunteered for the program and provided them with sufficient funds to hire additional teachers. The SAGE program, however, involved more Hispanic, Asian, and Native American students than had the STAR project.
After the announcement of findings from the initial effort, the Wisconsin legislature extended the SAGE program to other primary schools in the state. Therefore, what began as a small trial project has now blossomed into a statewide program that makes small classes in the early grades available for schools serving needy students.
The California Class Size Reduction Program
In 1996, California began a class size reduction program that has been far more controversial than such programs elsewhere. In earlier years, California had experienced many social problems, and major measures of achievement ranked California schools last in the United States. That year, however, a fiscal windfall became available, and then-governor Pete Wilson announced that primary schools would receive $650 annually for each student (an amount later increased to $800) if they would agree to reduce class sizes in the early grades from the statewide average of more than 28 students to not more than 20 students in each class (Hymon, 1997; Korostoff, 1998; Stecher, Bohrnstedt, Kirst, McRobbie, & Williams, 2001).
Several problems quickly surfaced. First, the California definition of a small class was larger than the size recommended in other studies. In fact, the size of small classes in California matched the size of standard classes in some other states. On the other hand, some California schools had been coping with 30–40 students in each classroom in the early grades, so a reduction to 20 students constituted an improvement.
The second problem was that the program's per-student funding was inadequate. Contrast the SAGE program's additional $2,000 for each low-income student with the $650 or $800 for each student offered by California. Nevertheless, the lure of additional funding proved seductive, and most California school districts applied to participate. This inadequate funding imposed serious consequences on poorer school districts, which had to abolish other needed activities to afford hiring teachers for smaller classes. In effect, then, the program created rather than solved problems for underfunded school districts.
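The funding squeeze can be made concrete with a rough calculation. The sketch below assumes a hypothetical school with 560 early-grade students, classes of exactly 28 before the program and exactly 20 after it, and state funds paid for every one of those students. None of these specifics come from the program's records, and the cost of a teacher and a classroom is deliberately left out because the article does not give it.

```python
import math

def added_classes(students, old_size, new_size):
    """Extra classes needed when the maximum class size drops."""
    return math.ceil(students / new_size) - math.ceil(students / old_size)

def funding_per_added_class(students, per_student, old_size, new_size):
    """State money available for each extra class the school must create."""
    return students * per_student / added_classes(students, old_size, new_size)

# Hypothetical school; enrollment and exact class sizes are assumptions.
students = 560
extra = added_classes(students, old_size=28, new_size=20)
at_650 = funding_per_added_class(students, 650, 28, 20)
at_800 = funding_per_added_class(students, 800, 28, 20)

print("extra classes needed:", extra)
print("funding per extra class at $650:", at_650)
print("funding per extra class at $800:", at_800)
```

Under these assumptions each added class brings in $45,500 at the original rate and $56,000 at the later one, and that sum has to cover a teacher's salary and benefits, a room, and materials. Districts where those costs ran higher would have had to make up the difference from other parts of their budgets.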
In addition, when the California program began, many of its primary schools were overcrowded, and the state was suffering from a shortage of well-trained, certificated teachers. To cope with the lack of space, some schools created spaces for smaller classes by cannibalizing other needed facilities such as special education quarters, child care centers, music and art rooms, computer laboratories, libraries, gymnasiums, or teachers' lounges. Other schools had to tap into their operating budgets to buy portable classrooms, resulting in delays in paying for badly needed curricular materials or repairs for deteriorating school buildings. And to staff their smaller classes, many schools had to hire teachers without certification or prior training.
So far, results from the California program have been only modest. Informal evidence suggests that most students, parents, and teachers are pleased with their schools' smaller classes. And comparisons between the measured achievements of 3rd grade students from districts that did and did not participate in the early phases of the program have indicated minor advantages for California's smaller classes. These effects, however, have been smaller than those reported for the STAR and SAGE programs.
In many ways, the California initiative has provided a near-textbook case of how a state should not reduce class size. After failing to conduct a trial program, California adopted an inadequate definition of class size, committed insufficient funds to the initiative, and ignored serious problems of overcrowding and teacher shortages. This example should remind us that small classes are not a panacea for education. To be effective, programs for reducing class size need careful planning and consideration of the needs and strengths of existing school systems.
What We Now Know About Small Classes
When planned thoughtfully and funded adequately, small classes in the early grades generate substantial gains for students, and those extra gains are greater the longer students are exposed to those classes.
Extra gains from small classes in the early grades are larger when the class has fewer than 20 students.
Extra gains from small classes in the early grades occur in a variety of academic disciplines and for both traditional measures of student achievement and other indicators of student success.
Students whose classes are small in the early grades retain their gains in standard size classrooms and in the upper grades, middle school, and high school.
All types of students gain from small classes in the early grades, but gains are greater for students who have traditionally been disadvantaged in education.
Initial results indicate that students who have traditionally been disadvantaged in education carry greater small-class, early-grade gains forward into the upper grades and beyond.
The extra gains associated with small classes in the early grades seem to apply equally to boys and girls.
Evidence for the possible advantages of small classes in the upper grades and high school is inconclusive.
Tentative Theories
Why should reducing class size have such impressive effects in the early grades? Theories about this phenomenon have fallen largely into two camps.
Most theorists focus on the teacher, reasoning that small classes work their magic because the small class context improves interactions between the teacher and individual students. In the early grades, students first learn the rules of standard classroom culture and form ideas about whether they can cope with education. Many students have difficulty with these tasks, and interactions with a teacher on a one-to-one basis—a process more likely to take place when the class is small—help the students cope. In addition, teachers in small classes have higher morale, which enables them to provide a more supportive environment for initial student learning. Learning how to cope well with school is crucial to success in education, and those students who solve this task when young will thereafter carry broad advantages—more effective habits and positive self-concepts—that serve them well in later years of education and work.
The need to master this task confronts all students, but doing so is often a more daunting challenge for students who come from impoverished homes, ethnic groups that have suffered from discrimination or are unfamiliar with U.S. classroom culture, or urban communities where home and community problems interfere with education. Thus, students from such backgrounds have traditionally had more difficulty coping with classroom education, and they are more likely to be helped by a reduction in class size.
This theory also helps explain why reductions in class size in the upper grades may not generate significant advantages. Older students normally have learned to cope with standard classrooms and have developed either effective or ineffective attitudes concerning academic subjects—and these attitudes are not likely to change just because of a reduction in class size.
The theory also suggests a caution. Students are likely to learn more and develop better attitudes toward education if they are exposed to well-trained and enthusiastic teachers, appropriate and challenging curriculums, and physical environments in their classrooms and schools that support learning. If conditions such as these are not also present, then reducing class size in the early grades will presumably have little impact. Thus, when planning programs for reducing class size, we should also think about the professional development of the teachers who will participate in them and the educational and physical contexts in which those programs will be placed.
A second group of theories designed to account for class size effects focuses on the classroom environment and student conduct rather than on the teacher. We know that discipline and classroom management problems interfere with subject-matter instruction. Theories in this group argue that these problems are less evident in small classes and that students in small classes are more likely to be engaged in learning. Moreover, teacher stress is reduced in small classes, so teachers in the small class context can provide more support for student learning. Studies have also found that small instructional groups can provide an environment for learning that is quite different from that of the large classroom. Small instructional groups can create supportive contexts where learning is less competitive and students are encouraged to form supportive relationships with one another.
Theories such as these suggest that the small class environment is structurally different from that of the large class. Less time is spent on management and more time is spent on instruction, students participate at higher levels, teachers are able to provide more support for learning, and students have more positive relationships. Such processes should lead both to greater subject-matter learning and to more positive attitudes about education among students, with more substantial effects in the early grades and for those groups that are traditionally disadvantaged in education.
These two theories are not mutually exclusive. On the contrary, both may provide partial insights into what happens in small classes and why small class environments help so many students. Collecting other types of evidence to assess such theories directly would be useful, particularly observational studies that compare the details of interaction in early-grade classes of various sizes and surveys of the attitudes and self-concepts of students who have been exposed to classes of different sizes. Unfortunately, good studies of these effects have been hard to find.
Policy Implications and Actions
Given the strength of findings from research on small classes, why haven't those findings provoked more reform efforts? Although many state legislatures have debated or begun reform initiatives related to class size, most primary schools in the United States today do not operate under policies that mandate small classes for early grades. Why not?
This lack of attention has several causes, among them ignorance about the issue, confusion about the results of class size research and ineffective dissemination of those results, prejudices against poor and minority students, the politicizing of debates about class size effects and their implications, and practical problems associated with adopting small classes.
Recent debates about class size have become quite partisan in the United States, with Democrats generally favoring class size reductions and Republicans remaining hostile to them. Responding to President Bill Clinton's 1998 State of the Union address, the U.S. Congress set up a modest program, aimed at urban school districts with high concentrations of poverty, which provided funds for hiring additional teachers during the 1999 and 2000 fiscal years. This program enabled some districts to reduce class sizes in the early grades, and informal results from those cities indicated gains in student achievement.
Republicans have been lukewarm about extending this program—some apparently believing that it is ineffective or is merely a scheme to enhance the coffers of teachers' unions—and have welcomed President George W. Bush's call for an alternative federal program focused on high-stakes achievement tests and using results from those tests to apply sanctions to schools if they do not perform adequately.
The major problems standing in the way of reducing class sizes, however, are often practical ones. In many cases, cutting class sizes means hiring more teachers. With the looming shortage of qualified teachers, recruiting more teachers may be even more difficult than finding the funds to pay their salaries. Further, many schools would have to find or create extra rooms to house the additional classes created by small class programs, which would require either modifying school buildings or acquiring temporary classroom structures.
In many cases, meeting such needs would mean increasing the size of public school budgets, a step abhorred by fiscal conservatives and those who are critical of public education. The latter have argued that other reforms would cost less and be more effective than reducing class sizes. In response to such claims, various studies have estimated the costs of class size reduction programs or compared their estimated costs with those of other proposed reforms. Unfortunately, studies of this type must make questionable assumptions, so the results of their efforts have not been persuasive.
Nevertheless, reducing the size of classes for students in the early grades often requires additional funds. All students would reap sizable education benefits and long-lasting advantages, however, and students from educationally disadvantaged groups would benefit even more. Indeed, if we are to judge by available evidence, no other education reform has yet been studied that would provide such striking benefits. Debates about reducing class sizes, then, are disputes about values. If citizens are truly committed to providing a quality public education and a level playing field for all students regardless of background, they will find the funds needed to reduce class size.
In Pursuit of Better Schools: What Research Says
Educational Leadership is pleased to publish the first in a series of research reports. This article is condensed from “Small Classes and Their Effects,” a major research synthesis that appears as part of a series supported by the Rockefeller Foundation—In Pursuit of Better Schools: What Research Says. The Rockefeller Foundation supports research on major issues facing education today.
Further information about the series and a longer, downloadable version of this research synthesis may be found at http://edpolicyreports.org in early February. Look for more of this series in upcoming issues of Educational Leadership.