Even in the midst of constant debate about the state of U.S. education and conflicting opinions regarding the value of No Child Left Behind (NCLB), most educators agree that the primary criterion of school success is the ongoing growth and achievement of every student. Well-intentioned though NCLB may be, certain aspects of the law actually deter schools from fostering academic growth for some students.
NCLB's adequate yearly progress (AYP) model, for example, identifies schools as successful or unsuccessful on the basis of the percentage of students in each grade who have attained the minimum “proficiency” level. This standards-based approach causes two problems.
First, the process used to measure AYP ignores the real progress of large percentages of students. For example, a student who begins the year far below the proficiency level but progresses through the year to only slightly below that level will be lumped into the “non-proficient” group, even though that student's dramatic growth actually reflects school success.
Second, AYP requirements skew school improvement efforts by focusing them on a narrow group of students. Because the current AYP model considers only the number of students labeled “proficient,” it pressures schools to put more resources into supporting the students most likely to attain that rating—in other words, students just below the proficiency level. Other students—those who are advanced and those who have fallen far behind—may receive much less attention. Thus, even schools that successfully meet AYP requirements may have sizeable populations of bright students who are insufficiently challenged and at-risk students who are allowed to languish at the bottom.
Given the inherent weaknesses of NCLB, it is imperative that schools adopt a more stringent set of principles, holding themselves accountable for the strong growth of all students and the accelerated growth of students who are performing below standards. To achieve this goal, schools need much more useful assessments than those that the states now provide.
What should today's assessments look like? First, they need to measure and report individual student achievement growth, especially as it relates to state content and performance standards. In addition, these assessments should provide data that inform instruction and identify needed curriculum adjustments. And finally, they must deliver results that can lead to action quickly, establishing growth targets for each student and providing data that teachers can use to evaluate their own effectiveness, both with individuals and with groups of students.
“We need a new assessment tool that can provide growth data and augment other assessments already in use,” says James Bach, principal of Minnesota's Chaska Middle School East. Bach says that his teachers want a way to balance the accountability emphasis of the Minnesota state test with a tool that charts individual growth for each student at his or her level of achievement.
Traditional Assessment Tools
Most of the traditional assessments employed by schools, although somewhat useful, do not provide the data needed to accurately assess individual student progress. Linda Clark, superintendent of Meridian School District in Idaho, asserts that “traditional models of testing do their job well—ranking and sorting kids. But how many times during a student's career does he or she need to be ranked and sorted? What's most important is that such tests do not inform classroom instruction.”
Standardized norm-referenced tests, the assessment tool used by most schools, provide useful group data but seldom measure individual student growth adequately. Although the results show how students perform against a national sample, such results can be misleading. When underachievers are rated against one another, for example, their apparent growth may actually reflect unsatisfactory progress; such students may remain far below proficiency levels. Without a true picture of how each student is progressing or failing to progress over time, a school cannot fine-tune its programs or refocus instruction to better respond to the needs of every student.
Criterion-based tests measure how well a student has progressed in specific areas of knowledge and skills toward state performance standards. Although such tests can provide a degree of instructional focus, they are seldom appropriately challenging for every student, and they often offer a limited measurement of growth. Imagine a high-achieving student who scores 100 percent on a criterion-based test. This high score shows that the student has mastered the skills represented on the test, but it does not show how much more the student knows; thus, the test does nothing to enhance the teacher's ability to provide rich enough materials and curriculum content to allow that student to continue to grow.
In contrast, imagine an underachieving student who scores at the bottom of the scale on the criterion-based test. The test simply shows that the student has failed to master the skills on the test, but it does not indicate where the student's skill level actually falls.
Turning to Technology
Computerized assessment can provide the timeliness and cost-effectiveness needed to meet some of the ambitious demands of today's schools. Computerized adaptive testing (CAT) adds another component, enhancing the ability of computerized assessment to quickly and accurately garner information about student achievement. CAT can measure proficiency and growth in specific subjects by custom-adjusting the difficulty of questions as a student takes the test, shaping the assessment to reflect that student's performance and capabilities.
In adaptive testing, if a student answers a question correctly, the subsequent question increases in difficulty; if the student answers the question incorrectly, the next item is easier. For example, Item 1 may show a circle with a portion shaded and ask the student what decimal, to the tenths place, would be used to represent the shaded area. If the student fails to select the correct answer, the computer presents as Item 2 an easier question on decimals: for instance, identifying the greater or lesser of two decimals. But if the student answers the first item correctly, the computer presents as Item 2 a more difficult question: for example, choosing a decimal that represents a shaded region of a circle to the hundredths place, or choosing a fraction or mixed number that represents a terminating decimal.
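The up-and-down rule just described can be sketched in a few lines. This is only a toy illustration of the step logic; production CAT engines select items using item response theory, and the difficulty scale and function below are invented for the example.

```python
def adaptive_difficulties(responses, start=5, lo=1, hi=10):
    """Toy CAT item selection: after a correct answer the next item is one
    step harder; after an incorrect answer, one step easier.
    Returns the difficulty level presented for each item."""
    difficulty = start
    presented = []
    for correct in responses:
        presented.append(difficulty)
        difficulty = min(difficulty + 1, hi) if correct else max(difficulty - 1, lo)
    return presented

# A student who misses the first item, then answers two correctly,
# sees the test step down and then back up:
print(adaptive_difficulties([False, True, True]))  # [5, 4, 5]
```

Because each response nudges the next item's difficulty, the presented difficulties converge toward the level at which the student answers about half the items correctly.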
Increasing Student Engagement
Computerized adaptive testing enhances student engagement by alleviating the boredom that high achievers experience when tests are too easy, as well as the frustration that low-achieving students feel when tests are too difficult.
Let's look at Martin, a 6th grade student who incorrectly answers an item. The next question the computer presents is slightly easier than the first. He understands the question better, but again responds incorrectly. The questions progressively get easier with each incorrect answer. Martin is encouraged because he can better read and understand the questions, and he remains engaged. By the sixth question, Martin answers correctly. The seventh question is a little more difficult, and Martin answers incorrectly. As the computer adjusts to Martin's ability level, it presents the eighth question, which Martin answers correctly. Within only a few items, the test has adapted to Martin's ability. Because the test only presents questions that Martin can reasonably attempt to answer, the results are an excellent indicator of the skills and concepts that will appropriately challenge Martin.
Reflecting Individual Student Growth
The results of computerized adaptive testing provide a growth measure of individual achievement and also show where each student ranks relative to others and relative to a state's proficiency standard.
For example, in the spring of her 5th grade year, Freda achieved a score of 225 on the RIT scale, an equal-interval or vertical measurement scale that measures academic growth like a yardstick measures height. As a 4th grade student the previous spring, she scored 210. The RIT scale enables the school to determine that Freda grew 15 points during the past year. Because this school's district also knows that the scores of 4th graders who started in the 210–220 range in mathematics grew an average of 8 points from spring to spring, Freda's 15-point gain exemplifies strong performance.
We can also look at Freda's growth in the context of state standards to determine whether her progress is on track to meet AYP, graduation, or other requirements. For example, if Freda lived in Iowa, she would need to attain a RIT score of 218 in math by grade 6 to be on target to reach a proficiency score of 247 by grade 10. By monitoring Freda's progress each year, her teachers can determine whether she is meeting her growth targets.
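On an equal-interval scale, growth and on-track checks reduce to simple arithmetic. The sketch below uses Freda's numbers from the text; spreading the remaining growth evenly across the remaining years is a simplifying assumption for illustration, whereas actual targets (such as the 218-by-grade-6 benchmark) come from published state growth norms.

```python
def rit_growth(prior, current):
    """On the equal-interval RIT scale, growth is a plain difference."""
    return current - prior

def yearly_targets(score, grade, goal_score, goal_grade):
    """Spread the remaining growth evenly over the remaining years
    (an illustrative simplification, not a state's actual norm table)."""
    years = goal_grade - grade
    step = (goal_score - score) / years
    return {grade + i: round(score + i * step, 1) for i in range(1, years + 1)}

print(rit_growth(210, 225))             # Freda's gain: 15 points
print(yearly_targets(225, 5, 247, 10))  # intermediate targets toward grade 10
```

A teacher comparing each spring score against the target for that grade can see immediately whether a student is on pace, ahead, or falling behind.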
One District's Use of CAT
Many districts find that the rich, individualized data they get from CAT make it worth adding to their testing portfolio. The Stillwater Area School District in Minnesota uses CAT to correlate student test scores and growth requirements with each student's learning needs. Because results are available almost immediately, teachers can see which skills each student has mastered and which skills he or she needs to work on next. The district uses the Northwest Evaluation Association's CAT-based test, called Measures of Academic Progress (MAP), and receives reports following each administration of the test that identify each student's achievement for specific skill areas. Let's imagine that data on word analysis and vocabulary development indicate that one 6th grade student, Kristen, has a score of 185 on the RIT scale. Because the data representing each area of learning are mapped to the scale, a single report shows the teacher the skills and concepts that Kristen has mastered and those she needs to develop, as well as where those skills lie within a continuum of learning.
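The idea of mapping each area of learning onto the score scale can be sketched as a lookup over a learning continuum. The RIT bands and skill names below are invented for illustration; an actual MAP report draws on NWEA's own continuum data.

```python
# Hypothetical RIT bands for one goal area (all values illustrative).
CONTINUUM = [
    (171, 180, ["decode multisyllabic words"]),
    (181, 190, ["use context clues to determine word meaning"]),
    (191, 200, ["analyze word roots and affixes"]),
]

def continuum_report(rit_score):
    """Sort continuum skills into those likely mastered, those ready to
    develop now, and those not yet introduced, by where the score falls."""
    mastered, develop, ahead = [], [], []
    for low, high, skills in CONTINUUM:
        if high < rit_score:
            mastered.extend(skills)
        elif low <= rit_score <= high:
            develop.extend(skills)
        else:
            ahead.extend(skills)
    return mastered, develop, ahead

# A score of 185, like Kristen's, falls in the middle band:
mastered, develop, ahead = continuum_report(185)
```

A single report built this way shows a teacher not just a number, but where the student sits on the continuum and which skills come next.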
Greg, another 6th grader, struggled with traditional tests that gave him questions deemed appropriate for most of his peers. As an underperformer, Greg often could not comprehend the vocabulary in the first few questions on a traditional test. He quickly became frustrated and did not put forth his best effort. Sometimes he spent the remainder of the test guessing answers without even attempting to read the questions. The results from this traditional test did not provide any information that his teacher could use to help Greg. They only indicated that he was performing below the 6th grade level, something his teacher already knew.
The more comprehensive data provided by CAT indicate that Greg's word analysis and vocabulary development are very low but that his achievement in literary analysis is slightly better. By looking at Greg's skills and scores in the context of all the skills mapped out, his teacher can see which skills and concepts are within Greg's ability range and which he should learn next to be progressively challenged and continue to grow. Greg's teacher uses this information to create different instructional tracks for her class, one for those who are on grade level and others for those who are above or below grade level.
Stillwater also uses MAP data to refocus curriculum and instruction as needed. Case in point: Two years ago, Stillwater discovered through review of its test data that far more 7th grade students were academically ready for algebra classes than school resources were able to handle. As a result, the math department restructured its staff and added more algebra classes. Only 21 percent of 7th grade students were taking algebra two years ago; today, 41 percent of students are enrolled.
How CAT and NCLB Can Work Together
NCLB regulations demand that tests measure grade-level progress, comparing the scores of students in a particular grade with the scores of students in that same grade the previous year. But districts and states are finding that they can include growth data obtained through CAT and still satisfy NCLB requirements.
The Idaho Standards Achievement Test, for example, uses a blended approach with a system developed by Northwest Evaluation Association. This system augments the individualized adaptive features of CAT with a core of items aligned with both the content and the achievement standards for the grade in which the students are enrolled. Because the test is based on the equal-interval RIT scale, it not only measures achievement status and academic growth but also shows progress toward state standards.
In a single testing session, the Idaho test first presents students with grade-level items that meet NCLB requirements. It then seamlessly shifts into the adaptive portion, which provides the wealth of data that teachers use in the classroom and administrators use for analyzing needs and portioning out resources. Anecdotally, educators have observed that students attend to the second portion of the test better.
Although districts in other states use CAT to supplement their state tests, Idaho is the only state that currently offers this blended approach. Teachers have warmly received the state test because the data provide them with immediate results, are specific to the students currently in their classrooms (rather than students from the past year), and save them preparation time by outlining specific skill sets for which students need support.
Targeted Professional Development
Besides providing the data that schools need to adjust curriculum, computerized adaptive testing can provide information about the instructional skills and training that teachers must acquire to address areas of weakness, boost student achievement, and improve school performance.
In Minnesota's Chaska Middle School East, teachers studied assessment data to help them refocus their professional development plans and teaching strategies to address areas in which student achievement was lagging. For example, initial test data showed a need for improvement in inferential reading comprehension. For Chaska's principal, James Bach, the response was obvious: The school needed to design professional development to help teachers do a better job of teaching this skill. Actually implementing that type of program proved difficult at first, however. The data that Chaska teachers had to work with at that point—garnered from four different tests—were not complete or timely enough, nor did they systematically track student growth. The call among staff for improved quantity and quality of data led to an equally adamant request for improvements in testing.
Chaska educators found that computerized adaptive testing met all their criteria. Because CAT is tailored to the achievement level of each student, teachers can better assess the needs of all their students, from gifted to learning-disabled, and adapt instructional strategies on an individual basis. Because the system measures academic growth and achievement, teachers now have the data they need to more effectively identify gaps in professional development. In response to recent test results, teachers developed a plan encompassing three different objectives: integrating a new reading strategy called “preview and highlight” in all classes; learning how to teach critical reading skills in all curriculum areas; and learning a note-taking strategy that they could use to assist students in social studies.
A Collaborative Effort
Idaho's Meridian School District uses the quality growth data provided by computerized adaptive testing to foster a more collaborative teaching environment. Teachers share strategies based on data and work together to regroup students according to their instructional needs. The resulting change in culture has led to across-the-board school improvement. Teachers regularly converse about what works, what doesn't, and what resources and materials each has successfully used to further students' progress. They also work together to realign how they teach a subject, selecting specific skills as goals that students must master before they move to the next level. Rather than following the traditional class model—one teacher responsible for 30 or so students who range across the performance spectrum—Meridian maintains a less rigid grouping model that fosters learning and significant growth for kids of all capabilities. As superintendent Linda Clark comments, “The old models of instruction just don't work for all kids. To get students to their specific growth targets, you have to deliver instruction differently, depending on where they fall. Usually, the groups that aren't growing are in the top half of the classroom since instruction has traditionally been delivered to the lowest common denominator. We used to say that very bright students will advance on their own, but we now know that students don't grow unless they're introduced to new materials.”
The verdict of Meridian teachers on this more collaborative, focused teaching style based on data garnered from computerized adaptive testing is overwhelmingly positive, according to Clark: “Our teachers report they now have the kind of power they never felt with traditional testing. We have the process we need and good solid data, so we can see if what we're doing is making a difference. We know where each child starts out, and we know what to do to move each one ahead.”
Such data, showing individual student growth, are more valuable than data that merely tell us whether students have exceeded an arbitrary “proficiency” level. When individual students achieve, that is the best indicator of school improvement—one student at a time.