Winter 2005
The Assessment agenda
Bell curves are for the birds
Standards cannot stand alone. DOUGLAS REEVES pinpoints flaws in traditional tests and identifies some essential features of effective assessment practice in a standards-based environment.
AFTER TRAVELLING TWO MILLION MILES over the past ten years helping to implement standards in schools, I’ve learnt a great deal about how school reform does and does not work. To summarise these lessons in four words: standards are not enough. If our goals are to improve student achievement and educational equity, standards are a necessary but insufficient part of the equation. The mere announcement of standards is a prescription for frustration unless we accompany those standards with formative standards-based classroom assessments that give meaningful and frequent feedback to students and teachers.
The right assessments for the right purpose
While many school systems have either implemented academic content standards or are in the process of developing them, the link between the promise of standards and the reality of their implementation is a tenuous one. Schools that adopt new standards but retain old assessments should not be surprised that the test content will drive educational practice. The most perverse situation occurs when a school claims to be ‘standards-based’ but uses a norm-referenced assessment. The latter does not compare student performance to an objective standard, but only to the performance of other students. Thus we have the spectacle of some students failing to meet standards but labelled ‘proficient’ because they have outscored their peers; and other students who met the standards but are labelled ‘deficient’ because they were outperformed by their peers. Unless standards are linked to assessments, the standards become little more than a political slogan full of good, but empty, intentions.
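The mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut score of 70, the two cohorts and the use of the cohort median as the norm-referenced dividing line are all hypothetical and are not drawn from any real test.

```python
from statistics import median

# Hypothetical cut score that represents "meeting the standard".
CUT_SCORE = 70


def norm_referenced_label(score, cohort_scores):
    """Label a student only by comparison with peers (at or above the median)."""
    return "proficient" if score >= median(cohort_scores) else "deficient"


def standards_based_label(score, cut_score=CUT_SCORE):
    """Label a student by comparison with a fixed, objective standard."""
    return "proficient" if score >= cut_score else "deficient"


# Hypothetical cohorts: one low-scoring, one high-scoring.
weak_cohort = [40, 45, 50, 55, 60]
strong_cohort = [75, 80, 85, 90, 95]

# A student scoring 60 outscores most of a weak cohort but misses the standard.
print(norm_referenced_label(60, weak_cohort), standards_based_label(60))

# A student scoring 75 meets the standard but trails a strong cohort.
print(norm_referenced_label(75, strong_cohort), standards_based_label(75))
```

In this sketch the same student can receive opposite labels depending only on who else sat the test, which is the situation the paragraph above warns against.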
Even if a school provides annual standards-based tests, it remains insufficient for successful standards-based reform. The key is consistent formative classroom assessment, used not for the purpose of evaluating students, but for the purpose of improving student performance. While the end-of-year exam is the educational autopsy, the formative classroom assessment is a physical. Autopsies may be of some interest to physicians, but these procedures are singularly unhelpful for the patient. Formative assessment, by contrast, provides immediate feedback to students and teachers, allowing for midcourse corrections to be made throughout the year. When assessments are provided on a consistent basis, teachers save classroom time by focusing their instruction on the parts of the curriculum that students need the most, rather than merely engaging in ‘coverage’—a procedure that frequently omits vital content and, in a ritual that bores students throughout the world, covers content that the students have already mastered.
Essentially, we must decide the fundamental purpose of assessment. If the purpose is to rate and rank students with little relationship to what they learnt in school, then norm-referenced tests will do the job. If the purpose is to conduct a summative evaluation of learning, then end-of-year standards-based assessments will do. But, if the purpose of assessment is, as I believe it must be, the improvement of teaching and learning, then classroom formative assessments based on standards are essential. Rather than an event at the end of the year, these assessments provide constructive feedback for teachers, students, parents and school leaders throughout the year.
From the bell to the mountain
Traditional tests are associated with average scores or norms, hence the ‘normal curve’, commonly called the bell curve (fig 1). Norm-referenced assessments are predicated on the notion that the objective of every classroom teacher is to beat teachers elsewhere, and the objective of every student is not to cooperate, but to compete with others. This impulse to be above average is oddly fostered by the test companies themselves, who are able to report data in such a way that nearly every district and group system can claim to be above average, something most people know is an impossibility. Worse yet, the drive to be above average takes precedence over the demand for knowledge. There are functionally illiterate students who can answer a sufficient quantity of multiple-choice questions to be in the middle band of national test results. It should be worrisome that professional educators, not to mention parents and policymakers, could be satisfied with such an inadequate level of performance. Nonetheless, the traditional approach to assessment takes comfort not in achievement, but in the average.
Traditional assessments seek to discriminate among different students. The ideal test item is one that a substantial number of students will get wrong. These are regarded in the test industry as good discriminators, not because they discriminate in a sense of racial or cultural bias, but because they distinguish one student from another. A test item that is answered correctly by every single student is regarded as a poor discriminator, even though such a wonderful performance was due to the hard work of teachers and students. If a test item fails to differentiate among students, it serves no statistical purpose. Of course, the practical result of this process is to systematically discourage students and teachers, a result apparently lost on traditional assessment advocates. Consider this: if an extraordinarily high percentage of teachers and students work diligently to learn the Pythagorean theorem and, when presented with a test item, carefully display their learning and answer the question correctly, the result in a norm-based system is not celebration, but the determination that the test item must be discarded. By such logic, the driving test would be radically different if traffic safety officials discarded their inspection of a prospective driver’s use of the brake and steering wheel because ‘too many’ driving students got it right.

Standards-based assessments are designed so that a large number of students can achieve proficiency. When a large number of students fail to succeed at a particular challenge on a standards-based assessment, that item is not regarded as a ‘good discriminator’. Rather, it is a signal that more work must be done in the classroom. Standards-based classrooms are built on a philosophical foundation that every child can learn, rather than the philosophy that every child has a fixed place on the bell curve from which movement is unlikely to occur. Visually, performance in the standards-based environment is more like the mountain-shaped curve, the right-most curve in figure 2 that displays the performance progression ‘from the bell to the mountain’. The distribution of student performance in the mountain curve shows that while there are differences among students, those differences need not be failure, as was the case in figure 1. Rather, differences within the mountain curve take place within a narrowed zone of success, not the chasm that separates those who solved the norm-referenced testing puzzle and those who did not.
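The two readings of the same test item can be made concrete with a small sketch. It rests on assumptions: responses are coded 1 (correct) or 0 (incorrect), the spread of an item is measured by the usual variance of a right/wrong item, p(1 - p), and the two thresholds (0.05 and 0.8) are hypothetical values chosen only for illustration, not figures used by any test publisher.

```python
def proportion_correct(responses):
    """Share of students who answered the item correctly (responses are 0 or 1)."""
    return sum(responses) / len(responses)


def item_variance(responses):
    """Variance of a right/wrong item: p * (1 - p). Zero when everyone agrees."""
    p = proportion_correct(responses)
    return p * (1 - p)


def norm_referenced_view(responses, min_variance=0.05):
    """Hypothetical rule: an item that does not spread students out is discarded."""
    return "keep" if item_variance(responses) >= min_variance else "discard"


def standards_based_view(responses, mastery_threshold=0.8):
    """Hypothetical rule: a low success rate signals more classroom work."""
    if proportion_correct(responses) >= mastery_threshold:
        return "mastered"
    return "reteach"


everyone_correct = [1] * 10   # e.g. the whole class has learnt the theorem
half_correct = [1, 0] * 5     # half the class answers correctly

print(norm_referenced_view(everyone_correct), standards_based_view(everyone_correct))
print(norm_referenced_view(half_correct), standards_based_view(half_correct))
```

Under these assumptions the item that every student answers correctly is thrown out in the norm-referenced reading and treated as evidence of mastery in the standards-based one, while the item that splits the class is prized in the first reading and treated as a prompt for further teaching in the second.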

Proficiency, not guessing
Standards-based assessments involve a demonstration of proficiency, not a guess on a multiple-choice test. While norm-referenced tests are overwhelmingly in multiple-choice format, an assessment designed to measure the degree to which a student meets an objective standard will challenge the student to think, reason, analyse, communicate, write and demonstrate an understanding of learning. When discussing this subject before seminars and other audiences, I will usually issue the challenge, ‘Will those who have never guessed on a multiple-choice test please stand up?’ In thousands of seminars and speeches around the world, I have never once had a participant rise. Critics of standards-based performance assessments believe that multiple-choice tests are inherently more rigorous and objective than standards-based performance assessments. On the contrary, when students have a choice of A, B, C, or D as a response, there is a 25% chance that they can guess correctly, thereby feigning proficiency on that test item when they are clearly not proficient. In a performance assessment, by contrast, students are able to demonstrate proficiency when they have genuinely mastered the subject.
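The arithmetic of guessing can be worked through in a short sketch. It assumes every question has the same number of options, that a student with no knowledge picks among them uniformly at random, and that questions are independent; the 20-item test length is a hypothetical example.

```python
from math import comb


def guess_probability(num_options):
    """Chance of a correct blind guess on one item, e.g. 1/4 for A, B, C or D."""
    return 1 / num_options


def expected_correct_by_guessing(num_items, num_options):
    """Average number of items a pure guesser answers correctly."""
    return num_items * guess_probability(num_options)


def prob_at_least(k, num_items, num_options):
    """Binomial probability that a pure guesser gets at least k items right."""
    p = guess_probability(num_options)
    return sum(
        comb(num_items, i) * p**i * (1 - p) ** (num_items - i)
        for i in range(k, num_items + 1)
    )


# Hypothetical 20-item test with four options per item.
print(guess_probability(4))                 # chance of guessing one item
print(expected_correct_by_guessing(20, 4))  # items answered correctly on average
print(prob_at_least(10, 20, 4))             # chance of reaching half marks by luck
```

A student who knows nothing is still credited with a quarter of the items on average in this model, which is the sense in which a multiple-choice score can feign proficiency.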
It is evident that the logistics of performance assessments are much more complex than running thousands of answer sheets through an electronic scanner. The supposed need for the efficiency of electronic answer sheets is based on the premise that assessments occur only during one week, and hence thousands of tests must be graded in a very short period of time. In a classroom in which standards-based performance assessments predominate, however, there is no such thing as the week of terror associated with traditional tests. Rather, assessments happen every week of the year. Teachers collaborate on grading, revising and creating new assessments. Teachers participate in evaluating the student work as a collective activity so that educators have a clear idea of what other students in the same grade are able to accomplish. This becomes an important professional development activity for the teacher rather than a mindless administrative task in which bubbles on an answer sheet are compared to letters in a scoring guide.
Conclusion
The implementation of standards is meaningless without a sound system of standards-based assessments that take place throughout the year and that are based on academic standards. These assessments are strikingly different from traditional norm-referenced tests in important ways. Students, teachers and educational leaders must be aware of these differences and make clear to everyone involved in the educational enterprise why standards are important and why standards-based assessments are essential.
The author owns the copyright in this article. For information related to the reuse of this work in any form please contact the publisher denise.quinn@curriculum.edu.au