High-stakes testing is the practice of using students’ scores on standardized exams to inform major education decisions, such as school closings, hiring and firing of teachers, and/or student promotion and graduation. It is one of the primary issues affecting New York’s students, who are required to sit for at least sixteen standardized exams between third and eighth grade, and an additional five to graduate from high school (New York City Department of Education, undated). Test scores also determine 85 percent of a school’s grade on the New York City Department of Education’s “Progress Reports,” which in turn helps determine which schools will be closed (New York City Department of Education 2012). Proponents argue that the test score data help schools target instruction to their greatest needs and are a necessary standard measure of school and educator quality. Some argue that evaluating students, teachers, and schools by means of test scores, and imposing consequences based on the results, provides a powerful motivator to improve.
Yet many researchers have raised questions about the accuracy, effectiveness, and educational benefits of using high-stakes testing in schools (Heubert & Hauser 1999; National Research Council 2009; Hout & Elliot 2011).1 The emphasis on standardized exams in New York City and elsewhere has encouraged teaching to the test, a narrowing of curriculum, and increasing amounts of money and instructional time devoted to preparing for the exams, giving them, and scoring them.2 Data from the most reliable assessments, the national exams known as the NAEPs, show no narrowing of the achievement gap and little progress in achievement in New York City schools compared to other large cities (Haimson & Marcus 2012). The New York State exam results also show large and persistent achievement gaps in the city’s public schools (New York City Department of Education 2010). Basing high-stakes decisions on these exams can result in students with the greatest needs being deprived of necessary resources and opportunities. Extensive research also shows that retaining students on the basis of low test scores doesn’t help them succeed, but instead leads to higher dropout rates (Institute for Education and Social Policy & National Center for Schools and Communities 2004).
Students of color and students from low-income families consistently perform worse on standardized tests than their white peers (Vanneman et al. 2009; Dillion 2009). English language learners and students with disabilities also struggle to achieve on standardized tests and suffer educational penalties as a result of the high stakes placed on exam scores.3 Federal policy initiatives to turn around low-performing schools through high-stakes testing, punitive sanctions, and demanding proficiency goals have largely failed (Hout & Elliot 2011). In addition, the validity of New York State’s standardized tests was dealt a serious blow in 2010 when Harvard researchers discovered the tests were becoming easier to pass, creating an illusion that students were doing better each year than they actually were (Koretz 2010).
Campbell’s Law4 suggests that the higher the stakes placed on one set of quantitative indicators, the less reliable these indicators will become, as people learn how to “game” the system. Accordingly, there has been a sharp increase in cheating allegations as schools, teachers, and students are increasingly judged on the results of standardized exams (Otterman 2011).
A nine-year study published by the National Academy of Sciences in 2011 confirmed many of the negative effects that educators have feared from an emphasis on high-stakes testing. For example, tests not only fail to encourage students’ curiosity, but also fail to adequately measure student knowledge in tested subjects. Further, the higher the stakes attached to an exam, the more intensely focused teachers must be on teaching the material in the exam. This leads to a narrowing of curriculum and a decrease in student learning on untested subjects. The study’s authors recommended “designing an incentive system that uses multiple performance measures” to achieve better outcomes for students (Hout & Elliot 2011).
In New York City, a group of schools known as the Performance Standards Consortium has done that, with impressive results. Rather than sitting for Regents Exams to graduate from high school, Consortium students are required to complete “performance-based assessment tasks” such as research papers, original science experiments, and mathematical analyses of real-world problems. A report on their academic outcomes showed remarkable results for students in high-risk groups. For example, the graduation rate for black students in Consortium schools is nearly 61 percent, compared to 54 percent for other New York City schools. For English language learners, the graduation rate is 69.5 percent, compared to 39.7 percent citywide. Special education students attending Consortium schools graduate at double the rate of those in other city schools. In 2011, 94 percent of Consortium graduates attending four-year colleges returned for their second year, compared to a national return rate of 75 percent (New York Performance Standards Consortium 2012).
Research does not support the use of standardized tests as the sole or primary factor in making high-stakes education policy or promotion decisions. Using multiple measures of student progress, based on performance or portfolio work, supports the development of a richer curriculum and results in better outcomes for all students, especially those at risk in New York City.
1 In addition, see: New York State Professors Against High Stakes Testing, “New York State Regents: End the reliance on high stakes standardized testing,” http://www.change.org/petitions/new-york-state-regents-end-the-reliance-on-highstakes-standardized-testing.
2 In 2008, the Independent Budget Office estimated that New York City spent $130 million on the city’s accountability initiative during that year; this analysis did not count the cost of the state exams, the instructional time spent on test prep, or teachers removed from the classroom to score them (Independent Budget Office 2008).
3 “In 2010, 24 percent of 4th graders labeled as ELLs were deemed proficient in English Language Arts compared to 58 percent of non-ELLs. By eighth grade only 4 percent of ELLs were classified as proficient compared to 54 percent of non-ELLs. It is therefore little surprise that of the 2006 cohort, only 40 percent of ELLs graduated after four years compared to 75 percent for non-ELLs.” From: New York State Professors Against High Stakes Testing, “New York State Regents: End the reliance on high stakes standardized testing,” www.change.org/petitions/new-york-stateregents-end-the-reliance-on-high-stakes-standardized-testing.
4 Campbell’s Law states: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Hout, M., and S. Elliot, eds. 2011. Incentives and Test-Based Accountability in Education. Washington, DC: National Academies Press.
Vanneman, A., L. Hamilton, J. Baldwin Anderson, and T. Rahman. 2009. Achievement Gaps: How Black and White Students in Public Schools Perform in Mathematics and Reading on the National Assessment of Educational Progress. NCES 2009-455. Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Downloadable PDF available at:
Koretz, D. 2010. “Evidence About the Leniency of 8th-Grade Standards,” letter to David Steiner (June 20).
Examples of Best Policy and Practice
The New York Performance Standards Consortium
Prepared by: Johanna E. Miller, New York Civil Liberties Union, www.nyclu.org, and Leonie Haimson, Class Size Matters, www.classsizematters.org