(In)Stability of Test Scores
Keywords:
large-scale testing, G-theory, educational policy, test reliability
Abstract
Both school and district administrators use the results of standardized, large-scale tests to inform decisions about the need for, or success of, educational programs and interventions. However, school-level test results are subject to random fluctuations caused by changes in cohort, test items, and other factors outside the school's control. This study examined year-to-year changes in school-level results on standardized tests administered in Ontario, Canada. G-theory analyses found that test scores are not stable enough to support meaningful conclusions based on year-to-year changes in school-level results. For small and medium-sized schools, multiple years of data need to be collected before defensible decisions can be made about trends in test scores. The authors introduce a ‘bounce’ statistic that provides a simple, easy-to-interpret measure of test score stability.
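To make the stability argument concrete, the following is a minimal sketch of a generalizability (G) coefficient for a fully crossed schools-by-years design with one score per cell. It is not the authors' analysis: the function names, the single-facet design, and the score matrix are all assumptions made up for illustration, and the abstract does not specify how the study's G-theory models or the 'bounce' statistic were defined.

```python
# Hypothetical illustration only: a one-facet (schools x years) G-study
# estimated from ANOVA mean squares. Data and names are invented.

def variance_components(scores):
    """Estimate school, year, and residual variance components.

    scores[i][j] is the result for school i in year j (one value per cell).
    """
    n_s, n_y = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_s * n_y)
    school_means = [sum(row) / n_y for row in scores]
    year_means = [sum(row[j] for row in scores) / n_s for j in range(n_y)]

    # Mean squares for schools, years, and the school-by-year residual.
    ms_school = n_y * sum((m - grand) ** 2 for m in school_means) / (n_s - 1)
    ms_year = n_s * sum((m - grand) ** 2 for m in year_means) / (n_y - 1)
    ss_resid = sum(
        (scores[i][j] - school_means[i] - year_means[j] + grand) ** 2
        for i in range(n_s)
        for j in range(n_y)
    )
    ms_resid = ss_resid / ((n_s - 1) * (n_y - 1))

    # Expected-mean-square solutions; negative estimates are truncated at 0.
    var_school = max((ms_school - ms_resid) / n_y, 0.0)
    var_year = max((ms_year - ms_resid) / n_s, 0.0)
    return var_school, var_year, ms_resid


def g_coefficient(var_school, var_resid, n_years):
    """Relative G coefficient when results are averaged over n_years."""
    denom = var_school + var_resid / n_years
    return var_school / denom if denom > 0 else 0.0


# Invented percent-at-standard results for 4 schools over 3 years.
example_scores = [
    [60, 70, 62],
    [55, 50, 60],
    [80, 72, 79],
    [65, 75, 64],
]

if __name__ == "__main__":
    v_s, v_y, v_r = variance_components(example_scores)
    for k in (1, 2, 3, 5):
        print(k, round(g_coefficient(v_s, v_r, k), 3))
```

In this sketch the coefficient rises as more years are averaged, because the residual (school-by-year) variance is divided by the number of years. That is the general mechanism by which collecting several years of data can make school-level comparisons more dependable.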
License
Copyright (c) 2022 Stefan Merchant, Jessica Rich, Don Klinger
This work is licensed under a Creative Commons Attribution 4.0 International License.