Increasing Automated Scorability of Innovative Item Types

Authors

  • L. Eggers-Robertson, Measurement Services, Pearson
  • G. M. Jacobs, Measurement Services, Pearson
  • E. W. Wolfe, Measurement Services, Pearson, United States of America

Keywords:

No keywords

Abstract

The cost, time, and potential for increased measurement error associated with Constructed Response (CR) and other innovative-format test items have led to the development of Automated Scoring (AS) systems. However, AS systems, while designed to ameliorate these challenges, may introduce new challenges of their own, chief among them ensuring that the CR prompt, test interface, and scoring rubric allow the AS system to identify a deployable scoring model. This article explains how the automated scorability of CR prompts can be improved by presenting a set of guidelines regarding prompt specification, response formatting, and scoring rubric design. These guidelines are illustrated with examples in the context of two AS technologies: one for scoring responses to mathematics items and the other for scoring written responses.
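As a rough illustration of what "deployable" typically means in practice, the sketch below compares machine-human agreement to human-human agreement using quadratic weighted kappa (QWK) on a double-scored validation set. The specific thresholds (a 0.70 QWK floor and at most 0.10 degradation relative to human raters) and the use of scikit-learn are assumptions made for this example only; they are not criteria stated in the article.

    # Illustrative deployability check for an automated scoring model:
    # compare machine-human agreement (quadratic weighted kappa, QWK) to
    # human-human agreement on a double-scored validation set.
    # The 0.70 floor and 0.10 degradation limit are assumed values for
    # illustration, not criteria drawn from the article.
    from sklearn.metrics import cohen_kappa_score

    def deployable(human1, human2, machine, qwk_floor=0.70, max_degradation=0.10):
        """Return (is_deployable, machine_qwk, human_qwk) for rubric-scored responses."""
        human_qwk = cohen_kappa_score(human1, human2, weights="quadratic")
        machine_qwk = cohen_kappa_score(human1, machine, weights="quadratic")
        ok = machine_qwk >= qwk_floor and (human_qwk - machine_qwk) <= max_degradation
        return ok, machine_qwk, human_qwk

    # Toy example: scores on a 0-4 rubric for ten validation responses.
    h1 = [0, 1, 2, 2, 3, 3, 4, 4, 1, 2]
    h2 = [0, 1, 2, 3, 3, 3, 4, 4, 1, 2]
    m  = [0, 1, 2, 2, 3, 4, 4, 4, 1, 2]
    print(deployable(h1, h2, m))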


Published

2024-03-25

How to Cite

Eggers-Robertson, L., Jacobs, G. M., & Wolfe, E. W. (2024). Increasing Automated Scorability of Innovative Item Types. Journal of Applied Testing Technology. Retrieved from http://jattjournal.net/index.php/atp/article/view/173211

Issue

Section

Articles

