Evaluating Coherence in Writing: Comparing the Capacity of Automated Essay Scoring Technologies
Keywords: Attribute-Specific Scoring, Automated Essay Scoring, Coherence Scoring, Deep-Neural Automated Essay Scoring
Automated Essay Scoring (AES) technologies offer a way to score written essays in far less time and at a fraction of the current cost. AES research has traditionally emphasized capturing the “coherence” of writing, because abundant evidence links coherence to overall writing quality. Yet few studies have investigated how well modern and traditional AES technologies capture sequential information (i.e., cohesion). In this study, we investigate the performance of traditional and modern AES systems in attribute-specific scoring. Traditional AES focuses on holistic scoring and has seen limited application to attribute-specific scoring. Hence, the current study examines whether a deep-neural AES system based on a convolutional neural network could outperform a traditional feature-based AES system in predicting coherence scores for essays. Our findings indicate that the deep-neural AES model predicted coherence-related score categories more accurately. Implications for the scoring capacity of the two models are also discussed.