Abstract
In the assessment of essay writing, reliably measuring examinee ability can be difficult owing to bias effects arising from rater characteristics. To address this, item response theory (IRT) models that incorporate rater characteristic parameters have been proposed. These models estimate the ability of examinees from scores assigned by multiple raters while considering their scoring characteristics, thereby achieving more accurate measurement of ability compared with a simple average of scores. However, issues arise when different groups of examinees are assessed by distinct sets of raters. In such cases, test linking is required to standardize the scale of ability estimates among multiple examinee groups. Traditional test linking methods require administrators to design groups in which either examinees or raters are partially shared—a requirement that is often impractical in real-world assessment settings. To overcome this problem, we introduce a novel linking method that does not rely on common examinees and raters by utilizing a recent automated essay scoring (AES) method. Our method not only facilitates test linking but also enables effective collaboration between human raters and AES, which enhances the accuracy of ability measurement.
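For readers who want the model in symbols, the following is a minimal sketch of the kind of rater-parameterized IRT model the abstract refers to, written in the style of the many-facet Rasch model; the notation is illustrative and omits refinements such as rater consistency or category-width parameters used in generalized variants.

\log \frac{P_{jrk}}{P_{jr,k-1}} = \theta_j - \beta_r - d_k, \qquad k = 1, \dots, K,

where P_{jrk} is the probability that examinee j receives score category k from rater r, \theta_j is the examinee's ability, \beta_r is the severity of rater r, and d_k is the step difficulty of reaching category k. Conventional linking then maps the ability scale of one group onto another through a linear transformation \theta^{*} = A\theta + B whose coefficients are estimated from parameters of common examinees or common raters.

As a rough illustration of how an AES model could supply that link when no examinees or raters are shared, the sketch below pools the score data of both groups and adds the AES as a pseudo-rater that scores every essay. This is only one plausible reading of the approach, not the authors' implementation; the function names (build_score_table, aes_score) and the assumed data layout are hypothetical.

from typing import Callable

import pandas as pd

def build_score_table(
    group_essays: dict[str, pd.DataFrame],
    aes_score: Callable[[str], int],
) -> pd.DataFrame:
    """Stack per-group human ratings and append AES scores as a shared rater.

    Each frame in `group_essays` is assumed to hold an 'essay' column plus one
    column per human rater of that group; the human rater sets are disjoint
    across groups, which is the nonequivalent-groups situation in question.
    """
    rows = []
    for group, frame in group_essays.items():
        frame = frame.copy()
        frame["group"] = group
        # The AES scores essays from every group, acting as the common anchor.
        frame["rater_AES"] = frame["essay"].map(aes_score)
        rows.append(frame.drop(columns=["essay"]))
    # Columns for raters who never saw a given group stay NaN and are treated
    # as missing-by-design when the IRT model is estimated.
    return pd.concat(rows, ignore_index=True)

Fitting a rater-parameterized IRT model (for example, by MCMC) to the pooled table then places all examinees on one ability scale, because every essay shares the AES column as an anchor.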
Notes
1. https://kaggle.com/competitions/asap-aes. This is a publicly available dataset following the privacy policy of the Hewlett Foundation.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Aramaki, K., Uto, M. (2024). Collaborative Essay Evaluation with Human and Neural Graders Using Item Response Theory Under a Nonequivalent Groups Design. In: Olney, A.M., Chounta, I.-A., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2024. Communications in Computer and Information Science, vol 2151. Springer, Cham. https://doi.org/10.1007/978-3-031-64312-5_10
DOI: https://doi.org/10.1007/978-3-031-64312-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64311-8
Online ISBN: 978-3-031-64312-5