Part of the book series: Communications in Computer and Information Science (CCIS, volume 2151)


Abstract

In essay writing assessment, reliably measuring examinee ability can be difficult because of bias effects arising from rater characteristics. To address this, item response theory (IRT) models that incorporate rater characteristic parameters have been proposed. These models estimate examinee ability from scores assigned by multiple raters while accounting for each rater's scoring characteristics, yielding more accurate ability measurement than a simple average of scores. Problems arise, however, when different groups of examinees are assessed by distinct sets of raters: test linking is then required to place the ability estimates of the groups on a common scale. Traditional test linking methods require administrators to design groups in which either examinees or raters are partially shared, a requirement that is often impractical in real-world assessment settings. To overcome this problem, we introduce a novel linking method that does not rely on common examinees or raters, using a recent automated essay scoring (AES) method. Our method not only facilitates test linking but also enables effective collaboration between human raters and AES, which improves the accuracy of ability measurement.
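
As background for the abstract above, the sketch below illustrates a basic many-facet Rasch model with a rater severity parameter and a standard mean-sigma linking transformation of the kind used to place two groups' ability estimates on a common scale. The notation is generic and illustrative; the paper's actual model and its AES-based linking procedure may differ.

    % Illustrative sketch only: a rating-scale many-facet Rasch model in which
    % P_{jrk} is the probability that rater r assigns score category k to examinee j.
    \log \frac{P_{jrk}}{P_{jr(k-1)}} = \theta_j - \beta_r - d_k
    % \theta_j : latent ability of examinee j
    % \beta_r  : severity of rater r (higher values mean harsher scoring)
    % d_k      : threshold for moving from category k-1 to category k

    % A standard mean-sigma linking transformation that maps ability estimates of a
    % new examinee group onto the scale of a base group. In an AES-assisted design,
    % the means and standard deviations could be computed from essays scored by a
    % shared AES model in both groups (an assumption, not the paper's stated procedure).
    \theta^{*} = A\,\theta + B, \qquad
    A = \frac{\sigma_{\text{base}}}{\sigma_{\text{new}}}, \qquad
    B = \mu_{\text{base}} - A\,\mu_{\text{new}}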

Notes

  1. https://kaggle.com/competitions/asap-aes

     This dataset is publicly available and is distributed in accordance with the Hewlett Foundation's privacy policy.

Author information

Correspondence to Kota Aramaki or Masaki Uto.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Aramaki, K., Uto, M. (2024). Collaborative Essay Evaluation with Human and Neural Graders Using Item Response Theory Under a Nonequivalent Groups Design. In: Olney, A.M., Chounta, I.-A., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2024. Communications in Computer and Information Science, vol 2151. Springer, Cham. https://doi.org/10.1007/978-3-031-64312-5_10

  • DOI: https://doi.org/10.1007/978-3-031-64312-5_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64311-8

  • Online ISBN: 978-3-031-64312-5

  • eBook Packages: Computer Science, Computer Science (R0)
