Abstract
The automatic evaluation of chat-oriented dialogue systems remains an open problem. Most studies have evaluated such systems by hand, but manual evaluation is very costly. We propose a regression-based automatic evaluation method that scores the utterances generated by chat-oriented dialogue systems based on their similarities to many reference sentences and the annotated evaluation values of those references. The scores estimated by our method correlate highly with human-annotated scores: the sentence-wise correlation coefficient reached 0.514, and the system-wise correlation coefficient reached 0.772.
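The idea in the abstract, scoring an utterance from its similarities to annotated references and then checking the correlation with human scores, can be illustrated with a minimal sketch. This is not the chapter's actual model: the bag-of-words cosine similarity, the similarity-weighted average used as the regressor, and the names `cosine_similarity`, `estimate_score`, and `pearson` are all assumptions made for illustration, and the reference data is hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two whitespace-tokenised sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def estimate_score(utterance: str, references: list[tuple[str, float]]) -> float:
    """Similarity-weighted mean of the annotated reference scores.

    Assumption: a simple kernel-style regressor stands in for whatever
    regression model the chapter uses.
    """
    weights = [cosine_similarity(utterance, ref) for ref, _ in references]
    total = sum(weights)
    if total == 0.0:
        # No lexical overlap with any reference: fall back to the plain mean.
        return sum(score for _, score in references) / len(references)
    return sum(w * score for w, (_, score) in zip(weights, references)) / total


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical references: (sentence, human-annotated score).
references = [("i like ramen", 5.0), ("no idea", 1.0)]
estimated = [estimate_score(u, references) for u in ("i like ramen", "no idea")]
human = [5.0, 1.0]
print(estimated, pearson(estimated, human))
```

Sentence-wise correlation compares the two score lists utterance by utterance; a system-wise figure would first average the scores per system and then correlate those averages.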
Notes
- 1.
- 2. We used NIST geometric sequence smoothing, which is implemented in nltk (Method 3).
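For readers unfamiliar with this smoothing, the sketch below shows the idea in plain Python: when an n-gram order has no matches, the zero match count is replaced by 1/2^k, where k starts at 1 and increases with each further zero order. In nltk the corresponding option is `SmoothingFunction().method3` in `nltk.translate.bleu_score`. The function names `ngram_counts` and `sentence_bleu_nist_smoothed` are hypothetical, and this is an illustrative re-implementation under those assumptions, not nltk's code, so edge cases may differ.

```python
from collections import Counter
from math import exp, log


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_nist_smoothed(references: list[list[str]],
                                hypothesis: list[str],
                                max_n: int = 4) -> float:
    """Sentence-level BLEU with NIST geometric sequence smoothing (a sketch)."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    k = 1  # exponent for the next n-gram order with zero matches
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        # Clip each hypothesis n-gram by its maximum count in any reference.
        max_ref_counts: Counter = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        matches = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = max(1, sum(hyp_counts.values()))
        if matches == 0:
            if n == 1:
                return 0.0  # no unigram overlap at all
            matches = 1.0 / (2 ** k)  # smoothed count instead of zero
            k += 1
        log_precision_sum += log(matches / total) / max_n
    # Brevity penalty against the reference closest in length.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    bp = 1.0 if hyp_len > ref_len else exp(1.0 - ref_len / hyp_len)
    return bp * exp(log_precision_sum)


refs = ["the cat is on the mat".split()]
print(sentence_bleu_nist_smoothed(refs, "the cat sat on mat".split()))
```

Without smoothing, a short utterance with no matching 3-grams or 4-grams would receive a BLEU score of zero, which is common for chat utterances and makes sentence-level scores uninformative.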
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Sugiyama, H., Meguro, T., Higashinaka, R. (2019). Automatic Evaluation of Chat-Oriented Dialogue Systems Using Large-Scale Multi-references. In: Eskenazi, M., Devillers, L., Mariani, J. (eds) Advanced Social Interaction with Agents. Lecture Notes in Electrical Engineering, vol 510. Springer, Cham. https://doi.org/10.1007/978-3-319-92108-2_2
DOI: https://doi.org/10.1007/978-3-319-92108-2_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92107-5
Online ISBN: 978-3-319-92108-2
eBook Packages: Engineering, Engineering (R0)