Automatic Evaluation of Chat-Oriented Dialogue Systems Using Large-Scale Multi-references

Hiroaki Sugiyama³⁵, Toyomi Meguro³⁵ & Ryuichiro Higashinaka³⁵,³⁶

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 510)


Abstract

The automatic evaluation of chat-oriented dialogue systems remains an open problem. Most studies have evaluated such systems manually, which is very costly. We propose a regression-based automatic evaluation method that scores the utterances generated by chat-oriented dialogue systems on the basis of their similarities to a large number of reference sentences and the evaluation values annotated on those references. The scores estimated by our method correlate highly with human-annotated scores: the sentence-wise correlation coefficient reached 0.514, and the system-wise correlation coefficient reached 0.772.
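The abstract does not specify the similarity measure, the features, or the regressor, so the following is only a minimal sketch under stated assumptions. Unigram-overlap F1 stands in for sentence similarity, the single feature is the similarity-weighted mean of the references' annotated scores, and an ordinary least-squares line maps that feature to a predicted score. All function names are hypothetical. The last two helpers show how the sentence-wise and system-wise Pearson correlations reported above differ: the first correlates per-utterance scores, the second correlates per-system averages.

```python
import math
from collections import Counter, defaultdict


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 (assumed stand-in similarity, not the chapter's measure)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def weighted_reference_score(candidate, references):
    """Similarity-weighted mean of the annotated scores of (text, score) references."""
    sims = [unigram_f1(candidate, text) for text, _ in references]
    total = sum(sims)
    if total == 0.0:
        # No overlap with any reference: fall back to the plain mean score.
        return sum(score for _, score in references) / len(references)
    return sum(s * score for s, (_, score) in zip(sims, references)) / total


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def per_system_means(system_ids, values):
    """Average utterance-level values per system, in sorted system order."""
    groups = defaultdict(list)
    for sid, value in zip(system_ids, values):
        groups[sid].append(value)
    return [sum(groups[s]) / len(groups[s]) for s in sorted(groups)]
```

To train, compute `weighted_reference_score` for each annotated utterance against the reference pool, pass those values and the human scores to `fit_line`, and predict `a + b * feature` for new utterances. Sentence-wise correlation is then `pearson(predicted, human)`, and system-wise correlation is `pearson(per_system_means(ids, predicted), per_system_means(ids, human))`. Because averaging within each system smooths out per-utterance noise, the two figures need not agree.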

Notes

1. https://www.google.co.jp/trends/topcharts#date=2012.
2. We used NIST geometric sequence smoothing, as implemented in NLTK (Method 3).
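Note 2 refers to NLTK's smoothing method 3, which is selected with `sentence_bleu(references, hypothesis, smoothing_function=SmoothingFunction().method3)` from `nltk.translate.bleu_score`. The stdlib-only sketch below is intended to mirror that computation for uniform 4-gram weights so that the smoothing rule is visible: each n-gram order with zero clipped matches receives precision `1 / (2**k * d)` instead of 0, where `d` is the number of hypothesis n-grams of that order and `k` counts the zero-match orders seen so far. The function names are hypothetical, and the sketch has not been checked against NLTK on every edge case.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_nist_smoothed(references, hypothesis, max_n=4):
    """Sentence BLEU, uniform weights, NIST geometric sequence smoothing.

    references: list of token lists; hypothesis: token list.
    """
    numerators, denominators = [], []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        numerators.append(sum(min(c, max_ref[g]) for g, c in hyp_counts.items()))
        denominators.append(max(1, sum(hyp_counts.values())))

    # Like NLTK's sentence_bleu, return 0 when no unigram matches at all.
    if numerators[0] == 0:
        return 0.0

    # NIST geometric sequence smoothing: k-th zero-match order -> 1 / (2**k * d).
    k = 1
    precisions = []
    for num, den in zip(numerators, denominators):
        if num == 0:
            precisions.append(1.0 / (2 ** k * den))
            k += 1
        else:
            precisions.append(num / den)

    # Brevity penalty against the closest reference length (ties favor the shorter).
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references), key=lambda r: (abs(r - hyp_len), r))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

For example, with the hypothesis `a b c d` and the single reference `a b x d`, the smoothed precisions are 3/4, 1/3, 1/4 and 1/4, giving a score of 2^-1.5 ≈ 0.354, whereas unsmoothed BLEU would be 0 because there is no trigram or 4-gram match.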

Author information

Authors and Affiliations

NTT Communication Science Laboratories, 2-4, Hikari-dai, Seika-cho, Souraku-gun, Kyoto, Japan
Hiroaki Sugiyama, Toyomi Meguro & Ryuichiro Higashinaka
NTT Media Intelligence Laboratories, 1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan
Ryuichiro Higashinaka

Corresponding author

Correspondence to Hiroaki Sugiyama.

Editor information

Editors and Affiliations

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Maxine Eskenazi
LIMSI-CNRS, Sorbonne University, Paris, France
Laurence Devillers
LIMSI-CNRS, Paris-Saclay University, Orsay, France
Joseph Mariani

About this chapter

Cite this chapter

Sugiyama, H., Meguro, T., Higashinaka, R. (2019). Automatic Evaluation of Chat-Oriented Dialogue Systems Using Large-Scale Multi-references. In: Eskenazi, M., Devillers, L., Mariani, J. (eds) Advanced Social Interaction with Agents. Lecture Notes in Electrical Engineering, vol 510. Springer, Cham. https://doi.org/10.1007/978-3-319-92108-2_2

DOI: https://doi.org/10.1007/978-3-319-92108-2_2
Published: 02 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92107-5
Online ISBN: 978-3-319-92108-2
eBook Packages: Engineering, Engineering (R0)
