DOI: 10.1145/3459104.3459169
Research article

Contextualised Word Embeddings Based on Transfer Learning to Dialogue Response Generation: a Proposal and Comparisons

Published: 20 July 2021

Abstract

Contextualised word embeddings have recently become essential elements of Natural Language Processing (NLP) systems, since these embedding models encode not only words but also their contexts, producing context-specific representations. Pre-trained models such as BERT, GPT, and derived architectures are increasingly present on NLP task benchmarks. Several comparative analyses of such models have been performed, but so far none compares the most recent architectures on a dialogue generation dataset using multiple metrics relevant to the task. In this paper, we not only propose an encoder-decoder system that uses transfer learning with pre-trained word embeddings, but also systematically compare several pre-trained contextualised word embedding architectures on the DSTC-7 dataset, using metrics based on mutual information, dialogue length, and variety of answers. We use the word embeddings as the first layer of the encoder, making it possible to encode the texts in a latent space. As a decoder, we use an LSTM layer and byte pair encoding tokenisation, in line with recently published state-of-the-art dialogue systems. The networks are trained for the same number of epochs, with the same optimisers and learning rates. Considering dialogue quality, our results show that no technique is superior on all metrics; however, there are relevant differences in the computational cost of encoding the data.
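The architecture described above can be illustrated with a minimal sketch: a frozen pre-trained embedding table acts as the encoder's first layer, its token vectors are pooled into a latent context, and that context initialises a recurrent (LSTM-style) decoder that emits a distribution over a subword vocabulary. This is not the authors' implementation; the vocabulary size, dimensions, pooling choice, and every name below are hypothetical, chosen only to show the data flow.

```python
# Illustrative sketch (assumptions throughout): frozen "pre-trained"
# embeddings as the encoder's first layer, feeding a small LSTM decoder.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 100   # hypothetical subword (BPE) vocabulary size
EMB = 16      # embedding dimension
HID = 32      # decoder hidden size

# Stand-in for a pre-trained embedding table; in the paper this role is
# played by contextualised models such as BERT or GPT. Frozen: the
# encoder only looks vectors up, it does not update them.
pretrained_emb = rng.normal(size=(VOCAB, EMB))

def encode(token_ids):
    """Map token ids into the latent space via the frozen embeddings,
    then mean-pool into a single context vector."""
    vecs = pretrained_emb[token_ids]   # (seq_len, EMB)
    return vecs.mean(axis=0)           # (EMB,)

# A single LSTM cell in plain NumPy, standing in for the LSTM decoder.
W = rng.normal(scale=0.1, size=(4 * HID, EMB + HID))
b = np.zeros(4 * HID)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)        # input, forget, cell, output gates
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Project the pooled context into the decoder's initial state, then run
# a few greedy decode steps producing token ids from the vocabulary.
W_init = rng.normal(scale=0.1, size=(HID, EMB))
W_out = rng.normal(scale=0.1, size=(VOCAB, HID))

context = encode(np.array([3, 14, 15, 92]))  # hypothetical BPE ids
h, c = np.tanh(W_init @ context), np.zeros(HID)
x = pretrained_emb[0]                        # id 0 assumed to be <bos>
generated = []
for _ in range(5):
    h, c = lstm_step(x, h, c)
    logits = W_out @ h
    next_id = int(np.argmax(logits))
    generated.append(next_id)
    x = pretrained_emb[next_id]

print(generated)
```

In the paper the embedding layer is one of the compared contextualised models and the decoder is a trained LSTM over BPE subwords; here random weights stand in for both, so only the wiring, not the output quality, is representative.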



    Published In

    ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering
    February 2021
    644 pages
    ISBN:9781450389839
    DOI:10.1145/3459104

Publisher

Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
    • Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)


