DOI: 10.1145/3373017.3373028

A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP

Published: 04 February 2020

Abstract

Building a dialogue system that can communicate naturally with humans is a challenging yet interesting problem in agent-based computing. Rapid progress in this area is often hindered by the long-standing problem of data scarcity, as these systems are expected to learn syntax, grammar, decision making, and reasoning from insufficient amounts of task-specific data. Recently introduced pre-trained language models have the potential to address this data scarcity and bring considerable advantages by generating contextualized word embeddings. These models are considered the NLP counterpart of ImageNet and have been shown to capture different facets of language, such as hierarchical relations, long-term dependencies, and sentiment. In this short survey paper, we discuss recent progress in the field of pre-trained language models. We also discuss how the strengths of these language models can be leveraged to design more engaging and more eloquent conversational agents. This paper therefore aims to establish whether pre-trained models can overcome the challenges pertinent to dialogue systems, and how their architectures could be exploited to do so. Open challenges in the field of dialogue systems are also discussed.
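The contrast the abstract draws between static and contextualized word embeddings can be shown with a toy, self-contained sketch (our own illustration, not code from the paper): a single self-attention step, the core operation of the Transformer models surveyed, mixes sentence context into each token's vector, so the same word receives different representations in different sentences. The vocabulary and embedding values below are arbitrary placeholders.

```python
import math

# Toy static (context-independent) 3-dim embeddings for a tiny vocabulary.
EMB = {
    "river": [1.0, 0.0, 0.0],
    "bank":  [0.0, 1.0, 0.0],
    "loan":  [0.0, 0.0, 1.0],
}

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens):
    """One self-attention step: each token's new vector is a
    context-weighted mix of all token vectors in the sentence."""
    vecs = [EMB[t] for t in tokens]
    out = []
    for q in vecs:
        weights = softmax([dot(q, k) for k in vecs])
        mixed = [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(3)]
        out.append(mixed)
    return out

# The static embedding of "bank" is identical in both sentences...
ctx1 = self_attention(["river", "bank"])[1]  # "bank" in "river bank"
ctx2 = self_attention(["bank", "loan"])[0]   # "bank" in "bank loan"
# ...but after one attention step its representation depends on context.
print(ctx1 != ctx2)  # True
```

Real pre-trained models stack many such layers with learned query/key/value projections; this sketch only illustrates why the resulting embeddings are context-dependent where word2vec-style static embeddings are not.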



    Published In

    ACSW '20: Proceedings of the Australasian Computer Science Week Multiconference
    February 2020
    367 pages
    ISBN:9781450376976
    DOI:10.1145/3373017

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Agent-based computing
    2. dialogue systems
    3. intelligent agents
    4. natural language processing
    5. pre-trained language models

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACSW '20: Australasian Computer Science Week 2020
    February 4 - 6, 2020
    Melbourne, VIC, Australia

    Acceptance Rates

    Overall Acceptance Rate: 61 of 141 submissions, 43%


    Cited By

    • (2024) GPT (Generative Pre-Trained Transformer) — A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. IEEE Access 12, 54608-54649. https://doi.org/10.1109/ACCESS.2024.3389497
    • (2024) Combating Fake News on Social Media: A Fusion Approach for Improved Detection and Interpretability. IEEE Access 12, 2074-2085. https://doi.org/10.1109/ACCESS.2023.3342843
    • (2024) Unleashing the Power of AI in Communication Technology: Advances, Challenges, and Collaborative Prospects. Artificial General Intelligence (AGI) Security, 211-226. https://doi.org/10.1007/978-981-97-3222-7_10 (31-Aug-2024)
    • (2024) Evolution of ChatGPT and Different Language Models: A Review. Smart Trends in Computing and Communications, 87-97. https://doi.org/10.1007/978-981-97-1313-4_8 (2-Jun-2024)
    • (2023) Enhancing predictive modeling for Indian banking stock trends. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 45(5), 8761-8773. https://doi.org/10.3233/JIFS-231472 (4-Nov-2023)
    • (2023) The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development. Proceedings of the 28th International Conference on Intelligent User Interfaces, 491-514. https://doi.org/10.1145/3581641.3584037 (27-Mar-2023)
    • (2023) An experimental study measuring the generalization of fine-tuned language representation models across commonsense reasoning benchmarks. Expert Systems 40(5). https://doi.org/10.1111/exsy.13243 (10-Feb-2023)
    • (2023) Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering. 2023 International Joint Conference on Neural Networks (IJCNN), 1-7. https://doi.org/10.1109/IJCNN54540.2023.10191510 (18-Jun-2023)
    • (2023) Comparative Analysis of BERT Models for Sentiment Analysis on Twitter Data. 2023 9th International Conference on Smart Computing and Communications (ICSCC), 658-663. https://doi.org/10.1109/ICSCC59169.2023.10335061 (17-Aug-2023)
    • (2023) Research on Error Handling Techniques for Speech Interaction. 2023 IEEE 3rd International Conference on Computer Systems (ICCS), 16-20. https://doi.org/10.1109/ICCS59700.2023.10335595 (22-Sep-2023)
