Deep Learning in Conversational Language Understanding

Gokhan Tur, Asli Celikyilmaz, Xiaodong He, Dilek Hakkani-Tür & Li Deng


Abstract

Recent advances in AI have led to the increased availability of conversational assistants that can help with tasks such as finding a time to schedule an event and creating a calendar entry for it, or finding a restaurant and booking a table there for a given time. However, creating automated agents with human-level intelligence remains one of the most challenging problems in AI. One key component of such systems is conversational language understanding, which has been a holy grail of research for decades, as it is not a clearly defined task but depends heavily on the AI application it serves. Nevertheless, this chapter attempts to compile the recent deep learning-based literature on goal-oriented conversational language understanding, starting with a historical perspective and pre-deep learning work, and moving toward the most recent advances in the field.
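For goal-oriented requests such as the restaurant booking mentioned above, "understanding" is commonly cast as filling a semantic frame: an intent plus slot-value pairs, where slot spans in the utterance are marked with IOB tags. The Python sketch below only illustrates that target representation under stated assumptions: the intent name `book_restaurant`, the slot names, and the helper `iob_to_frame` are hypothetical, and the tags are supplied by hand rather than predicted by a model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """One goal-oriented interpretation: an intent plus slot-value pairs."""
    intent: str
    slots: dict[str, str] = field(default_factory=dict)


def iob_to_frame(tokens: list[str], tags: list[str], intent: str) -> SemanticFrame:
    """Group IOB-tagged tokens into slot values (hypothetical helper).

    'B-x' opens a span for slot x, 'I-x' continues a span of the same slot,
    and 'O' marks a token that belongs to no slot. An 'I-x' tag that does not
    follow a span of slot x is treated as opening a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: list[tuple[str, list[str]]] = []
    prev = "O"
    for token, tag in zip(tokens, tags):
        if tag == "O":
            pass  # token is outside every slot
        elif tag.startswith("I-") and prev != "O" and prev[2:] == tag[2:]:
            spans[-1][1].append(token)  # extend the currently open span
        else:
            spans.append((tag[2:], [token]))  # open a new span
        prev = tag
    # If a slot name occurs twice, the later span overwrites the earlier one.
    slots = {name: " ".join(words) for name, words in spans}
    return SemanticFrame(intent=intent, slots=slots)


# Hand-labeled example; intent and slot names are illustrative only.
tokens = "book a table for two at luigis at 7 pm".split()
tags = ["O", "O", "O", "O", "B-party_size", "O",
        "B-restaurant_name", "O", "B-time", "I-time"]
frame = iob_to_frame(tokens, tags, intent="book_restaurant")
print(frame)
```

In a real system the intent and the tag sequence would be predicted, for example by an utterance classifier and a sequence labeler such as the neural models the chapter surveys; this snippet shows only the structure those models are trained to produce.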





Author information

Authors and Affiliations

Google, Mountain View, CA, USA
Gokhan Tur & Dilek Hakkani-Tür
Microsoft Research, Redmond, WA, USA
Asli Celikyilmaz & Xiaodong He
Citadel, Chicago & Seattle, USA
Li Deng


Corresponding author

Correspondence to Gokhan Tur.

Editor information

Editors and Affiliations

AI Research at Citadel, Chicago, Illinois, USA
Li Deng
Tsinghua University, Beijing, China
Yang Liu


About this chapter

Cite this chapter

Tur, G., Celikyilmaz, A., He, X., Hakkani-Tür, D., Deng, L. (2018). Deep Learning in Conversational Language Understanding. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_2


DOI: https://doi.org/10.1007/978-981-10-5209-5_2
Published: 24 May 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5208-8
Online ISBN: 978-981-10-5209-5
eBook Packages: Computer Science; Computer Science (R0)
