Nothing Special   »   [go: up one dir, main page]

Skip to main content

Deep Learning in Conversational Language Understanding

  • Chapter
  • First Online:
Deep Learning in Natural Language Processing

Abstract

Recent advancements in AI resulted in increased availability of conversational assistants that can help with tasks such as seeking times to schedule an event and creating a calendar entry at that time, finding a restaurant and booking a table there at a certain time. However, creating automated agents with human-level intelligence still remains one of the most challenging problems of AI. One key component of such systems is conversational language understanding, which is a holy grail area of research for decades, as it is not a clearly defined task but relies heavily on the AI application it is used for. Nevertheless, this chapter attempts to compile the recent deep learning based literature on such goal-oriented conversational language understanding studies, starting with a historical perspective, pre-deep learning era work, moving toward most recent advances in this field.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 139.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 179.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 179.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  • Allen, J. (1995). Natural language understanding, chapter 8. Benjamin/Cummings.

    Google Scholar 

  • Allen, J. F., Miller, B. W., Ringger, E. K., & Sikorski, T. (1996). A robust system for natural spoken dialogue. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 62–70.

    Google Scholar 

  • Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Learning to compose neural networks for question answering. In Proceedings of NAACL.

    Google Scholar 

  • Bapna, A., Tur, G., Hakkani-Tur, D., & Heck, L. (2017). Towards zero-shot frame semantic parsing for domain scaling. In Proceedings of the Interspeech.

    Google Scholar 

  • Bellegarda, J. R. (2004). Statistical language model adaptation: Review and perspectives. Speech Communication Special Issue on Adaptation Methods for Speech Recognition, 42, 93–108.

    Google Scholar 

  • Bonneau-Maynard, H., Rosset, S., Ayache, C., Kuhn, A., & Mostefa, D. (2005). Semantic annotation of the French MEDIA dialog corpus. In Proceedings of the Interspeech, Lisbon, Portugal.

    Google Scholar 

  • Bowman, S. R., Gauthier, J., Rastogi, A., Gupta, R., & Manning, C. D. (2016). A fast unified model for parsing and sentence understanding. In Proceedings of ACL.

    Google Scholar 

  • Celikyilmaz, A., Sarikaya, R., Hakkani, D., Liu, X., Ramesh, N., & Tur, G. (2016). A new pre-training method for training deep learning models with application to spoken language understanding. In Proceedings of The 17th Annual Meeting of the International Speech Communication Association (INTERSPEECH 2016).

    Google Scholar 

  • Chen, Y.-N., Hakkani-Tur, D., & He, X. (2015a). Zero-shot learning of intent embeddings for expansion by convolutional deep structured semantic models. In Proceedings of the IEEE ICASSP.

    Google Scholar 

  • Chen, Y.-N., Hakkani-Tür, D., Tur, G., Gao, J., & Deng, L. (2016). End-to-end memory networks with knowledge carryover for multi-turn spoken language understanding. In Proceedings of the Interspeech, San Francisco, CA.

    Google Scholar 

  • Chen, Y.-N., Wang, W. Y., Gershman, A., & Rudnicky, A. I. (2015b). Matrix factorization with knowledge graph propagation for unsupervised spoken language understanding. In Proceedings of the ACLIJCNLP.

    Google Scholar 

  • Chen, Y.-N., Wang, W. Y., & Rudnicky, A. I. (2013). Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing. In Proceedings of the IEEE ASRU.

    Google Scholar 

  • Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

    Google Scholar 

  • Chu-Carroll, J., & Carpenter, B. (1999). Vector-based natural language call routing. Computational Linguistics, 25(3), 361–388.

    Google Scholar 

  • Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the ICML, Helsinki, Finland.

    Google Scholar 

  • Dahl, D. A., Bates, M., Brown, M., Fisher, W., Hunicke-Smith, K., Pallett, D., et al. (1994). Expanding the scope of the ATIS task: the ATIS-3 corpus. In Proceedings of the Human Language Technology Workshop. Morgan Kaufmann.

    Google Scholar 

  • Damnati, G., Bechet, F., & de Mori, R. (2007). Spoken language understanding strategies on the france telecom 3000 voice agency corpus. In Proceedings of the ICASSP, Honolulu, HI.

    Google Scholar 

  • Dauphin, Y., Tur, G., Hakkani-Tür, D., & Heck, L. (2014). Zero-shot learning and clustering for semantic utterance classification. In Proceedings of the ICLR.

    Google Scholar 

  • Deng, L., & Li, X. (2013). Machine learning paradigms for speech recognition: An overview. IEEE Transactions on Audio, Speech, and Language Processing, 21(5), 1060–1089.

    Article  Google Scholar 

  • Deng, L., & O’Shaughnessy, D. (2003). Speech processing: A dynamic and optimization-oriented approach. Marcel Dekker, New York: Publisher.

    Google Scholar 

  • Deng, L., & Yu, D. (2011). Deep convex nets: A scalable architecture for speech pattern classification. In Proceedings of the Interspeech, Florence, Italy.

    Google Scholar 

  • Deoras, A., & Sarikaya, R. (2013). Deep belief network based semantic taggers for spoken language understanding. In Proceedings of the IEEE Interspeech, Lyon, France.

    Google Scholar 

  • Dupont, Y., Dinarelli, M., & Tellier, I. (2017). Label-dependencies aware recurrent neural networks. arXiv preprint arXiv:1706.01740.

  • Elman, J. L. (1990). Finding structure in time. Cognitive science, 14(2), 179–211.

    Article  Google Scholar 

  • Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.

    Article  MathSciNet  Google Scholar 

  • Gorin, A. L., Abella, A., Alonso, T., Riccardi, G., & Wright, J. H. (2002). Automated natural spoken dialog. IEEE Computer Magazine, 35(4), 51–56.

    Article  Google Scholar 

  • Gorin, A. L., Riccardi, G., & Wright, J. H. (1997). How may I help you? Speech Communication, 23, 113–127.

    Article  Google Scholar 

  • Guo, D., Tur, G., Yih, W.-t., & Zweig, G. (2014). Joint semantic utterance classification and slot filling with recursive neural networks. In In Proceedings of the IEEE SLT Workshop.

    Google Scholar 

  • Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore, S., Riccardi, G., & Rahim, M. (2006). The AT&T spoken language understanding system. IEEE Transactions on Audio, Speech, and Language Processing, 14(1), 213–222.

    Article  Google Scholar 

  • Hahn, S., Dinarelli, M., Raymond, C., Lefevre, F., Lehnen, P., Mori, R. D., et al. (2011). Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1569–1583.

    Article  Google Scholar 

  • Hakkani-Tür, D., Tur, G., Celikyilmaz, A., Chen, Y.-N., Gao, J., Deng, L., & Wang, Y.-Y. (2016). Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. In Proceedings of the Interspeech, San Francisco, CA.

    Google Scholar 

  • He, X., & Deng, L. (2011). Speech recognition, machine translation, and speech translation a unified discriminative learning paradigm. In IEEE Signal Processing Magazine, 28(5), 126–133.

    Article  MathSciNet  Google Scholar 

  • He, X. & Deng, L. (2013). Speech-centric information processing: An optimization-oriented approach. In Proceedings of the IEEE, 101(5), 1116–1135.

    Article  Google Scholar 

  • Hemphill, C. T., Godfrey, J. J., & Doddington, G. R. (1990). The ATIS spoken language systems pilot corpus. In Proceedings of the Workshop on Speech and Natural Language, HLT’90, pp. 96–101, Morristown, NJ, USA. Association for Computational Linguistics.

    Google Scholar 

  • Hinton, G., Deng, L., Yu, D., Dahl, G., Rahman Mohamed, A., Jaitly, N., et al. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6), 82–97.

    Article  Google Scholar 

  • Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Advances in Neural Computation, 18(7), 1527–1554.

    Article  MathSciNet  Google Scholar 

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735–1780.

    Article  Google Scholar 

  • Hori, C., Hori, T., Watanabe, S., & Hershey, J. R. (2014). Context sensitive spoken language understanding using role dependent lstm layers. In Proceedings of the Machine Learning for SLU Interaction NIPS 2015 Workshop.

    Google Scholar 

  • Huang, X., & Deng, L. (2010). An overview of modern speech recognition. In Handbook of Natural Language Processing, Second Edition, Chapter 15.

    Google Scholar 

  • Jaech, A., Heck, L., & Ostendorf, M. (2016). Domain adaptation of recurrent neural networks for natural language understanding. In Proceedings of the Interspeech, San Francisco, CA.

    Google Scholar 

  • Jordan, M. (1997). Serial order: A parallel distributed processing approach. Technical Report 8604, University of California San Diego, Institute of Computer Science.

    Chapter  Google Scholar 

  • Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. In Proceedings of the ACL, Baltimore, MD.

    Google Scholar 

  • Kim, Y. (2014). Convolutional neural networks for sentence classification. In Proceedings of the EMNLP, Doha, Qatar.

    Google Scholar 

  • Kim, Y.-B., Stratos, K., Sarikaya, R., & Jeong, M. (2015). New transfer learning techniques for disparate label sets. In Proceedings of the ACL-IJCNLP.

    Google Scholar 

  • Kuhn, R., & Mori, R. D. (1995). The application of semantic classification trees to natural language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 449–460.

    Article  Google Scholar 

  • Kurata, G., Xiang, B., Zhou, B., & Yu, M. (2016a). Leveraging sentence-level information with encoder LSTM for semantic slot filling. In Proceedings of the EMNLP, Austin, TX.

    Google Scholar 

  • Kurata, G., Xiang, B., Zhou, B., & Yu, M. (2016b). Leveraging sentence-level information with encoder lstm for semantic slot filling. arXiv preprint arXiv:1601.01530.

  • Lee, J. Y., & Dernoncourt, F. (2016). Sequential short-text classification with recurrent and convolutional neural networks. In Proceedings of the NAACL.

    Google Scholar 

  • Li, J., Deng, L., Gong, Y., & Haeb-Umbach, R. (2014). An overview of noise-robust automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4), 745–777.

    Article  Google Scholar 

  • Liu, B., & Lane, I. (2015). Recurrent neural network structured output prediction for spoken language understanding. In Proc: NIPS Workshop on Machine Learning for Spoken Language Understanding and Interactions.

    Google Scholar 

  • Liu, B., & Lane, I. (2016). Attention-based recurrent neural network models for joint intent detection and slot filling. In Proceedings of the Interspeech, San Francisco, CA.

    Google Scholar 

  • Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tür, D., et al. (2015). Using recurrent neural networks for slot filling in spoken language understanding. IEEE Transactions on Audio, Speech, and Language Processing, 23(3), 530–539.

    Article  Google Scholar 

  • Mesnil, G., He, X., Deng, L., & Bengio, Y. (2013). Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In Proceedings of the Interspeech, Lyon, France.

    Google Scholar 

  • Natarajan, P., Prasad, R., Suhm, B., & McCarthy, D. (2002). Speech enabled natural language call routing: BBN call director. In Proceedings of the ICSLP, Denver, CO.

    Google Scholar 

  • Pieraccini, R., Tzoukermann, E., Gorelov, Z., Gauvain, J.-L., Levin, E., Lee, C.-H., et al. (1992). A speech understanding system based on statistical representation of semantics. In Proceedings of the ICASSP, San Francisco, CA.

    Google Scholar 

  • Price, P. J. (1990). Evaluation of spoken language systems: The ATIS domain. In Proceedings of the DARPA Workshop on Speech and Natural Language, Hidden Valley, PA.

    Google Scholar 

  • Ravuri, S., & Stolcke, A. (2015). Recurrent neural network and lstm models for lexical utterance classification. In Proceedings of the Interspeech.

    Google Scholar 

  • Raymond, C., & Riccardi, G. (2007). Generative and discriminative algorithms for spoken language understanding. In Proceedings of the Interspeech, Antwerp, Belgium.

    Google Scholar 

  • Sarikaya, R., Hinton, G. E., & Deoras, A. (2014). Application of deep belief networks for natural language understanding. IEEE Transactions on Audio, Speech, and Language Processing, 22(4).

    Article  Google Scholar 

  • Sarikaya, R., Hinton, G. E., & Ramabhadran, B. (2011). Deep belief nets for natural language call-routing. In Proceedings of the ICASSP, Prague, Czech Republic.

    Google Scholar 

  • Seneff, S. (1992). TINA: A natural language system for spoken language applications. Computational Linguistics, 18(1), 61–86.

    Google Scholar 

  • Simonnet, E., Camelin, N., Deleglise, P., & Esteve, Y. (2015). Exploring the use of attention-based recurrent neural networks for spoken language understanding. In Proceedings of the NIPS Workshop on Machine Learning for Spoken Language Understanding and Interaction.

    Google Scholar 

  • Socher, R., Lin, C. C., Ng, A. Y., & Manning, C. D. (2011). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of ICML.

    Google Scholar 

  • Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Simonsen, J. G., & Nie, J.-Y. (2015). A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In Proceedings of the ACM CIKM.

    Google Scholar 

  • Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Advances in neural information processing systems 27, chapter Sequence to sequence learning with neural networks.

    Google Scholar 

  • Tafforeau, J., Bechet, F., Artiere1, T., & Favre, B. (2016). Joint syntactic and semantic analysis with a multitask deep learning framework for spoken language understanding. In Proceedings of the Interspeech, San Francisco, CA.

    Google Scholar 

  • Tur, G., & Deng, L. (2011). Intent determination and spoken utterance classification, Chapter 4 in book: Spoken language understanding. New York, NY: Wiley.

    Google Scholar 

  • Tur, G., Hakkani-Tür, D., & Heck, L. (2010). What is left to be understood in ATIS? In Proceedings of the IEEE SLT Workshop, Berkeley, CA.

    Google Scholar 

  • Tur, G., & Mori, R. D. (Eds.). (2011). Spoken language understanding: Systems for extracting semantic information from speech. New York, NY: Wiley.

    MATH  Google Scholar 

  • Vinyals, O., Fortunato, M., & Jaitly, N. (2015). Pointer networks. In Proceedings of the NIPS.

    Google Scholar 

  • Vinyals, O., & Le, Q. V. (2015). A neural conversational model. In Proceedings of the ICML.

    Google Scholar 

  • Vu, N. T., Gupta, P., Adel, H., & Schütze, H. (2016). Bi-directional recurrent neural network with ranking loss for spoken language understanding. In Proceedings of the IEEE ICASSP, Shanghai, China.

    Google Scholar 

  • Vukotic, V., Raymond, C., & Gravier, G. (2016). A step beyond local observations with a dialog aware bidirectional gru network for spoken language understanding. In Proceedings of the Interspeech, San Francisco, CA.

    Google Scholar 

  • Walker, M., Aberdeen, J., Boland, J., Bratt, E., Garofolo, J., Hirschman, L., et al. (2001). DARPA communicator dialog travel planning systems: The June 2000 data collection. In Proceedings of the Eurospeech Conference.

    Google Scholar 

  • Wang, Y., Deng, L., & Acero, A. (2011). Semantic frame based spoken language understanding, Chapter 3. New York, NY: Wiley.

    Google Scholar 

  • Ward, W., & Issar, S. (1994). Recent improvements in the CMU spoken language understanding system. In Proceedings of the ARPA HLT Workshop, pages 213–216.

    Google Scholar 

  • Weizenbaum, J. (1966). Eliza—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.

    Article  Google Scholar 

  • Woods, W. A. (1983). Language processing for speech understanding. Prentice-Hall International, Englewood Cliffs, NJ: In Computer Speech Processing.

    Google Scholar 

  • Xu, P., & Sarikaya, R. (2013). Convolutional neural network based triangular crf for joint intent detection and slot filling. In Proceedings of the IEEE ASRU.

    Google Scholar 

  • Yao, K., Peng, B., Zhang, Y., Yu, D., Zweig, G., & Shi, Y. (2014). Spoken language understanding using long short-term memory neural networks. In Proceedings of the IEEE SLT Workshop, South Lake Tahoe, CA. IEEE.

    Google Scholar 

  • Yao, K., Zweig, G., Hwang, M.-Y., Shi, Y., & Yu, D. (2013). Recurrent neural networks for language understanding. In Proceedings of the Interspeech, Lyon, France.

    Google Scholar 

  • Zhai, F., Potdar, S., Xiang, B., & Zhou, B. (2017). Neural models for sequence chunking. In Proceedings of the AAAI.

    Google Scholar 

  • Zhang, X., & Wang, H. (2016). A joint model of intent determination and slot filling for spoken language understanding. In Proceedings of the IJCAI.

    Google Scholar 

  • Zhu, S., & Yu, K. (2016a). Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding. In submission.

    Google Scholar 

  • Zhu, S., & Yu, K. (2016b). Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding. arXiv preprint arXiv:1608.02097.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gokhan Tur .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Tur, G., Celikyilmaz, A., He, X., Hakkani-Tür, D., Deng, L. (2018). Deep Learning in Conversational Language Understanding. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-981-10-5209-5_2

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5208-8

  • Online ISBN: 978-981-10-5209-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics