Abstract
Third-party applications deployed on voice-based home devices (Google Home, Amazon Echo, etc.) are usually rule-based and follow a hard-coded dialogue graph. In this paper we describe how we added artificial intelligence to our voice conversational agent, currently running in production on Amazon Echo and soon on Google Home. The approach is based on contextual bandits, a special case of reinforcement learning, which allows us to steer the dialogue within a fuzzy dialogue graph while taking advantage of the features available in the home devices' frameworks.
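The abstract does not expose implementation detail, so purely as an illustrative sketch of the kind of linear contextual bandit the chapter title refers to, the snippet below implements the standard disjoint LinUCB algorithm for picking the next node of a dialogue graph from a context vector. The class name, dimensions, and reward signal are assumptions for illustration, not details taken from the chapter.

```python
import numpy as np

class LinUCB:
    """Minimal disjoint LinUCB: one ridge-regression model per arm (dialogue node)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward accumulators

    def select(self, context):
        """Pick the arm with the highest upper confidence bound for this context."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # per-arm linear payoff estimate
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Update the chosen arm's statistics with the observed reward."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context


# Hypothetical usage: the context could encode device features and dialogue state,
# the arms the admissible next nodes of the dialogue graph, and the reward a
# success signal for the turn.
bandit = LinUCB(n_arms=4, dim=8, alpha=0.5)
ctx = np.random.rand(8)
arm = bandit.select(ctx)
bandit.update(arm, ctx, reward=1.0)
```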
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Allesiardo, R., Sauldubois, C., Depaulis, F., Bulteau, N., Chantrel, F., Pigneul, E. (2022). A Practical Approach to Intelligent Spoken Dialogue for Third-Party Applications on Home Devices with Linear Bandits. In: Jaziri, R., Martin, A., Rousset, MC., Boudjeloud-Assala, L., Guillet, F. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 1004. Springer, Cham. https://doi.org/10.1007/978-3-030-90287-2_4
DOI: https://doi.org/10.1007/978-3-030-90287-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90286-5
Online ISBN: 978-3-030-90287-2
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)