Abstract
Speech communication is an important component of human-robot interaction. This paper presents our experience from the project “Design of Robots as Assistive Technology for the Treatment of Children with Developmental Disorders”, with a focus on the development of more expressive dialogue systems based on automatic speech recognition (ASR) and text-to-speech synthesis (TTS) in South Slavic languages. It reports the most recent results of our research on expressive conversational human-robot interaction, specifically voice and style conversion of synthesized speech using a new generation of deep neural network (DNN) based speech synthesis algorithms, as well as emotional speech recognition. The second part of the paper describes the development of dialogue strategies in more detail, together with experience from their clinical application in the treatment of children with cerebral palsy.
Acknowledgments
Research was supported in part by the Ministry of Education, Science and Technological Development of Serbia (grants TR32035 and III44008).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Delić, V. et al. (2018). Toward More Expressive Speech Communication in Human-Robot Interaction. In: Ronzhin, A., Rigoll, G., Meshcheryakov, R. (eds) Interactive Collaborative Robotics. ICR 2018. Lecture Notes in Computer Science, vol 11097. Springer, Cham. https://doi.org/10.1007/978-3-319-99582-3_5
DOI: https://doi.org/10.1007/978-3-319-99582-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99581-6
Online ISBN: 978-3-319-99582-3
eBook Packages: Computer Science (R0)