DOI: 10.1145/3511616.3513104 · ACSW Conference Proceedings · Research article

A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Published: 21 March 2022

Abstract

Deep Reinforcement Learning (deep RL) has achieved tremendous success in gaming, but it has rarely been explored for Speech Emotion Recognition (SER). In the RL literature, the policy used by the RL agent plays a major role in action selection; however, no existing RL policy is tailored for SER. Moreover, the extended learning period that deep RL generally requires can slow learning for SER. In this paper, we introduce the "Zeta policy", a novel policy tailored for SER, and introduce pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross-corpus dataset was also studied to assess the feasibility of pre-training the RL agent on a similar dataset when real environmental data are not available. We use the "IEMOCAP" and "SAVEE" datasets for evaluation on the problem of recognising four emotions: happy, sad, angry, and neutral. The experimental results show that the proposed policy outperforms existing policies. The results also show that pre-training reduces training time and is robust to a cross-corpus scenario.
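The Zeta policy's definition appears only in the paper's full text. As context, a standard Boltzmann (softmax) action-selection policy, the kind of baseline such a proposed policy is typically compared against, can be sketched as follows; the function name and the four-action emotion setup here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0, rng=None):
    """Sample an action index from a softmax over Q-values.

    Higher temperature flattens the distribution (more exploration);
    lower temperature concentrates probability on the greedy action.
    """
    rng = rng if rng is not None else np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    # Shift by the max before exponentiating for numerical stability.
    logits = (q - q.max()) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    action = rng.choice(len(q), p=probs)
    return action, probs

# Four actions, one per emotion class: happy, sad, angry, neutral.
action, probs = boltzmann_policy([2.0, 1.0, 0.5, 0.1], temperature=0.5)
```

The temperature plays the role that exploration schedules play in epsilon-greedy policies; a tailored policy such as the one the paper proposes would replace this sampling rule while keeping the same interface to the deep Q-learning agent.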


Cited By

  • (2023) "Speech Emotion Recognition Using Attention Model". International Journal of Environmental Research and Public Health 20(6), 5140. https://doi.org/10.3390/ijerph20065140. Online publication date: 14 March 2023.
  • (2023) "Video Emotional Classification Based on Deep Reinforcement Learning". 2023 3rd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), 168–171. https://doi.org/10.1109/ACCTCS58815.2023.00079. Online publication date: February 2023.
  • (2022) "Comparative Analysis of Deep Reinforcement Learning-Based Autonomous Path Planning Approaches in Dynamic Environments" (in Turkish). Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 9(16), 248–262. https://doi.org/10.54365/adyumbd.1025545. Online publication date: 14 April 2022.


Published In

ACSW '22: Proceedings of the 2022 Australasian Computer Science Week
February 2022, 260 pages
ISBN: 9781450396066
DOI: 10.1145/3511616

Publisher

Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • Research-article
      • Research
      • Refereed limited

Conference

ACSW 2022: Australasian Computer Science Week 2022
February 14–18, 2022, Brisbane, Australia

Acceptance Rates

Overall acceptance rate: 61 of 141 submissions (43%)

