DOI: 10.1145/3511616.3513104 · ACSW Conference Proceedings · Research article

A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Published: 21 March 2022

Abstract

Deep Reinforcement Learning (deep RL) has achieved tremendous success in gaming, but it has rarely been explored for Speech Emotion Recognition (SER). In the RL literature, the policy used by the RL agent plays a major role in action selection; however, no existing RL policy is tailored for SER. Moreover, the extended learning period that deep RL generally requires can slow learning for SER. In this paper, we introduce the "Zeta policy", a novel policy tailored for SER, and introduce pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross-corpus dataset was also studied to assess the feasibility of pre-training the RL agent on a similar dataset when real environmental data are not available. We use the "IEMOCAP" and "SAVEE" datasets for evaluation on the problem of recognising four emotions: happy, sad, angry, and neutral. The experimental results show that the proposed policy outperforms existing policies. The results also show that pre-training reduces training time and is robust to a cross-corpus scenario.
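The Zeta policy's definition appears only in the paper's full text. As context, a standard Boltzmann (softmax) action-selection policy, the kind of baseline such a proposed policy is typically compared against, can be sketched as follows; the function name and the four-action emotion setup here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0, rng=None):
    """Sample an action index from a softmax over Q-values.

    Higher temperature flattens the distribution (more exploration);
    lower temperature concentrates probability on the greedy action.
    """
    rng = rng if rng is not None else np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    # Shift by the max before exponentiating for numerical stability.
    logits = (q - q.max()) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    action = rng.choice(len(q), p=probs)
    return action, probs

# Four actions, one per emotion class: happy, sad, angry, neutral.
action, probs = boltzmann_policy([2.0, 1.0, 0.5, 0.1], temperature=0.5)
```

The temperature plays the role that exploration schedules play in epsilon-greedy policies; a tailored policy such as the one the paper proposes would replace this sampling rule while keeping the same interface to the deep Q-learning agent.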


Cited By

  • (2023) "Speech Emotion Recognition Using Attention Model". International Journal of Environmental Research and Public Health 20(6), 5140. https://doi.org/10.3390/ijerph20065140. Online publication date: 14 March 2023.
  • (2023) "Video Emotional Classification Based on Deep Reinforcement Learning". 2023 3rd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), 168–171. https://doi.org/10.1109/ACCTCS58815.2023.00079. Online publication date: February 2023.
  • (2022) "Comparative Analysis of Deep Reinforcement Learning-Based Autonomous Path Planning Approaches in Dynamic Environments" (in Turkish). Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 9(16), 248–262. https://doi.org/10.54365/adyumbd.1025545. Online publication date: 14 April 2022.


Published In

ACSW '22: Proceedings of the 2022 Australasian Computer Science Week
February 2022, 260 pages
ISBN: 9781450396066
DOI: 10.1145/3511616

Publisher

Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • Research-article
      • Research
      • Refereed limited

Conference

ACSW 2022: Australasian Computer Science Week 2022
February 14–18, 2022, Brisbane, Australia

Acceptance Rates

Overall acceptance rate: 61 of 141 submissions (43%)

